

Bringing music sources into a structured digital representation, typically known as transcription, remains one of the key, yet challenging, tasks in the Music Information Retrieval (MIR) field. Such digitization not only improves music heritage preservation and dissemination, but it also enables the use of computer-based tools for indexing, analysis, and retrieval, among many other tasks. Optical Music Recognition (OMR) and Automatic Music Transcription (AMT) denote the research fields that aim to obtain a structured digital representation from sheet music images and acoustic recordings, respectively. While these fields have traditionally evolved independently, the fact that both tasks may share the same output representation poses the question of whether they could be combined in a synergistic manner to exploit the individual transcription advantages exhibited by each modality. To evaluate this hypothesis, this paper presents a multimodal framework that combines the predictions from two neural end-to-end OMR and AMT systems through a local alignment approach. We assess several experimental scenarios with monophonic music pieces to evaluate our approach under different conditions of the individual transcription systems. In general, the multimodal framework clearly outperforms the single recognition modalities, attaining a relative improvement close to \(40\%\) in the best case. Our initial premise is, therefore, validated, thus opening avenues for further research in multimodal OMR-AMT transcription.
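To make the fusion step concrete, the sketch below illustrates how a local alignment (here, the classic Smith-Waterman algorithm, a standard choice for local alignment) could merge the symbol sequences hypothesized by the OMR and AMT systems. The token names, scoring values, and conflict-resolution policy are illustrative assumptions, not the exact configuration used in the paper.

```python
# Minimal sketch of a local-alignment fusion step, assuming each system
# emits a sequence of music-symbol tokens. Scores and the preference-based
# merge rule are assumptions for illustration only.

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment over two token sequences.

    Returns aligned token pairs (None marks a gap) for the
    highest-scoring local region.
    """
    n, m = len(a), len(b)
    # H[i][j] holds the best local-alignment score ending at a[:i], b[:j].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell until a zero score is reached.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        score = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + score:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    return list(reversed(pairs))


def fuse(omr_tokens, amt_tokens, prefer_omr=True):
    """Merge two hypotheses: keep tokens on which both modalities agree,
    and resolve conflicts or gaps with a fixed modality preference
    (an assumed policy, not necessarily the paper's)."""
    merged = []
    for o, t in smith_waterman(omr_tokens, amt_tokens):
        if o == t:
            merged.append(o)   # both modalities agree
        elif o is not None and (prefer_omr or t is None):
            merged.append(o)   # keep the image-based hypothesis
        elif t is not None:
            merged.append(t)   # fall back to the audio-based hypothesis
    return merged


if __name__ == "__main__":
    # Hypothetical token sequences; the two systems disagree on one note.
    omr = ["clef-G2", "note-C4_q", "note-E4_q", "note-G4_h"]
    amt = ["clef-G2", "note-C4_q", "note-F4_q", "note-G4_h"]
    print(fuse(omr, amt))
```

Running the example aligns the two sequences, keeps the three tokens on which both modalities agree, and resolves the single conflict in favor of the preferred modality; a real system would likely weight this decision by per-token model confidences rather than a fixed preference.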
