End-to-End Neural Optical Music Recognition of Monophonic Scores

Cited by: 37
Authors
Calvo-Zaragoza, Jorge [1, 2]
Rizo, David [3, 4]
Affiliations
[1] McGill Univ, Schulich Sch Mus, Montreal, PQ H3A 1E3, Canada
[2] Univ Politecn Valencia, PRHLT Res Ctr, E-46022 Valencia, Spain
[3] Inst Super Ensenanzas Artist, Alicante 03690, Spain
[4] Univ Alicante, Dept Lenguajes & Sistemas Informat, Alicante 03690, Spain
Source
APPLIED SCIENCES-BASEL | 2018 / Vol. 8 / Issue 04
Keywords
Optical Music Recognition; end-to-end recognition; Deep Learning; music score images; REMOVAL;
DOI
10.3390/app8040606
Chinese Library Classification
O6 [Chemistry];
Subject classification code
0703;
Abstract
Optical Music Recognition is a field of research that investigates how to computationally decode music notation from images. Despite the efforts made so far, there are hardly any complete solutions to the problem. In this work, we study the use of neural networks that work in an end-to-end manner. This is achieved by using a neural model that combines the capabilities of convolutional neural networks, which work on the input image, and recurrent neural networks, which deal with the sequential nature of the problem. Thanks to the use of the so-called Connectionist Temporal Classification loss function, these models can be directly trained from input images accompanied by their corresponding transcripts into music symbol sequences. We also present the Printed Images of Music Staves (PrIMuS) dataset, containing more than 80,000 monodic single-staff real scores in common western notation, which is used to train and evaluate the neural approach. Our experiments demonstrate that this formulation can be carried out successfully. Additionally, we study several considerations about the codification of the output musical sequences, the convergence and scalability of the neural models, as well as the ability of this approach to locate symbols in the input score.
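As a rough illustration of how a model trained with the Connectionist Temporal Classification loss can yield both a symbol sequence and approximate symbol positions, the sketch below applies best-path (greedy) decoding to per-frame scores. This is only a sketch under stated assumptions: the abstract does not say which decoding strategy the authors use, and the function name `ctc_greedy_decode`, the blank index, the vocabulary strings, and the example scores are all hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence, Tuple

BLANK = 0  # assumption: index 0 is reserved for the CTC blank label


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocabulary: Sequence[str],
) -> List[Tuple[str, int]]:
    """Best-path CTC decoding.

    Takes the highest-scoring label in every frame, merges consecutive
    repeats, and drops blanks. Each emitted symbol is paired with the
    index of the first frame where it was predicted, which gives a
    coarse horizontal position in the input image.
    """
    decoded: List[Tuple[str, int]] = []
    previous = BLANK
    for frame_index, scores in enumerate(frame_scores):
        # argmax over the label axis for this frame
        best = max(range(len(scores)), key=scores.__getitem__)
        # emit only when a non-blank label starts a new run
        if best != BLANK and best != previous:
            decoded.append((vocabulary[best], frame_index))
        previous = best
    return decoded


# Hypothetical vocabulary and scores, for illustration only.
VOCABULARY = ["<blank>", "clef-G2", "note-C4_quarter", "barline"]
EXAMPLE_SCORES = [
    [0.10, 0.80, 0.05, 0.05],  # clef
    [0.10, 0.70, 0.10, 0.10],  # clef again (merged with previous frame)
    [0.90, 0.03, 0.03, 0.04],  # blank
    [0.10, 0.10, 0.70, 0.10],  # note
    [0.80, 0.10, 0.05, 0.05],  # blank separates two identical notes
    [0.10, 0.10, 0.70, 0.10],  # note (emitted again after the blank)
    [0.10, 0.10, 0.10, 0.70],  # barline
]

if __name__ == "__main__":
    for symbol, frame in ctc_greedy_decode(EXAMPLE_SCORES, VOCABULARY):
        print(frame, symbol)
```

The blank label is what lets two identical consecutive symbols survive the merge step, and the recorded frame index is one simple way to relate each output symbol back to a region of the staff image.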
Pages: 23