On the Use of Transformers for End-to-End Optical Music Recognition

Cited by: 9
Authors
Rios-Vila, Antonio [1]
Inesta, Jose M. [1]
Calvo-Zaragoza, Jorge [1]
Affiliation
[1] Univ Alicante, UI Comp Res, Alicante, Spain
Keywords
Optical Music Recognition; Transformers; Connectionist Temporal Classification; Image-to-sequence
DOI
10.1007/978-3-031-04881-4_37
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art end-to-end Optical Music Recognition (OMR) systems use Recurrent Neural Networks to produce music transcriptions, as these models retrieve a sequence of symbols from an input staff image. However, recent advances in Deep Learning have led other research fields that process sequential data to adopt a newer neural architecture, the Transformer, whose popularity has grown over time. In this paper, we study the application of the Transformer model to end-to-end OMR systems. We produced several models based on all the existing approaches in this field and tested them on various corpora with different types of output encodings. The results allow us to analyze in depth the advantages and disadvantages of applying this architecture to these systems. This discussion leads us to conclude that Transformers, as originally conceived, do not seem appropriate for end-to-end OMR, so this paper raises interesting lines of future research to realize the full potential of this architecture in this field.
Pages: 470-481
Page count: 12
Related papers
50 in total
  • [1] End-to-end optical music recognition for pianoform sheet music
    Rios-Vila, Antonio
    Rizo, David
    Inesta, Jose M.
    Calvo-Zaragoza, Jorge
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2023, 26 (03) : 347 - 362
  • [2] Practical End-to-End Optical Music Recognition for Pianoform Music
    Mayer, Jiri
    Straka, Milan
    Hajic, Jan
    Pecina, Pavel
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT VI, 2024, 14809 : 55 - 73
  • [3] End-to-end optical music recognition for pianoform sheet music
    Antonio Ríos-Vila
    David Rizo
    José M. Iñesta
    Jorge Calvo-Zaragoza
    International Journal on Document Analysis and Recognition (IJDAR), 2023, 26 : 347 - 362
  • [4] Data Augmentation for End-to-End Optical Music Recognition
    Lopez-Gutierrez, Juan C.
    Valero-Mas, Jose J.
    Castellanos, Francisco J.
    Calvo-Zaragoza, Jorge
    DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021 WORKSHOPS, PT I, 2021, 12916 : 59 - 73
  • [5] Decoupling music notation to improve end-to-end Optical Music Recognition
    Alfaro-Contreras, Maria
    Rios-Vila, Antonio
    Valero-Mas, Jose J.
    Inesta, Jose M.
    Calvo-Zaragoza, Jorge
    PATTERN RECOGNITION LETTERS, 2022, 158 : 157 - 163
  • [6] End-to-End Neural Optical Music Recognition of Monophonic Scores
    Calvo-Zaragoza, Jorge
    Rizo, David
    APPLIED SCIENCES-BASEL, 2018, 8 (04):
  • [7] Approaching End-to-End Optical Music Recognition for Homophonic Scores
    Alfaro-Contreras, Maria
    Calvo-Zaragoza, Jorge
    Inesta, Jose M.
    PATTERN RECOGNITION AND IMAGE ANALYSIS, IBPRIA 2019, PT II, 2019, 11868 : 147 - 158
  • [8] SYNCHRONOUS TRANSFORMERS FOR END-TO-END SPEECH RECOGNITION
    Tian, Zhengkun
    Yi, Jiangyan
    Bai, Ye
    Tao, Jianhua
    Zhang, Shuai
    Wen, Zhengqi
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7884 - 7888
  • [9] Sheet Music Transformer: End-To-End Optical Music Recognition Beyond Monophonic Transcription
    Rios-Vila, Antonio
    Calvo-Zaragoza, Jorge
    Paquet, Thierry
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT VI, 2024, 14809 : 20 - 37
  • [10] End-to-End Optical Music Recognition with Attention Mechanism and Memory Units Optimization
    He, Ruichen
    Yao, Junfeng
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT II, 2024, 14426 : 400 - 411