On the Use of Transformers for End-to-End Optical Music Recognition

Cited by: 9
Authors
Rios-Vila, Antonio [1]
Inesta, Jose M. [1]
Calvo-Zaragoza, Jorge [1]
Affiliations
[1] Univ Alicante, UI Comp Res, Alicante, Spain
Keywords
Optical Music Recognition; Transformers; Connectionist Temporal Classification; Image-to-sequence
DOI
10.1007/978-3-031-04881-4_37
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art end-to-end Optical Music Recognition (OMR) systems use Recurrent Neural Networks to produce music transcriptions, as these models retrieve a sequence of symbols from an input staff image. However, recent advances in Deep Learning have led other research fields that process sequential data to use a new neural architecture: the Transformer, whose popularity has increased over time. In this paper, we study the application of the Transformer model to end-to-end OMR systems. We produced several models based on all the existing approaches in this field and tested them on various corpora with different types of encodings for the output. The obtained results allow us to make an in-depth analysis of the advantages and disadvantages of applying this architecture to these systems. This discussion leads us to conclude that Transformers, as they were conceived, do not seem to be appropriate to perform end-to-end OMR, so this paper raises interesting lines of future research to get the full potential of this architecture in this field.
Pages: 470-481
Page count: 12