On the Use of Transformers for End-to-End Optical Music Recognition

Times cited: 9
Authors
Rios-Vila, Antonio [1]
Inesta, Jose M. [1]
Calvo-Zaragoza, Jorge [1]
Affiliation
[1] Univ Alicante, UI Comp Res, Alicante, Spain
Keywords
Optical Music Recognition; Transformers; Connectionist Temporal Classification; Image-to-sequence
DOI
10.1007/978-3-031-04881-4_37
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art end-to-end Optical Music Recognition (OMR) systems use Recurrent Neural Networks to produce music transcriptions, since these models retrieve a sequence of symbols from an input staff image. However, recent advances in Deep Learning have led other research fields that process sequential data to adopt a new neural architecture, the Transformer, whose popularity has grown steadily. In this paper, we study the application of the Transformer model to end-to-end OMR systems. We built several models based on all the existing approaches in this field and tested them on various corpora with different output encodings. The results allow an in-depth analysis of the advantages and disadvantages of applying this architecture to these systems. This discussion leads us to conclude that Transformers, as originally conceived, do not seem appropriate for end-to-end OMR; the paper therefore outlines promising lines of future research to exploit the full potential of this architecture in this field.
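The abstract does not spell out how a recurrent system turns a staff image into a symbol sequence. In the Connectionist Temporal Classification (CTC) setting named among the keywords, the network typically emits, for each horizontal frame of the image, a probability distribution over the symbol vocabulary plus an extra "blank" label; the simplest way to read off a transcription is best-path (greedy) decoding. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the blank index 0, the function name `ctc_greedy_decode`, and the toy vocabulary are all hypothetical.

```python
from typing import List, Optional, Sequence

# Assumption (hypothetical): label index 0 is reserved for the CTC blank.
BLANK = 0


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: pick the most likely label in every frame,
    collapse consecutive repeats, then drop blanks."""
    # Most probable label per frame (argmax over the vocabulary).
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded: List[int] = []
    previous: Optional[int] = None
    for label in best_path:
        # A label is emitted only when it differs from the previous frame's
        # label and is not blank, so a blank between two identical labels
        # keeps them as two separate symbols.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


if __name__ == "__main__":
    # Hypothetical vocabulary and per-frame probabilities for a tiny staff.
    vocabulary = ["<blank>", "clef-G2", "note-C4_quarter", "barline"]
    frames = [
        [0.10, 0.80, 0.05, 0.05],
        [0.10, 0.70, 0.10, 0.10],
        [0.90, 0.05, 0.03, 0.02],
        [0.20, 0.10, 0.60, 0.10],
        [0.10, 0.10, 0.20, 0.60],
    ]
    print([vocabulary[i] for i in ctc_greedy_decode(frames)])
    # prints ['clef-G2', 'note-C4_quarter', 'barline']
```

Greedy decoding is shown only because it is the shortest correct CTC decoder; beam search over the same per-frame outputs is a common alternative when a more accurate transcription is needed.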
Pages: 470-481
Number of pages: 12