End-to-end optical music recognition for pianoform sheet music

Cited by: 5
Authors
Rios-Vila, Antonio [1]
Rizo, David [1, 2]
Inesta, Jose M. [1]
Calvo-Zaragoza, Jorge [1]
Affiliations
[1] Univ Alicante, UI Comp Res, Alicante, Spain
[2] Inst Super Ensenanzas Artist Comun Valenciana ISEA, Alicante, Spain
Keywords
Optical music recognition; Polyphonic music scores; GrandStaff; Neural networks; Removal; Network; Image
DOI
10.1007/s10032-023-00432-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
End-to-end solutions have brought about significant advances in the field of Optical Music Recognition. These approaches directly produce the symbolic representation of a musical score from its image. Despite this, some types of documents, such as pianoform musical scores, cannot yet benefit from these solutions because their structural complexity prevents effective transcription. This paper presents a neural method designed to transcribe these musical scores in an end-to-end fashion. We also introduce the GrandStaff dataset, which contains 53,882 single-system piano scores in common Western modern notation. The sources are encoded both in a standard digital music representation and in an adaptation of it for current transcription technologies. The proposed method is trained and evaluated on this dataset. The results show that the presented approach is, for the first time, able to effectively transcribe pianoform notation in an end-to-end manner.
Pages: 347–362
Number of pages: 16