Decoupling music notation to improve end-to-end Optical Music Recognition

Cited by: 5
Authors
Alfaro-Contreras, Maria [1]
Rios-Vila, Antonio [1]
Valero-Mas, Jose J. [1]
Inesta, Jose M. [1]
Calvo-Zaragoza, Jorge [1]
Affiliation
[1] Univ Alicante, Inst Univ Invest Informat, Ap 99, E-03080 Alicante, Spain
Keywords
Optical music recognition; Deep learning; Connectionist temporal classification; Sequence labeling
DOI
10.1016/j.patrec.2022.04.032
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Inspired by the Text Recognition field, end-to-end schemes based on Convolutional Recurrent Neural Networks (CRNN) trained with the Connectionist Temporal Classification (CTC) loss function are considered one of the current state-of-the-art techniques for staff-level Optical Music Recognition (OMR). Unlike text symbols, music-notation elements may be defined as a combination of (i) a shape primitive located at (ii) a certain position on a staff. However, this double nature is generally neglected in the learning process, as each combination is treated as a single token. In this work, we study whether exploiting this particularity of music notation actually benefits recognition performance and, if so, which approach is the most appropriate. To that end, we thoroughly review existing approaches that specifically explore this premise and propose different combinations of them. Furthermore, considering the limitations observed in such approaches, we propose a novel decoding strategy specifically designed for OMR. The results obtained on four different corpora of historical manuscripts show the relevance of leveraging this double nature of music notation, since doing so outperforms the standard approaches that ignore it. In addition, the proposed decoding leads to significant reductions in error rates with respect to the other approaches.
© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
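The abstract's central idea, that a music symbol pairs a shape primitive with a staff position, can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the `shape:position` token format, the labels (`note.quarter`, `L2`, ...), the blank symbol `<b>`, and the helper functions. `ctc_greedy_decode` is standard CTC best-path decoding; `decode_two_heads` is only a naive pairing of two per-frame output heads for illustration and is not the paper's proposed decoding strategy, which the abstract does not detail.

```python
from itertools import groupby

BLANK = "<b>"  # hypothetical CTC blank label


def split_token(token: str) -> tuple[str, str]:
    """Split a hypothetical combined token such as "note.quarter:L2" into (shape, position)."""
    shape, position = token.split(":", 1)
    return shape, position


def vocab_sizes(tokens: list[str]) -> tuple[int, int, int]:
    """Return (#combined classes, #shape classes, #position classes)."""
    combined = set(tokens)
    shapes = {split_token(t)[0] for t in combined}
    positions = {split_token(t)[1] for t in combined}
    return len(combined), len(shapes), len(positions)


def ctc_greedy_decode(frame_labels: list[str], blank: str = BLANK) -> list[str]:
    """Standard CTC best-path decoding: merge consecutive repeats, then drop blanks."""
    return [label for label, _ in groupby(frame_labels) if label != blank]


def decode_two_heads(shape_frames: list[str], position_frames: list[str],
                     blank: str = BLANK) -> list[str]:
    """Naive decoupled decoding (illustration only): pair each frame's shape and
    position predictions into one label, then apply standard CTC decoding."""
    merged = [
        blank if shape == blank else f"{shape}:{position}"
        for shape, position in zip(shape_frames, position_frames)
    ]
    return ctc_greedy_decode(merged, blank)


# Every shape may appear at every position, so the combined vocabulary grows
# multiplicatively (3 x 5) while the decoupled outputs grow additively (3 + 5).
shapes = ["note.quarter", "note.half", "note.whole"]
positions = ["L1", "S1", "L2", "S2", "L3"]
tokens = [f"{s}:{p}" for s in shapes for p in positions]
print(vocab_sizes(tokens))  # (15, 3, 5)

# Per-frame predictions from two hypothetical output heads.
shape_frames = [BLANK, "note.quarter", "note.quarter", BLANK, "note.half"]
position_frames = [BLANK, "L2", "L2", BLANK, "S1"]
print(decode_two_heads(shape_frames, position_frames))
# ['note.quarter:L2', 'note.half:S1']
```

With a single-token vocabulary the output layer needs one class per shape-position pair (15 in this toy inventory), whereas a decoupled model needs only 3 + 5 outputs. This is offered as an intuition only; the abstract reports the empirical benefit, not this particular mechanism.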
Pages: 157-163
Number of pages: 7
Related papers
  • [1] End-to-end optical music recognition for pianoform sheet music
    Rios-Vila, Antonio
    Rizo, David
    Inesta, Jose M.
    Calvo-Zaragoza, Jorge
    [J]. INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2023, 26 (03): 347-362
  • [2] Data Augmentation for End-to-End Optical Music Recognition
    Lopez-Gutierrez, Juan C.
    Valero-Mas, Jose J.
    Castellanos, Francisco J.
    Calvo-Zaragoza, Jorge
    [J]. DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021 WORKSHOPS, PT I, 2021, 12916: 59-73
  • [3] On the Use of Transformers for End-to-End Optical Music Recognition
    Rios-Vila, Antonio
    Inesta, Jose M.
    Calvo-Zaragoza, Jorge
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2022), 2022, 13256: 470-481
  • [4] End-to-End Neural Optical Music Recognition of Monophonic Scores
    Calvo-Zaragoza, Jorge
    Rizo, David
    [J]. APPLIED SCIENCES-BASEL, 2018, 8 (04)
  • [5] Approaching End-to-End Optical Music Recognition for Homophonic Scores
    Alfaro-Contreras, Maria
    Calvo-Zaragoza, Jorge
    Inesta, Jose M.
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS, IBPRIA 2019, PT II, 2019, 11868: 147-158
  • [6] Exploring the two-dimensional nature of music notation for score recognition with end-to-end approaches
    Rios-Vila, Antonio
    Calvo-Zaragoza, Jorge
    Inesta, Jose M.
    [J]. 2020 17TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2020), 2020: 193-198
  • [7] Residual Recurrent CRNN for End-to-End Optical Music Recognition on Monophonic Scores
    Liu, Aozhi
    Zhang, Lipei
    Mei, Yaqi
    Han, Baoqiang
    Cai, Zifeng
    Zhu, Zhaohua
    Xiao, Jing
    [J]. MMPT '21: PROCEEDINGS OF THE 2021 WORKSHOP ON MULTI-MODAL PRE-TRAINING FOR MULTIMEDIA UNDERSTANDING, 2021: 23-27
  • [8] End-to-End Optical Music Recognition with Attention Mechanism and Memory Units Optimization
    He, Ruichen
    Yao, Junfeng
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT II, 2024, 14426: 400-411
  • [9] End-to-end Music-mixed Speech Recognition
    Woo, Jeongwoo
    Mimura, Masato
    Yoshii, Kazuyoshi
    Kawahara, Tatsuya
    [J]. 2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020: 800-804