Non-intrusive method for audio quality assessment of lossy-compressed music recordings using convolutional neural networks

Authors
Kasperuk, Aleksandra [1]
Zielinski, Slawomir Krzysztof [1]
Affiliation
[1] Bialystok Tech Univ, Fac Comp Sci, Bialystok, Poland
Keywords
objective audio quality assessment; non-intrusive audio quality evaluation; convolutional neural networks; MODEL
DOI
10.24425/ijet.2024.149549
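Chinese Library Classification (CLC) number
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract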
Most of the existing algorithms for objective audio quality assessment are intrusive, as they require access to both an unimpaired reference recording and the evaluated signal. This requirement excludes them from many practical applications. In this paper, we introduce a non-intrusive audio quality assessment method. The proposed method is intended to account for audio artefacts arising from the lossy compression of music signals. During its development, 250 high-quality uncompressed music recordings were collated. They were subsequently processed using a selection of five popular audio codecs, resulting in a repository of 13,000 audio excerpts representing various levels of audio quality. The proposed non-intrusive method was trained on data obtained using a well-established intrusive model (ViSQOL v3). Next, the performance of the trained model was evaluated using the quality scores obtained in subjective listening tests undertaken remotely over the Internet. The listening tests were carried out in compliance with the MUSHRA recommendation (ITU-R BS.1534-3). In this study, the following three convolutional neural networks were compared: (1) a model employing 1D convolutional filters, (2) an Inception-based model, and (3) a VGG-based model. The last-mentioned model outperformed the model employing 1D convolutional filters in terms of predicting the scores from the listening tests, reaching [...] framework, recently introduced by Mumtaz et al. (2022).
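The evaluation step described in the abstract, comparing model predictions against listening-test scores, can be illustrated with a minimal sketch. It assumes agreement is measured with the Pearson correlation coefficient, which is a common choice but is not confirmed by the abstract as reproduced here. The function name `pearson` and all score values are hypothetical and are not data from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for illustration only; not data from the paper.
mushra_scores = [20.0, 35.0, 50.0, 70.0, 90.0]  # listening-test scores (0-100 scale)
predicted = [1.8, 2.4, 3.1, 3.9, 4.6]           # model outputs (MOS-like scale)

r = pearson(mushra_scores, predicted)
print(f"Pearson r = {r:.3f}")
```

Because the Pearson coefficient is invariant to linear rescaling, the predictions and the subjective scores do not need to share a scale for this comparison.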
Pages: 331-339
Number of pages: 9