Deep Learning Approaches for Sung Vowel Classification

Citations: 0
Authors
Carlson, Parker [1 ,2 ]
Donnelly, Patrick J. [2 ]
Affiliations
[1] UC Santa Barbara, Santa Barbara, CA 93106 USA
[2] Oregon State Univ, Corvallis, OR 97331 USA
Keywords
Sung Vowels; Phoneme Classification; Raw Audio; Automatic Speech Recognition; CNN; LSTM; Transformer; VocalSet; Formant; Features
DOI
10.1007/978-3-031-56992-0_5
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Phoneme classification is an important component of automatic speech recognition systems. However, phoneme classification during singing has received significantly less study. In this work, we investigate sung vowel classification, a subset of the phoneme classification problem. Many prior approaches to classifying spoken or sung vowels rely upon spectral feature extraction, such as formants or Mel-frequency cepstral coefficients. We explore classifying sung vowels with deep neural networks trained directly on raw audio. Using VocalSet, a singing voice dataset performed by professional singers, we compare three neural models and two spectral models for classifying five sung Italian vowels performed with a variety of vocal techniques. We find that our neural models achieved accuracies between 68.4% and 79.6%, whereas our spectral models failed to discern the vowels. Of the neural models, a fine-tuned transformer performed the strongest; however, a convolutional or recurrent model may provide satisfactory results in resource-limited scenarios. This result implies that neural approaches trained directly on raw audio, without extracting spectral features, are viable for singing phoneme classification and deserve further exploration.
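To make the raw-audio setup concrete, the following is a minimal, hypothetical sketch in PyTorch of a 1-D convolutional classifier that maps waveform clips directly to five vowel classes, in the spirit of the convolutional model mentioned in the abstract. The layer sizes, kernel widths, clip length, and sample rate are illustrative assumptions, not the architecture or hyperparameters reported in the paper.

```python
# Hypothetical sketch: a small 1-D CNN over raw audio waveforms for
# five-class sung vowel classification (e.g., /a/, /e/, /i/, /o/, /u/).
# All sizes below are illustrative choices, not the paper's settings.
import torch
import torch.nn as nn

class RawAudioVowelCNN(nn.Module):
    def __init__(self, n_classes: int = 5):
        super().__init__()
        # Strided 1-D convolutions learn filterbank-like features
        # directly from the waveform, replacing explicit spectral
        # feature extraction such as formants or MFCCs.
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=64, stride=8), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=32, stride=4), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=16, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # average over the time axis
        )
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, 1, samples), e.g. ~3 s of audio at 16 kHz
        return self.classifier(self.features(waveform).squeeze(-1))

if __name__ == "__main__":
    model = RawAudioVowelCNN()
    clips = torch.randn(2, 1, 48_000)  # two synthetic 3-second clips
    logits = model(clips)              # shape: (2, 5)
    print(logits.shape)
```

A recurrent (LSTM) or fine-tuned transformer variant would replace the convolutional stack with its own encoder over the same raw waveform input; the five-way linear classification head and training objective would stay the same.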
Pages: 67-83
Number of pages: 17