Deep Learning Approaches for Sung Vowel Classification

Cited by: 0
Authors
Carlson, Parker [1 ,2 ]
Donnelly, Patrick J. [2 ]
Affiliations
[1] UC Santa Barbara, Santa Barbara, CA 93106 USA
[2] Oregon State Univ, Corvallis, OR 97331 USA
Keywords
Sung Vowels; Phoneme Classification; Raw Audio; Automatic Speech Recognition; CNN; LSTM; Transformer; VocalSet; Formant; Features;
DOI
10.1007/978-3-031-56992-0_5
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Phoneme classification is an important part of automatic speech recognition systems. However, classifying phonemes during singing has been studied significantly less. In this work, we investigate sung vowel classification, a subset of the phoneme classification problem. Many prior approaches that attempt to classify spoken or sung vowels rely upon spectral feature extraction, such as formants or Mel-frequency cepstral coefficients. We explore classifying sung vowels with deep neural networks trained directly on raw audio. Using VocalSet, a singing voice dataset performed by professional singers, we compare three neural models and two spectral models for classifying five sung Italian vowels performed in a variety of vocal techniques. We find that our neural models achieved accuracies between 68.4% and 79.6%, whereas our spectral models failed to discern vowels. Of the neural models, we find that a fine-tuned transformer performed best; however, a convolutional or recurrent model may provide satisfactory results in resource-limited scenarios. This result implies that neural approaches trained directly on raw audio, without extracting spectral features, are viable for singing phoneme classification and deserve further exploration.
Pages: 67-83
Page count: 17