Spoken Language Identification with Deep Convolutional Neural Network and Data Augmentation

Cited by: 0
Authors
Korkut, Can [1]
Haznedaroglu, Ali [1]
Arslan, Levent M. [1, 2]
Affiliations
[1] Sestek, Istanbul, Turkey
[2] Bogazici Univ, Elekt Elekt Muhendisligi Bolumu, Istanbul, Turkey
Keywords
Spoken Language Identification; CNN; Data Augmentation; Speech
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
In this paper, a spoken language identification system based on deep convolutional neural networks is presented. The neural network model is trained and tested on a speech dataset containing five languages. Speech signals are first converted into mel-spectrogram features, which are fed into the deep convolutional neural network. The flattened outputs of the convolutional network are then fed into a recurrent layer, and a dense layer with a softmax activation function serves as the output layer that predicts the language probabilities. This network achieves an F1-score of 0.89 on our test data. We also applied a data augmentation method, SpecAugment, which increased the F1-score to 0.94.
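
The abstract names SpecAugment as the augmentation method but does not give its settings. As a rough illustration only, the sketch below applies the two masking steps usually associated with SpecAugment (frequency masking and time masking) to a mel-spectrogram stored as a plain list of rows. The function name `spec_augment`, the mask widths, the number of masks, and the mask value are all assumptions made for this example, not values from the paper, and the time-warping step of the original method is omitted.

```python
import random


def spec_augment(spec, max_freq_width=8, max_time_width=20,
                 n_freq_masks=1, n_time_masks=1, mask_value=0.0, rng=None):
    """Return a masked copy of `spec`, a list of mel-bin rows over time frames.

    All default parameter values are illustrative assumptions.
    """
    rng = rng or random.Random()
    n_mels = len(spec)
    n_frames = len(spec[0]) if n_mels else 0
    out = [row[:] for row in spec]  # copy so the input stays untouched

    # Frequency masking: blank out a band of consecutive mel bins.
    for _ in range(n_freq_masks):
        width = rng.randint(0, min(max_freq_width, n_mels))
        start = rng.randint(0, n_mels - width)
        for m in range(start, start + width):
            out[m] = [mask_value] * n_frames

    # Time masking: blank out a run of consecutive frames in every mel bin.
    for _ in range(n_time_masks):
        width = rng.randint(0, min(max_time_width, n_frames))
        start = rng.randint(0, n_frames - width)
        for row in out:
            row[start:start + width] = [mask_value] * width

    return out


# Hypothetical usage: a 40-bin, 100-frame spectrogram filled with ones.
mel = [[1.0] * 100 for _ in range(40)]
augmented = spec_augment(mel, rng=random.Random(0))
```

Masks of this kind are typically drawn afresh each time a training example is used, so the network sees a differently masked version of the same utterance on every pass.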
Pages: 4