An End-to-end Approach to Language Identification in Short Utterances using Convolutional Neural Networks

Cited by: 0
Authors
Lozano-Diez, Alicia [1]
Zazo-Candil, Ruben [1]
Gonzalez-Dominguez, Javier [1]
Toledano, Doroteo T. [1]
Gonzalez-Rodriguez, Joaquin [1]
Affiliations
[1] Univ Autonoma Madrid, ATVS Biometr Recognit Grp, Madrid, Spain
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
In this work, we propose an end-to-end approach to the language identification (LID) problem based on Convolutional Deep Neural Networks (CDNNs). The use of CDNNs is mainly motivated by the ability they have shown in modeling speech signals and by their relatively low cost, in terms of number of free parameters, compared with other deep architectures. We evaluate different configurations on a subset of 8 languages from the NIST Language Recognition Evaluation 2009 Voice of America (VOA) dataset, for the task of short test durations (segments of up to 3 seconds of speech). The proposed CDNN-based systems achieve performance comparable to that of our baseline i-vector system while drastically reducing the number of parameters to tune (at least 100 times fewer parameters). We then combine these CDNN-based systems and the i-vector baseline with a simple fusion at score level. This combination outperforms our best standalone system (up to 11% relative improvement in terms of EER).
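The abstract does not say how the score-level fusion is carried out or how the EER is computed, so the following is only a minimal sketch of what those two steps could look like. It assumes the fusion is a weighted average of per-trial scores (the paper may use a different rule) and it approximates the EER by sweeping thresholds over the observed scores. The function names, the `weight` parameter, and the toy scores are all hypothetical and are not taken from the paper.

```python
def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted average of two systems' scores for the same trials.

    Assumption: a linear combination; the paper only says "simple fusion".
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must cover the same trials")
    return [weight * a + (1.0 - weight) * b for a, b in zip(scores_a, scores_b)]


def equal_error_rate(scores, labels):
    """Approximate EER for binary trials (label 1 = target, 0 = non-target).

    Sweeps every observed score as a threshold and returns the mean of the
    miss rate and false-alarm rate at the point where they are closest.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")
    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        # A trial is accepted when its score is >= threshold.
        misses = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
        false_alarms = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
        p_miss = misses / n_target
        p_fa = false_alarms / n_nontarget
        gap = abs(p_miss - p_fa)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (p_miss + p_fa) / 2.0
    return eer


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction with respect to a baseline system."""
    return (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Toy, made-up scores for four trials (not data from the paper).
    labels = [1, 1, 0, 0]
    cdnn_scores = [0.9, 0.3, 0.6, 0.1]
    ivector_scores = [0.7, 0.8, 0.2, 0.4]
    fused = fuse_scores(cdnn_scores, ivector_scores, weight=0.5)
    print("fused EER:", equal_error_rate(fused, labels))
```

Under these assumptions, a figure such as "11% relative improvement" would correspond to `relative_improvement(baseline_eer, fused_eer)` being about 0.11, where `baseline_eer` is the EER of the best standalone system.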
Pages: 403-407
Number of pages: 5
Related papers
50 in total
  • [11] An End-to-End Compression Framework Based on Convolutional Neural Networks
    Tao, Wen
    Jiang, Feng
    Zhang, Shengping
    Ren, Jie
    Shi, Wuzhen
    Zuo, Wangmeng
    Guo, Xun
    Zhao, Debin
    2017 DATA COMPRESSION CONFERENCE (DCC), 2017, : 463 - 463
  • [12] Residual convolutional neural network with attentive feature pooling for end-to-end language identification from short-duration speech
    Monteiro, Joao
    Alam, Jahangir
    Falk, Tiago H.
    COMPUTER SPEECH AND LANGUAGE, 2019, 58 : 364 - 376
  • [13] EXPLORING END-TO-END ATTENTION-BASED NEURAL NETWORKS FOR NATIVE LANGUAGE IDENTIFICATION
    Ubale, Rutuja
    Qian, Yao
    Evanini, Keelan
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 84 - 91
  • [14] End-to-end recognition of slab identification numbers using a deep convolutional neural network
    Lee, Sang Jun
    Yun, Jong Pil
    Koo, Gyogwon
    Kim, Sang Woo
    KNOWLEDGE-BASED SYSTEMS, 2017, 132 : 1 - 10
  • [15] END-TO-END PHOTOPLETHYSMOGRAPHY (PPG) BASED BIOMETRIC AUTHENTICATION BY USING CONVOLUTIONAL NEURAL NETWORKS
    Luque, Jordi
    Cortes, Guillem
    Segura, Carlos
    Maravilla, Alexandre
    Esteban, Javier
    Fabregat, Joan
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 538 - 542
  • [16] Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
    Zhang, Ying
    Pezeshki, Mohammad
    Brakel, Philemon
    Zhang, Saizheng
    Laurent, Cesar
    Bengio, Yoshua
    Courville, Aaron
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 410 - 414
  • [17] Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks
    Li, Hui
    Wang, Peng
    Shen, Chunhua
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 5248 - 5256
  • [18] Convolutional Dictionary Learning by End-To-End Training of Iterative Neural Networks
    Kofler, Andreas
    Wald, Christian
    Schaeffter, Tobias
    Haltmeier, Markus
    Kolbitsch, Christoph
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 1213 - 1217
  • [19] End-to-end face parsing via interlinked convolutional neural networks
    Zi Yin
    Valentin Yiu
    Xiaolin Hu
    Liang Tang
    Cognitive Neurodynamics, 2021, 15 : 169 - 179
  • [20] Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition
    Parcollet, Titouan
    Zhang, Ying
    Morchid, Mohamed
    Trabelsi, Chiheb
    Linares, Georges
    De Mori, Renato
    Bengio, Yoshua
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 22 - 26