Deep End-to-End Representation Learning for Food Type Recognition from Speech

Cited by: 1
Authors
Sertolli, Benjamin [1 ]
Cummins, Nicholas [1 ]
Sengur, Abdulkadir [2 ]
Schuller, Bjorn W. [1 ,3 ]
Affiliations
[1] Univ Augsburg, ZD.B Chair of Embedded Intelligence for Health Care & Wellbeing, Augsburg, Germany
[2] Firat Univ, Technology Faculty, Electrical & Electronics Engineering Department, Elazig, Turkey
[3] Imperial Coll London, GLAM, London, England
Keywords
Eating Condition; Deep Representation Learning; End-to-End Learning; Compact Bilinear Pooling; Recurrent Neural Networks; Emotion Recognition
DOI
10.1145/3242969.3243683
Chinese Library Classification (CLC)
TP3 [computing technology, computer technology]
Discipline classification code
0812
Abstract
The use of Convolutional Neural Networks (CNNs) pre-trained for a particular task as feature extractors for an alternate task is a standard practice in many image classification paradigms. However, to date there have been comparatively few works exploring this technique for speech classification tasks. Herein, we utilise a pre-trained end-to-end Automatic Speech Recognition CNN as a feature extractor for the task of food-type recognition from speech. Furthermore, we also explore the benefits of Compact Bilinear Pooling for combining multiple feature representations extracted from the CNN. Key results presented indicate the suitability of this approach. When combined with a Recurrent Neural Network classifier, our strongest system achieves, for the seven-class food-type classification task, an unweighted average recall of 73.3% on the test set of the IHEARu-EAT database.
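The pipeline described in the abstract (a pre-trained end-to-end ASR CNN used as a feature extractor, Compact Bilinear Pooling to fuse multiple feature representations, and a Recurrent Neural Network classifier) can be illustrated with a minimal PyTorch sketch. The paper itself provides no code; everything below, including the class names, the layer dimensions, the Count Sketch / TensorSketch approximation commonly used for Compact Bilinear Pooling, and the single-layer GRU, is an assumption made for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class CompactBilinearPooling(nn.Module):
    # Approximates the outer product of two feature vectors with the
    # Count Sketch / TensorSketch trick, compressing it to out_dim values.
    def __init__(self, dim1, dim2, out_dim):
        super().__init__()
        self.out_dim = out_dim
        # Fixed random hash buckets h and random signs s for each input stream.
        for k, dim in ((1, dim1), (2, dim2)):
            self.register_buffer(f"h{k}", torch.randint(out_dim, (dim,)))
            self.register_buffer(f"s{k}", (torch.randint(2, (dim,)) * 2 - 1).float())

    def _sketch(self, x, h, s):
        # Count sketch: scatter-add the signed features into out_dim buckets.
        sk = x.new_zeros(*x.shape[:-1], self.out_dim)
        return sk.index_add(x.dim() - 1, h, x * s)

    def forward(self, x1, x2):
        # Circular convolution of the two sketches (computed as a product in
        # the frequency domain) approximates the compressed outer product.
        f1 = torch.fft.rfft(self._sketch(x1, self.h1, self.s1), dim=-1)
        f2 = torch.fft.rfft(self._sketch(x2, self.h2, self.s2), dim=-1)
        return torch.fft.irfft(f1 * f2, n=self.out_dim, dim=-1)

class FoodTypeClassifier(nn.Module):
    # Single-layer GRU over the fused per-frame representations, followed by
    # a linear layer producing scores for the seven food classes.
    def __init__(self, fused_dim=1024, hidden=128, n_classes=7):
        super().__init__()
        self.rnn = nn.GRU(fused_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, fused_seq):      # (batch, frames, fused_dim)
        _, h = self.rnn(fused_seq)     # final hidden state of the GRU
        return self.out(h[-1])         # (batch, n_classes) logits

# Hypothetical usage: feat_a and feat_b stand in for activations taken from
# two layers of a pre-trained end-to-end ASR CNN, which is not shown here.
cbp = CompactBilinearPooling(dim1=512, dim2=256, out_dim=1024)
clf = FoodTypeClassifier()
feat_a = torch.randn(4, 50, 512)      # (batch, frames, layer-A features)
feat_b = torch.randn(4, 50, 256)      # (batch, frames, layer-B features)
logits = clf(cbp(feat_a, feat_b))     # (4, 7) class scores

The fusion dimension, the choice of which CNN layers to tap, and the recurrent architecture are all free parameters in this sketch; the paper's reported 73.3% unweighted average recall refers to its own configuration, not to this illustrative one.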
Pages: 574-578
Number of pages: 5