Deep End-to-End Representation Learning for Food Type Recognition from Speech

Cited by: 1
Authors
Sertolli, Benjamin [1]
Cummins, Nicholas [1]
Sengur, Abdulkadir [2]
Schuller, Bjorn W. [1,3]
Affiliations
[1] Univ Augsburg, ZDB Chair Embedded Intelligence Hlth Care & Wellb, Augsburg, Germany
[2] Firat Univ, Technol Fac, Elect & Elect Engn Dept, Elazig, Turkey
[3] Imperial Coll London, GLAM, London, England
Keywords
Eating Condition; Deep Representation Learning; End-to-End Learning; Compact Bilinear Pooling; Recurrent Neural Networks; Emotion Recognition
DOI
10.1145/3242969.3243683
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology]
Discipline classification code
0812
Abstract
The use of Convolutional Neural Networks (CNNs) pre-trained for one task as feature extractors for another is standard practice in many image classification paradigms. However, to date, comparatively few works have explored this technique for speech classification tasks. Herein, we utilise a pre-trained end-to-end Automatic Speech Recognition CNN as a feature extractor for the task of food-type recognition from speech. Furthermore, we explore the benefits of Compact Bilinear Pooling for combining multiple feature representations extracted from the CNN. Key results presented indicate the suitability of this approach. When combined with a Recurrent Neural Network classifier, our strongest system achieves, for a seven-class food-type classification task, an unweighted average recall of 73.3% on the test set of the IHEARu-EAT database.
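The abstract reports its result as unweighted average recall (UAR), which is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name and the toy labels are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (not data from the paper): two imbalanced classes.
y_true = ["apple", "apple", "apple", "apple", "crisp", "crisp"]
y_pred = ["apple", "apple", "apple", "crisp", "crisp", "apple"]

# Per-class recall: apple = 3/4, crisp = 1/2, so UAR = (0.75 + 0.5) / 2.
uar = unweighted_average_recall(y_true, y_pred)
print(f"UAR = {uar:.3f}")
```

In this toy case plain accuracy is 4/6, while UAR is lower because the smaller class is recognised less reliably; this is why UAR is often preferred when class sizes are imbalanced.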
Pages: 574-578
Page count: 5
Related papers
50 in total
  • [31] END-TO-END ANCHORED SPEECH RECOGNITION
    Wang, Yiming
    Fan, Xing
    Chen, I-Fan
    Liu, Yuzong
    Chen, Tongfei
    Hoffmeister, Bjorn
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7090 - 7094
  • [32] END-TO-END AUDIOVISUAL SPEECH RECOGNITION
    Petridis, Stavros
    Stafylakis, Themos
    Ma, Pingchuan
    Cai, Feipeng
    Tzimiropoulos, Georgios
    Pantic, Maja
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6548 - 6552
  • [33] Multichannel End-to-end Speech Recognition
    Ochiai, Tsubasa
    Watanabe, Shinji
    Hori, Takaaki
    Hershey, John R.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [34] End-to-end Accented Speech Recognition
    Viglino, Thibault
    Motlicek, Petr
    Cernak, Milos
    [J]. INTERSPEECH 2019, 2019, : 2140 - 2144
  • [35] End-to-end speech recognition from raw speech: Multi time-frequency resolution CNN architecture for efficient representation learning
    Eledath, Dhanya
    Inbarajan, P.
    Biradar, Anurag
    Mahadeva, Sathwick
    Ramasubramanian, V
    [J]. 29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 536 - 540
  • [36] Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
    Zhang, Ying
    Pezeshki, Mohammad
    Brakel, Philemon
    Zhang, Saizheng
    Laurent, Cesar
    Bengio, Yoshua
    Courville, Aaron
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 410 - 414
  • [37] END-TO-END SPEECH EMOTION RECOGNITION USING DEEP NEURAL NETWORKS
    Tzirakis, Panagiotis
    Zhang, Jiehao
    Schuller, Bjoern W.
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5089 - 5093
  • [38] Representation transfer learning from deep end-to-end speech recognition networks for the classification of health states from speech
    Sertolli, Benjamin
    Ren, Zhao
    Schuller, Bjoern W.
    Cummins, Nicholas
    [J]. COMPUTER SPEECH AND LANGUAGE, 2021, 68
  • [39] Unsupervised Representation Learning with Task-Agnostic Feature Masking for Robust End-to-End Speech Recognition
    Kim, June-Woo
    Chung, Hoon
    Jung, Ho-Young
    [J]. MATHEMATICS, 2023, 11 (03)
  • [40] Traffic Signal Recognition Using End-to-End Deep Learning
    Sarker, Tonmoy
    Meng, Xiangyu
    [J]. TRAN-SET 2022, 2022, : 182 - 191