A Comparative Study of Articulatory Features From Facial Video and Acoustic-To-Articulatory Inversion for Phonetic Discrimination

Authors
Narwekar, Abhishek [1]
Ghosh, Prasanta Kumar [2]
Affiliations
[1] Indian Inst Technol Madras, Dept Elect Engg, Madras 600036, Tamil Nadu, India
[2] Indian Inst Sci, Dept Elect Engg, Bangalore 560012, Karnataka, India
Keywords
audio-visual corpus; speech articulators; visual articulators; mutual information
DOI
Not available
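Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809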
Abstract
Several past studies have shown that features based on the kinematics of speech articulators improve phonetic recognition accuracy when combined with acoustic features. It is also known that audio-visual speech recognition outperforms audio-only recognition, which indicates that the information from the visible articulators is complementary to that provided by the acoustic features. The movements of the visible articulators can typically be extracted directly from a facial video. Speech articulators, on the other hand, are recorded using electromagnetic articulography (EMA), which requires highly specialized equipment. Articulatory data are therefore not directly available in practice and are usually estimated from speech using acoustic-to-articulatory inversion. In this work, we compare the information that the visible and the estimated articulators provide about different phonetic classes, both with and without acoustic features. The information provided by the visible, articulatory, acoustic and combined features is quantified by mutual information (MI). For this study, we created a large phonetically rich audio-visual (PRAV) dataset comprising 9000 TIMIT sentences spoken by four subjects. Experiments on the PRAV corpus reveal that the articulatory features estimated by inversion are more informative than the visible features but less informative than the acoustic features. This suggests that the advantage that visible articulatory features bring to recognition could be achieved by recovering articulatory features from the acoustic signal itself.
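The abstract quantifies how informative each feature set is about phonetic classes using mutual information, I(X; Y) = Σ p(x, y) log2[p(x, y) / (p(x) p(y))]. The record does not say which MI estimator the authors used, so the following is only a minimal sketch assuming a simple plug-in (histogram) estimate over discretized feature values X and frame-level phone labels Y. The helper names `quantize` and `mutual_information`, the uniform binning, and the toy data are hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter


def quantize(values, num_bins, lo, hi):
    """Map real-valued samples to uniform bin indices in [0, num_bins - 1].

    Hypothetical helper: uniform bins over [lo, hi]; values outside the
    range are clamped to the first or last bin.
    """
    width = (hi - lo) / num_bins
    return [max(0, min(math.floor((v - lo) / width), num_bins - 1)) for v in values]


def mutual_information(features, labels):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples.

    `features` holds hashable symbols (bin indices, or tuples of bin
    indices for multi-dimensional or combined features); `labels` holds
    the phonetic class of each frame.
    """
    if not features or len(features) != len(labels):
        raise ValueError("features and labels must be non-empty and of equal length")
    n = len(features)
    joint = Counter(zip(features, labels))  # counts of (x, y) pairs
    px = Counter(features)                  # marginal counts of x
    py = Counter(labels)                    # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        # Accumulate p(x, y) * log2(p(x, y) / (p(x) p(y))) over observed pairs
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


if __name__ == "__main__":
    # Toy example: a feature that perfectly separates two equiprobable
    # phone classes carries 1 bit about the class label.
    labels = ["aa", "aa", "iy", "iy"]
    feats = quantize([0.1, 0.2, 0.8, 0.9], num_bins=2, lo=0.0, hi=1.0)
    print(mutual_information(feats, labels))
```

To compare a combined feature set (for example, acoustic plus estimated articulatory features), the per-dimension bin indices can be zipped into tuples, e.g. `list(zip(q_acoustic, q_artic))`, and passed to the same function. Plug-in estimates are biased upward when the number of joint bins is large relative to the number of frames, so a comparison of this kind only makes sense with matched binning and ample data.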
Pages: 5