A Comparative Study of Articulatory Features From Facial Video and Acoustic-To-Articulatory Inversion for Phonetic Discrimination

Cited by: 0

Authors:

Narwekar, Abhishek [1]

Ghosh, Prasanta Kumar [2]

Affiliations:

[1] Indian Inst Technol Madras, Dept Elect Engg, Madras 600036, Tamil Nadu, India

[2] Indian Inst Sci, Dept Elect Engg, Bangalore 560012, Karnataka, India

Source:

2016 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM) | 2016

Keywords:

audio-visual corpus; speech articulators; visual articulators; mutual information

DOI:

Not available

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Several past studies have shown that features based on the kinematics of speech articulators improve phonetic recognition accuracy when combined with acoustic features. It is also known that audio-visual speech recognition outperforms audio-only recognition, which in turn indicates that the information from the visible articulators is complementary to that provided by the acoustic features. Typically, visible articulators can be extracted directly from a facial video. Speech articulators, on the other hand, are recorded using electromagnetic articulography (EMA), which requires highly specialized equipment. Thus, the latter are not directly available in practice and are usually estimated from speech using acoustic-to-articulatory inversion. In this work, we compare the information provided by the visible and the estimated articulators about different phonetic classes when used with and without acoustic features. The information provided by different visible, articulatory, acoustic and combined features is quantified by the mutual information (MI). For this study, we have created a large phonetically rich audio-visual (PRAV) dataset comprising 9000 TIMIT sentences spoken by four subjects. Experiments using the PRAV corpus reveal that the articulatory features estimated by inversion are more informative than the visible features but less informative than the acoustic features. This suggests that the advantage of visible articulatory features in recognition could be achieved by recovering them from the acoustic signal itself.
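The abstract does not say how the MI is estimated, so the following is only a minimal sketch of one common approach: a plug-in estimate of the MI between a discretized feature stream and frame-level phonetic class labels. The function name `mutual_information` and the assumption that continuous features have already been quantized to integer codebook indices (for example by clustering) are illustrative and not taken from the paper.

```python
import numpy as np


def mutual_information(feature_ids, class_ids):
    """Plug-in estimate of I(F; C) in bits from paired discrete labels.

    feature_ids: per-frame quantized feature indices (assumed pre-quantized).
    class_ids:   per-frame phonetic class labels, same length.
    """
    feature_ids = np.asarray(feature_ids).ravel()
    class_ids = np.asarray(class_ids).ravel()
    if feature_ids.shape != class_ids.shape:
        raise ValueError("feature_ids and class_ids must have equal length")

    # Map arbitrary labels to contiguous indices 0..K-1.
    _, f_idx = np.unique(feature_ids, return_inverse=True)
    _, c_idx = np.unique(class_ids, return_inverse=True)

    # Joint histogram, normalized to an empirical joint distribution.
    joint = np.zeros((f_idx.max() + 1, c_idx.max() + 1))
    np.add.at(joint, (f_idx, c_idx), 1.0)
    joint /= joint.sum()

    # Marginals and the product distribution p(f) p(c).
    p_f = joint.sum(axis=1, keepdims=True)
    p_c = joint.sum(axis=0, keepdims=True)
    product = p_f @ p_c

    # Sum only over cells with nonzero joint probability.
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / product[nz])))
```

Under this sketch, comparing feature sets amounts to computing the estimate once per feature stream against the same phonetic labels; a larger value means the quantized features carry more information about the phonetic class. Plug-in estimates are biased upward when the codebook is large relative to the number of frames, so any comparison should use the same codebook size and data volume for each stream.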


Pages: 5
