A Comparative Study of Articulatory Features From Facial Video and Acoustic-To-Articulatory Inversion for Phonetic Discrimination

Cited by: 0
Authors
Narwekar, Abhishek [1]
Ghosh, Prasanta Kumar [2]
Affiliations
[1] Indian Inst Technol Madras, Dept Elect Engg, Madras 600036, Tamil Nadu, India
[2] Indian Inst Sci, Dept Elect Engg, Bangalore 560012, Karnataka, India
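Keywords
audio-visual corpus; speech articulators; visual articulators; mutual information
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract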
Several past studies have shown that features based on the kinematics of speech articulators improve phonetic recognition accuracy when combined with acoustic features. It is also known that audio-visual speech recognition outperforms audio-only recognition, which in turn indicates that the information from the visible articulators is complementary to that provided by the acoustic features. Typically, visible articulators can be extracted directly from a facial video. The speech articulators, on the other hand, are recorded using electromagnetic articulography (EMA), which requires highly specialized equipment. Thus, the latter are not directly available in practice and are usually estimated from speech using acoustic-to-articulatory inversion. In this work, we compare the information that the visible and the estimated articulators provide about different phonetic classes when used with and without acoustic features. The information provided by the visible, articulatory, acoustic, and combined features is quantified by mutual information (MI). For this study, we have created a large phonetically rich audio-visual (PRAV) dataset comprising 9000 TIMIT sentences spoken by four subjects. Experiments on the PRAV corpus reveal that the articulatory features estimated by inversion are more informative than the visible features but less informative than the acoustic features. This suggests that the advantage of visible articulatory features in recognition could be achieved by recovering them from the acoustic signal itself.
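The abstract does not say how MI is estimated, so the following is only a minimal sketch of the general idea, assuming the features have already been quantized into discrete symbols. It computes a plug-in (count-based) estimate of the MI between a feature stream and phonetic class labels; the function name `mutual_information` and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(features, labels):
    """Plug-in estimate of I(F; C) in bits from paired discrete samples.

    Assumption: `features` are already quantized (e.g. codebook indices);
    the paper's actual estimator is not described in this record.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    n = len(features)
    if n == 0:
        raise ValueError("need at least one sample")

    joint = Counter(zip(features, labels))   # counts of (feature, class) pairs
    f_counts = Counter(features)             # marginal counts of feature symbols
    c_counts = Counter(labels)               # marginal counts of class labels

    mi = 0.0
    for (f, c), n_fc in joint.items():
        # p(f, c) * log2( p(f, c) / (p(f) * p(c)) ), written with raw counts
        mi += (n_fc / n) * math.log2(n_fc * n / (f_counts[f] * c_counts[c]))
    return mi


# Toy data (hypothetical): two phonetic classes, four frames.
phones = ["p", "p", "a", "a"]
informative = [0, 0, 1, 1]      # feature symbol fully determines the class
uninformative = [0, 1, 0, 1]    # feature symbol is independent of the class

mi_informative = mutual_information(informative, phones)
mi_uninformative = mutual_information(uninformative, phones)
```

In this toy case the informative feature carries one bit about the class and the uninformative one carries none. A comparison in the spirit of the abstract would compute the same quantity for visible, estimated articulatory, acoustic, and combined feature streams against the same phonetic labels.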
Pages: 5