DNN-based Acoustic-to-Articulatory Inversion using Ultrasound Tongue Imaging

Cited: 0
Authors
Porras, Dagoberto [1 ]
Sepulveda-Sepulveda, Alexander [1 ]
Csapo, Tamas Gabor [2 ,3 ]
Affiliations
[1] Univ Ind Santander, Escuela Ingn Elect Elect & Telecomunicac, Bucaramanga, Santander, Colombia
[2] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary
[3] MTA-ELTE Lendület Lingual Articulat Res Grp, Budapest, Hungary
Keywords
articulatory; ultrasound; deep neural networks; inversion; SILENT SPEECH RECOGNITION; MOVEMENTS; FEATURES;
DOI
N/A
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech sounds are produced by the coordinated movement of the speech organs. Several methods are available for modeling the relation between articulatory movements and the resulting speech signal; the reverse problem is often called acoustic-to-articulatory inversion (AAI). In this paper we implement several Deep Neural Networks (DNNs) to estimate articulatory information from the acoustic signal. Several previous works address this task, but most of them use ElectroMagnetic Articulography (EMA) to track the articulatory movement. Compared to EMA, Ultrasound Tongue Imaging (UTI) offers a better cost-benefit trade-off when equipment cost, portability, safety and the visualized structures are taken into account. Our goal is therefore to train a DNN that estimates UT images from the speech signal. We also test two approaches to representing the articulatory information: 1) the EigenTongue space and 2) the raw ultrasound image. As objective quality measures for the reconstructed UT images, we use the Mean Squared Error (MSE), the Structural Similarity Index (SSIM) and the Complex-Wavelet SSIM (CW-SSIM). Our experimental results show that CW-SSIM is the most useful error measure in the UTI context. We tested three system configurations: a) a simple DNN with 2 hidden layers and the 64x64-pixel ultrasound images as the target; b) the same simple DNN but with the ultrasound images projected onto the EigenTongue space as the target; and c) a more complex DNN with 5 hidden layers and the ultrasound images projected onto the EigenTongue space. In a subjective experiment, the subjects found the neural networks with two hidden layers more suitable for this inversion task.
Pages: 8
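As a concrete illustration of the pipeline summarized in the abstract, the sketch below wires together the EigenTongue projection, a two-hidden-layer DNN, and a CW-SSIM-style score. It is not the authors' code: the MFCC input features, the hidden-layer sizes, the choice of 30 EigenTongue components, and the single Gabor-filter approximation of CW-SSIM (the original metric uses a complex steerable pyramid) are all assumptions made for this example.

```python
# Minimal sketch of the abstract's pipeline, NOT the authors' implementation.
# Hypothetical assumptions: MFCC vectors as acoustic input, time-aligned
# 64x64 grayscale ultrasound frames, 30 EigenTongue components, and a
# single-scale global CW-SSIM approximation.
import numpy as np
from scipy.signal import fftconvolve
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Placeholder data standing in for parallel acoustic/ultrasound recordings.
n_frames, n_mfcc, side = 500, 13, 64
mfcc = rng.normal(size=(n_frames, n_mfcc))            # acoustic features
frames = rng.random(size=(n_frames, side * side))     # flattened UTI frames

# 1) EigenTongue representation: project raw frames onto the leading
#    principal components of the training images.
pca = PCA(n_components=30).fit(frames)
targets = pca.transform(frames)

# 2) A simple DNN with two hidden layers (configuration b) in the abstract),
#    mapping acoustic features to EigenTongue coefficients.
dnn = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=30, random_state=0)
dnn.fit(mfcc, targets)
pred = pca.inverse_transform(dnn.predict(mfcc))       # back to pixel space

# 3) Simplified global CW-SSIM between two images: compare complex Gabor
#    responses, so small geometric shifts are penalized less than by MSE.
def cw_ssim(a, b, k=0.01):
    y, x = np.mgrid[-7:8, -7:8]                       # 15x15 filter support
    gabor = (np.exp(-(x**2 + y**2) / (2 * 3.0**2))
             * np.exp(1j * 2 * np.pi * 0.25 * x))     # complex Gabor kernel
    ca = fftconvolve(a, gabor, mode="same")
    cb = fftconvolve(b, gabor, mode="same")
    num = 2 * np.abs(np.sum(ca * np.conj(cb))) + k
    den = np.sum(np.abs(ca) ** 2) + np.sum(np.abs(cb) ** 2) + k
    return num / den

print("CW-SSIM:", cw_ssim(frames[0].reshape(side, side),
                          pred[0].reshape(side, side)))
```

Projecting onto the EigenTongue space cuts the number of regression targets from 64x64 = 4096 pixels to a few dozen coefficients, which is the practical motivation for configurations b) and c) in the abstract; the raw-pixel target of configuration a) forces the network to predict every pixel directly.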
Related Papers
50 records in total
  • [21] REPRESENTATION LEARNING USING CONVOLUTION NEURAL NETWORK FOR ACOUSTIC-TO-ARTICULATORY INVERSION
    Illa, Aravind
    Ghosh, Prasanta Kumar
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5931 - 5935
  • [22] Information theoretic acoustic feature selection for acoustic-to-articulatory inversion
    Ghosh, Prasanta Kumar
    Narayanan, Shrikanth S.
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3176 - 3180
  • [23] Speaker conditioned acoustic-to-articulatory inversion using x-vectors
    Illa, Aravind
    Ghosh, Prasanta Kumar
    [J]. INTERSPEECH 2020, 2020, : 1376 - 1380
  • [24] Deep Neural Network Based Acoustic-to-articulatory Inversion Using Phone Sequence Information
    Xie, Xurong
    Liu, Xunying
    Wang, Lan
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1497 - 1501
  • [25] Temporal Convolution Network Based Joint Optimization of Acoustic-to-Articulatory Inversion
    Sun, Guolun
    Huang, Zhihua
    Wang, Li
    Zhang, Pengyuan
    [J]. APPLIED SCIENCES-BASEL, 2021, 11 (19):
  • [26] Is average RMSE appropriate for evaluating acoustic-to-articulatory inversion?
    Fang, Qiang
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 997 - 1003
  • [28] Acoustic-to-articulatory inversion from infants' vowel vocalizations
    Oohashi, Hiroki
    Watanabe, Hama
    Taga, Gentaro
    [J]. NEUROSCIENCE RESEARCH, 2011, 71 : E286 - E286
  • [29] Improved subject-independent acoustic-to-articulatory inversion
    Afshan, Amber
    Ghosh, Prasanta Kumar
    [J]. SPEECH COMMUNICATION, 2015, 66 : 1 - 16
  • [30] Multi-corpus Acoustic-to-articulatory Speech Inversion
    Seneviratne, Nadee
    Sivaraman, Ganesh
    Espy-Wilson, Carol
    [J]. INTERSPEECH 2019, 2019, : 859 - 863