Distinctive Phonetic Features Modeling and Extraction Using Deep Neural Networks

Cited by: 7

Authors:

Seddiq, Yasser [1]

Alotaibi, Yousef A. [2]

Selouani, Sid-Ahmed [3]

Meftah, Ali Hamid [2]

Affiliations:

[1] KACST, Riyadh 11442, Saudi Arabia

[2] King Saud Univ, Coll Comp & Informat Sci, Riyadh 4545, Saudi Arabia

[3] Univ Moncton, LARIHS Lab, Shippegan, NB E8S 1P6, Canada

Source:

IEEE Access | 2019 / Vol. 7

Keywords:

Modern Standard Arabic; distinctive phonetic features; speech processing; deep belief networks; restricted Boltzmann machine

DOI:

10.1109/ACCESS.2019.2924014

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Feature extraction is a critical stage of digital speech processing systems, and the quality of the features provides the foundation on which all subsequent stages stand. Distinctive phonetic features (DPFs) are among the most representative features of speech signals. Their significance lies in their ability to provide an abstract description of the places and manners of articulation of a language's phonemes. Each DPF element of a phoneme reflects unique articulatory information about that phoneme. There is therefore a need to investigate each DPF element individually, in order to gain a deeper understanding and to derive a descriptive model for each one. Such fine-grained modeling respects the uniqueness of each DPF element. This paper tackles the problem of DPF modeling and extraction for Modern Standard Arabic. Deep neural networks (DNNs) initialized using deep belief networks (DBNs) have been remarkably successful in DSP applications and are capable of extracting highly representative features from raw data, so we exploit their modeling power to investigate and model the DPF elements. The DNN models are compared with classical multilayer perceptron (MLP) models. The representativeness of several acoustic cues for different DPF elements is also measured. The work formalizes DPF modeling as a binary classification problem. Because the DPF elements are highly imbalanced, evaluating model quality is difficult, and the paper addresses evaluation measures suited to this imbalance. After each element is modeled individually, two top-level DPF extractors are designed: an MLP-based extractor and a DNN-based extractor. The results show that the DNN models outperform the MLPs, with accuracies of 89.0% and 86.7%, respectively.
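
The abstract's point about imbalanced binary targets can be illustrated with a minimal sketch. The record does not state which evaluation measures the paper adopts, so the measures below (balanced accuracy, F1, and the Matthews correlation coefficient) are assumptions chosen as common imbalance-aware options, and the 5-in-100 class ratio is hypothetical data, not a result from the paper.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = DPF element present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def imbalance_aware_scores(y_true, y_pred):
    """Plain accuracy next to measures that account for class imbalance.

    The choice of measures is an illustrative assumption, not the paper's.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical data: a DPF element present in only 5 of 100 frames,
# scored against a trivial model that always predicts "absent".
y_true = [1] * 5 + [0] * 95
always_absent = [0] * 100
scores = imbalance_aware_scores(y_true, always_absent)
print(scores)
```

In this hypothetical case the trivial model reaches high plain accuracy while the imbalance-aware measures stay at chance level or zero, which is why accuracy alone can mislead when a per-element binary classifier is judged.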

Pages: 81382-81396

Page count: 15

50 items in total

[1] Predicting historical phonetic features using deep neural networks: A case study of the phonetic system of Proto-Indo-European
Hartmann, Frederik
[J]. 1ST INTERNATIONAL WORKSHOP ON COMPUTATIONAL APPROACHES TO HISTORICAL LANGUAGE CHANGE, 2019, : 98 - 108
[2] Object Recognition Using Deep Neural Network with Distinctive Features
Song, Hyun Chul
Akram, Farhan
Choi, Kwang Nam
[J]. PROCEEDINGS OF 2018 THE 2ND INTERNATIONAL CONFERENCE ON VIDEO AND IMAGE PROCESSING (ICVIP 2018), 2018, : 203 - 207
[3] Distinctive Phonetic Feature (DPF) Based Phone Segmentation using Hybrid Neural Networks
Nurul, Huda Mohammad
Ghulam, Muhammad
Horikawa, Junsei
Nitta, Tsuneo
[J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1337 - 1340
[4] DEEP NEURAL NETWORKS BASED SPEAKER MODELING AT DIFFERENT LEVELS OF PHONETIC GRANULARITY
Tian, Yao
He, Liang
Cai, Meng
Zhang, Wei-Qiang
Liu, Jia
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5440 - 5444
[5] DISTINCTIVE FEATURES - PROBLEMS OF PHONETIC ADEQUACY
UGUZZONI, A
[J]. LINGUA E STILE, 1980, 15 (02) : 233 - 280
[6] "Phonetic bases of distinctive features": Introduction
Clements, G. N.
Halle, P. A.
[J]. JOURNAL OF PHONETICS, 2010, 38 (01) : 3 - 9
[7] Using of deep convolutional neural networks for visual features extraction in multiple objects tracking task
Meshcheriakov, Andrei
Popov, Sergey
[J]. 2020 VI INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND NANOTECHNOLOGY (IEEE ITNT-2020), 2020,
[8] LEARNING PHONETIC FEATURES USING CONNECTIONIST NETWORKS
WATROUS, RL
SHASTRI, L
[J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1987, 81 : S93 - S94
[9] Distinctive Features of Asymmetric Neural Networks with Gabor Filters
Ishii, Naohiro
Deguchi, Toshinori
Kawaguchi, Masashi
Sasaki, Hiroshi
[J]. HYBRID ARTIFICIAL INTELLIGENT SYSTEMS (HAIS 2018), 2018, 10870 : 185 - 196
[10] A greenhouse modeling and control using deep neural networks
Salah, Latifa Belhaj
Fourati, Fathi
[J]. APPLIED ARTIFICIAL INTELLIGENCE, 2021, 35 (15) : 1905 - 1929