Distinctive Phonetic Features Modeling and Extraction Using Deep Neural Networks

Cited by: 7
Authors
Seddiq, Yasser [1 ]
Alotaibi, Yousef A. [2 ]
Selouani, Sid-Ahmed [3 ]
Meftah, Ali Hamid [2 ]
Affiliations
[1] KACST, Riyadh 11442, Saudi Arabia
[2] King Saud Univ, Coll Comp & Informat Sci, Riyadh 4545, Saudi Arabia
[3] Univ Moncton, LARIHS Lab, Shippegan, NB E8S 1P6, Canada
Keywords
Modern standard Arabic; distinctive phonetic features; speech processing; deep belief networks; restricted Boltzmann machine;
DOI
10.1109/ACCESS.2019.2924014
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Feature extraction is a critical stage of digital speech processing systems. The quality of the features is of great importance because it provides the foundation on which the subsequent stages stand. Distinctive phonetic features (DPFs) are among the most representative features of speech signals. Their significance lies in their ability to provide an abstract description of the places and manners of articulation of a language's phonemes. Each DPF element of a phoneme reflects unique articulatory information about that phoneme. Therefore, each DPF element needs to be investigated individually in order to achieve a deeper understanding and to arrive at a descriptive model for each one. Such fine-grained modeling respects the uniqueness of each DPF element. In this paper, the problem of DPF modeling and extraction for Modern Standard Arabic is tackled. Given the remarkable success of deep neural networks (DNNs) initialized using deep belief networks (DBNs) in DSP applications, and their capability of extracting highly representative features from raw data, we exploit their modeling power to investigate and model the DPF elements. The DNN models are compared with classical multilayer perceptron (MLP) models. The representativeness of several acoustic cues for different DPF elements is also measured. The paper formalizes the DPF modeling problem as a binary classification problem. Because the DPF elements are highly imbalanced data, evaluating the quality of the models is a tricky process. The paper therefore addresses evaluation measures suited to the imbalanced nature of the DPF elements. After each element is modeled individually, two top-level DPF extractors are designed: an MLP-based extractor and a DNN-based extractor. The results show the quality of the DNN models and their superiority over MLPs, with accuracies of 89.0% and 86.7%, respectively.
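The abstract's point about evaluating binary classifiers on imbalanced data can be illustrated with a minimal sketch. The record does not state which evaluation measures the paper uses, so the measures below (precision, recall, F1, balanced accuracy, Matthews correlation coefficient) are common choices assumed here for illustration only. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def imbalance_aware_metrics(y_true, y_pred):
    """Return plain accuracy next to measures that account for class imbalance.

    Ratios with a zero denominator are reported as 0.0 (a convention
    chosen for this sketch).
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
        "mcc": mcc,
    }


# Hypothetical toy data: a feature element present in only 5 of 100 frames,
# scored against a degenerate model that always predicts "absent".
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
metrics = imbalance_aware_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On this toy data the always-negative model reaches high plain accuracy while its recall, F1, and MCC are all zero and its balanced accuracy sits at chance level, which is why accuracy alone can mislead on imbalanced elements.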
Pages: 81382-81396
Page count: 15