3M: An Effective Multi-view, Multi-granularity, and Multi-aspect Modeling Approach to English Pronunciation Assessment

Cited by: 0
Authors
Chao, Fu-An [1]
Lo, Tien-Hong [1, 2]
Wu, Tzu-I [2]
Sung, Yao-Ting [3]
Chen, Berlin [2]
Affiliations
[1] Natl Taiwan Normal Univ, Res Ctr Psychol & Educ Testing, Taipei, Taiwan
[2] Natl Taiwan Normal Univ, Dept Comp Sci & Informat Engn, Taipei, Taiwan
[3] Natl Taiwan Normal Univ, Dept Educ Psychol & Counseling, Taipei, Taiwan
Keywords
computer-assisted pronunciation training; pronunciation assessment; goodness of pronunciation; segmental and suprasegmental features; self-supervised learning; mispronunciation detection
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As an indispensable ingredient of computer-assisted pronunciation training (CAPT), automatic pronunciation assessment (APA) plays a pivotal role in aiding self-directed language learners by providing multi-aspect and timely feedback. However, at least two obstacles might hinder its performance in practical use. On the one hand, most studies focus exclusively on segmental (phonetic)-level features such as goodness of pronunciation (GOP); this may cause a mismatch in feature granularity when performing suprasegmental (prosodic)-level pronunciation assessment. On the other hand, APA still suffers from the lack of large-scale labeled speech data from non-native speakers, which inevitably limits its performance. In this paper, we tackle these problems by integrating multiple prosodic and phonological features to provide multi-view, multi-granularity, and multi-aspect (3M) pronunciation modeling. Specifically, we augment GOP with prosodic and self-supervised learning (SSL) features, and develop a vowel/consonant positional embedding for more phonology-aware automatic pronunciation assessment. A series of experiments conducted on the publicly available speechocean762 dataset shows that our approach obtains significant improvements at several assessment granularities compared with previous work, especially in the assessment of speaking fluency and speech prosody.
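The idea of augmenting GOP with prosodic and SSL features can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes GOP is taken as the mean log posterior of the canonical phone over its aligned frames (one common formulation), and it reduces the vowel/consonant information to a single flag rather than a learned positional embedding. The function names, the toy posteriors, and the feature layout are all hypothetical.

```python
import math

# ARPAbet vowel symbols; used here only to derive a vowel/consonant flag.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def gop(frame_posteriors, phone):
    """Mean log posterior of the canonical phone over its aligned frames.

    Assumption: each frame is a dict mapping phone labels to posteriors.
    """
    logs = [math.log(max(frame.get(phone, 0.0), 1e-10))
            for frame in frame_posteriors]
    return sum(logs) / len(logs)


def phone_features(frame_posteriors, phone, duration_s, mean_energy,
                   ssl_embedding):
    """Concatenate segmental, prosodic, and SSL views for one phone.

    Hypothetical layout: [GOP, duration, energy, vowel flag, SSL dims...].
    """
    is_vowel = 1.0 if phone in VOWELS else 0.0
    segmental = [gop(frame_posteriors, phone)]
    prosodic = [duration_s, mean_energy]
    return segmental + prosodic + [is_vowel] + list(ssl_embedding)


# Toy input: two frames aligned to the phone "IY" (made-up numbers).
frames = [{"IY": 0.8, "IH": 0.2}, {"IY": 0.5, "IH": 0.5}]
features = phone_features(frames, "IY", duration_s=0.12,
                          mean_energy=0.4, ssl_embedding=[0.1, -0.3])
```

In a full system, a vector like this would be built per phone and passed to a model that predicts scores at the phone, word, and utterance levels; that part is outside the scope of this sketch.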
Pages: 575-582
Number of pages: 8