3M: An Effective Multi-view, Multi-granularity, and Multi-aspect Modeling Approach to English Pronunciation Assessment

Authors
Chao, Fu-An [1]
Lo, Tien-Hong [1, 2]
Wu, Tzu-I [2]
Sung, Yao-Ting [3]
Chen, Berlin [2]
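Affiliations
[1] National Taiwan Normal University, Research Center for Psychological and Educational Testing, Taipei, Taiwan
[2] National Taiwan Normal University, Department of Computer Science and Information Engineering, Taipei, Taiwan
[3] National Taiwan Normal University, Department of Educational Psychology and Counseling, Taipei, Taiwan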
Keywords
computer-assisted pronunciation training; pronunciation assessment; goodness of pronunciation; segmental and suprasegmental features; self-supervised learning; mispronunciation detection
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
As an indispensable ingredient of computer-assisted pronunciation training (CAPT), automatic pronunciation assessment (APA) plays a pivotal role in aiding self-directed language learners by providing multi-aspect and timely feedback. However, at least two obstacles may hinder its performance in practical use. On the one hand, most studies focus exclusively on segmental (phonetic)-level features such as goodness of pronunciation (GOP); this may cause a mismatch in feature granularity when performing suprasegmental (prosodic)-level pronunciation assessment. On the other hand, APA still suffers from a lack of large-scale labeled speech data from non-native speakers, which inevitably limits assessment performance. In this paper, we tackle these problems by integrating multiple prosodic and phonological features to provide multi-view, multi-granularity, and multi-aspect (3M) pronunciation modeling. Specifically, we augment GOP with prosodic and self-supervised learning (SSL) features, and develop a vowel/consonant positional embedding for more phonology-aware automatic pronunciation assessment. A series of experiments conducted on the publicly available speechocean762 dataset shows that our approach obtains significant improvements at several assessment granularities compared with previous work, especially in the assessment of speaking fluency and speech prosody.
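The feature augmentation described in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the function names, the feature dimensions, the ARPAbet vowel set, and the choice to add (rather than concatenate) the vowel/consonant embedding are all assumptions made for illustration, and the embeddings here are random placeholders rather than learned parameters.

```python
import random

# Assumption: phones use ARPAbet labels, optionally with stress digits ("AH0").
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def phone_type(phone):
    """Classify a phone label as 'vowel' or 'consonant' (hypothetical helper)."""
    base = phone.rstrip("012")  # drop a trailing stress marker if present
    return "vowel" if base in VOWELS else "consonant"


def make_type_embeddings(dim, seed=0):
    """Create one placeholder vector per phone type.

    In a trained model these would be learned; random values stand in here.
    """
    rng = random.Random(seed)
    return {kind: [rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for kind in ("vowel", "consonant")}


def build_phone_input(phone, gop, prosody, ssl, type_emb):
    """Fuse three feature views for one phone and add its type embedding.

    gop, prosody, ssl: lists of floats (segmental, prosodic, and SSL views).
    The views are concatenated, then the vowel/consonant vector is added
    element-wise (an assumed design, not confirmed by the source).
    """
    fused = list(gop) + list(prosody) + list(ssl)
    emb = type_emb[phone_type(phone)]
    if len(emb) != len(fused):
        raise ValueError("embedding size must match the fused feature size")
    return [f + e for f, e in zip(fused, emb)]


# Toy values only; real features would come from an acoustic model,
# a prosody extractor, and a pretrained SSL speech encoder.
type_emb = make_type_embeddings(dim=7)
vec = build_phone_input("AH0", gop=[-0.5, 0.2], prosody=[0.12, 60.0],
                        ssl=[0.1, 0.2, 0.3], type_emb=type_emb)
```

Concatenating the views keeps segmental and suprasegmental evidence side by side in one vector, which is one simple way to address the granularity mismatch the abstract mentions.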
Pages: 575-582
Page count: 8