EFFICIENT TEXT ANALYSIS WITH PRE-TRAINED NEURAL NETWORK MODELS

Cited by: 1
Authors
Cui, Jia [1 ]
Lu, Heng [1 ,3 ]
Wang, Wenjie [2 ]
Kang, Shiyin [1 ,4 ]
He, Liqiang [1 ]
Li, Guangzhi [1 ]
Yu, Dong [1 ]
Affiliations
[1] Tencent AI Lab, Seattle, WA 98004 USA
[2] Emory Univ, Atlanta, GA 30322 USA
[3] Ximalaya Inc, Shanghai, Peoples R China
[4] Huya Inc, Guangzhou, Peoples R China
Keywords
Text analysis; TTS frontend; G2P; text normalization; punctuation; weakly supervised learning; phrase-based attention;
DOI
10.1109/SLT54892.2023.10022565
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper investigates the application of the pre-trained BERT model to three classic text analysis tasks: Chinese grapheme-to-phoneme (G2P) conversion, text normalization (TN), and sentence punctuation annotation. Although the full-sized BERT has prominent modeling power, two challenges arise in real applications: the requirement for annotated training data and the considerable computational cost. In this paper, we propose BERT-based low-latency solutions. To collect a sufficient training corpus for G2P, we transfer knowledge from an existing rule-based system to BERT through a large amount of unlabeled text. The new model converts all characters directly from raw text with higher accuracy. We also propose a hybrid two-stage text normalization pipeline that reduces the sentence error rate by 25% compared to the rule-based system. We offer both supervised and weakly supervised versions and find that the latter loses only 1% in accuracy relative to the former.
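The knowledge-transfer step in the abstract — running unlabeled text through an existing rule-based frontend to produce pseudo-labels for training a neural token classifier — can be sketched roughly as follows. This is a minimal illustration only; the lexicon, function names, and pronunciations here are hypothetical stand-ins, not the authors' system.

```python
# Sketch: distilling a rule-based G2P frontend into pseudo-labels that a
# BERT-style token classifier could then be fine-tuned on.

# Toy stand-in for the rule-based system: a default-pronunciation lexicon.
RULE_LEXICON = {
    "中": "zhong1",
    "国": "guo2",
    "银": "yin2",
    "行": "hang2",  # rule-based default; "行" is polyphonic (xing2 / hang2)
}

def pseudo_label(text):
    """Label every character with the rule-based system's output,
    yielding (character, phoneme) pairs as weak supervision."""
    return [(ch, RULE_LEXICON.get(ch, "<unk>")) for ch in text]

# A large unlabeled corpus would be passed through this step; the resulting
# pairs then serve as training data for a per-character phoneme classifier
# that reads raw text directly.
pairs = pseudo_label("中国银行")
print(pairs)
```

The point of the sketch is the weak-supervision loop: the rule-based system's coverage becomes training signal, so no manual phoneme annotation is needed, at the cost of inheriting the rules' errors on ambiguous (polyphonic) characters.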
Pages: 671-676
Page count: 6