Research on Chinese Audio and Text Alignment Algorithm Based on AIC-FCM and Doc2Vec

Cited by: 0

Authors:

Chen, Keliang [1]

Huang, Jianming [1]

Cui, Yansong [1]

Ren, Weizheng [1]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Sch Elect Engn, 10 Xitucheng Rd, Beijing 100876, Peoples R China

Source:

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING | 2023 / Vol. 22 / Issue 03

Funding:

National Natural Science Foundation of China;

Keywords:

Audio and text alignment; fuzzy C-means clustering algorithm; Akaike information criterion; Doc2vec; dual threshold endpoint detection; MENTION HYPERGRAPH; WORD2VEC; MODEL;

DOI:

10.1145/3532852

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The audiobook is a multimedia-based reading technology that has emerged in recent years. Aligning e-book text with the corresponding book audio is the most important part of audiobook processing. This article describes an audio and text alignment algorithm that uses deep learning and neural network technology to improve the efficiency and quality of audiobook production. The algorithm first uses dual-threshold endpoint detection to segment long audio into sentence-level short audio clips and recognizes each clip as short text. The detection thresholds are calculated by AIC-FCM (fuzzy C-means clustering with the Akaike information criterion), optimized with a simulated annealing genetic algorithm. The algorithm then calculates text similarity using Doc2vec, optimized by a threshold prediction method based on the average length of the short texts. Finally, it proofreads and outputs the text sequence and audio segments aligned in the time dimension to meet the needs of audiobook production. Experiments show that, compared with traditional audio and text alignment algorithms, the proposed algorithm comes closer to the ideal result in long audio segmentation; its alignment quality is essentially the same as that of Doc2vec, while its time complexity is reduced by about 35%.
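The dual-threshold endpoint detection step named in the abstract can be illustrated with a minimal Python sketch. This is an energy-only simplification under stated assumptions, not the authors' implementation: the function names `frame_energies` and `dual_threshold_segments` are hypothetical, frames are non-overlapping, and the `low` and `high` thresholds are supplied by hand, whereas the article derives its thresholds with AIC-FCM.

```python
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Short-time energy of consecutive, non-overlapping frames."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def dual_threshold_segments(
    energies: List[float], low: float, high: float
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame_exclusive) pairs for speech segments.

    A maximal run of frames with energy above `low` is kept only if at
    least one frame in that run also exceeds `high`. The high threshold
    confirms speech; the low threshold places the segment boundaries.
    """
    segments: List[Tuple[int, int]] = []
    start = None        # first frame of the current candidate run
    confirmed = False   # whether the run has crossed the high threshold
    for i, energy in enumerate(energies):
        if energy > low:
            if start is None:
                start = i
                confirmed = False
            if energy > high:
                confirmed = True
        else:
            if start is not None and confirmed:
                segments.append((start, i))
            start = None
    # Close a run that is still open at the end of the signal.
    if start is not None and confirmed:
        segments.append((start, len(energies)))
    return segments
```

Calling `dual_threshold_segments(frame_energies(samples, frame_len), low, high)` yields frame index ranges that can be mapped back to time by multiplying by the frame duration. Classic formulations also use the zero-crossing rate to refine boundaries, which this sketch omits.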

Pages: 22