Real-time Synchronization of Live Speech with Its Transcription

Cited by: 0

Authors:

Lertwongkhanakool, Nat ^{[1]}

Punyabukkana, Proadpran ^{[1]}

Suchato, Atiwong ^{[1]}

Affiliation:

[1] Chulalongkorn Univ, Fac Engn, Dept Comp Engn, Spoken Language Syst Res Grp, Bangkok, Thailand

Source:

2013 10th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON) | 2013

Keywords:

Automatic speech-text synchronization; Syllable detection; Real-time alignment; Live speech and transcription alignment; Endpoint detection

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

Most research on synchronizing audio with text has focused on the utterance level. However, generating audio books from live speech in an unstructured language such as Thai requires a finer level of synchronization. We propose an algorithm that synchronizes live speech with its corresponding transcription in real time at the syllable level. The algorithm combines a syllable endpoint detection method and a syllable landmark detection method, using band-limited intensity as the feature. The experiment was conducted on the LOTUS datasets, and the results were compared with a baseline ASR-based syllable detection method. We evaluated the algorithm by its error aberration, defined as the difference between the actual number of syllables and the number of detected syllables in each phrase. The average total error aberration was 11.54 for the proposed method and 34.21 for the baseline, so the proposed method outperforms the baseline on this measure. The reference deviation of our method was also better than that of the baseline.
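The error aberration measure described in the abstract can be illustrated with a minimal sketch. The abstract does not say whether the per-phrase difference is signed or absolute, nor how per-phrase values are combined into a total, so the code below assumes an absolute difference summed over phrases. The function names and the sample counts are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def error_aberration(actual: int, detected: int) -> int:
    """Per-phrase error: gap between the true and the detected syllable count.

    Assumption: the absolute difference is used, so over-detection and
    under-detection are penalized equally.
    """
    return abs(actual - detected)


def total_error_aberration(
    actual_counts: Sequence[int], detected_counts: Sequence[int]
) -> int:
    """Sum the per-phrase error aberration over all phrases (assumed aggregation)."""
    if len(actual_counts) != len(detected_counts):
        raise ValueError("both sequences must contain one count per phrase")
    return sum(
        error_aberration(a, d) for a, d in zip(actual_counts, detected_counts)
    )


# Hypothetical counts for three phrases: 1 extra, 0 wrong, 2 missed syllables.
print(total_error_aberration([5, 8, 3], [6, 8, 1]))  # 3
```

Under these assumptions, a lower total means the detector's syllable counts track the transcription more closely, which is the sense in which the reported 11.54 compares favorably with 34.21.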

Pages: 5
