Real-time Synchronization of Live Speech with Its Transcription

Cited by: 0
Authors
Lertwongkhanakool, Nat [1]
Punyabukkana, Proadpran [1]
Suchato, Atiwong [1]
Affiliations
[1] Chulalongkorn Univ, Fac Engn, Dept Comp Engn, Spoken Language Syst Res Grp, Bangkok, Thailand
Keywords
Automatic speech-text synchronization; Syllable Detection; Real-Time alignment; Live speech and transcription alignment; Endpoint Detection
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Most research on the synchronization of audio and text has focused on synchronization at the utterance level. However, generating audio books from live speech in an unstructured language such as Thai requires a finer level of synchronization. We propose an algorithm that synchronizes live speech with its corresponding transcription in real time at the syllable level. The proposed algorithm combines a syllable endpoint detection method and a syllable landmark detection method, with band-limited intensity as features. Experiments were conducted on the LOTUS datasets, and the results were compared with a baseline ASR-based syllable detection method. We evaluated the algorithm using error aberration, defined as the difference between the actual number of syllables and the number of detected syllables in each phrase. The average total error aberration was 11.54 for the proposed method and 34.21 for the baseline, so the proposed method outperforms the baseline on this measure. The reference deviation of our method was also better than that of the baseline.
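As a rough illustration of the evaluation measure described in the abstract, the sketch below computes a per-phrase error aberration as the difference between the actual and detected syllable counts. The abstract does not say whether the difference is signed or how the "total" is aggregated, so taking the absolute difference and summing over phrases are assumptions made here. The function names and sample counts are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def error_aberrations(actual: Sequence[int], detected: Sequence[int]) -> List[int]:
    """Per-phrase difference between actual and detected syllable counts.

    Assumption: the absolute difference is used; the abstract only says
    "difference".
    """
    if len(actual) != len(detected):
        raise ValueError("actual and detected must cover the same phrases")
    return [abs(a - d) for a, d in zip(actual, detected)]


def total_error_aberration(actual: Sequence[int], detected: Sequence[int]) -> int:
    """Sum of per-phrase error aberrations (assumed aggregation)."""
    return sum(error_aberrations(actual, detected))


# Hypothetical syllable counts for three phrases (not data from the paper).
actual_counts = [5, 8, 3]
detected_counts = [6, 8, 1]

per_phrase = error_aberrations(actual_counts, detected_counts)
total = total_error_aberration(actual_counts, detected_counts)
print(per_phrase, total)
```

Under these assumptions, a lower total means the detector's syllable counts track the transcription more closely, which is the sense in which the abstract compares 11.54 against 34.21.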
Pages: 5