Automatic Speech Recognition for Live TV Subtitling for Hearing-Impaired People

Cited by: 0
Authors
Obach, Michael [1]
Lehr, Maider [1]
Arruti, Andoni
Affiliations
[1] VICOMTech Visual Interact & Commun Technol Ctr, E-20009 Donostia San Sebastian, Spain
Source
Keywords
Subtitling; Live Subtitling; Closed Captioning; Automatic Speech Recognition; Hearing Impaired; Deaf and Hard of Hearing; Teletext;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Most Spanish TV channels offer subtitles (closed captions) for only some of their pre-recorded programmes. Mainly because of the cost of specially trained stenographers and fast typists, subtitles are rarely available for live programmes such as news broadcasts and sports events. Progress in automatic speech recognition (ASR) opens a new way for live subtitling, but ASR only works well when trained to recognise a single voice and when trained beforehand with material related to the contents of the programmes. We developed an ASR-based prototype that could be applied to automatically generate live subtitles as teletext for Spanish news broadcasts without human participation. The main goal was to evaluate the feasibility of using this technology to improve the quality of life of millions of hearing-impaired people, in accordance with current and future Spanish legislation. State-of-the-art dictation software, which produces a literal transcription of speech, and a commercial teletext generator conforming to Spanish standards were integrated with our own modules for improved pre-processing of the audio signal, voice normalization for speaker independence, speech/non-speech segmentation, and tools for generating and updating dictionaries. The prototype was validated in cooperation with a TV broadcaster, which provided audiovisual material for the generation of the language corpus and specific dictionaries. System outputs were evaluated by organizations of the deaf and the hard of hearing. Results indicate that ASR is (still) not suitable for fully automated live subtitling. A delay of several seconds between speech and subtitle was observed. A limited word recognition rate, mainly caused by a huge number of named entities and by the variability of speakers and acoustic conditions, sometimes made the news impossible to understand. We identified the lack of automatic punctuation as a major problem that decreased the readability of the subtitles and also affected recognition quality. Many results are also valid for other languages and for areas of subtitling other than television.
Pages: 286-291
Page count: 6