A prototype of Audio-Visual Broadcast Transcription System

Cited by: 0
Authors
Chaloupka, Josef [1]
Affiliations
[1] Tech Univ Liberec, Inst Informat Technol & Elect, Liberec, Czech Republic
Keywords
automatic broadcast transcription; LVCSR; machine vision
DOI
10.1109/tsp.2019.8769103
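Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract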
This paper focuses on the use of methods and algorithms from speech processing and recognition and from machine vision to design a system for automatic audio-visual broadcast transcription. The resulting audio-visual system was designed and built mainly for transcribing huge video databases of TV recordings. Visual signal processing and recognition is usually several times more computationally demanding than audio signal processing and recognition. Therefore, all applied machine vision methods and algorithms were considered with respect to low computing time as well as the highest possible recognition rate. Our proposed broadcast transcription system was extended with several modules: visual signal segmentation, TV channel identification, face detection and identification, and Optical Character Recognition (OCR).
Pages: 543-547
Page count: 5