A Bayesian framework for fusing multiple word knowledge models in videotext recognition

Times cited: 0
Authors
Zhang, DQ [1]
Chang, SF [1]
Affiliation
[1] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA
Keywords
videotext recognition; video OCR; video indexing; information fusing; multimodal recognition
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Videotext recognition is challenging because of low resolution, diverse fonts and styles, and cluttered backgrounds. Past methods improved recognition with multi-frame averaging, image interpolation, and lexicon-based correction, but recognition using multimodal language models has not been explored. In this paper, we present a formal Bayesian framework for videotext recognition that combines multiple knowledge sources using mixture models, and describe a learning approach based on Expectation-Maximization (EM). To handle unseen words, we also present a back-off smoothing approach derived from the Bayesian model. We implemented a prototype that fuses a word model built from closed captions with one built from the British National Corpus (BNC). The closed-caption model is based on a dedicated model of the distribution of time distances between videotext words and closed-caption words. Our method achieves a significant performance gain, with a word recognition rate of 76.8% and a character recognition rate of 86.7%. A proposed post-processing method also improves videotext detection significantly, reaching 91.8% precision and 95.6% recall.
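The abstract gives no equations, so the following is only a minimal sketch of the general idea of fusing word models with a mixture, under stated assumptions: each knowledge source is reduced to a fixed unigram word distribution p_k(w), the fused prior is the linear interpolation p(w) = sum_k lambda_k p_k(w), EM estimates only the weights lambda_k from a list of training words, and recognition picks the OCR candidate w that maximizes p(image | w) p(w). All function names, the toy probabilities, and the constant `floor` (a crude stand-in for unseen words, not a reproduction of the paper's back-off smoothing) are hypothetical, and the paper's time-distance model for closed captions is not modeled here.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: each knowledge source is a fixed unigram model
# mapping a word to its probability (e.g. one from closed captions, one from a
# general corpus such as the BNC).
WordModel = Dict[str, float]


def mixture_prior(word: str, models: List[WordModel], weights: List[float],
                  floor: float = 1e-6) -> float:
    """Fused prior p(w) = sum_k lambda_k * p_k(w).

    `floor` is an assumed constant probability for words a component has never
    seen; it is a crude placeholder, not the paper's back-off smoothing.
    """
    return sum(lam * m.get(word, floor) for lam, m in zip(weights, models))


def em_mixture_weights(words: Sequence[str], models: List[WordModel],
                       n_iter: int = 50, floor: float = 1e-6) -> List[float]:
    """Learn the mixture weights lambda_k by EM, keeping each p_k fixed."""
    if not words:
        raise ValueError("need at least one training word")
    k = len(models)
    weights = [1.0 / k] * k  # start from uniform weights
    for _ in range(n_iter):
        totals = [0.0] * k
        for w in words:
            # E-step: posterior responsibility of each component for word w
            joint = [weights[j] * models[j].get(w, floor) for j in range(k)]
            z = sum(joint)
            for j in range(k):
                totals[j] += joint[j] / z
        # M-step: each weight becomes the average responsibility of its component
        weights = [t / len(words) for t in totals]
    return weights


def recognize(ocr_likelihood: Dict[str, float], models: List[WordModel],
              weights: List[float], floor: float = 1e-6) -> str:
    """MAP decision: argmax over OCR candidates of p(image | w) * p(w)."""
    return max(ocr_likelihood,
               key=lambda w: ocr_likelihood[w] * mixture_prior(w, models, weights, floor))


if __name__ == "__main__":
    cc_model = {"election": 0.6, "senate": 0.3, "vote": 0.1}  # hypothetical closed-caption model
    bnc_model = {"the": 0.5, "vote": 0.3, "election": 0.2}    # hypothetical general-corpus model
    weights = em_mixture_weights(["election", "election", "senate", "vote"],
                                 [cc_model, bnc_model])
    print(weights)
    # Candidate strings with p(image | w) scores from a hypothetical OCR engine
    print(recognize({"election": 0.4, "elecfion": 0.6}, [cc_model, bnc_model], weights))
```

Holding the component models fixed keeps each EM iteration to one pass over the training words. On the toy data, the closed-caption weight grows above its uniform starting value because that model explains "senate", which the general-corpus model lacks, and the misread candidate "elecfion" loses despite its higher OCR score because both language models give it almost no prior mass.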
Pages: 528-533
Page count: 6