A Bayesian framework for fusing multiple word knowledge models in videotext recognition

Cited by: 0
Authors
Zhang, DQ [1]
Chang, SF [1]
Affiliations
[1] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA
Keywords
videotext recognition; video OCR; video indexing; information fusing; multimodal recognition
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Videotext recognition is challenging due to low resolution, diverse fonts and styles, and cluttered backgrounds. Past methods enhanced recognition by using multiple-frame averaging, image interpolation, and lexicon correction, but recognition using multi-modality language models has not been explored. In this paper, we present a formal Bayesian framework for videotext recognition that combines multiple knowledge sources using mixture models, and describe a learning approach based on Expectation-Maximization (EM). To handle unseen words, a back-off smoothing approach derived from the Bayesian model is also presented. We built a prototype that fuses a model derived from closed captions with one derived from the British National Corpus. The closed-caption model is based on a unique time-distance distribution model of videotext words and closed-caption words. Our method achieves a significant performance gain, with a word recognition rate of 76.8% and a character recognition rate of 86.7%. A proposed post-processing method also improves videotext detection significantly, with precision of 91.8% and recall of 95.6%.
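As a rough illustration of the kind of mixture-model fusion the abstract mentions, the sketch below estimates mixture weights for several fixed word-probability models with a generic EM loop. It is not taken from the paper: the function names `em_mixture_weights` and `fused_prior`, the input layout, and the choice to keep the component models fixed are all assumptions made for illustration, and the paper's back-off smoothing and time-distance model are not reproduced here.

```python
def em_mixture_weights(component_probs, n_iter=50):
    """Estimate mixture weights by EM (hypothetical helper, not from the paper).

    component_probs: one row per observed word; row[k] is the probability
    that (fixed) component model k assigns to that word.
    """
    k = len(component_probs[0])
    weights = [1.0 / k] * k  # start from a uniform mixture
    for _ in range(n_iter):
        totals = [0.0] * k
        for row in component_probs:
            # E-step: responsibility of each component for this word
            joint = [w * p for w, p in zip(weights, row)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # no component explains this word; skip it
            for j in range(k):
                totals[j] += joint[j] / norm
        total = sum(totals)
        if total == 0.0:
            break  # nothing to update from
        # M-step: new weights are the normalized expected counts
        weights = [t / total for t in totals]
    return weights


def fused_prior(weights, probs):
    """Fused word probability: weighted sum of the component probabilities."""
    return sum(w * p for w, p in zip(weights, probs))


if __name__ == "__main__":
    # Toy data (assumed): component 0 explains the observed words better.
    rows = [[0.9, 0.1]] * 10
    learned = em_mixture_weights(rows)
    print(learned, fused_prior(learned, [0.2, 0.6]))
```

In this toy setting the weight shifts toward whichever component assigns higher probability to the observed words; with real data, each row would hold the probabilities that the individual knowledge models assign to a training word.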
Pages: 528 - 533
Page count: 6
Related papers
50 in total
  • [21] Dynamic Bayesian Networks for Handwritten Arabic Word Recognition
    Ghanmi, Nabil
    Awal, Amhad-Montaser
    Kooli, Nihel
    2017 1ST INTERNATIONAL WORKSHOP ON ARABIC SCRIPT ANALYSIS AND RECOGNITION (ASAR), 2017, : 104 - 108
  • [22] EXPLANATORY ADEQUACY AND MODELS OF WORD RECOGNITION
    SEIDENBERG, MS
    BEHAVIORAL AND BRAIN SCIENCES, 1985, 8 (04) : 724 - 726
  • [23] Pseudohomophone effects and models of word recognition
    Seidenberg, MS
    Petersen, A
    MacDonald, MC
    Plaut, DC
    JOURNAL OF EXPERIMENTAL PSYCHOLOGY-LEARNING MEMORY AND COGNITION, 1996, 22 (01) : 48 - 62
  • [24] Models of spoken-word recognition
    Weber, Andrea
    Scharenborg, Odette
    WILEY INTERDISCIPLINARY REVIEWS-COGNITIVE SCIENCE, 2012, 3 (03) : 387 - 401
  • [25] NONWORD PRONUNCIATION AND MODELS OF WORD RECOGNITION
    SEIDENBERG, MS
    PLAUT, DC
    PETERSEN, AS
    MCCLELLAND, JL
    MCRAE, K
    JOURNAL OF EXPERIMENTAL PSYCHOLOGY-HUMAN PERCEPTION AND PERFORMANCE, 1994, 20 (06) : 1177 - 1196
  • [26] A Bayesian framework for word segmentation: Exploring the effects of context
    Goldwater, Sharon
    Griffiths, Thomas L.
    Johnson, Mark
    COGNITION, 2009, 112 (01) : 21 - 54
  • [27] Overview of models on spoken word recognition
    Amano, S
    JAPANESE JOURNAL OF PSYCHOLOGY, 1999, 70 (03) : 228 - 240
  • [28] A novel fault diagnosis method for Bayesian networks fusing models and data
    Wang, Jinhua
    Ma, Xuehua
    Jie, Cao
    Liu, Yunqiang
    Li, Chen
    NUCLEAR ENGINEERING AND DESIGN, 2024, 426
  • [29] Human interaction recognition fusing multiple features of depth sequences
    Li, Jianjun
    Mao, Xia
    Chen, Lijiang
    Wang, Lan
    IET COMPUTER VISION, 2017, 11 (07) : 560 - 566
  • [30] Word segmentation and recognition for Web document framework
    Chi, CH
    Ding, C
    Lim, A
    PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON INFORMATION KNOWLEDGE MANAGEMENT, CIKM'99, 1999, : 458 - 465