A Bayesian framework for fusing multiple word knowledge models in videotext recognition

Cited by: 0

Authors:

Zhang, DQ [1]

Chang, SF [1]

Affiliation:

[1] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA

Source:

2003 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL II, PROCEEDINGS | 2003

Keywords:

videotext recognition; video OCR; video indexing; information fusing; multimodal recognition;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Videotext recognition is challenging due to low resolution, diverse fonts and styles, and cluttered backgrounds. Past methods improved recognition through multiple-frame averaging, image interpolation, and lexicon correction, but recognition using multi-modality language models has not been explored. In this paper, we present a formal Bayesian framework for videotext recognition that combines multiple knowledge models using mixture models, and describe a learning approach based on Expectation-Maximization (EM). To handle unseen words, a back-off smoothing approach derived from the Bayesian model is also presented. We built a prototype that fuses a model derived from closed captions with one derived from the British National Corpus. The closed-caption model is based on a unique time-distance distribution model of videotext words and closed-caption words. Our method achieves a significant performance gain, with a word recognition rate of 76.8% and a character recognition rate of 86.7%. A proposed post-processing method also improves videotext detection significantly, with precision of 91.8% and recall of 95.6%.


Pages: 528-533

Page count: 6
