A Bayesian framework for fusing multiple word knowledge models in videotext recognition

Cited by: 0

Authors:

Zhang, DQ [1]

Chang, SF [1]

Affiliation:

[1] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA

Source:

2003 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL II, PROCEEDINGS | 2003

Keywords:

videotext recognition; video OCR; video indexing; information fusing; multimodal recognition;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Videotext recognition is challenging due to low resolution, diverse fonts and styles, and cluttered backgrounds. Past methods improved recognition through multiple-frame averaging, image interpolation, and lexicon correction, but recognition using multi-modality language models has not been explored. In this paper, we present a formal Bayesian framework for videotext recognition that combines multiple knowledge sources using mixture models, and describe a learning approach based on Expectation-Maximization (EM). To handle unseen words, a back-off smoothing approach derived from the Bayesian model is also presented. We implemented a prototype that fuses a model derived from closed captions with one derived from the British National Corpus. The closed-caption model is based on a unique time-distance distribution model of videotext words and closed-caption words. Our method achieves a significant performance gain, with a word recognition rate of 76.8% and a character recognition rate of 86.7%. A proposed post-processing method also improves videotext detection significantly, with precision of 91.8% and recall of 95.6%.
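
The abstract mentions fusing word knowledge models through a mixture and learning it with EM. The paper's actual formulation is not given in this record, so the following is only a minimal sketch of the generic technique: estimating mixture (interpolation) weights over fixed word-probability models with EM. The function names `em_mixture_weights` and `mixture_prob`, the input layout, and the toy numbers are assumptions made for illustration, not taken from the paper.

```python
def em_mixture_weights(component_probs, n_iter=50):
    """Estimate mixture weights for fixed component models with EM.

    component_probs[i][k] is the probability that (hypothetical) model k
    assigns to the i-th held-out word. The component models themselves
    are kept fixed; only the mixing weights are learned.
    """
    n_models = len(component_probs[0])
    weights = [1.0 / n_models] * n_models  # start from uniform weights
    for _ in range(n_iter):
        totals = [0.0] * n_models
        for probs in component_probs:
            mix = sum(w * p for w, p in zip(weights, probs))
            if mix == 0.0:
                continue  # no model explains this word; skip it
            # E-step: responsibility of each model for this word
            for k in range(n_models):
                totals[k] += weights[k] * probs[k] / mix
        norm = sum(totals)
        if norm == 0.0:
            break
        # M-step: new weights are the normalized responsibilities
        weights = [t / norm for t in totals]
    return weights


def mixture_prob(weights, probs):
    """Probability of one word under the weighted mixture."""
    return sum(w * p for w, p in zip(weights, probs))


# Toy held-out data: column 0 could stand for a closed-caption model,
# column 1 for a general-corpus model (illustrative numbers only).
held_out = [[0.5, 0.1], [0.4, 0.1], [0.1, 0.3]]
weights = em_mixture_weights(held_out)
```

In this toy setup the first model explains most held-out words better, so EM shifts weight toward it while keeping the weights normalized. A real system would also need the smoothing for unseen words that the abstract describes, which this sketch omits.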


Pages: 528-533

Page count: 6
