A comparison of character n-grams and dictionaries used for script recognition

Cited by: 3
Authors
Brakensiek, A [1 ]
Rigoll, G [1 ]
Affiliation
[1] Univ Duisburg Gesamthsch, Fac Elect Engn, Dept Comp Sci, D-47057 Duisburg, Germany
DOI
10.1109/ICDAR.2001.953791
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes an off-line script recognition system that uses a language model consisting of backoff character n-grams. The performance of this open-vocabulary recognition is compared with that of closed dictionaries. The system is based on Hidden Markov Models (HMMs) and uses a hybrid modeling technique that depends on a neural vector quantizer. The presented recognition results refer to the SEDAL database of degraded English documents, such as photocopies or faxes, and to a writer-dependent handwritten database of cursive German script samples. Our resulting system for character recognition yields significantly better recognition results for an unlimited vocabulary using language models.
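The abstract does not state the n-gram order or the smoothing scheme, so the following is only a minimal sketch of what a backoff character n-gram model can look like, not the authors' implementation. It assumes a bigram order, absolute discounting, and add-one smoothed unigrams as the backoff distribution; the class name `BackoffCharBigram` and the `discount` parameter are hypothetical.

```python
from collections import Counter, defaultdict


class BackoffCharBigram:
    """Character bigram model that backs off to unigrams (illustrative only)."""

    def __init__(self, text, discount=0.5):
        self.discount = discount              # assumed absolute discount D
        self.alphabet = sorted(set(text))     # characters seen in training
        self.unigrams = Counter(text)
        self.total = len(text)
        self.bigrams = defaultdict(Counter)   # bigrams[prev][cur] = count
        for prev, cur in zip(text, text[1:]):
            self.bigrams[prev][cur] += 1

    def p_unigram(self, ch):
        # Add-one smoothing over the training alphabet.
        return (self.unigrams[ch] + 1) / (self.total + len(self.alphabet))

    def p(self, ch, prev):
        """Return P(ch | prev), backing off to unigrams for unseen pairs."""
        followers = self.bigrams.get(prev)
        if not followers:
            # Context never seen with a successor: use the unigram estimate.
            return self.p_unigram(ch)
        context_count = sum(followers.values())
        if followers[ch] > 0:
            # Seen bigram: discounted relative frequency.
            return (followers[ch] - self.discount) / context_count
        # Unseen bigram: share the mass freed by discounting among the
        # characters not observed after `prev`, in proportion to their
        # unigram probabilities.
        reserved = self.discount * len(followers) / context_count
        unseen_mass = sum(
            self.p_unigram(c) for c in self.alphabet if c not in followers
        )
        if unseen_mass == 0:
            return 0.0
        return reserved * self.p_unigram(ch) / unseen_mass


model = BackoffCharBigram("the cat sat on the mat")
seen = model.p("h", "t")     # "th" occurs in the training text
unseen = model.p("a", "t")   # "ta" does not, so the model backs off
```

Because every character sequence over the training alphabet receives a non-zero probability, a recognizer using such a model is not restricted to a fixed word list, which is the open-vocabulary property the abstract contrasts with closed dictionaries.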
Pages: 241-245
Page count: 3
Related papers
50 in total
  • [1] Unconstrained Offline Handwriting Recognition using Connectionist Character N-grams
    Zamora-Martinez, F.
    Castro-Bleda, M. J.
    Espana-Boquera, S.
    Gorbe-Moya, J.
    2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, 2010,
  • [2] Comparing word, character, and phoneme n-grams for subjective utterance recognition
    Wilson, Theresa
    Raaijmakers, Stephan
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1614 - +
  • [3] Handwritten address recognition with open vocabulary using character n-grams
    Brakensiek, A
    Rottland, J
    Rigoll, G
    EIGHTH INTERNATIONAL WORKSHOP ON FRONTIERS IN HANDWRITING RECOGNITION: PROCEEDINGS, 2002, : 357 - 362
  • [4] Statistical Analysis of the Indus Script Using n-Grams
    Yadav, Nisha
    Joglekar, Hrishikesh
    Rao, Rajesh P. N.
    Vahia, Mayank N.
    Adhikari, Ronojoy
    Mahadevan, Iravatham
    PLOS ONE, 2010, 5 (03):
  • [5] Detection of Opinion Spam with Character n-grams
    Hernandez Fusilier, Donato
    Montes-y-Gomez, Manuel
    Rosso, Paolo
    Guzman Cabrera, Rafael
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT II, 2015, 9042 : 285 - 294
  • [6] Spam detection using character N-grams
    Kanaris, Ioannis
    Kanaris, Konstantinos
    Stamatatos, Efstathios
    ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 3955 : 95 - 104
  • [7] Improved degraded document recognition with hybrid modeling techniques and character n-grams
    Brakensiek, A
    Willett, D
    Rigoll, G
    15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 4, PROCEEDINGS: APPLICATIONS, ROBOTICS SYSTEMS AND ARCHITECTURES, 2000, : 438 - 441
  • [8] Which Granularity to Bootstrap a Multilingual Method of Document Alignment: Character N-grams or Word N-grams?
    Lecluze, Charlotte
    Rigouste, Lois
    Giguet, Emmanuel
    Lucas, Nadine
    CORPUS RESOURCES FOR DESCRIPTIVE AND APPLIED STUDIES. CURRENT CHALLENGES AND FUTURE DIRECTIONS: SELECTED PAPERS FROM THE 5TH INTERNATIONAL CONFERENCE ON CORPUS LINGUISTICS (CILC2013), 2013, 95 : 473 - 481
  • [9] Mining generalized character n-grams in large corpora
    Marques, Nuno C.
    Braud, Agnès
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2003, 2902 : 419 - 423
  • [10] Character N-Grams for Detecting Deceptive Controversial Opinions
    Sanchez-Junquera, Javier
    Villasenor-Pineda, Luis
    Montes-y-Gomez, Manuel
    Rosso, Paolo
    EXPERIMENTAL IR MEETS MULTILINGUALITY, MULTIMODALITY, AND INTERACTION (CLEF 2018), 2018, 11018 : 135 - 140