Language identification of on-line documents using word shapes

Cited by: 0
Authors
Nobile, N
Bergler, S
Suen, CY
Khoury, S
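Institutions
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes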
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We have extended existing methods to identify the language of an on-line document after the characters have been coded using 10 character classes based on visual characteristics. In particular, we exploit word bigrams and trigrams in both a linear combination of score values and an expert systems approach. Knowledge about each language is acquired from a large number of on-line texts. Using a small set of rules, the expert system outperforms the linear combination in accuracy and shows more stability when parameter settings are varied.
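The word-shape idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not list the paper's 10 character classes, so the handful of classes below (ascender, descender, dotted, x-height, digit, other) are hypothetical, as are the function names, the weights, and the inclusion of a single-word term alongside the bigram and trigram terms. The sketch shows only the linear-combination variant, not the expert system.

```python
from collections import Counter

# Hypothetical shape classes; the paper's actual 10 classes are not given here.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gpqy")
DOTTED = set("ij")

def shape_code(word):
    """Replace each character with a symbol for its coarse visual class."""
    out = []
    for ch in word:
        if ch.isdigit():
            out.append("D")          # digit
        elif ch.isupper() or ch in ASCENDERS:
            out.append("A")          # tall character
        elif ch in DESCENDERS:
            out.append("g")          # character with a descender
        elif ch in DOTTED:
            out.append("i")          # dotted character
        elif ch.isalpha():
            out.append("x")          # x-height character
        else:
            out.append("?")          # punctuation and anything else
    return "".join(out)

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def train(text):
    """Count word-shape unigrams, bigrams, and trigrams in a training text."""
    tokens = [shape_code(w) for w in text.split()]
    return {n: Counter(ngrams(tokens, n)) for n in (1, 2, 3)}

def score(model, text, weights=(0.2, 0.3, 0.5)):
    """Linear combination of the fraction of n-grams seen in training.

    The weights are illustrative assumptions, not values from the paper.
    """
    tokens = [shape_code(w) for w in text.split()]
    total = 0.0
    for n, weight in zip((1, 2, 3), weights):
        grams = ngrams(tokens, n)
        if not grams:
            continue
        hits = sum(1 for g in grams if g in model[n])
        total += weight * hits / len(grams)
    return total

def identify(models, text):
    """Pick the language whose model gives the highest combined score."""
    return max(models, key=lambda lang: score(models[lang], text))
```

With one `train` call per language on a sample text, `identify` returns the key of the best-scoring model. A real system would need far larger training texts and some smoothing for unseen n-grams.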
Pages: 258-262
Page count: 5