n-Gram-based indexing for Korean text retrieval

Times cited: 8
Authors
Lee, JH
Cho, HY
Park, HR
Affiliations
[1] Soongsil Univ, Sch Comp, Dongjak Gu, Seoul 156743, South Korea
[2] Korea Adv Inst Sci & Technol, Korea Res & Dev Informat Ctr, Taejon 305600, South Korea
Keywords
information retrieval; indexing method; Korean text; n-Gram
DOI
10.1016/S0306-4573(98)00050-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Two groups of indexing methods, word-based indexing and morpheme-based indexing, have been investigated in the literature of Korean text retrieval. Word-based indexing eliminates the suffix of a word and generates the remaining stem as an index term. That index term is often a compound noun, which seriously decreases retrieval effectiveness. Morpheme-based indexing overcomes the compound-noun problem by decomposing a compound noun into simple nouns; however, it requires a large dictionary and complex linguistic knowledge. In this paper we propose a new indexing method based on n-grams and show that n-gram-based indexing is considerably faster than morpheme-based indexing while also providing better retrieval effectiveness. (C) 1999 Elsevier Science Ltd. All rights reserved.
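The abstract contrasts the three indexing approaches but does not spell out the n-gram procedure itself. As a rough illustration only, the sketch below indexes overlapping syllable bigrams (n = 2) of each whitespace-delimited word and ranks documents by how many query bigrams they share. The choice of n, the whitespace tokenization, the rule that keeps words shorter than n whole, the NFC normalization step, and the overlap score are all assumptions made for this example, not details taken from the paper; the function names are hypothetical.

```python
import unicodedata
from collections import defaultdict


def char_ngrams(word: str, n: int = 2) -> list[str]:
    """Return the overlapping character n-grams of a word.

    Assumption: text is NFC-normalized so each Hangul syllable is one code point.
    Words of length <= n are kept whole so they still yield an index term.
    """
    word = unicodedata.normalize("NFC", word)
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def build_index(docs: dict[str, str], n: int = 2) -> dict[str, set[str]]:
    """Build an inverted index mapping each n-gram to the ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: words are split on whitespace; attached particles are not stripped.
        for word in text.split():
            for gram in char_ngrams(word, n):
                index[gram].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str, n: int = 2) -> list[tuple[str, int]]:
    """Rank documents by the number of distinct query n-grams they contain."""
    query_grams = {gram for word in query.split() for gram in char_ngrams(word, n)}
    scores: dict[str, int] = defaultdict(int)
    for gram in query_grams:
        for doc_id in index.get(gram, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    docs = {"d1": "정보검색 시스템", "d2": "한국어 형태소 분석"}
    idx = build_index(docs)
    print(char_ngrams("정보검색"))  # ['정보', '보검', '검색']
    print(search(idx, "검색"))      # [('d1', 1)]
```

Under this scheme the compound noun 정보검색 ("information retrieval") yields the bigrams 정보, 보검 and 검색, so a query for 검색 ("retrieval") matches it without any dictionary-based decomposition, which addresses the compound-noun problem the abstract attributes to word-based indexing. The cross-boundary bigram 보검 is the typical price of this approach: it adds index terms that do not correspond to real morphemes.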
Pages: 427-441
Number of pages: 15
Related papers (50 in total)
  • [31] Comparing neural- and N-gram-based language models for word segmentation
    Doval, Yerai
    Gomez-Rodriguez, Carlos
    JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2019, 70 (02): 187-197
  • [32] A text based indexing system for mammographic image retrieval and classification
    Farruggia, Alfonso
    Magro, Rosario
    Vitabile, Salvatore
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2014, 37: 243-251
  • [33] Comparison and system combination of n-gram-based and syntax-based machine translation systems
    Khalilov, Maxim
    Fonollosa, Jose A. R.
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2008, (41): 259-266
  • [34] Study of N-Gram-Based Reject Correction Method for Optical Character Reader
    Sugimura, Toshiaki
    Saito, Tamaki
    Denki Tsushin Kenkyujo kenkyu jitsuyoka hokoku, 1985, 34 (06): 1029-1038
  • [35] Features Related to Patient Portal User Satisfaction: N-Gram-Based Analysis of Users' Feedback
    Al-Ramahi, Mohammad
    Wahbeh, Abdullah
    Noteboom, Cherie
    SIGMIS-CPR'18: PROCEEDINGS OF THE 2018 ACM SIGMIS CONFERENCE ON COMPUTERS AND PEOPLE RESEARCH, 2018: 152
  • [37] The Operation Sequence Model: Combining N-Gram-Based and Phrase-Based Statistical Machine Translation
    Durrani, Nadir
    Schmid, Helmut
    Fraser, Alexander
    Koehn, Philipp
    Schuetze, Hinrich
    COMPUTATIONAL LINGUISTICS, 2015, 41 (02): 185-214
  • [38] Character N-gram tokenization for European language text retrieval
    McNamee, P
    Mayfield, J
    INFORMATION RETRIEVAL, 2004, 7 (1-2): 73-97
  • [39] Evaluation of N-Gram Conflation Approaches for Arabic Text Retrieval
    Ahmed, Farag
    Nuernberger, Andreas
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2009, 60 (07): 1448-1465
  • [40] N-gram and local context analysis for Persian text retrieval
    Aleahmad, Abolfazl
    Hakimian, Parsia
    Mahdikhani, Farzad
    Oroumchian, Farhad
    2007 9TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOLS 1-3, 2007: 284-287