n-Gram-based indexing for Korean text retrieval

Cited by: 8
Authors
Lee, JH
Cho, HY
Park, HR
Affiliations
[1] Soongsil Univ, Sch Comp, Dongjak Gu, Seoul 156743, South Korea
[2] Korea Adv Inst Sci & Technol, Korea Res & Dev Informat Ctr, Taejon 305600, South Korea
Keywords
information retrieval; indexing method; Korean text; n-gram
DOI
10.1016/S0306-4573(98)00050-8
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Two groups of indexing methods, word-based indexing and morpheme-based indexing, have been investigated in the literature on Korean text retrieval. Word-based indexing eliminates the suffix of a word and generates its remaining stem as an index term. The index term is often a compound noun, which results in a serious decrease in retrieval effectiveness. Morpheme-based indexing overcomes the problem of compound nouns by decomposing a compound noun into simple nouns. It, however, requires a large dictionary and complex linguistic knowledge. In this paper we propose a new indexing method based on n-grams. The n-gram-based indexing is considerably faster than morpheme-based indexing, and also provides better retrieval effectiveness. © 1999 Elsevier Science Ltd. All rights reserved.
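The abstract does not spell out how the n-grams are formed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes character bigrams (n = 2) taken over an already-extracted word stem, with one precomposed Hangul syllable counted as one character. The function names `char_ngrams`, `build_index`, and `search`, and the sample stems, are hypothetical and chosen for illustration.

```python
import unicodedata


def char_ngrams(stem: str, n: int = 2) -> list[str]:
    """Return overlapping character n-grams of a word stem.

    Assumption: NFC normalization so each Hangul syllable is one code point.
    Stems no longer than n are kept whole as a single index term.
    """
    stem = unicodedata.normalize("NFC", stem)
    if not stem:
        return []
    if len(stem) <= n:
        return [stem]
    return [stem[i:i + n] for i in range(len(stem) - n + 1)]


def build_index(docs: dict[int, list[str]], n: int = 2) -> dict[str, set[int]]:
    """Build an inverted index mapping each n-gram to the documents containing it.

    `docs` maps a document id to its list of stems (suffix stripping is
    assumed to have happened earlier and is not shown here).
    """
    index: dict[str, set[int]] = {}
    for doc_id, stems in docs.items():
        for stem in stems:
            for gram in char_ngrams(stem, n):
                index.setdefault(gram, set()).add(doc_id)
    return index


def search(index: dict[str, set[int]], query_stem: str, n: int = 2) -> set[int]:
    """Return documents that contain every n-gram of the query stem."""
    grams = char_ngrams(query_stem, n)
    if not grams:
        return set()
    return set.intersection(*(index.get(g, set()) for g in grams))


# Hypothetical example: document 1 contains the compound noun 정보검색
# ("information retrieval"); document 2 contains only 정보 ("information").
docs = {1: ["정보검색"], 2: ["정보"]}
index = build_index(docs)
print(char_ngrams("정보검색"))   # bigrams of the compound noun
print(search(index, "검색"))     # query for the simple noun 검색 ("retrieval")
```

In this sketch the compound noun is indexed under its bigrams, so a query for one of its component nouns can still reach the document without a dictionary or morphological analysis. Bigrams that straddle a noun boundary (such as 보검 here) are also indexed, which is a known trade-off of the plain n-gram approach.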
Pages: 427-441
Page count: 15