n-Gram-based indexing for Korean text retrieval

Cited by: 8
Authors
Lee, JH
Cho, HY
Park, HR
Affiliations
[1] Soongsil Univ, Sch Comp, Dongjak Gu, Seoul 156743, South Korea
[2] Korea Adv Inst Sci & Technol, Korea Res & Dev Informat Ctr, Taejon 305600, South Korea
Keywords
information retrieval; indexing method; Korean text; n-Gram;
DOI
10.1016/S0306-4573(98)00050-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Two groups of indexing methods, word-based indexing and morpheme-based indexing, have been investigated in the literature on Korean text retrieval. Word-based indexing removes the suffix of a word and uses the remaining stem as an index term. The index term is often a compound noun, which seriously degrades retrieval effectiveness. Morpheme-based indexing overcomes the compound-noun problem by decomposing a compound noun into simple nouns. It requires, however, a large dictionary and complex linguistic knowledge. In this paper we propose a new indexing method based on n-grams. The n-gram-based indexing is considerably faster than morpheme-based indexing, and also provides better retrieval effectiveness. © 1999 Elsevier Science Ltd. All rights reserved.
Pages: 427-441
Page count: 15
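
The abstract does not spell out the indexing procedure, so the following is only a minimal sketch of the general idea of character n-gram indexing, not the authors' method. It assumes bigrams (n = 2), whitespace-delimited words, and text in NFC form so that each Hangul syllable is a single character; the names `char_ngrams`, `build_index`, and `search` and the sample documents are hypothetical, and no suffix removal is attempted.

```python
from collections import defaultdict


def char_ngrams(word, n=2):
    """Return the overlapping character n-grams of a word.

    A word no longer than n characters is kept whole as a single term.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def build_index(docs, n=2):
    """Build an inverted index mapping each n-gram to the set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            for gram in char_ngrams(word, n):
                index[gram].add(doc_id)
    return index


def search(index, query, n=2):
    """Return ids of documents that contain every n-gram of the query."""
    grams = [g for w in query.split() for g in char_ngrams(w, n)]
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


# Illustrative documents (made up for this sketch).
docs = {
    1: "정보검색 시스템",   # contains the compound noun 정보검색
    2: "한국어 정보 처리",
    3: "검색 엔진",
}
index = build_index(docs)

print(char_ngrams("정보검색"))
print(sorted(search(index, "검색")))
```

In this sketch the compound noun 정보검색 is indexed under the bigrams 정보, 보검, and 검색, so a query for the simple noun 검색 can match document 1 without a dictionary or morphological analysis. The boundary-crossing bigram 보검 is also indexed, which is the kind of spurious term a real system would need to handle.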