Using Topic Identification in Chinese Information Retrieval

Cited by: 0
Authors
Yeh, Ching-Long [1]
Chen, Yi-Chun [1]
Affiliation
[1] Tatung Univ, Dept Comp Sci & Engn, Taipei, Taiwan
Source
JOURNAL OF INTERNET TECHNOLOGY | 2009 / Vol. 10 / No. 02
Keywords
Natural Language Processing; Shallow Parsing; Topic Identification; Information Retrieval
DOI
Not available
CLC classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Information retrieval is the task of identifying documents in a text collection that are relevant to a given query. In current information retrieval systems, users can query with an unordered set of keywords, a question, or a sentence, and a list of document links matching the query is retrieved and ranked by the relevance between the query and each document. In this article, we investigate the hypothesis that a discourse-level element, the topic, can contribute to information retrieval computations. Because zero anaphora occurs frequently in Chinese texts, topics are often omitted and do not appear in the surface text. We employ the key elements of the centering model of local discourse coherence to extract the structure of discourse segments, and we propose a topic identification method that uses this local discourse structure to recover omitted topics and identify the topics of documents in the text collection. The topic information is then inserted into the text to create better indices. Experimental results are reported on a test collection taken from the Chinese Information Retrieval Benchmark, version 3.0.
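The abstract only outlines the method, so the following is a rough illustration of the general idea, not the authors' algorithm. It assumes each discourse segment is already word-segmented and tagged with grammatical roles, ranks forward-looking centers by a hypothetical topic > subject > object order in the spirit of the centering model, fills an utterance that lacks an explicit topic or subject with the previous utterance's preferred center, and then counts index terms. The role tags, the ranking, and the function names (`preferred_center`, `recover_zero_topics`, `index_terms`) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional

# Hypothetical salience ranking for forward-looking centers (Cf):
# a lower number means more salient. "other" tokens are never centers.
ROLE_RANK = {"topic": 0, "subject": 1, "object": 2, "other": 3}

# Assumed input: one utterance = list of (word, grammatical role) pairs.
Token = tuple[str, str]


def preferred_center(utterance: list[Token]) -> Optional[str]:
    """Return the preferred center (Cp): the highest-ranked entity in the utterance."""
    candidates = [tok for tok in utterance if tok[1] != "other"]
    if not candidates:
        return None
    return min(candidates, key=lambda tok: ROLE_RANK[tok[1]])[0]


def recover_zero_topics(segment: list[list[Token]]) -> list[list[Token]]:
    """Simplified heuristic: if an utterance has no explicit topic or subject,
    assume a zero anaphor and fill it with the previous utterance's Cp."""
    recovered = []
    prev_cp: Optional[str] = None
    for utt in segment:
        has_topic = any(role in ("topic", "subject") for _, role in utt)
        if not has_topic and prev_cp is not None:
            utt = [(prev_cp, "topic")] + utt  # insert the recovered topic into the text
        recovered.append(utt)
        prev_cp = preferred_center(utt) or prev_cp
    return recovered


def index_terms(segment: list[list[Token]]) -> Counter:
    """Term frequencies over the (topic-augmented) segment, as input to an index."""
    return Counter(word for utt in segment for word, _ in utt)


# Example discourse segment with zero anaphora in the 2nd and 3rd utterances.
segment = [
    [("張三", "subject"), ("買了", "other"), ("書", "object")],  # Zhang San bought a book.
    [("很", "other"), ("喜歡", "other"), ("它", "object")],      # (He) likes it a lot.
    [("已經", "other"), ("讀完", "other"), ("了", "other")],     # (He) has already finished reading (it).
]
recovered = recover_zero_topics(segment)
print(index_terms(recovered).most_common(1))  # [('張三', 3)]
```

Because the recovered topic 張三 is inserted into the second and third utterances, its term frequency rises from 1 to 3, which is the kind of indexing effect the abstract attributes to inserting topic information. A faithful implementation would also need Chinese word segmentation, role tagging, and the full centering transition rules, none of which are modeled here.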
Pages: 95-102
Page count: 8
Related papers
50 in total
  • [41] Protein Identification as an Information Retrieval Problem
    Yang, Yiming
    Ganapathy, Subramaniam
    Harpale, Abhay
    PROCEEDINGS 32ND ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2009: 752-753
  • [42] Soft approaches to information retrieval and information access on the Web: An introduction to the special topic section
    Herrera-Viedma, E
    Pasi, G
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2006, 57 (04): 511-514
  • [43] Information retrieval based writer identification
    Bensefia, A
    Paquet, T
    Heutte, L
    SEVENTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 2003: 946-950
  • [44] Automatic information retrieval of Chinese business card
    Shi, GS
    Pan, W
    Jin, JM
    DOCUMENT RECOGNITION AND RETRIEVAL X, 2003, 5010: 241-248
  • [46] Employing multiple representations for Chinese information retrieval
    Kwok, KL
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1999, 50 (08): 709-723
  • [47] Duplicate Bug Report Detection with a Combination of Information Retrieval and Topic Modeling
    Anh Tuan Nguyen
    Tung Thanh Nguyen
    Nguyen, Tien N.
    Lo, David
    Sun, Chengnian
    2012 PROCEEDINGS OF THE 27TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE), 2012: 70-79
  • [48] Semantic Information Retrieval based on Topic Modeling and Community Interests Mining
    Rajapaksha, Minuri
    Silva, Thushari
    2019 MORATUWA ENGINEERING RESEARCH CONFERENCE (MERCON) / 5TH INTERNATIONAL MULTIDISCIPLINARY ENGINEERING RESEARCH CONFERENCE, 2019: 60-65
  • [49] Query performance prediction for information retrieval based on covering topic score
    Lang, Hao
    Wang, Bin
    Jones, Gareth
    Li, Jin-Tao
    Ding, Fan
    Liu, Yi-Xuan
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2008, 23 (04): 590-601