Combining Parts of Speech, Term Proximity, and Query Expansion for Document Retrieval

Authors
LaBouve, Eric [1]
Stanchev, Lubomir [1]
Affiliations
[1] Calif Polytech State Univ San Luis Obispo, Dept Comp Sci & Software Engn, San Luis Obispo, CA 93407 USA
Keywords
Semantic Analysis; Document Retrieval; Query Expansion; Term Proximity; Search; Okapi BM25
DOI
10.1109/ICSC.2019.00034
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Document retrieval systems retrieve documents from a database and rank them by their perceived relevance to a user's search query. This is a difficult task for machines because a semantic gap exists between the literal meaning of the terms in a query and the user's true intent. The main goal of this study is to modify the Okapi BM25 document retrieval system to improve search results for textual queries over unstructured, textual corpora. The research hypothesizes that Okapi BM25 does not take full advantage of the structure of the text inside documents, and that this structure holds valuable semantic information that can be used to increase the model's accuracy. Okapi BM25 is augmented with modifications that account for a term's part of speech, the proximity between a pair of related terms, the proximity of a term with respect to its location in a document, and query expansion. The study produced 87 modifications, all of which were validated using open source corpora. The top-scoring modification on the validation set was then tested on the Lisa corpus, where it outperformed Okapi BM25 by 10.25% in mean average precision.
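For orientation, the baseline and the metric named in the abstract can be sketched as follows. This is a minimal illustration of the standard Okapi BM25 scoring function and of mean average precision, not the authors' implementation and not any of their 87 modifications. The function names, the parameter defaults `k1=1.2` and `b=0.75`, the smoothed non-negative IDF variant, and the whitespace-tokenized toy input are all assumptions made for this sketch.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    k1 and b are commonly used defaults, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative (an assumption).
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def average_precision(ranked_ids, relevant_ids):
    """Average of precision@rank taken at each rank holding a relevant doc."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """`runs` is a list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

A modification of the kind the abstract describes would, under this sketch, adjust the per-term contribution inside the inner loop (for example by weighting a term by its part of speech or position), and would then be compared with the baseline by computing `mean_average_precision` over the same set of queries.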
Pages: 150-153
Number of pages: 4