Enhancing search engine quality using concept-based text retrieval

Cited by: 8
Authors
Shehata, Shady [1]
Karray, Fakhri [1]
Kamel, Mohamed [1]
Affiliation
[1] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
DOI
10.1109/WI.2007.132
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Most common text retrieval techniques are based on statistical analysis of a term, either as a word or as a phrase. Statistical analysis of term frequency captures the importance of a term within a document only. To achieve a more accurate analysis, the underlying representation should identify the terms that capture the semantics of the text. Such a representation can capture the terms that express the concepts of a sentence, which in turn helps reveal the topic of the document. A new concept-based representation, called the Conceptual Ontological Graph (COG), is introduced, in which a concept can be either a word or a phrase and depends entirely on the sentence semantics. The aim of the proposed representation is to extract the most important terms in a sentence and in a document with respect to the meaning of the text. The COG representation analyzes each term at both the sentence and the document levels, unlike the classical approach, which analyzes terms at the document level only. First, the representation identifies the terms that contribute to the sentence semantics. Then, each term is selected based on its position within the COG representation. Lastly, the selected terms are associated with their documents as features for indexing before text retrieval. The COG representation can effectively discriminate between terms that are unimportant to the sentence semantics and terms that hold the key concepts representing the sentence meaning. Extensive experiments using the proposed COG representation are conducted on different text retrieval datasets. The results demonstrate a substantial improvement in text retrieval quality with the COG representation over traditional techniques. The evaluation relies on two quality measures, bpref and P(10), both of which improve when the COG representation is used.
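The two quality measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say which bpref variant the authors used, so the code below assumes the formulation used by the trec_eval tool, in which unjudged documents are skipped and the penalty is normalized by min(R, N), where R and N are the numbers of judged relevant and judged non-relevant documents. The function names `precision_at_k` and `bpref` and the toy document IDs are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def precision_at_k(ranking: List[str], judgments: Dict[str, bool], k: int = 10) -> float:
    """P(k): fraction of the top-k ranked documents that are judged relevant.

    Unjudged documents count as non-relevant, and the divisor is always k.
    """
    hits = sum(1 for doc in ranking[:k] if judgments.get(doc, False))
    return hits / k


def bpref(ranking: List[str], judgments: Dict[str, bool]) -> float:
    """bpref, assuming the trec_eval-style normalization by min(R, N).

    judgments maps a document ID to True (judged relevant) or False
    (judged non-relevant); documents missing from the map are unjudged
    and are ignored entirely.
    """
    num_rel = sum(1 for rel in judgments.values() if rel)
    num_nonrel = sum(1 for rel in judgments.values() if not rel)
    if num_rel == 0:
        return 0.0

    bound = min(num_rel, num_nonrel)
    nonrel_seen = 0  # judged non-relevant documents ranked so far
    total = 0.0
    for doc in ranking:
        if doc not in judgments:
            continue  # unjudged: no effect on the score
        if judgments[doc]:
            if nonrel_seen == 0:
                total += 1.0
            else:
                # Penalize by the share of non-relevant documents ranked above.
                total += 1.0 - min(nonrel_seen, num_rel) / bound
        else:
            nonrel_seen += 1
    return total / num_rel


# Toy example: d3 is unjudged and therefore skipped by bpref.
ranking = ["d1", "d2", "d3", "d4", "d5"]
judgments = {"d1": True, "d2": False, "d4": True, "d5": False}
print(precision_at_k(ranking, judgments, k=5))
print(bpref(ranking, judgments))
```

P(10) rewards relevant documents in the first ten positions only. bpref looks at how often judged non-relevant documents are ranked above judged relevant ones, which makes it less sensitive to incomplete relevance judgments.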
Pages: 26-32
Page count: 7
Related papers
50 in total
  • [1] An efficient concept-based retrieval model for enhancing text retrieval quality
    Shehata, Shady
    Karray, Fakhri
    Kamel, Mohamed S.
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2013, 35 (02) : 411 - 434
  • [2] An efficient concept-based retrieval model for enhancing text retrieval quality
    Shady Shehata
    Fakhri Karray
    Mohamed S. Kamel
    [J]. Knowledge and Information Systems, 2013, 35 : 411 - 434
  • [3] Essie: A concept-based search engine for structured biomedical text
    Ide, Nicholas C.
    Loane, Russell F.
    Demner-Fushman, Dina
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2007, 14 (03) : 253 - 263
  • [4] Enhancing text clustering using concept-based mining model
    Shehata, Shady
    Karray, Fakhri
    Kamel, Mohamed
    [J]. ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2006, : 1043 - +
  • [5] A Concept-based Model for Enhancing Text Categorization
    Shehata, Shady
    Karray, Fakhri
    Kamel, Mohamed
    [J]. KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 629 - 637
  • [6] Enhancing concept-based retrieval based on minimal term sets
    Alsaffar, AH
    Deogun, JS
    Raghavan, VV
    Sever, H
    [J]. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2000, 14 (2-3) : 155 - 173
  • [7] Enhancing Concept-Based Retrieval Based on Minimal Term Sets
    A.H. Alsaffar
    J.S. Deogun
    V.V. Raghavan
    H. Sever
    [J]. Journal of Intelligent Information Systems, 2000, 14 : 155 - 173
  • [8] Using EuroWordNet in a concept-based approach to cross-language text retrieval
    Gonzalo, J
    Verdejo, F
    Chugur, I
    [J]. APPLIED ARTIFICIAL INTELLIGENCE, 1999, 13 (07) : 647 - 678
  • [10] Concept-based Web communities for Google™ search engine
    Tomiyama, T
    Ohgaya, R
    Shinmura, A
    Kawabata, T
    Takagi, T
    Nikravesh, M
    [J]. PROCEEDINGS OF THE 12TH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1 AND 2, 2003, : 1122 - 1128