BAYESIAN RETRIEVAL USING A SIMILARITY-BASED LEMMATIZER

Authors
Maragoudakis, Manolis [1]
Lyras, Dimitrios P. [2]
Sgarbas, Kyriakos [2]
Affiliations
[1] Univ Aegean, Dept Informat & Commun Syst Engn, Samos, Greece
[2] Univ Patras, Dept Elect & Comp Engn, Wire Commun Lab, Artificial Intelligence Grp, GR-26500 Patras, Greece
Keywords
Bayesian networks; modern Greek; AhR; Ad-hoc retrieval; lemmatization; AUTOMATIC LEMMATIZATION; INFORMATION-RETRIEVAL; MODERN GREEK; MODEL;
DOI
10.1142/S0218213012500248
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes a Bayesian network approach to Information Retrieval (IR) from Web documents. The network structure provides an intuitive representation of uncertainty relationships, and the embedded conditional probability table is used by inference algorithms to identify documents that are relevant to the user's needs, expressed in the form of Boolean queries. Our research has been directed toward constructing a probabilistic IR framework that focuses on assisting users in performing Ad-hoc retrieval of documents from various domains such as economics, news, and sports. Furthermore, users can provide feedback on the relevance of the retrieved documents to improve performance on subsequent requests. Towards these goals, we have expanded the traditional Bayesian network IR system and tested it on several Greek web corpora from different application domains. We have developed two approaches with regard to the structure: a simple one, where the structure is provided manually, and an automated one, where data mining is used to extract the network's structure. Results show competitive performance, in terms of precision-recall curves, against successful IR models of different theoretical backgrounds, such as the vector space model with tf-idf weighting and the probabilistic BM25 model. To further improve the performance of the IR system, we have implemented a novel similarity-based lemmatization framework, thus reducing the ambiguity posed by the plethora of morphological variations of the languages in question. The lemmatization framework comprises three core components (the word segregation, data cleansing, and lemmatization modules) and is language-independent (i.e. it can be applied to other languages with morphological peculiarities and thus improve Ad-hoc retrieval), since it maps an input word to its normalized form by employing two state-of-the-art language-independent distance metrics, namely the Levenshtein edit distance and the Dice coefficient similarity measure, combined with a language model describing the most frequent inflectional suffixes of the examined language. Experimental results support our claim about the significance of this incorporation for web retrieval of Greek texts, as results improve by 4% to 11%.
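The lemmatization idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the equal weighting of the two measures, the character-bigram form of the Dice coefficient, the function names (`levenshtein`, `dice`, `strip_suffix`, `lemmatize`), and the toy English lexicon and suffix list are all assumptions made for illustration. The record does not state how the paper combines the metrics with the suffix model.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def dice(a, b):
    """Dice coefficient over sets of character bigrams (an assumed variant):
    2 * |X & Y| / (|X| + |Y|)."""
    x = {a[i:i + 2] for i in range(len(a) - 1)}
    y = {b[i:i + 2] for i in range(len(b) - 1)}
    if not x and not y:
        return 1.0 if a == b else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def strip_suffix(word, suffixes):
    """Remove the longest listed suffix, keeping at least two stem characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[:-len(suffix)]
    return word


def lemmatize(word, lexicon, suffixes):
    """Return the lexicon entry whose stem is most similar to the word's stem.
    The score averages normalized edit similarity and Dice similarity;
    the equal weights are an assumption, not taken from the paper."""
    stem = strip_suffix(word, suffixes)

    def score(lemma):
        lemma_stem = strip_suffix(lemma, suffixes)
        longest = max(len(stem), len(lemma_stem), 1)
        edit_similarity = 1 - levenshtein(stem, lemma_stem) / longest
        return (edit_similarity + dice(stem, lemma_stem)) / 2

    return max(lexicon, key=score)


# Hypothetical toy data; a real system would use a lexicon and a
# frequent-suffix model for the target language (e.g. modern Greek).
LEXICON = ["retrieve", "retrieval", "document", "network"]
SUFFIXES = ["s", "es", "ed", "ing"]

print(lemmatize("retrieved", LEXICON, SUFFIXES))
print(lemmatize("documents", LEXICON, SUFFIXES))
```

Stripping frequent suffixes before comparison keeps inflectional endings from dominating the distance, which is one plausible reading of how a suffix model could complement the two similarity measures.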
Pages: 32