FUZZY SEMANTIC MATCHING IN (SEMI-)STRUCTURED XML DOCUMENTS: Indexation of Noisy Documents

Cited by: 0
Authors
Renard, Arnaud [1]
Calabretto, Sylvie [1]
Rumpler, Beatrice [1]
Affiliations
[1] Univ Lyon, CNRS, UMR5205, INSA Lyon, LIRIS, F-69621 Villeurbanne, France
Keywords
Information retrieval; (semi-)Structured documents; XML; Fuzzy semantic matching; Semantic resource; Thesaurus; Ontology; Error correction; OCR; SIMILARITY
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Semantics is currently one of the greatest challenges in the evolution of IR systems, including the (semi-)structured IR systems considered here. Addressing this challenge usually requires an additional external semantic resource related to the document collection. Comparing concepts and, more broadly, working with semantic resources requires semantic similarity measures. These measures assume that the concepts related to the terms have been identified without ambiguity, so misspelled terms interfere with the term-to-concept matching process. Existing semantic-aware (semi-)structured IR systems rely on basic concept identification and do not account for uncertainty in term spelling. We choose to address this last aspect and propose a way to detect and correct misspelled terms through a fuzzy semantic weighting formula that can be integrated into an IR system. To evaluate the expected gains, we have developed a prototype whose first results on small datasets seem interesting.
Pages: 253-260
Number of pages: 8
Related papers
50 in total
  • [1] Towards a Better Semantic Matching for Indexation Improvement of Error-Prone (Semi-)Structured XML Documents
    Renard, Arnaud
    Calabretto, Sylvie
    Rumpler, Beatrice
    [J]. WEB INFORMATION SYSTEMS AND TECHNOLOGIES, 2011, 75 : 286 - 298
  • [2] A Semantic Kernel for semi-structured documents
    Aseervatham, Sujeevan
    Viennet, Emmanuel
    Bennani, Younes
    [J]. ICDM 2007: PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2007, : 403 - 408
  • [3] Semantic annotation of semi-structured documents
    Ranganathan, Girish R.
    Biletskiy, Yevgen
    Kaltchenko, Alexey
    [J]. 2008 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, VOLS 1-4, 2008, : 877 - +
  • [4] Semantic Clustering of XML Documents
    Tagarelli, Andrea
    Greco, Sergio
    [J]. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2010, 28 (01)
  • [5] Semantic Search for XML Documents
    Song Ling
    Lv Qiangi
    Tang Xiaobing
    [J]. MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION, PTS 1 AND 2, 2011, 48-49 : 1028 - +
  • [6] Supporting Semantic Search on Heterogeneous Semi-structured Documents
    Mrabet, Yassine
    Bennacer, Nacera
    Pernelle, Nathalie
    Thiam, Mouhamadou
    [J]. ADVANCED INFORMATION SYSTEMS ENGINEERING, PROCEEDINGS, 2010, 6051 : 224 - +
  • [7] A semantic network approach to semi-structured documents repositories
    Christophides, V
    Dorr, M
    Fundulaki, I
    [J]. RESEARCH AND ADVANCED TECHNOLOGY FOR DIGITAL LIBRARIES, 1997, 1324 : 305 - 324
  • [8] An approach to semantic information retrieval in heterogeneous semi-structured documents
    Mrabet, Yassine
    Bennacer, Nacéra
    Pernelle, Nathalie
    Thiam, Mouhamadou
    [J]. CORIA 2010: Actes de la COnference en Recherche d'Information et Applications - Proceedings of the Conference on Information Retrieval and Applications, 2010, : 195 - 210
  • [9] Transformation rules from semi-structured XML documents to database model
    Badr, Y
    Sayah, M
    Laforest, F
    Flory, A
    [J]. ACS/IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, PROCEEDINGS, 2001, : 181 - 184
  • [10] Fuzzy Query Model for XML Documents
    Seto, Jeany
    Clement, Shane
    Duong, David
    Kianmehr, Keivan
    Alhajj, Reda
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING, PROCEEDINGS, 2009, 5788 : 333 - 340