FUZZY SEMANTIC MATCHING IN (SEMI-)STRUCTURED XML DOCUMENTS: Indexation of Noisy Documents

Times cited: 0
Authors
Renard, Arnaud [1]
Calabretto, Sylvie [1]
Rumpler, Beatrice [1]
Affiliation
[1] Univ Lyon, CNRS, UMR5205, INSA Lyon, LIRIS, F-69621 Villeurbanne, France
Keywords
Information retrieval; (semi-)Structured documents; XML; Fuzzy semantic matching; Semantic resource; Thesaurus; Ontology; Error correction; OCR; SIMILARITY;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Nowadays, semantics is one of the greatest challenges in the evolution of IR systems, including the (semi-)structured IR systems considered here. This challenge usually requires an additional external semantic resource related to the document collection. To compare concepts, and more broadly to work with semantic resources, semantic similarity measures are needed. Such measures assume that the concepts related to the terms have been identified without ambiguity, so misspelled terms interfere with the term-to-concept matching process. Existing semantic-aware (semi-)structured IR systems rely on basic concept identification but do not account for uncertainty in term spelling. We choose to address this last aspect and propose a way to detect and correct misspelled terms through a fuzzy semantic weighting formula that can be integrated into an IR system. To evaluate the expected gains, we have developed a prototype whose first results on small datasets seem promising.
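The abstract does not give the authors' fuzzy semantic weighting formula, so the Python sketch below is only a hypothetical illustration of the general idea: term-to-concept matching that tolerates misspellings. It assumes a thesaurus represented as a dict from concept identifiers to labels, and it swaps in normalized Levenshtein similarity as the fuzzy membership degree; a term then contributes to a concept's weight in proportion to how closely it matches. All names (`match_concept`, `fuzzy_concept_weights`) and the 0.8 threshold are illustrative assumptions, not the paper's method.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_similarity(term: str, label: str) -> float:
    """Membership degree in [0, 1]: 1.0 for an exact match, lower as edits grow."""
    if not term and not label:
        return 1.0
    return 1.0 - levenshtein(term, label) / max(len(term), len(label))


def match_concept(term, thesaurus, threshold=0.8):
    """Return (concept_id, degree) for the closest label, or None below threshold.

    Hypothetical helper: `thesaurus` maps concept ids to lists of labels.
    """
    best = None
    for concept_id, labels in thesaurus.items():
        for label in labels:
            degree = fuzzy_similarity(term.lower(), label.lower())
            if best is None or degree > best[1]:
                best = (concept_id, degree)
    if best is not None and best[1] >= threshold:
        return best
    return None


def fuzzy_concept_weights(tokens, thesaurus, threshold=0.8):
    """Aggregate term frequencies onto concepts, scaled by the match degree."""
    weights = Counter()
    for term, tf in Counter(tokens).items():
        match = match_concept(term, thesaurus, threshold)
        if match is not None:
            concept_id, degree = match
            weights[concept_id] += tf * degree
    return dict(weights)


if __name__ == "__main__":
    thesaurus = {
        "C1": ["information", "informations"],
        "C2": ["retrieval"],
        "C3": ["document"],
    }
    tokens = ["informtion", "retrieval", "retrieval", "documet", "xyz"]
    print(fuzzy_concept_weights(tokens, thesaurus))
```

In this toy run, the misspelled `informtion` and `documet` are each one edit away from a thesaurus label, so they still map to C1 (weight about 0.91) and C3 (0.875), while `xyz` matches nothing above the threshold and is ignored. The threshold trades recall of misspelled terms against the risk of attaching a term to the wrong concept.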
Pages: 253-260
Number of pages: 8
Related papers (50 in total)
  • [21] On efficient matching of streaming XML documents and queries
    Lakshmanan, LVS
    Parthasarathy, S
    [J]. ADVANCES IN DATABASE TECHNOLOGY - EDBT 2002, 2002, 2287 : 142 - 160
  • [22] Online Dictionary Matching for Streams of XML Documents
    Silvasti, Panu
    Sippu, Seppo
    Soisalon-Soininen, Eljas
    [J]. THEORETICAL COMPUTER SCIENCE, 2010, 323 : 153 - +
  • [23] A Deep and Uniform Model for Semantic Annotation of Semi Structured Documents Based on SHIRI
    Thiam, Mouhamadou
    [J]. 2016 4TH INTERNATIONAL CONFERENCE ON CONTROL ENGINEERING & INFORMATION TECHNOLOGY (CEIT), 2016,
  • [24] Evaluating fuzzy association rules on XML documents
    Combi, C
    Oliboni, B
    Rossato, R
    [J]. Computational Intelligence, Theory and Applications, 2005, : 435 - 448
  • [25] Consistencies of fuzzy spatiotemporal data in XML documents
    Ma, Zongmin
    Bai, Luyi
    Ishikawa, Yoshiharu
    Yan, Li
    [J]. FUZZY SETS AND SYSTEMS, 2018, 343 : 97 - 125
  • [26] Fast updatable indexing scheme for structured XML documents
    Kim, SW
    Lee, J
    Lim, HC
    [J]. WEB AND COMMUNICATION TECHNOLOGIES AND INTERNET-RELATED SOCIAL ISSUES - HSI 2003, 2003, 2713 : 207 - 217
  • [27] Semantic Structural Similarity Measure for Clustering XML Documents
    Song, Ling
    Ma, Jun
    Lei, Jingsheng
    Zhang, Dongmei
    Wang, Zhen
    [J]. WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS, 2009, 5854 : 232 - +
  • [28] A system for converting PDF documents into structured XML format
    Déjean, H
    Meunier, JL
    [J]. DOCUMENT ANALYSIS SYSTEMS VII, PROCEEDINGS, 2006, 3872 : 129 - 140
  • [29] Multi-Viewpoints Semantic Annotation of XML Documents
    Ouahiba, Djama
    Zizette, Boufaida
    [J]. WORLD CONGRESS ON ENGINEERING - WCE 2013, VOL I, 2013, : 390 - +
  • [30] Clustering Algorithm Based on Semantic Distance for XML Documents
    Yang, Lingxian
    Gu, Jinguang
    Chen, Heping
    [J]. FIRST INTERNATIONAL WORKSHOP ON DATABASE TECHNOLOGY AND APPLICATIONS, PROCEEDINGS, 2009, : 549 - +