Using Wikipedia as a reference for extracting semantic information from a text

Cited by: 1
Authors
Prato, Andrea [1]
Ronchetti, Marco [1]
Affiliations
[1] Univ Trent, Dipartimento Ingn & Sci Informaz, Povo, Italy
Keywords
Semantic analysis; clustering; multi-words; Wikipedia
DOI
10.1109/SEMAPRO.2009.24
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
In this paper we present an algorithm that, using Wikipedia as a reference, extracts semantic information from an arbitrary text. Our algorithm refines a procedure proposed by others, which mines the full text of Wikipedia. Our refinement, based on a clustering approach, exploits the semantic information contained in certain types of Wikipedia hyperlinks, and also introduces an analysis based on multi-words. Our algorithm outperforms current methods in that its output contains many fewer false positives. We were also able to identify which structural part of a text provides most of the semantic information extracted by the algorithm.
Pages: 56-61
Page count: 6