Automatic term categorization by extracting knowledge from the Web

Cited by: 0
Authors
Rigutini, Leonardo [1]
Di Iorio, Ernesto [1]
Ernandes, Marco [1]
Maggini, Marco [1]
Affiliations
[1] Univ Siena, Dipartimento Ingn Informaz, Via Roma 56, I-53100 Siena, Italy
Source
ECAI 2006, PROCEEDINGS | 2006 / Vol. 141
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses the problem of categorizing terms or lexical entities into a predefined set of semantic domains by exploiting the knowledge available online on the Web. The proposed system can be used effectively for the automatic expansion of thesauri, limiting the human effort to the preparation of a small training set of tagged entities. Terms are classified by modeling the contexts in which terms from the same class usually appear. The Web is exploited as a significant repository of contexts, which are extracted by querying one or more search engines. In particular, it is shown that the required knowledge can be obtained directly from the snippets returned by the search engines, without the overhead of downloading documents. Since the Web is continuously updated worldwide, this approach makes it possible to tackle open-domain term categorization while handling both the geographical and temporal variability of term semantics. The performance attained by different text classifiers is compared, showing that accuracy is very good regardless of the specific model, which validates the idea of using term contexts extracted from search engine snippets. Moreover, the experimental results indicate that only very few training examples are needed to reach the best performance (over 90% for the F1 measure).
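The idea in the abstract, classifying a term from the contexts found in search-engine snippets, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the hard-coded `SNIPPETS` dictionary stands in for real search-engine results, the two classes and the helper names (`context_bag`, `train`, `classify`) are hypothetical, and multinomial Naive Bayes is just one plausible text classifier, not necessarily one of those the authors compared.

```python
import math
from collections import Counter, defaultdict

# Hypothetical snippets standing in for search-engine results; a real system
# would obtain these by querying a search engine for each term.
SNIPPETS = {
    "violin": ["the violin is a string instrument played in an orchestra",
               "learn to play violin music with a bow"],
    "guitar": ["electric guitar music and chords for a band",
               "play guitar strings in a rock band"],
    "tennis": ["tennis match won by the player in three sets",
               "the tennis team trains on the court"],
    "soccer": ["soccer match between two teams ended with a goal",
               "the soccer player scored for the team"],
    "piano": ["piano music played by the orchestra soloist",
              "learn to play piano chords"],
}
# Small tagged training set (assumed labels, for illustration only).
TRAIN_LABELS = {"violin": "music", "guitar": "music",
                "tennis": "sport", "soccer": "sport"}


def context_bag(term):
    """Bag of words over all snippets of a term, excluding the term itself."""
    words = " ".join(SNIPPETS[term]).lower().split()
    return Counter(w for w in words if w != term)


def train(labels):
    """Accumulate per-class context word counts and class frequencies."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for term, label in labels.items():
        word_counts[label].update(context_bag(term))
        class_counts[label] += 1
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def classify(term, model):
    """Return the class with the highest multinomial Naive Bayes log score."""
    word_counts, class_counts, vocab = model
    bag = context_bag(term)
    total_terms = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_terms)
        for word, n in bag.items():
            # Laplace smoothing so unseen words do not zero out the score.
            score += n * math.log(
                (word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN_LABELS)
predicted = classify("piano", model)
print(predicted)
```

With this toy data the unlabeled term "piano" shares most of its context words with the "music" training terms. Removing the term itself from its own context bag keeps the classifier focused on the surrounding words rather than on the term string.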
Pages: 531 / +
Page count: 2