Extracting Topic Maps from Web Pages

被引:0
|
作者
Mase, Motohiro [1 ]
Yamada, Seiji [2 ]
Nitta, Katsumi [1 ]
机构
[1] Tokyo Inst Technol, Tokyo, Japan
[2] Natl Inst Informat, Tokyo, Japan
来源
关键词
Web information extraction; Topic Maps; clustering;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We propose a framework to extract topic maps from a set of Web pages. We use the clustering method with the Web pages and extract the topic map prototypes. We introduced the follwing two points to the existing clustering method: The relevance is based on the types of links with directories in Web sites structure and the distance between the directories in which the pages are located. We generate the topic map prototypes from the results of the clustering. Finally, users complete the prototype by labeing the topics and associations and removing the unnecessary items. For this paper, at the first step, we mounted the proposed clustering method and extracted the prototype with the method.
引用
收藏
页码:169 / +
页数:3
相关论文
共 50 条
  • [1] Extracting Topic Maps from Web Pages by Web Link Structure and Content
    Mase, Motohiro
    Yamada, Seiji
    Nitta, Katsumi
    [J]. 2008 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-8, 2008, : 1232 - +
  • [2] Extracting Topic Maps from Web histories by clustering with Web structure and contents
    Mase, Motohiro
    Yamada, Seiji
    [J]. 2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WORKSHOPS PROCEEDINGS, 2006, : 405 - +
  • [3] Semiautomatic extraction of topic maps from Web pages using clustering with web contents and structure
    Mase, Motohiro
    Yamada, Seiji
    Nitta, Katsumi
    [J]. PROCEEDING OF THE 2007 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WORKSHOPS, 2007, : 208 - +
  • [4] Extracting Templates from Web pages
    Manjula, R.
    Chilambuchelvan, A.
    [J]. 2013 INTERNATIONAL CONFERENCE ON GREEN COMPUTING, COMMUNICATION AND CONSERVATION OF ENERGY (ICGCE), 2013, : 788 - 791
  • [5] Adaptively extracting structured data from Web pages
    Guo, Yingnan
    Zhang, Jiajun
    Chen, Xing
    [J]. 2019 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2019), 2019, : 1524 - 1525
  • [6] Extracting Academic Information from Conference Web Pages
    Wang, Peng
    You, Yue
    Xu, Baowen
    Zhao, Jianyu
    [J]. 2011 23RD IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2011), 2011, : 952 - 959
  • [7] Finding and Extracting Data Records from Web Pages
    Alvarez, Manuel
    Pan, Alberto
    Raposo, Juan
    Bellas, Fernando
    Cacheda, Fidel
    [J]. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2010, 59 (01): : 123 - 137
  • [8] Finding and Extracting Data Records from Web Pages
    Manuel Álvarez
    Alberto Pan
    Juan Raposo
    Fernando Bellas
    Fidel Cacheda
    [J]. Journal of Signal Processing Systems, 2010, 59 : 123 - 137
  • [9] Extracting structured data from web pages (poster)
    Arasu, A
    Garcia-Molina, H
    [J]. 19TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2003, : 698 - 698
  • [10] Finding and extracting data records from web pages
    Alvarez, Manuel
    Pan, Alberto
    Raposo, Juan
    Bellas, Fernando
    Cacheda, Fidel
    [J]. EMBEDDED AND UBIQUITOUS COMPUTING, PROCEEDINGS, 2007, 4808 : 466 - 478