Extracting Topic Maps from Web Pages

Cited by: 0
Authors
Mase, Motohiro [1]
Yamada, Seiji [2]
Nitta, Katsumi [1]
Affiliations
[1] Tokyo Inst Technol, Tokyo, Japan
[2] Natl Inst Informat, Tokyo, Japan
Keywords
Web information extraction; Topic Maps; clustering
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose a framework for extracting topic maps from a set of Web pages. We cluster the Web pages and extract topic map prototypes from the result. We introduce two extensions to the existing clustering method: relevance is based on the types of links with respect to the directory structure of Web sites, and on the distance between the directories in which the pages are located. We generate the topic map prototypes from the clustering results. Finally, users complete a prototype by labeling the topics and associations and removing unnecessary items. As a first step, in this paper we implemented the proposed clustering method and used it to extract a prototype.
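The abstract names two relevance signals, the type of a link relative to the directory structure and the distance between the directories of two pages, but does not define either. The following is only a minimal sketch of one plausible reading, not the authors' method: the function names, the four link-type labels, and the edge-counting definition of distance are all assumptions made for illustration.

```python
from posixpath import dirname
from urllib.parse import urlparse


def directory_parts(url):
    """Return the directory components of a URL path, with the file name dropped."""
    path = urlparse(url).path
    # A path ending in "/" already names a directory; otherwise drop the last segment.
    directory = path if path.endswith("/") else dirname(path)
    return [part for part in directory.split("/") if part]


def directory_distance(url_a, url_b):
    """Assumed definition: number of tree edges between the two pages' directories."""
    a, b = directory_parts(url_a), directory_parts(url_b)
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def link_type(source_url, target_url):
    """Assumed labels: classify a link by where the target directory sits."""
    a, b = directory_parts(source_url), directory_parts(target_url)
    if a == b:
        return "same"
    if b == a[:len(b)]:
        return "upward"      # target directory is an ancestor of the source
    if a == b[:len(a)]:
        return "downward"    # target directory is a descendant of the source
    return "crosswise"       # the directories sit on different branches


if __name__ == "__main__":
    src = "http://example.org/a/b/page.html"
    dst = "http://example.org/a/c/d/index.html"
    print(directory_distance(src, dst), link_type(src, dst))
```

A clustering step could then weight the relevance between two pages by the link type and discount it as the directory distance grows; how the original method combines the two signals is not stated in the abstract.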
Pages: 169 / +
Number of pages: 3