Structural analysis and grouping of Web pages

Authors
Kojima, Shuichi [1]
Takasu, Atsuhiro [2]
Adachi, Jun [2]
Affiliations
[1] Graduate School of Engineering, University of Tokyo
[2] National Institute of Informatics
Source
NII Journal | 2002 / Issue 04
Keywords
Decision support systems - Groupware - Hierarchical systems - Information analysis - Information retrieval - Semantics - Structural analysis - Text processing - Virtual reality - Web browsers
DOI
Not available
Abstract
To cope more easily with information scattered across the Web, we propose a method for grouping the Web pages of a site. The method decomposes the overall structure of a site by treating semantically related pages as one virtual document. In this paper, we describe a grouping method based on the link structure between pages; it does not use the inter-document similarity on which traditional text categorization relies. We model a set of Web pages as a Web graph and extract its strongly connected components as groups. Because groups form a hierarchical structure, we then divide each strongly connected component to extract that hierarchy.
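The first step named in the abstract, extracting strongly connected components from a Web graph, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which algorithm the authors use, so the sketch swaps in the standard two-pass (Kosaraju-style) approach, and the function name `strongly_connected_components`, the adjacency-dictionary input, and the sample `site` URLs are all hypothetical. The later step of dividing a component to recover the hierarchy is not shown, because the abstract gives no detail on it.

```python
from collections import defaultdict


def strongly_connected_components(links):
    """Return the SCCs of a directed graph given as {page: [linked pages]}.

    Hypothetical helper: a standard two-pass (Kosaraju-style) algorithm,
    not necessarily the procedure used in the paper.
    """
    # Collect every page, including ones that only appear as link targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)

    # Build the reversed graph needed for the second pass.
    reverse = defaultdict(list)
    for src, targets in links.items():
        for dst in targets:
            reverse[dst].append(src)

    # Pass 1: iterative DFS on the original graph, recording finish order.
    visited = set()
    order = []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(links.get(start, ())))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(links.get(nxt, ()))))
                    break
            else:
                # All outgoing links explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph in decreasing finish order.
    assigned = set()
    components = []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component = []
        stack = [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(sorted(component))
    return components


# Made-up site: pages that link to each other in a cycle form one group.
site = {
    "/index.html": ["/products/", "/about.html"],
    "/products/": ["/products/a.html", "/products/b.html", "/index.html"],
    "/products/a.html": ["/products/b.html"],
    "/products/b.html": ["/products/a.html"],
    "/about.html": [],
}
groups = strongly_connected_components(site)
```

On this sample graph, `/index.html` and `/products/` end up in one group because each can reach the other, the two product pages form a second group, and `/about.html` is a group of its own since nothing links back from it.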
Pages: 23-35