Querying and clustering web pages about persons and organizations

Authors
Ye, SR [1]
Chua, TS [1]
Kei, JR [1]
Affiliation
[1] Natl Univ Singapore, Sch Comp, Singapore 117543, Singapore
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
One of the most frequent Web surfing tasks is searching for the names of persons and organizations. Such names are often common and not unique, so a single name may map to several entities. This paper describes a method for clustering the Web pages returned by a search engine so that pages belonging to different entities fall into different groups. The algorithm combines named entities with link-based and structure-based information as features, and uses a decision model to partition the document set into direct and indirect pages. It then uses the distinct direct pages as seeds to cluster the document set. The algorithm has been found to be effective for Web-based applications.
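The seed-based clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the decision model, the similarity measure, or any thresholds, so the sketch assumes that the set of direct pages is already known (`direct_ids`), represents each page by a set of named-entity strings only, and uses Jaccard similarity with a hypothetical `merge_threshold` to decide which direct pages are distinct. All function names and the sample data are invented for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_seeds(direct: Dict[str, Set[str]], merge_threshold: float) -> List[str]:
    """Keep only distinct direct pages: skip one that resembles an existing seed."""
    seeds: List[str] = []
    for page_id, feats in direct.items():
        if all(jaccard(feats, direct[s]) < merge_threshold for s in seeds):
            seeds.append(page_id)
    return seeds


def cluster_by_seeds(
    pages: Dict[str, Set[str]],
    direct_ids: Set[str],
    merge_threshold: float = 0.5,  # assumed value, not taken from the paper
) -> Dict[str, List[str]]:
    """Assign every page to the most similar distinct direct page (the seed)."""
    direct = {pid: feats for pid, feats in pages.items() if pid in direct_ids}
    seeds = pick_seeds(direct, merge_threshold)
    if not seeds:
        raise ValueError("at least one direct page is needed to seed a cluster")
    # Each seed starts its own cluster.
    clusters: Dict[str, List[str]] = {s: [s] for s in seeds}
    for page_id, feats in pages.items():
        if page_id in clusters:
            continue
        best = max(seeds, key=lambda s: jaccard(feats, pages[s]))
        clusters[best].append(page_id)
    return clusters


# Invented sample: five result pages for the query "John Smith".
sample_pages = {
    "p1": {"John Smith", "MIT", "Boston"},
    "p2": {"John Smith", "Acme Corp", "Sydney"},
    "p3": {"John Smith", "MIT", "robotics"},
    "p4": {"John Smith", "Sydney", "Acme Corp", "sales"},
    "p5": {"John Smith", "MIT", "Boston", "lab"},
}
# Stand-in for the decision model's output: pages judged to be direct pages.
sample_direct = {"p1", "p2", "p5"}

print(cluster_by_seeds(sample_pages, sample_direct))
```

In this sample, `p5` is a direct page but overlaps heavily with `p1`, so it is not kept as a separate seed and joins the `p1` cluster instead. A fuller version would add the link-based and structure-based features mentioned in the abstract.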
Pages: 344-350
Page count: 7