Querying and clustering web pages about persons and organizations

Citations: 0
Authors
Ye, SR [1]
Chua, TS [1]
Kei, JR [1]
Affiliation
[1] Natl Univ Singapore, Sch Comp, Singapore 117543, Singapore
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the most frequent Web surfing tasks is searching for the names of persons and organizations. Such names are often common and not unique, so a single name may map to several entities. The paper describes a methodology for clustering the Web pages returned by a search engine so that pages belonging to different entities fall into different groups. The algorithm uses a combination of named-entity, link-based, and structure-based information as features, and a decision model partitions the document set into direct and indirect pages. It then uses the distinct direct pages as seeds to cluster the document set into different clusters. The algorithm has been found to be effective for Web-based applications.
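The two-stage idea in the abstract (split pages into direct and indirect, then cluster around distinct direct pages) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Page` fields, the `is_direct` rule, the Jaccard similarity, and both thresholds are hypothetical stand-ins chosen for illustration, since the abstract does not specify the decision model or the similarity measure.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    entities: set                 # named entities found on the page (assumed already extracted)
    name_in_title: bool = False   # hypothetical structure-based cue
    inlinks: int = 0              # hypothetical link-based cue


def jaccard(a, b):
    """Jaccard overlap of two entity sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_direct(page, min_inlinks=2):
    """Stand-in decision rule; the paper's actual decision model is not given here."""
    return page.name_in_title and page.inlinks >= min_inlinks


def cluster_pages(pages, seed_sim=0.5):
    """Group pages around distinct direct pages used as seeds."""
    direct = [p for p in pages if is_direct(p)]
    indirect = [p for p in pages if not is_direct(p)]

    seeds, clusters = [], []
    for p in direct:
        # A direct page that closely matches an existing seed joins that seed's cluster.
        for i, s in enumerate(seeds):
            if jaccard(p.entities, s.entities) >= seed_sim:
                clusters[i].append(p)
                break
        else:
            # Otherwise it is a distinct direct page and starts a new cluster.
            seeds.append(p)
            clusters.append([p])

    if not seeds:
        # No direct pages found: fall back to a single undivided group.
        return [list(pages)] if pages else []

    for p in indirect:
        # Each indirect page goes to the seed it overlaps with most.
        best = max(range(len(seeds)),
                   key=lambda i: jaccard(p.entities, seeds[i].entities))
        clusters[best].append(p)
    return clusters
```

Calling `cluster_pages` on the pages returned for one ambiguous name yields one list of pages per inferred entity. In practice the hand-written `is_direct` rule would be replaced by a trained classifier over the named-entity, link-based, and structure-based features.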
Pages: 344-350
Page count: 7