A layered approach for investigating the topological structure of communities in the Web

Cited by: 12
Authors
Thelwall, M [1]
Affiliations
[1] Wolverhampton Univ, Sch Comp & Informat Technol, Wolverhampton WV1 1DJ, W Midlands, England
Keywords
Internet; Web site classification; modelling; United Kingdom; information retrieval
DOI
10.1108/00220410310485703
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
A layered approach for identifying communities in the Web is presented and explored by applying the Flake exact community identification algorithm to the UK academic Web. Although community or topic identification is a common task in information retrieval, a new perspective is developed by: the application of alternative document models, shifting the focus from individual pages to aggregated collections based upon Web directories, domains and entire sites; the removal of internal site links; and the adaptation of a new fast algorithm to allow fully automated community identification using all possible single starting points. The overall topology of the graphs in the three least-aggregated layers was first investigated and found to include a large number of isolated points but, surprisingly, with most of the remainder being in one huge connected component, the exact proportions varying by layer. The community identification process then found that the number of communities far exceeded the number of topological components, indicating that community identification is a potentially useful technique, even with random starting points. Both the number and size of communities identified were dependent on the parameter of the algorithm, with very different results being obtained in each case. In conclusion, the UK academic Web is embedded with layers of non-trivial communities and, if it is not unique in this, then there is the promise of improved results for information retrieval algorithms that can exploit this additional structure, and the application of the technique directly to partially automate Web metrics tasks such as that of finding all pages related to a given subject hosted by a single country's universities.
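The layering idea in the abstract (aggregate pages into directories, domains or sites, drop internal site links, then inspect the topology) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration, not the paper's implementation: the function names, the URL heuristics (for example, treating the last three host labels as the "site"), and the sample links are all hypothetical. The sketch only counts connected components; it does not implement the Flake community identification algorithm.

```python
from urllib.parse import urlsplit


def aggregate(url, level):
    """Map a page URL to its node at the chosen aggregation level (hypothetical heuristics)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if level == "page":
        return host + parts.path
    if level == "directory":
        # Keep the path up to and including its last slash.
        return host + parts.path.rsplit("/", 1)[0] + "/"
    if level == "domain":
        return host
    if level == "site":
        # Assumption: a site is the last three host labels, e.g. wlv.ac.uk.
        return ".".join(host.split(".")[-3:])
    raise ValueError(f"unknown level: {level}")


def build_layer(links, level):
    """Build an undirected graph for one layer, ignoring links inside a single site."""
    graph = {}
    for src, dst in links:
        a, b = aggregate(src, level), aggregate(dst, level)
        # Register both endpoints so that isolated nodes are still counted.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if aggregate(src, "site") == aggregate(dst, "site"):
            continue  # internal site link: removed
        graph[a].add(b)
        graph[b].add(a)
    return graph


def components(graph):
    """Return the connected components of the graph as a list of node sets."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(comp)
    return result


# Hypothetical sample of page-to-page links.
links = [
    ("http://www.scit.wlv.ac.uk/a/p1.html", "http://www.wlv.ac.uk/lib/index.html"),
    ("http://www.scit.wlv.ac.uk/a/p1.html", "http://www.ox.ac.uk/x/y.html"),
    ("http://www.scit.wlv.ac.uk/a/p2.html", "http://www.ox.ac.uk/x/z.html"),
    ("http://www.cam.ac.uk/m/n.html", "http://www.cam.ac.uk/m/o.html"),
]

for level in ("page", "directory", "domain"):
    comps = components(build_layer(links, level))
    isolated = sum(1 for c in comps if len(c) == 1)
    print(level, len(comps), "components,", isolated, "isolated")
```

Comparing the component counts across layers in this way mirrors the topological step described in the abstract; a community identification pass would then be run on each layer's graph.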
Pages: 410-429
Page count: 20