Indexing Temporal Information for Web Pages

Cited by: 4
Authors
Jin, Peiquan [1]
Chen, Hong [1]
Zhao, Xujian [1]
Li, Xiaowen [1]
Yue, Lihua [1]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230027, Peoples R China
Keywords
Web search; temporal-textual query; temporal information; index structure
DOI
10.2298/CSIS100407025J
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Temporal information plays an important role in Web search: every Web page has a crawl time, and most Web pages contain time keywords in their content. How to integrate temporal information into Web search engines has been a research focus in recent years, and key issues such as temporal-textual indexing and temporal information extraction have to be studied first. In this paper, we first present a framework for a temporal-textual Web search engine. We then concentrate on designing a new hybrid index structure for the temporal and textual information of Web pages. In particular, we propose integrating the B+-tree, the inverted file, and a typical temporal index called the MAP21-tree to handle temporal-textual queries. We study five mechanisms for implementing a hybrid index structure for temporal-textual queries, which differ in how they organize the inverted file, the B+-tree, and the MAP21-tree. After a theoretical analysis of the performance of these five index structures, we conduct experiments on both simulated and real data sets to compare their performance. The experimental results show that, among all the index schemes, the first-inverted-file-then-MAP21-tree index structure has the best query performance and is thus an acceptable choice of temporal-textual index for future time-aware search engines.
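To make the "inverted file first, temporal index second" organization concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each page carries a single integer time interval, and it uses a posting list sorted by interval start as a stand-in for the per-term temporal index; it does not implement a MAP21-tree or a B+-tree. The class name `InvertedThenTemporalIndex` and its methods are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


class InvertedThenTemporalIndex:
    """Toy two-level index: look up the term first, then filter by time.

    Assumption: each term's posting list, sorted by interval start, stands in
    for the per-term temporal index. This is NOT a MAP21-tree.
    """

    def __init__(self):
        # term -> list of (start, end, page_id) tuples
        self._postings = defaultdict(list)
        self._sorted = True

    def add_page(self, page_id, terms, start, end):
        """Register a page with its terms and its time interval [start, end]."""
        for term in set(terms):
            self._postings[term].append((start, end, page_id))
        self._sorted = False

    def _ensure_sorted(self):
        # Sort lazily so that bulk loading stays cheap.
        if not self._sorted:
            for plist in self._postings.values():
                plist.sort()
            self._sorted = True

    def query(self, term, q_start, q_end):
        """Return ids of pages containing `term` whose interval overlaps [q_start, q_end]."""
        self._ensure_sorted()
        plist = self._postings.get(term, [])
        # Entries whose start lies after q_end cannot overlap; skip them via binary search.
        hi = bisect_right(plist, (q_end, float("inf"), ""))
        # Among the remaining entries, keep those that end at or after q_start.
        return sorted(pid for _, end, pid in plist[:hi] if end >= q_start)


if __name__ == "__main__":
    index = InvertedThenTemporalIndex()
    index.add_page("p1", ["olympics", "beijing"], 2008, 2008)
    index.add_page("p2", ["olympics", "london"], 2012, 2012)
    index.add_page("p3", ["olympics"], 2005, 2009)
    print(index.query("olympics", 2007, 2010))
```

The textual lookup narrows the candidate set before any temporal filtering takes place, which is the ordering the abstract reports as performing best; a real system would replace the sorted list with a proper temporal index.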
Pages: 711-737
Number of pages: 27