Indexing Temporal Information for Web Pages

Cited by: 4
Authors
Jin, Peiquan [1]
Chen, Hong [1]
Zhao, Xujian [1]
Li, Xiaowen [1]
Yue, Lihua [1]
Affiliation
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230027, Peoples R China
Keywords
Web search; temporal-textual query; temporal information; index structure
DOI
10.2298/CSIS100407025J
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Temporal information plays an important role in Web search: every Web page has a crawl time, and most pages contain time keywords in their content. How to integrate temporal information into Web search engines has been a research focus in recent years, and key issues such as temporal-textual indexing and temporal information extraction must be studied first. In this paper, we first present a framework for a temporal-textual Web search engine. We then concentrate on designing a new hybrid index structure for the temporal and textual information of Web pages. In particular, we propose integrating the B+-tree, the inverted file, and a typical temporal index called the MAP21-tree to handle temporal-textual queries. We study five mechanisms for implementing such a hybrid index, which differ in how they organize the inverted file, the B+-tree, and the MAP21-tree. After a theoretical analysis of the performance of the five index structures, we compare them experimentally on both simulated and real data sets. The experimental results show that, among all the index schemes, the first-inverted-file-then-MAP21-tree structure has the best query performance and is thus an acceptable choice as the temporal-textual index for future time-aware search engines.
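The "inverted file first, temporal index second" layout named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TemporalTextualIndex`, its methods, and the use of integer time values are all assumptions made for illustration, and a plain sorted list of intervals stands in for the per-keyword MAP21-tree.

```python
from bisect import bisect_right, insort
from collections import defaultdict


class TemporalTextualIndex:
    """Hypothetical two-level index: keyword first, time intervals second."""

    def __init__(self):
        # keyword -> sorted list of (start, end, page_id).
        # The sorted list is a stand-in for a per-keyword temporal tree.
        self._postings = defaultdict(list)

    def add(self, page_id, keywords, start, end):
        """Register a page whose content refers to the interval [start, end]."""
        for kw in set(keywords):
            insort(self._postings[kw], (start, end, page_id))

    def query(self, keyword, q_start, q_end):
        """Return ids of pages containing keyword whose interval overlaps the query."""
        entries = self._postings.get(keyword, [])
        # Step 1: binary search keeps only entries with start <= q_end.
        hi = bisect_right(entries, (q_end, float("inf"), ""))
        # Step 2: linear filter keeps entries with end >= q_start.
        return sorted({pid for _, end, pid in entries[:hi] if end >= q_start})


idx = TemporalTextualIndex()
idx.add("p1", ["olympics", "beijing"], 2008, 2008)
idx.add("p2", ["olympics"], 2010, 2012)
idx.add("p3", ["beijing"], 2005, 2009)
print(idx.query("olympics", 2007, 2009))
```

In this sketch the textual lookup narrows the candidates to one keyword's postings before any temporal comparison is made, which is the ordering the abstract reports as performing best; the filtering step here is linear and would be replaced by a real temporal tree in practice.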
Pages: 711-737
Page count: 27
Related papers
50 in total
  • [1] Automatic Identification of Temporal Information in Tourism Web Pages
    Weiser, Stephanie
    Laublet, Philippe
    Minel, Jean-Luc
    [J]. SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 127 - 131
  • [2] A method for indexing web pages using web bots
    Szymanski, BK
    Chung, MS
    [J]. 2001 INTERNATIONAL CONFERENCES ON INFO-TECH AND INFO-NET PROCEEDINGS, CONFERENCE A-G: INFO-TECH & INFO-NET: A KEY TO BETTER LIFE, 2001, : C1 - C6
  • [3] Indexing by Permeability in Block Structured Web Pages
    Bruno, Emmanuel
    Faessel, Nicolas
    Glotin, Herve
    Le Maitre, Jacques
    Scholl, Michel
    [J]. DOCENG'09: PROCEEDINGS OF THE 2009 ACM SYMPOSIUM ON DOCUMENT ENGINEERING, 2009, : 70 - 73
  • [4] BT+-tree: A New Index for Temporal Information in Web Pages
    Chen, Hong
    Li, Qiang
    Jin, Peiquan
    [J]. DATABASE THEORY AND APPLICATION, BIO-SCIENCE AND BIO-TECHNOLOGY, 2010, 118 : 68 - 78
  • [5] Indexing and querying segmented web pages: the BlockWeb Model
    Bruno, Emmanuel
    Faessel, Nicolas
    Glotin, Herve
    Le Maitre, Jacques
    Scholl, Michel
    [J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2011, 14 (5-6): : 623 - 649
  • [6] Indexing and querying segmented web pages: the BlockWeb Model
    Emmanuel Bruno
    Nicolas Faessel
    Hervé Glotin
    Jacques Le Maitre
    Michel Scholl
    [J]. World Wide Web, 2011, 14 : 623 - 649
  • [7] ENHANCING TOPIC TRACKING FOR CHINESE NEWS WEB PAGES WITH TEMPORAL INFORMATION AND KEY WEB CONTEXTS
    Qiu, Jing
    Liao, Lejian
    Li, Peng
    [J]. INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2010, 6 (01): : 399 - 408
  • [8] Indexing multilingual information on the web
    Yip, CL
    Kao, B
    [J]. TWENTY-SECOND ANNUAL INTERNATIONAL COMPUTER SOFTWARE & APPLICATIONS CONFERENCE - PROCEEDINGS, 1998, : 576 - 581
  • [9] TEMPORAL INFORMATION INDEXING MODEL
    Abramowicz, Witold
    Bassara, Andrzej
    [J]. ICEIS 2008: PROCEEDINGS OF THE TENTH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL AIDSS: ARTIFICIAL INTELLIGENCE AND DECISION SUPPORT SYSTEMS, 2008, : 387 - 390
  • [10] Structural and Semantic Indexing for Supporting Creation of Multilingual Web Pages
    Urae, Hiroshi
    Tezuka, Taro
    Kimura, Fuminori
    Maeda, Akira
    [J]. INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, IMECS 2012, VOL I, 2012, : 662 - 667