Indexing Temporal Information for Web Pages

Cited by: 4
Authors
Jin, Peiquan [1]
Chen, Hong [1]
Zhao, Xujian [1]
Li, Xiaowen [1]
Yue, Lihua [1]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230027, Peoples R China
Keywords
Web search; temporal-textual query; temporal information; index structure
DOI
10.2298/CSIS100407025J
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Temporal information plays an important role in Web search: every Web page has a crawl time, and most Web pages contain time keywords in their content. Integrating temporal information into Web search engines has been a research focus in recent years, and key issues such as temporal-textual indexing and temporal information extraction must be studied first. In this paper, we first present a framework for a temporal-textual Web search engine. We then concentrate on designing a new hybrid index structure for the temporal and textual information of Web pages. In particular, we propose integrating the B+-tree, the inverted file, and a typical temporal index called the MAP21-tree to handle temporal-textual queries. We study five mechanisms for implementing a hybrid index structure for temporal-textual queries, which differ in how they organize the inverted file, the B+-tree, and the MAP21-tree. After a theoretical analysis of the performance of these five index structures, we conduct experiments on both simulated and real data sets to compare their performance. The experimental results show that, among all the index schemes, the first-inverted-file-then-MAP21-tree index structure has the best query performance and is thus an acceptable choice as the temporal-textual index for future time-aware search engines.
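The "inverted file first, then a temporal index" organization named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `TemporalTextualIndex`, its methods, and the integer timestamps are all hypothetical, and a per-term list sorted by interval start is used as a simple stand-in for the MAP21-tree.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class TemporalTextualIndex:
    """Hypothetical two-level index: term lookup first, time filtering second.

    Each term in the inverted file owns its own time-ordered posting list.
    The sorted list is only a stand-in for a real temporal index such as
    the MAP21-tree; timestamps are assumed to be integers.
    """

    def __init__(self):
        # term -> list of (start, end, page_id), kept sorted by start time
        self._postings = defaultdict(list)

    def add_page(self, page_id, terms, start, end):
        """Register a page whose content refers to the interval [start, end]."""
        for term in set(terms):
            insort(self._postings[term], (start, end, page_id))

    def query(self, term, q_start, q_end):
        """Return ids of pages containing `term` whose interval overlaps [q_start, q_end]."""
        entries = self._postings.get(term, [])
        # Level 1 has already narrowed the search to one term's postings.
        # Level 2: binary search cuts off every entry that starts after q_end.
        hi = bisect_left(entries, (q_end + 1,))
        # Remaining entries start early enough; keep those that end late enough.
        return sorted(pid for _, end, pid in entries[:hi] if end >= q_start)


index = TemporalTextualIndex()
index.add_page(1, ["olympics", "beijing"], 2008, 2008)
index.add_page(2, ["olympics", "london"], 2012, 2012)
index.add_page(3, ["olympics"], 2005, 2010)
result = index.query("olympics", 2007, 2009)
```

In this arrangement the textual predicate is evaluated first, so the temporal structure only has to cover the pages that contain the queried term. A real MAP21-tree would replace the sorted list and its linear filtering step.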
Pages: 711-737
Page count: 27