Temporal-Textual Retrieval: Time and Keyword Search in Web Documents

Cited by: 0
Authors
Khodaei, Ali [1]
Shahabi, Cyrus [1, 3, 4]
Khodaei, Amir [2]
Affiliations
[1] Univ Southern Calif, Dept Comp Sci, Los Angeles, CA 90007 USA
[2] Univ Calif Berkeley, Elect Engn & Comp Sci Dept, Berkeley, CA 94720 USA
[3] Univ Southern Calif, Comp Sci & Elect Engn, Los Angeles, CA USA
[4] Univ Southern Calif, NSFs Integrated Media Syst Ctr IMSC, Los Angeles, CA USA
Keywords
Web Search; Time-aware ranking; Indexing; Temporal information retrieval
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
As the web ages, many web documents become relevant only to certain time periods, such as web pages containing news and events or those documenting natural phenomena. Hence, to retrieve the most relevant pages, in addition to providing the relevant keywords, one may wish to specify the relevant time period(s) as well, e.g., "Barack Obama 1980-1985". Unfortunately, not much work has been done by industry or academia to support this type of search. To the best of our knowledge, the only way that some search engines exploit the time information in the user query is to filter out those resulting web pages whose publication or modification time is not within the queried time interval. In this paper, we propose a new indexing and ranking framework for temporal-textual retrieval. The framework leverages the classical vector space model and provides a complete scheme for indexing, query processing and ranking of temporal-textual queries. We propose a variety of approaches to exploit popular keyword and temporal index structures. We present a novel hybrid index structure which indexes both the temporal and the textual aspects of the documents in a unified, integrated manner. We also study how to rank documents by seamlessly combining their temporal and textual features. We develop a new scoring schema called temporal tf-idf to compute the temporal relevance of a document to a query, and we combine this score with the textual relevance to compute the overall relevance score of the document to the query. We present both a cost model analysis and an extensive set of experiments over real-world datasets (New York Times Annotated Corpus and Freebase) to evaluate the proposed framework and demonstrate its efficiency and effectiveness.
Pages: 288 / +
Page count: 25
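
The abstract describes ranking documents by combining a textual relevance score with a temporal relevance score. The abstract does not give the paper's temporal tf-idf formula, so the sketch below is only an illustration under stated assumptions: temporal relevance is approximated as the fraction of the queried years covered by a document's time intervals, textual relevance is a textbook tf-idf sum, and the two are mixed linearly with a hypothetical weight `alpha`. The toy corpus, function names, and the weight are all invented for this example.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not from the paper): each document has text and
# the closed (start_year, end_year) intervals it is assumed to be about.
DOCS = {
    "d1": {"text": "obama student columbia harvard", "intervals": [(1981, 1983)]},
    "d2": {"text": "obama president election", "intervals": [(2008, 2012)]},
    "d3": {"text": "chicago weather storm", "intervals": [(1980, 1985)]},
}


def textual_score(query_terms, doc_id, docs=DOCS):
    """Sum of tf * idf over the query terms (a standard textbook variant)."""
    tf = Counter(docs[doc_id]["text"].split())
    n = len(docs)
    score = 0.0
    for term in query_terms:
        # Document frequency: number of documents containing the term.
        df = sum(1 for d in docs.values() if term in d["text"].split())
        if df and tf[term]:
            score += tf[term] * math.log(1 + n / df)
    return score


def temporal_score(query_interval, doc_id, docs=DOCS):
    """Assumed stand-in for temporal relevance: the fraction of queried
    years that fall inside at least one of the document's intervals."""
    q_start, q_end = query_interval
    covered = set()
    for start, end in docs[doc_id]["intervals"]:
        lo, hi = max(start, q_start), min(end, q_end)
        covered.update(range(lo, hi + 1))  # empty when there is no overlap
    return len(covered) / (q_end - q_start + 1)


def combined_score(query_terms, query_interval, doc_id, alpha=0.5, docs=DOCS):
    """Linear mix of the two scores; alpha is a hypothetical tuning weight."""
    text = textual_score(query_terms, doc_id, docs)
    time = temporal_score(query_interval, doc_id, docs)
    return alpha * text + (1 - alpha) * time


def rank(query_terms, query_interval, alpha=0.5, docs=DOCS):
    """Return document ids ordered by descending combined score."""
    return sorted(
        docs,
        key=lambda d: combined_score(query_terms, query_interval, d, alpha, docs),
        reverse=True,
    )


if __name__ == "__main__":
    # Query in the spirit of the abstract's example: "obama" during 1980-1985.
    for doc_id in rank(["obama"], (1980, 1985)):
        print(doc_id, round(combined_score(["obama"], (1980, 1985), doc_id), 3))
```

In this sketch a document that matches only the time interval can still outrank one that matches only the keyword, because the textual score is not normalized and the mix is purely additive. Simple filtering on publication time, which the abstract names as the existing practice, would instead discard non-overlapping documents outright rather than score them.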