Temporal-Textual Retrieval: Time and Keyword Search in Web Documents

Cited by: 0
Authors
Khodaei, Ali [1]
Shahabi, Cyrus [1, 3, 4]
Khodaei, Amir [2]
Affiliations
[1] Univ Southern Calif, Dept Comp Sci, Los Angeles, CA 90007 USA
[2] Univ Calif Berkeley, Elect Engn & Comp Sci Dept, Berkeley, CA 94720 USA
[3] Univ Southern Calif, Comp Sci & Elect Engn, Los Angeles, CA USA
[4] Univ Southern Calif, NSFs Integrated Media Syst Ctr IMSC, Los Angeles, CA USA
Keywords
Web Search; Time-aware ranking; Indexing; Temporal information retrieval
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
As the web ages, many web documents become relevant only to certain time periods, such as web pages containing news and events or those documenting natural phenomena. Hence, to retrieve the most relevant pages, in addition to providing the relevant keywords, one may wish to specify the relevant time period(s) as well, e.g., "Barack Obama 1980-1985". Unfortunately, little work has been done by industry or academia to support this type of search. To the best of our knowledge, the only way that some search engines exploit the time information in the user query is to filter out those resulting web pages whose publication/modification times are not within the queried time interval. In this paper, we propose a new indexing and ranking framework for temporal-textual retrieval. The framework leverages the classical vector space model and provides a complete scheme for indexing, query processing and ranking of temporal-textual queries. We propose a variety of approaches to exploit popular keyword and temporal index structures. We present a novel hybrid index structure that indexes both the temporal and the textual aspects of the documents in a unified, integrated manner. We also study how to rank documents by seamlessly combining their temporal and textual features. We develop a new scoring schema called temporal tf-idf to compute the temporal relevance of a document to a query, and we combine this score with the textual relevance to compute the overall relevance score of the document to the query. We present both a cost model analysis and an extensive set of experiments over real-world datasets (New York Times Annotated Corpus and Freebase) to evaluate the proposed framework and demonstrate its efficiency and effectiveness.
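The abstract does not give the formula for temporal tf-idf or for the combined score, so the following is only a minimal sketch of the general idea of ranking by both text and time. It assumes (these are illustrative choices, not the paper's definitions) that temporal relevance is the fraction of the query interval covered by a document's interval, that textual relevance is a log-scaled tf-idf sum, and that the two are mixed linearly with a hypothetical weight `alpha`. All function names are hypothetical.

```python
import math
from collections import Counter


def temporal_overlap(doc_interval, query_interval):
    """Assumed temporal relevance: fraction of the query interval
    that the document interval covers (0.0 when they do not overlap)."""
    d_start, d_end = doc_interval
    q_start, q_end = query_interval
    q_len = q_end - q_start
    overlap = min(d_end, q_end) - max(d_start, q_start)
    if q_len <= 0 or overlap <= 0:
        return 0.0
    return overlap / q_len


def textual_tfidf(doc_terms, query_terms, doc_freq, n_docs):
    """Assumed textual relevance: sum over query terms of
    (1 + ln tf) * ln(N / df), skipping terms absent from the document."""
    counts = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = counts[term]
        df = doc_freq.get(term, 0)
        if tf == 0 or df == 0:
            continue
        score += (1 + math.log(tf)) * math.log(n_docs / df)
    return score


def combined_score(textual, temporal, alpha=0.5):
    """Assumed linear mix; alpha weights the textual part."""
    return alpha * textual + (1 - alpha) * temporal
```

For the query "Barack Obama 1980-1985", a document annotated with the interval (1982, 1990) would cover three of the five queried years under this overlap measure, so a pure publication-date filter and a graded temporal score can rank it differently.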
Pages: 288 / +
Number of pages: 25