Iterative query selection for opaque search engines with pseudo relevance feedback

Cited by: 0
Authors
Reuben, Maor [1, 2]
Elyashar, Aviad [1, 3]
Puzis, Rami [1, 2]
Affiliations
[1] Telekom Innovat Labs, Beer Sheva, Israel
[2] Ben Gurion Univ Negev, Dept Software & Informat Syst Engn, Ben Gurion, Israel
[3] Sami Shamoon Coll Engn, Dept Comp Sci, Beer Sheva, Israel
Keywords
Query selection; Opaque search engine; Pseudo relevance feedback; Fake news
DOI
10.1016/j.eswa.2022.117027
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Retrieving information from an online search engine is the first and most important step in many data mining tasks, such as fake news detection. Most search engines currently available on the web, including those of all social media platforms, are black boxes (i.e., opaque) that support short keyword queries. In these settings, it is challenging to automatically retrieve, at large scale, all posts and comments discussing a particular news item. In this paper, we propose a method for generating short keyword queries given a prototype document. The proposed iterative query selection (IQS) algorithm interacts with the opaque search engine to iteratively improve the query by maximizing the number of relevant results retrieved. We evaluated IQS on the Twitter TREC Microblog 2012 and TREC-COVID 2019 datasets, where it outperformed state-of-the-art methods. In addition, we used the IQS algorithm to automatically collect a large-scale dataset of about 70K true and fake news items for the fake news detection task. The dataset, which we have made publicly available to the research community, includes over 22M accounts and 61M tweets. We demonstrate the usefulness of the dataset for fake news detection, achieving state-of-the-art performance.
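The abstract describes IQS only at a high level, so the following is a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes a simple hill-climbing loop that swaps one keyword at a time, and it assumes that a result counts as "relevant" when its Jaccard token overlap with the prototype document reaches a threshold. The names `tokenize`, `count_relevant`, `iterative_query_selection`, `toy_search`, and `CORPUS`, along with all parameter values, are hypothetical.

```python
import random
import re
from typing import Callable, List, Sequence, Tuple

def tokenize(text: str) -> List[str]:
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard overlap between two token collections."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

def count_relevant(query: Sequence[str], prototype_tokens: Sequence[str],
                   search: Callable[[str], List[str]],
                   top_k: int = 10, threshold: float = 0.2) -> int:
    """Pseudo relevance feedback (assumed form): a top-k result is treated
    as relevant if it is similar enough to the prototype document."""
    results = search(" ".join(query))[:top_k]
    return sum(1 for r in results
               if jaccard(prototype_tokens, tokenize(r)) >= threshold)

def iterative_query_selection(prototype: str,
                              search: Callable[[str], List[str]],
                              query_len: int = 3, iterations: int = 30,
                              seed: int = 0) -> Tuple[List[str], int]:
    """Hill-climb over short keyword queries drawn from the prototype.
    The search engine is used only through `search` (a black box)."""
    rng = random.Random(seed)
    vocab = sorted(set(tokenize(prototype)))
    if len(vocab) <= query_len:
        return vocab, count_relevant(vocab, vocab, search)
    query = rng.sample(vocab, query_len)
    best = count_relevant(query, vocab, search)
    for _ in range(iterations):
        # Propose a neighbour query: replace one keyword with an unused one.
        candidate = list(query)
        unused = [w for w in vocab if w not in query]
        candidate[rng.randrange(query_len)] = rng.choice(unused)
        score = count_relevant(candidate, vocab, search)
        if score > best:  # keep the swap only if it retrieves more relevant results
            query, best = candidate, score
    return query, best

# Toy stand-in for an opaque engine (hypothetical data, for illustration only).
CORPUS = [
    "New covid vaccine trial shows strong immune response",
    "Vaccine trial in adults reports immune response data",
    "Local football team wins the championship",
    "Stock markets rally as trial verdict shows surprise",
]

def toy_search(query: str) -> List[str]:
    """Return documents matching any query term, most matches first."""
    terms = set(tokenize(query))
    hits = [(len(terms & set(tokenize(doc))), doc) for doc in CORPUS]
    hits = [h for h in hits if h[0] > 0]
    hits.sort(key=lambda pair: -pair[0])
    return [doc for _, doc in hits]
```

A call such as `iterative_query_selection("covid vaccine trial shows strong immune response in adults", toy_search)` returns a three-keyword query and the number of results judged relevant under the assumed threshold. A real deployment would replace `toy_search` with calls to the platform's search interface and would likely use a stronger similarity measure than token overlap.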
Pages: 11
Related papers
50 in total
  • [11] Query Dependent Pseudo-Relevance Feedback based on Wikipedia
    Xu, Yang
    Jones, Gareth J. F.
    Wang, Bin
    PROCEEDINGS 32ND ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2009, : 59 - 66
  • [12] Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback
    Yu, HongChien
    Xiong, Chenyan
    Callan, Jamie
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 3592 - 3596
  • [13] Query exhaustivity, relevance feedback and search success in automatic and interactive query expansion
    Vakkari, P
    Jones, S
    MacFarlane, A
    Sormunen, E
    JOURNAL OF DOCUMENTATION, 2004, 60 (02) : 109 - 127
  • [14] Robust query-specific pseudo feedback document selection for query expansion
    Huang, Qiang
    Song, Dawei
    Rüger, Stefan
    ADVANCES IN INFORMATION RETRIEVAL, 2008, 4956 : 547 - 554
  • [15] Social Book Search with Pseudo-Relevance Feedback
    Geng, Bin
    Zhou, Fang
    Qu, Jiao
    Zhang, Bo-Wen
    Cui, Xiao-Ping
    Yin, Xu-Cheng
    NEURAL INFORMATION PROCESSING (ICONIP 2014), PT II, 2014, 8835 : 203 - 211
  • [16] Deep Neural Network and Pseudo Relevance Feedback Based Query Expansion
    Shukla, Abhishek Kumar
    Das, Sujoy
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 71 (02) : 3557 - 3570
  • [17] Improved biomedical term selection in pseudo relevance feedback
    Asim, Muhammad Nabeel
    Wasim, Muhammad
    Khan, Muhammad Usman Ghani
    Mahmood, Waqar
    DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2018,
  • [18] Pseudo-relevance feedback based query expansion using boosting algorithm
    Imran Rasheed
    Haider Banka
    Hamaid Mahmood Khan
    Artificial Intelligence Review, 2021, 54 : 6101 - 6124
  • [19] Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study
    Li, Hang
    Zhuang, Shengyao
    Mourad, Ahmed
    Ma, Xueguang
    Lin, Jimmy
    Zuccon, Guido
    ADVANCES IN INFORMATION RETRIEVAL, PT I, 2022, 13185 : 599 - 612
  • [20] Pseudo-relevance feedback and statistical query expansion for web snippet generation
    Ko, Youngjoong
    An, Hongkuk
    Seo, Jungyun
    INFORMATION PROCESSING LETTERS, 2008, 109 (01) : 18 - 22