The Power of Anchor Text in the Neural Retrieval Era

Times cited: 0
Authors
Froebe, Maik [1]
Guenther, Sebastian [1]
Probst, Maximilian [1]
Potthast, Martin [2]
Hagen, Matthias [1]
Affiliations
[1] Martin Luther Univ Halle Wittenberg, Halle, Germany
[2] Univ Leipzig, Leipzig, Germany
Keywords
Anchor text; MS MARCO; ORCAS; TREC Deep Learning track; Information retrieval
DOI
10.1007/978-3-030-99736-6_38
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In the early days of web search, a study by Craswell et al. [11] showed that anchor texts are particularly helpful ranking features for navigational queries, and a study by Eiron and McCurley [24] showed that anchor texts closely resemble queries in their characteristics and that retrieval against anchor texts yields more homogeneous results than retrieval against document content. In this reproducibility study, we analyze to what extent these observations still hold in the web search scenario of the current MS MARCO dataset, including the paradigm shift caused by pre-trained transformers. Our results show that anchor texts are still particularly helpful for navigational queries, but that they resemble the characteristics of queries only very roughly and now yield less homogeneous results than document content. To assess retrieval effectiveness, we also evaluate anchor texts from different time frames and compare them against modern baselines on the TREC 2019 and 2020 Deep Learning tracks. Our code and the newly created Webis MS MARCO Anchor Texts 2022 datasets are freely available.
Pages: 567-583
Number of pages: 17