A set of novel HTML document quality features for Web information retrieval: Including applications to learning to rank for information retrieval

Cited by: 0
Authors
Aydin, Ahmet [1 ]
Arslan, Ahmet [1 ]
Dincer, Bekir Taner [2 ]
Affiliations
[1] Eskisehir Tech Univ, Dept Comp Engn, TR-26555 Eskisehir, Turkiye
[2] Mugla Sıtkı Kocman Univ, Dept Comp Engn, TR-48000 Mugla, Turkiye
Keywords
Information retrieval; Web search; Learning to rank; Machine learning; Search engines;
DOI
10.1016/j.eswa.2024.123177
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Past work on Information Retrieval (IR) over web document collections shows that incorporating a measure of web document quality, that is, a document prior such as PageRank, into an IR system improves retrieval effectiveness. In this study, we introduce new document priors and empirically investigate their effect by employing them as features in a learning to rank (LTR) deployment. The experiments are performed on two standard Web IR test collections, the ClueWeb09 and ClueWeb12 datasets, which include 500 and 733 million web documents, respectively, together with the associated TREC and NTCIR query sets, totaling 1,204 queries. A strong baseline is formed from standard features introduced in previous work, against which the effect of the features newly introduced in this paper is empirically compared. We test our features with LambdaMART, a state-of-the-art LTR technique. The results reveal that the features introduced in this work lead to improved retrieval performance on the test collections in use. The introduced features are classified into 5 groups with respect to their functional properties, and each group is also analyzed in detail.
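As a rough illustration of how a query-independent document prior can sit beside a query-dependent score in an LTR feature vector, the sketch below derives a few counts from raw HTML using only the Python standard library. The specific features (`text_per_tag`, `link_fraction`) and the function names are hypothetical choices made for this example; the abstract does not list the paper's actual features, so this should not be read as the authors' method.

```python
from html.parser import HTMLParser


class QualityFeatureParser(HTMLParser):
    """Collects simple counts from an HTML document (hypothetical features)."""

    def __init__(self):
        super().__init__()
        self.tag_count = 0    # number of start tags seen
        self.link_count = 0   # number of <a> start tags seen
        self.text_chars = 0   # characters of text, surrounding whitespace stripped

    def handle_starttag(self, tag, attrs):
        self.tag_count += 1
        if tag == "a":
            self.link_count += 1

    def handle_data(self, data):
        # Note: text inside <script> or <style> is counted too in this sketch.
        self.text_chars += len(data.strip())


def html_quality_features(html):
    """Return query-independent features computed from the markup alone."""
    parser = QualityFeatureParser()
    parser.feed(html)
    parser.close()
    tags = max(parser.tag_count, 1)  # avoid division by zero on tag-free input
    return {
        "text_chars": parser.text_chars,
        "tag_count": parser.tag_count,
        "text_per_tag": parser.text_chars / tags,
        "link_fraction": parser.link_count / tags,
    }


def ltr_feature_vector(query_doc_score, html):
    """Append the document priors to an assumed query-dependent score."""
    feats = html_quality_features(html)
    return [query_doc_score, feats["text_per_tag"], feats["link_fraction"]]
```

A vector like this, built per query-document pair, is the kind of input a LambdaMART implementation would be trained on; the query-dependent score here stands in for whatever baseline features a system already uses.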
Pages: 13