Learning to Rank from Noisy Data

Cited by: 6
Authors
Ding, Wenkui [1]
Geng, Xiubo [2]
Zhang, Xu-Dong [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
[2] Yahoo Labs Beijing, Beijing, Peoples R China
Keywords
Noisy data; robust learning
DOI
10.1145/2576230
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Learning to rank, which learns the ranking function from training data, has become an emerging research area in information retrieval and machine learning. Most existing work on learning to rank assumes that the training data is clean, which, however, is not always true. The ambiguity of query intent, the lack of domain knowledge, and the vague definition of relevance levels all make it difficult for common annotators to give reliable relevance labels to some documents. As a result, the relevance labels in the training data of learning to rank usually contain noise. If we ignore this fact, the performance of learning-to-rank algorithms will suffer. In this article, we propose considering the labeling noise in the process of learning to rank and using a two-step approach to extend existing algorithms to handle noisy training data. In the first step, we estimate the degree of labeling noise for a training document. To this end, we assume that the majority of the relevance labels in the training data are reliable, and we use a graphical model to describe the generative process of a training query, the feature vectors of its associated documents, and the relevance labels of these documents. The parameters of the graphical model are learned by means of maximum likelihood estimation. Then the conditional probability of the relevance label given the feature vector of a document is computed. If the probability is large, we regard the degree of labeling noise for this document as small; otherwise, we regard it as large. In the second step, we extend existing learning-to-rank algorithms by incorporating the estimated degree of labeling noise into their loss functions. Specifically, we give larger weights to those training documents with smaller degrees of labeling noise and smaller weights to those with larger degrees. As examples, we demonstrate the extensions for McRank, RankSVM, RankBoost, and RankNet. Empirical results on benchmark datasets show that the proposed approach can effectively distinguish noisy documents from clean ones, and that the extended learning-to-rank algorithms achieve better performance than the baselines.
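The abstract's second step, weighting each training document in the loss by its estimated degree of labeling noise, can be illustrated with a minimal sketch. This is not the authors' code: the mapping from the estimated conditional probability P(label | features) to a weight, and all function and variable names below, are illustrative assumptions; the pairwise loss shown is a generic RankNet-style logistic loss rather than any of the specific extensions (McRank, RankSVM, RankBoost, RankNet) described in the paper.

```python
import numpy as np

def noise_weights(label_probs, floor=0.1):
    """Map estimated P(label | features) to per-document weights.

    A high probability suggests the label agrees with the bulk of the
    (assumed mostly clean) training data, so the document is treated as
    reliable (weight near 1); a low probability marks a likely noisy label
    (weight near `floor`). The linear mapping is an assumption; the paper
    only requires that cleaner documents receive larger weights.
    """
    return floor + (1.0 - floor) * np.asarray(label_probs, dtype=float)

def weighted_pairwise_loss(scores, labels, weights):
    """RankNet-style pairwise logistic loss, scaled per document pair.

    For every pair (i, j) with labels[i] > labels[j], the pair loss
    log(1 + exp(-(s_i - s_j))) is multiplied by weights[i] * weights[j],
    so pairs involving likely-noisy documents contribute less.
    """
    total, n_pairs = 0.0, 0
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] > labels[j]:
                margin = scores[i] - scores[j]
                total += weights[i] * weights[j] * np.log1p(np.exp(-margin))
                n_pairs += 1
    return total / max(n_pairs, 1)

# Toy usage: three documents for one query, the third with a suspect label.
scores = np.array([2.0, 1.0, 0.5])   # current model scores
labels = np.array([2, 1, 2])         # relevance labels (third likely noisy)
probs = np.array([0.9, 0.8, 0.2])    # estimated P(label | features)
print(weighted_pairwise_loss(scores, labels, noise_weights(probs)))
```

In this sketch the down-weighted third document contributes little to the loss, which mirrors the paper's intent of letting reliably labeled documents dominate training.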
Pages: 21
Related Papers
50 items in total
  • [31] Learning to Continually Learn Rapidly from Few and Noisy Data
    Kuo, Nicholas I-Hsien
    Harandi, Mehrtash
    Fourrier, Nicolas
    Walder, Christian
    Ferraro, Gabriela
    Suominen, Hanna
    AAAI WORKSHOP ON META-LEARNING AND METADL CHALLENGE, VOL 140, 2021, 140 : 65 - 76
  • [32] Trade-offs in learning controllers from noisy data
    Bisoffi, Andrea
    De Persis, Claudio
    Tesi, Pietro
    SYSTEMS & CONTROL LETTERS, 2021, 154
  • [33] Learning Causal Estimates of Linear Operators From Noisy Data
    Cacace, Filippo
    Germani, Alfredo
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (07) : 3902 - 3914
  • [34] Learning nonparametric ordinary differential equations from noisy data
    Lahouel, Kamel
    Wells, Michael
    Rielly, Victor
    Lew, Ethan
    Lovitz, David
    Jedynak, Bruno M.
    JOURNAL OF COMPUTATIONAL PHYSICS, 2024, 507
  • [35] Convergence Rates for Learning Linear Operators from Noisy Data
    de Hoop, Maarten V.
    Kovachki, Nikola B.
    Nelsen, Nicholas H.
    Stuart, Andrew M.
    SIAM-ASA JOURNAL ON UNCERTAINTY QUANTIFICATION, 2023, 11 (02): : 480 - 513
  • [36] Learning from Massive Noisy Labeled Data for Image Classification
    Xiao, Tong
    Xia, Tian
    Yang, Yi
    Huang, Chang
    Wang, Xiaogang
    2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015, : 2691 - 2699
  • [37] DC Proposal: Ontology Learning from Noisy Linked Data
    Zhu, Man
    SEMANTIC WEB - ISWC 2011, PT II, 2011, 7032 : 373 - 380
  • [38] Learning from Imbalanced Data in Presence of Noisy and Borderline Examples
    Napierala, Krystyna
    Stefanowski, Jerzy
    Wilk, Szymon
    ROUGH SETS AND CURRENT TRENDS IN COMPUTING, PROCEEDINGS, 2010, 6086 : 158 - 167
  • [39] Learning of networked spreading models from noisy and incomplete data
    Wilinski, Mateusz
    Lokhov, Andrey Y.
    PHYSICAL REVIEW E, 2024, 110 (05)
  • [40] SCAN: Learning Speaker Identity From Noisy Sensor Data
    Lu, Chris Xiaoxuan
    Wen, Hongkai
    Wang, Sen
    Markham, Andrew
    Trigoni, Niki
    2017 16TH ACM/IEEE INTERNATIONAL CONFERENCE ON INFORMATION PROCESSING IN SENSOR NETWORKS (IPSN), 2017, : 67 - 78