Learning to Rank from Noisy Data

Cited by: 6
Authors
Ding, Wenkui [1]
Geng, Xiubo [2]
Zhang, Xu-Dong [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
[2] Yahoo Labs Beijing, Beijing, Peoples R China
Keywords
Noisy data; robust learning
DOI
10.1145/2576230
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning to rank, which learns a ranking function from training data, has become an emerging research area in information retrieval and machine learning. Most existing work on learning to rank assumes that the training data is clean, but this is not always true. The ambiguity of query intent, the lack of domain knowledge, and the vague definition of relevance levels all make it difficult for common annotators to give reliable relevance labels to some documents. As a result, the relevance labels in learning-to-rank training data usually contain noise, and ignoring this noise degrades the performance of learning-to-rank algorithms. In this article, we propose taking labeling noise into account during learning to rank, using a two-step approach that extends existing algorithms to handle noisy training data. In the first step, we estimate the degree of labeling noise for each training document. To this end, we assume that the majority of the relevance labels in the training data are reliable, and we use a graphical model to describe the generative process of a training query, the feature vectors of its associated documents, and the relevance labels of these documents. The parameters of the graphical model are learned by maximum likelihood estimation. The conditional probability of the relevance label given the feature vector of a document is then computed. If this probability is large, we regard the degree of labeling noise for the document as small; otherwise, we regard it as large. In the second step, we extend existing learning-to-rank algorithms by incorporating the estimated degree of labeling noise into their loss functions. Specifically, we give larger weights to training documents with smaller degrees of labeling noise and smaller weights to those with larger degrees of labeling noise. As examples, we demonstrate the extensions for McRank, RankSVM, RankBoost, and RankNet. Empirical results on benchmark datasets show that the proposed approach can effectively distinguish noisy documents from clean ones, and that the extended learning-to-rank algorithms achieve better performance than the baselines.
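The two-step idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not specify the graphical model or the exact weighting scheme, so the sketch makes several assumptions. A per-label Gaussian over a single scalar feature, fitted by maximum likelihood, stands in for the graphical model; the pair weight is assumed to be the product of the two documents' label posteriors; and a RankNet-style pairwise logistic loss is used as the base loss. All function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_class_gaussians(xs, ys):
    """Step 1a (assumed stand-in model): maximum-likelihood prior, mean,
    and variance of a scalar feature for each relevance label."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    params = {}
    for label, vals in groups.items():
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        # Floor the variance so a single-valued class does not divide by zero.
        params[label] = (len(vals) / len(xs), mean, max(var, 1e-6))
    return params


def label_posterior(x, y, params):
    """Step 1b: p(y | x) by Bayes' rule; a small value suggests a noisy label."""
    def joint(label):
        prior, mean, var = params[label]
        density = math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
        return prior * density

    total = sum(joint(label) for label in params)
    # If every density underflows, fall back to the label's prior.
    return joint(y) / total if total > 0 else params[y][0]


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def weighted_pairwise_loss(scores, ys, weights):
    """Step 2 (assumed weighting): RankNet-style logistic loss over ordered
    pairs, each pair scaled by the product of its documents' weights."""
    loss = 0.0
    for i in range(len(ys)):
        for j in range(len(ys)):
            if ys[i] > ys[j]:
                loss += weights[i] * weights[j] * softplus(-(scores[i] - scores[j]))
    return loss


# Toy data (invented): the last document is labelled relevant (1) although its
# feature value resembles the non-relevant (0) documents.
xs = [0.10, 0.20, 0.15, 0.90, 1.00, 0.95, 0.12]
ys = [0, 0, 0, 1, 1, 1, 1]

params = fit_class_gaussians(xs, ys)
weights = [label_posterior(x, y, params) for x, y in zip(xs, ys)]

scores = list(xs)  # pretend the current ranker scores documents by the raw feature
unweighted = weighted_pairwise_loss(scores, ys, [1.0] * len(ys))
weighted = weighted_pairwise_loss(scores, ys, weights)
print("weights:", [round(w, 3) for w in weights])
print("unweighted loss:", round(unweighted, 3), "weighted loss:", round(weighted, 3))
```

In this toy run the suspicious document receives the smallest weight, so pairs that involve it contribute little to the loss. Note that the noisy document still influences the fitted parameters of its own label, which is why the approach relies on the assumption that most labels are reliable.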
Pages: 21
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 864 - 870