Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing

Cited by: 49
Authors
Sun, Chong [1]
Rampalli, Narasimhan [1]
Yang, Frank [1]
Doan, Anhai [1, 2]
Affiliations
[1] WalmartLabs, Mountain View, CA 94040 USA
[2] Univ Wisconsin Madison, Madison, WI 53706 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2014 / Vol. 7 / No. 13
DOI
10.14778/2733004.2733024
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Large-scale classification is an increasingly critical Big Data problem. So far, however, very little has been published on how this is done in practice. In this paper we describe Chimera, our solution to classify tens of millions of products into 5000+ product types at WalmartLabs. We show that at this scale, many conventional assumptions regarding learning and crowdsourcing break down, and that existing solutions cease to work. We describe how Chimera employs a combination of learning, rules (created by in-house analysts), and crowdsourcing to achieve accurate, continuously improving, and cost-effective classification. We discuss a set of lessons learned for other similar Big Data systems. In particular, we argue that at large scales crowdsourcing is critical, but must be used in combination with learning, rules, and in-house analysts. We also argue that using rules (in conjunction with learning) is a must, and that more research attention should be paid to helping analysts create and manage (tens of thousands of) rules more effectively.
Pages: 1529-1540
Page count: 12