Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing

Cited by: 49
Authors
Sun, Chong [1]
Rampalli, Narasimhan [1]
Yang, Frank [1]
Doan, Anhai [1, 2]
Affiliations
[1] WalmartLabs, Mountain View, CA 94040 USA
[2] Univ Wisconsin Madison, Madison, WI 53706 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2014, Vol. 7, No. 13
DOI
10.14778/2733004.2733024
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Large-scale classification is an increasingly critical Big Data problem. So far, however, very little has been published on how this is done in practice. In this paper we describe Chimera, our solution to classify tens of millions of products into 5000+ product types at WalmartLabs. We show that at this scale, many conventional assumptions regarding learning and crowdsourcing break down, and that existing solutions cease to work. We describe how Chimera employs a combination of learning, rules (created by in-house analysts), and crowdsourcing to achieve accurate, continuously improving, and cost-effective classification. We discuss a set of lessons learned for other similar Big Data systems. In particular, we argue that at large scales crowdsourcing is critical, but must be used in combination with learning, rules, and in-house analysts. We also argue that using rules (in conjunction with learning) is a must, and that more research attention should be paid to helping analysts create and manage (tens of thousands of) rules more effectively.
Pages: 1529-1540
Page count: 12