Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing

Cited by: 49
Authors
Sun, Chong [1]
Rampalli, Narasimhan [1]
Yang, Frank [1]
Doan, Anhai [1, 2]
Affiliations
[1] WalmartLabs, Mountain View, CA 94040 USA
[2] Univ Wisconsin Madison, Madison, WI 53706 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2014 / Vol. 7 / No. 13
DOI
10.14778/2733004.2733024
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Large-scale classification is an increasingly critical Big Data problem. So far, however, very little has been published on how this is done in practice. In this paper we describe Chimera, our solution to classify tens of millions of products into 5000+ product types at WalmartLabs. We show that at this scale, many conventional assumptions regarding learning and crowdsourcing break down, and that existing solutions cease to work. We describe how Chimera employs a combination of learning, rules (created by in-house analysts), and crowdsourcing to achieve accurate, continuously improving, and cost-effective classification. We discuss a set of lessons learned for other similar Big Data systems. In particular, we argue that at large scales crowdsourcing is critical, but must be used in combination with learning, rules, and in-house analysts. We also argue that using rules (in conjunction with learning) is a must, and that more research attention should be paid to helping analysts create and manage (tens of thousands of) rules more effectively.
Pages: 1529-1540
Page count: 12