Optimizing Data Collection for Machine Learning

Cited by: 0
Authors
Mahmood, Rafid [1]
Lucas, James [1]
Alvarez, Jose M. [1]
Fidler, Sanja [1, 2, 3]
Law, Marc T. [1]
Affiliations
[1] NVIDIA, Santa Clara, CA 95051 USA
[2] Univ Toronto, Toronto, ON, Canada
[3] Vector Inst, Toronto, ON, Canada
Keywords
POWER;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Modern deep learning systems require huge data sets to achieve impressive performance, but there is little guidance on how much or what kind of data to collect. Over-collecting data incurs unnecessary present costs, while under-collecting may incur future costs and delay workflows. We propose a new paradigm for modeling the data collection workflow as a formal optimal data collection problem that allows designers to specify performance targets, collection costs, a time horizon, and penalties for failing to meet the targets. Additionally, this formulation generalizes to tasks requiring multiple data sources, such as labeled and unlabeled data used in semi-supervised learning. To solve our problem, we develop Learn-Optimize-Collect (LOC), which minimizes expected future collection costs. Finally, we numerically compare our framework to the conventional baseline of estimating data requirements by extrapolating from neural scaling laws. We significantly reduce the risks of failing to meet desired performance targets on several classification, segmentation, and detection tasks, while maintaining low total collection costs.
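As a rough illustration of the conventional baseline mentioned in the abstract (estimating data requirements by extrapolating from a scaling law), the sketch below fits a simple power law to a few pilot measurements and inverts it for a target error. It does not implement LOC. The functional form error ≈ a · n^(-b), the function names, and the pilot numbers are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def fit_power_law(sizes, errors):
    """Fit error ~ a * n**(-b) by least squares in log-log space.

    Assumed form for illustration only; real scaling-law fits often
    include an irreducible-error offset that this sketch omits.
    """
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), 1)
    return float(np.exp(intercept)), float(-slope)


def required_size(a, b, target_error):
    """Invert error = a * n**(-b) for the smallest integer n that reaches target_error."""
    return int(np.ceil((a / target_error) ** (1.0 / b)))


# Hypothetical pilot measurements, generated from a known power law
# so that the fit can be checked against the true parameters.
sizes = np.array([1000, 2000, 4000, 8000], dtype=float)
errors = 2.0 * sizes ** -0.3

a, b = fit_power_law(sizes, errors)
n_needed = required_size(a, b, target_error=0.1)
print(f"a={a:.3f}, b={b:.3f}, estimated samples needed: {n_needed}")
```

A point estimate like this ignores collection costs, the time horizon, and the penalty for missing the target, which are the quantities the abstract says the optimization formulation lets designers specify.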
Pages: 14