A Baseline Generative Probabilistic Model for Weakly Supervised Learning

Cited by: 0

Authors:

Papadopoulos, Georgios [1]

Silavong, Fran [1]

Moran, Sean [1]

Affiliations:

[1] JPMorgan Chase & Co, 25 Bank St, London E14 5JP, England

Source:

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2023, PT VI | 2023 / Vol. 14174

Keywords:

Weakly Supervised Learning; Generative Models; Probabilistic Models

DOI:

10.1007/978-3-031-43427-3_3

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Finding relevant and high-quality datasets to train machine learning models is a major bottleneck for practitioners. Furthermore, ambitious real-world use-cases usually require that the data come labelled with high-quality annotations that can facilitate the training of a supervised model. Manually labelling data with high-quality labels is generally a time-consuming and challenging task, and it often turns out to be the bottleneck in a machine learning project. Weakly Supervised Learning (WSL) approaches have been developed to alleviate the annotation burden by offering an automatic way of assigning approximate labels (pseudo-labels) to unlabelled data based on heuristics, distant supervision and knowledge bases. We apply probabilistic generative latent variable models (PLVMs), trained on heuristic labelling representations of the original dataset, as an accurate, fast and cost-effective way to generate pseudo-labels. We show that the PLVMs achieve state-of-the-art performance across four datasets. For example, they achieve an F1 score 22 percentage points higher than Snorkel on the class-imbalanced Spouse dataset. PLVMs are plug-and-play: they are a drop-in replacement for existing WSL frameworks (e.g. Snorkel), or they can be used as high-performance baseline models for more complicated algorithms, giving practitioners a compelling accuracy boost.
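
The idea of turning heuristic votes into pseudo-labels with a latent variable model can be illustrated with a minimal sketch. It is not the model from the paper: it assumes a simple two-component Bernoulli mixture fitted by EM, where the latent component stands in for the unobserved true label. The synthetic vote matrix, the hypothetical labelling functions it simulates, and the names `make_votes` and `fit_bernoulli_mixture` are all assumptions made for illustration.

```python
import math
import random


def make_votes(n=400, n_lfs=5, pos_rate=0.3, accuracy=0.8, seed=0):
    """Synthetic stand-in for a heuristic label matrix (an assumption).

    Each row holds the 0/1 votes of n_lfs hypothetical labelling functions,
    each of which agrees with the hidden true label with probability `accuracy`.
    """
    rng = random.Random(seed)
    votes, labels = [], []
    for _ in range(n):
        y = 1 if rng.random() < pos_rate else 0
        labels.append(y)
        votes.append([y if rng.random() < accuracy else 1 - y
                      for _ in range(n_lfs)])
    return votes, labels


def fit_bernoulli_mixture(votes, n_iter=50):
    """EM for a two-component Bernoulli mixture over binary votes.

    Returns P(z = 1 | votes) for every row, where z is the latent class.
    """
    n, n_lfs = len(votes), len(votes[0])
    prior = 0.5  # P(z = 1)
    # theta[k][j] = P(vote_j = 1 | z = k); the asymmetric start ties
    # component 1 to "mostly positive votes" and breaks label symmetry.
    theta = [[0.3] * n_lfs, [0.7] * n_lfs]
    resp = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior responsibility of component 1 for each row.
        for i, row in enumerate(votes):
            log_p = [math.log(1.0 - prior), math.log(prior)]
            for k in (0, 1):
                for j, v in enumerate(row):
                    p = theta[k][j]
                    log_p[k] += math.log(p if v else 1.0 - p)
            m = max(log_p)
            w0, w1 = math.exp(log_p[0] - m), math.exp(log_p[1] - m)
            resp[i] = w1 / (w0 + w1)
        # M-step: re-estimate parameters, lightly smoothed to avoid log(0).
        total1 = sum(resp)
        total0 = n - total1
        prior = min(max(total1 / n, 1e-6), 1.0 - 1e-6)
        for j in range(n_lfs):
            s1 = sum(r * row[j] for r, row in zip(resp, votes))
            s0 = sum((1.0 - r) * row[j] for r, row in zip(resp, votes))
            theta[1][j] = (s1 + 1e-3) / (total1 + 2e-3)
            theta[0][j] = (s0 + 1e-3) / (total0 + 2e-3)
    return resp


votes, labels = make_votes()
posteriors = fit_bernoulli_mixture(votes)
# Threshold the posterior to obtain hard pseudo-labels.
pseudo_labels = [1 if p > 0.5 else 0 for p in posteriors]
```

The posteriors can also be kept as soft labels rather than thresholded, which is a common choice when the pseudo-labels feed a downstream classifier.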


Pages: 36-50

Page count: 15
