Product Bundle Identification using Semi-Supervised Learning

Cited by: 12

Authors:

Tzaban, Hen [1]

Guy, Ido [2]

Greenstein-Messica, Asnat [1]

Dagan, Arnon [2]

Rokach, Lior [1]

Shapira, Bracha [1]

Affiliations:

[1] Ben Gurion Univ Negev, Beer Sheva, Israel

[2] eBay Res, Netanya, Israel

Source:

PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20) | 2020

Keywords:

electronic commerce; ensemble learning; product bundling; self-training; semi-supervised learning; noise

DOI:

10.1145/3397271.3401128

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Many sellers on e-commerce platforms offer buyers product bundles, which package together two or more different items. The identification of such bundles is a necessary step to support a variety of related services, from recommendation to dynamic pricing. In this work, we present a comprehensive study of bundle identification on a large e-commerce website. Our analysis of bundle versus non-bundle listings reveals several key differentiating characteristics, spanning the listing's title, image, and attributes. We then experiment with a multi-modal classifier, which takes advantage of these characteristics as features. Our analysis also shows that a bundle indicator entered by sellers tends to be highly noisy and carries only a weak signal. The bundle identification task therefore faces the challenge of having a small set of manually-labeled clean examples and a larger set of noisy-labeled examples, in conjunction with class imbalance due to the relative scarcity of bundles. Our experiments with basic supervised classifiers, using the manually-labeled and/or the noisy-labeled data for training, demonstrate only moderate performance. We therefore turn to a semi-supervised approach and propose GREED, a self-training ensemble-based algorithm with greedy model selection. Our evaluation over two different meta-categories shows superior performance of semi-supervised approaches for the bundle identification task, with GREED outperforming several semi-supervised alternatives. The combination of textual, image, and some metadata features is shown to yield the best performance, reaching an AUC of 0.89 and 0.92 for the two meta-categories, respectively.
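For readers unfamiliar with self-training, the general idea can be sketched as follows. This is a minimal, generic illustration and not the GREED algorithm: it uses a single nearest-centroid classifier, with no ensemble and no greedy model selection, and every name, threshold, and data point below is a hypothetical choice made for illustration only.

```python
# Generic self-training loop (illustrative assumption, NOT the paper's GREED).
# Labels: 0 = non-bundle, 1 = bundle. Features are made-up 2-D toy points.

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def dist(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def fit(X, y):
    """Toy 'model': one centroid per class."""
    return {c: centroid([x for x, lab in zip(X, y) if lab == c]) for c in (0, 1)}

def predict_proba(model, x):
    """Pseudo-probability of class 1, from relative distances to the centroids."""
    d0, d1 = dist(x, model[0]), dist(x, model[1])
    return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled points and refit."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = fit(X, y)
    for _ in range(max_rounds):
        confident, rest = [], []
        for x in pool:
            p = predict_proba(model, x)
            if p >= threshold:
                confident.append((x, 1))
            elif p <= 1 - threshold:
                confident.append((x, 0))
            else:
                rest.append(x)  # too uncertain, stays unlabeled
        if not confident:
            break
        for x, lab in confident:
            X.append(x)
            y.append(lab)
        pool = rest
        model = fit(X, y)
    return model, len(X) - len(X_lab)

# Small clean labeled set plus a larger unlabeled pool (toy values).
X_lab = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (4.6, 5.3)]
y_lab = [0, 0, 1, 1]
X_unlab = [(0.2, 0.1), (0.4, -0.3), (5.1, 4.9), (4.7, 5.2), (2.5, 2.6)]

model, n_added = self_train(X_lab, y_lab, X_unlab)
```

In this toy run, the four unlabeled points that lie close to a class centroid receive pseudo-labels, while the point midway between the two clusters never crosses the confidence threshold and is left out of training.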


Pages: 791-800

Page count: 10
