Taming Pretrained Transformers for Extreme Multi-label Text Classification

Cited by: 116
Authors
Chang, Wei-Cheng [1 ]
Yu, Hsiang-Fu [2 ]
Zhong, Kai [2 ]
Yang, Yiming [1 ]
Dhillon, Inderjit S. [2 ,3 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Amazon, Bellevue, WA USA
[3] UT Austin, Austin, TX USA
Keywords
Transformer models; eXtreme Multi-label text classification;
DOI
10.1145/3394486.3403368
CLC Number
TP18 [Artificial intelligence theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We consider the extreme multi-label text classification (XMC) problem: given an input text, return the most relevant labels from a large label collection. For example, the input text could be a product description on Amazon.com and the labels could be product categories. XMC is an important yet challenging problem in the NLP community. Recently, deep pretrained transformer models have achieved state-of-the-art performance on many NLP tasks including sentence classification, albeit with small label sets. However, naively applying deep transformer models to the XMC problem leads to sub-optimal performance due to the large output space and the label sparsity issue. In this paper, we propose X-Transformer, the first scalable approach to fine-tuning deep transformer models for the XMC problem. The proposed method achieves new state-of-the-art results on four XMC benchmark datasets. In particular, on a Wiki dataset with around 0.5 million labels, the prec@1 of X-Transformer is 77.28%, a substantial improvement over state-of-the-art XMC approaches Parabel (linear) and AttentionXML (neural), which achieve 68.70% and 76.95% prec@1, respectively. We further apply X-Transformer to a product2query dataset from Amazon and gain a 10.7% relative improvement in prec@1 over Parabel.
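The abstract reports results in prec@1 (precision at the top-ranked label). As an illustrative sketch only (not code from the paper, and using hypothetical label names), prec@k can be computed like this: for each sample, take the k highest-scored predicted labels and measure what fraction of them are truly relevant.

```python
def precision_at_k(predicted, relevant, k=1):
    """prec@k for one sample.

    predicted: list of labels ranked by model score (best first)
    relevant:  set of ground-truth labels for this sample
    """
    top_k = predicted[:k]
    # Fraction of the top-k predictions that are actually relevant.
    return sum(1 for label in top_k if label in relevant) / k

# Toy example with made-up product-category labels:
ranked = ["Electronics", "Cables", "Audio"]
true_labels = {"Electronics", "Audio"}
p1 = precision_at_k(ranked, true_labels, k=1)  # top-1 hit -> 1.0
p3 = precision_at_k(ranked, true_labels, k=3)  # 2 of top-3 relevant -> 2/3
```

In XMC settings with hundreds of thousands of labels, this per-sample value is averaged over the test set, which is how figures like the 77.28% prec@1 quoted above are obtained.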
Pages: 3163-3171 (9 pages)