MatchXML: An Efficient Text-Label Matching Framework for Extreme Multi-Label Text Classification

Times cited: 0
Authors
Ye, Hui [1]
Sunderraman, Rajshekhar [1]
Ji, Shihao [1]
Affiliation
[1] Georgia State Univ, Dept Comp Sci, Atlanta, GA 30302 USA
Funding
U.S. National Science Foundation (NSF);
Keywords
Training; Transformers; Task analysis; Vectors; Self-supervised learning; Text categorization; Semantics; Extreme multi-label classification; label2vec; text-label matching; bipartite graph; contrastive learning;
DOI
10.1109/TKDE.2024.3374750
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
eXtreme Multi-label text Classification (XMC) refers to training a classifier that assigns relevant labels to a text sample from an extremely large label set (e.g., millions of labels). We propose MatchXML, an efficient text-label matching framework for XMC. We observe that label embeddings generated from sparse Term Frequency-Inverse Document Frequency (TF-IDF) features have several limitations. We therefore propose label2vec, which trains semantic dense label embeddings with the Skip-gram model. The dense label embeddings are then clustered to build a Hierarchical Label Tree. When fine-tuning the pre-trained encoder Transformer, we formulate multi-label text classification as a text-label matching problem in a bipartite graph. We then extract dense text representations from the fine-tuned Transformer. In addition to these fine-tuned dense text embeddings, we extract static dense sentence embeddings from a pre-trained Sentence Transformer. Finally, a linear ranker is trained on the sparse TF-IDF features, the fine-tuned dense text representations, and the static dense sentence embeddings. Experimental results show that MatchXML achieves state-of-the-art accuracy on five of the six datasets, and trains faster than the competing methods on all six datasets.
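The label2vec step can be illustrated with a small, dependency-free sketch of Skip-gram with negative sampling (SGNS) over label co-occurrence. It assumes, for illustration only, that the label set of each training sample is treated as one "sentence" whose context window covers the whole set. The function names (`skipgram_pairs`, `train_label2vec`, `pair_score`), the toy data, the uniform negative sampling, and all hyperparameters are hypothetical and are not taken from the MatchXML code.

```python
import math
import random
from itertools import permutations

def sigmoid(x: float) -> float:
    x = max(-30.0, min(30.0, x))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-x))

def skipgram_pairs(label_sets):
    """Assumption: each sample's label set is one 'sentence' and the context
    window spans the whole set, so every ordered pair of co-occurring labels
    becomes a (center, context) training pair."""
    pairs = []
    for labels in label_sets:
        pairs.extend(permutations(labels, 2))
    return pairs

def train_label2vec(label_sets, num_labels, dim=16, epochs=200, lr=0.05,
                    negatives=3, seed=0):
    """Hypothetical SGNS trainer using plain SGD (not MatchXML's code)."""
    rng = random.Random(seed)
    # Input ("label") vectors start small and random; output ("context")
    # vectors start at zero, mirroring word2vec's initialization.
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
            for _ in range(num_labels)]
    w_out = [[0.0] * dim for _ in range(num_labels)]
    pairs = skipgram_pairs(label_sets)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for center, context in pairs:
            # Negatives are drawn uniformly here (word2vec draws them from the
            # unigram distribution raised to the 3/4 power).
            negs = []
            while len(negs) < negatives:
                n = rng.randrange(num_labels)
                if n != context:
                    negs.append(n)
            targets = [(context, 1.0)] + [(n, 0.0) for n in negs]
            v = w_in[center]
            grad_v = [0.0] * dim
            for target, label in targets:
                u = w_out[target]
                g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
                for i in range(dim):
                    grad_v[i] += g * u[i]  # uses u before it is updated
                    u[i] += g * v[i]
            for i in range(dim):
                v[i] += grad_v[i]  # apply the accumulated center gradient
    return w_in, w_out

def pair_score(w_in, w_out, center, context):
    """Model probability that `context` co-occurs with `center`."""
    return sigmoid(sum(a * b for a, b in zip(w_in[center], w_out[context])))

# Toy data: labels 0-2 co-occur with each other, as do labels 3-5.
label_sets = [[0, 1, 2], [0, 1], [1, 2], [3, 4, 5], [3, 4], [4, 5]]
w_in, w_out = train_label2vec(label_sets, num_labels=6)
print(pair_score(w_in, w_out, 0, 1), pair_score(w_in, w_out, 0, 4))
```

In this sketch the input vectors `w_in` play the role of the dense label embeddings that the abstract says are then clustered into a Hierarchical Label Tree. Labels that co-occur in training samples receive higher Skip-gram scores than labels that never co-occur. A real run over millions of labels would use an optimized word2vec implementation rather than this pure-Python loop.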
Pages: 4781-4793
Number of pages: 13