MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Cited by: 7
Authors
Zhang, Yu [1, 2]
Shen, Zhihong [2]
Dong, Yuxiao [2, 3]
Wang, Kuansan [2]
Han, Jiawei [1]
Affiliations
[1] Univ Illinois, Champaign, IL 61820 USA
[2] Microsoft Res, Redmond, WA USA
[3] Facebook AI, Seattle, WA USA
Funding
National Science Foundation (USA)
Keywords
text classification; academic graph; hierarchical classification; representation
DOI
10.1145/3442381.3449979
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multi-label text classification refers to the problem of assigning each given document its most relevant labels from a label set. Commonly, the metadata of the given documents and the hierarchy of the labels are available in real-world applications. However, most existing studies focus on only modeling the text information, with a few attempts to utilize either metadata or hierarchy signals, but not both. In this paper, we bridge the gap by formalizing the problem of metadata-aware text classification in a large label hierarchy (e.g., with tens of thousands of labels). To address this problem, we present the MATCH solution, an end-to-end framework that leverages both metadata and hierarchy information. To incorporate metadata, we pre-train the embeddings of text and metadata in the same space and also leverage fully connected attention to capture the interrelations between them. To leverage the label hierarchy, we propose different ways to regularize the parameters and output probability of each child label by its parents. Extensive experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH over the state-of-the-art deep learning baselines.
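The abstract's idea of regularizing each child label's output probability by its parents can be illustrated with a small sketch. This is one plausible form only: the hinge-style penalty, the function name `hierarchy_output_penalty`, and the toy labels and probabilities are assumptions made for illustration and are not taken from this record.

```python
# Illustrative sketch (assumed form, not confirmed by this record):
# penalize a child label whose predicted probability exceeds its parent's.

def hierarchy_output_penalty(probs, parents):
    """Return the sum of max(0, p_child - p_parent) over all child-parent edges.

    probs   -- dict mapping label name to predicted probability
    parents -- dict mapping child label to a list of its parent labels
    """
    penalty = 0.0
    for child, parent_list in parents.items():
        for parent in parent_list:
            penalty += max(0.0, probs[child] - probs[parent])
    return penalty


# Hypothetical three-level hierarchy and predictions.
parents = {
    "Machine learning": ["Computer science"],
    "Deep learning": ["Machine learning"],
}
probs = {
    "Computer science": 0.9,
    "Machine learning": 0.7,
    "Deep learning": 0.8,  # higher than its parent, so it is penalized
}

penalty = hierarchy_output_penalty(probs, parents)
print(penalty)
```

In a training loop, a term like this would typically be added to the classification loss with a weight, so that predictions consistent with the hierarchy incur no extra cost.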
Pages: 3246-3257
Page count: 12
Related papers
50 in total
  • [1] Metadata-Aware End-to-End Keyword Spotting
    Liu, Hongyi
    Abhyankar, Apurva
    Mishchenko, Yuriy
    Senechal, Thibaud
    Fu, Gengshen
    Kulis, Brian
    Stein, Noah
    Shah, Anish
    Vitaladevuni, Shiv Naga Prasad
    [J]. INTERSPEECH 2020, 2020, : 2282 - 2286
  • [2] An Efficient and Metadata-Aware Big Data Storage Architecture
    Jin, Rize
    Paik, Joon-Young
    Biadgie, Yenewondim
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2020, 2020, 12115 : 146 - 152
  • [3] Supporting Source Code Annotations with Metadata-Aware Development Environment
    Juhar, Jan
    [J]. PROCEEDINGS OF THE 2019 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS (FEDCSIS), 2019, : 411 - 420
  • [4] Hierarchical Metadata-Aware Document Categorization under Weak Supervision
    Zhang, Yu
    Chen, Xiusi
    Meng, Yu
    Han, Jiawei
    [J]. WSDM '21: PROCEEDINGS OF THE 14TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2021, : 770 - 778
  • [5] Metadata-Aware Measures for Answer Summarization in Community Question Answering
    Tomasoni, Mattia
    Huang, Minlie
    [J]. ACL 2010: 48TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2010, : 760 - 769
  • [6] A metadata-aware application for remote scoring and exchange of tissue microarray images
    Lorna Morris
    Andrew Tsui
    Charles Crichton
    Steve Harris
    Peter H Maccallum
    William J Howat
    Jim Davies
    James D Brenton
    Carlos Caldas
    [J]. BMC Bioinformatics, 14
  • [7] A metadata-aware application for remote scoring and exchange of tissue microarray images
    Morris, Lorna
    Tsui, Andrew
    Crichton, Charles
    Harris, Steve
    Maccallum, Peter H.
    Howat, William J.
    Davies, Jim
    Brenton, James D.
    Caldas, Carlos
    [J]. BMC BIOINFORMATICS, 2013, 14
  • [8] Sentence Alignment of Bilingual Survey Texts Applying a Metadata-Aware Strategy
    Sorato, Danielly
    Zavala-Rojas, Diana
    [J]. NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2022), 2022, 13286 : 469 - 476
  • [9] Hierarchy-Aware Global Model for Hierarchical Text Classification
    Zhou, Jie
    Ma, Chunping
    Long, Dingkun
    Xu, Guangwei
    Ding, Ning
    Zhang, Haoyu
    Xie, Pengjun
    Liu, Gongshen
    [J]. 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 1106 - 1117
  • [10] Hierarchy-Aware and Label Balanced Model for Hierarchical Text Classification
    Zhang, Jun
    Li, Yubin
    Shen, Fanfan
    Xia, Chenxi
    Tan, Hai
    He, Yanxiang
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 300