Embedding-Driven Multi-Dimensional Topic Mining and Text Analysis

Cited by: 2
Authors
Meng, Yu [1]
Huang, Jiaxin [1]
Han, Jiawei [1]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
Funding
National Science Foundation (USA)
Keywords
Text Embedding; Topic Mining; Multi-Faceted Taxonomy; Text Cube; Massive Text Corpora; Multi-Dimensional Analysis
DOI
10.1145/3394486.3406483
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
People nowadays are immersed in a wealth of text data, ranging from news articles and social media to academic publications, advertisements, and economic reports. A grand challenge of data mining is to develop effective, scalable, and weakly-supervised methods for extracting actionable structures and knowledge from massive text data. Without requiring extensive, corpus-specific human annotations, these methods will serve people's diverse applications and needs for comprehending and making good use of large-scale corpora. In this tutorial, we will introduce recent advances in text embeddings and their applications to a wide range of text mining tasks that facilitate multi-dimensional analysis of massive text corpora. Specifically, we first give an overview of a set of recently developed unsupervised and weakly-supervised text embedding methods, including state-of-the-art context-free embeddings and pre-trained language models, that serve as the fundamentals for downstream tasks. We then present several embedding-driven text mining techniques that are weakly-supervised, domain-independent, language-agnostic, effective, and scalable for mining and discovering structured knowledge, in the form of multi-dimensional topics and multi-faceted taxonomies, from large-scale text corpora. We finally show that the topics and taxonomies so discovered naturally form a multi-dimensional TextCube structure, which greatly enhances text exploration and analysis for various important applications, including text classification, retrieval, and summarization. We will demonstrate on recent real-world datasets (including political news articles as well as scientific publications related to the coronavirus) how multi-dimensional analysis of massive text corpora can be conducted with the introduced embedding-driven text mining techniques.
Pages: 3573-3574
Number of pages: 2