Embedding-Driven Multi-Dimensional Topic Mining and Text Analysis

Cited by: 1
Authors
Meng, Yu [1]
Huang, Jiaxin [1]
Han, Jiawei [1]
Affiliation
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
Funding
U.S. National Science Foundation
Keywords
Text Embedding; Topic Mining; Multi-Faceted Taxonomy; Text Cube; Massive Text Corpora; Multi-Dimensional Analysis
DOI
10.1145/3394486.3406483
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
People nowadays are immersed in a wealth of text data, ranging from news articles, to social media, academic publications, advertisements, and economic reports. A grand challenge of data mining is to develop effective, scalable and weakly-supervised methods for extracting actionable structures and knowledge from massive text data. Without requiring extensive and corpus-specific human annotations, these methods will satisfy people's diverse applications and needs for comprehending and making good use of large-scale corpora. In this tutorial, we will introduce recent advances in text embeddings and their applications to a wide range of text mining tasks that facilitate multi-dimensional analysis of massive text corpora. Specifically, we first overview a set of recently developed unsupervised and weakly-supervised text embedding methods including state-of-the-art context-free embeddings and pre-trained language models that serve as the fundamentals for downstream tasks. We then present several embedding-driven text mining techniques that are weakly-supervised, domain-independent, language-agnostic, effective and scalable for mining and discovering structured knowledge, in the form of multi-dimensional topics and multi-faceted taxonomies, from large-scale text corpora. We finally show that the topics and taxonomies so discovered will naturally form a multidimensional TextCube structure, which greatly enhances text exploration and analysis for various important applications, including text classification, retrieval and summarization. We will demonstrate on the most recent real-world datasets (including political news articles as well as scientific publications related to the coronavirus) how multi-dimensional analysis of massive text corpora can be conducted with the introduced embedding-driven text mining techniques.
Pages: 3573-3574
Number of pages: 2