Embedding-Driven Multi-Dimensional Topic Mining and Text Analysis

Cited by: 1
Authors
Meng, Yu [1]
Huang, Jiaxin [1]
Han, Jiawei [1]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
Funding
National Science Foundation (USA)
Keywords
Text Embedding; Topic Mining; Multi-Faceted Taxonomy; Text Cube; Massive Text Corpora; Multi-Dimensional Analysis
DOI
10.1145/3394486.3406483
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
People nowadays are immersed in a wealth of text data, ranging from news articles and social media to academic publications, advertisements, and economic reports. A grand challenge of data mining is to develop effective, scalable, and weakly-supervised methods for extracting actionable structures and knowledge from massive text data. Because they do not require extensive, corpus-specific human annotations, these methods can serve people's diverse applications and needs for comprehending and making good use of large-scale corpora. In this tutorial, we will introduce recent advances in text embeddings and their applications to a wide range of text mining tasks that facilitate multi-dimensional analysis of massive text corpora. Specifically, we first give an overview of a set of recently developed unsupervised and weakly-supervised text embedding methods, including state-of-the-art context-free embeddings and pre-trained language models, that serve as the foundation for downstream tasks. We then present several embedding-driven text mining techniques that are weakly-supervised, domain-independent, language-agnostic, effective, and scalable for mining and discovering structured knowledge, in the form of multi-dimensional topics and multi-faceted taxonomies, from large-scale text corpora. Finally, we show that the topics and taxonomies so discovered naturally form a multi-dimensional TextCube structure, which greatly enhances text exploration and analysis for various important applications, including text classification, retrieval, and summarization. Using recent real-world datasets (including political news articles as well as scientific publications related to the coronavirus), we will demonstrate how multi-dimensional analysis of massive text corpora can be conducted with the introduced embedding-driven text mining techniques.
Pages: 3573-3574
Page count: 2