Embedding-Driven Multi-Dimensional Topic Mining and Text Analysis

Authors
Meng, Yu [1]
Huang, Jiaxin [1]
Han, Jiawei [1]
Affiliation
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
Funding
National Science Foundation (USA);
Keywords
Text Embedding; Topic Mining; Multi-Faceted Taxonomy; Text Cube; Massive Text Corpora; Multi-Dimensional Analysis;
DOI
10.1145/3394486.3406483
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
People nowadays are immersed in a wealth of text data, ranging from news articles and social media to academic publications, advertisements, and economic reports. A grand challenge of data mining is to develop effective, scalable, and weakly-supervised methods for extracting actionable structures and knowledge from massive text data. Without requiring extensive, corpus-specific human annotations, these methods will serve people's diverse applications and needs for comprehending and making good use of large-scale corpora. In this tutorial, we will introduce recent advances in text embeddings and their applications to a wide range of text mining tasks that facilitate multi-dimensional analysis of massive text corpora. Specifically, we first give an overview of a set of recently developed unsupervised and weakly-supervised text embedding methods, including state-of-the-art context-free embeddings and pre-trained language models, that serve as the fundamentals for downstream tasks. We then present several embedding-driven text mining techniques that are weakly-supervised, domain-independent, language-agnostic, effective, and scalable for mining and discovering structured knowledge, in the form of multi-dimensional topics and multi-faceted taxonomies, from large-scale text corpora. We finally show that the topics and taxonomies so discovered naturally form a multi-dimensional TextCube structure, which greatly enhances text exploration and analysis for various important applications, including text classification, retrieval, and summarization. We will demonstrate on the most recent real-world datasets (including political news articles as well as scientific publications related to the coronavirus) how multi-dimensional analysis of massive text corpora can be conducted with the introduced embedding-driven text mining techniques.
Pages: 3573-3574
Number of pages: 2