Semantically-Enhanced Topic Modeling

Cited by: 10
Authors
Viegas, Felipe [1]
Luiz, Washington [1]
Gomes, Christian [2]
Khatibi, Amir [1]
Canuto, Sergio [3]
Mourao, Fernando [4]
Salles, Thiago [1]
Rocha, Leonardo [2]
Goncalves, Marcos Andre [1]
Affiliations
[1] Univ Fed Minas Gerais, Belo Horizonte, MG, Brazil
[2] Univ Fed Sao Joao del Rei, Sao Joao del Rei, Brazil
[3] IFG, Luziania, Brazil
[4] Seek AI Labs, Belo Horizonte, MG, Brazil
Keywords
Topic Modeling; Word Embeddings; Bag of Words
DOI
10.1145/3269206.3271797
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
In this paper, we advance the state of the art in topic modeling by designing and developing a novel (semi-formal) general topic modeling framework. The novel contributions of our solution include: (i) new semantically-enhanced data representations for topic modeling based on pooling, and (ii) a novel topic extraction strategy, ASToC, that solves the difficulty of representing topics in our semantically-enhanced information space. In our extensive experimental evaluation, covering 12 datasets and 12 state-of-the-art baselines for a total of 108 tests, we outperform the baselines (with a few ties) in almost 100 cases, with gains of more than 50% over the best baselines (and up to 80% over some runners-up). We provide qualitative and quantitative statistical analyses of why our solutions work so well. Finally, we show that our method is able to improve document representation in automatic text classification.
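The abstract does not say how its pooling-based representations are built. As a rough illustration of the general idea of pooling only, the sketch below collapses the word embeddings of a document into a single vector with mean or max pooling. The toy embedding values, the function name `pool_document`, and the choice of mean/max pooling are all hypothetical assumptions for illustration; this is not the authors' representation or the ASToC strategy.

```python
from typing import Dict, List, Sequence

# Hypothetical toy 3-dimensional word embeddings (made-up values, illustration only).
EMBEDDINGS: Dict[str, List[float]] = {
    "topic": [0.2, 0.8, 0.1],
    "model": [0.4, 0.6, 0.0],
    "word": [0.9, 0.1, 0.3],
}


def pool_document(
    tokens: Sequence[str],
    embeddings: Dict[str, List[float]],
    mode: str = "mean",
) -> List[float]:
    """Pool the embeddings of in-vocabulary tokens into one document vector.

    Hypothetical helper: generic mean/max pooling, assumed for illustration,
    not the pooling scheme described in the paper.
    """
    # Tokens without an embedding (out-of-vocabulary) are skipped.
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token of the document has an embedding")
    dims = range(len(vectors[0]))
    if mode == "mean":
        # Component-wise average over all word vectors of the document.
        return [sum(v[d] for v in vectors) / len(vectors) for d in dims]
    if mode == "max":
        # Component-wise maximum over all word vectors of the document.
        return [max(v[d] for v in vectors) for d in dims]
    raise ValueError(f"unknown pooling mode: {mode!r}")


if __name__ == "__main__":
    doc = ["topic", "model", "unknown"]
    print(pool_document(doc, EMBEDDINGS, mode="mean"))
    print(pool_document(doc, EMBEDDINGS, mode="max"))
```

Mean pooling smooths over all words of a document, while max pooling keeps the strongest activation per dimension; which (if either) suits topic modeling is an empirical question the sketch does not settle.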
Pages: 893-902
Page count: 10