A Comparative Study of Fuzzy Topic Models and LDA in terms of Interpretability

Cited by: 9
Authors
Rijcken, Emil [1]
Scheepers, Floortje [2]
Mosteiro, Pablo [3]
Zervanou, Kalliopi [4]
Spruit, Marco [5]
Kaymak, Uzay [1]
Affiliations
[1] Eindhoven Univ Technol, Jheronimus Acad Data Sci, Eindhoven, Netherlands
[2] Univ Med Ctr Utrecht, Psychiat, Utrecht, Netherlands
[3] Univ Med Ctr Utrecht, Informat & Comp Sci, Utrecht, Netherlands
[4] Eindhoven Univ Technol, Ind Engn & Informat Sci, Eindhoven, Netherlands
[5] Leiden Univ, Med Ctr, Publ Hlth & Primary Care, Leiden, Netherlands
Keywords
Topic Models; Text Classification; Fuzzy Modelling; Explainable AI; NLP; CLASSIFICATION; TEXT
DOI
10.1109/SSCI50451.2021.9660139
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In many domains that employ machine learning, models must be both high-performing and interpretable. Text classification is a typical machine learning task in which models are hardly interpretable. Topic models, used as topic embeddings, have the potential to make the decisions of text classification algorithms easier to understand. With this goal in mind, we propose two new fuzzy topic models: FLSA-W and FLSA-V. Both are derived from the topic model Fuzzy Latent Semantic Analysis (FLSA). After training each model ten times, we use the mean coherence score to compare the proposed models with the benchmark models Latent Dirichlet Allocation (LDA) and FLSA. The proposed models generally yield higher coherence scores and lower standard deviations than the benchmark models. They are particularly useful as topic embeddings in text classification because their coherence scores do not drop when the number of topics is high, in contrast to the decay that occurs with LDA and FLSA.
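The comparison protocol described in the abstract (repeat training several times, then compare models by the mean and standard deviation of their coherence scores) might be sketched as follows. This is a minimal illustration, not the authors' code: the function names `summarize_runs` and `rank_models`, the model labels, and every number are hypothetical placeholders, and the abstract does not say which coherence measure or tie-breaking rule was used.

```python
from statistics import mean, stdev


def summarize_runs(scores_by_model):
    """Map each model name to (mean, sample standard deviation) of its run scores."""
    return {
        name: (mean(scores), stdev(scores))
        for name, scores in scores_by_model.items()
    }


def rank_models(summary):
    """Order model names by higher mean first, then by lower standard deviation.

    The tie-breaking rule is an assumption made for this sketch.
    """
    return sorted(summary, key=lambda name: (-summary[name][0], summary[name][1]))


# Placeholder coherence scores for ten training runs per model.
# These values are invented for illustration and are NOT results from the paper.
runs = {
    "model_a": [0.40, 0.42, 0.41, 0.39, 0.43, 0.40, 0.41, 0.42, 0.40, 0.42],
    "model_b": [0.30, 0.45, 0.35, 0.50, 0.28, 0.41, 0.33, 0.47, 0.36, 0.35],
}

summary = summarize_runs(runs)
ranking = rank_models(summary)

for name in ranking:
    avg, spread = summary[name]
    print(f"{name}: mean={avg:.3f}, std={spread:.3f}")
```

In practice the per-run scores would come from a coherence implementation applied to each trained model; repeating the summary for several topic counts would show whether the mean coherence decays as the number of topics grows.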
Pages: 8