Hybrid Text Embedding and Evolutionary Algorithm Approach for Topic Clustering in Online Discussion Forums

Cited by: 0
Authors
Bouabdallaoui, Ibrahim [1]
Guerouate, Fatima [1]
Sbihi, Mohammed [1]
Affiliations
[1] Mohammed V Univ Rabat, LASTIMI Lab EST Sale, Ave Prince Heritier, Sale, Morocco
Keywords
LDA; BERT; K-Means; Genetic Algorithms; Forum Analysis;
DOI
10.14201/adcaij.31448
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Leveraging discussion forums as a medium for information exchange has led to a surge in data, making topic clustering in these platforms essential for understanding user interests, preferences, and concerns. This study introduces a methodology for topic clustering that combines two text embedding techniques, Latent Dirichlet Allocation (LDA) and BERT, trained on a single autoencoder. Additionally, it proposes a combination of K-Means and Genetic Algorithms for clustering topics within triadic discussion forum threads. The proposed technique begins with a preprocessing stage to clean and tokenize textual data, which is then transformed into a vector representation using the hybrid text embedding method. Subsequently, the K-Means algorithm clusters these vectorized data points, and Genetic Algorithms optimize the parameters of the K-Means clustering. We assess the efficacy of our approach by computing cosine similarities between topics and comparing performance against coherence and graph visualization. The results confirm that the hybrid text embedding methodology, coupled with evolutionary algorithms, enhances the quality of topic clustering across various discussion forum themes. This investigation contributes to the development of effective methods for clustering discussion forums, with potential applications in diverse domains, including social media analysis, online education, and customer response analysis.
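The clustering stage described in the abstract (K-Means whose parameters are tuned by a Genetic Algorithm) can be illustrated with a small, standard-library-only sketch. This is not the authors' implementation. The abstract does not say which K-Means parameters are optimized or which fitness function is used, so the chromosome `(k, seed)`, the cosine-distance silhouette fitness, the truncation selection, and the mutation rates below are all assumptions made for illustration. The hypothetical `toy_vectors` helper stands in for the hybrid LDA + BERT autoencoder embeddings, which are not reproduced here.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's K-Means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette score with cosine distance (assumed fitness)."""
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, l in enumerate(labels):
        own = [j for j in clusters[l] if j != i]
        if not own:
            continue  # singleton clusters contribute 0
        a = sum(1 - cosine(points[i], points[j]) for j in own) / len(own)
        b = min(sum(1 - cosine(points[i], points[j]) for j in idx) / len(idx)
                for m, idx in clusters.items() if m != l)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def evolve(points, k_range=(2, 6), pop_size=8, generations=10, seed=0):
    """Tiny GA over (k, init_seed) pairs; returns (best_individual, fitness)."""
    rng = random.Random(seed)
    lo, hi = k_range
    pop = [(rng.randint(lo, hi), rng.randrange(1000)) for _ in range(pop_size)]

    def fitness(ind):
        return silhouette(points, kmeans(points, ind[0], ind[1]))

    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection, elitist
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            k, s = p1[0], p2[1]  # crossover: k from one parent, seed from the other
            if rng.random() < 0.3:  # mutate k by one step, clamped to the range
                k = min(hi, max(lo, k + rng.choice((-1, 1))))
            if rng.random() < 0.3:  # mutate the initialization seed
                s = rng.randrange(1000)
            children.append((k, s))
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)


def toy_vectors(per_cluster=10, seed=42):
    """Hypothetical stand-in for embeddings: three noisy axis-aligned groups."""
    rng = random.Random(seed)
    axes = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    return [[a + rng.uniform(0, 0.1) for a in axis]
            for axis in axes for _ in range(per_cluster)]


if __name__ == "__main__":
    (k, init_seed), score = evolve(toy_vectors())
    print(f"best k={k}, init seed={init_seed}, silhouette={score:.3f}")
```

In a real pipeline, the rows returned by `toy_vectors` would be replaced by the document embeddings, and the fitness could be swapped for a topic-coherence measure, which is closer to the evaluation the abstract mentions.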
Pages: 24
Related papers
50 in total
  • [1] CLUSTERING OF THREAD POSTS IN ONLINE DISCUSSION FORUMS
    Said, Dina
    Wanas, Nayer
    [J]. KDIR 2010: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL, 2010, : 314 - 319
  • [2] Sentiment Distribution of Topic Discussion in Online English Learning: An Approach Based on Clustering Algorithm and Improved CNN
    Yang, Qiujuan
    Zhang, Jiaxiao
    [J]. INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGIES AND SYSTEMS APPROACH, 2023, 16 (02)
  • [3] Graph and Embedding based Approach for Text Clustering: Topic Detection in a Large Multilingual Public Consultation
    Stefanovitch, Nicolas
    Jacquet, Guillaume
    de Longueville, Bertrand
    [J]. COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023, 2023, : 694 - 700
  • [4] Automatic Classification for Cognitive Engagement in Online Discussion Forums: Text Mining and Machine Learning Approach
    Hayati, Hind
    Idrissi, Mohammed Khalidi
    Bennani, Samir
    [J]. ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2020), PT II, 2020, 12164 : 114 - 118
  • [5] SBTM: A joint sentiment and behaviour topic model for online course discussion forums
    Peng, Xian
    Xu, Qinmei
    Gan, Wenbin
    [J]. JOURNAL OF INFORMATION SCIENCE, 2021, 47 (04) : 517 - 532
  • [6] A hybrid approach for text clustering
    Ajmi Al-Shuwaili, Saad O.
    Obied Redywi, Sadiq
    Naser, Mustafa A.
    [J]. Materials Today: Proceedings, 2023, 80 : 2584 - 2589
  • [7] Short Text Embedding for Clustering based on Word and Topic Semantic Information
    Chen, Ziheng
    Ren, Jiangtao
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2019), 2019, : 61 - 70
  • [8] A hybrid evolutionary computation approach with its application for optimizing text document clustering
    Song, Wei
    Qiao, Yingying
    Park, Soon Cheol
    Qian, Xuezhong
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (05) : 2517 - 2524
  • [9] A hybrid approach for text document clustering using Jaya optimization algorithm
    Thirumoorthy, Karpagalingam
    Muneeswaran, Karuppaiah
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2021, 178
  • [10] Ontology-based Topic Clustering for Online Discussion Data
    Wang, Yongheng
    Cao, Kening
    Zhang, Xiaoming
    [J]. INTERNATIONAL CONFERENCE ON GRAPHIC AND IMAGE PROCESSING (ICGIP 2012), 2013, 8768