D3CAS: Distributed Clustering Algorithm Applied to Short-Text Stream Processing

Authors
Molina, Roberto [1, 2]
Hasperue, Waldo [1, 3]
Villa Monte, Augusto [1, 4]
Affiliations
[1] Univ Nacl La Plata, Fac Informat, Inst Invest Informat III LIDI, La Plata, Buenos Aires, Argentina
[2] CIN EVC, La Plata, Buenos Aires, Argentina
[3] CIC, Tolosa, Buenos Aires, Argentina
[4] UNLP, La Plata, Buenos Aires, Argentina
Keywords
Clustering; Spark; Streaming processing; Short text; Text analysis
DOI
10.1007/978-3-030-20787-8_15
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This article presents a proof of concept of D3CAS, a dynamic density-based clustering algorithm. The algorithm was implemented to run on the Spark Streaming framework and processes data streams. It was tested on a stream of short texts consisting of requests posted by social media users, taken from a dataset called Pizza Request Dataset. The results, obtained in a virtualized environment, were analyzed under different configurations of the algorithm's parameters to establish which configurations yield the best results. Because the dataset includes a label for each text in the stream, cluster purity could be measured and the results compared with those reported by the authors of the dataset.
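The abstract does not say how cluster purity is computed. As a minimal sketch, assuming the standard definition (each cluster is credited with the count of its most frequent label, and the total is divided by the number of texts), it could look like the following. The function name `cluster_purity` and its arguments are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, labels):
    """Purity under the standard definition (an assumption, not the paper's code).

    cluster_ids[i] is the cluster assigned to text i,
    labels[i] is the ground-truth label of text i.
    """
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    if not labels:
        raise ValueError("at least one labelled text is required")

    # Count how often each label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        label_counts[cluster][label] += 1

    # Each cluster contributes the size of its majority label.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(labels)
```

With this definition a purity of 1.0 means every cluster contains texts of a single label. Purity tends to rise as the number of clusters grows, so it is usually read alongside the cluster count.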
Pages: 211-220
Page count: 10