Document clustering based on time series

Authors
Matei, Liviu Sebastian [1]
Trausan-Matu, Stefan [1, 2, 3]
Affiliations
[1] Univ Politehn Bucuresti, Fac Automat Control & Comp Sci, Bucharest, Romania
[2] Romanian Acad, Res Inst Artificial Intelligence, Bucharest, Romania
[3] Acad Romanian Scientists, Bucharest, Romania
Keywords
time series; clustering; words; natural language processing
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a novel document clustering algorithm that represents each document as a time series of words. Document clustering is important because it lets us group documents according to chosen criteria, especially now that a large number of articles is available. Representing a document as a time series rather than with the vector model lets us compute the distance between documents with a different algorithm: dynamic time warping. This representation, together with dynamic time warping, is the foundation for computing document similarity and for clustering the documents. The clustering algorithm used is hierarchical clustering. The method is applied to the named entities and to the parts of speech of the words that compose the documents. The test data is the Reuters corpus of newspaper articles.
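The pipeline the abstract describes (sequence representation, dynamic time warping distance, hierarchical clustering) can be illustrated with a minimal sketch. This record does not say how words are mapped to series values, which local cost is used, or which linkage criterion is applied, so the 0/1 mismatch cost, the average linkage, the toy part-of-speech sequences, and the function names `dtw_distance` and `hierarchical_clusters` are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Dynamic time warping between two symbol sequences.

    Assumption: the local cost is 0 for equal symbols and 1 otherwise.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] holds the cheapest warping cost aligning a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def hierarchical_clusters(series, k):
    """Agglomerative clustering on DTW distances until k clusters remain.

    Assumption: average linkage; the source only says "hierarchical clustering".
    """
    dist = {(i, j): dtw_distance(series[i], series[j])
            for i, j in combinations(range(len(series)), 2)}

    def linkage(c1, c2):
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(series))]
    while len(clusters) > k:
        # Pick the closest pair of clusters; x < y by construction.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical documents, each reduced to its sequence of part-of-speech tags.
docs = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"],
    ["VERB", "PRON", "ADV"],
    ["VERB", "PRON", "ADV", "ADV"],
]

print(hierarchical_clusters(docs, 2))  # → [[0, 1], [2, 3]]
```

Because dynamic time warping aligns sequences of different lengths, documents need not be padded or truncated to a common size, which is one practical difference from a fixed-length vector model.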
Pages: 128-133
Number of pages: 6