Text documents streams with improved incremental similarity

Cited by: 0
Authors
Rui Portocarrero Sarmento
Douglas O. Cardoso
Kemmily Dearo
Pavel Brazdil
João Gama
Affiliations
[1] FEUP, PRODEI
[2] University of Porto, LIAAD
[3] CEFET-RJ
[4] University of Twente
[5] INESC TEC
Source
Social Network Analysis and Mining, 2021, 11 (01)
Keywords
Incremental sparse TF-IDF; Data streams; Text streams; Incremental similarity; Text documents networks
DOI
Not available
Abstract
The research community has devoted significant effort to methods for organizing documents with the help of Information Retrieval techniques. In this paper, we present several experiments that use stream analysis methods to explore streams of text documents. We also present possible architectures for Text Document Stream Organization, built on incremental algorithms such as Incremental Sparse TF-IDF and Incremental Similarity. Our results show that this architecture achieves significant improvements in the efficiency of grouping similar documents. These improvements matter because the analysis of large volumes of text is widely recognized as a high-dimensional and complex problem in data analysis.
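As a rough illustration of what incremental sparse TF-IDF and incremental similarity can look like on a document stream, here is a minimal Python sketch. It is not the authors' algorithm: the class name `IncrementalTfidf`, the whitespace tokenizer, the smoothed IDF formula, and the inverted-index candidate selection are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


class IncrementalTfidf:
    """Hypothetical incremental sparse TF-IDF index (illustrative only)."""

    def __init__(self):
        self.n_docs = 0
        self.doc_freq = defaultdict(int)   # term -> number of documents containing it
        self.postings = defaultdict(set)   # term -> ids of documents containing it
        self.term_counts = {}              # doc_id -> Counter of raw term counts

    def add(self, doc_id, text):
        """Ingest one document; only the statistics of its own terms change."""
        counts = Counter(text.lower().split())
        self.term_counts[doc_id] = counts
        self.n_docs += 1
        for term in counts:
            self.doc_freq[term] += 1
            self.postings[term].add(doc_id)

    def vector(self, doc_id):
        """Sparse TF-IDF vector built from the statistics seen so far."""
        counts = self.term_counts[doc_id]
        total = sum(counts.values())
        return {
            term: (count / total)
            * (math.log((1 + self.n_docs) / (1 + self.doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }

    def similar(self, doc_id):
        """Similarity to documents sharing at least one term with doc_id."""
        candidates = set()
        for term in self.term_counts[doc_id]:
            candidates |= self.postings[term]
        candidates.discard(doc_id)
        query = self.vector(doc_id)
        return {other: cosine(query, self.vector(other)) for other in candidates}


# Usage: feed documents one at a time, then query the most recent one.
stream = IncrementalTfidf()
stream.add("d1", "data streams of text")
stream.add("d2", "text streams clustering")
print(stream.similar("d2"))
```

The sketch stores only nonzero term weights and compares a new document only against documents that share a term with it, which is one simple way to avoid recomputing a full similarity matrix on every arrival.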
Related papers
50 in total
  • [1] Text documents streams with improved incremental similarity
    Sarmento, Rui Portocarrero
    Cardoso, Douglas O.
    Dearo, Kemmily
    Brazdil, Pavel
    Gama, Joao
    SOCIAL NETWORK ANALYSIS AND MINING, 2021, 11 (01)
  • [2] CFTDISM: Clustering Financial Text Documents Using Improved Similarity Measure
    Srikanth, Panigrahi
    Deverapalli, Dharmaiah
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2017, : 865 - 868
  • [3] A Comparison of Similarity Measures for Text Documents
    Hariharan, Shanmugasundaram
    Srinivasan, Rengaramanujam
    JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2008, 7 (01) : 1 - 8
  • [4] Text mining: identification of similarity of text documents using hybrid similarity model
    K. M. Shiva Prasad
    Iran Journal of Computer Science, 2023, 6 (2) : 123 - 135
  • [5] Continuous Similarity Search for Dynamic Text Streams
    Tsuchida, Yuma
    Kubo, Kohei
    Koga, Hisashi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (12) : 2026 - 2035
  • [6] Incremental autoencoders for text streams clustering in social networks
    Rekik, Amal
    Jamoussi, Salma
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2021, 27 (11) : 1203 - 1221
  • [7] Improved VSM for Incremental Text Classification
    Yang, Zhen
    Lei, Jianjun
    Wang, Jian
    Zhang, Xing
    Guo, Jun
    INTERNATIONAL ELECTRONIC CONFERENCE ON COMPUTER SCIENCE, 2008, 1060 : 369 - +
  • [8] Similarity Detection between Turkish Text Documents with Distance Metrics
    Kaya Keles, Mumine
    Ozel, Selma Ayse
    2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2017, : 316 - 321
  • [9] Ontology-based similarity between text documents on manifold
    Wen, Guihua
    Jiang, Lijun
    Shadbolt, Nigel R.
    SEMANTIC WEB - ASWC 2006, PROCEEDINGS, 2006, 4185 : 113 - 125
  • [10] Similarity retrieval of web documents considering both text and style
    Chen, CC
    Chung, YC
    Chien, CC
    Lee, C
    ADVANCED WEB TECHNOLOGIES AND APPLICATIONS, 2004, 3007 : 620 - 629