Text documents streams with improved incremental similarity

Authors
Rui Portocarrero Sarmento
Douglas O. Cardoso
Kemmily Dearo
Pavel Brazdil
João Gama
Affiliations
[1] FEUP, PRODEI
[2] University of Porto, LIAAD
[3] CEFET-RJ
[4] University of Twente
[5] INESC TEC
Keywords
Incremental sparse TF-IDF; Data streams; Text streams; Incremental similarity; Text documents networks
Abstract
The research community has devoted significant effort to methods for organizing documentation with the help of Information Retrieval techniques. In this paper, we present several experiments that apply stream analysis methods to streams of text documents. We also present possible architectures for Text Document Stream Organization that use incremental algorithms such as Incremental Sparse TF-IDF and Incremental Similarity. Our results show that this architecture achieves significant improvements in the efficiency of grouping similar documents. These improvements matter because analyzing large amounts of text is widely recognized as a high-dimensional and complex problem in data analysis.
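
As a rough illustration of the two incremental building blocks named in the abstract, the following Python sketch keeps sparse term counts and document frequencies that are updated as each document arrives, and computes cosine similarity between the resulting TF-IDF vectors on demand. It is a minimal sketch under stated assumptions, not the authors' implementation: the class name `IncrementalTfidf`, the whitespace tokenizer, the smoothed IDF formula, and the `cosine` helper are all hypothetical choices made for this example.

```python
import math
from collections import Counter


class IncrementalTfidf:
    """Hypothetical helper (not from the paper): sparse TF-IDF over a growing stream."""

    def __init__(self):
        self.n_docs = 0
        self.doc_freq = Counter()  # term -> number of documents containing it
        self.term_counts = {}      # doc_id -> sparse Counter of raw term counts

    def add(self, doc_id, text):
        """Register one new document; only its own terms are touched."""
        counts = Counter(text.lower().split())  # assumed tokenizer: whitespace split
        self.term_counts[doc_id] = counts
        self.n_docs += 1
        self.doc_freq.update(counts.keys())     # each distinct term counted once

    def vector(self, doc_id):
        """Sparse TF-IDF vector using the statistics seen so far."""
        counts = self.term_counts[doc_id]
        total = sum(counts.values())
        if total == 0:
            return {}
        return {
            term: (count / total)
            # Assumed smoothed IDF; the paper may define it differently.
            * (math.log((1 + self.n_docs) / (1 + self.doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Usage: documents arrive one at a time; similarities reflect the stream so far.
index = IncrementalTfidf()
index.add("d1", "data streams of text")
index.add("d2", "text streams and data")
index.add("d3", "cooking pasta recipes")
similar = cosine(index.vector("d1"), index.vector("d2"))
unrelated = cosine(index.vector("d1"), index.vector("d3"))
```

Recomputing vectors on demand keeps the sketch short. A streaming system along the lines the abstract describes would presumably update only the similarities affected by each new document rather than recomputing every pair.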