Text documents streams with improved incremental similarity

Cited by: 0
Authors
Rui Portocarrero Sarmento
Douglas O. Cardoso
Kemmily Dearo
Pavel Brazdil
João Gama
Affiliations
[1] FEUP, PRODEI
[2] University of Porto, LIAAD
[3] CEFET-RJ
[4] University of Twente
[5] INESC TEC
Keywords
Incremental sparse TF-IDF; Data streams; Text streams; Incremental similarity; Text documents networks
DOI
Not available
Abstract
The research community has made a significant effort to address the problem of organizing documentation with the help of Information Retrieval methods. In this paper, we present several experiments that apply stream analysis methods to streams of text documents. We also present possible architectures for Text Document Stream Organization that use incremental algorithms such as Incremental Sparse TF-IDF and Incremental Similarity. Our results show that this architecture achieves significant improvements in the efficiency of grouping similar documents. These improvements matter because the analysis of large amounts of text is widely recognized as a high-dimensional and complex problem in data analysis.
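The abstract names Incremental Sparse TF-IDF and Incremental Similarity but does not define them here. The following is a minimal Python sketch of the general idea. It is not the authors' implementation. It rests on several assumptions. Each arriving document updates sparse term counts and document frequencies in time proportional to its own length, with no refit over the whole corpus. TF-IDF weights are computed on demand from the current statistics. Cosine similarity is computed only against documents that share at least one term with the query document, and this pruning is exact because pairs with no common term have a dot product of zero. The class name `IncrementalTfidf`, the smoothed IDF formula, and the inverted-index pruning are illustrative choices. None of them comes from the paper.

```python
import math
from collections import Counter, defaultdict


class IncrementalTfidf:
    """Hypothetical sparse TF-IDF index over a growing document stream (sketch)."""

    def __init__(self):
        self.n_docs = 0                   # number of documents seen so far
        self.df = Counter()               # term -> number of documents containing it
        self.tf = {}                      # doc_id -> Counter of raw term counts
        self.postings = defaultdict(set)  # inverted index: term -> ids of documents

    def add(self, doc_id, tokens):
        """Ingest one document; cost is proportional to its number of distinct terms.

        doc_id is assumed to be new: re-adding an id would double-count df.
        """
        counts = Counter(tokens)
        self.tf[doc_id] = counts
        self.n_docs += 1
        for term in counts:
            self.df[term] += 1
            self.postings[term].add(doc_id)

    def vector(self, doc_id):
        """Sparse TF-IDF vector (dict term -> weight) under the current statistics."""
        counts = self.tf[doc_id]
        # Assumed smoothed IDF, log((1 + N) / (1 + df)) + 1, which keeps weights positive.
        return {
            t: c * (math.log((1 + self.n_docs) / (1 + self.df[t])) + 1)
            for t, c in counts.items()
        }

    def similar_to(self, doc_id):
        """Cosine similarity of doc_id to every stored document sharing a term with it."""
        v = self.vector(doc_id)
        # Only documents in the postings of v's terms can have a non-zero dot product.
        candidates = set().union(*(self.postings[t] for t in v)) - {doc_id}
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        result = {}
        for other in candidates:
            u = self.vector(other)
            dot = sum(w * u[t] for t, w in v.items() if t in u)
            norm_u = math.sqrt(sum(w * w for w in u.values()))
            result[other] = dot / (norm_v * norm_u)
        return result


if __name__ == "__main__":
    index = IncrementalTfidf()
    index.add("d1", "stream text mining".split())
    index.add("d2", "text stream clustering".split())
    index.add("d3", "graph networks".split())
    # "d3" shares no terms with "d2", so only "d1" appears in the result.
    print(index.similar_to("d2"))
```

Each arrival changes N and therefore changes every IDF value slightly. For that reason, the sketch stores only raw counts and derives weights at query time rather than caching weighted vectors that would go stale. A production system might instead cache vectors and refresh them periodically, trading exactness for speed.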
Related papers
  • [41] Classification of text documents
    Li, YH
    Jain, AK
    FOURTEENTH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1 AND 2, 1998: 1295-1297
  • [42] Rating news documents for similarity
    Watters, C
    Wang, H
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 2000, 51 (09): 793-804
  • [43] Development of Optimized Linguistic Technique Using Similarity Score on BERT Model in Summarizing Hindi Text Documents
    Rajeshwari, S. B.
    Kallimani, Jagadish S.
    INNOVATIVE DATA COMMUNICATION TECHNOLOGIES AND APPLICATION, ICIDCA 2021, 2022, 96: 767-781
  • [44] Rating news documents for similarity
    Watters, Carolyn
    John Wiley and Sons Inc., 2000, 51
  • [45] Syntactic similarity of web documents
    Pereira, AR
    Ziviani, N
    FIRST LATIN AMERICAN WEB CONGRESS, PROCEEDINGS, 2003: 194-200
  • [46] Incremental constraint checking for XML documents
    Abrao, MA
    Bouchou, B
    Ferrari, MH
    Laurent, D
    Musicante, MA
    DATABASE AND XML TECHNOLOGIES, PROCEEDINGS, 2004, 3186: 112-127
  • [47] Updates and incremental validation of XML documents
    Bouchou, A
    Alves, MHF
    DATABASE PROGRAMMING LANGUAGES, 2004, 2921: 216-232
  • [48] Efficient incremental validation of XML documents
    Barbosa, D
    Mendelzon, AO
    Libkin, L
    Mignet, L
    Arenas, M
    20TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2004: 671-682
  • [49] Incremental modelling for compositional data streams
    Wei, Yuan
    Wang, Huiwen
    Wang, Shanshan
    Saporta, Gilbert
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2019, 48 (08): 2229-2243
  • [50] An Incremental Classifier from Data Streams
    Pratama, Mahardhika
    Anavatti, Sreenatha G.
    Lughofer, Edwin
    ARTIFICIAL INTELLIGENCE: METHODS AND APPLICATIONS, 2014, 8445: 15-28