Text documents streams with improved incremental similarity

Cited by: 0
Authors
Rui Portocarrero Sarmento
Douglas O. Cardoso
Kemmily Dearo
Pavel Brazdil
João Gama
Affiliations
[1] FEUP, PRODEI
[2] University of Porto, LIAAD
[3] CEFET-RJ
[4] University of Twente
[5] INESC TEC
Keywords
Incremental sparse TF-IDF; Data streams; Text streams; Incremental similarity; Text documents networks
DOI
Not available
Abstract
The research community has made a significant effort to develop methods for organizing documentation with the help of Information Retrieval techniques. In this paper, we present several experiments that use stream analysis methods to explore streams of text documents. We also present possible architectures for Text Document Stream Organization built on incremental algorithms such as Incremental Sparse TF-IDF and Incremental Similarity. Our results show that this architecture achieves significant efficiency improvements in grouping similar documents. These improvements matter because the analysis of large amounts of text is widely recognized as a high-dimensional and complex problem in data analysis.
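The abstract names Incremental Sparse TF-IDF and Incremental Similarity without describing them. The following is a minimal Python sketch of the general idea, not the authors' algorithm. It assumes whitespace tokenization, a smoothed IDF of log((1 + N) / (1 + df)) + 1, sparse vectors stored as dicts, and cosine similarity. The class `IncrementalTfidfStream` is hypothetical. Document frequencies are updated as each document arrives, and only the new document is compared with earlier ones; as an assumption of this sketch, previously computed scores are kept and not refreshed when the IDF values drift.

```python
import math
from collections import Counter


class IncrementalTfidfStream:
    """Hypothetical sketch of sparse TF-IDF with incremental similarity.

    Assumption: scores computed for earlier pairs are stored as-is and are
    not recomputed when later documents change the IDF values.
    """

    def __init__(self):
        self.n_docs = 0
        self.df = Counter()     # term -> number of documents containing it
        self.docs = {}          # doc_id -> Counter of raw term frequencies
        self.similarities = {}  # (older_id, newer_id) -> cosine similarity

    def _idf(self, term):
        # Smoothed IDF (assumed formula); always >= 1 because df <= n_docs.
        return math.log((1 + self.n_docs) / (1 + self.df[term])) + 1.0

    def _vector(self, doc_id):
        # Sparse vector: only terms that occur in the document get an entry.
        return {t: tf * self._idf(t) for t, tf in self.docs[doc_id].items()}

    @staticmethod
    def _cosine(u, v):
        if len(u) > len(v):
            u, v = v, u  # iterate over the smaller sparse vector
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    def add(self, doc_id, text):
        """Ingest one document and return its similarity to every earlier document."""
        if doc_id in self.docs:
            raise ValueError(f"duplicate document id: {doc_id!r}")
        counts = Counter(text.lower().split())  # naive whitespace tokenization
        self.docs[doc_id] = counts
        self.n_docs += 1
        self.df.update(counts.keys())  # each distinct term counted once per document

        new_vec = self._vector(doc_id)
        scores = {}
        for other_id in self.docs:
            if other_id == doc_id:
                continue
            # Only pairs involving the new document are computed on arrival.
            score = self._cosine(new_vec, self._vector(other_id))
            self.similarities[(other_id, doc_id)] = score
            scores[other_id] = score
        return scores


if __name__ == "__main__":
    stream = IncrementalTfidfStream()
    stream.add("d1", "incremental similarity for text streams")
    print(stream.add("d2", "text streams and document networks"))
```

Each call to `add` touches only the pairs that include the new document, so the work per arrival grows linearly with the number of stored documents instead of recomputing the full pairwise matrix. Grouping similar documents could then be done by thresholding the stored scores, which is likewise an assumption of this sketch.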
Related papers
50 in total
  • [21] Mapping Texts Into Graphs: An Improved Text Similarity Algorithm
    Liu, Zuoguo
    Chen, Xiaorong
    PROCEEDINGS OF 2012 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2012), 2012: 1357-1361
  • [22] An Improved Incremental Queue Association Rules for Mining Mass Text
    Yang, Wenchuan
    Hui, Lei
    Zhang, Dong
    Fu, Yimin
    2016 INTERNATIONAL SYMPOSIUM ON COMPUTER, CONSUMER AND CONTROL (IS3C), 2016: 447-450
  • [23] Classification of RSS-formatted documents using full text similarity measures
    Wegrzyn-Wolska, K
    Szczepaniak, PS
    WEB ENGINEERING, PROCEEDINGS, 2005, 3579: 400-405
  • [24] Improved Semantic Similarity Method Based on HowNet for Text Clustering
    Nie, Hongmei
    Zhou, Jiaqing
    Guo, Qi
    Huang, Zhiqi
    2018 5TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE 2018), 2018: 266-269
  • [25] Inferring Intra-organizational Collaboration from Cosine Similarity Distributions in Text Documents
    Esteva, Maria
    Bi, Hai
    JCDL 09: PROCEEDINGS OF THE 2009 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES, 2009: 385
  • [26] Feature Extraction for Co-Occurrence-Based Cosine Similarity Score of Text Documents
    Kadhim, Ammar Ismael
    Cheah, Yu-N
    Ahamed, Nurul Hashimah
    Salman, Lubab A.
    2014 IEEE STUDENT CONFERENCE ON RESEARCH AND DEVELOPMENT (SCORED), 2014
  • [27] A method for variable quantization in JPEG for improved text quality in compound documents
    Konstantinides, K
    Tretter, D
    1998 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING - PROCEEDINGS, VOL 2, 1998: 565-568
  • [28] Incremental Similarity and Turbulence
    Barndorff-Nielsen, O. E.
    Hedevang, E.
    Schmiegel, J.
    THEORY OF PROBABILITY AND ITS APPLICATIONS, 2017, 61 (03): 482+
  • [29] An Improved Text Retrieval Algorithm Based on Suffix Tree Similarity Measure
    Huang, Cheng-hui
    Yin, Jian
    Han, Dong
    INFORMATION COMPUTING AND APPLICATIONS, PT 2, 2010, 106: 150+
  • [30] An Improved Text Similarity Algorithm Research for Clinical Decision Support System
    Wang, Wen
    Jiang, Qiaowei
    Lv, Tao
    Guo, Wenming
    Wang, Cong
    PROCEEDINGS OF 2016 4TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (IEEE CCIS 2016), 2016: 155-159