A Statistics-Based Semantic Relation Analysis Approach for Document Clustering

Cited by: 0
Authors
Cheng, Xin [1]
Miao, Duoqian [1]
Wang, Lei [1]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 200092, Peoples R China
Keywords
WordNet
DOI
10.1007/978-3-319-11740-9_31
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Document clustering is a widely researched topic in machine learning, and a number of approaches have been proposed to represent and cluster documents. One recent trend in document clustering research is to incorporate semantic information into the document representation. In this paper, we introduce a novel technique for capturing robust and reliable semantic information from term-term co-occurrence statistics. First, we propose a novel method to evaluate the explicit semantic relation between terms from their co-occurrence information. Then, the underlying (implicit) semantic relation between terms is captured through their interaction with other terms. Finally, these two complementary semantic relations are integrated to capture the complete semantic information in the original documents. Experimental results show that clustering performance improves significantly when the document representation is enriched with this semantic information.
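The three steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption chosen for illustration, not the authors' method: the explicit relation is a Jaccard-style co-occurrence score, the implicit relation is the cosine similarity of two terms' explicit-relation profiles over the remaining terms, the two are mixed with a hypothetical weight `alpha`, and `enrich` is a hypothetical helper that spreads term frequency to related terms.

```python
from itertools import combinations
from math import sqrt

def explicit_relation(docs):
    """Jaccard-style co-occurrence score per term pair (assumed measure)."""
    vocab = sorted({t for d in docs for t in d})
    df = {t: 0 for t in vocab}   # number of documents containing each term
    co = {}                      # number of documents containing both terms
    for d in docs:
        terms = sorted(set(d))
        for t in terms:
            df[t] += 1
        for a, b in combinations(terms, 2):
            co[(a, b)] = co.get((a, b), 0) + 1
    rel = {t: {} for t in vocab}
    for (a, b), n in co.items():
        score = n / (df[a] + df[b] - n)
        rel[a][b] = score
        rel[b][a] = score
    return vocab, rel

def implicit_relation(vocab, rel):
    """Cosine similarity of two terms' explicit relations to all other terms."""
    imp = {t: {} for t in vocab}
    for a, b in combinations(vocab, 2):
        others = [t for t in vocab if t not in (a, b)]
        va = [rel[a].get(t, 0.0) for t in others]
        vb = [rel[b].get(t, 0.0) for t in others]
        dot = sum(x * y for x, y in zip(va, vb))
        norm = sqrt(sum(x * x for x in va)) * sqrt(sum(y * y for y in vb))
        score = dot / norm if norm else 0.0
        imp[a][b] = score
        imp[b][a] = score
    return imp

def combined_relation(docs, alpha=0.5):
    """Weighted mix of explicit and implicit relations (alpha is hypothetical)."""
    vocab, rel = explicit_relation(docs)
    imp = implicit_relation(vocab, rel)
    comb = {
        a: {b: alpha * rel[a].get(b, 0.0) + (1 - alpha) * imp[a].get(b, 0.0)
            for b in vocab if b != a}
        for a in vocab
    }
    return vocab, comb

def enrich(doc, vocab, comb):
    """Add relation-weighted frequency of related terms to a term-frequency vector."""
    tf = {t: doc.count(t) for t in vocab}
    return {t: tf[t] + sum(comb[t][u] * tf[u] for u in vocab if u != t)
            for t in vocab}

# Toy corpus: "car" and "automobile" never co-occur but share neighbours.
docs = [
    ["car", "engine", "wheel"],
    ["automobile", "engine", "wheel"],
    ["apple", "fruit"],
]
vocab, comb = combined_relation(docs)
enriched = enrich(docs[0], vocab, comb)
```

In this toy corpus the explicit score between "car" and "automobile" is zero because they never share a document, yet the implicit score links them through "engine" and "wheel", so the enriched vector of the first document gains weight on "automobile" while "apple" stays at zero. The enriched vectors could then be passed to any standard clustering algorithm.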
Pages: 332-342
Number of pages: 11