A Statistics-Based Semantic Relation Analysis Approach for Document Clustering

Cited by: 0
Authors
Cheng, Xin [1]
Miao, Duoqian [1]
Wang, Lei [1]
Affiliation
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 200092, Peoples R China
Keywords
WORDNET
DOI
10.1007/978-3-319-11740-9_31
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Document clustering is a widely researched topic in machine learning. A number of approaches have been proposed to represent and cluster documents. One recent trend in document clustering research is to incorporate semantic information into the document representation. In this paper, we introduce a novel technique for capturing robust and reliable semantic information from term-term co-occurrence statistics. First, we propose a novel method to evaluate the explicit semantic relation between terms from their co-occurrence information. Then, the underlying semantic relation between terms is captured through their interaction with other terms. Finally, these two complementary semantic relations are integrated to capture the complete semantic information in the original documents. Experimental results show that clustering performance improves significantly when the document representation is enriched with this semantic information.
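The abstract does not give the formulas, so the following is only a minimal sketch of the three steps it names, under stated assumptions: the explicit relation is approximated here by the Jaccard overlap of the document sets in which two terms occur, the underlying relation by the cosine similarity of two terms' explicit-relation profiles over the other terms, and the integration by a weighted sum. The function names, the choice of Jaccard and cosine, and the `alpha` weight are all illustrative and are not taken from the paper.

```python
import math
from itertools import combinations


def explicit_relation(docs):
    """Explicit relation from co-occurrence (assumed measure: Jaccard overlap
    of the sets of documents in which each term appears)."""
    postings = {}
    for doc_id, doc in enumerate(docs):
        for term in set(doc):
            postings.setdefault(term, set()).add(doc_id)
    terms = sorted(postings)
    rel = {t: {} for t in terms}
    for a, b in combinations(terms, 2):
        shared = len(postings[a] & postings[b])
        if shared:
            score = shared / len(postings[a] | postings[b])
            rel[a][b] = score
            rel[b][a] = score
    return terms, rel


def underlying_relation(terms, rel):
    """Underlying relation via third terms (assumed measure: cosine similarity
    of the two terms' explicit-relation profiles, excluding each other)."""
    und = {t: {} for t in terms}
    for a, b in combinations(terms, 2):
        common = set(rel[a]) & set(rel[b])  # third terms linked to both a and b
        dot = sum(rel[a][t] * rel[b][t] for t in common)
        if dot > 0:
            norm_a = math.sqrt(sum(v * v for t, v in rel[a].items() if t != b))
            norm_b = math.sqrt(sum(v * v for t, v in rel[b].items() if t != a))
            score = dot / (norm_a * norm_b)
            und[a][b] = score
            und[b][a] = score
    return und


def combined_relation(terms, rel, und, alpha=0.5):
    """Weighted sum of the two relations (alpha is a hypothetical parameter)."""
    comb = {t: {} for t in terms}
    for a in terms:
        for b in set(rel[a]) | set(und[a]):
            comb[a][b] = alpha * rel[a].get(b, 0.0) + (1 - alpha) * und[a].get(b, 0.0)
    return comb


# Toy corpus: "car" and "automobile" never co-occur but both occur with "engine".
docs = [["car", "engine"], ["automobile", "engine"]]
terms, rel = explicit_relation(docs)
und = underlying_relation(terms, rel)
comb = combined_relation(terms, rel, und)
print(comb["car"])
```

In this toy corpus the two terms have no explicit relation, yet they receive a nonzero combined score through their shared neighbour, which is the kind of complementary signal the abstract describes. The resulting term-term scores could then be used to enrich a bag-of-words document vector before clustering; how the paper does this is not stated in the abstract.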
Pages: 332-342
Number of pages: 11
Related papers
50 in total
  • [21] Web document clustering using semantic link analysis
    Arch-int, Somjit
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE FOR MODELLING, CONTROL & AUTOMATION JOINTLY WITH INTERNATIONAL CONFERENCE ON INTELLIGENT AGENTS, WEB TECHNOLOGIES & INTERNET COMMERCE, VOL 2, PROCEEDINGS, 2006, : 13 - 18
  • [22] STATISTICS-BASED GAS SENSOR
    Khan, Shakir-ul Haque
    Banerjee, Aishwaryadev
    Broadbent, Samuel
    Bulbul, Ashrafuzzaman
    Simmons, Michelle Camilla
    Kim, Kyeong Heon
    Mastrangelo, Carlos H.
    Looper, Ryan
    Kim, Hanseup
    2019 IEEE 32ND INTERNATIONAL CONFERENCE ON MICRO ELECTRO MECHANICAL SYSTEMS (MEMS), 2019, : 137 - 140
  • [23] Semantic smoothing for model-based document clustering
    Zhang, Xiaodan
    Zhou, Xiaohua
    Hu, Xiaohua
    ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2006, : 1193 - +
  • [24] A link-based approach to semantic relation analysis
    Cheng, Xin
    Miao, Duoqian
    Wang, Can
    NEUROCOMPUTING, 2015, 154 : 127 - 138
  • [25] A Survey on Semantic Document Clustering
    Naik, Maitri P.
    Prajapati, Harshadkumar B.
    Dabhi, Vipul K.
    2015 IEEE INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND COMMUNICATION TECHNOLOGIES, 2015,
  • [26] A Cyclic Statistics-Based Parametric Approach to Time-Delay Estimation
    Yan Zhang
    Chunmei Wang
    Shuxun Wang
    Circuits, Systems and Signal Processing, 2002, 21 : 535 - 545
  • [27] A Statistics-based Approach of Contextualization for Adverse Drug Events Detection and Prevention
    Chazard, Emmanuel
    Bernonville, Stephanie
    Ficheur, Gregoire
    Beuscart, Regis
    QUALITY OF LIFE THROUGH QUALITY OF INFORMATION, 2012, 180 : 766 - 770
  • [28] Design and evaluation of a parallel document clustering algorithm based on hierarchical latent semantic analysis
    Seshadri, Karthick
    Iyer, K. Viswanathan
    Shalinie, Mercy S.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2019, 31 (13):
  • [29] Statistics-based noise analysis for vibration-based damage identification
    Yu, L.
    Yin, T.
    Zhu, H. P.
    PROCEEDINGS OF THE 8TH BIENNIAL CONFERENCE ON ENGINEERING SYSTEMS DESIGN AND ANALYSIS, VOL 3, 2006, : 533 - 539
  • [30] Statistics-based Workload Modeling for MapReduce
    Yang, Hailong
    Luan, Zhongzhi
    Li, Wenjun
    Qian, Depei
    Guan, Gang
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012, : 2043 - 2051