A Statistics-Based Semantic Relation Analysis Approach for Document Clustering

Times cited: 0
Authors
Cheng, Xin [1]
Miao, Duoqian [1]
Wang, Lei [1]
Affiliation
[1] Department of Computer Science and Technology, Tongji University, Shanghai 200092, People's Republic of China
Keywords
WordNet
DOI
10.1007/978-3-319-11740-9_31
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Document clustering is a widely researched topic in machine learning. A number of approaches have been proposed to represent and cluster documents. One recent trend in document clustering research is to incorporate semantic information into the document representation. In this paper, we introduce a novel technique for capturing robust and reliable semantic information from term-term co-occurrence statistics. First, we propose a novel method to evaluate the explicit semantic relation between terms from their co-occurrence information. Then, the underlying semantic relation between terms is captured through their interactions with other terms. Finally, these two complementary semantic relations are integrated to capture the complete semantic information in the original documents. Experimental results show that enriching the document representation with this semantic information significantly improves clustering performance.
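The abstract describes the approach only at a high level, so the following is a minimal illustrative sketch of the general idea, not the authors' method. Every formula in it is an assumption: the explicit relation is taken as document-level co-occurrence normalized by the square root of the two terms' document frequencies; the underlying relation is taken as the cosine similarity of two terms' explicit-relation profiles over all third terms; the two are blended linearly with a hypothetical weight `alpha`; and a document is enriched by multiplying its term-frequency vector by the combined relation matrix. All function names (`explicit_relation`, `implicit_relation`, `combined_relation`, `enrich`) are hypothetical.

```python
import math
from itertools import combinations


def build_vocab(docs):
    """Sorted list of distinct terms across all tokenized documents."""
    return sorted({t for doc in docs for t in doc})


def explicit_relation(docs, vocab):
    """Assumed explicit relation: count(a, b) / sqrt(df(a) * df(b)), where
    count(a, b) is the number of documents containing both terms."""
    idx = {t: i for i, t in enumerate(vocab)}
    n = len(vocab)
    df = [0] * n
    co = [[0] * n for _ in range(n)]
    for doc in docs:
        terms = sorted({idx[t] for t in doc})
        for i in terms:
            df[i] += 1
        for i, j in combinations(terms, 2):
            co[i][j] += 1
            co[j][i] += 1
    rel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        rel[i][i] = 1.0
        for j in range(n):
            if i != j and co[i][j]:
                rel[i][j] = co[i][j] / math.sqrt(df[i] * df[j])
    return rel


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def implicit_relation(rel):
    """Assumed underlying relation: terms i and j are related if they relate
    to the same third terms k (k != i, j), even if they never co-occur."""
    n = len(rel)
    imp = [[0.0] * n for _ in range(n)]
    for i in range(n):
        imp[i][i] = 1.0
        for j in range(i + 1, n):
            ks = [k for k in range(n) if k not in (i, j)]
            s = cosine([rel[i][k] for k in ks], [rel[j][k] for k in ks])
            imp[i][j] = imp[j][i] = s
    return imp


def combined_relation(exp, imp, alpha=0.5):
    """Hypothetical linear blend of the two complementary relations."""
    n = len(exp)
    return [[alpha * exp[i][j] + (1 - alpha) * imp[i][j] for j in range(n)]
            for i in range(n)]


def enrich(doc, vocab, sem):
    """Term-frequency vector times the relation matrix, so a document also
    gains weight on terms related to the ones it actually contains."""
    tf = [doc.count(t) for t in vocab]
    n = len(vocab)
    return [sum(tf[i] * sem[i][j] for i in range(n)) for j in range(n)]


# Toy corpus: "car" and "automobile" never co-occur but share contexts.
docs = [
    ["car", "engine", "wheel"],
    ["automobile", "engine", "wheel"],
    ["apple", "banana", "fruit"],
    ["banana", "fruit", "juice"],
]
vocab = build_vocab(docs)
exp = explicit_relation(docs, vocab)
imp = implicit_relation(exp)
sem = combined_relation(exp, imp)
i, j = vocab.index("car"), vocab.index("automobile")
print(exp[i][j], round(imp[i][j], 3))  # → 0.0 1.0
```

In this toy corpus the explicit relation alone cannot link "car" and "automobile", while the second-order relation does, which is the intuition behind treating the two relations as complementary. Clustering would then run on the `enrich` vectors instead of raw term frequencies.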
Pages: 332-342
Number of pages: 11
Related papers
50 in total
  • [31] Adaptive statistics-based image enhancement
    Basallo, EG
    Looney, CG
    PROCEEDINGS OF THE ISCA 12TH INTERNATIONAL CONFERENCE INTELLIGENT AND ADAPTIVE SYSTEMS AND SOFTWARE ENGINEERING, 2003: 67-70
  • [32] Latent Semantic Analysis Approach for Document Summarization Based on Word Embeddings
    Al-Sabahi, Kamal
    Zhang Zuping
    Kang, Yang
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2019, 13 (01): 254-276
  • [33] Cost-Effectiveness of a Statistics-Based Approach to Developmental Mathematics Education
    Finster, Matthew
    Feldman, Jill
    JOURNAL OF COLLEGE STUDENT RETENTION-RESEARCH THEORY & PRACTICE, 2023, 25 (03): 533-553
  • [34] Multiscale modeling of polymer materials using a statistics-based micromechanics approach
    Valavala, P. K.
    Clancy, T. C.
    Odegard, G. M.
    Gates, T. S.
    Aifantis, E. C.
    ACTA MATERIALIA, 2009, 57 (02): 525-532
  • [35] Cyclic statistics-based parametric approach to time-delay estimation
    Zhang, Y
    Wang, CM
    Wang, SX
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2002, 21 (06): 535-545
  • [36] A Combined K-Mean Semantic Approach for the Implicit Document Clustering
    Rehna, R. S.
    PROCEEDINGS OF SECOND INTERNATIONAL CONFERENCE ON SUSTAINABLE EXPERT SYSTEMS (ICSES 2021), 2022, 351: 535-544
  • [37] Statistics-based Diagnostics of Brain Tumors
    Marcon, Petr
    Bartusek, Karel
    Dohnal, Premysl
    Siruckova, Katerina
    INTERNATIONAL INTERDISCIPLINARY PHD WORKSHOP 2016, 2016: 92-96
  • [38] Statistics-based research - a pig in a poke?
    Penston, James
    JOURNAL OF EVALUATION IN CLINICAL PRACTICE, 2011, 17 (05): 862-867
  • [39] Statistics-based LINC Amplifier Calibration
    Huang, Xinping
    Caron, Mario
    2012 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 2012), 2012: 1247-1250
  • [40] On the Optimality of Sufficient Statistics-Based Quantizers
    Dulek, Berkan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (03): 3567-3573