A Statistics-Based Semantic Relation Analysis Approach for Document Clustering

Cited by: 0
Authors
Cheng, Xin [1]
Miao, Duoqian [1]
Wang, Lei [1]
Affiliation
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 200092, Peoples R China
Keywords
WORDNET
DOI
10.1007/978-3-319-11740-9_31
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Document clustering is a widely researched topic in the area of machine learning. A number of approaches have been proposed to represent and cluster documents. One of the recent trends in document clustering research is to incorporate semantic information into document representation. In this paper, we introduce a novel technique for capturing robust and reliable semantic information from term-term co-occurrence statistics. Firstly, we propose a novel method to evaluate the explicit semantic relation between terms from their co-occurrence information. Then the underlying semantic relation between terms is also captured by their interaction with other terms. Lastly, these two complementary semantic relations are integrated to capture the complete semantic information from the original documents. Experimental results show that clustering performance improves significantly by enriching document representation with the semantic information.
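The abstract does not give the paper's formulas, so the following is only an illustrative sketch of the three steps it names, under stated assumptions: the explicit relation is approximated by a Jaccard-style co-occurrence ratio, the underlying relation by the cosine similarity of two terms' explicit-relation profiles over the remaining terms, and the integration by a linear mix with a hypothetical weight `alpha`. All function names and the toy documents are invented for illustration and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def explicit_relation(docs):
    """Jaccard-style co-occurrence score per term pair (assumed stand-in measure)."""
    vocab = sorted({t for d in docs for t in d})
    df = {t: 0 for t in vocab}  # number of documents containing each term
    co = {}                     # (a, b) -> number of documents containing both
    for d in docs:
        terms = sorted(set(d))
        for t in terms:
            df[t] += 1
        for a, b in combinations(terms, 2):
            co[(a, b)] = co.get((a, b), 0) + 1
    rel = {t: {u: 1.0 if t == u else 0.0 for u in vocab} for t in vocab}
    for (a, b), n in co.items():
        rel[a][b] = rel[b][a] = n / (df[a] + df[b] - n)
    return vocab, rel


def implicit_relation(vocab, rel):
    """Cosine similarity of two terms' explicit profiles over the other terms."""
    imp = {t: {u: 1.0 if t == u else 0.0 for u in vocab} for t in vocab}
    for a, b in combinations(vocab, 2):
        others = [w for w in vocab if w not in (a, b)]
        dot = sum(rel[a][w] * rel[b][w] for w in others)
        na = sqrt(sum(rel[a][w] ** 2 for w in others))
        nb = sqrt(sum(rel[b][w] ** 2 for w in others))
        if na and nb:
            imp[a][b] = imp[b][a] = dot / (na * nb)
    return imp


def combined_relation(vocab, rel, imp, alpha=0.5):
    """Linear mix of the two relations; alpha is a hypothetical weight."""
    return {t: {u: alpha * rel[t][u] + (1 - alpha) * imp[t][u] for u in vocab}
            for t in vocab}


def enrich(doc, vocab, sem):
    """Spread each term's count in doc onto semantically related terms."""
    counts = {t: doc.count(t) for t in vocab}
    return [sum(counts[u] * sem[u][t] for u in vocab) for t in vocab]


# Toy corpus: "car" and "automobile" never co-occur but share neighbours.
docs = [["car", "engine"], ["automobile", "engine"],
        ["car", "wheel"], ["automobile", "wheel"], ["banana", "fruit"]]
vocab, rel = explicit_relation(docs)
imp = implicit_relation(vocab, rel)
sem = combined_relation(vocab, rel, imp)
enriched = enrich(docs[0], vocab, sem)
```

In this toy corpus the explicit score for "car" and "automobile" is zero because they never appear in the same document, while the profile-based score links them through their shared neighbours "engine" and "wheel". The enriched vector for the first document therefore carries weight on "automobile", which is the kind of effect the abstract attributes to combining the two relations.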
Pages: 332-342
Page count: 11
Related papers
50 in total
  • [1] A statistics-based method for video semantic analysis
    Wei, Wei
    Yue, Zhen-Xia
    Huang, Min
    PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 1620 - 1625
  • [2] Document clustering based on semantic smoothing approach
    Liu, Yubao
    Cai, Jiarong
    Yin, Jian
    Huang, Zhilan
    ADVANCES IN INTELLIGENT WEB MASTERING, 2007, 43 : 217 - +
  • [3] A statistics-based approach to control the quality of subclusters in incremental gravitational clustering
    Chen, CY
    Hwang, SC
    Oyang, YJ
    PATTERN RECOGNITION, 2005, 38 (12) : 2256 - 2269
  • [4] A Statistics-Based Semantic Textual Entailment System
    Pakray, Partha
    Barman, Utsab
    Bandyopadhyay, Sivaji
    Gelbukh, Alexander
    ADVANCES IN ARTIFICIAL INTELLIGENCE, PT I, 2011, 7094 : 267 - +
  • [5] WordNet and Semantic Similarity based Approach for Document Clustering
    Desai, Sneha S.
    Laxminarayana, J. A.
    2016 INTERNATIONAL CONFERENCE ON COMPUTATION SYSTEM AND INFORMATION TECHNOLOGY FOR SUSTAINABLE SOLUTIONS (CSITSS), 2016, : 312 - 317
  • [6] Clustering in Wireless Propagation Channel with a Statistics-based Framework
    Li, Yupeng
    Zhang, Jianhua
    Ma, Zhanyu
    2018 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE (WCNC), 2018,
  • [7] A statistics-based approach to binary image registration with uncertainty analysis
    Simonson, Katherine M.
    Drescher, Steven M., Jr.
    Tanner, Franklin R.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2007, 29 (01) : 112 - 125
  • [8] A STATISTICS-BASED APPROACH FOR SINGLE IMAGE DEHAZING
    Bui, Trung Minh
    Kim, Wonha
    2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018,
  • [9] Semantic document clustering based on ontology
    Wang, Ying
    Peng, Tao
    Zuo, Wanli
    He, Fengling
    Wang, Dong
    Journal of Computational Information Systems, 2009, 5 (03): : 1437 - 1444
  • [10] A Survey of Document Clustering using Semantic Approach
    Saiyad, Nagma Y.
    Prajapati, Harshadkumar B.
    Dabhi, Vipul K.
    2016 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, AND OPTIMIZATION TECHNIQUES (ICEEOT), 2016, : 2555 - 2562