Exploiting noun phrases and semantic relationships for text document clustering

Cited by: 65
Authors
Zheng, Hai-Tao [1]
Kang, Bo-Yeong [1]
Kim, Hong-Gee [1]
Affiliation
[1] Seoul Natl Univ, Biomed Knowledge Engn Lab, Coll Dent BK21, Seoul 110810, South Korea
Keywords
Ontology; WordNet; Text document clustering; Noun phrase; Hypernymy; Hyponymy; Holonymy; Meronymy; Word sense disambiguation; Gene Ontology; Model
DOI
10.1016/j.ins.2009.02.019
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Text document clustering plays an important role in providing better document retrieval, document browsing, and text mining. Traditionally, clustering techniques do not consider the semantic relationships between words, such as synonymy and hypernymy. To exploit semantic relationships, ontologies such as WordNet have been used to improve clustering results. However, WordNet-based clustering methods mostly rely on single-term analysis of text; they do not perform any phrase-based analysis. In addition, these methods use synonymy to identify concepts and explore only hypernymy to calculate concept frequencies, without considering other semantic relationships such as hyponymy. To address these issues, we combine detection of noun phrases with the use of WordNet as background knowledge to explore better ways of representing documents semantically for clustering. First, using both noun-phrase and single-term analysis, we evaluate several document representation methods to analyze the effectiveness of hypernymy, hyponymy, holonymy, and meronymy. Second, we choose the most effective method and compare it with a WordNet-based clustering method proposed in prior work. The experimental results rank the semantic relationships by their effectiveness for clustering, from highest to lowest, as hypernymy, hyponymy, meronymy, and holonymy. Moreover, we found that noun phrase analysis improves the WordNet-based clustering method. (C) 2009 Elsevier Inc. All rights reserved.
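The abstract's core idea, that a WordNet-based representation credits a general concept whenever more specific terms appear, can be illustrated with a small sketch. It assumes a hypothetical hand-written taxonomy (`TOY_HYPERNYMS`) standing in for WordNet, treats noun phrases as already detected and joined with underscores (e.g. `sports_car`), and propagates each term's frequency up a fixed number of hypernym levels (`depth`, an illustrative parameter). It is not the authors' weighting scheme, and it covers only hypernymy, not hyponymy, holonymy, or meronymy.

```python
from collections import Counter
import math

# Hypothetical toy taxonomy (child -> parent) standing in for WordNet hypernymy.
# "sports_car" represents a detected noun phrase treated as a single concept.
TOY_HYPERNYMS = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
    "sports_car": "car",
    "car": "motor_vehicle",
    "truck": "motor_vehicle",
    "motor_vehicle": "vehicle",
}


def hypernym_chain(concept, depth):
    """Return up to `depth` ancestors of `concept` in the toy taxonomy."""
    chain = []
    while concept in TOY_HYPERNYMS and len(chain) < depth:
        concept = TOY_HYPERNYMS[concept]
        chain.append(concept)
    return chain


def concept_vector(terms, depth=2):
    """Count each term, then add its count to each of its first `depth` hypernyms."""
    counts = Counter(terms)
    vector = Counter(counts)
    for term, freq in counts.items():
        for ancestor in hypernym_chain(term, depth):
            vector[ancestor] += freq
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    dogs = ["dog", "dog", "wolf"]
    cats = ["cat"]
    vehicles = ["truck", "sports_car"]
    # Plain term counts: no shared terms between the animal documents.
    print(cosine(concept_vector(dogs, depth=0), concept_vector(cats, depth=0)))
    # With hypernym enrichment the animal documents meet at "carnivore".
    print(cosine(concept_vector(dogs), concept_vector(cats)))
    print(cosine(concept_vector(dogs), concept_vector(vehicles)))
```

With `depth=0` the vectors reduce to plain term counts, so the dog/wolf and cat documents share nothing; with `depth=2` they meet at `carnivore`, while the vehicle document stays unrelated. Against real WordNet, the dictionary lookup would be replaced by synset hypernym queries, ideally after word sense disambiguation.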
Pages: 2249-2262
Number of pages: 14