WordNet-based lexical semantic classification for text corpus analysis

Cited by: 5
Authors
Long Jun [1 ]
Wang Lu-da [1 ]
Li Zu-de [1 ]
Zhang Zu-ping [1 ]
Yang Liu [2 ]
Affiliations
[1] Cent S Univ, Sch Informat Sci & Engn, Changsha 410075, Hunan, Peoples R China
[2] Cent S Univ, Sch Software, Changsha 410075, Hunan, Peoples R China
Funding
Specialized Research Fund for the Doctoral Program of Higher Education; National Natural Science Foundation of China;
Keywords
document representation; lexical semantic content; classification; eigenvector; IDF;
DOI
10.1007/s11771-015-2702-8
Chinese Library Classification (CLC) number
TF [Metallurgical industry];
Discipline classification code
0806 ;
Abstract
Many text classification methods depend on statistical term measures for document representation. Such representations ignore the lexical semantic content of terms and the distilled mutual information, which leads to classification errors. This work proposes a document representation method, the WordNet-based lexical semantic VSM, to address the problem. Using WordNet, the method constructs a data structure of semantic-element information to characterize lexical semantic content, and adjusts EM modeling to disambiguate word stems. Then, in the lexical-semantic space of the corpus, a lexical-semantic eigenvector for document representation is built by calculating the weight of each synset, and is applied to the widely recognized NWKNN algorithm. On the Reuters-21578 text corpus and a version of it adjusted by lexical replacement, the experimental results show that the lexical-semantic eigenvector outperforms the TF-IDF-based term-statistic eigenvector in both F1 measure and scale of dimension. Because it produces document representation eigenvectors, the method has broad prospects for classification applications in text corpus analysis.
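The contrast between a term-statistic vector and a synset-weighted vector can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the EM-based disambiguation and the NWKNN classifier, and it replaces WordNet with a hypothetical hand-written lexicon (`TOY_SYNSETS`) in which every stem has a single sense. The function names and synset identifiers are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for WordNet: word stem -> synset id.
# A real system would query WordNet and disambiguate senses; here every
# stem is assumed to have exactly one sense.
TOY_SYNSETS = {
    "car": "auto.n.01",
    "automobile": "auto.n.01",
    "buy": "purchase.v.01",
    "purchase": "purchase.v.01",
    "oil": "oil.n.01",
    "price": "price.n.01",
}

def to_synsets(tokens):
    """Map each token to its synset id; unknown tokens are kept unchanged."""
    return [TOY_SYNSETS.get(t, t) for t in tokens]

def tfidf_vectors(docs):
    """Return one {feature: tf * idf} dict per document, with idf = log(N / df)."""
    n = len(docs)
    df = Counter(f for d in docs for f in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append({f: (c / len(d)) * math.log(n / df[f]) for f, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [
    ["buy", "car"],
    ["purchase", "automobile"],  # lexical replacement of the first document
    ["oil", "price"],
]

# Term-level weights versus synset-level weights over the same corpus.
term_vecs = tfidf_vectors(docs)
syn_vecs = tfidf_vectors([to_synsets(d) for d in docs])

term_sim = cosine(term_vecs[0], term_vecs[1])
syn_sim = cosine(syn_vecs[0], syn_vecs[1])
print(term_sim, syn_sim)
```

In this toy corpus the first two documents share no surface terms, so their term-level similarity is zero, while their synset-level vectors coincide and the similarity is close to 1. The synset space also has fewer dimensions than the term space here (four features instead of six), which loosely mirrors the dimensionality comparison mentioned in the abstract.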
Pages: 1833-1840
Page count: 8