WordNet-based lexical semantic classification for text corpus analysis

Authors
Jun Long (龙军) [1]
Lu-da Wang (王鲁达) [1]
Zu-de Li (李祖德) [1]
Zu-ping Zhang (张祖平) [1]
Liu Yang (杨柳) [2]
Affiliations
[1] School of Information Science and Engineering, Central South University
[2] School of Software, Central South University
Funding
National Natural Science Foundation of China; Specialized Research Fund for the Doctoral Program of Higher Education
Keywords
document representation; lexical semantic content; classification; eigenvector
DOI
Not available
Chinese Library Classification number
TP391.1 [Text information processing]
Discipline classification code
081203; 0835
Abstract
Many text classification methods depend on statistical term measures for document representation. Such representations ignore the lexical semantic content of terms and the distilled mutual information, which leads to text classification errors. This work proposes a document representation method, the WordNet-based lexical semantic VSM, to solve the problem. Using WordNet, the method constructs a data structure of semantic-element information to characterize lexical semantic content and adapts EM modeling to disambiguate word stems. Then, in the lexical-semantic space of the corpus, a lexical-semantic eigenvector for document representation is built by calculating the weight of each synset, and it is applied to NWKNN, a widely recognized algorithm. On the text corpus Reuters-21578 and a version of it adjusted by lexical replacement, the experimental results show that the lexical-semantic eigenvector outperforms the TF-IDF-based term-statistic eigenvector in F1 measure and in scale of dimension. Because the method produces document representation eigenvectors, it has broad prospects for classification applications in text corpus analysis.
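The contrast the abstract draws between term-statistic and synset-weighted vectors can be illustrated with a minimal sketch. It is not the authors' implementation: the `TOY_SYNSETS` mapping, the function names, and the three toy documents are all invented for illustration, the weighting is a plain TF-IDF-style formula applied to synset counts, and the WordNet lookup and EM-based disambiguation steps are replaced by a fixed dictionary.

```python
import math
from collections import Counter

# Hypothetical toy lexicon: word stem -> synset id. A real system would query
# WordNet and disambiguate each stem; this fixed mapping is an assumption.
TOY_SYNSETS = {
    "car": "car.n.01",
    "automobile": "car.n.01",
    "buy": "buy.v.01",
    "purchase": "buy.v.01",
    "bank": "bank.n.01",
    "loan": "loan.n.01",
}


def synset_counts(tokens):
    """Count synset occurrences, skipping words outside the toy lexicon."""
    return Counter(TOY_SYNSETS[t] for t in tokens if t in TOY_SYNSETS)


def tfidf_vectors(bags):
    """Turn a list of Counters into sparse TF-IDF-style weight dicts."""
    n_docs = len(bags)
    doc_freq = Counter(key for bag in bags for key in bag)
    vectors = []
    for bag in bags:
        total = sum(bag.values()) or 1  # guard against empty documents
        vectors.append({
            key: (count / total) * (math.log(n_docs / doc_freq[key]) + 1.0)
            for key, count in bag.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Three toy documents; the second is a lexical replacement of the first.
docs = [
    ["buy", "car"],
    ["purchase", "automobile"],
    ["bank", "loan"],
]

term_vecs = tfidf_vectors([Counter(d) for d in docs])      # term-statistic space
syn_vecs = tfidf_vectors([synset_counts(d) for d in docs])  # synset space

print("term space,   doc 0 vs doc 1:", cosine(term_vecs[0], term_vecs[1]))
print("synset space, doc 0 vs doc 1:", cosine(syn_vecs[0], syn_vecs[1]))
print("term dimensions:  ", len({k for v in term_vecs for k in v}))
print("synset dimensions:", len({k for v in syn_vecs for k in v}))
```

In this toy setting the two synonymous documents share no terms, so their term vectors are orthogonal, while their synset vectors coincide; the synset space also has fewer dimensions than the term space. A nearest-neighbor classifier such as NWKNN would take vectors of this kind as input.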
Pages: 1833-1840
Number of pages: 8
Related papers
  • [1] WordNet-based lexical semantic classification for text corpus analysis
    Jun Long
    Lu-da Wang
    Zu-de Li
    Zu-ping Zhang
    Liu Yang
    [J]. Journal of Central South University, 2015, 22 : 1833 - 1840
  • [2] WordNet-based lexical semantic classification for text corpus analysis
    Long Jun
    Wang Lu-da
    Li Zu-de
    Zhang Zu-ping
    Yang Liu
    [J]. JOURNAL OF CENTRAL SOUTH UNIVERSITY, 2015, 22 (05) : 1833 - 1840
  • [3] Effect of Semantic Differences in WordNet-Based Similarity Measures
    Ernesto Menendez-Mora, Raul
    Ichise, Ryutaro
    [J]. TRENDS IN APPLIED INTELLIGENT SYSTEMS, PT II, PROCEEDINGS, 2010, 6097 : 545 - 554
  • [4] A wordnet-based approach to a corpus-supported modeling terminology
    Beisswenger, Michael
    [J]. ZEITSCHRIFT FUR GERMANISTISCHE LINGUISTIK, 2010, 38 (03) : 346 - 369
  • [5] Text categorization algorithms using semantic approaches, corpus-based thesaurus and WordNet
    Li, Cheng Hua
    Yang, Ju Cheng
    Park, Soon Cheol
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (01) : 765 - 772
  • [6] Incorporating HowNet-Based Semantic Relatedness Into Chinese Word Sense Disambiguation
    Zhou, Qiaoli
    Yue, Gu
    Meng, Yuguang
    [J]. CHINESE LEXICAL SEMANTICS (CLSW 2019), 2020, 11831 : 359 - 370
  • [7] Academic text classification based on lexical-semantic content
    Venegas, Rene
    [J]. REVISTA SIGNOS, 2007, 40 (63) : 239 - 271
  • [8] Cooperative WordNet Editor for Lexical Semantic Acquisition
    Szymanski, Julian
    [J]. KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT, 2011, 128 : 187 - 196
  • [9] Lexical semantic techniques for corpus analysis
    Pustejovsky, James
    Bergler, Sabine
    Anick, Peter
    [J]. Computational Linguistics, 1993, 19 (02)
  • [10] Combining Lexical and Semantic Features for Short Text Classification
    Yang, Lili
    Li, Chunping
    Ding, Qiang
    Li, Li
    [J]. 17TH INTERNATIONAL CONFERENCE IN KNOWLEDGE BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS - KES2013, 2013, 22 : 78 - 86