WordNet-based lexical semantic classification for text corpus analysis

Authors
Jun Long
Lu-da Wang
Zu-de Li
Zu-ping Zhang
Liu Yang
Affiliations
[1] Central South University, School of Information Science and Engineering
[2] Central South University, School of Software
Keywords
document representation; lexical semantic content; classification; eigenvector
DOI
Not available
Abstract
Many text classification methods rely on statistical term measures for document representation. Such representations ignore the lexical semantic content of terms and the distilled mutual information, which leads to classification errors. This work proposes a document representation method, the WordNet-based lexical semantic VSM (vector space model), to address the problem. Using WordNet, the method constructs a data structure of semantic-element information to characterize lexical semantic content, and uses adjusted EM modeling to disambiguate word stems. Then, in the lexical-semantic space of the corpus, a lexical-semantic eigenvector for document representation is built by calculating the weight of each synset, and is applied to NWKNN, a widely recognized algorithm. On the Reuters-21578 text corpus and a version of it adjusted by lexical replacement, the experimental results show that the lexical-semantic eigenvector outperforms the TF-IDF-based term-statistic eigenvector in both F1 measure and dimension scale. Because it produces document representation eigenvectors, the method has broad prospects for classification applications in text corpus analysis.
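The idea of weighting synsets rather than surface terms can be illustrated with a small sketch. This is not the authors' implementation: the toy lexicon `TOY_SYNSETS`, the helper names, and the use of plain TF-IDF as the synset weight are all assumptions made for illustration, and the EM-based disambiguation and NWKNN steps are omitted. It only shows why a synset-level vector is unaffected by lexical replacement (for example "car" versus "automobile") while a term-level vector is not.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for a WordNet lookup (word -> synset id).
# Real WordNet maps a word to several synsets; disambiguation is skipped here.
TOY_SYNSETS = {
    "car": "car.n.01",
    "automobile": "car.n.01",
    "price": "price.n.01",
    "cost": "price.n.01",
    "rise": "rise.v.01",
    "bank": "bank.n.01",
}

def to_synsets(tokens):
    """Map tokens to synset ids, dropping words absent from the toy lexicon."""
    return [TOY_SYNSETS[t] for t in tokens if t in TOY_SYNSETS]

def tfidf_vectors(docs):
    """Return one {feature: weight} dict per document (TF times smoothed IDF)."""
    n = len(docs)
    df = Counter(f for d in docs for f in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append(
            {f: (c / len(d)) * (math.log(n / df[f]) + 1.0) for f, c in tf.items()}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Second document is a lexical replacement of the first.
docs = [
    ["car", "price", "rise"],
    ["automobile", "cost", "rise"],
    ["bank", "price"],
]

term_vecs = tfidf_vectors(docs)
synset_vecs = tfidf_vectors([to_synsets(d) for d in docs])

term_sim = cosine(term_vecs[0], term_vecs[1])
synset_sim = cosine(synset_vecs[0], synset_vecs[1])
print("term-space similarity:  ", term_sim)
print("synset-space similarity:", synset_sim)
```

In this toy setting the two paraphrased documents share only one surface term but map to the same synsets, so their synset-space vectors coincide. The synset space also has fewer dimensions than the term space, since synonyms collapse onto one feature.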
Pages: 1833-1840
Page count: 7