Semantic Vector Space Model for Reducing Arabic Text Dimensionality

Citations: 0
Authors
Awajan, Arafat [1]
Affiliation
[1] Princess Sumaya Univ Technol, Dept Comp Sci, Amman, Jordan
Keywords
Semantic vector space model; word-context matrix; Arabic language processing; text dimension reduction;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
In this paper, we introduce an efficient method for representing Arabic texts in a comparatively smaller size without losing significant information. The proposed method exploits linguistic features of the Arabic language, mainly its highly productive morphology and its richness in synonyms, to reduce the dimension of the document vector and to improve its vector space model representation. We incorporate semantic information from word thesauri such as WordNet to create clusters of similar words: words derived from the same root are grouped together with their synonyms. Distributional similarity measures are applied to the word-context matrix associated with the document in order to identify similar words based on the text's context. The experimental results confirm that the proposed method significantly reduces the size of the text representation, by about 20% compared with the stem-based vector space model and by about 40% compared with the traditional bag-of-words model.
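The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation: the function names, the context window, the cosine measure, and the similarity threshold are all assumptions made for illustration, and the hand-written `root_of` table and `synonym_pairs` list stand in for a real morphological analyzer and a thesaurus such as WordNet. The tokens are toy transliterations, not real corpus data.

```python
from collections import Counter, defaultdict
import math


def word_context_matrix(tokens, window=2):
    """Sparse word-context matrix: word -> Counter of neighbours within `window`."""
    contexts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                contexts[word][tokens[j]] += 1
    return contexts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster_terms(tokens, root_of, synonym_pairs, threshold=0.5, window=2):
    """Assign each word a cluster label.

    Words sharing a root start in the same cluster; two clusters are merged
    when a listed synonym pair also has similar contexts in this document.
    """
    contexts = word_context_matrix(tokens, window)
    label = {w: root_of.get(w, w) for w in contexts}
    for a, b in synonym_pairs:
        if a in contexts and b in contexts:
            if cosine(contexts[a], contexts[b]) >= threshold:
                old, new = label[b], label[a]
                for w in label:
                    if label[w] == old:
                        label[w] = new
    return label


def document_vector(tokens, label):
    """Reduced document vector: one dimension per cluster instead of per word."""
    return Counter(label[t] for t in tokens)


# Toy data (hypothetical): kitab/kutub share the root k-t-b; sifr is listed
# as a synonym of kitab in the stand-in thesaurus.
tokens = "qaraa alwalad kitab jadid qaraa alwalad sifr jadid kutub".split()
root_of = {"kitab": "k-t-b", "kutub": "k-t-b"}
synonym_pairs = [("kitab", "sifr")]

label = cluster_terms(tokens, root_of, synonym_pairs)
vec = document_vector(tokens, label)
print(len(set(tokens)), "bag-of-words dimensions ->", len(vec), "cluster dimensions")
```

Gating the synonym merge on context similarity is one way to read the abstract's statement that similar words are identified based on the text's context; a thesaurus entry alone does not trigger a merge in this sketch.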
Pages: 129-135
Page count: 7