Semantic Vector Space Model for Reducing Arabic Text Dimensionality

Times cited: 0
Author
Awajan, Arafat [1]
Affiliation
[1] Princess Sumaya Univ Technol, Dept Comp Sci, Amman, Jordan
Keywords
Semantic vector space model; word-context matrix; Arabic language processing; text dimension reduction
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
In this paper, we introduce an efficient method for representing Arabic texts in a more compact form without losing significant information. The proposed method exploits linguistic features of the Arabic language, mainly its highly productive morphology and its richness in synonyms. It uses these features to reduce the dimension of the document vector and to improve its vector space model representation. We incorporate semantic information from thesauri such as WordNet to build clusters of similar words: words derived from the same root are grouped together along with their synonyms. Distributional similarity measures are applied to the word-context matrix associated with the document in order to identify words that are similar within the context of the text. Experimental results confirm that the proposed method significantly reduces the size of the text representation. The reduction is about 20% compared with the stem-based vector space model and about 40% compared with the traditional bag-of-words model.
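The abstract describes the pipeline only at a high level. The Python sketch below illustrates one plausible reading of it under several assumptions that are not taken from the paper:

- A small hand-written `roots` table stands in for an Arabic morphological analyzer.
- A `synonyms` dict stands in for a thesaurus such as Arabic WordNet.
- The context of a word is a symmetric window of ±2 tokens.
- Distributional similarity is cosine similarity over raw co-occurrence counts, with a hypothetical threshold of 0.8.
- Clusters are merged transitively with a union-find structure.

Each resulting cluster becomes one dimension of the document vector. Words are written in Latin transliteration for readability, and all names (`reduce_vocabulary`, `word_context_matrix`, etc.) are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


class UnionFind:
    """Disjoint-set structure used to merge words into clusters."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def word_context_matrix(tokens, window=2):
    """Count how often each word co-occurs with words within +/- `window` positions."""
    matrix = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                matrix[word][tokens[j]] += 1
    return matrix


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def reduce_vocabulary(tokens, roots, synonyms, threshold=0.8, window=2):
    """Group distinct words into clusters; each cluster is one vector dimension.

    ASSUMPTION: `roots` and `synonyms` are stand-ins for a morphological
    analyzer and a thesaurus; the threshold and window are illustrative.
    """
    vocab = sorted(set(tokens))
    uf = UnionFind(vocab)
    # 1) Words derived from the same root share a cluster.
    by_root = defaultdict(list)
    for w in vocab:
        by_root[roots.get(w, w)].append(w)
    for group in by_root.values():
        for w in group[1:]:
            uf.union(group[0], w)
    # 2) Thesaurus synonyms that also occur in the text join the same cluster.
    for w in vocab:
        for s in synonyms.get(w, ()):
            if s in uf.parent:
                uf.union(w, s)
    # 3) Words with similar context vectors in the word-context matrix are merged.
    matrix = word_context_matrix(tokens, window)
    for a, b in combinations(vocab, 2):
        if cosine(matrix[a], matrix[b]) >= threshold:
            uf.union(a, b)
    clusters = defaultdict(set)
    for w in vocab:
        clusters[uf.find(w)].add(w)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical toy data: k-t-b (writing) and j-d-d (newness) root families,
    # plus "sifr" listed as a synonym of "kitab" (book).
    tokens = ["kitab", "jadid", "kutub", "jadida", "maktaba", "sifr"]
    roots = {"kitab": "ktb", "kutub": "ktb", "maktaba": "ktb",
             "jadid": "jdd", "jadida": "jdd"}
    synonyms = {"sifr": ["kitab"]}
    clusters = reduce_vocabulary(tokens, roots, synonyms)
    print(len(set(tokens)), "distinct words ->", len(clusters), "dimensions")
```

In this six-word toy text, no pair of context vectors reaches the 0.8 threshold. The reduction from 6 to 2 dimensions therefore comes entirely from the root and synonym steps. Union-find merging is transitive, so with a low threshold it can chain loosely related words into one cluster. The threshold and window size are therefore choices that would need tuning on real data.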
Pages: 129–135
Number of pages: 7