Semantic Vector Space Model for Reducing Arabic Text Dimensionality

Cited by: 0
Authors
Awajan, Arafat [1]
Affiliation
[1] Princess Sumaya Univ Technol, Dept Comp Sci, Amman, Jordan
Keywords
Semantic vector space model; word-context matrix; Arabic language processing; text dimension reduction
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In this paper, we introduce an efficient method to represent Arabic texts in comparatively smaller sizes without losing significant information. The proposed method uses the linguistic features of the Arabic language, mainly its very productive morphology and its richness in synonyms, to reduce the dimension of the document vector and to improve its vector space model representation. We have incorporated semantic information from word thesauri like WordNet to create clusters of similar words extracted from the same root and regrouped along with their synonyms. Distributional similarity measures are applied on the word-context matrix associated with the document in order to identify similar words based on a text's context. The experimental results have confirmed that the proposed method significantly reduces the size of text representation by about 20% compared with the stem-based vector space model and by about 40% compared with the traditional bag of words model.
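The word-context step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: root extraction and WordNet synonym lookup are omitted, and the function names, the window size, the cosine measure, the greedy grouping, and the 0.8 threshold are all assumptions chosen for illustration. It builds a word-context matrix from a token list, groups words whose context vectors are similar, and counts terms by group so that the document vector has fewer dimensions.

```python
from collections import Counter, defaultdict
import math


def word_context_matrix(tokens, window=1):
    """Map each word to a Counter of the words seen within +/- window positions."""
    contexts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        row = contexts[word]  # make sure every word gets a row, even if empty
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                row[tokens[j]] += 1
    return contexts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_similar_words(contexts, threshold=0.8):
    """Greedy grouping (an assumption): each word joins the first earlier
    representative whose context vector is at least `threshold` similar."""
    representative = {}
    reps = []
    for word in sorted(contexts):
        for rep in reps:
            if cosine(contexts[word], contexts[rep]) >= threshold:
                representative[word] = rep
                break
        else:
            reps.append(word)
            representative[word] = word
    return representative


def reduced_vector(tokens, representative):
    """Document vector with one dimension per word group instead of per word."""
    return Counter(representative[t] for t in tokens)
```

With placeholder tokens such as `["a", "x", "b", "a", "y", "b"]`, the words `x` and `y` occur in identical contexts and fall into one group, so the reduced vector has three dimensions instead of four. In a setting closer to the abstract, the tokens would be Arabic words already mapped to roots and synonym sets before this step.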
Pages: 129-135
Number of pages: 7
Related papers
50 in total
  • [21] Text representation combining syntax in vector space model
    Liu P.-Y.
    Yang Y.-Z.
    Zhao J.
    Advances in Information Sciences and Service Sciences, 2011, 3 (07): : 251 - 259
  • [22] Summarization of Text Clustering based Vector Space Model
    Chen, Mingzhen
    Song, Yu
    2009 IEEE 10TH INTERNATIONAL CONFERENCE ON COMPUTER-AIDED INDUSTRIAL DESIGN & CONCEPTUAL DESIGN, VOLS 1-3: E-BUSINESS, CREATIVE DESIGN, MANUFACTURING - CAID&CD'2009, 2009, : 2362 - 2365
  • [23] Reducing Dimensionality to Improve Search in Semantic Genetic Programming
    Oliveira, Luiz Otavio V. B.
    Miranda, Luis F.
    Pappa, Gisele L.
    Otero, Fernando E. B.
    Takahashi, Ricardo H. C.
    PARALLEL PROBLEM SOLVING FROM NATURE - PPSN XIV, 2016, 9921 : 375 - 385
  • [24] On a New Model for Automatic Text Categorization Based on Vector Space Model
    Suzuki, Makoto
    Yamagishi, Naohide
    Ishida, Takashi
    Goto, Masayuki
    Hirasawa, Shigeichi
    IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2010), 2010, : 3152 - 3159
  • [25] On a new model for automatic text categorization based on vector space model
    Faculty of Information Science, Shonan Institute of Technology, 1-1-25 Tsujido Nishikaigan, Fujisawa, Kanagawa, 251-8511, Japan
    Unknown
    Unknown
    Conf. Proc. IEEE Int. Conf. Syst. Man Cybern., 2010, (3152-3159):
  • [26] Vector Space Model for Arabic Information Retrieval - Application to "Hadith" Indexing
    Harrag, Fouzi
    Hamdi-Cherif, Aboubekeur
    El-Qawasmeh, Eyas
    2008 FIRST INTERNATIONAL CONFERENCE ON THE APPLICATIONS OF DIGITAL INFORMATION AND WEB TECHNOLOGIES, VOLS 1 AND 2, 2008, : 114 - +
  • [27] NGram Approach for Semantic Similarity on Arabic Short Text
    Al-Mahmoud, Rana Husni
    Sharieh, Ahmad
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (11) : 857 - 866
  • [28] Arabic text semantic-based query expansion
    Yusuf, Nuhu
    Yunus, Mohd Amin Mohd
    Wahid, Norfaradilla
    Mustapha, Aida
    Nawi, Nazri Mohd
    Samsudin, Noor Azah
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2022, 14 (01) : 30 - 40
  • [29] A Text Semantic Similarity Approach for Arabic Paraphrase Detection
    Mahmoud, Adnen
    Zrigui, Ahmed
    Zrigui, Mounir
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2017, PT II, 2018, 10762 : 338 - 349
  • [30] Graph-based Arabic text semantic representation
    Etaiwi, Wael
    Awajan, Arafat
    INFORMATION PROCESSING & MANAGEMENT, 2020, 57 (03)