Classification of Closely Related Sub-dialects of Arabic Using Support-Vector Machines

Times cited: 0
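Author
Wray, Samantha [1]
Affiliation
[1] New York Univ Abu Dhabi, Abu Dhabi, U Arab Emirates
Funding
U.S. National Science Foundation
Keywords
text classification; validation of language resources; language identification
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract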
Colloquial dialects of Arabic can be roughly categorized into five groups based on relatedness and geographic location (Egyptian, North African/Maghrebi, Gulf, Iraqi, and Levantine). However, because all dialects largely share the same writing system and have overlapping features and vocabulary, dialect identification and text classification are nontrivial tasks. Furthermore, classification by dialect is often performed at a coarse-grained level, into these five groups or a subset of them, and little work exists on sub-dialectal classification. The current study uses an n-gram-based SVM to classify text at the fine-grained sub-dialectal level and compares it with methods used in dialect classification, such as pruning vocabulary items shared across dialects. Levantine is presented as a test case, and the results of a four-way classification into its sub-dialects (Jordanian, Lebanese, Palestinian, and Syrian), which reaches 65% accuracy, are presented and discussed. This paper also examines the possibility of leveraging existing mixed-dialect resources to determine their sub-dialectal makeup through automatic classification.
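The abstract names two techniques: an n-gram-based SVM classifier and vocabulary pruning of items shared across dialects. The following is a minimal, dependency-free sketch of how the two can fit together, not the author's implementation. Several assumptions are made for illustration: features are character n-grams of length 1-3; "pruning" is read as dropping every n-gram that occurs in all classes; and the SVM is a one-vs-rest linear model trained with Pegasos-style stochastic subgradient descent, a swapped-in training method the record does not specify. The function names (`char_ngrams`, `shared_ngrams`, `train_ovr_svm`, `predict`) and the toy strings and labels are hypothetical placeholders, not real Levantine data.

```python
from collections import Counter, defaultdict
import random

# Assumption: character n-grams of lengths 1-3; the paper's exact
# n-gram settings are not given in this record.
def char_ngrams(text, n_min=1, n_max=3):
    """Count character n-grams of lengths n_min..n_max, padding with spaces."""
    padded = f" {text} "
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts

# Assumption: "vocabulary pruning of shared items" is read here as removing
# every n-gram that appears in at least one document of every class.
def shared_ngrams(docs, labels):
    per_class = defaultdict(set)
    for doc, label in zip(docs, labels):
        per_class[label].update(char_ngrams(doc))
    return set.intersection(*per_class.values())

def featurize(text, prune):
    """Sparse feature dict of n-gram counts, minus the pruned n-grams."""
    return {g: c for g, c in char_ngrams(text).items() if g not in prune}

# Swapped-in trainer: one-vs-rest linear SVM fitted with Pegasos-style
# stochastic subgradient descent on the L2-regularized hinge loss.
def train_ovr_svm(docs, labels, prune=frozenset(), lam=0.01, epochs=20, seed=0):
    classes = sorted(set(labels))
    feats = [featurize(d, prune) for d in docs]
    weights = {k: {} for k in classes}  # one sparse weight vector per class
    order = list(range(len(docs)))
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            for k in classes:
                w = weights[k]
                y = 1.0 if labels[i] == k else -1.0
                margin = y * sum(w.get(g, 0.0) * v for g, v in feats[i].items())
                for g in w:  # regularization step: shrink every stored weight
                    w[g] *= 1.0 - eta * lam
                if margin < 1.0:  # hinge-loss violation: step toward y * x
                    for g, v in feats[i].items():
                        w[g] = w.get(g, 0.0) + eta * y * v
    return classes, weights, prune

def predict(model, text):
    """Return the class whose one-vs-rest score is highest."""
    classes, weights, prune = model
    x = featurize(text, prune)
    scores = {k: sum(weights[k].get(g, 0.0) * v for g, v in x.items()) for k in classes}
    return max(scores, key=scores.get)

# Hypothetical toy data: placeholder strings standing in for four sub-dialects.
docs = ["aaa", "aab", "ccc", "ccd", "eee", "eef", "ggg", "ggh"]
labels = ["A", "A", "B", "B", "C", "C", "D", "D"]
prune = shared_ngrams(docs, labels)  # here only the padding space " " is shared
model = train_ovr_svm(docs, labels, prune=prune)
print(predict(model, "aba"), predict(model, "cdc"))  # prints: A B
```

In practice, a library pipeline such as scikit-learn's `TfidfVectorizer(analyzer="char", ngram_range=(1, 3))` feeding `LinearSVC` would be a more typical choice; the hand-rolled version above only makes the feature extraction, pruning, and one-vs-rest decision rule explicit.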
Pages: 3671-3674
Page count: 4