Classification of Closely Related Sub-dialects of Arabic Using Support-Vector Machines

Times cited: 0
Author
Wray, Samantha [1]
Affiliation
[1] New York Univ Abu Dhabi, Abu Dhabi, U Arab Emirates
Funding
National Science Foundation (USA)
Keywords
text classification; validation of language resources; language identification
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Colloquial dialects of Arabic can be roughly categorized into five groups based on relatedness and geographic location (Egyptian, North African/Maghrebi, Gulf, Iraqi, and Levantine). Because all dialects use much the same writing system and share overlapping features and vocabulary, dialect identification and text classification are not trivial tasks. Furthermore, text classification by dialect is often performed at a coarse-grained level, into these five groups or a subset thereof, and there is little work on sub-dialectal classification. The current study uses an n-gram-based SVM to classify at a fine-grained, sub-dialectal level and compares it to methods used in dialect classification, such as vocabulary pruning of items shared across dialects. Levantine is presented as a test case: a four-way classification experiment over its sub-dialects (Jordanian, Lebanese, Palestinian, and Syrian) yields 65% accuracy, and the results are discussed. This paper also examines the possibility of leveraging existing mixed-dialectal resources to determine their sub-dialectal makeup by automatic classification.
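The two feature-preparation ideas named in the abstract, n-gram features and pruning of vocabulary shared across dialects, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say whether the n-grams are over characters or words (character n-grams are assumed here), the function names are hypothetical, and the sample texts are placeholder tokens rather than real dialect data. The resulting count dictionaries are the kind of sparse features a linear SVM could be trained on.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count character n-grams of length n_min..n_max in one text.

    Assumption: character-level n-grams; the abstract does not specify.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def shared_vocabulary(texts, labels):
    """Return the word types that occur in every class."""
    vocab_by_label = {}
    for text, label in zip(texts, labels):
        vocab_by_label.setdefault(label, set()).update(text.split())
    return set.intersection(*vocab_by_label.values())


def prune(text, shared):
    """Drop words that are shared across all classes."""
    return " ".join(w for w in text.split() if w not in shared)


# Placeholder tokens only; these are NOT real dialect samples.
texts = [
    "shared alpha one",
    "shared beta two",
    "shared gamma three",
    "shared delta four",
]
labels = ["Jordanian", "Lebanese", "Palestinian", "Syrian"]

shared = shared_vocabulary(texts, labels)
pruned_texts = [prune(t, shared) for t in texts]
features = [char_ngrams(t) for t in pruned_texts]
```

Pruning before n-gram extraction removes items that cannot distinguish the classes, at the cost of discarding context around them; whether that trade-off helps is the kind of comparison the abstract describes.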
Pages: 3671-3674
Number of pages: 4