Word Segmentation of Informal Arabic with Domain Adaptation

Cited by: 0
Authors
Monroe, Will [1]
Green, Spence [1]
Manning, Christopher D. [1]
Affiliations
[1] Stanford Univ, Comp Sci Dept, Stanford, CA 94305 USA
Funding
National Science Foundation (USA);
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification codes
081203; 0835;
Abstract
Segmentation of clitics has been shown to improve accuracy on a variety of Arabic NLP tasks. However, state-of-the-art Arabic word segmenters are either limited to formal Modern Standard Arabic, performing poorly on Arabic text featuring dialectal vocabulary and grammar, or rely on linguistic knowledge that is hand-tuned for each dialect. We extend an existing MSA segmenter with a simple domain adaptation technique and new features in order to segment informal and dialectal Arabic text. Experiments show that our system outperforms existing systems on newswire, broadcast news and Egyptian dialect, improving segmentation F-1 score on a recently released Egyptian Arabic corpus to 95.1%, compared to 90.8% for another segmenter designed specifically for Egyptian Arabic.
Pages: 206-211
Page count: 6
Related papers
50 in total
  • [31] Improved Arabic Handwriting Word Segmentation Approach using Random Forests
    Abdeen, Roqyiah M.
    Afifi, Ahmed
    El-Sisi, Ashraf B.
    [J]. 2015 IEEE/ACS 12TH INTERNATIONAL CONFERENCE OF COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2015,
  • [32] Linguistic Constraints on Statistical Word Segmentation: The Role of Consonants in Arabic and English
    Kastner, Itamar
    Adriaans, Frans
    [J]. COGNITIVE SCIENCE, 2018, 42 : 494 - 518
  • [33] Word Stretching for Effective Segmentation and Classification of Historical Arabic Handwritten Documents
    Al Aghbari, Zaher
    Brook, Salama
    [J]. RCIS 2009: PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON RESEARCH CHALLENGES IN INFORMATION SCIENCE, 2009, : 217 - 224
  • [34] PDA: Progressive Domain Adaptation for Semantic Segmentation
    Liao, Muxin
    Tian, Shishun
    Zhang, Yuhang
    Hua, Guoguang
    Zou, Wenbin
    Li, Xia
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 284
  • [35] Adaptation and initial validation of the Arabic version of the Word Memory Test (WMTARB)
    Bajjaleh, Christine
    Braw, Yoram C.
    Elkana, Odelia
    [J]. APPLIED NEUROPSYCHOLOGY-ADULT, 2023, 30 (02) : 204 - 213
  • [36] Knowledge based domain adaptation for semantic segmentation
    Zhang, Yuxiao
    Ye, Mao
    Gan, Yan
    Zhang, Wencong
    [J]. KNOWLEDGE-BASED SYSTEMS, 2020, 193
  • [37] Action Segmentation with Mixed Temporal Domain Adaptation
    Chen, Min-Hung
    Li, Baopu
    Bao, Yingze
    AlRegib, Ghassan
    [J]. 2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 594 - 603
  • [38] Bidirectional Learning for Domain Adaptation of Semantic Segmentation
    Li, Yunsheng
    Yuan, Lu
    Vasconcelos, Nuno
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6929 - 6938
  • [39] Unsupervised Domain Adaptation for Referring Semantic Segmentation
    Shi, Haonan
    Pan, Wenwen
    Zhao, Zhou
    Zhang, Mingmin
    Wu, Fei
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5807 - 5818
  • [40] Unsupervised Domain Adaptation for LiDAR Panoptic Segmentation
    Besic, Borna
    Gosala, Nikhil
    Cattaneo, Daniele
    Valada, Abhinav
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2022, 7 (02) : 3404 - 3411