Word Segmentation of Informal Arabic with Domain Adaptation

Citations: 0
Authors
Monroe, Will [1]
Green, Spence [1]
Manning, Christopher D. [1]
Affiliations
[1] Stanford Univ, Comp Sci Dept, Stanford, CA 94305 USA
Funding
US National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
Segmentation of clitics has been shown to improve accuracy on a variety of Arabic NLP tasks. However, state-of-the-art Arabic word segmenters are either limited to formal Modern Standard Arabic (MSA), performing poorly on Arabic text featuring dialectal vocabulary and grammar, or rely on linguistic knowledge that is hand-tuned for each dialect. We extend an existing MSA segmenter with a simple domain adaptation technique and new features in order to segment informal and dialectal Arabic text. Experiments show that our system outperforms existing systems on newswire, broadcast news, and Egyptian dialect, improving the segmentation F1 score on a recently released Egyptian Arabic corpus to 95.1%, compared to 90.8% for another segmenter designed specifically for Egyptian Arabic.
Pages: 206-211
Number of pages: 6
Related papers
50 in total
  • [21] Constrained Domain Adaptation for Image Segmentation
    Bateson, M.
    Dolz, J.
    Kervadec, H.
    Lombaert, H.
    Ben Ayed, I
    [J]. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2021, 40 (07) : 1875 - 1887
  • [22] Partial Domain Adaptation on Semantic Segmentation
    Tian, Yingjie
    Zhu, Siyu
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (06) : 3798 - 3809
  • [23] Benchmarking domain adaptation for semantic segmentation
    Ahmed, Masud
    Hasan, Zahid
    Khan, Naima
    Roy, Nirmalya
    Purushotham, Sanjay
    Gangopadhyay, Aryya
    You, Suya
    [J]. UNMANNED SYSTEMS TECHNOLOGY XXIV, 2022, 12124
  • [24] NuSegDA: Domain adaptation for nuclei segmentation
    Haq, Mohammad Minhazul
    Ma, Hehuan
    Huang, Junzhou
    [J]. FRONTIERS IN BIG DATA, 2023, 6
  • [25] Geometric domain adaptation for CBCT segmentation
    Querfurth, Anne
    Rohleder, Maximilian
    Maier, Andreas
    Schmidt, Wolfgang Hohenforst
    Kunze, Holger
    [J]. COMPUTER-AIDED DIAGNOSIS, MEDICAL IMAGING 2024, 2024, 12927
  • [26] Domain Adaptation for Word Sense Disambiguation Using Word Embeddings
    Komiya, Kanako
    Suzuki, Shota
    Sasaki, Minoru
    Shinnou, Hiroyuki
    Okumura, Manabu
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2017), PT I, 2018, 10761 : 195 - 206
  • [27] Combining Character and Word Embeddings for Affect in Arabic Informal Social Media Microblogs
    Alharbi, Abdullah I.
    Lee, Mark
    [J]. NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2020), 2020, 12089 : 213 - 224
  • [28] Learning Domain Invariant Word Representations for Parsing Domain Adaptation
    Qiao, Xiuming
    Zhang, Yue
    Zhao, Tiejun
    [J]. NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING (NLPCC 2019), PT I, 2019, 11838 : 801 - 813
  • [29] Arabic Word Segmentation With Long Short-Term Memory Neural Networks and Word Embedding
    Almuhareb, Abdulrahman
    Alsanie, Waleed
    Al-Thubaity, Abdulmohsen
    [J]. IEEE ACCESS, 2019, 7 : 12879 - 12887
  • [30] Improving Cross-Domain Chinese Word Segmentation with Word Embeddings
    Ye, Yuxiao
    Zhang, Yue
    Li, Weikang
    Qiu, Likun
    Sun, Jian
    [J]. 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 2726 - 2735