Segmentation for Domain Adaptation in Arabic

Times cited: 0
Authors
Attia, Mohammed [1]
Elkahky, Ali [1]
Affiliations
[1] Google LLC, New York, NY 10011 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Segmentation is an integral part of many NLP applications, including machine translation, parsing, and information retrieval. When a model trained on the standard language is applied to dialects, accuracy drops dramatically. However, the standard language and the dialects share more lexical items than mere surface word matching can find. This shared lexicon is obscured by extensive cliticization, gemination, and character repetition. In this paper, we show that segmentation and base normalization of dialects can help in domain adaptation by reducing data sparseness. Segmentation improves a system's performance by reducing the number of out-of-vocabulary words (OOVs), helping to isolate the differences, and allowing better use of the commonalities. We show that adding a small amount of dialectal segmentation training data reduces OOVs by 5% and improves POS tagging for dialects by 7.37% F-score, even though no dialect-specific POS training data is included.
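The mechanism the abstract describes (clitic segmentation plus normalization of repeated characters lowering the OOV rate) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the Latin-transliterated toy words, the tiny clitic inventory, the greedy rule-based `segment` helper, and the `collapse_repeats` normalizer are hypothetical and are not the authors' segmenter or data.

```python
import re

# Hypothetical toy clitic inventory in Latin transliteration (an assumption,
# not the inventory or the model used in the paper).
PROCLITICS = ("w", "b", "l")
ENCLITICS = ("hm", "h")

# Toy "standard language" vocabulary of known stems (illustrative only).
VOCAB = {"ktAb", "byt", "ktyr"}

# Clitic units count as known once the training data is segmented.
KNOWN_UNITS = VOCAB | {p + "+" for p in PROCLITICS} | {"+" + s for s in ENCLITICS}


def collapse_repeats(token):
    """Collapse runs of three or more identical characters to a single one.

    Runs of exactly two are kept so that legitimate doubled letters survive.
    """
    return re.sub(r"(.)\1{2,}", r"\1", token)


def segment(token, vocab):
    """Strip at most one proclitic and one enclitic if the remaining stem is known."""
    if token in vocab:
        return [token]
    for pre in ("",) + PROCLITICS:
        for suf in ("",) + ENCLITICS:
            if not pre and not suf:
                continue  # nothing to strip in this combination
            if token.startswith(pre) and token.endswith(suf):
                stem = token[len(pre): len(token) - len(suf)]
                if stem in vocab:
                    parts = [pre + "+" if pre else "", stem, "+" + suf if suf else ""]
                    return [p for p in parts if p]
    return [token]  # no analysis found: leave the token unsegmented


def oov_rate(units, known):
    """Fraction of units that are not in the known set."""
    units = list(units)
    return sum(u not in known for u in units) / len(units)


# Toy "dialect" tokens: clitic-laden forms, an elongated word, one true OOV.
DIALECT_TOKENS = ["wktAbhm", "bbyt", "ktyyyr", "mdrsp"]

surface_rate = oov_rate(DIALECT_TOKENS, VOCAB)
segmented_units = [
    unit
    for token in DIALECT_TOKENS
    for unit in segment(collapse_repeats(token), VOCAB)
]
segmented_rate = oov_rate(segmented_units, KNOWN_UNITS)

print(f"surface OOV rate:   {surface_rate:.2f}")
print(f"segmented OOV rate: {segmented_rate:.2f}")
```

On this toy input, all four surface tokens are out of vocabulary, while after normalization and segmentation only one of the seven resulting units is. The numbers say nothing about the paper's reported results; they only show why matching on stems rather than surface forms exposes shared lexical items.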
Pages: 119-129
Page count: 11