Word Segmentation of Informal Arabic with Domain Adaptation

Cited by: 0
Authors
Monroe, Will [1]
Green, Spence [1]
Manning, Christopher D. [1]
Affiliations
[1] Stanford Univ, Comp Sci Dept, Stanford, CA 94305 USA
Funding
National Science Foundation (USA)
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Segmentation of clitics has been shown to improve accuracy on a variety of Arabic NLP tasks. However, state-of-the-art Arabic word segmenters are either limited to formal Modern Standard Arabic, performing poorly on Arabic text featuring dialectal vocabulary and grammar, or rely on linguistic knowledge that is hand-tuned for each dialect. We extend an existing MSA segmenter with a simple domain adaptation technique and new features in order to segment informal and dialectal Arabic text. Experiments show that our system outperforms existing systems on newswire, broadcast news and Egyptian dialect, improving segmentation F-1 score on a recently released Egyptian Arabic corpus to 95.1%, compared to 90.8% for another segmenter designed specifically for Egyptian Arabic.
Pages: 206-211
Number of pages: 6
Related papers
50 in total
  • [1] Segmentation for Domain Adaptation in Arabic
    Attia, Mohammed
    Elkahky, Ali
    [J]. FOURTH ARABIC NATURAL LANGUAGE PROCESSING WORKSHOP (WANLP 2019), 2019, : 119 - 129
  • [2] Neural Domain Adaptation for Chinese Word Segmentation
    Bao, Zuyi
    Li, Si
    Xu, Weiran
    Gao, Sheng
    [J]. 2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 131 - 134
  • [3] Neural Domain Adaptation with Contextualized Character Embedding for Chinese Word Segmentation
    Bao, Zuyi
    Li, Si
    Gao, Sheng
    Xu, Weiran
    [J]. NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2017, 2018, 10619 : 419 - 430
  • [4] Domain Adaptation of Thai Word Segmentation Models using Stacked Ensemble
    Limkonchotiwat, Peerat
    Phatthiyaphaibun, Wannaphong
    Sarwar, Raheem
    Chuangsuwanich, Ekapol
    Nutanong, Sarana
    [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 3841 - 3847
  • [5] Morphology-Aware Word-Segmentation in Dialectal Arabic Adaptation of Neural Machine Translation
    Tawfik, Ahmed Y.
    Emam, Mahitab
    Essam, Khaled
    Nabil, Robert
    Hassan, Hany
    [J]. FOURTH ARABIC NATURAL LANGUAGE PROCESSING WORKSHOP (WANLP 2019), 2019, : 11 - 17
  • [6] Using a Goodness Measurement for Domain Adaptation: A Case Study on Chinese Word Segmentation
    Song, Yan
    Xia, Fei
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 3853 - 3860
  • [7] Exploration of N-gram Features for the Domain Adaptation of Chinese Word Segmentation
    Guo, Zhen
    Zhang, Yujie
    Su, Chen
    Xu, Jinan
    [J]. NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, 2012, 333 : 121 - 131
  • [8] Effect of Word Segmentation on Arabic Text Classification
    Al-Thubaity, Abdulmohsen
    Al-Subaie, Abdullah
    [J]. PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2015, : 127 - 131
  • [9] Arabic Word Segmentation for Better Unit of Analysis
    Benajiba, Yassine
    Zitouni, Imed
    [J]. LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : 1346 - 1352
  • [10] Word Segmentation for Arabic Abstractive Headline Generation
    Abdelaziz, Yaser O.
    El-Beltagy, Samhaa R.
    [J]. 2021 4TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMPUTER TECHNOLOGIES (ICICT 2021), 2021, : 59 - 63