Improving Phrase Chunking by using Contextualized Word Embeddings for a Morphologically Rich Language

Cited by: 2
Authors
Ehsan, Toqeer [1]
Khalid, Javairia [2]
Ambreen, Saadia [2]
Mustafa, Asad [2]
Hussain, Sarmad [2]
Affiliations
[1] Univ Gujrat, Dept Comp Sci, Gujrat 50700, Pakistan
[2] Univ Engn & Technol UET, Ctr Language Engn CLE, Al Khawarizmi Inst Comp Sci KICS, Lahore 54000, Pakistan
Keywords
BiLSTM; ELMo; Urdu; Chunking; Shallow Parsing
DOI
10.1007/s13369-021-06343-7
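Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09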
Abstract
Phrase chunking is an important task in many natural language processing (NLP) applications. This paper presents a neural phrase chunker for Urdu that uses trained contextualized word representations. The work also produces an annotated corpus, labeled with IOB (inside-outside-begin) tags. Comprehensive guidelines have been developed for four phrase types: noun phrase (NP), verb phrase (VP), post-positional phrase (PP) and prepositional phrase (PRP). The annotated text has been evaluated automatically for completeness and correctness, and inter-annotator agreement has been calculated on a ten percent reference corpus. A neural chunker based on long short-term memory (LSTM) networks has been developed and trained on the annotated corpus. Transfer learning has been employed to improve the chunking results: for that purpose, context-free (Word2Vec) and contextualized (ELMo) word representations have been trained. The chunker achieved an F-score of 94.9 when trained using the third layer of the ELMo embeddings.
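To make the IOB labeling mentioned in the abstract concrete, the following is a minimal sketch of how chunk spans map to per-token tags and back. It is not the authors' tooling: the function names `spans_to_iob` and `iob_to_spans`, the span representation, and the English example sentence are all assumptions made for illustration, and only the NP and VP labels are taken from the abstract.

```python
from typing import List, Tuple

# Hypothetical span format: (start index, end index exclusive, phrase label).
Span = Tuple[int, int, str]


def spans_to_iob(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping chunk spans as one IOB tag per token."""
    tags = ["O"] * n_tokens  # tokens outside every chunk stay "O"
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the chunk
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the chunk
    return tags


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Decode IOB tags back into (start, end, label) chunk spans."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes the last chunk
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        # A "B-" tag always opens a chunk; a stray "I-" tag opens one too.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


# Illustrative English sentence (an assumption; the paper's corpus is Urdu).
tokens = ["the", "old", "chunker", "was", "retrained", "yesterday"]
example_spans = [(0, 3, "NP"), (3, 5, "VP")]
example_tags = spans_to_iob(len(tokens), example_spans)
```

Here `example_tags` holds one tag per token, and `iob_to_spans(example_tags)` recovers the original spans. Chunk-level scores such as the F-score reported above are typically computed by comparing decoded spans rather than individual tags, although the abstract does not state the exact evaluation procedure.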
Pages: 9781-9799
Page count: 19
Related papers
  • [1] Improving Phrase Chunking by using Contextualized Word Embeddings for a Morphologically Rich Language
    Toqeer Ehsan
    Javairia Khalid
    Saadia Ambreen
    Asad Mustafa
    Sarmad Hussain
    [J]. Arabian Journal for Science and Engineering, 2022, 47 : 9781 - 9799