Using Annotation Projection for Semantic Role Labeling of Low-Resourced Language: Sinhala

Cited by: 0
Authors
Gunasekara, Sandun [1]
Chathura, Dulanjaya [1]
Jeewantha, Chamoda [1]
Dias, Gihan [1]
Affiliation
[1] Univ Moratuwa, Dept Comp Sci & Engn, Moratuwa, Sri Lanka
Keywords
SRL; Semantics; Semantic Role Labeling; Sinhala; Annotation; Projection; Labeller; Roles;
DOI
10.1109/ialp51396.2020.9310468
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present SinSRL, the first-ever semantic role labeller (SRL) for Sinhala, an Indo-European language spoken mainly in Sri Lanka. SinSRL takes parallel text in English (or any other language for which a suitable SRL exists) and Sinhala, and outputs semantically annotated Sinhala text. We have enhanced existing tools to address several issues specific to the target language; these enhancements should also be useful for labeling other Indic languages. In addition, we have manually annotated a small Sinhala-English parallel dataset with semantic roles. The accuracy of our system is similar to that of the manually labeled data. Our implementation can be used to generate an SRL dataset, which may in turn be used to train a direct semantic role labeller. SinSRL may be easily modified to annotate other low-resource languages for which parallel corpora are available.
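The abstract describes transferring semantic role labels from an English SRL onto Sinhala through parallel text. The sketch below illustrates the generic idea of word-alignment-based annotation projection, not SinSRL's actual implementation. It assumes word alignments are already available as (source index, target index) pairs, for example from an external aligner, and that arguments are inclusive token spans; the function name `project_srl`, the span format, and the min/max span heuristic are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical span format: (first token index, last token index inclusive, role label)
Span = Tuple[int, int, str]


def project_srl(
    predicate_idx: int,
    arguments: List[Span],
    alignment: List[Tuple[int, int]],
) -> Tuple[Optional[int], List[Span]]:
    """Project one predicate-argument structure from source to target tokens.

    `alignment` holds (source_index, target_index) word-alignment pairs.
    Returns the projected target predicate index (None if unaligned) and
    the list of projected (start, end, role) target spans.
    """
    # Index alignments by source token for quick lookup.
    src_to_tgt: Dict[int, List[int]] = defaultdict(list)
    for s, t in alignment:
        src_to_tgt[s].append(t)

    # Predicate: take the leftmost aligned target token, if there is one.
    tgt_preds = sorted(src_to_tgt.get(predicate_idx, []))
    tgt_pred = tgt_preds[0] if tgt_preds else None

    projected: List[Span] = []
    for start, end, role in arguments:
        # Collect every target token aligned to any token in the source span.
        targets = [t for s in range(start, end + 1) for t in src_to_tgt.get(s, [])]
        if not targets:
            continue  # no aligned target words, so the argument is dropped
        # Assumed heuristic: label the contiguous target range the alignments cover.
        projected.append((min(targets), max(targets), role))
    return tgt_pred, projected


# Toy example: English "John ate an apple" (SVO) projected onto a hypothetical
# four-token SOV target sentence whose verb is the last token.
pred, args = project_srl(
    predicate_idx=1,                                # "ate"
    arguments=[(0, 0, "ARG0"), (2, 3, "ARG1")],     # "John", "an apple"
    alignment=[(0, 0), (1, 3), (2, 2), (3, 1)],
)
print(pred, args)  # 3 [(0, 0, 'ARG0'), (1, 2, 'ARG1')]
```

The min/max heuristic keeps projected arguments contiguous, which suits span-based role labels, but it can over-extend a span when alignments are noisy or discontinuous; practical projection pipelines typically add filtering for such cases.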
Pages: 98-103
Number of pages: 6