Transfer Learning for Low-Resource Multilingual Relation Classification

Times cited: 1
Authors
Nag, Arijit [1]
Samanta, Bidisha [1]
Mukherjee, Animesh [1]
Ganguly, Niloy [1]
Chakrabarti, Soumen [1]
Affiliation
[1] Indian Institute of Technology Kharagpur, West Bengal, India
Keywords
Relation extraction
DOI
10.1145/3554734
CLC classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Relation classification (sometimes called relation extraction) requires trustworthy datasets, both for fine-tuning large language models and for evaluation. Data collection is challenging for Indian languages because they are syntactically and morphologically diverse, and also differ from resource-rich languages such as English. Despite recent interest in deep generative models for Indian languages, relation classification is still not well served by public datasets. In response, we present IndoRE, a dataset of 21K entity- and relation-tagged gold sentences in three Indian languages (Bengali, Hindi, and Telugu), plus English. We start with a multilingual BERT (mBERT)-based system that captures entity span positions and type information and delivers competitive performance on monolingual relation classification. Using this baseline, we explore transfer mechanisms between languages and the scope for reducing expensive data annotation while still achieving reasonable relation extraction performance. Specifically, we (a) study the accuracy-efficiency trade-off between training a relation extractor on expensive, manually labeled gold instances and on automatically translated and aligned silver instances, (b) devise a simple mechanism for budgeted gold data annotation that uses active learning to select distantly supervised silver training instances for human annotators to convert into gold training instances, and (c) propose an ensemble model that improves on the performance achieved with limited gold training instances alone. We release the dataset for future research.(1)
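The abstract does not state how the active-learning step chooses which silver instances to send to annotators. A common criterion is uncertainty sampling, and the following is a minimal sketch assuming entropy-based uncertainty sampling (a swapped-in, generic technique, not necessarily the authors' method). The helper names `entropy` and `select_for_annotation`, and the idea that the current classifier supplies a probability distribution over relation labels for each silver instance, are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one predicted relation-label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(
    silver_probs: Dict[str, Sequence[float]],
    budget: int,
) -> List[str]:
    """Pick the `budget` silver instances the current model is least sure about.

    Hypothetical helper (assumption): `silver_probs` maps a silver-instance ID
    to the classifier's probability distribution over relation labels.
    Instances with the highest predictive entropy are returned first; ties keep
    the dictionary's insertion order because Python's sort is stable.
    """
    if budget <= 0:
        return []
    ranked = sorted(
        silver_probs,
        key=lambda inst_id: entropy(silver_probs[inst_id]),
        reverse=True,
    )
    return ranked[:budget]


if __name__ == "__main__":
    # Toy pool of three silver instances with 3-way relation distributions.
    pool = {
        "s1": [0.98, 0.01, 0.01],  # confident prediction
        "s2": [0.34, 0.33, 0.33],  # nearly uniform, so most uncertain
        "s3": [0.60, 0.30, 0.10],
    }
    print(select_for_annotation(pool, budget=2))
```

With a budget of 2, the sketch selects `s2` and then `s3`, leaving the confidently classified `s1` as silver. In a full loop (again an assumption), the selected instances would be corrected by annotators, added to the gold set, and the classifier retrained before the next round.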
Pages: 24