Transfer Learning for Low-Resource Multilingual Relation Classification

Cited by: 1
Authors
Nag, Arijit [1]
Samanta, Bidisha [1]
Mukherjee, Animesh [1]
Ganguly, Niloy [1]
Chakrabarti, Soumen [1]
Affiliation
[1] Indian Institute of Technology, Kharagpur, West Bengal, India
Keywords
Relation extraction
DOI
10.1145/3554734
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Relation classification (sometimes called relation extraction) requires trustworthy datasets for fine-tuning large language models, as well as for evaluation. Data collection is challenging for Indian languages because they are syntactically and morphologically diverse, and because they differ from resource-rich languages like English. Despite recent interest in deep generative models for Indian languages, relation classification is still not well served by public datasets. In response, we present IndoRE, a dataset with 21K entity- and relation-tagged gold sentences in three Indian languages (Bengali, Hindi, and Telugu), plus English. We start with a multilingual BERT (mBERT)-based system that captures entity span positions and type information and provides competitive performance on monolingual relation classification. Using this baseline system, we explore transfer mechanisms between languages and the scope for reducing expensive data annotation while still achieving reasonable relation extraction performance. Specifically, we (a) study the accuracy-efficiency trade-off between expensive, manually labeled gold instances and automatically translated and aligned silver instances for training a relation extractor, (b) devise a simple mechanism for budgeted gold data annotation that uses active learning to select distant-supervised silver training instances for human annotators to convert into gold training instances, and finally (c) propose an ensemble model that provides a performance boost over what is achieved with limited gold training instances. We release the dataset for future research.
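As a rough illustration of point (b), budgeted annotation with active learning is often approached with uncertainty sampling: rank the silver instances by how unsure the current model is, and send only the top few to annotators. The abstract does not say which acquisition function the authors use, so the entropy criterion below is an assumption, and the names `entropy`, `select_for_annotation`, and the `silver_pool` input format are hypothetical. It is a minimal sketch, not the paper's method.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted relation distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(silver_pool, budget):
    """Return the ids of the `budget` most uncertain silver instances.

    `silver_pool` maps an instance id to the model's predicted
    probability distribution over relation labels (hypothetical format).
    Entropy-based ranking is an assumption; the source does not name
    the selection criterion.
    """
    ranked = sorted(silver_pool, key=lambda i: entropy(silver_pool[i]), reverse=True)
    return ranked[:budget]


# Toy pool: four silver instances with made-up model predictions.
pool = {
    "s1": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "s2": [0.40, 0.35, 0.25],
    "s3": [0.70, 0.20, 0.10],
    "s4": [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
}
chosen = select_for_annotation(pool, budget=2)
print(chosen)
```

With a budget of two, the sketch picks the two instances whose predicted distributions are closest to uniform, which are the ones a human label is most likely to correct.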
Pages: 24