Multi-lingual and Multi-cultural Figurative Language Understanding

Cited by: 0
Authors
Kabra, Anubha [1]
Liu, Emmy [1]
Khanuja, Simran [1]
Aji, Alham Fikri [2]
Winata, Genta Indra [3]
Cahyawijaya, Samuel [4]
Aremu, Anuoluwapo [5]
Ogayo, Perez [1]
Neubig, Graham [1]
Affiliations
[1] Carnegie Mellon University, Pittsburgh, PA 15213, USA
[2] MBZUAI, Abu Dhabi, United Arab Emirates
[3] Bloomberg, New York, NY, USA
[4] HKUST, Hong Kong, People's Republic of China
[5] Masakhane, Pretoria, South Africa
DOI
Not available
Abstract
Figurative language permeates human communication, but at the same time is relatively understudied in NLP. Datasets have been created in English to accelerate progress towards measuring and improving figurative language processing in language models (LMs). However, the use of figurative language is an expression of our cultural and societal experiences, making it difficult for these phrases to be universally applicable. In this work, we create a figurative language inference dataset, MABL, for seven diverse languages associated with a variety of cultures: Hindi, Indonesian, Javanese, Kannada, Sundanese, Swahili and Yoruba. Our dataset reveals that each language relies on cultural and regional concepts for figurative expressions, with the highest overlap between languages originating from the same region. We assess multilingual LMs' abilities to interpret figurative language in zero-shot and few-shot settings. All languages exhibit a significant deficiency compared to English, with variations in performance reflecting the availability of pre-training and fine-tuning data, emphasizing the need for LMs to be exposed to a broader range of linguistic and cultural variation during training.
Pages: 8269-8284
Number of pages: 16