Multi-lingual and Multi-cultural Figurative Language Understanding

Cited by: 0
Authors
Kabra, Anubha [1]
Liu, Emmy [1]
Khanuja, Simran [1]
Aji, Alham Fikri [2]
Winata, Genta Indra [3]
Cahyawijaya, Samuel [4]
Aremu, Anuoluwapo [5]
Ogayo, Perez [1]
Neubig, Graham [1]
Affiliations
[1] Carnegie Mellon University, Pittsburgh, PA 15213, USA
[2] MBZUAI, Abu Dhabi, United Arab Emirates
[3] Bloomberg, New York, NY, USA
[4] HKUST, Hong Kong, People's Republic of China
[5] Masakhane, Pretoria, South Africa
DOI: Not available
Abstract
Figurative language permeates human communication, yet it remains relatively understudied in NLP. Datasets have been created in English to accelerate progress towards measuring and improving figurative language processing in language models (LMs). However, the use of figurative language is an expression of our cultural and societal experiences, which makes it difficult for these phrases to be universally applicable. In this work, we create a figurative language inference dataset, MABL, for seven diverse languages associated with a variety of cultures: Hindi, Indonesian, Javanese, Kannada, Sundanese, Swahili and Yoruba. Our dataset reveals that each language relies on cultural and regional concepts for figurative expressions, with the highest overlap between languages originating from the same region. We assess multilingual LMs' abilities to interpret figurative language in zero-shot and few-shot settings. All languages exhibit a significant deficiency compared to English, with variations in performance reflecting the availability of pre-training and fine-tuning data, emphasizing the need for LMs to be exposed to a broader range of linguistic and cultural variation during training.
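The abstract does not spell out how zero-shot interpretation is scored. A minimal sketch, assuming (not confirmed by the abstract) that each item pairs a figurative expression with two candidate interpretations and that the model "chooses" whichever interpretation it scores higher, for example by LM log-probability. The names `Item`, `predict`, and `accuracy` are hypothetical, and the toy English items and scores below are made up solely to exercise the code; they are not MABL data or reported results.

```python
"""Hypothetical sketch of zero-shot scoring for a two-choice figurative inference task.

Assumption: each item has one figurative phrase, two candidate interpretations,
and the index of the correct one. The scorer is a stand-in for an LM score.
"""
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Item:
    phrase: str                   # figurative expression
    candidates: tuple[str, str]   # two candidate interpretations
    label: int                    # index (0 or 1) of the correct interpretation


def predict(item: Item, score: Callable[[str, str], float]) -> int:
    """Return the index of the candidate the scorer rates higher (ties go to 0)."""
    s0 = score(item.phrase, item.candidates[0])
    s1 = score(item.phrase, item.candidates[1])
    return 0 if s0 >= s1 else 1


def accuracy(items: Sequence[Item], score: Callable[[str, str], float]) -> float:
    """Fraction of items whose higher-scoring candidate matches the gold label."""
    if not items:
        raise ValueError("need at least one item")
    correct = sum(predict(it, score) == it.label for it in items)
    return correct / len(items)


if __name__ == "__main__":
    # Toy English items with made-up scores, only to demonstrate the code path.
    items = [
        Item("the exam was a breeze", ("the exam was easy", "the exam was windy"), 0),
        Item("he has a heart of stone", ("he is unkind", "he is heavy"), 0),
    ]
    fake_logprobs = {
        ("the exam was a breeze", "the exam was easy"): -4.1,
        ("the exam was a breeze", "the exam was windy"): -5.3,
        ("he has a heart of stone", "he is unkind"): -6.0,
        ("he has a heart of stone", "he is heavy"): -5.2,
    }
    acc = accuracy(items, lambda p, c: fake_logprobs[(p, c)])
    print(f"accuracy = {acc:.2f}")
```

With the made-up scores above, the first item is answered correctly and the second is not, so the script prints `accuracy = 0.50`. Replacing the lookup table with a real model's conditional log-probability of each interpretation turns this into a likelihood-comparison evaluation; the scorer is passed in as a function so the selection and accuracy logic stay independent of any particular model.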
Pages: 8269-8284
Number of pages: 16