A Dataset and Baselines for Multilingual Reply Suggestion

Cited by: 0
Authors
Zhang, Mozhi [1, 4]
Wang, Wei [2, 4]
Deb, Budhaditya [3]
Zheng, Guoqing [4]
Shokouhi, Milad [3]
Awadallah, Ahmed Hassan [4]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] Qualtrics, Provo, UT USA
[3] Microsoft AI, Redmond, WA USA
[4] Microsoft Res, Redmond, WA USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Reply suggestion models help users process emails and chats faster. Previous work only studies English reply suggestion. Instead, we present MRS, a multilingual reply suggestion dataset with ten languages. MRS can be used to compare two families of models: 1) retrieval models that select the reply from a fixed set and 2) generation models that produce the reply from scratch. Therefore, MRS complements existing cross-lingual generalization benchmarks that focus on classification and sequence labeling tasks. We build a generation model and a retrieval model as baselines for MRS. The two models have different strengths in the monolingual setting, and they require different strategies to generalize across languages. MRS is publicly available at https://github.com/zhangmozhi/mrs.
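To make the retrieval family concrete, here is a minimal sketch of selecting a reply from a fixed set. It is not the baseline from the paper: the bag-of-words cosine scorer, the `CANDIDATES` list, and the function names are all assumptions chosen for illustration, and a real retrieval model would typically use a learned encoder instead.

```python
import math
from collections import Counter

# Hypothetical fixed reply set: each reply is paired with an example
# message that it would be a good answer to (illustrative data only).
CANDIDATES = [
    ("are you free for lunch tomorrow", "Sure, what time?"),
    ("thanks for the report", "You're welcome!"),
]


def bow(text):
    """Return a lowercased bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_reply(message, candidates):
    """Pick the reply whose example message is most similar to `message`."""
    query = bow(message)
    best = max(candidates, key=lambda pair: cosine(query, bow(pair[0])))
    return best[1]


print(suggest_reply("Thanks for sending the report", CANDIDATES))
```

A generation model would instead produce the reply token by token, so it is not limited to the fixed set but also gives up the control over output that a curated set provides.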
Pages: 1207-1220
Number of pages: 14