A Dataset and Baselines for Multilingual Reply Suggestion

Cited by: 0
Authors
Zhang, Mozhi [1, 4]
Wang, Wei [2, 4]
Deb, Budhaditya [3 ]
Zheng, Guoqing [4 ]
Shokouhi, Milad [3 ]
Awadallah, Ahmed Hassan [4 ]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] Qualtrics, Provo, UT USA
[3] Microsoft AI, Redmond, WA USA
[4] Microsoft Res, Redmond, WA USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Reply suggestion models help users process emails and chats faster. Previous work only studies English reply suggestion. Instead, we present MRS, a multilingual reply suggestion dataset with ten languages. MRS can be used to compare two families of models: 1) retrieval models that select the reply from a fixed set and 2) generation models that produce the reply from scratch. Therefore, MRS complements existing cross-lingual generalization benchmarks that focus on classification and sequence labeling tasks. We build a generation model and a retrieval model as baselines for MRS. The two models have different strengths in the monolingual setting, and they require different strategies to generalize across languages. MRS is publicly available at https://github.com/zhangmozhi/mrs.
Pages: 1207 - 1220
Page count: 14
Related papers
50 in total
  • [1] Multilingual Image Corpus - Towards a Multimodal and Multilingual Dataset
    Koeva, Svetla
    Stoyanova, Ivelina
    Kralev, Jordan
    [J]. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1509 - 1518
  • [2] Guilloche Detection for ID Authentication: A Dataset and Baselines
    Al-Ghadi, Musab
    Ming, Zuheng
    Gomez-Kramer, Petra
    Burie, Jean-Christophe
    Coustaty, Mickael
    Sidere, Nicolas
    [J]. 2023 IEEE 25TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, MMSP, 2023,
  • [3] A multilingual, multimodal dataset of aggression and bias: the ComMA dataset
    Kumar, Ritesh
    Ratan, Shyam
    Singh, Siddharth
    Nandi, Enakshi
    Devi, Laishram Niranjana
    Bhagat, Akash
    Dawer, Yogesh
    Lahiri, Bornini
    Bansal, Akanksha
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2024, 58 (02) : 757 - 837
  • [4] A Clinical Dataset and Various Baselines for Chromosome Instance Segmentation
    Huang, Runhua
    Lin, Chengchuang
    Yin, Aihua
    Chen, Hanbiao
    Guo, Li
    Zhao, Gansen
    Fan, Xiaomao
    Li, Shuangyin
    Yang, Jinji
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2022, 19 (01) : 31 - 39
  • [5] A Dataset and Baselines for e-Commerce Product Categorization
    Lin, Yiu-Chang
    Das, Pradipto
    Trotman, Andrew
    Kallumadi, Surya
    [J]. PROCEEDINGS OF THE 2019 ACM SIGIR INTERNATIONAL CONFERENCE ON THEORY OF INFORMATION RETRIEVAL (ICTIR'19), 2019, : 212 - 215
  • [6] Towards hierarchical affiliation resolution: framework, baselines, dataset
    Tobias Backes
    Daniel Hienert
    Stefan Dietze
    [J]. International Journal on Digital Libraries, 2022, 23 : 267 - 288
  • [7] Towards hierarchical affiliation resolution: framework, baselines, dataset
    Backes, Tobias
    Hienert, Daniel
    Dietze, Stefan
    [J]. INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES, 2022, 23 (03) : 267 - 288
  • [8] The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines
    Damen, Dima
    Doughty, Hazel
    Farinella, Giovanni Maria
    Fidler, Sanja
    Furnari, Antonino
    Kazakos, Evangelos
    Moltisanti, Davide
    Munro, Jonathan
    Perrett, Toby
    Price, Will
    Wray, Michael
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (11) : 4125 - 4141
  • [9] The Second DIHARD Diarization Challenge: Dataset, task, and baselines
    Ryant, Neville
    Church, Kenneth
    Cieri, Christopher
    Cristia, Alejandrina
    Du, Jun
    Ganapathy, Sriram
    Liberman, Mark
    [J]. INTERSPEECH 2019, 2019, : 978 - 982
  • [10] Leyzer: A Dataset for Multilingual Virtual Assistants
    Sowanski, Marcin
    Janicki, Artur
    [J]. TEXT, SPEECH, AND DIALOGUE (TSD 2020), 2020, 12284 : 477 - 486