MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization

Times cited: 0
Authors
Xu, Canwen [1]
Pei, Jiaxin [2]
Wu, Hongtao [3]
Liu, Yiyu [3]
Li, Chenliang [3]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Hubei, Peoples R China
[2] Univ Michigan, Sch Informat, Ann Arbor, MI 48109 USA
[3] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan, Hubei, Peoples R China
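Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract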
Recently, large-scale datasets have vastly facilitated development in nearly all domains of Natural Language Processing (NLP). However, there is currently no cross-task dataset in NLP, which hinders the development of multi-task learning. We propose MATINF, the first jointly labeled large-scale dataset for classification, question answering and summarization. MATINF contains 1.07 million question-answer pairs with human-labeled categories and user-generated question descriptions. Based on such rich information, MATINF is applicable to three major NLP tasks: classification, question answering, and summarization. We benchmark existing methods and a novel multi-task baseline over MATINF to inspire further research. Our comprehensive comparison and experiments over MATINF and other datasets demonstrate the merits of MATINF.
Pages: 3586-3596
Number of pages: 11