DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications

Authors
He, Wei [1]
Liu, Kai [1]
Liu, Jing [1]
Lyu, Yajuan [1]
Zhao, Shiqi [1]
Xiao, Xinyan [1]
Liu, Yuan [1]
Wang, Yizhong [1]
Wu, Hua [1]
She, Qiaoqiao [1]
Liu, Xuan [1]
Wu, Tian [1]
Wang, Haifeng [1]
Affiliations
[1] Baidu Inc., Beijing, People's Republic of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces DuReader, a new large-scale, open-domain Chinese machine reading comprehension (MRC) dataset designed to address real-world MRC. DuReader has three advantages over previous MRC datasets: (1) data sources: questions and documents are based on Baidu Search and Baidu Zhidao, and answers are manually generated; (2) question types: it provides rich annotations for more question types, especially yes-no and opinion questions, which leaves more opportunity for the research community; (3) scale: it contains 200K questions, 420K answers and 1M documents, making it the largest Chinese MRC dataset so far. Experiments show that human performance is well above current state-of-the-art baseline systems, leaving plenty of room for the community to make improvements. To help the community make these improvements, both DuReader and the baseline systems have been posted online. We also organize a shared competition to encourage the exploration of more models. Since the release of the task, there have been significant improvements over the baselines.
Pages: 37-46
Page count: 10