DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications

Cited by: 0

Authors:

He, Wei [1]

Liu, Kai [1]

Liu, Jing [1]

Lyu, Yajuan [1]

Zhao, Shiqi [1]

Xiao, Xinyan [1]

Liu, Yuan [1]

Wang, Yizhong [1]

Wu, Hua [1]

She, Qiaoqiao [1]

Liu, Xuan [1]

Wu, Tian [1]

Wang, Haifeng [1]

Affiliations:

[1] Baidu Inc., Beijing, People's Republic of China

Source:

Machine Reading for Question Answering | 2018

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper introduces DuReader, a new large-scale, open-domain Chinese machine reading comprehension (MRC) dataset designed to address real-world MRC. DuReader has three advantages over previous MRC datasets: (1) data sources: questions and documents are drawn from Baidu Search and Baidu Zhidao, and answers are manually generated; (2) question types: it provides rich annotations for more question types, especially yes-no and opinion questions, which leaves more opportunity for the research community; (3) scale: it contains 200K questions, 420K answers and 1M documents, making it the largest Chinese MRC dataset so far. Experiments show that human performance is well above current state-of-the-art baseline systems, leaving plenty of room for the community to make improvements. To help the community make these improvements, both DuReader and the baseline systems have been posted online. We also organize a shared competition to encourage the exploration of more models. Since the release of the task, there have been significant improvements over the baselines.


Pages: 37-46

Page count: 10


