PediCXR: An open, large-scale chest radiograph dataset for interpretation of common thoracic diseases in children

Authors
Hieu H. Pham
Ngoc H. Nguyen
Thanh T. Tran
Tuan N. M. Nguyen
Ha Q. Nguyen
Affiliations
[1] Smart Health Center
[2] VinBigData JSC
[3] College of Engineering & Computer Science
[4] VinUniversity
[5] VinUni-Illinois Smart Health Center
[6] Phu Tho Department of Health
[7] Training and Direction of Healthcare Activities Center
[8] Phu Tho General Hospital
Abstract
Computer-aided diagnosis systems for adult chest radiography (CXR) have recently achieved great success thanks to the availability of large-scale annotated datasets and the advent of high-performance supervised learning algorithms. However, the development of models for detecting and diagnosing pediatric diseases in CXR scans has lagged behind because of the lack of high-quality, physician-annotated datasets. To overcome this challenge, we introduce and release PediCXR, a new pediatric CXR dataset of 9,125 studies retrospectively collected from a major pediatric hospital in Vietnam between 2020 and 2021. Each scan was manually annotated by a pediatric radiologist with more than ten years of experience. The dataset was labeled for the presence of 36 critical findings and 15 diseases. In particular, each abnormal finding was localized with a rectangular bounding box on the image. To the best of our knowledge, this is the first and largest pediatric CXR dataset containing both lesion-level annotations and image-level labels for the detection of multiple findings and diseases. For algorithm development, the dataset was divided into a training set of 7,728 studies and a test set of 1,397 studies. To encourage new advances in pediatric CXR interpretation using data-driven approaches, we provide a detailed description of the PediCXR data sample and make the dataset publicly available at https://physionet.org/content/vindr-pcxr/1.0.0/.