PediCXR: An open, large-scale chest radiograph dataset for interpretation of common thoracic diseases in children

Authors
Hieu H. Pham
Ngoc H. Nguyen
Thanh T. Tran
Tuan N. M. Nguyen
Ha Q. Nguyen
Affiliations
[1] Smart Health Center
[2] VinBigData JSC
[3] College of Engineering & Computer Science
[4] VinUniversity
[5] VinUni-Illinois Smart Health Center
[6] Phu Tho Department of Health
[7] Training and Direction of Healthcare Activities Center
[8] Phu Tho General Hospital
Abstract
Computer-aided diagnosis systems for adult chest radiography (CXR) have recently achieved great success thanks to the availability of large-scale annotated datasets and the advent of high-performance supervised learning algorithms. However, the development of models for detecting and diagnosing pediatric diseases in CXR scans has been held back by the lack of high-quality, physician-annotated datasets. To overcome this challenge, we introduce and release PediCXR, a new pediatric CXR dataset of 9,125 studies retrospectively collected from a major pediatric hospital in Vietnam between 2020 and 2021. Each scan was manually annotated by a pediatric radiologist with more than ten years of experience. The dataset was labeled for the presence of 36 critical findings and 15 diseases, and each abnormal finding was localized with a rectangular bounding box on the image. To the best of our knowledge, this is the first and largest pediatric CXR dataset containing both lesion-level annotations and image-level labels for the detection of multiple findings and diseases. For algorithm development, the dataset was divided into a training set of 7,728 studies and a test set of 1,397 studies. To encourage new advances in pediatric CXR interpretation using data-driven approaches, we provide a detailed description of the PediCXR data sample and make the dataset publicly available at https://physionet.org/content/vindr-pcxr/1.0.0/.
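The two annotation levels described in the abstract (image-level disease labels and lesion-level bounding boxes) can be pictured with a small data structure. The following Python sketch is illustrative only: the class names, field names, pixel-coordinate convention, and placeholder labels are assumptions and do not reflect the actual file schema of the released dataset. Only the study counts are taken from the abstract.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BoundingBox:
    """One lesion-level annotation (hypothetical layout, pixel coordinates)."""
    finding: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp at zero so a malformed box never yields a negative area.
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


@dataclass
class StudyAnnotation:
    """Image-level disease labels plus lesion-level boxes for one study."""
    image_id: str
    diseases: list = field(default_factory=list)
    boxes: list = field(default_factory=list)

    def is_abnormal(self) -> bool:
        # Assumption for this sketch: a study is abnormal if it has any box.
        return len(self.boxes) > 0


# Study counts as stated in the abstract.
TRAIN_STUDIES = 7_728
TEST_STUDIES = 1_397
TOTAL_STUDIES = TRAIN_STUDIES + TEST_STUDIES
TEST_FRACTION = TEST_STUDIES / TOTAL_STUDIES

# Placeholder identifiers and labels, not real dataset values.
example = StudyAnnotation(
    image_id="hypothetical-0001",
    diseases=["disease_A"],
    boxes=[BoundingBox("finding_A", 120.0, 200.0, 360.0, 420.0)],
)

if __name__ == "__main__":
    print(f"total studies: {TOTAL_STUDIES}")
    print(f"test fraction: {TEST_FRACTION:.3f}")
    print(f"example abnormal: {example.is_abnormal()}")
```

The training and test counts sum to the 9,125 studies reported, so the test set is roughly 15% of the data.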