A Large-scale Synthetic Pathological Dataset for Deep Learning-enabled Segmentation of Breast Cancer

Times cited: 19
Authors
Ding, Kexin [1]
Zhou, Mu [2]
Wang, He [3]
Gevaert, Olivier [4]
Metaxas, Dimitris [5]
Zhang, Shaoting [6]
Affiliations
[1] Univ North Carolina Charlotte, Dept Comp Sci, Charlotte, NC 28262 USA
[2] Sensebrain Res, San Jose, CA 95131 USA
[3] Yale Univ, Dept Pathol, New Haven, CT 06520 USA
[4] Stanford Univ, Stanford Ctr Biomed Informat Res, Dept Med & Biomed Data Sci, Stanford, CA 94305 USA
[5] Rutgers State Univ, Dept Comp Sci, New Brunswick, NJ 08901 USA
[6] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
Keywords
HISTOPATHOLOGY;
DOI
10.1038/s41597-023-02125-y
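Chinese Library Classification (CLC) numbers
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09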
Abstract
The success of training computer-vision models relies heavily on large-scale, real-world images with annotations. Yet such annotation-ready datasets are difficult to curate in pathology because of privacy protection and the heavy annotation burden. To aid computational pathology, synthetic data generation, curation, and annotation offer a cost-effective means to quickly provide the data diversity required to boost model performance at different stages. In this study, we introduce a large-scale synthetic pathological image dataset paired with annotations for nuclei semantic segmentation, termed Synthetic Nuclei and annOtation Wizard (SNOW). SNOW is developed via a standardized workflow that applies an off-the-shelf image generator and nuclei annotator. The dataset contains 20k image tiles with 1,448,522 annotated nuclei in total and is released under the CC-BY license. We show that SNOW can be used in both supervised and semi-supervised training scenarios. Extensive results suggest that models trained on synthetic data are competitive under a variety of training settings, expanding the scope for using synthetic images to enhance downstream data-driven clinical tasks.
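As a quick sense of annotation density, the two counts in the abstract imply roughly 72 annotated nuclei per tile on average. The sketch below only does that arithmetic. It assumes that "20k" means exactly 20,000 tiles, which the abstract does not state. The constant names are illustrative.

```python
# Counts quoted in the abstract.
# ASSUMPTION: "20k image tiles" is taken to mean exactly 20,000 tiles.
NUM_TILES = 20_000
NUM_NUCLEI = 1_448_522

# Mean number of annotated nuclei per image tile.
mean_nuclei_per_tile = NUM_NUCLEI / NUM_TILES
print(f"{mean_nuclei_per_tile:.1f}")  # prints 72.4
```

This is a dataset-wide mean only. Per-tile counts in a nuclei segmentation set usually vary widely, so the figure says nothing about that distribution.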
Pages: 12
Journal
Scientific Data, 10