Large Scale Semi-Automated Labeling of Routine Free-Text Clinical Records for Deep Learning

Cited by: 0
Authors
Hari M. Trivedi
Maryam Panahiazar
April Liang
Dmytro Lituiev
Peter Chang
Jae Ho Sohn
Yunn-Yi Chen
Benjamin L. Franc
Bonnie Joe
Dexter Hadley
Affiliations
[1] University of California, Department of Radiology and Biomedical Imaging
[2] University of California, Institute for Computational Health Sciences
[3] University of California School of Medicine, Department of Pathology
[4] University of California
Source
Journal of Digital Imaging, 2019, 32 (01)
Keywords
IBM Watson; Machine learning; Artificial intelligence; Deep learning; Natural language processing (NLP); Pathology; Mammography
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Breast cancer is a leading cause of cancer death among women in the USA. Screening mammography is effective in reducing mortality, but has a high rate of unnecessary recalls and biopsies. While deep learning can be applied to mammography, large-scale labeled datasets, which are difficult to obtain, are required. We aim to remove many barriers of dataset development by automatically harvesting data from existing clinical records using a hybrid framework combining traditional NLP and IBM Watson. An expert reviewer manually annotated 3521 breast pathology reports with one of four outcomes: left positive, right positive, bilateral positive, negative. Traditional NLP techniques using seven different machine learning classifiers were compared to IBM Watson’s automated natural language classifier. Techniques were evaluated using precision, recall, and F-measure. Logistic regression outperformed all other traditional machine learning classifiers and was used for subsequent comparisons. Both traditional NLP and Watson’s NLC performed well for cases under 1024 characters with weighted average F-measures above 0.96 across all classes. Performance of traditional NLP was lower for cases over 1024 characters with an F-measure of 0.83. We demonstrate a hybrid framework using traditional NLP techniques combined with IBM Watson to annotate over 10,000 breast pathology reports for development of a large-scale database to be used for deep learning in mammography. Our work shows that traditional NLP and IBM Watson perform extremely well for cases under 1024 characters and can accelerate the rate of data annotation.
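The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the handling of reports of exactly 1024 characters is an assumption (the abstract only mentions cases "under" and "over" that length), and the weighted average is assumed to weight each class's F-measure by its number of true cases.

```python
from collections import Counter

# The four outcome labels named in the abstract.
LABELS = ["left positive", "right positive", "bilateral positive", "negative"]
CHAR_LIMIT = 1024  # length cutoff reported in the abstract


def split_by_length(reports, limit=CHAR_LIMIT):
    """Separate reports under the cutoff from the rest.

    Assumption: a report of exactly `limit` characters goes to the long group.
    """
    short = [r for r in reports if len(r) < limit]
    long_ = [r for r in reports if len(r) >= limit]
    return short, long_


def weighted_f_measure(y_true, y_pred, labels=LABELS):
    """Per-class F-measure, averaged with weights equal to each class's true count."""
    support = Counter(y_true)
    total = 0.0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * support[label]
    return total / len(y_true) if y_true else 0.0
```

Calling `split_by_length` first and then scoring each group separately with `weighted_f_measure` would mirror the abstract's separate reporting for short and long reports.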
Pages: 30 - 37
Page count: 7
Related papers
50 in total
  • [1] Large Scale Semi-Automated Labeling of Routine Free-Text Clinical Records for Deep Learning
    Trivedi, Hari M.
    Panahiazar, Maryam
    Liang, April
    Lituiev, Dmytro
    Chang, Peter
    Sohn, Jae Ho
    Chen, Yunn-Yi
    Franc, Benjamin L.
    Joe, Bonnie
    Hadley, Dexter
    [J]. JOURNAL OF DIGITAL IMAGING, 2019, 32 (01) : 30 - 37
  • [2] Addressing medical coding of free-text clinical records in English with deep learning
    Nugmanov, Ramil
    Miftahutdinov, Zulfat
    Tutubalina, Elena
    [J]. EUROPEAN JOURNAL OF CLINICAL INVESTIGATION, 2019, 49 : 117 - 117
  • [3] Automated Misspelling Detection and Correction in Clinical Free-Text Records
    Nazir, Aiman Khan
    Zafar, Iqra
    Fatima, Alia
    Qamar, Usman
    Shaheen, Asma
    Maqbool, Bilal
    [J]. 2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA (ICAIBD), 2018 : 277 - 280
  • [4] Automated misspelling detection and correction in clinical free-text records
    Lai, Kenneth H.
    Topaz, Maxim
    Goss, Foster R.
    Zhou, Li
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2015, 55 : 188 - 195
  • [5] Automated de-identification of free-text medical records
    Ishna Neamatullah
    Margaret M Douglass
    Li-wei H Lehman
    Andrew Reisner
    Mauricio Villarroel
    William J Long
    Peter Szolovits
    George B Moody
    Roger G Mark
    Gari D Clifford
    [J]. BMC Medical Informatics and Decision Making, 8
  • [6] Automated de-identification of free-text medical records
    Neamatullah, Ishna
    Douglass, Margaret M.
    Lehman, Li-wei H.
    Reisner, Andrew
    Villarroel, Mauricio
    Long, William J.
    Szolovits, Peter
    Moody, George B.
    Mark, Roger G.
    Clifford, Gari D.
    [J]. BMC MEDICAL INFORMATICS AND DECISION MAKING, 2008, 8 (1)
  • [7] Statistical Section Segmentation in Free-Text Clinical Records
    Tepper, Michael
    Capurro, Daniel
    Xia, Fei
    Vanderwende, Lucy
    Yetisgen-Yildiz, Meliha
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012 : 2001 - 2008
  • [8] Deep Learning to Classify Radiology Free-Text Reports
    Chen, Matthew C.
    Ball, Robyn L.
    Yang, Lingyao
    Moradzadeh, Nathaniel
    Chapman, Brian E.
    Larson, David B.
    Langlotz, Curtis P.
    Amrhein, Timothy J.
    Lungren, Matthew P.
    [J]. RADIOLOGY, 2018, 286 (03) : 845 - 852
  • [9] Fever detection from free-text clinical records for biosurveillance
    Chapman, WW
    Dowling, JN
    Wagner, MM
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2004, 37 (02) : 120 - 127
  • [10] Semi-automated ontology development scheme via text mining of scientific records
    Tamjid, Somayeh
    Nooshinfard, Fatemeh
    Beheshti, Molouk Sadat Hosseini
    Hariri, Nadjla
    Babalhavaeji, Fahimeh
    [J]. ELECTRONIC LIBRARY, 2024, 42 (02) : 230 - 254