Collaborative Workflow for Analyzing Large-Scale Data for Antimicrobial Resistance: An Experience Report

Authors
Hou, Pei-Yu [1]
Ao, Jing [1]
Rindos, Andrew [2]
Keelara, Shivaramu [1]
Fedorka-Cray, Paula J. [1]
Chirkova, Rada [1]
Affiliations
[1] North Carolina State Univ, Raleigh, NC 27695 USA
[2] IBM Corp, Res Triangle Pk, NC 27709 USA
Keywords
data analytics; data integration; antimicrobial resistance; experts-in-the-loop; analysts-in-the-loop
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In real-life analytics-oriented information-integration projects, the processes of information curation and integration cannot be completely automated. Rather, in each large-scale project the key objectives include maximizing scalability and throughput while keeping the processes manageable and productive for the human experts in the loop. In this paper, we describe our experience with addressing these objectives while building a scalable end-to-end data-extraction, integration, and analytics workflow in the domain of antimicrobial resistance (AMR). The workflow is built using open-source tools, with the aims of enhancing the efficiency and accuracy of data collection and integration while requiring an acceptable level of effort from collaborative multidisciplinary teams of humans in the loop. We present the components of the proposed workflow, outline the challenges encountered in its development and testing, and discuss the experiences and lessons learned in enabling AMR experts and data analysts to interact with the workflow. Some of these lessons are potentially applicable to other application domains.
Pages: 4608-4617
Page count: 10