Outlier Ranking for Large-Scale Public Health Data

Citations: 0
Authors
Joshi, Ananya [1]
Townes, Tina [1]
Gormley, Nolan [1]
Neureiter, Luke [1]
Rosenfeld, Roni [1]
Wilder, Bryan [1]
Affiliations
[1] Carnegie Mellon Univ, 5000 Forbes Rd, Pittsburgh, PA 15213 USA
Funding
US National Science Foundation;
Keywords
ALGORITHMS;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Disease control experts inspect public health data streams daily for outliers worth investigating, like those corresponding to data quality issues or disease outbreaks. However, they can only examine a few of the thousands of maximally-tied outliers returned by univariate outlier detection methods applied to large-scale public health data streams. To help experts distinguish the most important outliers from these thousands of tied outliers, we propose a new task for algorithms to rank the outputs of any univariate method applied to each of many streams. Our novel algorithm for this task, which leverages hierarchical networks and extreme value analysis, performed the best across traditional outlier detection metrics in a human-expert evaluation using public health data streams. Most importantly, experts have used our open-source Python implementation since April 2023 and report identifying outliers worth investigating 9.1x faster than their prior baseline. Other organizations can readily adapt this implementation to create rankings from the outputs of their tailored univariate methods across large-scale streams.
Pages: 22176-22184
Page count: 9