Data smashing: uncovering lurking order in data

Cited by: 12
Authors
Chattopadhyay, Ishanu [1 ,2 ]
Lipson, Hod [3 ,4 ]
Affiliations
[1] Univ Chicago, Computat Inst, Chicago, IL 60637 USA
[2] Cornell Univ, Dept Comp Sci, Sch Mech & Aerosp Engn, Ithaca, NY 14853 USA
[3] Cornell Univ, Sch Mech & Aerosp Engn, Ithaca, NY USA
[4] Cornell Univ, Ithaca, NY USA
Keywords
feature-free classification; universal metric; probabilistic automata; SET;
DOI
10.1098/rsif.2014.0826
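Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code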
07 ; 0710 ; 09 ;
Abstract
From automatic speech recognition to discovering unusual stars, underlying almost all automated discovery tasks is the ability to compare and contrast data streams with each other, to identify connections and spot outliers. Despite the prevalence of data, however, automated methods are not keeping pace. A key bottleneck is that most data comparison algorithms today rely on a human expert to specify what 'features' of the data are relevant for comparison. Here, we propose a new principle for estimating the similarity between the sources of arbitrary data streams, using neither domain knowledge nor learning. We demonstrate the application of this principle to the analysis of data from a number of real-world challenging problems, including the disambiguation of electro-encephalograph patterns pertaining to epileptic seizures, detection of anomalous cardiac activity from heart sound recordings and classification of astronomical objects from raw photometry. In all these cases and without access to any domain knowledge, we demonstrate performance on a par with the accuracy achieved by specialized algorithms and heuristics devised by domain experts. We suggest that data smashing principles may open the door to understanding increasingly complex observations, especially when experts do not know what to look for.
Pages: 11