Learning from data with structured missingness

Cited by: 15
Authors
Mitra, Robin [1 ,2 ]
McGough, Sarah F. [3 ]
Chakraborti, Tapabrata [1 ,4 ]
Holmes, Chris [1 ,5 ]
Copping, Ryan [3 ]
Hagenbuch, Niels [6 ]
Biedermann, Stefanie [7 ]
Noonan, Jack [8 ]
Lehmann, Brieuc [2 ]
Shenvi, Aditi [9 ]
Doan, Xuan Vinh [10 ]
Leslie, David [1 ,11 ]
Bianconi, Ginestra [1 ,12 ]
Sanchez-Garcia, Ruben [1 ,13 ]
Davies, Alisha [1 ,14 ,15 ]
Mackintosh, Maxine [1 ,16 ]
Andrinopoulou, Eleni-Rosalina [17 ,18 ]
Basiri, Anahid [1 ,19 ]
Harbron, Chris [20 ]
MacArthur, Ben D. [1 ]
Affiliations
[1] Alan Turing Inst, London, England
[2] UCL, Stat Sci, London, England
[3] Genentech Inc, South San Francisco, CA 94080 USA
[4] UCL, UCL Canc Inst, Dept Med Phys & Biomed Engn, London, England
[5] Univ Oxford, Dept Stat, Oxford, England
[6] F Hoffmann La Roche & Cie AG, Basel, Switzerland
[7] Open Univ, Sch Math & Stat, Milton Keynes, England
[8] Cardiff Univ, Sch Math, Cardiff, Wales
[9] Univ Warwick, Dept Stat, Coventry, England
[10] Univ Warwick, Warwick Business Sch, Coventry, England
[11] Queen Mary Univ London, Digital Environm Res Inst, London, England
[12] Queen Mary Univ London, Sch Math Sci, London, England
[13] Univ Southampton, Math Sci, Southampton, England
[14] Swansea Univ, Fac Hlth & Life Sci, Swansea, Wales
[15] Publ Hlth Wales, Cardiff, Wales
[16] Genom England, London, England
[17] Erasmus MC, Dept Biostat, Rotterdam, Netherlands
[18] Erasmus MC, Dept Epidemiol, Rotterdam, Netherlands
[19] Univ Glasgow, Sch Geog & Earth Sci, Glasgow, Scotland
[20] Roche Pharmaceut, Welwyn Garden City, England
Keywords
MULTIPLE IMPUTATION; INFERENCE;
DOI
10.1038/s42256-022-00596-z
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Gathering big datasets has become an essential component of machine learning in many scientific areas, but it is unavoidable that some data values are missing. An important and growing effect that needs careful attention, especially when heterogeneous data sources are combined, is that of structured missingness, where data values are missing not at random, but with a specific structure. Missing data are an unavoidable complication in many machine learning tasks. When data are 'missing at random' there exist a range of tools and techniques to deal with the issue. However, as machine learning studies become more ambitious, and seek to learn from ever-larger volumes of heterogeneous data, an increasingly encountered problem arises in which missing values exhibit an association or structure, either explicitly or implicitly. Such 'structured missingness' raises a range of challenges that have not yet been systematically addressed, and presents a fundamental hindrance to machine learning at scale. Here we outline the current literature and propose a set of grand challenges in learning from data with structured missingness.
Pages: 13-23
Number of pages: 11
Related papers
50 in total
  • [1] Learning from data with structured missingness
    Mitra, Robin
    McGough, Sarah F.
    Chakraborti, Tapabrata
    Holmes, Chris
    Copping, Ryan
    Hagenbuch, Niels
    Biedermann, Stefanie
    Noonan, Jack
    Lehmann, Brieuc
    Shenvi, Aditi
    Doan, Xuan Vinh
    Leslie, David
    Bianconi, Ginestra
    Sanchez-Garcia, Ruben
    Davies, Alisha
    Mackintosh, Maxine
    Andrinopoulou, Eleni-Rosalina
    Basiri, Anahid
    Harbron, Chris
    MacArthur, Ben D.
    [J]. Nature Machine Intelligence, 2023, 5 : 13 - 23
  • [2] Embedding for Informative Missingness: Deep Learning With Incomplete Data
    Ghorbani, Amirata
    Zou, James Y.
    [J]. 2018 56TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2018, : 437 - 445
  • [3] Missingness-Pattern-Adaptive Learning With Incomplete Data
    Gong, Yongshun
    Li, Zhibin
    Liu, Wei
    Lu, Xiankai
    Liu, Xinwang
    Tsang, Ivor W. W.
    Yin, Yilong
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (09) : 11053 - 11066
  • [4] Learning comprehensible theories from structured data
    Lloyd, JW
    [J]. ADVANCED LECTURES ON MACHINE LEARNING, 2002, 2600 : 203 - 225
  • [5] Learning from highly structured data by decomposition
    Mac Kinney-Romero, R
    Giraud-Carrier, C
    [J]. PRINCIPLES OF DATA MINING AND KNOWLEDGE DISCOVERY, 1999, 1704 : 436 - 441
  • [6] Learning structured data from unspecific reinforcement
    Biehl, M
    Kühn, R
    Stamatescu, IO
    [J]. JOURNAL OF PHYSICS A-MATHEMATICAL AND GENERAL, 2000, 33 (39) : 6843 - 6857
  • [7] Transfer Learning Approach for Learning of Unstructured Data from Structured Data in Medical Domain
    Wankhade, Nishigandha V.
    Potey, Madhuri A.
    [J]. 2013 2ND INTERNATIONAL CONFERENCE ON INFORMATION MANAGEMENT IN THE KNOWLEDGE ECONOMY (IMKE), 2013, : 86 - 91
  • [8] DIFFER: A Propositionalization approach for Learning from Structured Data
    Karunaratne, Thashmee
    Bostrom, Henrik
    [J]. PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 15, 2006, 15 : 49 - +
  • [9] On the computational hardness of learning from structured symbolic data
    Jappy, P
    Gascuel, O
    [J]. ORDINAL AND SYMBOLIC DATA ANALYSIS, 1996, : 189 - 200
  • [10] On the hardness of learning queries from tree structured data
    Liu, Xianmin
    Li, Jianzhong
    [J]. Journal of Combinatorial Optimization, 2015, 29 : 670 - 684