The problem of missing data in geoscience databases

Cited by: 10
Author
Henley, Stephen [1]
Affiliation
[1] Resources Comp Int Ltd, Matlock DE4 5JA, Derby, England
Keywords
relational database; open-world assumption; closed-world assumption; missing data; SQL; logic; fuzzy logic
DOI
10.1016/j.cageo.2005.12.008
Chinese Library Classification number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
SQL is the (more or less) standardised language that is used by the majority of commercial database management systems. However, it is seriously flawed, as has been documented in detail by Date, Darwen, Pascal, and others. One of the most serious problems with SQL is the way it handles missing data. It uses a special value 'NULL' to represent data items whose value is not known. This can have a variety of meanings in different circumstances (such as 'inapplicable' or 'unknown'). The SQL language also allows an 'unknown' truth value in logical expressions. The resulting incomplete three-valued logic leads to inconsistencies in data handling within relational database management systems. Relational database theorists advocate that a strict two-valued logic (true/false) be used instead, with prohibition of the use of NULL, and justify this stance by assertion that it is a true representation of the 'real world'. Nevertheless, in real geoscience data there is a complete gradation between exact values and missing data: for example, geochemical analyses are inexact (and the uncertainty should be recorded); the precision of numeric or textual data may also be expressed qualitatively by terms such as 'approximately' or 'possibly'. Furthermore, some data are by their nature incomplete: for example, where samples could not be collected or measurements could not be taken because of inaccessibility. It is proposed in this paper that the best way to handle such data sets is to replace the closed-world assumption and its concomitant strict two-valued logic, upon which the present relational database model is based, by the open-world assumption which allows for other logical values in addition to the extremes of 'true' and 'false'. Possible frameworks for such a system are explored, and could use Codd's 'marks', Darwen's approach (recording the status of information known about each data item), or other approaches such as fuzzy logic. © 2006 Elsevier Ltd. All rights reserved.
Pages: 1368-1377
Page count: 10
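
The three-valued logic described in the abstract can be observed directly in any SQL engine. The following is a minimal sketch, not taken from the paper: it uses Python's built-in sqlite3 module as a stand-in for a relational database, and the table name `samples`, the column `au_ppm`, and the function `null_logic_demo` are hypothetical names chosen for illustration. It shows that a row with a NULL assay value satisfies neither a condition nor its negation, so the two subsets do not add up to the whole table.

```python
import sqlite3


def null_logic_demo():
    """Count rows under a predicate, its negation, and their disjunction."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: three samples, one with a missing (NULL) assay.
    conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, au_ppm REAL)")
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?)",
        [(1, 0.5), (2, 2.0), (3, None)],
    )

    def count(where=""):
        sql = "SELECT COUNT(*) FROM samples " + where
        return conn.execute(sql).fetchone()[0]

    result = {
        "total": count(),
        "high": count("WHERE au_ppm > 1.0"),
        "not_high": count("WHERE NOT (au_ppm > 1.0)"),
        # Under two-valued logic this disjunction would match every row.
        "either": count("WHERE au_ppm > 1.0 OR NOT (au_ppm > 1.0)"),
        # Comparing NULL with NULL yields 'unknown', returned here as None.
        "null_equals_null": conn.execute("SELECT NULL = NULL").fetchone()[0],
    }
    conn.close()
    return result


if __name__ == "__main__":
    for name, value in null_logic_demo().items():
        print(name, value)
```

In this sketch the sample with the missing value is silently dropped from both the "high" and "not high" groups, which is one concrete form of the inconsistency the abstract refers to. Other database engines may differ in details, but the treatment of NULL in comparisons follows the same pattern.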