The problem of missing data in geoscience databases

Cited by: 10

Authors:

Henley, Stephen [1]

Affiliations:

[1] Resources Comp Int Ltd, Matlock DE4 5JA, Derby, England

Source:

COMPUTERS & GEOSCIENCES | 2006 / Vol. 32 / No. 09

Keywords:

relational database; open-world assumption; closed-world assumption; missing data; SQL; logic; fuzzy logic;

DOI:

10.1016/j.cageo.2005.12.008

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

SQL is the (more or less) standardised language that is used by the majority of commercial database management systems. However, it is seriously flawed, as has been documented in detail by Date, Darwen, Pascal, and others. One of the most serious problems with SQL is the way it handles missing data. It uses a special value 'NULL' to represent data items whose value is not known. This can have a variety of meanings in different circumstances (such as 'inapplicable' or 'unknown'). The SQL language also allows an 'unknown' truth value in logical expressions. The resulting incomplete three-valued logic leads to inconsistencies in data handling within relational database management systems. Relational database theorists advocate that a strict two-valued logic (true/false) be used instead, with prohibition of the use of NULL, and justify this stance by assertion that it is a true representation of the 'real world'. Nevertheless, in real geoscience data there is a complete gradation between exact values and missing data: for example, geochemical analyses are inexact (and the uncertainty should be recorded); the precision of numeric or textual data may also be expressed qualitatively by terms such as 'approximately' or 'possibly'. Furthermore, some data are by their nature incomplete: for example, where samples could not be collected or measurements could not be taken because of inaccessibility. It is proposed in this paper that the best way to handle such data sets is to replace the closed-world assumption and its concomitant strict two-valued logic, upon which the present relational database model is based, by the open-world assumption which allows for other logical values in addition to the extremes of 'true' and 'false'. Possible frameworks for such a system are explored, and could use Codd's 'marks', Darwen's approach (recording the status of information known about each data item), or other approaches such as fuzzy logic. © 2006 Elsevier Ltd. All rights reserved.
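The three-valued logic described in the abstract can be observed directly in most SQL engines. The following is a minimal sketch and is not taken from the paper: it uses Python's standard `sqlite3` module, and the table `samples`, the column `au_ppm`, and the helper `count` are hypothetical names chosen only for illustration. It shows that a row with a NULL assay value satisfies neither a condition nor its negation, so the two counts do not add up to the total.

```python
import sqlite3

# In-memory database with a hypothetical table of geochemical assays.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, au_ppm REAL)")
conn.executemany(
    "INSERT INTO samples (id, au_ppm) VALUES (?, ?)",
    [(1, 2.5), (2, 0.3), (3, None)],  # sample 3 has no assay: stored as NULL
)

def count(where):
    """Count rows of the hypothetical samples table matching a WHERE clause."""
    sql = f"SELECT COUNT(*) FROM samples WHERE {where}"
    return conn.execute(sql).fetchone()[0]

total = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
above = count("au_ppm > 1.0")             # condition evaluates to true
not_above = count("NOT (au_ppm > 1.0)")   # condition evaluates to false
unknown = count("(au_ppm > 1.0) IS NULL") # condition evaluates to unknown

# Comparing NULL with NULL yields unknown, which Python receives as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(total, above, not_above, unknown, null_equals_null)
```

Under these assumptions, the row with the missing assay is silently excluded from both the "above" and "not above" counts, which is the kind of inconsistency the abstract attributes to NULL handling.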


Pages: 1368-1377

Page count: 10

