Interval versions of statistical techniques with applications to environmental analysis, bioinformatics, and privacy in statistical databases

Cited by: 21
Authors
Kreinovich, Vladik [1]
Longpre, Luc
Starks, Scott A.
Xiang, Gang
Beck, Jan
Kandathi, Raj
Nayak, Asis
Ferson, Scott
Hajagos, Janos
Affiliations
[1] Univ Texas, NASA, PACES, El Paso, TX 79968 USA
[2] Appl Biomath, Setauket, NY 11733 USA
[3] SUNY Stony Brook, Dept Ecol & Evolut, Stony Brook, NY 11794 USA
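Funding
National Aeronautics and Space Administration (NASA); National Institutes of Health (NIH); National Science Foundation (NSF)
Keywords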
intervals and probabilities; environmental analysis; bioinformatics; privacy; statistical databases;
DOI
10.1016/j.cam.2005.07.041
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
In many areas of science and engineering, it is desirable to estimate statistical characteristics (mean, variance, covariance, etc.) under interval uncertainty. For example, we may want to use the measured values x(t) of a pollution level in a lake at different moments of time to estimate the average pollution level; however, we do not know the exact values x(t): e.g., if one of the measurement results is 0, this simply means that the actual (unknown) value of x(t) can be anywhere between 0 and the detection limit (DL). We must, therefore, modify the existing statistical algorithms to process such interval data. Such a modification is also necessary to process data from statistical databases, where, in order to maintain privacy, we only keep interval ranges instead of the actual numeric data (e.g., a salary range instead of the actual salary). Most resulting computational problems are NP-hard, which means, crudely speaking, that in general, no computationally efficient algorithm can solve all particular cases of the corresponding problem. In this paper, we overview practical situations in which computationally efficient algorithms exist: e.g., situations when measurements are very accurate, or when all the measurements are done with one (or a few) instruments. As a case study, we consider a practical problem from bioinformatics: to discover the genetic difference between the cancer cells and the healthy cells, we must process the measurement results and find the concentrations c and h of a given gene in cancer and in healthy cells. This is a particular case of a general situation in which, to estimate states or parameters which are not directly accessible by measurements, we must solve a system of equations in which coefficients are only known with interval uncertainty. We show that in general, this problem is NP-hard, and we describe new efficient algorithms for solving this problem in practically important situations. (c) 2006 Elsevier B.V. All rights reserved.
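The detection-limit example in the abstract can be made concrete with a small sketch. It is not the authors' algorithm; it only illustrates why the mean is easy to bound under interval uncertainty while the variance is harder. The function names, the sample readings, and the detection limit value 0.5 are all hypothetical, and the variance is taken as the population variance (dividing by n), which is an assumption about the normalization. The mean is monotone in every input, so its exact range comes from the lower and upper endpoints. The variance is a convex function of the inputs, so its maximum over the box of intervals is attained at a vertex; the naive search below tries all 2^n endpoint combinations, which is only feasible for small n.

```python
from itertools import product
from statistics import pvariance


def mean_range(intervals):
    """Exact range of the mean when each x_i lies in [lo_i, hi_i].

    The mean is increasing in every x_i, so the smallest mean uses all
    lower endpoints and the largest mean uses all upper endpoints.
    """
    n = len(intervals)
    lowest = sum(lo for lo, _ in intervals) / n
    highest = sum(hi for _, hi in intervals) / n
    return lowest, highest


def variance_upper_bound_bruteforce(intervals):
    """Largest population variance over the box, by exhaustive search.

    Variance is convex in (x_1, ..., x_n), so its maximum over a box is
    reached at a vertex. This tries all 2^n endpoint combinations and is
    therefore exponential in n (illustration only).
    """
    return max(pvariance(vertex) for vertex in product(*intervals))


# Hypothetical readings: a recorded 0 means "somewhere in [0, DL]".
DL = 0.5  # assumed detection limit, for illustration only
readings = [(0.0, DL), (1.2, 1.2), (0.0, DL), (0.9, 0.9)]

mean_lo, mean_hi = mean_range(readings)
var_hi = variance_upper_bound_bruteforce(readings)
print(f"mean in [{mean_lo:.3f}, {mean_hi:.3f}], variance at most {var_hi:.6f}")
```

The same interval representation applies to the privacy setting mentioned in the abstract: a stored salary range plays the role of `(lo, hi)` in place of the exact salary.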
Pages: 418-423
Number of pages: 6