The Information Content of Discrete Functions and Their Application in Genetic Data Analysis

Cited by: 7
Authors
Sakhanenko, Nikita A. [1 ]
Kunert-Graf, James [1 ]
Galas, David J. [1 ]
Affiliations
[1] Pacific Northwest Res Inst, 720 Broadway, Seattle, WA USA
Keywords
discrete functions; function classes; genetic interactions; information theory; multivariable dependence; COMPLEXITY;
DOI
10.1089/cmb.2017.0143
Chinese Library Classification (CLC)
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
The complex of central problems in data analysis consists of three components: (1) detecting the dependence of variables using quantitative measures, (2) defining the significance of these dependence measures, and (3) inferring the functional relationships among dependent variables. We have argued previously that an information theory approach allows separation of the detection problem from the problem of inferring functional form. Here we address the third component, inferring functional forms, based on the information encoded in the functions. We present a direct method for classifying the functional forms of discrete functions of three variables represented in data sets. Discrete variables are frequently encountered in data analysis, both as the result of inherently categorical variables and from the binning of continuous numerical variables into discrete alphabets of values. The fundamental question of how much information is contained in a given function is answered for these discrete functions, and their surprisingly complex relationships are illustrated. The all-important effect of noise on the inference of function classes is found to be highly heterogeneous and reveals some unexpected patterns. We apply this classification approach to an important area of biological data analysis, that of inference of genetic interactions. Genetic analysis provides a rich source of real and complex biological data analysis problems, and our general methods provide an analytical basis and tools for characterizing genetic problems and for analyzing genetic data. We illustrate the functional description and the classes of a number of common genetic interaction modes and also show how different modes vary widely in their sensitivity to noise.
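As a rough illustration of what "the information contained in a function" can mean for three discrete variables, the sketch below tabulates z = f(x, y) over uniformly distributed inputs and computes standard Shannon quantities. It is not the classification method of the paper; the helper names (`entropy`, `mutual_information`, `function_information`), the uniform-input assumption, and the choice of XOR and AND as examples are all assumptions made here for illustration.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B), in bits."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def function_information(f, alphabet):
    """Tabulate z = f(x, y) over all input pairs (assumed equally likely)
    and return single-variable and joint information about z."""
    xs, ys, zs = [], [], []
    for x, y in product(alphabet, repeat=2):
        xs.append(x)
        ys.append(y)
        zs.append(f(x, y))
    return {
        "H(Z)": entropy(zs),
        "I(X;Z)": mutual_information(xs, zs),
        "I(Y;Z)": mutual_information(ys, zs),
        "I(X,Y;Z)": mutual_information(list(zip(xs, ys)), zs),
    }


# Two binary functions with very different information profiles.
xor_info = function_information(lambda x, y: x ^ y, [0, 1])
and_info = function_information(lambda x, y: x & y, [0, 1])

for name, info in (("XOR", xor_info), ("AND", and_info)):
    print(name, {k: round(v, 3) for k, v in info.items()})
```

Under these assumptions, XOR carries no information about z in either input alone but one full bit in the pair, whereas AND leaks some information through each input separately. Differences of this kind are what make it plausible to group discrete functions by their information profiles.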
Pages: 1153-1178
Page count: 26