The Complexity of Some Pattern Problems in the Logical Analysis of Large Genomic Data Sets

Times cited: 2
Authors
Lancia, Giuseppe [1]
Serafini, Paolo [1]
Affiliation
[1] Univ Udine, Dept Math & Comp Sci, Udine, Italy
Keywords
Feature selection; Classification; Cancer
DOI
10.1007/978-3-319-31744-1_1
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Many biomedical experiments produce large data sets in the form of binary matrices, with features labeling the columns and individuals (samples) associated with the rows. An important case is when the rows are also partitioned into two groups, namely the positive (or healthy) and the negative (or diseased) samples. The Logical Analysis of Data (LAD) is a procedure aimed at identifying relevant features and building Boolean formulas (rules) that can be used to classify new samples as positive or negative. These rules are said to explain the data set. Each rule can be represented by a string over {0,1,-}, called a pattern. A data set can be explained by alternative sets of patterns, and many computational problems arise from the choice of a particular set of patterns for a given instance. In this paper we study the computational complexity of these pattern problems and show that they are, in general, very hard. We give an integer programming formulation for the problem of determining whether two sets of patterns are equivalent. We also prove computational complexity results which imply that there should be no simple ILP model for finding a minimal set of patterns explaining a given data set.
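The abstract names the pattern problems without spelling out how a pattern is matched against a sample. The following is a minimal Python sketch under assumed, commonly used LAD definitions (not taken from the paper): a pattern covers a binary sample when every position that is not '-' agrees with the sample, and a set of patterns explains the data set when every positive sample is covered by some pattern and no negative sample is covered by any. The function names covers, explains, equivalent and minimum_explanation are hypothetical. The equivalence test and the search for a smallest explaining set are brute-force enumerations, exponential in the number of features; they are not the integer programming formulation given in the paper and only illustrate the problems on toy instances.

```python
from itertools import combinations, product
from typing import List, Optional, Sequence


def covers(pattern: str, sample: str) -> bool:
    """True if binary string `sample` agrees with `pattern` on every non-'-' position."""
    if len(pattern) != len(sample):
        raise ValueError("pattern and sample must have the same length")
    return all(p == "-" or p == s for p, s in zip(pattern, sample))


def explains(patterns: Sequence[str], positives: Sequence[str],
             negatives: Sequence[str]) -> bool:
    """Assumed definition: each positive sample is covered by some pattern
    and no negative sample is covered by any pattern."""
    all_pos = all(any(covers(p, x) for p in patterns) for x in positives)
    no_neg = not any(covers(p, y) for p in patterns for y in negatives)
    return all_pos and no_neg


def equivalent(a: Sequence[str], b: Sequence[str], n: int) -> bool:
    """Brute force over all 2**n binary strings: do the two pattern sets
    cover exactly the same strings?"""
    for bits in product("01", repeat=n):
        x = "".join(bits)
        if any(covers(p, x) for p in a) != any(covers(p, x) for p in b):
            return False
    return True


def minimum_explanation(positives: Sequence[str], negatives: Sequence[str],
                        n: int) -> Optional[List[str]]:
    """Exhaustive search for a smallest explaining set; None if none exists."""
    if not positives:
        return []
    # Every pattern in an explaining set must avoid all negative samples.
    candidates = ["".join(c) for c in product("01-", repeat=n)]
    candidates = [p for p in candidates
                  if not any(covers(p, y) for y in negatives)]
    # Covering each positive by itself is always possible unless it is also
    # a negative, so no explaining set needs more than len(positives) patterns.
    for k in range(1, len(positives) + 1):
        for subset in combinations(candidates, k):
            if explains(subset, positives, negatives):
                return list(subset)
    return None
```

On a toy instance with positives 110 and 111 and negatives 000 and 011, the single pattern 11- explains the data, while -1- does not because it also covers the negative sample 011. The pattern sets {1-, 01} and {-1, 10} are equivalent, since both cover exactly 01, 10 and 11.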
Pages: 3–12
Number of pages: 10