The Complexity of Some Pattern Problems in the Logical Analysis of Large Genomic Data Sets

Cited by: 2

Authors:

Lancia, Giuseppe [1]

Serafini, Paolo [1]

Affiliation:

[1] Univ Udine, Dept Math & Comp Sci, Udine, Italy

Source:

BIOINFORMATICS AND BIOMEDICAL ENGINEERING (IWBBIO 2016) | 2016, Vol. 9656

Keywords:

FEATURE-SELECTION; CLASSIFICATION; CANCER;

DOI:

10.1007/978-3-319-31744-1_1

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Many biomedical experiments produce large data sets in the form of binary matrices, with features labeling the columns and individuals (samples) associated with the rows. An important case is when the rows are also labeled into two groups, namely the positive (or healthy) and the negative (or diseased) samples. The Logical Analysis of Data (LAD) is a procedure aimed at identifying relevant features and building Boolean formulas (rules) that can be used to classify new samples as positive or negative. These rules are said to explain the data set. Each rule can be represented by a string over {0,1,-}, called a pattern. A data set can be explained by alternative sets of patterns, and many computational problems arise related to the choice of a particular set of patterns for a given instance. In this paper we study the computational complexity of these pattern problems and show that they are, in general, very hard. We give an integer programming formulation for the problem of determining whether two sets of patterns are equivalent. We also prove computational complexity results which imply that there should be no simple ILP model for finding a minimal set of patterns explaining a given data set.
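
The abstract does not spell out how a pattern over {0,1,-} relates to a sample, so the following is only a minimal sketch under the usual LAD convention, stated here as an assumption: a pattern covers a binary sample when they agree at every position where the pattern is not "-". The function names `covers`, `explains`, and `equivalent` are hypothetical, and the brute-force equivalence check is for illustration on tiny instances only; it is not the integer programming formulation from the paper.

```python
from itertools import product


def covers(pattern: str, sample: str) -> bool:
    """Assumed convention: '-' is a wildcard, '0'/'1' must match exactly."""
    return len(pattern) == len(sample) and all(
        p == "-" or p == s for p, s in zip(pattern, sample)
    )


def explains(patterns, positives, negatives) -> bool:
    """Assumed reading of 'explain' for a set of positive patterns:
    every positive sample is covered by at least one pattern,
    and no negative sample is covered by any pattern."""
    all_positives_covered = all(
        any(covers(p, s) for p in patterns) for s in positives
    )
    no_negative_covered = not any(
        covers(p, s) for p in patterns for s in negatives
    )
    return all_positives_covered and no_negative_covered


def equivalent(patterns_a, patterns_b, n: int) -> bool:
    """Brute force over all 2**n binary vectors (exponential, tiny n only):
    two pattern sets are treated as equivalent if they cover the same vectors."""
    for bits in product("01", repeat=n):
        sample = "".join(bits)
        in_a = any(covers(p, sample) for p in patterns_a)
        in_b = any(covers(p, sample) for p in patterns_b)
        if in_a != in_b:
            return False
    return True


# Hypothetical toy data set with four binary features.
positives = ["1101", "1001"]
negatives = ["0101", "0110"]

print(explains(["1--1"], positives, negatives))
print(explains(["-1-1"], positives, negatives))
print(equivalent(["1---"], ["10--", "11--"], 4))
```

The exhaustive loop in `equivalent` grows as 2^n in the number of features, which gives a concrete feel for why such questions become hard on large genomic matrices.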

Pages: 3-12

Page count: 10
