The Complexity of Some Pattern Problems in the Logical Analysis of Large Genomic Data Sets

Cited by: 2
Authors
Lancia, Giuseppe [1]
Serafini, Paolo [1]
Affiliations
[1] Univ Udine, Dept Math & Comp Sci, Udine, Italy
Keywords
FEATURE-SELECTION; CLASSIFICATION; CANCER;
DOI
10.1007/978-3-319-31744-1_1
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Many biomedical experiments produce large data sets in the form of binary matrices, with features labeling the columns and individuals (samples) associated with the rows. An important case is when the rows are also labeled into two groups, namely the positive (or healthy) and the negative (or diseased) samples. The Logical Analysis of Data (LAD) is a procedure aimed at identifying relevant features and building Boolean formulas (rules) that can be used to classify new samples as positive or negative. These rules are said to explain the data set. Each rule can be represented by a string over {0,1,-}, called a pattern. A data set can be explained by alternative sets of patterns, and many computational problems arise from the choice of a particular set of patterns for a given instance. In this paper we study the computational complexity of these pattern problems and show that they are, in general, very hard. We give an integer programming formulation for the problem of determining whether two sets of patterns are equivalent. We also prove computational complexity results which imply that there should be no simple ILP model for finding a minimal set of patterns explaining a given data set.
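The notion of a pattern as a string over {0,1,-} can be illustrated with a minimal Python sketch. It assumes the usual LAD convention, which the abstract does not spell out: a pattern covers a sample when they agree at every position where the pattern is not "-", and a positive pattern covers at least one positive sample and no negative sample. The function names `covers` and `is_positive_pattern` and the toy data are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def covers(pattern: str, sample: str) -> bool:
    """Return True if the sample agrees with the pattern at every non-'-' position.

    Assumption: '-' means "don't care", the usual LAD reading of the symbol.
    """
    if len(pattern) != len(sample):
        raise ValueError("pattern and sample must have the same length")
    return all(p == "-" or p == s for p, s in zip(pattern, sample))


def is_positive_pattern(
    pattern: str, positives: Sequence[str], negatives: Sequence[str]
) -> bool:
    """Assumed definition: covers some positive sample and no negative sample."""
    covers_some_positive = any(covers(pattern, s) for s in positives)
    covers_no_negative = not any(covers(pattern, s) for s in negatives)
    return covers_some_positive and covers_no_negative


# Toy data set (hypothetical): rows are samples, columns are binary features.
positives = ["1101", "1001"]
negatives = ["0101", "0011"]

print(is_positive_pattern("1-0-", positives, negatives))
print(is_positive_pattern("--0-", positives, negatives))
```

Here the pattern "1-0-" fixes the first and third features and leaves the others free. The second pattern, "--0-", also matches the negative sample "0101", so it does not qualify as a positive pattern under the assumed definition.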
Pages: 3-12
Page count: 10