HCS: hierarchical algorithm for simulation of omics datasets

Times cited: 0
Authors
Stomma, Piotr [1, 2]
Rudnicki, Witold R. [1, 2]
Affiliations
[1] Univ Bialystok, Fac Comp Sci, PL-15245 Bialystok, Poland
[2] Univ Bialystok, Computat Ctr, PL-15245 Bialystok, Poland
Keywords
IDENTIFICATION; NETWORKS; MODULES; MATRIX
DOI
10.1093/bioinformatics/btae392
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Analysis of omics data with machine learning (ML) methods is limited by small sample sizes and large numbers of variables. One approach to such data is to apply feature selection algorithms and reduce the dataset to only those variables that are related to the studied phenomena. Existing simulators of omics data were mostly developed to improve methods for generating high-quality data that correspond, with the highest possible fidelity, to the real levels of molecular markers in biological materials. The current study aims to simulate data at a higher level of generalization. Such datasets can then be used to test feature selection and ML algorithms on systems whose structures mimic those of real data, but where the ground truth can be implanted by design. They can also be used to generate contrast variables with the desired correlation structure for feature selection.
Results: We propose an algorithm for reconstructing an omics dataset that preserves, with high fidelity, the correlation structure of the original data while using a reduced number of parameters. It is based on hierarchical clustering of variables and uses the principal components of the clusters. It reproduces the topological descriptors of the correlation structure well. The correlation structure of the cluster principal components is then used to obtain datasets whose correlation structures are similar to that of the original data but which are not correlated with the original variables.
Availability and implementation: The code and data are available at https://github.com/p100mma/hcrs_omics.
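The abstract names three ingredients: hierarchical clustering of variables, principal components of the resulting clusters, and resampling from the correlation structure of those components. The sketch below illustrates that general idea only; it is not the authors' implementation (see the repository linked in the abstract for that). The function name `simulate_from_clusters`, the distance `1 - |r|`, average linkage, the use of a single principal component per cluster, and Gaussian sampling of the component scores are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def simulate_from_clusters(X, n_clusters, n_samples, seed=0):
    """Hypothetical helper: simulate data from cluster-level principal components."""
    rng = np.random.default_rng(seed)
    # Standardise columns so that covariance equals correlation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = np.corrcoef(Z, rowvar=False)

    # Hierarchical clustering of variables; distance 1 - |r| is an assumption.
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")

    # Summarise each cluster by its first principal component (via SVD).
    ids = np.unique(labels)
    scores = np.empty((Z.shape[0], ids.size))
    loadings = []
    for k, c in enumerate(ids):
        cols = np.flatnonzero(labels == c)
        U, s, Vt = np.linalg.svd(Z[:, cols], full_matrices=False)
        scores[:, k] = U[:, 0] * s[0]
        loadings.append((cols, Vt[0]))

    # Draw new component scores that share the covariance of the originals
    # but are statistically independent of the original samples.
    cov = np.atleast_2d(np.cov(scores, rowvar=False))
    new_scores = rng.multivariate_normal(np.zeros(ids.size), cov, size=n_samples)

    # Map the new scores back to variable space through the cluster loadings.
    X_sim = np.empty((n_samples, Z.shape[1]))
    for k, (cols, v) in enumerate(loadings):
        X_sim[:, cols] = np.outer(new_scores[:, k], v)
    return X_sim, labels


# Toy data: two groups of three variables, each driven by one latent factor.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
X = np.hstack([
    latent[:, [0]] + 0.1 * rng.normal(size=(200, 3)),
    latent[:, [1]] + 0.1 * rng.normal(size=(200, 3)),
])
X_sim, labels = simulate_from_clusters(X, n_clusters=2, n_samples=500)
```

Because only one component per cluster is kept and no residual noise is added, variables within a simulated cluster are perfectly correlated here; a more faithful reconstruction would presumably retain more components or model the residuals.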
Pages: ii98-ii104
Number of pages: 7