HCS-hierarchical algorithm for simulation of omics datasets

Cited by: 0
Authors
Stomma, Piotr [1, 2]
Rudnicki, Witold R. [1, 2]
Affiliations
[1] Univ Bialystok, Fac Comp Sci, PL-15245 Bialystok, Poland
[2] Univ Bialystok, Computat Ctr, PL-15245 Bialystok, Poland
Keywords
IDENTIFICATION; NETWORKS; MODULES; MATRIX
DOI
10.1093/bioinformatics/btae392
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Analysis of omics data with machine learning (ML) methods is limited by small sample sizes and large numbers of variables. One approach to such data is to apply feature selection algorithms and reduce the dataset to only those variables that are related to the studied phenomena. Existing simulators of omics data were mostly developed to generate high-quality data that correspond, with the highest possible fidelity, to the real levels of molecular markers in biological materials. The current study aims to simulate the data at a higher level of generalization. Such datasets can then be used to test feature selection and ML algorithms on systems whose structure mimics that of real data, but where the ground truth can be implanted by design. They can also be used to generate contrast variables with the desired correlation structure for feature selection.
Results: We propose an algorithm for the reconstruction of an omics dataset that preserves, with high fidelity, the correlation structure of the original data while using a reduced number of parameters. It is based on hierarchical clustering of variables and uses the principal components of the clusters. It reproduces the topological descriptors of the correlation structure well. The correlation structure of the principal components of the clusters is then used to obtain datasets whose correlation structure is similar to that of the original data but which are not correlated with the original variables.
Availability and implementation: The code and data are available at https://github.com/p100mma/hcrs_omics.
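
The abstract describes the approach only in outline: cluster the variables hierarchically, summarise each cluster by principal components, and reuse the correlation structure of those components to generate new data. The following Python sketch illustrates that general idea and is not the authors' implementation (see the repository above for that). Several choices here are assumptions made for illustration only: the function name `simulate_like`, the distance `1 - |r|`, average linkage, keeping a single principal component per cluster, sampling new components from a multivariate normal distribution, and adding independent Gaussian residual noise. It also assumes at least two variables and no constant columns.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def simulate_like(X, n_clusters, rng):
    """Hypothetical helper: draw a synthetic matrix shaped like X whose
    variables follow a similar block correlation structure."""
    n, p = X.shape
    # Standardise variables (assumes no constant columns).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    # Hierarchical clustering of variables on the distance 1 - |r|.
    dist = np.clip(1.0 - np.abs(np.corrcoef(Z, rowvar=False)), 0.0, None)
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")

    # First principal component (scores and loadings) of every cluster.
    ids = np.unique(labels)
    pcs = np.empty((n, ids.size))
    loadings = np.zeros((ids.size, p))
    for k, c in enumerate(ids):
        cols = np.flatnonzero(labels == c)
        U, S, Vt = np.linalg.svd(Z[:, cols], full_matrices=False)
        pcs[:, k] = U[:, 0] * S[0]
        loadings[k, cols] = Vt[0]

    # Fresh component scores with the same covariance as the originals
    # (assumption: multivariate normal), so they are independent of X.
    pc_cov = np.atleast_2d(np.cov(pcs, rowvar=False))
    new_pcs = rng.multivariate_normal(np.zeros(ids.size), pc_cov, size=n)

    # Map back to variable space and add noise matching the residual spread.
    resid_sd = (Z - pcs @ loadings).std(axis=0)
    noise = rng.normal(size=(n, p)) * resid_sd
    return new_pcs @ loadings + noise, labels
```

Calling `simulate_like(X, n_clusters=20, rng=np.random.default_rng(0))` on a samples-by-variables matrix `X` returns a standardised synthetic matrix and the cluster label of each variable. Because the component scores are drawn anew, the simulated variables share the cluster-level correlation pattern with `X` without being correlated with the original columns, which is the property the abstract highlights for contrast variables.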
Pages: ii98-ii104
Page count: 7