Anomaly detection in genomic catalogues using unsupervised multi-view autoencoders

Authors
Ferre, Quentin [1, 2]
Cheneby, Jeanne [1]
Puthier, Denis [1]
Capponi, Cecile [2]
Ballester, Benoit [1]
Affiliations
[1] Aix Marseille Univ, TAGC, INSERM, Marseille, France
[2] Aix Marseille Univ, Univ Toulon, LIS, CNRS, Marseille, France
Keywords
Genomic assay; Anomaly detection; Cis regulatory element; Unsupervised curation; Convolutional autoencoder; ChIP-seq peak quality; Model interpretability; ChIP-seq; Integrative analysis; Regulatory regions
DOI
10.1186/s12859-021-04359-2
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Accurate identification of Transcriptional Regulator binding locations is essential for analysis of genomic regions, including Cis Regulatory Elements (CREs). The customary NGS approaches, predominantly ChIP-seq, can be obscured by data anomalies and biases which are difficult to detect without supervision.
Results: Here, we develop a method to leverage the usual combinations between many experimental series to mark such atypical peaks. We use deep learning to perform a lossy compression of the genomic regions' representations with multiview convolutions. Using artificial data, we show that our method correctly identifies groups of correlating series and evaluates CREs according to group completeness. It is then applied to the ReMap database's large volume of curated ChIP-seq data. We show that peaks lacking known biological correlators are singled out and less confirmed in real data. We propose normalization approaches useful in interpreting black-box models.
Conclusion: Our approach detects peaks that are less corroborated than average. It can be extended to other similar problems, and can be interpreted to identify correlation groups. It is implemented in an open-source tool called atyPeak.
Pages: 26