Ambiguous genes due to aligners and their impact on RNA-seq data analysis

Cited by: 2
Authors
Szabelska-Beresewicz, Alicja [1]
Zyprych-Walczak, Joanna [1]
Siatkowski, Idzi [1]
Okoniewski, Michal [2]
Affiliations
[1] Poznan Univ Life Sci, Dept Math & Stat Methods, Wojska Polskiego 28, PL-60637 Poznan, Poland
[2] Swiss Fed Inst Technol, Sci IT Serv, Weinbergstr 11, CH-8092 Zurich, Switzerland
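Keywords
REPRODUCIBILITY;
DOI
10.1038/s41598-023-41085-6
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification codes
07; 0710; 09;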
Abstract
The main focus of this study is ambiguous genes, i.e. genes whose expression is difficult to estimate from the data produced by next-generation sequencing technologies. We focused on RNA sequencing (RNA-seq) experiments performed on the Illumina platform. It is important to identify such genes and understand why they are difficult, because some of them may be involved in disease. By producing misleading results, they could contribute to a misunderstanding of the causes of certain diseases, which could in turn lead to inappropriate treatment. We hypothesized that ambiguous genes would be difficult to map because of their complex structure, so we analyzed RNA-seq data with several different mappers to find genes whose measured expression differs between aligners. We identified such genes using a generalized linear model with two factors: the mapper and the experimental group. A large proportion of the ambiguous genes are pseudogenes; the high sequence similarity of pseudogenes to functional genes may point to problems in the alignment procedures. In addition, a predictive analysis assessed how ambiguous genes perform in classification: we compared how effectively samples were classified into groups when the expression of ambiguous genes, or of non-ambiguous genes, was used as covariates. In almost all cases considered, ambiguous genes had lower predictive power.
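The two-factor model mentioned in the abstract can be illustrated with a minimal per-gene sketch. It is not the authors' code: the abstract does not state the GLM family, link, transformation or test, so this sketch assumes a Gaussian linear model (the identity-link Gaussian case of a GLM) fitted to log2(count + 1) values, a balanced design (the same number of replicates in every mapper × group cell) and an F-test of the mapper factor. A gene whose estimated expression depends on the mapper after accounting for the experimental group is flagged as ambiguous. The function names (`mapper_effect_test`, `flag_ambiguous`), the 0.05 threshold and the example counts are hypothetical.

```python
import math
from statistics import mean


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Evaluate the continued fraction where it converges fast; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for an F(d1, d2) distribution."""
    if f <= 0.0:
        return 1.0
    return betainc_reg(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def mapper_effect_test(log_expr):
    """Additive two-way model (mapper + group) for one gene; F-test of the mapper factor.

    log_expr[i][j] holds replicate values for mapper i and group j (balanced design).
    Returns (F statistic, p-value).
    """
    n_map, n_grp, n_rep = len(log_expr), len(log_expr[0]), len(log_expr[0][0])
    values = [v for row in log_expr for cell in row for v in cell]
    grand = mean(values)
    mapper_means = [mean([v for cell in row for v in cell]) for row in log_expr]
    group_means = [mean([v for row in log_expr for v in row[j]]) for j in range(n_grp)]
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_mapper = n_grp * n_rep * sum((m - grand) ** 2 for m in mapper_means)
    ss_group = n_map * n_rep * sum((g - grand) ** 2 for g in group_means)
    # In a balanced design the residual SS of the additive fit is what remains.
    ss_resid = ss_total - ss_mapper - ss_group
    df_mapper = n_map - 1
    df_resid = n_map * n_grp * n_rep - n_map - n_grp + 1
    if ss_resid <= 0.0:
        return (math.inf, 0.0) if ss_mapper > 0.0 else (0.0, 1.0)
    f = (ss_mapper / df_mapper) / (ss_resid / df_resid)
    return f, f_sf(f, df_mapper, df_resid)


def flag_ambiguous(counts_by_gene, alpha=0.05):
    """Return {gene: p-value} for genes whose log2(count + 1) depends on the mapper."""
    flagged = {}
    for gene, counts in counts_by_gene.items():
        log_expr = [[[math.log2(c + 1) for c in cell] for cell in row] for row in counts]
        _, p = mapper_effect_test(log_expr)
        if p < alpha:
            flagged[gene] = p
    return flagged


if __name__ == "__main__":
    # Hypothetical counts[mapper][group] -> replicates, for two mappers and two groups.
    counts_by_gene = {
        "gene_stable": [[[100, 110, 90], [200, 210, 190]],
                        [[100, 110, 90], [200, 210, 190]]],
        "gene_ambiguous": [[[100, 110, 90], [200, 210, 190]],
                           [[400, 440, 360], [800, 840, 760]]],
    }
    print(flag_ambiguous(counts_by_gene))
```

Real RNA-seq counts are overdispersed, so in practice a negative binomial GLM (as fitted by packages such as edgeR or DESeq2) and a multiple-testing correction across genes, for example Benjamini-Hochberg, would be more appropriate. The closed-form sums of squares are used here only to keep the sketch dependency-free.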
Pages: 11