The importance of data transformation in RNA-Seq preprocessing for bladder cancer subtyping

Authors
Acedo-Terrades, Ariadna [1]
Perera-Bel, Julia [1]
Nonell, Lara [2]
Affiliations
[1] Hosp del Mar Res Inst HMRI, Barcelona, Spain
[2] Vall d'Hebron Inst Oncol, Bioinformat Unit, Barcelona, Spain
Keywords
Molecular subtypes; RNA sequencing; Preprocessing; Bladder cancer; Molecular taxonomy
DOI
10.1186/s13104-025-07138-x
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Objective: RNA-Seq provides accurate quantification of gene expression levels and is widely used for molecular subtype classification in cancer, where it is especially important for prognosis. However, the reliability and validity of these analyses can be significantly influenced by how the data are processed. In this study, we evaluate how RNA-Seq preprocessing methods influence molecular subtype classification in bladder cancer. By benchmarking various aligners, quantifiers, and normalization and transformation methods, we stress the importance of preprocessing choices for accurate and consistent subtype classification.
Results: Our findings highlight that log-transformation plays a crucial role in centroid-based classifiers such as consensusMIBC and TCGAclas, while distribution-free algorithms like LundTax are robust to preprocessing variations. With consensusMIBC and TCGAclas, data that were not log-transformed resulted in low classification rates and poor agreement with reference classifications. Additionally, LundTax consistently showed better separation among subtypes than consensusMIBC and TCGAclas, regardless of the preprocessing method. Nonetheless, the study is limited by the lack of a true reference for objectively assessing the accuracy of the assigned subtypes. Hence, future work will be necessary to determine the robustness and scalability of these results.
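To illustrate why the scale of the input can matter for a centroid-based classifier, the following is a minimal sketch of a toy nearest-centroid rule based on Pearson correlation. It is not the implementation of consensusMIBC or TCGAclas; the gene count, centroid values, sample values, the `min_cor` threshold, and all function names are assumptions made up for this example. The centroids are assumed to be on a log2 scale, and the same sample is scored once as raw linear-scale values and once after a log2(x + 1) transformation.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def classify(sample, centroids, min_cor=0.2):
    """Assign the centroid with the highest correlation.

    Returns (label, scores); label is None when the best correlation
    falls below min_cor (a hypothetical cutoff for this sketch).
    """
    scores = {name: pearson(sample, c) for name, c in centroids.items()}
    best = max(scores, key=scores.get)
    label = best if scores[best] >= min_cor else None
    return label, scores


def log2p1(values):
    """log2(x + 1) transformation of linear-scale expression values."""
    return [math.log2(v + 1.0) for v in values]


# Hypothetical log2-scale centroids for two subtypes over five toy genes.
centroids = {
    "A": [8.0, 7.0, 2.0, 1.0, 5.0],
    "B": [2.0, 1.0, 8.0, 7.0, 5.0],
}

# Toy sample on the linear scale, constructed to resemble subtype A.
raw_sample = [2 ** 8.2 - 1, 2 ** 6.8 - 1, 2 ** 2.1 - 1, 2 ** 0.9 - 1, 2 ** 5.1 - 1]

raw_label, raw_scores = classify(raw_sample, centroids)
log_label, log_scores = classify(log2p1(raw_sample), centroids)

print("raw:", raw_label, raw_scores)
print("log:", log_label, log_scores)
```

In this toy case the correlation with the matching centroid is lower on the linear scale, because the most highly expressed gene dominates the calculation. With real data, many more genes, and a correlation cutoff, an effect of this kind could plausibly contribute to samples being left unclassified, although the sketch does not reproduce the study's results.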
Pages: 8