The importance of data transformation in RNA-Seq preprocessing for bladder cancer subtyping

Cited by: 0

Authors:

Acedo-Terrades, Ariadna [1]

Perera-Bel, Julia [1]

Nonell, Lara [2]

Affiliations:

[1] Hosp del Mar Res Inst HMRI, Barcelona, Spain

[2] Vall dHebron Inst Oncol, Bioinformat Unit, Barcelona, Spain

Source:

BMC Research Notes | 2025 / Volume 18 / Issue 01

Keywords:

Molecular subtypes; RNA sequencing; Preprocessing; Bladder cancer; Molecular taxonomy

DOI:

10.1186/s13104-025-07138-x

Chinese Library Classification:

Q [Biological sciences]

Discipline classification codes:

07; 0710; 09

Abstract:

Objective: RNA-Seq provides accurate quantification of gene expression levels and is widely used for molecular subtype classification in cancer, where it is especially important for prognosis. However, the reliability and validity of these analyses can be significantly influenced by how the data are processed. In this study, we evaluate how RNA-Seq preprocessing methods influence molecular subtype classification in bladder cancer. By benchmarking various aligners, quantifiers, and normalization and transformation methods, we stress the importance of preprocessing choices for accurate and consistent subtype classification.

Results: Our findings highlight that log-transformation plays a crucial role in centroid-based classifiers such as consensusMIBC and TCGAclas, whereas distribution-free algorithms such as LundTax are robust to preprocessing variations. With consensusMIBC and TCGAclas, non-log-transformed data resulted in low classification rates and poor agreement with reference classifications. Additionally, LundTax consistently separated subtypes better than consensusMIBC and TCGAclas, regardless of preprocessing method. Nonetheless, the study is limited by the lack of a true reference for objectively assessing the accuracy of the assigned subtypes. Hence, future work will be necessary to determine the robustness and scalability of the results.
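
To illustrate why a centroid-based classifier can be sensitive to log-transformation, the following is a minimal sketch and not the consensusMIBC or TCGAclas implementation. It assumes a nearest-centroid rule based on Pearson correlation and a log2(x + 1) transformation; the function names, subtype labels, centroid values, and expression values are all hypothetical, and the toy sample is deliberately constructed so that one highly expressed gene dominates on the raw scale.

```python
import math


def log2_transform(values, pseudocount=1.0):
    """Return log2(value + pseudocount) for each expression value."""
    return [math.log2(v + pseudocount) for v in values]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def nearest_centroid(sample, centroids):
    """Assign the sample to the centroid with the highest correlation."""
    scores = {name: pearson(sample, c) for name, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical centroids for six genes, defined on a log2 scale.
centroids = {
    "subtype_A": [8.0, 8.0, 8.0, 2.0, 2.0, 5.0],
    "subtype_B": [2.0, 2.0, 2.0, 8.0, 8.0, 6.0],
}

# Hypothetical raw expression values; the last gene is an extreme outlier.
raw_sample = [255.0, 255.0, 255.0, 3.0, 3.0, 4095.0]

label_raw, _ = nearest_centroid(raw_sample, centroids)
label_log, _ = nearest_centroid(log2_transform(raw_sample), centroids)

print("raw scale :", label_raw)   # subtype_B
print("log2 scale:", label_log)   # subtype_A
```

In this toy case the same sample is assigned to different centroids depending only on whether it was log-transformed, because the untransformed outlier gene dominates the correlation. Real classifiers may additionally reject samples whose best correlation falls below a threshold, which is not modeled here.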

Pages: 8
