Genome-wide computational identification and manual annotation of human long noncoding RNA genes

Cited by: 304
Authors
Jia, Hui [1]
Osak, Maureen [2]
Bogu, Gireesh K. [3]
Stanton, Lawrence W. [3]
Johnson, Rory [3]
Lipovich, Leonard [1]
Affiliations
[1] Wayne State Univ, Ctr Mol Med & Genet, Detroit, MI 48202 USA
[2] Hillsdale Coll, Lee & Roland Witte Nat Sci Div, Hillsdale, MI 49242 USA
[3] Genome Inst Singapore, Stem Cell & Dev Biol Grp, Singapore 138672, Singapore
Keywords
lncRNA; noncoding RNA; transcriptome; hypothetical protein; CPC; ORF-Predictor; DATABASE; EXPRESSION; REVEALS; LOCI
DOI
10.1261/rna.1951310
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Experimental evidence suggests that half or more of the mammalian transcriptome consists of noncoding RNA. Noncoding RNAs are divided into short noncoding RNAs (including microRNAs) and long noncoding RNAs (lncRNAs). We defined complementary DNAs (cDNAs) lacking any positive-strand open reading frames (ORFs) longer than 30 amino acids, as well as cDNAs lacking any evidence of interspecies conservation of their longer-than-30-amino-acid ORFs, as noncoding. We have identified 5446 lncRNA genes in the human genome from ~24,000 full-length cDNAs, using our new ORF-prediction pipeline. We combined them nonredundantly with lncRNAs from four published sources to derive 6736 lncRNA genes. In an effort to distinguish standalone and antisense lncRNA genes from database artifacts, we stratified our catalog of lncRNAs according to the distance between each lncRNA gene candidate and its nearest known protein-coding gene. We concurrently examined the protein-coding capacity of known genes overlapping with lncRNAs. Remarkably, 62% of known genes with "hypothetical protein" names actually lacked protein-coding capacity. This study has greatly expanded the known human lncRNA catalog, increased its accuracy through manual annotation of cDNA-to-genome alignments, and revealed that a large set of hypothetical protein genes in GenBank lacks protein-coding capacity. In addition, we have developed, independently of existing NCBI tools, command-line programs with high-throughput ORF-finding and BLASTP-parsing functionality, suitable for future automated assessments of protein-coding capacity of novel transcripts.
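The first noncoding criterion in the abstract (no positive-strand ORF longer than 30 amino acids) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names are hypothetical, and the ORF definition used here (an ATG start codon followed by an in-frame stop codon, with the stop excluded from the length and open-ended ORFs ignored) is an assumption, since the abstract does not specify one. The second criterion, interspecies conservation of longer ORFs, would need a separate homology search and is not modeled.

```python
MIN_ORF_AA = 30  # threshold from the abstract: ORFs longer than 30 amino acids
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_forward_orf_aa(cdna: str) -> int:
    """Length (in amino acids) of the longest forward-strand ORF.

    Assumption: an ORF runs from an ATG to the next in-frame stop codon;
    the stop codon is not counted, and ORFs without a stop are ignored.
    """
    seq = cdna.upper().replace("U", "T")  # accept RNA or DNA letters
    best = 0
    for frame in range(3):  # three forward reading frames only
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # open a candidate ORF at the first ATG
            elif codon in STOP_CODONS:
                best = max(best, (i - start) // 3)  # codons before the stop
                start = None  # close the ORF and look for the next ATG
    return best


def lacks_long_orf(cdna: str) -> bool:
    """True if no forward-strand ORF is longer than MIN_ORF_AA residues."""
    return longest_forward_orf_aa(cdna) <= MIN_ORF_AA


if __name__ == "__main__":
    short = "ATG" + "GCT" * 29 + "TAA"  # 30-residue ORF: not longer than 30
    long_ = "ATG" + "GCT" * 30 + "TAA"  # 31-residue ORF: exceeds the threshold
    print(lacks_long_orf(short), lacks_long_orf(long_))
```

Only the three forward frames are scanned because the abstract restricts the test to positive-strand ORFs. A cDNA that fails this test would still need the conservation check before being classed as coding or noncoding.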
Pages: 1478-1487
Page count: 10