Genome-wide computational identification and manual annotation of human long noncoding RNA genes

Times cited: 304
Authors
Jia, Hui [1]
Osak, Maureen [2]
Bogu, Gireesh K. [3]
Stanton, Lawrence W. [3]
Johnson, Rory [3]
Lipovich, Leonard [1]
Affiliations
[1] Wayne State Univ, Ctr Mol Med & Genet, Detroit, MI 48202 USA
[2] Hillsdale Coll, Lee & Roland Witte Nat Sci Div, Hillsdale, MI 49242 USA
[3] Genome Inst Singapore, Stem Cell & Dev Biol Grp, Singapore 138672, Singapore
Keywords
lncRNA; noncoding RNA; transcriptome; hypothetical protein; CPC; ORF-Predictor; DATABASE; EXPRESSION; REVEALS; LOCI
DOI
10.1261/rna.1951310
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Experimental evidence suggests that half or more of the mammalian transcriptome consists of noncoding RNA. Noncoding RNAs are divided into short noncoding RNAs (including microRNAs) and long noncoding RNAs (lncRNAs). We defined two kinds of complementary DNAs (cDNAs) as noncoding. The first kind lacks any positive-strand open reading frame (ORF) longer than 30 amino acids. The second kind has ORFs longer than 30 amino acids, but those ORFs show no evidence of interspecies conservation. Using our new ORF-prediction pipeline, we identified 5446 lncRNA genes in the human genome from ~24,000 full-length cDNAs. We combined them nonredundantly with lncRNAs from four published sources to derive 6736 lncRNA genes. To distinguish standalone and antisense lncRNA genes from database artifacts, we stratified our lncRNA catalog by the distance between each lncRNA gene candidate and its nearest known protein-coding gene. We concurrently examined the protein-coding capacity of known genes overlapping lncRNAs. Remarkably, 62% of known genes with "hypothetical protein" names actually lacked protein-coding capacity. This study has greatly expanded the known human lncRNA catalog and increased its accuracy through manual annotation of cDNA-to-genome alignments. It has also revealed that a large set of hypothetical protein genes in GenBank lacks protein-coding capacity. In addition, we have developed command-line programs for high-throughput ORF finding and BLASTP parsing, independently of existing NCBI tools. These programs are suitable for future automated assessment of the protein-coding capacity of novel transcripts.
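The first noncoding criterion above is an ORF-length test on the positive strand. The Python sketch below is a hypothetical illustration, not the authors' ORF-prediction pipeline, and it rests on several assumptions of our own:

- An ORF runs from ATG to the first in-frame stop codon (TAA, TAG, TGA).
- An ORF that reaches the 3' end without a stop is still counted.
- Only the three forward frames are scanned.

The names `longest_forward_orf_aa` and `is_noncoding_by_orf_length` are illustrative. The second criterion, lack of interspecies conservation of longer ORFs, is not modeled here. It would require a homology search such as BLASTP.

```python
"""Hypothetical sketch of a positive-strand ORF-length test for noncoding cDNAs.

Assumptions (not taken from the paper): an ORF starts at ATG and ends at the
first in-frame stop codon; an ORF running off the 3' end is still counted;
only the three forward frames are scanned.
"""

STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_NONCODING_ORF_AA = 30  # threshold stated in the abstract


def longest_forward_orf_aa(cdna: str) -> int:
    """Return the length in amino acids (stop excluded) of the longest forward ORF."""
    seq = cdna.upper().replace("U", "T")  # accept RNA or DNA input
    longest = 0
    for frame in range(3):
        start = None  # index of the ATG opening the current ORF, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                longest = max(longest, (i - start) // 3)
                start = None
        if start is not None:
            # ORF runs off the 3' end without a stop: count its complete codons.
            longest = max(longest, (len(seq) - start) // 3)
    return longest


def is_noncoding_by_orf_length(cdna: str, max_aa: int = MAX_NONCODING_ORF_AA) -> bool:
    """True if the cDNA has no positive-strand ORF longer than max_aa amino acids."""
    return longest_forward_orf_aa(cdna) <= max_aa


if __name__ == "__main__":
    short_orf = "GGATGAAACCCTAAGG"  # ATG AAA CCC TAA in frame 2 -> 3 aa
    long_orf = "ATG" + "GCT" * 40 + "TGA"  # 41 aa
    print(longest_forward_orf_aa(short_orf), is_noncoding_by_orf_length(short_orf))  # 3 True
    print(longest_forward_orf_aa(long_orf), is_noncoding_by_orf_length(long_orf))  # 41 False
```

Counting an unterminated ORF makes the test conservative. A transcript whose reading frame runs off the 3' end is not called noncoding just because its stop codon lies outside the cDNA.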
Pages: 1478-1487
Page count: 10