Genome-wide computational identification and manual annotation of human long noncoding RNA genes

Cited by: 304
Authors
Jia, Hui [1 ]
Osak, Maureen [2 ]
Bogu, Gireesh K. [3 ]
Stanton, Lawrence W. [3 ]
Johnson, Rory [3 ]
Lipovich, Leonard [1 ]
Affiliations
[1] Wayne State Univ, Ctr Mol Med & Genet, Detroit, MI 48202 USA
[2] Hillsdale Coll, Lee & Roland Witte Nat Sci Div, Hillsdale, MI 49242 USA
[3] Genome Inst Singapore, Stem Cell & Dev Biol Grp, Singapore 138672, Singapore
Keywords
lncRNA; noncoding RNA; transcriptome; hypothetical protein; CPC; ORF-Predictor; DATABASE; EXPRESSION; REVEALS; LOCI;
DOI
10.1261/rna.1951310
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Experimental evidence suggests that half or more of the mammalian transcriptome consists of noncoding RNA. Noncoding RNAs are divided into short noncoding RNAs (including microRNAs) and long noncoding RNAs (lncRNAs). We defined complementary DNAs (cDNAs) lacking any positive-strand open reading frames (ORFs) longer than 30 amino acids, as well as cDNAs lacking any evidence of interspecies conservation of their longer-than-30-amino acid ORFs, as noncoding. We have identified 5446 lncRNA genes in the human genome from ~24,000 full-length cDNAs, using our new ORF-prediction pipeline. We combined them nonredundantly with lncRNAs from four published sources to derive 6736 lncRNA genes. In an effort to distinguish standalone and antisense lncRNA genes from database artifacts, we stratified our catalog of lncRNAs according to the distance between each lncRNA gene candidate and its nearest known protein-coding gene. We concurrently examined the protein-coding capacity of known genes overlapping with lncRNAs. Remarkably, 62% of known genes with "hypothetical protein" names actually lacked protein-coding capacity. This study has greatly expanded the known human lncRNA catalog, increased its accuracy through manual annotation of cDNA-to-genome alignments, and revealed that a large set of hypothetical protein genes in GenBank lacks protein-coding capacity. In addition, we have developed, independently of existing NCBI tools, command-line programs with high-throughput ORF-finding and BLASTP-parsing functionality, suitable for future automated assessments of protein-coding capacity of novel transcripts.
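The first noncoding criterion in the abstract (no positive-strand ORF longer than 30 amino acids) can be illustrated with a minimal sketch. This is not the authors' ORF-prediction pipeline, whose implementation is not described in this record. The function names, the ATG-to-stop ORF definition, and the choice to ignore ORFs that run off the end of the sequence are all assumptions made for illustration. The second criterion (lack of interspecies conservation, assessed via BLASTP) is not covered here.

```python
# Illustrative sketch only; names and ORF definition are assumptions,
# not the published pipeline.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_forward_orf_codons(seq: str) -> int:
    """Length in codons (start included, stop excluded) of the longest
    ATG-to-stop ORF in the three forward reading frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def lacks_long_orf(seq: str, min_aa: int = 30) -> bool:
    """True if no forward-strand ORF is longer than min_aa amino acids."""
    return longest_forward_orf_codons(seq) <= min_aa


# Hypothetical toy sequences: a 41-codon ORF and a 6-codon ORF.
long_cdna = "ATG" + "GCT" * 40 + "TAA"
short_cdna = "ATG" + "GCT" * 5 + "TAA"
print(lacks_long_orf(long_cdna), lacks_long_orf(short_cdna))
```

Under this sketch, a cDNA for which `lacks_long_orf` returns `True` would be a noncoding candidate by the length criterion alone. A cDNA with a longer ORF would still need the conservation check before being classified.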
Pages: 1478-1487
Number of pages: 10