Finding Protein-Coding Genes through Human Polymorphisms

Times cited: 4
Authors
Wijaya, Edward [1, 2]
Frith, Martin C. [2]
Horton, Paul [2]
Asai, Kiyoshi [1, 2]
Affiliations
[1] Univ Tokyo, Grad Sch Frontier Sci, Kashiwa, Chiba, Japan
[2] Natl Inst Adv Ind Sci & Technol, Computat Biol Res Ctr, Tokyo, Japan
Source
PLOS ONE | 2013 / Vol. 8 / Issue 1
Keywords
DATABASE; PREDICTION; SEQUENCE; TRANSCRIPTS; PROJECT
DOI
10.1371/journal.pone.0054210
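Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09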
Abstract
Human gene catalogs are fundamental to the study of human biology and medicine, yet they are all based on open reading frames (ORFs) in a reference genome sequence (with allowance for introns). Individual genomes, however, are polymorphic: their sequences are not identical. Much research has examined how polymorphism affects previously identified genes, but none has examined how it affects gene identification itself. We predict protein-coding genes computationally in a straightforward manner, by finding long ORFs in mRNA sequences aligned to the reference genome, and we use this procedure to systematically test the effect of known polymorphisms. Polymorphisms can not only disrupt ORFs but also create long ORFs that do not exist in the reference sequence. We found 5,737 putative protein-coding genes that do not exist in the reference and whose protein-coding status is supported by homology to known proteins. On average, 10% of these genes are located in genomic regions devoid of annotated genes in 12 other catalogs. Our statistical analysis showed that these ORFs are unlikely to occur by chance.
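The ORF-finding step described in the abstract, and the way a single-nucleotide polymorphism can create a longer ORF than the reference allows, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: it assumes a spliced mRNA sequence given as a plain string, scans only the forward strand for ATG-initiated ORFs that end in TAA, TAG or TGA, and ignores alignment to the genome. The helper names (`longest_orf`, `apply_snp`, `prob_no_stop`) are hypothetical, and the chance model in `prob_no_stop` (independent, equiprobable bases) is a simple textbook null model, not necessarily the statistical test used in the paper.

```python
from typing import Optional, Tuple

# Standard-code stop codons (assumption: nuclear genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Tuple[int, int]:
    """Return (start, end) of the longest ATG-to-stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon. ORFs that run off the end
    of the sequence without a stop codon are ignored. Returns (0, 0) if none.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start: Optional[int] = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # close the ORF; look for the next ATG
    return best


def apply_snp(seq: str, pos: int, alt: str) -> str:
    """Hypothetical helper: substitute one base (0-based `pos`) with `alt`."""
    if len(alt) != 1:
        raise ValueError("this sketch handles single-base substitutions only")
    return seq[:pos] + alt + seq[pos + 1:]


def prob_no_stop(n_codons: int) -> float:
    """Chance that n random codons contain no stop codon.

    Assumes independent, equiprobable bases, so 61 of 64 codons are sense codons.
    """
    return (61 / 64) ** n_codons


if __name__ == "__main__":
    # Toy "reference" mRNA with a premature in-frame TAA at position 6.
    ref = "ATGAAATAAAAAAAAAAATAG"
    alt = apply_snp(ref, 6, "C")  # T>C turns TAA into CAA
    print(longest_orf(ref))  # (0, 9)
    print(longest_orf(alt))  # (0, 21)
    print(prob_no_stop(100))
```

In the toy sequence, the reference ORF ends at the in-frame TAA; the T>C variant converts that codon to CAA, so the reading frame continues to the downstream TAG. Under the stated null model, an ORF of 100 codons without a stop arises by chance with probability below 1%, which is the intuition behind treating long ORFs as unlikely to be random.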
Pages: 11