Identification of endogenous retroviral reading frames in the human genome
- Equal contributors
1 Bioinformatics Research Center, University of Aarhus, Høegh-Guldbergs Gade 10, Bldg. 090, DK-8000 Aarhus, Denmark
2 Department of Molecular Biology, University of Aarhus, C. F. Møllers Allé, Bldg. 130, DK-8000 Aarhus, Denmark
3 Department of Medical Microbiology and Immunology, University of Aarhus, DK-8000 Aarhus, Denmark
Retrovirology 2004, 1:32 doi:10.1186/1742-4690-1-32 Published: 11 October 2004
Human endogenous retroviruses (HERVs) comprise a large class of repetitive retroelements. With the exception of the evolutionarily young HERV-K group, most HERVs are ancient, having invaded our genome at least 25 million years ago. The vast majority of the encoded genes are degenerate due to mutational decay, and only a few non-HERV-K loci are known to retain intact reading frames. Additional intact HERV genes may exist, since retroviral reading frames have not been systematically annotated on a genome-wide scale.
By clustering hits from multiple BLAST searches with known retroviral sequences, we have mapped 1.1% of the human genome as retrovirus-related. The coding potential of all identified HERV regions was analyzed by annotating viral open reading frames (vORFs), and we report 7836 loci verified by protein homology criteria. Among 59 intact or almost-intact viral polyproteins scattered around the human genome, we found 29 envelope genes, including two novel gammaretroviral types. One encodes a protein similar to a recently discovered zebrafish retrovirus (ZFERV), while the other shows partial, C-terminal homology to Syncytin (HERV-W/FRD).
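The two computational steps named above are clustering overlapping BLAST hits into retrovirus-related regions and then scanning those regions for open reading frames. A minimal Python sketch can illustrate them. It is not the authors' pipeline. It assumes hits are already parsed into half-open (start, end) coordinates on a single chromosome. The parameters `max_gap` and `min_codons`, and all function names, are hypothetical. ORFs are taken as stop-to-stop stretches on the forward strand only, so no start codon is required.

```python
from typing import List, Tuple

# A hit or region as a half-open interval [start, end) on a single chromosome.
Interval = Tuple[int, int]

STOP_CODONS = {"TAA", "TAG", "TGA"}


def cluster_hits(hits: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Merge hit intervals that overlap or lie within max_gap bp of each other.

    max_gap is a hypothetical tolerance, not a value taken from the paper.
    """
    regions: List[Interval] = []
    for start, end in sorted(hits):
        if regions and start <= regions[-1][1] + max_gap:
            # Close enough to the current region: extend it to cover this hit.
            prev_start, prev_end = regions[-1]
            regions[-1] = (prev_start, max(prev_end, end))
        else:
            # Gap too large: start a new region.
            regions.append((start, end))
    return regions


def covered_fraction(regions: List[Interval], genome_length: int) -> float:
    """Fraction of the sequence covered by non-overlapping regions."""
    return sum(end - start for start, end in regions) / genome_length


def find_orfs(seq: str, min_codons: int = 100) -> List[Interval]:
    """Stop-to-stop reading frames on the forward strand, in all three frames.

    Each ORF runs from the previous in-frame stop (or the sequence start) up to,
    but not including, the next in-frame stop codon. Only stretches that end in
    a stop codon are reported, and min_codons is a hypothetical length cutoff.
    """
    seq = seq.upper()
    orfs: List[Interval] = []
    for frame in range(3):
        orf_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                # Count whole codons between the previous stop and this one.
                if (i - orf_start) // 3 >= min_codons:
                    orfs.append((orf_start, i))
                orf_start = i + 3
    return sorted(orfs)


if __name__ == "__main__":
    hits = [(100, 250), (240, 400), (900, 1000)]
    print(cluster_hits(hits))               # [(100, 400), (900, 1000)]
    print(cluster_hits(hits, max_gap=500))  # [(100, 1000)]
    print(find_orfs("ATG" + "AAA" * 5 + "TAA", min_codons=3))  # [(0, 18)]
```

In an actual annotation, candidate ORFs would also need to be screened against known retroviral proteins, as the protein homology criteria above describe. The sketch omits that step and the reverse strand.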
This compilation of HERV sequences and their coding potential provides a useful tool for functional analyses such as RNA expression profiling and studies of the effects of viral proteins, which may, in turn, reveal a role for HERVs in human health and disease. All data are publicly available through a database at http://www.retrosearch.dk.