Open Access Research

The role of recombination in the emergence of a complex and dynamic HIV epidemic

Ming Zhang1,2*, Brian Foley1, Anne-Kathrin Schultz3, Jennifer P Macke1, Ingo Bulla3, Mario Stanke3, Burkhard Morgenstern3, Bette Korber1,4 and Thomas Leitner1*

Author Affiliations

1 Theoretical Biology & Biophysics, Los Alamos National Laboratory, Los Alamos, NM 87545, USA

2 Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545, USA

3 Institut für Mikrobiologie und Genetik, Abteilung Bioinformatik, Goldschmidtstraße 1, 37077 Göttingen, Germany

4 The Santa Fe Institute, Santa Fe, NM 87501, USA

Retrovirology 2010, 7:25  doi:10.1186/1742-4690-7-25

Published: 23 March 2010



Inter-subtype recombinants dominate the HIV epidemics in three geographical regions. To better understand the role of HIV recombinants in shaping the current HIV epidemic, we present the results of a large-scale subtyping analysis of 9435 HIV-1 sequences, derived from three continents, that involve subtypes A, B, C, G and F and the epidemiologically important recombinants.


The circulating recombinant form CRF02_AG, common in West Central Africa, appears to result from recombination events that occurred early in the divergence between subtypes A and G, followed by additional recent recombination events that contribute to the breakpoint pattern defining the current recombinant lineage. This finding also corrects a recent claim that G is a recombinant and a descendant of CRF02, which was suggested to be a pure subtype. The BC and BF recombinants in China and South America, respectively, are derived from recent recombination between contemporary parental lineages. Shared breakpoints in South American BF recombinants indicate that the HIV-1 epidemics in Argentina and Brazil are not independent. Thus, the contemporary HIV-1 epidemic contains recombinant lineages of both ancient and more recent origins.


Taken together, our results show that these recombinant lineages, which are highly prevalent in the current HIV epidemic, are a mixture of ancient and recent recombination. The HIV pandemic is moving towards increasing complexity and a higher prevalence of recombinant forms, sometimes existing as "families" of related forms. We find that the classification of some CRF designations needs to be revised as a consequence of (1) an estimated > 5% error in the original subtype assignments deposited in the Los Alamos sequence database; (2) an increasing number of defined CRFs that do not readily fit into groupings for molecular epidemiology and vaccine design; and (3) a dynamic HIV epidemic context.