Abstract
Background
The acute phase of immunodeficiency virus infection plays a crucial role in determining steady-state virus load and subsequent progression of disease in both humans and nonhuman primates. The acute period is also the time when vaccine-mediated effects on host immunity are likely to exert their major effects on virus infection. Recently we developed a Monte-Carlo (MC) simulation with mathematical analysis of viral evolution during primary HIV-1 infection that enables classification of new HIV-1 infections originating from multiple versus single transmitted viral strains and the estimation of time elapsed following infection.
Results
A total of 322 SIV nef sequences, collected during the first 3 weeks following experimental infection of two rhesus macaques with the SIVmac239 clone, were analyzed and found to display a level of genetic diversity, 0.015% to 0.052%, comparable to that of env sequences from acute HIV-1 infection, 0.005% to 0.127%. We confirmed that the acute HIV-1 infection model correctly identified the experimental SIV infections in rhesus macaques as "homogeneous" infections, initiated by a single founder strain. The consensus sequence of the sampled strains corresponded to the transmitted sequence, as the model predicted. However, the measured sequential decrease in diversity at days 7, 11, and 18 post infection violated the model assumption of neutral evolution without any selection.
Conclusion
While nef gene evolution over the first 3 weeks of SIV infection originating from a single transmitted strain showed a comparable rate of sequence evolution to that observed during acute HIV-1 infection, a purifying selection for the founder nef gene was observed during the early phase of experimental infection of a nonhuman primate.
Background
Genetic evolution in the primary phase of HIV-1 infection has been characterized by single genome amplification and nested polymerase chain reaction (PCR) of HIV-1 genes in parallel with mathematical/computational modeling [1-3]. Major goals of such analyses include characterizing the transmitted strains, estimating the timing of infection based on the level of sequence diversity, and distinguishing between infections with a single virus strain/variant (referred to hereafter as "homogeneous" infection) and infections with two or more virus strains/variants (referred to hereafter as "heterogeneous" infection). Heterogeneous infection is associated with faster sequence diversification and accelerated disease progression due to the rapid emergence of virus variants with enhanced replicative fitness [4-7].
To quantitatively assess whether HIV-1 infections were initiated by single or multiple viral strains, we recently developed a mathematical model and Monte-Carlo (MC) simulation model of HIV-1 evolution early in infection and applied this to the analysis of 102 individuals with acute HIV-1 infection [2]. Further, in cases of single strain (homogeneous) infections, the model provided a theoretical basis for identifying early founder (possibly transmitted) env genes.
In this study, we tested the validity of our primary HIV-1 infection model using a nonhuman primate (NHP) model for HIV-1/AIDS. This model has played a key role in the development of candidate HIV-1 vaccines, and provided critical insights into disease pathogenesis [8-10]. Studies in the macaque/simian immunodeficiency virus (SIV) model have contributed to our understanding of the close association between the extent of virus replication during the acute phase of infection and the subsequent virus set point and disease course [11] as reported in HIV-1 infections [12-14]. Genetic evolution during SIV infection has been well documented in comparison with the evolution of the HIV-1 population [15-18].
We examined evolution of the viral nef genes from a single transmitted strain. Nef, a small accessory protein, was selected because the virus can tolerate significant variability in the Nef protein, as evidenced by high levels of polymorphism longitudinally throughout infection and at the population level [19-22]. We sequenced full-length nef genes longitudinally during the very early phase of SIV infection using the method of single genome amplification (SGA). The SGA method more accurately represents HIV-1 quasispecies when compared to conventional PCR amplification [1,23,24]. We showed that our sequence evolution model correctly classified the experimental SIV infections as homogeneous infections. As predicted by the model, the consensus sequence of the sampled strains from these homogeneous infections corresponded to the transmitted sequence. However, our systematic evaluation showed that a sequential decrease of the diversity within the first 3 weeks of infection was associated with a purifying selection for the transmitted sequence (and was not a consequence of the limited sample size in our analysis).
Results
Longitudinal nucleotide and amino acid mutations
We visualized longitudinal sequence evolution, as nucleotide and amino acid point mutations in reference to the founder nef gene/Nef protein, in Figure 1. From a total of 322 nef sequences sampled from the two animals, we observed 41 nucleotide base substitutions (excluding gaps) from the infecting nef sequence of SIVmac239 within the first 21 days following virus infection; out of these 41 mutations, 10 were determined to be G-to-A hypermutation patterns with APOBEC signatures (red characters in Figure 1) [25]. However, none of these APOBEC signatures were statistically significant (p > 0.05 from a Fisher exact test, Hypermut tool http://www.hiv.lanl.gov). As we predicted in our model [2], the group of sequences identical to the consensus sequence indeed corresponded to the transmitted nef sequence. The limited base substitutions observed in the nef genes were sparse and did not align with each other, as we have seen in env genes sampled from HIV-1 acute subjects classified as having homogeneous infection [2]. Out of 41 total mutations, 16 were synonymous and the rest were nonsynonymous base substitutions.
Figure 1. Nucleotide and amino acid base substitutions within 3 weeks post SIV infection. Longitudinal nucleotide (A) and amino acid (B) base substitutions from the founder nef gene/Nef protein of sequence samples taken at day 4, 7, 11 and 18 post-infection from animal r00065, which was infected intravenously with SIVmac239. C and D display base substitutions in reference to the founder sequence from the samples taken at day 7, 14, and 21 post-infection from animal r98018, which was infected by intrarectal inoculation with SIVmac239. Numbers in the left column in each figure represent the number of a specific sequence out of total sampled sequences at a given day post infection. Each clone was obtained via the method of single genome amplification.
Figure 1 shows that all the mutant nef genes except one were not sampled again at the next time point, while the transmitted nef gene was conserved in sequential samples from both animals. The single mutation fixed in the sequence population from animal r00065, C-to-T at position 520, was a synonymous one. We examined whether the loss of mutant sequences in the sequential samples could be reproduced in the MC simulation. We sampled 30 sequences at days 6, 12, 18, and 24 post infection in the asynchronous infection MC simulation, and then counted the number of mutant sequences that remained at more than one time point, repeating the simulation 10^{2} times. Figure 2 shows the histogram of the observed number of mutant sequences sampled at more than one of the sequential time points, N_{m}. The 95% confidence intervals were calculated by repeating 10^{2} sets of 10^{2} MC runs. The simulation confirmed that loss of mutant sequences is frequent. While the transmitted, founder nef gene remains the majority of the sampled sequences throughout the early infection period, the mutant sequences are not fixed in the population because i) only a finite number of sequences are sampled from an exponentially growing population and ii) the mutant genes accumulate further mutations through subsequent reverse transcription events.
Figure 2. Histogram of the observed number of mutant sequences sampled at more than one time point, N_{m}. At day 6, 12, 18, and 24 post infection, 30 nef sequences were sampled. The observed number of mutant sequences which were present at more than one time point was counted from the total of 120 sequences sampled sequentially over 4 time points. For example, N_{m} = 0 denotes that no mutant sequence from the founder gene appeared at more than one time point. The histogram of N_{m} with 95% CIs was constructed by repeating 10^{2} asynchronous MC infection simulations. While the founder nef gene remains the majority of the sampled sequences, loss of mutant sequences in the serial samples was frequently observed.
Dynamics of divergence, diversity, variance, maximum HD, and sequence identity
Viral diversification in early infection can be probed with several quantities based on Hamming distances among the sampled sequences. Here Hamming distance denotes the number of bases at which any two sequences differ. We measured the kinetics of divergence, diversity, variance, maximum Hamming distance (HD), and sequence identity in the two experimentally infected macaques (Table 1). Divergence is defined as the average Hamming distance per site from the transmitted nef gene. Diversity is defined as the average inter-sequence Hamming distance per site, variance as the variance of the inter-sequence per-base Hamming distance distribution, maximum HD as the maximum Hamming distance measured between all sequence pairs, and sequence identity as the proportion of sequences identical to the transmitted strain.
Table 1. Animal Information and analysis using the acute HIV-1 infection model.
Figure 3 displays the kinetics of these quantities compared to the viral load dynamics for animal r00065 and animal r98018. Each measurement was in the range of the prediction made by our acute HIV-1 sequence evolution model; however, the dynamics of each quantity across the serial samples from the two animals were not consistent with the model prediction. For instance, the average HD from the founder nef gene, divergence, decreases from 0.018% to 0.0081% over a time interval of 11 days for animal r00065, which is opposite to the trend predicted by the model. Also, the proportion of sequences identical to the transmitted one rose serially from day 7 to day 18, suggesting either a purifying selection back to the founder strain during the early stage of infection or stochastic fluctuations due to the limited sample size.
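The five quantities defined above can be computed directly from an aligned sequence sample. The following is a minimal sketch, not the authors' analysis code: the function names and the toy sequences are hypothetical, and the use of the population variance (rather than the sample variance) is an assumption, since the text does not say which was used.

```python
from itertools import combinations
from statistics import mean, pvariance


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def summary_statistics(founder, sample):
    """Hamming-distance summaries for the sequences sampled at one time point."""
    n_sites = len(founder)
    from_founder = [hamming(founder, s) for s in sample]
    pairwise = [hamming(a, b) for a, b in combinations(sample, 2)]
    per_site = [d / n_sites for d in pairwise]
    return {
        # average per-site distance from the transmitted (founder) sequence
        "divergence": mean(from_founder) / n_sites,
        # average per-site distance over all sequence pairs
        "diversity": mean(per_site),
        # variance of the per-site pairwise distances (population variance assumed)
        "variance": pvariance(per_site),
        # largest distance between any two sampled sequences
        "max_hd": max(pairwise),
        # proportion of sampled sequences identical to the founder
        "identity": sum(d == 0 for d in from_founder) / len(sample),
    }


# Hypothetical toy data: a 10-base "founder" and four sampled sequences.
founder = "ACGTACGTAC"
sample = [founder, founder, "ACGTACGTAA", "TCGTACGTAC"]
stats = summary_statistics(founder, sample)
```

In this toy sample two of the four sequences carry one substitution each, so the divergence is 0.05 per site, the diversity is 0.1 per site, the maximum HD is 2, and the sequence identity is 0.5.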
Figure 3. Viral load kinetics and the dynamics of divergence, diversity, variance, maximum HD, and sequence identity from homogeneous SIV infection. A. Viral load kinetics of animal r00065 (r65, black) and animal r98018 (r98, red). Animal r00065, which was infected by intravenous injection, displays a greater level of viral replication in comparison with animal r98018, which was infected by intrarectal inoculation. Dynamics of divergence (B), diversity (C), variance (D), maximum HD (E), and sequence identity (F) of nef sequences from animals r00065 (black) and r98018 (red). Each average value of simulated quantity from 10^{3} simulations is represented with a brown line [2]. We sampled 31 sequences at a given time point in each run.
To address whether the acute stage sequence evolution in animal r00065 indeed shows a purifying selection back to the founder strain, we performed an MC simulation starting with 41 nef sequences identical to those sampled at day 7 from animal r00065. We then sampled 50 sequences at day 11 (4 days after the "starting" day 7) and 31 sequences at day 18 (11 days after the "starting" day 7) to replicate the experimental sampling from animal r00065. Figure 4 shows each measure of divergence, diversity, variance, and sequence identity with 95% confidence intervals from 1000 MC runs. The measured divergence at day 18 from animal r00065, 0.0081%, lies outside the 95% confidence interval of the predicted divergence at day 18, [0.00815%, 0.057%], denoting a violation of the model assumption of neutral evolution without selection. We conclude that the serial decrease in divergence observed in animal r00065 reflects a purifying selection rather than a stochastic effect of the finite sample size.
Figure 4. Predicted divergence, diversity, variance, and sequence identity from a simulation performed by starting with 41 sampled nef sequences obtained at day 7 from animal r00065. 50 sequences at day 11 and 31 sequences at day 18 were sampled by starting a simulation with the 41 sampled nef genes that were obtained at day 7 from animal r00065. The sampling time points were chosen to reflect those used in our initial simulation (i.e., day 11 corresponds to day 4 following the "initial" infection in this simulation, and day 18 corresponds to day 11 following the "initial" infection). The measured divergence at day 18 from animal r00065, 0.0081%, lies outside the 95% confidence interval of the predicted divergence at day 18, [0.00815%, 0.057%].
The maximum HD of r98018 at day 21 is 5 due to the presence of a strain with 3 base substitutions from the founder strain. All three of these mutations are G-to-A hypermutations with APOBEC3G/F signatures [25-27], although the signatures were not found to be statistically significant (p > 0.05 from a Fisher exact test, Hypermut tool http://www.hiv.lanl.gov). Nonetheless, we tentatively attribute the deviation from the prediction generated by our model to these putative APOBEC3G/F signatures. The rate of virus sequence evolution in animal r00065 was slower than in animal r98018, even though the virus replication rate (virus load) in animal r00065 was higher than that in animal r98018.
Single Variant (Homogeneous) Infection with Neutral Evolution
Our MC simulation and mathematical calculation are based on the premise that the SIV sequence population diversifies through random base substitutions without any selection or recombination during the first 2–3 weeks of infection, prior to initiation of the host nef-specific immune response that could select viral escape variants. Based on this assumption, the Hamming distance distribution can be approximated as a Poisson distribution, which is characterized by its mean (diversity) being equal to its variance [2,28]. The equality will not be exact due to stochastic effects and sample size dependency. However, we can use the simulation output to capture these effects, and construct a conical region delimited by 95% CIs over mean and variance within which values from a sample from homogeneous infection should lie (Figure 5). If we sample more sequences, the area of the cone decreases. The two conditions for a single variant homogeneous infection without any selection or recombination are: i) the measured diversity and variance of the sequence sample should be located inside the cone, between the upper and lower limits of the 95% CIs, and ii) the diversity should be less than the upper limit of the 95% CIs of simulated diversity at a given time point (grey lines in Figure 5). Here the cone diagram in Figure 5 was constructed by measuring diversity and variance for 20 (red) or 60 (blue) nef genes at each time point of each MC run. We performed 5000 MC runs. All 7 sequence samples from the two animals satisfy the above two conditions, as Figure 5 depicts. Our model successfully classified the virus sequence pattern in the two animals as being derived from a "homogeneous" infection as opposed to a "heterogeneous" infection with two or more strains.
Figure 5. Classification diagram for homogeneous infection. The diversity and the variance of the sampled sequences from animals with homogeneous infection (i.e. infections with a single founder strain without any selection pressure or recombination) are expected to be located within the conical region. Here, the red (blue) conical region represents the 95% CIs from 5 × 10^{3} runs where 20 (60) sequences were sampled at each time point. The black diagonal line denotes the average relationship between diversity and variance. The grey vertical line denotes the upper limit of the 95% CIs of simulated diversity at each time point. All of the sequence sets sampled from the two primates within 3 weeks since infection were successfully classified as homogeneous infections; measured diversity and variance are located within the red and blue conical regions and the diversity is less than the upper limit of the 95% CIs of diversity at week 1 from the homogeneous infection simulations.
Estimating Days since Infection: Poisson Fit
For each sequence data set, which was sampled from each animal at a time point following infection, we constructed the distribution of Hamming distances from the founder strain, HD_{0} (Figure 6). The distribution of Hamming distances from the founder strain, HD_{0}, was calculated as a weighted sum of Binomial distributions in the asynchronous infection mathematical model. The weighted sum of Binomials was approximated as a Poisson distribution,

$$P(HD_{0} = d) = \frac{\lambda_{0}^{d}\, e^{-\lambda_{0}}}{d!}, \qquad (1)$$

where d is the number of base substitutions relative to the founder strain, with the mean λ_{0} given by Eq. (2) [equation not recoverable from the source]. Here t is days post infection, ε is the HIV-1 single replication cycle error rate per base, N_{B} is the number of bases of sampled genes, and R_{0} is the basic reproductive ratio.

Figure 6. Estimation of days since infection based on Hamming distance distribution. The Hamming distance (HD_{0}) distribution (multiplied by the number of sampled sequences) from the founder nef strain, SIVmac239, is shown for each sequence sample from each animal (black boxes) with the best fitting Poisson distribution (red lines). The goodness-of-fit p value of each fit is listed in Table 1. The bottom right corner panel shows a comparison between actual days post infection and the estimated days since infection based on HD_{0} distribution for animals r00065 (black) and r00098 (blue). The correlation coefficient between the actual and estimated dates post-infection for r00065 is 0.91 and for r98018 is 0.47.
We used a Maximum Likelihood method to fit a Poisson distribution to the observed data, and then assessed the goodness of fit through a Chi-square statistic. Table 1 summarizes the estimated days since infection obtained from the Poisson fit using the relationship between the mean of the Poisson distribution, λ_{0}, and days post infection, t, in Eq. (2), along with 95% CIs obtained by bootstrapping the HD_{0} distribution 10^{5} times. All 7 samples yielded a goodness-of-fit p-value greater than 0.5, suggesting that the measured HD_{0} statistically follows a Poisson distribution. In this goodness-of-fit test the null hypothesis was that the two distributions tested were statistically the same; hence a low p-value would yield rejection of the null hypothesis. Analysis of all the sequence samples showed that the actual number of days elapsed following infection fell within the 95% CIs of the days post infection estimated by a Poisson fit to the HD_{0} distribution (Table 1). However, as we expected from the observed decrease in divergence and the increase in sequence identity as infection progresses, the correlation coefficient between the actual days since infection and the estimated days post infection (based on the Poisson fit) was 0.91 for animal r00065. The correlation coefficient for animal r98018 was 0.47.
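The fitting steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names, the binning of the Chi-square statistic (all counts at the largest observed distance pooled with the tail), the percentile bootstrap, and the toy HD_{0} counts are all hypothetical. The maximum likelihood estimate of a Poisson mean is simply the sample mean. Converting λ_{0} to days since infection requires Eq. (2), which is not reproduced here.

```python
import math
import random
from collections import Counter


def poisson_pmf(k, lam):
    """Poisson probability of observing k events when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def fit_poisson(hd0):
    """Maximum likelihood estimate of the Poisson mean (the sample mean)."""
    return sum(hd0) / len(hd0)


def chi_square_statistic(hd0, lam):
    """Pearson statistic with the upper tail pooled into the last bin.

    Meaningful only when the largest observed distance is at least 2,
    so that at least one degree of freedom remains.
    """
    n = len(hd0)
    counts = Counter(hd0)
    k_max = max(hd0)
    stat = 0.0
    for k in range(k_max):
        expected = n * poisson_pmf(k, lam)
        stat += (counts.get(k, 0) - expected) ** 2 / expected
    tail_expected = n * (1.0 - sum(poisson_pmf(k, lam) for k in range(k_max)))
    stat += (counts[k_max] - tail_expected) ** 2 / tail_expected
    dof = k_max - 1  # (k_max + 1 bins) - 1 - one fitted parameter
    return stat, dof


def bootstrap_ci(hd0, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the Poisson mean."""
    rng = random.Random(seed)
    n = len(hd0)
    means = sorted(sum(rng.choices(hd0, k=n)) / n for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical sample of 31 sequences: 25 identical to the founder,
# 5 with one substitution, 1 with two substitutions.
hd0 = [0] * 25 + [1] * 5 + [2] * 1
lam0 = fit_poisson(hd0)
stat, dof = chi_square_statistic(hd0, lam0)
ci_low, ci_high = bootstrap_ci(hd0)
```

A p-value would follow from the Chi-square survival function evaluated at `stat` with `dof` degrees of freedom (for example `scipy.stats.chi2.sf`, if SciPy is available); it is left out here to keep the sketch free of third-party dependencies.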
Discussion
The present study was undertaken to explore the applicability of a recently developed model for primary HIV-1 infection to the analysis of acute SIV infection in rhesus macaques [2]. The level of measured diversity ranged from 0.015% to 0.052% during primary SIV infection, before set point, which is comparable to the range of measured diversity, 0.005% to 0.127%, from 68 single strain infected patients at the primary stage of HIV-1 infection [2]. Analysis of the SIV nef sequences showed that the MC simulation model was able to successfully classify 7 sequence samples, from two animals during the first 3 weeks following experimental infection of two rhesus macaques with SIVmac239, as homogeneous infection. We also confirmed that the consensus virus sequence in these animals was identical to the transmitted nef sequence of the infecting SIVmac239.
We observed an unexpected decline in the divergence and the diversity from animal r00065 at an early point following infection. We first hypothesized that the serial decline in the divergence might be due to fluctuations arising from the limited sample size, 31–50 sequences per time point. To address this concern, we performed a second simulation, starting with the 41 nef genes actually sampled at day 7 from animal r00065 (which showed a divergence of 0.018%). The MC simulation was performed with the assumption of neutral evolution, and 31 sequences were sampled at day 18. The 95% CIs of the divergence from 1000 such simulations provided the basis for the rejection of the null hypothesis (neutral evolution without selection), implying a preferential selection process for the founder strain. We conclude that the decrease in the divergence observed in animal r00065 reflects a purifying selection rather than a stochastic effect due to small sample size. We speculate that the purifying selection can be explained as a result of either: (i) lower fitness of the emerging mutant viruses relative to the founder virus, or (ii) selective loss of mutant sequences due to linked, unfavorable changes elsewhere in the genome (i.e., the phenomenon of hitchhiking [29,30]). The roles of Nef in viral fitness, such as promoting viral replication and infectivity and interfering with T cell activation, have been well documented [31-33].
The time points in our study were chosen to precede the emergence of cytotoxic T lymphocyte (CTL) escape variants. As we expected, Figure 1 shows that all the mutants of the inoculated SIVmac239 nef gene differ from one another at the predicted amino acid level. This is not consistent with the expected outcome of CTL pressure, which classically results in changes confined within one or at most a handful of immunodominant epitopes. The main expected impact of CTL-induced changes on the model is a deviation from a star-like phylogeny [34], which is characterized by the absence of outgrowth in any particular mutant lineage. We have presented an examination of the property of star phylogeny in Figure 7, where all 7 samples from the two macaques satisfy the expected relationship for star-like phylogeny, diversity = 2 × divergence. The relationship arises from the property that the inter-sequence Hamming distance frequency distribution coincides with the self-convolution of the frequency distribution of the Hamming distances from the founder virus. The property of star-like phylogeny was preserved in all the samples from animal r00065, which displayed a sequential decrease in the divergence and the diversity (i.e., a purifying selection). Under a purifying selection that favors the founder strain, a star-like phylogeny can be retained since there is no outgrowth in any particular mutant lineage, only at the center of the star, the founder virus.
Figure 7. Examination of star-like phylogeny. The star phylogeny can be examined by testing whether the level of diversity is twice the level of divergence, which occurs under neutral evolution in the absence of selective pressure for specific mutant strains. All of the 7 samples from animals r00065 and r98018 satisfy the relationship, diversity = 2 × divergence (blue line).
We observed that rapid viral replication kinetics were not necessarily associated with a greater rate of sequence evolution. Animal r00065 displayed a greater level of viral replication in comparison to animal r98018, while less diversification of nef genes was observed in animal r00065. We interrogated the relationship between HIV-1 sequence diversity and viral load from 28 subjects with homogeneous HIV-1 infection in Fiebig stage II, where viral RNA and p24 antigens are positive without detectable HIV-1 serum antibodies [2]. We observed little correlation between plasma viral load and diversity (σ^{2} = 0.18) in HIV-1 acute infection.
The disconnect between the replication rate and the rate of evolution during early SIV and HIV infections may be partly explained by the unusually small effective population size, which has been estimated to range from 10^{3} to 10^{4} [35-38]. The effective population size is defined from the process of transforming an actual, census population into a neutral, constant size population with nonoverlapping generations. The difference between the effective population size and the real size can arise from many factors such as varying population size, purifying or diversifying selection, and the existence of subpopulations. These factors should be associated with a low level of correlation between viral load and the level of diversity in acute HIV-1 and SIV infections.
Another aspect we may consider is that the low level of correlation might be explained within our model scheme, where the reproductive ratio and the generation time are set as independent parameters. Viral sequence diversity is influenced more strongly by generation time and to a much lesser extent by the reproductive ratio. Hence for a given viral generation time, if the reproductive ratio changes significantly, the ramp-up slope of infected cells varies accordingly while the rate of sequence diversification remains relatively stable, implying little correlation between the rate of evolution and the rate of replication. For instance, our calculation from the asynchronous infection model study shows that when we change the basic reproductive ratio from 6 to 12, the ramp-up slope of infected cells increases 45% but the slope of diversity increases only 6%. With the assumption that the basic reproductive ratio varies considerably among acute HIV-1 subjects, for example with the level of activated CD4 T cells at transmission, we may observe a great level of variation in the viral load but less in the sequence diversity. Under this circumstance, only a minor correlation can be detected at the population level, with fluctuations arising from the limited sample size of genes acting as another factor that dampens the correlation.
An important caveat to the work reported here is that a limited number of clones were examined at specific time points in only 2 SIV-infected animals. SGA sequencing is resource-intensive, precluding the use of more animals and time points in this study. In the future, next-generation pyrosequencing technologies [39] may facilitate the examination of far greater numbers of SIV sequences with economy that is impossible to achieve with Sanger-based sequencing. We expect that the acute infection model will be refined and improved as additional sequences become available.
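The relationship diversity = 2 × divergence can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the authors' model: each sampled sequence mutates independently away from the founder with a hypothetical per-site probability of 0.002, and the sequence length of 792 is borrowed from the nef length quoted in the Methods. When no two sequences share a mutated site, the pairwise distance is the sum of the two distances from the founder and the ratio is exactly 2; an outgrowing mutant lineage (a substitution shared by half of the sample) pulls the ratio below 2.

```python
import random
from itertools import combinations

BASES = "ACGT"


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mutate(seq, p, rng):
    """Substitute each site independently with probability p."""
    out = []
    for base in seq:
        if rng.random() < p:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def divergence_and_diversity(founder, sample):
    """Per-site divergence from the founder and per-site pairwise diversity."""
    n_sites = len(founder)
    div = sum(hamming(founder, s) for s in sample) / (len(sample) * n_sites)
    pairs = list(combinations(sample, 2))
    dvs = sum(hamming(a, b) for a, b in pairs) / (len(pairs) * n_sites)
    return div, dvs


rng = random.Random(0)
founder = "".join(rng.choice(BASES) for _ in range(792))

# Star-like case: every sequence descends independently from the founder.
star = [mutate(founder, 0.002, rng) for _ in range(50)]
div_star, dvs_star = divergence_and_diversity(founder, star)
ratio_star = dvs_star / div_star

# Outgrowth case: every second sequence also carries one shared substitution.
shared = "A" if founder[100] != "A" else "C"
clade = [s[:100] + shared + s[101:] if i % 2 == 0 else s
         for i, s in enumerate(star)]
div_clade, dvs_clade = divergence_and_diversity(founder, clade)
ratio_clade = dvs_clade / div_clade

print(f"star-like ratio: {ratio_star:.3f}, with outgrowth: {ratio_clade:.3f}")
```

The star-like ratio falls slightly below 2 only when two sequences happen to mutate at the same site, which is rare at this mutation level.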
Conclusion
This study verifies the robust nature of our MC simulation model for primary HIV-1 infection, and shows that it can be successfully applied to the analysis of acute SIV infection in rhesus macaques. The model predicted the level of SIV sequence diversification during the acute phase of SIVmac239 infection in two rhesus macaques, and it correctly identified "homogeneous" virus transmission in this model system. SIV acute sequence samples confirmed that the consensus sequence of each sample was indeed the transmitted strain. Finally, a sequential decrease in viral diversity was observed during the first 3 weeks of infection in one macaque, and was found to be due to a purifying selection for the transmitted sequence.
Methods
Animals and SIVmac239 challenge
Two rhesus macaques were experimentally infected with the clonal SIV isolate SIVmac239, derived from a molecular clone [40]. The SIVmac239 inoculum was sequenced by non-limiting dilution PCR. The sequence of the infecting strain was identical to the clone from which it was derived, apart from potential small errors introduced during in vitro amplification. This is a limitation of the study; however, we note that this approach was the best means available to us of confirming the clonal nature of the infecting inoculum. Animal r00065 (r65) was infected with 100 TCID_{50} SIVmac239 by intravenous injection. Animal r00098 (r98) was infected by intrarectal inoculation with 10 MID_{50} SIVmac239. Viral RNA was isolated from frozen plasma samples from animal r00065 collected at days 4, 7, 11, and 18 following virus infection. From animal r00098, viral RNA was isolated from frozen plasma samples collected at days 4, 7, 21 during infection. Virally infected animals were cared for according to the regulations of the University of Wisconsin Institutional Animal Care and Use Committee, and the NIH.
Viral RNA isolation and cDNA synthesis
Viral RNA was isolated from each animal at defined time points following infection. Cell-free plasma was prepared from EDTA anticoagulated whole blood by Ficoll density gradient centrifugation. Viral RNA isolation was performed using the QIAamp MinElute Virus Spin Kit (QIAGEN, Valencia, CA) according to the manufacturer's instructions. Single strand cDNA was generated using oligo dT primers and the Superscript III reverse transcription kit (Invitrogen, Carlsbad, California, USA) according to the manufacturer's instructions.
Limiting Dilution and nested PCR
cDNA template was diluted to ~1 viral genome per microliter. The dilution factor necessary to achieve single viral genomes was defined as the template dilution for which only 30% of reactions produced a product. According to a Poisson distribution, the cDNA dilution that yields PCR products in no more than 30% of wells contains one amplifiable cDNA template per positive PCR more than 80% of the time. This was empirically determined using a dilution series and varied between samples and cDNA preps. The dilution series and PCR reactions were set up using a QIAGEN BR3000 liquid handling robot (QIAGEN, Valencia, CA). All PCR reactions used Phusion High-Fidelity polymerase (Finnzymes, Espoo, Finland). A nested PCR approach was used for all amplifications. The following primers, designed to amplify a region of the viral nef gene, were used for the first round of PCR: 5'-CAAAGAAGGAGACGGTGGAG-3' and 5'-CATCAAGAAAGTGGGCGTTC-3'. Second round PCR was conducted using 2 µl of the first round PCR product and the following internal primers: 5'-TCAGCAACTGCAGAACCTTG-3' and 5'-CGTAACATCCCCTTGTGGAA-3'. For all PCR reactions, the following conditions were used: 98°C for 30 s, 30 cycles of 98°C for 5 s, 63°C for 1 s and 72°C for 10 s, followed by 72°C for 5 min. PCR products were run on a 1.5% agarose gel. PCR products were purified using the Chargeswitch kit (Invitrogen, Carlsbad, California, USA) according to the manufacturer's instructions. Samples were bidirectionally sequenced using ET-terminator chemistry on an Applied Biosystems 3730 Sequencer (Applied Biosystems, Foster City, California, USA) and the internal primers described above. DNA sequence alignments were performed using CodonCode Aligner version 2.0 (CodonCode Corporation, Dedham, Massachusetts, USA).
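The Poisson argument behind the 30% threshold can be checked with a few lines of arithmetic. This is a minimal sketch assuming that the number of amplifiable templates per well is Poisson distributed; the function name is hypothetical. If 30% of wells are positive, the probability of an empty well is 0.7, so the mean number of templates per well is -ln(0.7), and the share of positive wells seeded by exactly one template follows directly.

```python
import math


def single_template_fraction(positive_fraction):
    """Among PCR-positive wells, the fraction seeded by exactly one template."""
    # P(no template) = exp(-lam) = 1 - positive_fraction
    lam = -math.log(1.0 - positive_fraction)
    # P(exactly one template) for a Poisson mean of lam
    p_one = lam * math.exp(-lam)
    return p_one / positive_fraction


frac = single_template_fraction(0.30)
print(f"{frac:.3f}")  # about 0.83, i.e. more than 80% of positive wells
```

Lower positive fractions push this share higher, which is why the threshold is stated as "no more than 30%".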
Modeling Sequence Evolution in Primary HIV1/SIV Infection
The details of our model for characterizing sequence evolution in acute HIV1 infection will be described by Lee et al. (HY Lee, EE Giorgi, BF Keele, B Gaschen, GS Athreya, JF SalazarGonzalez, KT Pham, PA Geopfert, JM Kilby, MS Saag, EL Delwart, MP Busch, BH Hahn, GM Shaw, BT Korber, T Bhattacharya, and AS Perelson, Modeling Sequence Evolution in Acute HIV1 Infection, submitted for publication). We provide here an overview of the salient features of the model and its underlying assumptions. After transmission we assume that a systematic infection starts with a single infected cell in a new host. The number of secondary infections caused by one infected cell placed in a population of cells fully susceptible to infection is called the basic reproductive number, R_{0}. The available data in humans infected with HIV1 and in monkeys infected with SIV and SHIV show that virus grows exponentially until a viral load peak is attained a few weeks after infection [4143]. Following the peak, viral levels decline and establish a setpoint. At the setpoint each infected cell, on average, successfully infects one other cell during its lifetime.
We assumed a homogeneous infection in which the virus grows exponentially with no selection pressure, no recombination, and a constant mutation rate across positions and across lineages. New cells are infected at random by the viruses released from an infected cell. Viral production starts on average about 24 hours after a cell is initially infected [44,45], and most likely continues until cell death. While each of the R_{0} infections could occur at a different time, we took a first step in assessing the role of asynchrony by assuming that the infections occur at two different times. The average time to new infection defines the viral generation time, τ. Each new infection entails a single round of reverse transcription, which introduces errors in the proviral DNA; the number of mutations is given by the binomial distribution, Binom(n; N_{B}, ε), where n is the number of new base substitutions. The binomial distribution implies that, in each reverse transcription cycle, base substitutions occur independently with probability ε at each site of an SIV genome of length N_{B}. The Monte-Carlo model explicitly emulates each new infection with its mutations, tracking the population of proviral nef genes in the infected cells by introducing base substitutions as infection propagates in a new host.
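The infection-and-mutation loop described above can be sketched as follows. This is a simplified illustration rather than the simulation used in the study: it uses synchronous generations only (no two-time asynchrony), and it caps the population by random subsampling (`max_cells`), which is an assumption made purely to keep the example small. All function and parameter names are hypothetical; the default values of `r0`, `n_sites` and `eps` are the ones quoted in the text.

```python
import random

def binomial_draw(n, p, rng):
    """Draw from Binomial(n, p) by CDF inversion; cheap when n * p is small."""
    u = rng.random()
    k = 0
    pmf = (1.0 - p) ** n                          # P(X = 0)
    cdf = pmf
    while u > cdf and k < n:
        pmf *= (n - k) / (k + 1) * p / (1.0 - p)  # P(X = k + 1) from P(X = k)
        k += 1
        cdf += pmf
    return k

def mutate(genome, n_mut, n_sites, rng):
    """Apply n_mut substitutions at distinct random sites.

    A genome is a dict {site: shift}; shift in {1, 2, 3} encodes which of the
    three non-founder bases is present. Absent sites match the founder.
    """
    child = dict(genome)
    for site in rng.sample(range(n_sites), n_mut):
        current = child.get(site, 0)
        new = rng.choice([s for s in range(4) if s != current])
        if new == 0:
            child.pop(site)                       # back-mutation to the founder base
        else:
            child[site] = new
    return child

def simulate(generations, r0=6, n_sites=792, eps=2.16e-5, max_cells=5000, seed=1):
    """Return the genomes of infected cells after the given number of generations."""
    rng = random.Random(seed)
    population = [{}]                             # one cell carrying the founder genome
    for _ in range(generations):
        offspring = []
        for genome in population:
            for _ in range(r0):                   # each cell infects r0 new cells
                n_mut = binomial_draw(n_sites, eps, rng)
                offspring.append(mutate(genome, n_mut, n_sites, rng) if n_mut else genome)
        if len(offspring) > max_cells:            # illustrative cap, not from the paper
            offspring = rng.sample(offspring, max_cells)
        population = offspring
    return population

cells = simulate(generations=9)                   # 9 generations of 2 days = 18 days
mean_subs = sum(len(g) for g in cells) / len(cells)
print(f"{len(cells)} cells, mean substitutions per nef genome: {mean_subs:.3f}")
```

Storing each genome as its set of differences from the founder keeps memory use small, because most nef copies carry no substitution at all during the first weeks of infection.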
In Ref. [2], we determined that the MC simulation and the mathematical model showed good agreement with the level of sequence diversity sampled from acute HIV-1 subjects presumably infected with a single variant. Based on the prediction made by the model, the group of identical sequences, usually the consensus sequence of the sampled strains, was presumed to be the initial founder strain that established the systemic infection in each host. The parameters used in the acute HIV-1 model were: i) the average generation time of productively infected cells, defined as the average time interval between the infection of a target cell and the subsequent infection of new cells by progeny virions, estimated as 2 days [44], ii) the HIV-1 single cycle forward mutation rate, estimated as ε = 2.16 × 10^{-5} per site per cycle [46], and iii) the basic reproductive ratio, defined as the number of newly infected cells that arise from any one infected cell when almost all cells are uninfected, estimated as R_{0} = 6 [41]. In the asynchronous infection model, the first time at which a newly infected cell infects other cells, τ, is chosen as 1.5 days. The length of the nef gene that we simulated, N_{B}, is 792. We used these parameter values to analyze our SIV data set. For example, R_{0} values calculated from the viral ramp-up slope during primary SIV infection ranged from 2.2 to 68 [43], a range that is consistent with the choice of R_{0} = 6. Improving the model will require more accurate estimates of these basic parameters during early SIV infection.
The mutation rate, ε, and the generation time, τ, control the rate of increase in divergence and hence diversity. The larger the mutation rate, the faster the genomes mutate, hence the steeper the growth in diversity. The greater the generation time, the slower the genomes diversify, hence the smaller the growth in diversity. The slope of diversification is approximately proportional to ε/τ. On the other hand, R_{0} mainly controls the growth in the infected cell population size. As the viral population grows, the number of cells that one infected cell infects decreases because fewer cells are available for infection. The basic reproductive ratio, R_{0}, affects the rate of evolution in a relatively minor way. Low values (e.g. 2 ≤ R_{0} ≤ 4) slow down the growth in the infected cell population, thus affecting the speed of evolution. For example, decreasing R_{0} from 6 to 2 gives a 15.9% increase in the slope of diversity. On the other hand, for R_{0} ≥ 6, the dependence of the rate of diversification on R_{0} is reduced: the slope of diversity increases by 5.5% as we increase R_{0} from 6 to 10. The dynamics of diversity do not depend on the number of initially infected cells.
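The ε/τ scaling can be turned into a rough, order-of-magnitude estimate. The sketch below assumes that divergence per base grows as (ε/τ)·t and that, under a star-like phylogeny with negligible back-mutation, diversity is about twice the divergence; it ignores the R_{0}-dependent corrections discussed above, so it is a first-order approximation and not the model's exact expression. The function names are hypothetical.

```python
EPS = 2.16e-5     # substitutions per site per replication cycle (value from the text)
TAU_DAYS = 2.0    # generation time in days (value from the text)

def approx_divergence(days: float) -> float:
    """First-order divergence per base: one replication cycle every TAU_DAYS."""
    return EPS * days / TAU_DAYS

def approx_diversity(days: float) -> float:
    """Star-phylogeny assumption: two lineages diverge independently from the founder."""
    return 2.0 * approx_divergence(days)

for day in (7, 11, 18):
    print(day, f"{approx_divergence(day) * 100:.4f}%", f"{approx_diversity(day) * 100:.4f}%")
# 7 0.0076% 0.0151%
# 11 0.0119% 0.0238%
# 18 0.0194% 0.0389%
```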
Once we sample a finite number of sequences from the MC simulation at a given time, we first measure the Hamming distance (HD_{0}) between each sampled sequence and the founder sequence, and the Hamming distance (HD) between sequences sampled at the same time. Here the Hamming distance is the number of base substitutions between two sequences. Based on the calculated HD_{0} and HD, we define the basic measurements for quantifying the evolution of HIV-1 sequence populations. Divergence is defined as the average HD_{0} per base from the initial founder strain; diversity is defined as the average inter-sequence Hamming distance per base among sequence pairs at a given time; variance is defined as the variance of the inter-sequence per base HD distribution; maximum HD is defined as the maximum HD measured between all sampled sequence pairs; and sequence identity is defined as the proportion of sequences identical to the founder strain. Both the MC simulation and the mathematical calculation showed that divergence, diversity, and variance increase linearly as a function of time and that sequence identity decays exponentially as a function of time [Fig. 2]. These behaviours are characteristic of neutral evolution, which is marked by a Poisson distribution of Hamming distances and a star-phylogeny topology. It has been shown that the distribution of pairwise genetic distances is approximately Poisson in the evolution of mitochondrial DNA [28]. To address the issue of the finite size of samples, we repeated MC simulations, sampling a finite number of nef genes at a given time, and computed 95% CIs for each quantity. We then examined whether the measurements from the SIV nef gene samples were compatible with the model prediction.
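The five summary measures defined above can be computed directly from aligned sequences. The following is a minimal sketch, assuming equal-length, gap-free sequences; the function names and the toy sequences are illustrative, and the use of the population variance (rather than the sample variance) is an assumption, since the text does not specify which was used.

```python
from itertools import combinations
from statistics import mean, pvariance

def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def summarize(founder: str, samples: list[str]) -> dict:
    n_sites = len(founder)
    hd0 = [hamming(s, founder) for s in samples]              # distance to founder
    hd = [hamming(a, b) for a, b in combinations(samples, 2)]  # pairwise distances
    per_base = [d / n_sites for d in hd]
    return {
        "divergence": mean(hd0) / n_sites,
        "diversity": mean(per_base),
        "variance": pvariance(per_base),   # population variance: an assumption
        "max_hd": max(hd),
        "identity": sum(d == 0 for d in hd0) / len(samples),
    }

# Toy 10-base example: two founder copies and two single-substitution variants.
founder = "ACGTACGTAC"
samples = ["ACGTACGTAC", "ACGTACGTAC", "TCGTACGTAC", "ACGTACGTAA"]
stats = summarize(founder, samples)
print(stats)
```

In this toy example the divergence is 0.05 and the diversity is 0.1 substitutions per base, the maximum HD is 2, and half of the sampled sequences are identical to the founder.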
To infer the number of days elapsed since infection based on sampled strains, first we fit the Poisson distribution to the observed distribution of Hamming distances between sampled nef genes and the transmitted nef gene; we then determined the mean of the Poisson distribution and calculated days post infection using Eq. (2).
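The timing step can be illustrated as follows. Eq. (2) is not reproduced in this section, so the sketch assumes the simplest linear relation between the Poisson mean λ of HD_{0} and time, λ = ε N_{B} t/τ, and neglects any R_{0}-dependent correction; it should not be read as the exact expression used in the analysis. The maximum-likelihood estimate of a Poisson mean is the sample mean, which is what the hypothetical function below uses. The HD_{0} counts are invented for illustration.

```python
def days_since_infection(hd0_counts, n_sites=792, eps=2.16e-5, tau_days=2.0):
    """Estimate days post infection from Hamming distances to the founder.

    Assumes lambda = eps * n_sites * t / tau_days (a simplification).
    """
    if not hd0_counts:
        raise ValueError("need at least one sequence")
    lam = sum(hd0_counts) / len(hd0_counts)   # Poisson MLE = sample mean
    return lam * tau_days / (eps * n_sites)

# Invented sample: 85 sequences identical to the founder, 14 with one
# substitution, 1 with two substitutions.
hd0 = [0] * 85 + [1] * 14 + [2] * 1
estimate = days_since_infection(hd0)
print(f"{estimate:.1f} days")  # 18.7 days
```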
A key property of the Poisson distribution arising from neutral evolution without selection and recombination is that the level of diversity is comparable to that of variance. We used this property to examine whether sampled strains had evolved from a single founder strain or not. In each MC run, we obtained the values of diversity and variance from the sampled sequences with a given sample size at each time and located those values in the plane of diversity and variance. By repeating MC simulations, we collected all the values of diversity and variance and computed 95% CIs in the plane of diversity and variance. The computed 95% CIs form a conical region within which diversity and variance of the sampled sequences from the animal with homogeneous infection (i.e. infections with a single founder strain without any selection pressure or recombination) are expected to be located [Figure 5]. As we sample more, the conical region becomes smaller [Figure 5]. Another requirement for homogeneous infection is that the sequence diversity should be less than the upper limit of the 95% CIs of the diversity at a given time following infection with a single virus strain.
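The idea of a Monte-Carlo confidence region for diversity and variance can be sketched in a simplified form. The example below does not reconstruct the conical region of the study. Instead, it assumes a pure star phylogeny in which each sampled sequence carries a Poisson number of substitutions at random sites, works with raw Hamming distances (counts, not per-base values), and summarizes each run by the ratio of variance to diversity, reporting the central 95% interval of that ratio. All names and the value of `lam` are illustrative.

```python
import math
import random
from itertools import combinations
from statistics import mean, pvariance, quantiles

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def diversity_and_variance(n_seq, lam, n_sites, rng):
    """Mean and variance of pairwise Hamming distances for one simulated sample."""
    # Each sequence is the set of sites at which it differs from the founder.
    seqs = [frozenset(rng.sample(range(n_sites), poisson_draw(lam, rng)))
            for _ in range(n_seq)]
    hd = [len(a ^ b) for a, b in combinations(seqs, 2)]
    return mean(hd), pvariance(hd)

def ratio_interval(n_seq, lam, n_sites=792, runs=300, seed=2):
    """Central 95% interval of variance/diversity over repeated simulated samples."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(runs):
        d, v = diversity_and_variance(n_seq, lam, n_sites, rng)
        if d > 0:                       # skip runs in which all sequences are identical
            ratios.append(v / d)
    cuts = quantiles(ratios, n=40)      # 39 cut points: 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]

low, high = ratio_interval(n_seq=50, lam=0.3)
print(f"95% interval of variance/diversity: {low:.2f} to {high:.2f}")
```

Under these assumptions, a sample whose variance greatly exceeds its diversity falls outside the interval, which is the kind of signal expected when more than one founder strain is present. Increasing `n_seq` narrows the interval, mirroring the shrinking of the conical region with larger samples.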
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
BNB and DHO performed the animal experiment and nef gene SGA sequencing. HL performed the sequence data analysis and model simulations. EEG and ALA were responsible for the statistical analysis including the Poisson fit. PC, BK, SD, and HL were responsible for design and writing of the manuscript. All authors read and approved the final manuscript.
Acknowledgements
We thank B. T. Korber, B. F. Keele, T. Bhattacharya, and A. S. Perelson for critical reading and comments and M. Draheim for technical support. This publication was supported by NIAID/NIH grant AI083115, NIH grant AI049781, NCRR/NIH grant P51 RR000167, Research Facilities Improvement Program grant numbers RR1545901 and RR02014101, University of Rochester Developmental Center for AIDS research (NIH P30AI078498), and NIH P01 AI056356.
References

1. Salazar-Gonzalez JF, Bailes E, Pham KT, Salazar MG, Guffey MB, Keele BF, Derdeyn CA, Farmer P, Hunter E, Allen S, et al.: Deciphering Human Immunodeficiency Virus Type 1 Transmission and Early Envelope Diversification by Single Genome Amplification and Sequencing.
J Virol 2008, 82:3952-70.

2. Keele BF, Salazar-Gonzalez JF, Pham KT, Salazar MG, Sun C, Grayson T, Decker JM, Wei X, Wang S, Goepfert PA, et al.: Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 Infection.
Proc Natl Acad Sci USA 2008, 105:7552-7557.

3. Gottlieb GS, Heath L, Nickle DC, Wong KG, Leach SE, Jacobs B, Gezahegne S, van 't Wout AB, Jacobson LP, Margolick JB, Mullins JI: HIV-1 variation before seroconversion in men who have sex with men: analysis of acute/early HIV infection in the multicenter AIDS cohort study.
J Infect Dis 2008, 197:1011-1015.

4. Kuyl AC, Cornelissen M: Identifying HIV-1 dual infections.
Retrovirology 2007, 4:67.

5. Gottlieb GS, Nickle DC, Jensen MA, Wong KG, Grobler J, Li F, Liu SL, Rademeyer C, Learn GH, Karim SS, et al.: Dual HIV-1 infection associated with rapid disease progression.
Lancet 2004, 363:619-622.

6. Gottlieb GS, Nickle DC, Jensen MA, Wong KG, Kaslow RA, Shepherd JC, Margolick JB, Mullins JI: HIV type 1 superinfection with a dual-tropic virus and rapid progression to AIDS: a case report.
Clin Infect Dis 2007, 45:501-509.

7. Costa LJ, Mayer AJ, Busch MP, Diaz RS: Evidence for Selection of more Adapted Human Immunodeficiency Virus Type 1 Recombinant Strains in a Dually Infected Transfusion Recipient.
Virus Genes 2004, 28:259-272.

8. Haigwood NL: Predictive value of primate models for AIDS.
AIDS Rev 2004, 6:187-198.

9. Hu SL: Nonhuman primate models for AIDS vaccine research.
Curr Drug Targets Infect Disord 2005, 5:193-201.

10. Lackner AA, Veazey RS: Current concepts in AIDS pathogenesis: insights from the SIV/macaque model.
Annu Rev Med 2007, 58:461-476.

11. Staprans SI, Dailey PJ, Rosenthal A, Horton C, Grant RM, Lerche N, Feinberg MB: Simian immunodeficiency virus disease course is predicted by the extent of virus replication during primary infection.
J Virol 1999, 73:4829-4839.

12. Mellors JW, Kingsley LA, Rinaldo CR Jr, Todd JA, Hoo BS, Kokka RP, Gupta P: Quantitation of HIV-1 RNA in plasma predicts outcome after seroconversion.
Ann Intern Med 1995, 122:573-579.

13. Mellors JW, Rinaldo CR Jr, Gupta P, White RM, Todd JA, Kingsley LA: Prognosis in HIV-1 infection predicted by the quantity of virus in plasma.
Science 1996, 272:1167-1170.

14. Centlivre M, Sala M, Wain-Hobson S, Berkhout B: In HIV-1 pathogenesis the die is cast during primary infection.
AIDS 2007, 21:1-11.

15. Overbaugh J, Bangham CR: Selection forces and constraints on retroviral sequence variation.
Science 2001, 292:1106-1109.

16. Rybarczyk BJ, Montefiori D, Johnson PR, West A, Johnston RE, Swanstrom R: Correlation between env V1/V2 region diversification and neutralizing antibodies during primary infection by simian immunodeficiency virus sm in rhesus macaques.
J Virol 2004, 78:3561-3571.

17. Allen TM, O'Connor DH, Jing P, Dzuris JL, Mothe BR, Vogel TU, Dunphy E, Liebl ME, Emerson C, Wilson N, et al.: Tat-specific cytotoxic T lymphocytes select for SIV escape variants during resolution of primary viraemia.
Nature 2000, 407:386-390.

18. O'Connor DH, Allen TM, Vogel TU, Jing P, DeSouza IP, Dodds E, Dunphy EJ, Melsaether C, Mothe B, Yamamoto H, et al.: Acute phase cytotoxic T lymphocyte escape is a hallmark of simian immunodeficiency virus infection.
Nat Med 2002, 8:493-499.

19. Lichterfeld M, Yu XG, Cohen D, Addo MM, Malenfant J, Perkins B, Pae E, Johnston MN, Strick D, Allen TM, et al.: HIV-1 Nef is preferentially recognized by CD8 T cells in primary HIV-1 infection despite a relatively high degree of genetic diversity.
AIDS 2004, 18:1383-1392.

20. Ueno T, Motozono C, Dohki S, Mwimanzi P, Rauch S, Fackler OT, Oka S, Takiguchi M: CTL-mediated selective pressure influences dynamic evolution and pathogenic functions of HIV-1 Nef.
J Immunol 2008, 180:1107-1116.

21. Huang KJ, Wooley DP: A new cell-based assay for measuring the forward mutation rate of HIV-1.
J Virol Methods 2005, 124:95-104.

22. Kirchhoff F, Easterbrook PJ, Douglas N, Troop M, Greenough TC, Weber J, Carl S, Sullivan JL, Daniels RS: Sequence variations in human immunodeficiency virus type 1 Nef are associated with different stages of disease.
J Virol 1999, 73:5497-5508.

23. Palmer S, Kearney M, Maldarelli F, Halvas EK, Bixby CJ, Bazmi H, Rock D, Falloon J, Davey RT Jr, Dewar RL, et al.: Multiple, linked human immunodeficiency virus type 1 drug resistance mutations in treatment-experienced patients are missed by standard genotype analysis.
J Clin Microbiol 2005, 43:406-413.

24. Shriner D, Rodrigo AG, Nickle DC, Mullins JI: Pervasive genomic recombination of HIV-1 in vivo.
Genetics 2004, 167:1573-1583.

25. Harris RS, Liddament MT: Retroviral restriction by APOBEC proteins.
Nat Rev Immunol 2004, 4:868-877.

26. Simon V, Zennou V, Murray D, Huang Y, Ho DD, Bieniasz PD: Natural variation in Vif: differential impact on APOBEC3G/3F and a potential role in HIV-1 diversification.
PLoS Pathog 2005, 1:e6.

27. Bourara K, Liegler TJ, Grant RM: Target cell APOBEC3C can induce limited G-to-A mutation in HIV-1.
PLoS Pathog 2007, 3:1477-1485.

28. Slatkin M, Hudson RR: Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations.
Genetics 1991, 129:555-562.

29. Smith JM, Haigh J: The hitchhiking effect of a favourable gene.
Genet Res 1974, 23:23-35.

30. Charlesworth D, Morgan MT, Charlesworth B: The effect of linkage and population size on inbreeding depression due to mutational load.
Genet Res 1992, 59:49-61.

31. Miller MD, Warmerdam MT, Gaston I, Greene WC, Feinberg MB: The human immunodeficiency virus-1 nef gene product: a positive factor for viral infection and replication in primary lymphocytes and macrophages.
J Exp Med 1994, 179:101-113.

32. Sinclair E, Barbosa P, Feinberg MB: The nef gene products of both simian and human immunodeficiency viruses enhance virus infectivity and are functionally interchangeable.
J Virol 1997, 71:3641-3651.

33. Arien KK, Verhasselt B: HIV Nef: role in pathogenesis and viral fitness.
Curr HIV Res 2008, 6:200-208.

34. Wakeley J: Coalescent Theory: An Introduction. Robert & Company Publishers; 2008.

35. Brown AJ: Analysis of HIV-1 env gene sequences reveals evidence for a low effective number in the viral population.
Proc Natl Acad Sci USA 1997, 94:1862-1865.

36. Achaz G, Palmer S, Kearney M, Maldarelli F, Mellors JW, Coffin JM, Wakeley J: A robust measure of HIV-1 population turnover within chronically infected individuals.
Mol Biol Evol 2004, 21:1902-1912.

37. Shriner D, Liu Y, Nickle DC, Mullins JI: Evolution of intrahost HIV-1 genetic diversity during chronic infection.
Evolution 2006, 60:1165-1176.

38. Rouzine IM, Coffin JM: Linkage disequilibrium test implies a large effective population number for HIV in vivo.
Proc Natl Acad Sci USA 1999, 96:10758-10763.

39. Ronaghi M, Uhlen M, Nyren P: A sequencing method based on real-time pyrophosphate.
Science 1998, 281:363-365.

40. Kestler H, Kodama T, Ringler D, Marthas M, Pedersen N, Lackner A, Regier D, Sehgal P, Daniel M, King N, et al.: Induction of AIDS in rhesus monkeys by molecularly cloned simian immunodeficiency virus.
Science 1990, 248:1109-1112.

41. Stafford MA, Corey L, Cao Y, Daar ES, Ho DD, Perelson AS: Modeling plasma virus concentration during primary HIV infection.
J Theor Biol 2000, 203:285-301.

42. Mattapallil JJ, Douek DC, Hill B, Nishimura Y, Martin M, Roederer M: Massive infection and loss of memory CD4+ T cells in multiple tissues during acute SIV infection.
Nature 2005, 434:1093-1097.

43. Nowak MA, Lloyd AL, Vasquez GM, Wiltrout TA, Wahl LM, Bischofberger N, Williams J, Kinter A, Fauci AS, Hirsch VM, Lifson JD: Viral dynamics of primary viremia and antiretroviral therapy in simian immunodeficiency virus infection.
J Virol 1997, 71:7518-7525.

44. Perelson AS, Neumann AU, Markowitz M, Leonard JM, Ho DD: HIV-1 dynamics in vivo: virion clearance rate, infected cell lifespan, and viral generation time.
Science 1996, 271:1582-1586.

45. Markowitz M, Louie M, Hurley A, Sun E, Di Mascio M, Perelson AS, Ho DD: A novel antiviral intervention results in more accurate assessment of human immunodeficiency virus type 1 replication dynamics and T-cell decay in vivo.
J Virol 2003, 77:5037-5038.

46. Mansky LM, Temin HM: Lower in vivo mutation rate of human immunodeficiency virus type 1 than that predicted from the fidelity of purified reverse transcriptase.
J Virol 1995, 69:5087-5094.