Hierarchical Bayesian Modeling of the HIV Response to Therapy
Shane T. Jensen
Department of Statistics, The Wharton School, University of Pennsylvania
March 23, 2010
Therapy: Disrupting the HIV infection cycle
Drugs are a popular medical strategy for keeping viral load down by disrupting the infection cycle of HIV
Drug therapies: drugs designed to bind to the surface of HIV and prevent it from attaching to target cells
We will model a promising new type of treatment:
Antisense gene therapy: allows HIV to bind to the target cell and release its viral RNA, but then attacks the viral RNA directly before it can be integrated into the target cell genome
HIV will try to evolve (change either protein or RNA) to escape this therapy
Drug Therapy versus Gene Therapy Illustration
[Figure: drug therapy vs. gene therapy, with panels "HIV Drug Therapy" and "HIV Gene Therapy"; labels: HIV virion, nucleus, cytosol, HIV viral RNA]
Mutation and Recombination
Primary mechanisms for evolution:
Mutation: change in the identity of a single RNA nucleotide.
Mutations can also delete nucleotides.
Recombination: two viral RNA sequences are spliced to produce a hybrid sequence.
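As a toy illustration only, the two mechanisms can be written as string operations on aligned sequences. The function names below are hypothetical. Recombination is simplified to a single crossover at an assumed breakpoint, whereas real HIV recombination may involve several template switches.

```python
def point_mutation(seq: str, pos: int, new_base: str) -> str:
    """Change the identity of the nucleotide at 0-based position `pos`."""
    return seq[:pos] + new_base + seq[pos + 1:]


def deletion(seq: str, pos: int, length: int = 1) -> str:
    """Delete `length` nucleotides starting at 0-based position `pos`."""
    return seq[:pos] + seq[pos + length:]


def recombine(parent_a: str, parent_b: str, breakpoint: int) -> str:
    """Single-crossover recombination: prefix of A spliced onto suffix of B."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    return parent_a[:breakpoint] + parent_b[breakpoint:]


print(point_mutation("ACGT", 1, "G"))        # AGGT
print(deletion("ACGT", 1))                   # AGT
print(recombine("AAAAAA", "CCCCCC", 2))      # AACCCC
```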
HIV has one of the highest mutation and recombination rates of any known organism
Population Genetics
Due to high mutation and recombination rates, an individual infected with HIV can carry several distinct HIV strains
We need to model HIV as a population of sequences, not a single sequence, that evolves in response to therapy
Issues with Evolutionary Response
Goal is to model the evolutionary response (through mutation and recombination) of HIV to therapy
Three crucial components of problem must be addressed:
Mutation vs. recombination
Rates for both processes of sequence change must be modeled simultaneously
Spatial heterogeneity
Therapies target specific regions of the HIV genome, so evolutionary change may also be concentrated in specific locations
Two-sample comparison
The real interest is in differences in mutation and recombination rates between treatment and control sequences
Overview of our Approach
The coalescent with recombination: a population genetics model we build upon to model mutation and recombination rates for a population of sequences
We expand previous coalescent-based approaches to allow changes at the nucleotide level (instead of the protein level)
Blocking structure for mutation and recombination rates
Allows for spatial heterogeneity while still sharing information between neighboring sequence regions
Hierarchical prior distribution handles the two-sample structure
Allows for differential treatment effect while pooling information between treatment and control sequences
Notation
Data are aligned nucleotide sequences $H = (H^C, H^T)$
$H^C = (h^C_1, \ldots, h^C_n)$ are control sequences of length $L$
$H^T = (h^T_1, \ldots, h^T_m)$ are treatment sequences of length $L$
Parameters of interest are $\Theta = (\rho^C, \rho^T, \mu^C, \mu^T)$
$\rho$ are recombination rates (treatment and control)
$\mu$ are mutation rates (treatment and control)
All rates also vary spatially along the length of the sequences
Ancestry $G$ relates all sequences to each other
Coalescent model for sequence history G
Coalescent: sequences coalesce into common lineages back to their most recent common ancestor
Mutation rates $\mu$ are easy to build into the coalescent model $G$
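Here is a minimal simulation sketch of the standard (Kingman) coalescent without recombination. It rests on three assumptions:

- Time is measured in coalescent units.
- Mutations fall on branches as a Poisson process with rate $\theta/2$ per unit branch length.
- $\theta$ is a population-scaled mutation rate.

This is not the model fitted in the talk, which also includes recombination. The names `simulate_coalescent` and `leaves` are hypothetical.

```python
import numpy as np


def simulate_coalescent(n, theta, rng):
    """Simulate a Kingman coalescent genealogy for n sampled sequences.

    Returns (merge_times, root, n_mutations):
      merge_times  increasing times of the n-1 coalescence events
      root         the genealogy as nested tuples of leaf labels 0..n-1
      n_mutations  Poisson number of mutations on the whole tree
    """
    lineages = list(range(n))      # active lineages, initially the sampled leaves
    t, total_length = 0.0, 0.0
    merge_times = []
    while len(lineages) > 1:
        k = len(lineages)
        rate = k * (k - 1) / 2     # each of the C(k, 2) pairs coalesces at rate 1
        wait = rng.exponential(1.0 / rate)
        total_length += k * wait   # all k lineages extend during the wait
        t += wait
        i, j = sorted(int(x) for x in rng.choice(k, size=2, replace=False))
        merged = (lineages[i], lineages[j])
        del lineages[j]            # remove the larger index first
        del lineages[i]
        lineages.append(merged)
        merge_times.append(t)
    # mutations fall on branches at rate theta / 2 per unit of branch length
    n_mutations = int(rng.poisson(theta / 2 * total_length))
    return merge_times, lineages[0], n_mutations


def leaves(tree):
    """Flatten a nested-tuple genealogy into its leaf labels."""
    if isinstance(tree, int):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


rng = np.random.default_rng(0)
times, root, s = simulate_coalescent(5, theta=2.0, rng=rng)
print(sorted(leaves(root)))  # [0, 1, 2, 3, 4]
```

Under these assumptions, the average number of mutations over many simulated trees should approach $\theta \sum_{i=1}^{n-1} 1/i$.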
Estimation with Coalescent
Sequence ancestry $G$ is not of direct interest: the goal is the mutation and recombination rates $\Theta$
Maximum likelihood estimation
$$\sup_{\Theta, G} \, p(H \mid \Theta, G)$$
or integration over all possible ancestries $G$
$$p(H \mid \Theta) = \sum_{G} p(H \mid \Theta, G)\, p(G)$$
are both very difficult tasks given the large space of $G$
Even for a relatively small number of sequences, such as 100, the space of possible $G$ is huge.
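For a sense of scale, the sketch below counts only the ranked tree topologies of a coalescent genealogy. It ignores branch lengths and also ignores recombination, which enlarges the space further. With $k$ lineages, any of $\binom{k}{2}$ pairs can merge, so the count is $\prod_{k=2}^{n}\binom{k}{2} = n!\,(n-1)!/2^{n-1}$. The helper name is hypothetical.

```python
from math import comb, factorial


def ranked_histories(n: int) -> int:
    """Number of ranked, labelled coalescent tree topologies for n sequences."""
    count = 1
    for k in range(2, n + 1):
        count *= comb(k, 2)   # any of the C(k, 2) pairs may merge next
    return count


n = 100
# the product agrees with the closed form n! (n-1)! / 2^(n-1)
assert ranked_histories(n) == factorial(n) * factorial(n - 1) // 2 ** (n - 1)
print(len(str(ranked_histories(n))))  # 285 decimal digits
```

So 100 sequences already admit on the order of $10^{284}$ ranked topologies, before integrating over coalescence times.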
Product of Approximate Conditionals
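The marginal likelihood $p(H \mid \Theta)$ is intractable because it requires summing over all $G$
PAC (product of approximate conditionals) likelihood:
$$p(H \mid \Theta) = p(h_1 \mid \Theta)\, p(h_2 \mid h_1, \Theta)\, p(h_3 \mid h_2, h_1, \Theta) \cdots \approx \hat p(h_1 \mid \Theta)\, \hat p(h_2 \mid h_1, \Theta)\, \hat p(h_3 \mid h_2, h_1, \Theta) \cdots$$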
Approximate sequence $h_{k+1}$ as a mosaic of sequences $(h_1, \ldots, h_k)$ generated by a hidden Markov model
$\hat p(h_{k+1} \mid h_{1:k}, \Theta)$ is calculated using the forward summing algorithm for HMMs. The approximation depends on the ordering of the sequences, so the calculation is averaged over several different orderings.
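Below is a minimal sketch of the conditional $\hat p(h_{k+1} \mid h_{1:k}, \Theta)$. It uses a Li and Stephens style copying HMM evaluated with the forward algorithm, then averages a PAC likelihood over random orderings.

The following are simplifying assumptions, not the talk's model:

- A single scalar $\mu$ and $\rho$ instead of block-specific rates.
- Uniform substitution, with no transition/transversion ratio $\kappa$.
- Uniform bases for the first sequence.
- Averaging on the likelihood scale rather than the log scale.

The function names are hypothetical.

```python
import numpy as np


def conditional_loglik(new, refs, rho, mu):
    """log p-hat(new | refs): `new` modeled as a mosaic of the k sequences in `refs`.

    Hidden state at each site = index of the reference being copied (assumed form).
    Transition between adjacent sites: with probability r = 1 - exp(-rho / k) a
    recombination occurs and the copied reference is redrawn uniformly from all k.
    Emission: the copied base is reproduced with probability 1 - mu, otherwise one
    of the 3 other bases appears, each with probability mu / 3.
    """
    k = len(refs)
    r = 1.0 - np.exp(-rho / k)

    def emission(site):
        match = np.array([ref[site] == new[site] for ref in refs])
        return np.where(match, 1.0 - mu, mu / 3.0)

    loglik = 0.0
    alpha = np.full(k, 1.0 / k)          # uniform initial choice of reference
    for site in range(len(new)):
        if site > 0:
            # stay with prob 1 - r, or jump uniformly with prob r (alpha sums to 1)
            alpha = (1.0 - r) * alpha + r / k
        alpha = alpha * emission(site)
        scale = alpha.sum()              # rescale at each site to avoid underflow
        loglik += np.log(scale)
        alpha = alpha / scale
    return loglik


def pac_loglik(seqs, rho, mu, n_orderings=10, seed=None):
    """PAC log-likelihood averaged (on the likelihood scale) over random orderings."""
    rng = np.random.default_rng(seed)
    per_order = []
    for _ in range(n_orderings):
        ordered = [seqs[i] for i in rng.permutation(len(seqs))]
        ll = len(ordered[0]) * np.log(0.25)   # first sequence: uniform bases (assumption)
        for k in range(1, len(ordered)):
            ll += conditional_loglik(ordered[k], ordered[:k], rho, mu)
        per_order.append(ll)
    per_order = np.array(per_order)
    top = per_order.max()                     # log-mean-exp for numerical stability
    return top + np.log(np.mean(np.exp(per_order - top)))
```

In a blocked model, `rho` and `mu` would become site-specific arrays indexed inside the loop.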
Structure on Mutation and Recombination
The PAC likelihood lets us integrate out the ancestry $G$ more easily, so we can focus on modeling the parameters $\Theta$
$\Theta$ includes mutation rates $\mu$ and recombination rates $\rho$
We now need additional structure in our model to address:
Spatial heterogeneity: different rates for mutation and recombination in different sequence regions
Two-sample comparison: we want to estimate differential rates between treatment and control populations
This should allow us to estimate the differential evolutionary response to therapy
Hierarchical Blocking Structure
Hierarchical prior on mutation $\mu$ and recombination $\rho$
Rates vary along the sequence in a piecewise-constant way, e.g. $B_\mu$ contiguous blocks $(\mu_1, \ldots, \mu_{B_\mu})$ of mutation rates
Blocking Structure Example
Grand central mutation and recombination rates (gray)
Central mutation/recombination rate for each block (blue)
Treatment and control rates around the central rate (black)
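The sketch below draws site-level rates from the blocking hierarchy. It assumes normal distributions on the log scale, which is our assumption and is not stated in the slides:

- Block central log-rates scatter around the grand central log-rate with variance $\sigma^2_{\mu_0}$.
- Treatment and control log-rates scatter around their block's central log-rate with variance $\sigma^2_\mu$.

Block boundaries are taken as given. The function name and arguments are hypothetical.

```python
import numpy as np


def sample_blocked_rates(L, boundaries, log_mu0, sd_mu0, sd_mu, rng):
    """Draw piecewise-constant treatment, control, and central rates for L sites.

    boundaries: sorted interior change points; block j covers sites
    edges[j] .. edges[j+1]-1, where edges = [0, *boundaries, L].
    """
    edges = np.array([0, *boundaries, L])
    n_blocks = len(edges) - 1
    central = rng.normal(log_mu0, sd_mu0, size=n_blocks)  # block central log-rates
    log_treat = rng.normal(central, sd_mu)                # treatment around central
    log_ctrl = rng.normal(central, sd_mu)                 # control around central
    site_block = np.repeat(np.arange(n_blocks), np.diff(edges))  # site -> block
    return (np.exp(log_treat)[site_block],
            np.exp(log_ctrl)[site_block],
            np.exp(central)[site_block])


rng = np.random.default_rng(0)
mu_T, mu_C, mu_central = sample_blocked_rates(100, [30, 70], np.log(1e-3), 0.5, 0.2, rng)
```

Sharing a central rate lets treatment and control estimates pool information within a block. The grand central rate pools information across blocks.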
Model Implementation
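The PAC likelihood $\hat p(H \mid \Theta)$ gives the probability of the sequences $H$ as a function of the mutation and recombination rates $\Theta$
The hierarchical prior distribution $p(\Theta)$ handles spatial heterogeneity and the two-sample comparison
Focus on the posterior distribution for inference:
$$p(\Theta \mid H) \propto \hat p(H \mid \Theta)\, p(\Theta)$$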
MCMC implementation: Gibbs and Metropolis-Hastings moves for most parameters as well as reversible jump moves for the blocking structure
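Here is a generic random-walk Metropolis-Hastings update for a positive parameter such as the transition/transversion ratio $\kappa$, proposing on the log scale. The argument `log_target` stands in for $\log \hat p(H \mid \Theta) + \log p(\Theta)$ with all other parameters held fixed. The toy Gamma target, the step size, and the function name are hypothetical choices for illustration. Reversible jump moves are not shown.

```python
import numpy as np


def mh_log_scale(log_target, x0, n_iter, step, rng):
    """Random-walk MH for a positive parameter, proposing x' = x * exp(step * z)."""
    x, lp = x0, log_target(x0)
    draws = np.empty(n_iter)
    for t in range(n_iter):
        x_new = x * np.exp(step * rng.normal())
        lp_new = log_target(x_new)
        # the log-scale proposal is asymmetric in x: add log(x_new / x) (Jacobian)
        log_accept = lp_new - lp + np.log(x_new) - np.log(x)
        if np.log(rng.uniform()) < log_accept:
            x, lp = x_new, lp_new
        draws[t] = x
    return draws


rng = np.random.default_rng(1)
# toy target: unnormalized Gamma(shape=3, rate=1) density, standing in for kappa's posterior
draws = mh_log_scale(lambda k: 2 * np.log(k) - k, x0=1.0, n_iter=20000, step=0.5, rng=rng)
print(draws[1000:].mean())  # should be close to the Gamma mean of 3
```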
MCMC moves
1. Reversible jump moves for the blocking structure:
   1. Choose a block uniformly to split, or to merge with a neighbor
   2. Move a block boundary to the left or right
2. Gibbs moves for rate parameters:
   1. Sample treatment and control rates $(\mu^T_j, \mu^C_j)$ for each block
   2. Sample central mutation rate $\mu_j$ for each block
   3. Sample grand central mutation rate $\mu_0$
   4. Sample variance of treatment and control mutation rates $\sigma^2_\mu$
   5. Sample variance of central mutation rates $\sigma^2_{\mu_0}$
3. Same set of blocking and rate moves for recombination
4. MH move for transition/transversion ratio $\kappa$
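This is a minimal sketch of two of the Gibbs moves. It assumes normal relationships on the log-rate scale, which the slides do not confirm:

- $\mu^T_j, \mu^C_j \sim N(\mu_j, \sigma^2_\mu)$
- $\mu_j \sim N(\mu_0, \sigma^2_{\mu_0})$
- a flat prior on $\mu_0$

Under these assumptions, the full conditional of $\mu_j$ is normal with precision $1/\sigma^2_{\mu_0} + 2/\sigma^2_\mu$ and a precision-weighted mean. The full conditional of $\mu_0$ is $N(\bar\mu, \sigma^2_{\mu_0}/B)$, where $\bar\mu$ is the mean of the $B$ block central rates. The function names are hypothetical.

```python
import numpy as np


def gibbs_central_rates(mu_T, mu_C, mu0, var_mu, var_mu0, rng):
    """Draw every block's central rate mu_j from its normal full conditional."""
    precision = 1.0 / var_mu0 + 2.0 / var_mu          # prior + two observations
    mean = (mu0 / var_mu0 + (mu_T + mu_C) / var_mu) / precision
    return rng.normal(mean, np.sqrt(1.0 / precision))


def gibbs_grand_central(mu_central, var_mu0, rng):
    """Draw the grand central rate mu0 given the block central rates (flat prior)."""
    B = len(mu_central)
    return rng.normal(np.mean(mu_central), np.sqrt(var_mu0 / B))
```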
Application to Antisense Gene Therapy
VIRxSYS gene therapy: data generated in vitro from a sample of wild-type HIV (wt-HIV) exposed to VIRxSYS gene therapy and from a control sample
Focus on treatment effects: differential mutation rate $\mu^T - \mu^C$ and recombination rate $\rho^T - \rho^C$ for each location along the sequence
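Given MCMC output, the per-location treatment effect can be summarized directly from the draws. This minimal sketch assumes site-level rate draws are stored as arrays of shape (draws, sites), taken from the same iterations. It reads "significant" as a 95% equal-tailed posterior interval that excludes zero; that is our interpretation, not a definition from the talk. The function name is hypothetical, and the same code applies to $\rho^T - \rho^C$.

```python
import numpy as np


def treatment_effect_summary(draws_T, draws_C, level=0.95):
    """Per-site posterior mean and equal-tailed interval of draws_T - draws_C."""
    diff = draws_T - draws_C          # paired draw by draw, shape (n_draws, L)
    lo, hi = np.quantile(diff, [(1 - level) / 2, (1 + level) / 2], axis=0)
    mean = diff.mean(axis=0)
    excludes_zero = (lo > 0) | (hi < 0)
    return mean, lo, hi, excludes_zero
```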
Mutation Treatment Effect of Antisense Gene Therapy
The large increase in mutation overlaps with the antisense target region.
Increases in mutation to the left of antisense target region are consistent with other gene therapy studies.
Recombination Treatment Effect of Antisense Gene Therapy
The area of decreased recombination corresponds to the area of increased mutation, but the decrease is not significant.
Posterior intervals are wide, in part because recombination does not seem to have a strong spatial signal
Simpler Approaches for Mutation
The simplest approach would be to examine mutation directly through segregating sites
Segregating sites are nucleotide locations where at least one sequence in the sample differs from the others.
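Counting segregating sites needs no model at all. This short sketch assumes the sequences are already aligned to a common length and treats a gap character as just another symbol. The function name is hypothetical.

```python
def segregating_sites(alignment):
    """0-based positions where at least one aligned sequence differs from the others."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to a common length")
    return [pos for pos in range(length)
            if len({seq[pos] for seq in alignment}) > 1]


print(segregating_sites(["ACGT", "ACGA", "TCGT"]))  # [0, 3]
```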
[Figure: example alignment of five sequences illustrating segregating sites]
Comparison to Segregating Sites
Compare segregating sites (blue = treatment, red = control) to the posterior differential mutation rate $\mu^T - \mu^C$
Segregating sites are denser around the area of elevated mutation, but our model also shares information between closely located sites
Summary
Our model allows us to measure viral evolutionary change through spatially varying recombination and mutation rates at the nucleotide level.
Our model measures differences in mutation and recombination between treatment and control groups, allowing estimation of spatially varying treatment effects.
Our methodology is able to detect biologically relevant signals in two HIV applications:
Identified drug-resistant mutations in Enfuvirtide drug therapy
Detected elevated mutation rates that overlap with the antisense target in VIRxSYS gene therapy