Data Mining Builds Process Understanding
for Vaccine Manufacturing
WCBP 2009
Current Topics in Vaccine Development, January 14, 2009
Julia O’Neill, Principal Engineer
Merck develops and applies the most powerful data mining techniques to untangle the complexities of manufacturing biologic products.
[Figure: Bulk Potency by Lot Sequence]
An Example – Manufacturing History of a Vaccine Bulk
The inherent variability of biologics manufacturing presents challenges to developing process understanding.
“Traditional” Approach to Building Process Understanding:
Examine One Change at a Time
1. Identify potency shifts.
2. Identify process changes.
3. Match timing of shifts to changes.
[Figure, two panels: Bulk Potency by Lot Sequence, marked 1 to 6]
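Step 3 of this one-change-at-a-time approach can be sketched in a few lines. This is a hypothetical illustration, not the procedure used at Merck: it assumes shifts and changes have already been located as positions in the lot sequence, and it pairs each shift with the most recent change no more than `max_lag` lots earlier. The function name, inputs, and lag window are all assumptions.

```python
def match_shifts_to_changes(shift_lots, change_lots, max_lag):
    """Pair each potency shift with the latest process change that took
    effect no more than max_lag lots before it (None if there is none)."""
    matches = {}
    for s in shift_lots:
        candidates = [c for c in change_lots if 0 <= s - c <= max_lag]
        matches[s] = max(candidates) if candidates else None
    return matches

# Hypothetical lot-sequence positions, not data from the talk.
print(match_shifts_to_changes([12, 40, 75], [10, 38, 50], max_lag=5))
# {12: 10, 40: 38, 75: None}
```

The unmatched shift at position 75 shows the weakness of the approach: a shift with no single obvious cause, or with several overlapping causes, is left unexplained.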
Example Vaccine Manufacturing Process
[Figure: Simplified schematic of a viral vaccine manufacturing process]
• Cell Growth: ~3 weeks
• Cell Growth and Virus Propagation (Bioreactors): ~4 weeks
• Purification, Inactivation, etc. (Downstream): ~2 weeks
• Assay to determine bulk potency
• Dilution to appropriate strength in vials
Biologics mantra: “the product is the process.” *
[Figure: process schematic (Bioreactors, Downstream) with the following annotations]
• Cell Bank lot exhausted; new lot introduced.
• Virus Stock Seed lot exhausted; new lot introduced.
• Raw material preparation.
• Chromatography resin lots exhausted and replaced.
• Improved assay implemented.
* Building on Steven Kozlowski’s Monday talk.
New Approach to Building Process Understanding:
Apply Multivariate Data Mining
X’s → Y = Potency
Investment in creation of electronic database: 900+ X variables
• Raw material lots
• Bioreactor monitored variables
• Time to conduct process steps
• Known changes
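One way to picture the electronic database is as a lot-by-variable table. The sketch below is a minimal, hypothetical illustration (field names such as `media_lot` and `day4_glucose` are invented): categorical inputs such as raw material lot IDs become 0/1 indicator columns, numeric inputs are copied as floats, and the result is a table that tree-based and regression methods can both use.

```python
def build_design_matrix(records, categorical, numeric):
    """Turn per-lot records (dicts) into column names and numeric rows.

    Each categorical field (e.g., a raw material lot ID) becomes one 0/1
    indicator column per observed level; numeric fields are copied as floats.
    """
    levels = {f: sorted({r[f] for r in records}) for f in categorical}
    columns = list(numeric) + [f"{f}={lvl}" for f in categorical for lvl in levels[f]]
    rows = []
    for r in records:
        row = [float(r[f]) for f in numeric]
        for f in categorical:
            row += [1.0 if r[f] == lvl else 0.0 for lvl in levels[f]]
        rows.append(row)
    return columns, rows

# Hypothetical bulk lots, each with one raw material lot and one monitored value.
lots = [
    {"media_lot": "M1", "day4_glucose": 80},
    {"media_lot": "M2", "day4_glucose": 65},
    {"media_lot": "M1", "day4_glucose": 78},
]
columns, rows = build_design_matrix(lots, categorical=["media_lot"], numeric=["day4_glucose"])
print(columns)
print(rows)
```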
Tree-Based Predictors
X’s → Y = Potency
• Raw material lots
• Bioreactor monitored variables
• Time to conduct process steps
• Known changes
• etc. (900+ X variables)
[Figure: example tree separating lots a, b, c from lots d, e, f by potency]
Tree is grown by sequentially splitting.
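The splitting step can be illustrated with a regression-tree split search, assuming squared error as the splitting criterion (a common choice, as in CART; the talk does not state the criterion used). For one input variable, the sketch tries every threshold between adjacent distinct values and keeps the one that minimizes the total within-group sum of squared errors in potency. Repeating the search within each resulting group, over all candidate variables, grows the tree.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, total_sse) for the split x <= threshold that best
    separates y; threshold is None if no split beats leaving y unsplit."""
    pairs = sorted(zip(x, y))
    best = (None, sse(list(y)))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

x = [1, 2, 3, 10, 11, 12]           # hypothetical input values
y = [5.0, 5.2, 4.8, 7.0, 7.1, 6.9]  # hypothetical potencies
print(best_split(x, y))
```

With these invented values the search splits at 6.5, separating the low-potency lots from the high-potency ones.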
Acknowledgements
• Collaboration across many functional areas within Merck:
– Applied Computer Science & Mathematics
– Bioprocess & Bioanalytical Research & Development
– Fermentation & Cell Culture
– Global Vaccine Technology & Engineering
– Merck Lean Six Sigma
– Process Analytical Technology
– Regulatory & Analytical Sciences
– Vaccine Manufacturing Operations
• External statistical consultant:
Random Forests
• A collection of trees with controlled variations.
• Trees “vote” for the best predictors.
• Advantages:
– Consistently matches or outperforms the accuracy of other data mining methods.
– Handles a large number of inputs; resistant to over-fitting.
– Robust to outliers.
– Very fast.
– Not derailed by correlated (confounded) inputs.
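A toy version of the idea is sketched below in plain Python. It is a simplification, not a production random forest: each tree is a single split (a stump), each tree is fit to a bootstrap sample of lots using a random subset of `mtry` inputs, and the forest prediction is the average of the trees (for a numeric response such as potency, the trees' "votes" are averaged). All names, data, and parameter values are illustrative assumptions.

```python
import random

def fit_stump(X, y, features):
    """Fit a one-split regression tree, searching only the given input columns.

    Returns (feature, threshold, left_mean, right_mean); feature is None
    when no split is possible (e.g., all sampled values are identical).
    """
    mean_all = sum(y) / len(y)
    best_err, best = float("inf"), (None, None, mean_all, mean_all)
    for j in features:
        pairs = sorted(zip((row[j] for row in X), y))
        for i in range(1, len(pairs)):
            if pairs[i - 1][0] == pairs[i][0]:
                continue  # no threshold separates equal values
            left = [v for _, v in pairs[:i]]
            right = [v for _, v in pairs[i:]]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if err < best_err:
                threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
                best_err, best = err, (j, threshold, lm, rm)
    return best

def predict_stump(stump, row):
    j, threshold, lm, rm = stump
    if j is None:
        return lm  # unsplit tree predicts its bootstrap mean
    return lm if row[j] <= threshold else rm

def fit_forest(X, y, n_trees=50, mtry=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample of rows (lots)
    and a random subset of mtry input columns."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # sample lots with replacement
        cols = rng.sample(range(p), mtry)            # random subset of inputs
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], cols))
    return forest

def predict_forest(forest, row):
    """Average the trees' predictions (regression version of 'voting')."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)

# Hypothetical data: column 0 drives the response, column 1 is noise.
X = [[x0, x1] for x0, x1 in zip(range(10), [3, 1, 4, 1, 5, 9, 2, 6, 5, 3])]
y = [5.0] * 5 + [10.0] * 5
forest = fit_forest(X, y)
print(predict_forest(forest, [9, 4]), predict_forest(forest, [0, 4]))
```

With stumps, drawing the random input subset once per tree is equivalent to drawing it at each split, as Breiman's algorithm does; deeper trees would redraw it at every node. For real work, an established library implementation is the safer choice.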
Variable Importance for Bulk Potency by Random Forests
BR process change 1
BR raw material change
BR Day 4 Glucose
DS raw material change 1
BR Day 1 DO
BR input 1
CE Split II variable 1
CE Split II variable 2
BR Day 3 pH
BR Day 2 Lactate
CE Split I variable 1
BR timing variable
BR GUR
BR raw material prep
CE Split III variable 3
CE Split I variable 1
CE Split III variable 4
CE Split III variable 5
CE Cell Bank lot change
CE Split II variable 4
CE Split III variable 1
BR Day 5 DO
CE Split II variable 5
BR Day 8 DO
BR variable 6
CE Split III input variable
BR Variable 7
BR Day 2 temperature
BR input variables
Important variables had been suspected in advance of the random forests analysis. Only 1 variable is Downstream (DS); all others are Bioreactor (BR) or Cell Expansion (CE).
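Variable importance can be computed in several ways. A common, model-agnostic one is permutation importance, sketched below as an assumption (the talk does not say which importance measure was used): shuffle one input column, re-predict, and record how much the mean squared error increases. Breiman's original random forest measure applies this idea to out-of-bag lots; for brevity this sketch simply uses the data passed in, and the model and data are hypothetical.

```python
import random

def mse(predict, X, y):
    """Mean squared error of a prediction function over rows X."""
    return sum((predict(row) - target) ** 2 for row, target in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each input column is shuffled.

    A larger increase means the model leans more heavily on that input.
    """
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between input j and potency
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(predict, X_perm, y) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores

# Hypothetical model that uses only column 0.
X = [[i, i % 3] for i in range(10)]
y = [2.0 * i for i in range(10)]
scores = permutation_importance(lambda row: 2.0 * row[0], X, y)
print(scores)  # column 1 scores exactly 0 because the model ignores it
```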
Simple Regression Model
Predictions based on the 1st, 2nd, and 4th variables on the list
Although a large percentage of the variation is explained overall, the predictions are not satisfactory for recent production.
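The gap between a good overall fit and poor recent predictions can be made concrete with two summary statistics. In the sketch below the potency values are invented for illustration: the model tracks the early lots exactly but over-predicts the last two, yet R² over all lots still exceeds 0.98.

```python
def r_squared(y, yhat):
    """Fraction of the variance in y explained by predictions yhat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def mean_abs_error(y, yhat):
    """Average absolute prediction error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

# Hypothetical potencies (arbitrary units): the model drifts on the last two lots.
y = [10, 20, 30, 40, 50, 60, 70, 80]
yhat = [10, 20, 30, 40, 50, 60, 76, 86]
print(round(r_squared(y, yhat), 3))       # 0.983
print(mean_abs_error(y[-2:], yhat[-2:]))  # 6.0
```

Checking the error over the most recent window, not just R² over the full history, exposes this kind of drift.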
Raw Material Lot Change Timing
[Figure: timeline in weeks of bulk product lots passing through Growth, Propagation, and Purification, with two introductions of a new raw material lot marked]
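The timing problem can be sketched using the stage durations from the process schematic (about 3, 4, and 2 weeks). The sketch rests on hypothetical assumptions: each bulk lot starts its growth stage in a given week, the stage offsets are exactly 0, 3, and 7 weeks, and a stage consumes whichever raw material lot is in use in the week the bulk lot enters that stage. Under these assumptions a raw material change reaches the bulk lot sequence only after a lag, and lots already past that stage are unaffected.

```python
# Assumed stage offsets (weeks after a bulk lot starts), taken from the
# approximate durations on the schematic: growth ~3, propagation ~4,
# purification ~2 weeks. Exact values are hypothetical.
STAGE_START = {"growth": 0, "propagation": 3, "purification": 7}
TOTAL_WEEKS = 9  # start of growth to end of purification

def raw_material_lot_used(start_week, stage, change_weeks):
    """Return which raw material lot (0 = original, 1 = after first change, ...)
    a bulk lot uses, assuming the stage draws on whatever lot is in use
    in the week the bulk lot enters that stage."""
    entry_week = start_week + STAGE_START[stage]
    return sum(1 for c in change_weeks if c <= entry_week)

changes = [10, 20]  # hypothetical weeks when new raw material lots arrive
for start in (6, 7, 17):
    used = raw_material_lot_used(start, "propagation", changes)
    print(f"lot started week {start}: raw material lot {used}, "
          f"completed week {start + TOTAL_WEEKS}")
```

With a change in week 10, the lot started in week 6 entered propagation in week 9 and still used the old raw material lot, while the lot started in week 7 used the new one and is not completed until week 16.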
Bioreactor: Subtle Shifts in Glucose
[Figure, three panels plotted against Lot # (lots 264 to 539): BR Day 3 Glucose (mg/dL), BR Day 4 Glucose (mg/dL), BR Day 6 Glucose (mg/dL)]
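Shifts of a few mg/dL are hard to see lot by lot. A standard tool for detecting small sustained shifts is the two-sided tabular CUSUM, sketched below under assumed settings: the target, allowance `k`, decision limit `h`, and the glucose series are hypothetical, and the talk does not say how the glucose shifts were found.

```python
def cusum(values, target, k, h):
    """Two-sided tabular CUSUM.

    Accumulates deviations beyond an allowance k above (hi) and below (lo)
    the target; returns the indices where either sum exceeds the decision
    limit h, restarting both sums after each signal.
    """
    hi = lo = 0.0
    alarms = []
    for i, x in enumerate(values):
        hi = max(0.0, hi + (x - target - k))
        lo = max(0.0, lo + (target - k - x))
        if hi > h or lo > h:
            alarms.append(i)
            hi = lo = 0.0
    return alarms

# Hypothetical Day 4 glucose values (mg/dL) drifting slightly low.
glucose = [80, 81, 79, 80, 76, 75, 76, 75, 76]
print(cusum(glucose, target=80, k=2, h=10))  # [8]
```

Every value in the example stays within 5 mg/dL of target, but the accumulated drift triggers a signal at the ninth lot (index 8).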
Partial Least Squares model improves predictions
[Figure, two panels]
• Predictions based on the 1st, 2nd, and 4th suspect variables alone.
• Partial Least Squares predictions incorporating all bioreactor monitored variables.
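Partial least squares handles many correlated inputs by projecting them onto a few latent components chosen to covary with the response. A minimal PLS1 (single-response) fit using the NIPALS algorithm is sketched below in pure Python; the talk does not specify the PLS variant, scaling, or number of components, so those choices and the data are assumptions. Columns are centered but not scaled here; monitored variables on different scales would usually be standardized first.

```python
def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS on centered data; returns means and per-component
    (weights, loadings, y-loading) needed for prediction."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X, deflated below
    f = [v - y_mean for v in y]                                # centered y, deflated below
    comps = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0:
            break  # X carries no remaining information about y
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                                # deflate y
        comps.append((w, load, q))
    return x_mean, y_mean, comps

def pls1_predict(model, row):
    """Predict by computing each component score on the deflated input."""
    x_mean, y_mean, comps = model
    e = [v - m for v, m in zip(row, x_mean)]
    yhat = y_mean
    for w, load, q in comps:
        t = sum(a * b for a, b in zip(e, w))
        yhat += q * t
        e = [a - t * b for a, b in zip(e, load)]
    return yhat

# Hypothetical data: y = 3*x0 - 2*x1 + 1 exactly.
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 6]]
y = [0, 5, 0, 7, 4]
model = pls1_fit(X, y, n_components=2)
print(round(pls1_predict(model, [6, 2]), 6))  # 15.0
```

With as many components as independent columns, PLS reproduces ordinary least squares, as in this example; its advantage with hundreds of correlated bioreactor variables comes from stopping at a few components. scikit-learn's `PLSRegression` (in `sklearn.cross_decomposition`) is an established implementation.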
Causes of Bulk Potency Changes
[Figure: process schematic (Bioreactors, Downstream)]
• Higher output from bioreactors due to known raw material and process changes.
• Yield shifts related to variation across raw material lots.
• Contributing factor: bioreactor performance cycling. Newly discovered pre-existing variability (Kozlowski).
Results
Merck develops and applies the most powerful data mining techniques to untangle the complexities of manufacturing biologic products.
Additional benefits:
• Ability to predict potency before assay results are available.
– Monitor against a forecast potency.
– Builds our understanding of the biology.
• Basis for revising CPPs (critical process parameters).
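Monitoring against a forecast can be as simple as comparing each new assay result with the model's prediction for that lot. The sketch below is a hypothetical illustration: it flags lots whose assayed potency differs from the forecast by more than `z` standard deviations of past forecast errors. The 3-SD limit, the function name, and all values are assumptions, not figures from the talk.

```python
import statistics

def flag_lots(forecasts, assays, past_errors, z=3.0):
    """Indices of lots whose assayed potency misses the forecast by more
    than z standard deviations of past forecast errors."""
    limit = z * statistics.stdev(past_errors)
    return [i for i, (f, a) in enumerate(zip(forecasts, assays)) if abs(a - f) > limit]

# Hypothetical values: forecasts made before assay results were available.
print(flag_lots([100, 102, 98], [101, 97, 99], past_errors=[-1, 1, -1, 1]))  # [1]
```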