Data Mining Builds Process Understanding for Vaccine Manufacturing

(1)


WCBP 2009

Current Topics in Vaccine Development, January 14, 2009

Julia O’Neill, Principal Engineer

(2)


Merck develops and applies the most powerful data mining techniques to untangle the complexities of manufacturing biologic products.

(3)

An Example: Manufacturing History of a Vaccine Bulk

[Figure: Bulk Potency by Lot Sequence]

The inherent variability of biologics manufacturing presents challenges to developing process understanding.

(4)


“Traditional” Approach to Building Process Understanding: Examine One Change at a Time

1. Identify potency shifts.
2. Identify process changes.
3. Match timing of shifts to changes.

[Figures: Bulk Potency by Lot Sequence (two panels), with shifts annotated 1 to 6]
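As a minimal sketch of steps 1 and 3, assuming potency values ordered by lot sequence: the code locates the single split point with the largest difference in mean potency, then lists any known process changes within a few lots of it. The function names (`find_mean_shift`, `match_to_changes`), the minimum segment length, the matching window, and the data are all hypothetical; the slides do not describe how shifts were identified.

```python
from statistics import mean

def find_mean_shift(values, min_seg=3):
    """Step 1: return (index, size) of the split that maximizes the absolute
    difference between the mean before and the mean after the split."""
    best_idx, best_gap = None, 0.0
    for i in range(min_seg, len(values) - min_seg + 1):
        gap = abs(mean(values[i:]) - mean(values[:i]))
        if gap > best_gap:
            best_idx, best_gap = i, gap
    return best_idx, best_gap

def match_to_changes(shift_idx, changes, window=2):
    """Step 3: return known changes (lot index, description) within `window` lots of the shift."""
    return [(i, desc) for i, desc in changes if abs(i - shift_idx) <= window]

# Hypothetical potency by lot sequence and hypothetical change log (step 2).
potency = [10.1, 9.8, 10.0, 10.2, 9.9, 12.0, 12.3, 11.8, 12.1, 12.2]
changes = [(2, "improved assay"), (5, "new cell bank lot")]
idx, gap = find_mean_shift(potency)
print(idx, round(gap, 2), match_to_changes(idx, changes))
```

Matching one shift to one change at a time becomes ambiguous as soon as several changes fall inside the same window.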

(5)


Example Vaccine Manufacturing Process

• Cell growth: ~3 weeks
• Cell growth and virus propagation (bioreactors): ~4 weeks
• Purification, inactivation, etc. (downstream): ~2 weeks
• Assay to determine bulk potency
• Dilution to appropriate strength in vials

Simplified schematic of a viral vaccine manufacturing process.

(6)


Biologics mantra: “the product is the process.” *

[Figure: process schematic (Bioreactors, Downstream) annotated with sources of change:]

• Cell bank lot exhausted; new lot introduced.
• Virus stock seed lot exhausted; new lot introduced.
• Raw material preparation.
• Chromatography resin lots exhausted and replaced.
• Improved assay implemented.

* Building on Steven Kozlowski’s Monday talk.

(7)


New Approach to Building Process Understanding: Apply Multivariate Data Mining

Inputs (X’s) and response (Y = potency)

Investment in creation of an electronic database: 900+ X variables
• Raw material lots
• Bioreactor monitored variables
• Time to conduct process steps
• Known changes
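The structure of the database is not described. As a hypothetical sketch, the code assembles one row per bulk lot from separate sources, one-hot encoding the categorical raw material lot IDs and keeping numeric bioreactor variables as-is, so that the X's line up with Y = potency. All names (`build_design`, the variable names, the lot IDs) and values are invented for illustration.

```python
def build_design(lots, raw_material, bioreactor, potency):
    """Assemble one row per bulk lot that has an assay result.

    raw_material: lot -> raw material lot ID (categorical, one-hot encoded)
    bioreactor:   lot -> {variable name: numeric value}
    potency:      lot -> assayed bulk potency (the response Y)
    """
    rm_levels = sorted(set(raw_material.values()))
    br_vars = sorted({name for rec in bioreactor.values() for name in rec})
    header = [f"rm_lot={lv}" for lv in rm_levels] + br_vars
    X, y = [], []
    for lot in lots:
        if lot not in potency:
            continue  # no assay result yet, so the lot cannot be used for fitting
        row = [1.0 if raw_material.get(lot) == lv else 0.0 for lv in rm_levels]
        # Missing bioreactor readings are recorded as NaN rather than guessed.
        row += [bioreactor.get(lot, {}).get(v, float("nan")) for v in br_vars]
        X.append(row)
        y.append(potency[lot])
    return header, X, y

# Hypothetical records for three lots; lot "b" has no potency result yet.
lots = ["a", "b", "c"]
raw_material = {"a": "RM-1", "b": "RM-1", "c": "RM-2"}
bioreactor = {"a": {"day4_glucose": 80.0, "day3_pH": 7.1},
              "b": {"day4_glucose": 85.0},
              "c": {"day4_glucose": 90.0, "day3_pH": 7.0}}
potency = {"a": 10.0, "c": 12.0}
header, X, y = build_design(lots, raw_material, bioreactor, potency)
print(header)
```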

(8)


Tree-Based Predictors

X’s (900+ variables): raw material lots, bioreactor monitored variables, time to conduct process steps, known changes, etc.
Y = potency

[Figure: tree diagram in which a split separates Lots a, b, c from Lots d, e, f]

The tree is grown by sequentially splitting the lots into groups that differ in potency.
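A minimal sketch of growing a regression tree by sequential splitting, assuming numeric inputs and the common sum-of-squared-errors (SSE) split criterion; the slides do not say which tree software or criterion was used. The two inputs (a 0/1 indicator for a new raw material lot and a Day 4 glucose reading) and all values are hypothetical.

```python
def sse(ys):
    """Sum of squared deviations from the mean potency of a group of lots."""
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)

def best_split(X, y):
    """Search every variable j and threshold t for the split that minimizes
    the summed SSE of the two child groups."""
    best = None  # (SSE after split, variable index, threshold)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, j, t)
    return best

def grow_tree(X, y, depth=0, max_depth=3):
    """Split sequentially until max_depth, a node with fewer than 2 lots, or no
    SSE improvement. Internal nodes are (j, t, left, right); leaves are means."""
    if depth >= max_depth or len(y) < 2:
        return sum(y) / len(y)
    split = best_split(X, y)
    if split is None or split[0] >= sse(y):
        return sum(y) / len(y)
    _, j, t = split
    go_left = [row[j] <= t for row in X]
    left = grow_tree([r for r, g in zip(X, go_left) if g],
                     [v for v, g in zip(y, go_left) if g], depth + 1, max_depth)
    right = grow_tree([r for r, g in zip(X, go_left) if not g],
                      [v for v, g in zip(y, go_left) if not g], depth + 1, max_depth)
    return (j, t, left, right)

def predict(tree, row):
    """Follow the splits from the root down to a leaf mean."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

# Hypothetical lots: [new raw material lot (0/1), Day 4 glucose (mg/dL)] -> potency
X = [[0, 80], [0, 85], [0, 90], [1, 82], [1, 88], [1, 95]]
y = [10.0, 10.2, 9.9, 12.1, 12.0, 12.2]
tree = grow_tree(X, y)
print(tree[:2])  # root split: variable 0 at threshold 0 (old vs. new raw material lot)
```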

(9)


Acknowledgements

• Collaboration across many functional areas within Merck:
– Applied Computer Science & Mathematics
– Bioprocess & Bioanalytical Research & Development
– Fermentation & Cell Culture
– Global Vaccine Technology & Engineering
– Merck Lean Six Sigma
– Process Analytical Technology
– Regulatory & Analytical Sciences
– Vaccine Manufacturing Operations

• External statistical consultant:

(10)


Random Forests

• A collection of trees with controlled variations.

• Trees “vote” for the best predictors.

• Advantages:

– Consistently matches or outperforms the accuracy of other data mining methods.

– Handles a large number of inputs; resistant to over-fitting.

– Robust to outliers.

– Very fast.

– Not confounded by confounding.
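A minimal from-scratch sketch of the random forest idea, assuming regression trees grown on bootstrap samples of lots, with only `mtry` randomly chosen candidate variables at each split (the "controlled variations"). For a numeric response such as potency the trees' votes are averaged, and each variable's importance is taken here as its total SSE reduction across all splits; that is one of several importance measures, and the slides do not say which was used. The data are synthetic and hypothetical.

```python
import random

def sse(ys):
    """Sum of squared deviations from the mean (the node impurity)."""
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)

def grow(X, y, rng, mtry, importance, depth=0, max_depth=4):
    """Grow one regression tree. At each node only `mtry` randomly chosen
    variables compete for the split; each split's SSE reduction is credited
    to its variable in `importance`."""
    parent = sse(y)
    if depth >= max_depth or len(y) < 4 or parent == 0.0:
        return sum(y) / len(y)  # leaf: mean potency of the lots in the node
    best = None  # (SSE after split, variable, threshold)
    for j in rng.sample(range(len(X[0])), mtry):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, j, t)
    if best is None or best[0] >= parent:
        return sum(y) / len(y)
    total, j, t = best
    importance[j] += parent - total
    go_left = [row[j] <= t for row in X]
    kids = []
    for side in (True, False):
        Xs = [row for row, g in zip(X, go_left) if g == side]
        ys = [v for v, g in zip(y, go_left) if g == side]
        kids.append(grow(Xs, ys, rng, mtry, importance, depth + 1, max_depth))
    return (j, t, kids[0], kids[1])

def predict_tree(node, row):
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def random_forest(X, y, n_trees=50, mtry=None, seed=0):
    rng = random.Random(seed)
    p = len(X[0])
    mtry = mtry or max(1, p // 3)  # p/3 is a common default for regression forests
    importance = [0.0] * p
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap: resample lots with replacement
        trees.append(grow([X[i] for i in idx], [y[i] for i in idx], rng, mtry, importance))
    return trees, [v / n_trees for v in importance]

def predict_forest(trees, row):
    # For a numeric response the trees' "votes" are averaged.
    return sum(predict_tree(t, row) for t in trees) / len(trees)

# Hypothetical data: 40 lots; variable 0 flags a process change at lot 20,
# variables 1-5 are noise unrelated to potency.
data_rng = random.Random(1)
X = [[1 if i >= 20 else 0] + [data_rng.uniform(0, 1) for _ in range(5)] for i in range(40)]
y = [10 + 2 * row[0] + data_rng.uniform(-0.3, 0.3) for row in X]
trees, importance = random_forest(X, y)
print([round(v, 2) for v in importance])
```

Because each tree sees a different bootstrap sample and a different subset of candidate variables at every split, a single noisy variable cannot dominate the whole ensemble.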

(11)


Variable Importance for Bulk Potency by Random Forests

BR process change 1
BR raw material change
BR Day 4 Glucose
DS raw material change 1
BR Day 1 DO
BR input 1
CE Split II variable 1
CE Split II variable 2
BR Day 3 pH
BR Day 2 Lactate
CE Split I variable 1
BR timing variable
BR GUR
BR raw material prep
CE Split III variable 3
CE Split I variable 1
CE Cell Bank lot change
CE Split II variable 4
BR Day 5 DO
CE Split II variable 5
BR Day 8 DO
BR variable 6
CE Split III input variable
BR variable 7
BR Day 2 temperature
BR input variables

The important variables were suspected in advance of the random forests analysis. Only one variable is downstream (DS); all others are bioreactor (BR) or cell expansion (CE) variables.

(12)


Simple Regression Model

Predictions based on the 1st, 2nd, and 4th variables on the list

Although a large percentage of the variation is explained overall, the predictions are not satisfactory for recent production.
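A minimal sketch of a simple regression on three selected variables, assuming each is coded as a 0/1 indicator of whether a lot was made after the corresponding change; the coding, coefficients, and data are hypothetical. Ordinary least squares is solved through the normal equations with Gauss-Jordan elimination so the example needs only the standard library.

```python
def ols(X, y):
    """Least-squares fit y ~ b0 + b1*x1 + ... via the normal equations (X'X)b = X'y."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(A[0])
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(A, y)) for i in range(p)]
    M = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

def r_squared(X, y, beta):
    """Fraction of the variation in y explained by the fitted model."""
    pred = [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]
    m = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, pred))
    ss_tot = sum((a - m) ** 2 for a in y)
    return 1 - ss_res / ss_tot

# Hypothetical 0/1 indicators: (process change 1, raw material change, downstream raw material change)
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [10 + 1.0 * a + 0.5 * b - 0.8 * c for a, b, c in X]
beta = ols(X, y)
print([round(b, 3) for b in beta], round(r_squared(X, y, beta), 3))
```

On real lots, a high overall R² can coexist with large residuals for the most recent lots, so residuals are worth inspecting in lot sequence rather than only in aggregate.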

(13)


Raw Material Lot Change Timing

[Figure: timeline in weeks of bulk product lots moving through growth, propagation, and purification, marking when a new raw material lot was introduced]
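A hypothetical sketch of why one raw material lot change reaches successive bulk lots at different stages. It uses the approximate stage durations from the process schematic (~3, ~4, and ~2 weeks) and assumes, purely for illustration, that the raw material is consumed throughout every stage and that the new lot fully replaces the old one from a given week onward. In practice a given raw material is used only at specific steps; the function names and start weeks are invented.

```python
# Approximate stage durations in weeks, taken from the process schematic.
STAGES = [("growth", 3), ("propagation", 4), ("purification", 2)]

def stage_windows(start_week):
    """Return {stage: (first week, end week)} for a lot starting at start_week."""
    windows, t = {}, start_week
    for name, weeks in STAGES:
        windows[name] = (t, t + weeks)
        t += weeks
    return windows

def exposure(start_week, change_week):
    """Label each stage 'old', 'new', or 'mixed', assuming (hypothetically) the
    raw material is consumed throughout every stage and the new lot fully
    replaces the old one from change_week onward."""
    labels = {}
    for name, (begin, end) in stage_windows(start_week).items():
        if end <= change_week:
            labels[name] = "old"
        elif begin >= change_week:
            labels[name] = "new"
        else:
            labels[name] = "mixed"
    return labels

# Lots started every 2 weeks; a new raw material lot arrives at week 8.
for start in range(0, 10, 2):
    print(start, exposure(start, change_week=8))
```

Under these assumptions, lots already in progress see the new material only in later stages, so the effect of a single lot change appears spread over several consecutive bulk lots.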

(14)


Bioreactor: subtle shifts in glucose

[Figure, three panels: BR Day 3, Day 4, and Day 6 glucose (mg/dL) by lot number, lots 264 to 539]
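The slides do not say how the glucose shifts were found. A tabular CUSUM chart is one standard way to flag subtle, sustained shifts that individual readings do not reveal. The sketch assumes a known in-control target and standard deviation, uses the common textbook choices k = 0.5σ and h = 4σ (hypothetical here), and runs on made-up Day 4 glucose readings.

```python
def cusum(values, target, sigma, k=0.5, h=4.0):
    """Tabular CUSUM with k and h in units of sigma. Returns the indices at
    which the upper or lower cumulative sum exceeds h (sums then reset)."""
    hi = lo = 0.0
    alarms = []
    for i, x in enumerate(values):
        z = (x - target) / sigma
        hi = max(0.0, hi + z - k)  # accumulates evidence of an upward shift
        lo = max(0.0, lo - z - k)  # accumulates evidence of a downward shift
        if hi > h or lo > h:
            alarms.append(i)
            hi = lo = 0.0
    return alarms

# Hypothetical Day 4 glucose (mg/dL): in control around 100, then a small
# sustained upward shift of about +4 mg/dL (roughly 0.8 sigma) from index 10.
glucose = [100, 98, 102, 99, 101, 100, 97, 103, 100, 99,
           104, 105, 103, 106, 104, 105, 104, 103, 105, 104, 106, 104]
print(cusum(glucose, target=100, sigma=5))
```

No single reading here is more than 1.2σ from target, so a 3σ limit on individual values would not signal, while the cumulative sum does once enough shifted lots accumulate.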

(15)


Partial Least Squares model improves predictions

[Figure: predictions based on the 1st, 2nd, and 4th suspect variables alone]

[Figure: Partial Least Squares predictions incorporating all bioreactor monitored variables]
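The PLS settings are not given. As a hedged sketch, the single-response PLS algorithm (PLS1 with NIPALS-style deflation) below extracts latent components from several correlated bioreactor variables and predicts potency from them. The function names, the number of components, and the data are hypothetical.

```python
def pls1_fit(X, y, n_comp):
    """PLS1 for one response: returns (x means, y mean, [(weights, loadings, q)])."""
    n, p = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[r[j] - x_mean[j] for j in range(p)] for r in X]  # centered X, deflated below
    f = [v - y_mean for v in y]                           # centered y, deflated below
    comps = []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]  # direction of X'y
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # y is already fully explained
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        loading = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt  # regression of y on the scores
        E = [[E[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, loading, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    """Project a new lot onto the components, deflating it the same way as in fitting."""
    x_mean, y_mean, comps = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, loading, q in comps:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, loading)]
    return y_hat

# Hypothetical: 6 lots x 3 bioreactor variables, with a noise-free linear potency.
X = [[1, 2, 0.5], [2, 1, 1.5], [3, 4, 0.2], [4, 3, 2.0], [5, 6, 1.1], [6, 5, 0.7]]
y = [2 + a - 0.5 * b + 0.3 * c for a, b, c in X]
model = pls1_fit(X, y, n_comp=3)
print([round(pls1_predict(model, row), 3) for row in X])
```

In practice the number of components would be chosen by cross-validation. With as many components as linearly independent X variables, PLS1 reproduces the ordinary least-squares fit, so its benefit with hundreds of correlated variables comes from using only a few components.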

(16)


Causes of Bulk Potency Changes

[Figure: process schematic (Bioreactors, Downstream)]

• Higher output from bioreactors due to known raw material and process changes.
• Yield shifts related to variation across raw material lots.
• Contributing factor: bioreactor performance cycling, newly discovered pre-existing variability (Kozlowski).

(17)


Results

Merck develops and applies the most powerful data mining techniques to untangle the complexities of manufacturing biologic products.

Additional benefits:

- Ability to predict potency before assay results are available.
- Ability to monitor against a forecast potency.
- Builds our understanding of the biology.
- Basis for revising CPPs (critical process parameters).
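A minimal sketch of monitoring against a forecast potency, assuming a fitted model already supplies a forecast for each lot before its assay result is available, and that past forecast errors are on hand to set a limit. The function name, the 3-sigma rule, the lot IDs, and the values are hypothetical.

```python
from statistics import stdev

def flag_departures(history_residuals, new_lots, z=3.0):
    """Compare each new lot's assayed potency with its forecast.

    history_residuals: past (assay - forecast) errors, used to set the limit
    new_lots: list of (lot_id, forecast, assay)
    Returns the control limit and the lots whose residual exceeds it.
    """
    limit = z * stdev(history_residuals)
    flagged = [(lot, assay - forecast) for lot, forecast, assay in new_lots
               if abs(assay - forecast) > limit]
    return limit, flagged

history = [0.1, -0.2, 0.15, -0.05, 0.0, 0.2, -0.1, -0.1]
new_lots = [("lot_x", 12.0, 12.1), ("lot_y", 11.8, 12.5), ("lot_z", 12.1, 11.9)]
limit, flagged = flag_departures(history, new_lots)
print(round(limit, 3), [lot for lot, _ in flagged])
```

A flagged lot points to something the model does not yet capture, which is where further investigation of the process would start.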