
Data Mining Builds Process Understanding for Vaccine Manufacturing

WCBP 2009

Current Topics in Vaccine Development

January 14, 2009

Julia O’Neill, Principal Engineer

(2)

Merck develops and applies the most powerful data mining techniques to untangle the complexities of manufacturing biologic products.

(3)

An Example – Manufacturing History of a Vaccine Bulk

[Figure: Bulk Potency by Lot Sequence]

The inherent variability of biologics manufacturing presents challenges to developing process understanding.

(4)

“Traditional” Approach to Building Process Understanding: Examine One Change at a Time

1. Identify potency shifts.
2. Identify process changes.
3. Match timing of shifts to changes.

[Figure: two panels of Bulk Potency by Lot Sequence, each annotated with markers 1 to 6]
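The three steps above can be sketched in a few lines of Python. This is a minimal illustration only: the slides do not say how shifts were identified, so the moving-window rule, the `window` and `threshold` values, the potency numbers, and the change log are all assumptions made up for the example. Lot positions are counted from 0.

```python
from statistics import mean

def find_shifts(potency, window=3, threshold=10.0):
    """Step 1: flag lot positions where mean potency jumps by more than threshold.

    Compares the mean of the `window` lots before each position with the mean
    of the `window` lots starting at it, and keeps only local peaks of the jump.
    """
    candidates = range(window, len(potency) - window + 1)
    jump = {i: abs(mean(potency[i:i + window]) - mean(potency[i - window:i]))
            for i in candidates}
    return [i for i in candidates
            if jump[i] > threshold
            and jump[i] >= jump.get(i - 1, 0.0)
            and jump[i] >= jump.get(i + 1, 0.0)]

def match_changes(shifts, changes, tolerance=1):
    """Step 3: list the known changes within `tolerance` lots of each shift."""
    return {s: [name for name, lot in changes.items() if abs(lot - s) <= tolerance]
            for s in shifts}

# Made-up potency values by lot sequence (not data from the slides).
potency = [100, 101, 99, 100, 120, 121, 119, 120, 98, 99, 101, 100]
# Step 2: hypothetical change log (change name -> lot position).
changes = {"new cell bank lot": 4, "new resin lot": 6}

shifts = find_shifts(potency)
matches = match_changes(shifts, changes)
print(shifts)
print(matches)
```

In this made-up data the second shift has no logged change near it, which is the kind of gap a one-change-at-a-time review can leave unexplained.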

(5)

Example Vaccine Manufacturing Process

Simplified schematic of a viral vaccine manufacturing process:

1. Cell Growth (~ 3 weeks)
2. Bioreactors: Cell Growth and Virus propagation (~ 4 weeks)
3. Downstream: Purification, Inactivation, etc. (~ 2 weeks)
4. Assay to determine bulk potency
5. Dilution to appropriate strength in vials

(6)

Biologics mantra: “the product is the process.” *

[Schematic: Bioreactors and Downstream stages, annotated with sources of change]

– Cell Bank Lot exhausted; new lot introduced.
– Virus Stock Seed Lot exhausted; new lot introduced.
– Raw Material preparation
– Chromatography resin lots exhausted and replaced.
– Improved assay implemented

* Building on Steven Kozlowski’s Monday talk.

(7)

New Approach to Building Process Understanding: Apply Multivariate Data Mining

Investment in creation of electronic database: 900+ X variables

X’s:
– Raw material lots
– Bioreactor monitored variables
– Time to conduct process steps
– Known changes

Y = Potency

(8)

Tree-Based Predictors

X’s (900+ X variables):
– Raw material lots
– Bioreactor monitored variables
– Time to conduct process steps
– Known changes
– etc.

Y = Potency

Tree is grown by sequentially splitting Potency, e.g. into Lots a,b,c and Lots d,e,f.
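A single splitting step of such a tree might look like the sketch below. It assumes a common regression-tree criterion (choose the variable and cut point that leave the smallest within-group squared error in potency); the slides do not state which criterion was used. The variable names, the numeric coding of raw material lots, and all values are hypothetical.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the group mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, potency):
    """Try every variable and midpoint cut; return (variable, cut, total SSE)."""
    best = None
    for var in rows[0]:
        levels = sorted({r[var] for r in rows})
        for lo, hi in zip(levels, levels[1:]):
            cut = (lo + hi) / 2
            left = [y for r, y in zip(rows, potency) if r[var] <= cut]
            right = [y for r, y in zip(rows, potency) if r[var] > cut]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (var, cut, total)
    return best

# Hypothetical X's for six lots a..f (raw material lot coded as 1 or 2).
rows = [
    {"raw_material_lot": 1, "day4_glucose": 80},  # lot a
    {"raw_material_lot": 1, "day4_glucose": 95},  # lot b
    {"raw_material_lot": 1, "day4_glucose": 70},  # lot c
    {"raw_material_lot": 2, "day4_glucose": 90},  # lot d
    {"raw_material_lot": 2, "day4_glucose": 75},  # lot e
    {"raw_material_lot": 2, "day4_glucose": 85},  # lot f
]
potency = [100, 102, 98, 120, 118, 122]  # made-up Y values

split = best_split(rows, potency)
print(split)
```

Here the split on the raw material lot separates lots a,b,c from lots d,e,f. A full tree would repeat the same search inside each resulting group.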

(9)

Acknowledgements

Collaboration across many functional areas within Merck:

– Applied Computer Science & Mathematics
– Bioprocess & Bioanalytical Research & Development
– Fermentation & Cell Culture
– Global Vaccine Technology & Engineering
– Merck Lean Six Sigma
– Process Analytical Technology
– Regulatory & Analytical Sciences
– Vaccine Manufacturing Operations

External statistical consultant:

(10)

Random Forests

• A collection of trees with controlled variations.
• Trees “vote” for the best predictors.
• Advantages:
– Consistently matches or outperforms accuracy of other data mining methods.
– Handles a large number of inputs, resistant to over-fitting.
– Robust to outliers.
– Very fast.
– Not confounded by confounding.
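The "controlled variations" and "voting" ideas can be illustrated with a deliberately simplified toy: each "tree" below is a single split, grown on a bootstrap sample of lots and a random subset of the X variables, and it votes for the variable it splits on. This is not the algorithm or software used in the talk, and real random forests grow full trees and use other importance measures. All names, parameters, and data are hypothetical.

```python
import random
from collections import Counter
from statistics import mean

def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_variable(rows, potency, variables):
    """Variable whose best single split leaves the least squared error."""
    best_var, best_err = None, float("inf")
    for var in variables:
        levels = sorted({r[var] for r in rows})
        for lo, hi in zip(levels, levels[1:]):
            cut = (lo + hi) / 2
            left = [y for r, y in zip(rows, potency) if r[var] <= cut]
            right = [y for r, y in zip(rows, potency) if r[var] > cut]
            err = sse(left) + sse(right)
            if err < best_err:
                best_var, best_err = var, err
    return best_var

def stump_votes(rows, potency, n_trees=200, n_vars=2, seed=0):
    """Count how often each variable wins across randomized one-split trees."""
    rng = random.Random(seed)
    names = list(rows[0])
    votes = Counter()
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample of lots
        subset = rng.sample(names, n_vars)              # random subset of X's
        winner = best_variable([rows[i] for i in idx],
                               [potency[i] for i in idx], subset)
        if winner is not None:  # None if the sample offers no possible split
            votes[winner] += 1
    return votes

# Made-up data: potency steps up with a process change; the other X's are noise.
change = [0] * 6 + [1] * 6
temp = [36.9, 37.1, 37.0, 36.8, 37.2, 37.0, 37.1, 36.9, 37.0, 37.2, 36.8, 37.0]
do = [52, 48, 50, 47, 53, 49, 51, 50, 48, 52, 49, 47]
rows = [{"process_change": c, "day2_temperature": t, "day1_do": d}
        for c, t, d in zip(change, temp, do)]
potency = [101, 99, 100, 98, 102, 100, 121, 119, 120, 122, 118, 120]

votes = stump_votes(rows, potency)
print(votes.most_common())
```

Because each tree sees only some of the variables, the noise variables still collect votes when the process change is left out of the subset; the ranking comes from how the votes accumulate across many trees.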

(11)

Variable Importance for Bulk Potency by Random Forests

(BR = Bioreactor, CE = Cell Expansion, DS = Downstream)

BR process change 1
BR raw material change
BR Day 4 Glucose
DS raw material change 1
BR Day 1 DO
BR input 1
CE Split II variable 1
CE Split II variable 2
BR Day 3 pH
BR Day 2 Lactate
CE Split I variable 1
BR timing variable
BR GUR
BR raw material prep
CE Split III variable 3
CE Split I variable 1
CE Split III variable 4
CE Split III variable 5
CE Cell Bank lot change
CE Split II variable 4
CE Split III variable 1
BR Day 5 DO
CE Split II variable 5
BR Day 8 DO
BR variable 6
CE Split III input variable
BR Variable 7
BR Day 2 temperature
BR input variables

Important variables were suspected in advance of random forests analysis.

Only 1 variable is Downstream – all others are Bioreactor or Cell Expansion.

(12)

Simple Regression Model

Predictions based on 1st, 2nd, and 4th variables on list.

Although a large percentage of the variation is explained overall, the predictions are not satisfactory for recent production.

(13)

Raw Material Lot Change Timing

[Figure: timeline (x-axis: Weeks) of bulk product lots passing through Growth, Propagation, and Purification, with two “New raw material lot” markers]

(14)

Bioreactor - subtle shifts in Glucose

[Figure: three panels plotted by Lot # (lots 264 to 539): BR - Day 3 Glucose (mg/dL), BR - Day 4 Glucose (mg/dL), BR - Day 6 Glucose (mg/dL)]

(15)

Partial Least Squares model improves predictions

[Figure: two panels of predictions]

– Predictions based on 1st, 2nd, and 4th suspect variables alone.
– Partial Least Squares predictions incorporating all bioreactor monitored variables.
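For readers unfamiliar with Partial Least Squares, the sketch below fits a one-component PLS model for a single response in plain Python. The slides do not describe how the actual model was built, so the single component, the autoscaling of the X columns, and the glucose and potency values are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def fit_pls1(X, y):
    """Fit a one-component PLS model on autoscaled X; return a predict function."""
    n_cols = len(X[0])
    mu = [mean([row[j] for row in X]) for j in range(n_cols)]
    sd = [stdev([row[j] for row in X]) or 1.0 for j in range(n_cols)]
    y_bar = mean(y)
    Z = [[(row[j] - mu[j]) / sd[j] for j in range(n_cols)] for row in X]
    yc = [v - y_bar for v in y]
    # Weights: direction in X space with maximal covariance with potency.
    w = [sum(z[j] * v for z, v in zip(Z, yc)) for j in range(n_cols)]
    norm = sqrt(sum(c * c for c in w))
    w = [c / norm for c in w]
    # Scores: each lot projected onto the weight direction.
    t = [sum(zj * wj for zj, wj in zip(z, w)) for z in Z]
    # Regress centered potency on the scores.
    q = sum(ti * v for ti, v in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(x_row):
        score = sum((x_row[j] - mu[j]) / sd[j] * w[j] for j in range(n_cols))
        return y_bar + q * score

    return predict

# Hypothetical bioreactor readings per lot: Day 3, Day 4, Day 6 glucose (mg/dL).
X = [[110, 80, 40], [105, 78, 38], [120, 90, 50],
     [125, 95, 52], [100, 70, 35], [115, 88, 47]]
y = [100, 98, 115, 118, 92, 110]  # made-up bulk potency values

predict = fit_pls1(X, y)
fitted = [predict(row) for row in X]
print([round(v, 1) for v in fitted])
```

The appeal for this kind of data is that many correlated monitored variables are condensed into a few score variables before regression, rather than picking a handful of individual predictors.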

(16)

Causes of Bulk Potency Changes

[Schematic: Bioreactors and Downstream stages, annotated with causes]

– Higher output from bioreactors due to known raw material and process changes.
– Yield shifts related to variation across raw material lots.
– Contributing factor: Bioreactor performance cycling.
– Newly discovered pre-existing variability (Kozlowski)

(17)

Results

Merck develops and applies the most powerful data mining techniques to untangle the complexities of manufacturing biologic products.

Additional benefits:

Ability to predict potency before assay results are available.
- Monitor against a forecast potency.
- Builds our understanding of the biology.

Basis for revising CPPs (critical process parameters).
