Magruder Statistics & Data Analysis

(1)

Magruder Statistics & Data Analysis

(2)

Based Closely On:

“The International Harmonized Protocol for the Proficiency Testing of Analytical Chemistry Laboratories”, 2006 (IHP), MICHAEL THOMPSON, STEPHEN L. R. ELLISON AND ROGER WOOD

• AMC supported (Analytical Methods Committee of the RSC).

• Uses ISO statistical models: ISO 13528, 2005 and ISO 5725-2, 1994.

• Robust statistics used as described in the IHP and ISO 13528.

• Duplicate analysis supports method precision calculations.

• Proficiency testing often required for Laboratory Accreditation.

• Independent documentation on how it all works. IHP is free!

• Makes full use of Web based data transfer.

(3)

Magruder Proficiency Testing Reports Overview

True Proficiency Testing

• Analyte reports and report cards using your Method of choice.

• Support for Guarantees.

• Support for IAs.

Individual Method Proficiency Testing

• Method reports and report cards.

Method Precision Data

• Duplicates allow calculation of Repeatability and Reproducibility.

(4)

Magruder Check Sample Program

Robust Statistics

The International Harmonized Protocol For The Proficiency Testing Of Analytical Chemistry Laboratories, 2006

ISO 13528 Statistical Methods for Use in Proficiency Testing by Interlaboratory Comparisons, 2005 – Algorithm A

(5)

Why Robust Statistics?

• Most “real world” data distributions do not follow the Normal (Gaussian) Model; they are more like “contaminated” Normals.

• Distributions have “Fat Tails” and Outliers that skew the Mean and inflate the Standard Deviation (Normal estimators are very sensitive!).

• Even Outliers contain information. We need to weight it properly.

• We need a Robust estimate of Location for the data center.

• We need a Robust estimate of the data Dispersion.

• We need to identify and weight the “Reliable” data.

John Tukey, Peter Huber and Frank Hampel are credited with founding the discipline, all since Tukey’s landmark paper in 1960.

(6)

But First: We must remove The Pathological Data!

(7)

[Figure: Raw data plotted with avg, + 2 sd and - 2 sd limits. Annotations: “Needs fair representation”, “Unwarranted influence”.]

Robust Statistics

(8)

[Figure: Raw and Robust panels, each with avg, + 2 sd and - 2 sd limits.]

Robust Statistics

(9)

[Figure: Robust data plotted with avg, + 2 sd and - 2 sd limits. Points are labeled Z; one is flagged “!”.]

Robust Statistics

Per: Frank Sikora, 2015

(10)

[Figure: Contaminated Normal. X axis: SD, from -8 to 8. Curves: Observed Distribution, Reliable Data, Contamination. Annotation: Fat Tails.]

(11)

Calculating Robust Statistics

We use Peter Huber’s H15 method and Winsorize the Data.

• Sequentially brings the outer data in towards the Median.

• Down-weights Outliers and Fat Tails.

• Draws the Data towards a reliable standard Normal.

Iterate this process until the Mean converges.

• The new Mean X_a = Robust estimate of Location for the data center (Assigned Value).

• The new Standard Deviation σ_rob = a fit-for-purpose Robust estimate of the data Dispersion.

Uncertainty in X_a:

U_a = \sqrt{\sigma_{rob}^{2} / n}
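The iteration described on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the program's actual code: the function name `huber_h15`, the tolerance, the iteration cap and the sample data are all made up for illustration. The constants 1.483, 1.5 and 1.134 are the ones given for Algorithm A in ISO 13528, and the uncertainty line assumes U_a is σ_rob divided by the square root of the number of results.

```python
import math
import statistics


def huber_h15(values, tol=1e-6, max_iter=100):
    """Robust mean and SD by iterated winsorizing (constants per ISO 13528 Algorithm A)."""
    # Starting point: the median, and the scaled median absolute deviation.
    x = statistics.median(values)
    s = 1.483 * statistics.median([abs(v - x) for v in values])
    for _ in range(max_iter):
        if s == 0:
            break  # zero spread estimate: winsorizing would collapse all values onto x
        delta = 1.5 * s
        lo, hi = x - delta, x + delta
        # Winsorize: pull values outside [lo, hi] in to the nearest limit.
        wins = [min(max(v, lo), hi) for v in values]
        new_x = statistics.mean(wins)
        new_s = 1.134 * statistics.stdev(wins)
        converged = abs(new_x - x) < tol and abs(new_s - s) < tol
        x, s = new_x, new_s
        if converged:
            break
    return x, s


# Made-up results with one gross outlier (illustration only).
results = [4.9, 5.0, 5.1, 5.0, 4.8, 5.2, 5.0, 9.0]
x_a, sigma_rob = huber_h15(results)
# Assumed form of the uncertainty in the assigned value: sigma_rob / sqrt(n).
u_a = sigma_rob / math.sqrt(len(results))
print(f"X_a = {x_a:.3f}, sigma_rob = {sigma_rob:.3f}, U_a = {u_a:.3f}")
print(f"raw mean = {statistics.mean(results):.3f}, raw sd = {statistics.stdev(results):.3f}")
```

The outlier at 9.0 is never discarded. It is pulled in to the current upper limit on every pass, so it still counts as one high result but cannot drag the mean or inflate the standard deviation.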

(12)

[Figure: Soluble Potash. Data (Red) on Kernel Density Envelope. Normal Curve (Grey). Legend: Winsorized Data, Data, Robust Normal, Normal, Kernel Density.]

Tools I use

(13)

Kernel Density Plot

Let’s call it a “more precise” Histogram

f(X, h) = \frac{1}{nh} \sum_{i=1}^{n} \Phi\left(\frac{X - X_i}{h}\right)

Φ is the Standard Normal density function
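A kernel density curve of this kind can be computed directly from its definition: a Standard Normal density of width h is centered on each data value and the n curves are averaged. The sketch below assumes that standard Gaussian form. The bandwidth `h = 0.15`, the grid and the data values are hypothetical choices for illustration, not values from the program.

```python
import math


def normal_density(z):
    """Standard Normal density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kernel_density(x, data, h):
    """Gaussian kernel density estimate at x, for bandwidth h."""
    n = len(data)
    return sum(normal_density((x - xi) / h) for xi in data) / (n * h)


# Made-up data and a hypothetical bandwidth (illustration only).
potash = [4.9, 5.0, 5.1, 5.0, 4.8, 5.2, 5.0, 6.1]
h = 0.15
step = 0.05
grid = [4.0 + step * k for k in range(61)]  # 4.0 to 7.0
curve = [kernel_density(x, potash, h) for x in grid]

# The curve is a density, so the area under it should be close to 1.
area = sum(curve) * step
print(f"area under the curve = {area:.3f}")
```

A smaller h gives a spikier curve that follows individual results; a larger h gives a smoother curve that can hide a second mode.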

(14)

[Figure: Soluble Potash. Data (Red) on Kernel Density Envelope. Normal Curve (Grey). Annotation: Winsorizing Squeezes Some Data Points In. Legend: Winsorized Data, Data, Robust Normal, Normal, Kernel Density.]

Tools I use

(15)

[Figure: Soluble Potash. Data (Red) on Kernel Density Envelope. Normal Curve (Grey). Annotations: Winsorizing Squeezes Some Data Points In; Robust Normal Is Calculated. Legend: Winsorized Data, Data, Robust Normal, Normal, Kernel Density.]

Tools I use

(16)

[Figure: QQ Plot for Soluble Potash. X axis: Normal Theoretical Quantiles or Rank Based Z Value. Y axis: Data Quantiles. Series: Raw Data, Robust Data, Normal Q.]

Compare: Raw Data (Green), Robust Data (Red), Normal Quantile (Blue)

Tools I use

The “Sweet Spot”!

Where the curves overlap is “Reliable” Data

Raw Data is ranked.

Robust Data is ranked.

Normal Quantiles Calculated.

All plotted against the Rank Based Z Value
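The three series of the QQ plot can be sketched as follows. This is an illustration under stated assumptions: the plotting position (i - 0.5)/n used for the Rank Based Z Value is one common convention and may not be the one used here, the helper names are hypothetical, and the data, X_a and σ_rob are made-up numbers. The “Robust Data” stands in for the winsorized values.

```python
from statistics import NormalDist


def rank_based_z(n):
    """Z value for each rank 1..n, using the assumed plotting position (i - 0.5) / n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def qq_points(raw, robust, x_a, sigma_rob):
    """Series for a QQ plot: ranked raw data, ranked robust data and the Normal line."""
    z = rank_based_z(len(raw))
    return {
        "z": z,
        "raw": sorted(raw),
        "robust": sorted(robust),
        "normal": [x_a + zi * sigma_rob for zi in z],
    }


# Made-up inputs (illustration only).
raw = [4.9, 5.0, 5.1, 5.0, 9.0]
x_a, sigma_rob = 5.0, 0.2
limit = 1.5 * sigma_rob
robust = [min(max(v, x_a - limit), x_a + limit) for v in raw]

points = qq_points(raw, robust, x_a, sigma_rob)
for z, r, w, q in zip(points["z"], points["raw"], points["robust"], points["normal"]):
    print(f"z = {z:6.2f}  raw = {r:.2f}  robust = {w:.2f}  normal = {q:.2f}")
```

Where the raw, robust and normal columns agree, the results sit in the overlap region; the rows where raw departs from the other two are the tails that winsorizing pulled in.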

(17)

Normal QQ Plot

The Blue Line:

Random Normal Data, zero centered, SD = 1: 0 + Z * 1

Replace Y axis with Normalized Data values: X_a + Z * σ_rob

(18)

[Figure: Acid Soluble Iron. Data (Red) on Kernel Density Envelope. Normal Curve (Grey). Legend: Winsorized Data, Data, Robust Normal, Normal, Kernel Density.]

(19)

[Figure: Acid Soluble Iron. Data (Red) on Kernel Density Envelope. Normal Curve (Grey). Annotation: Winsorizing Squeezes Some Data Points In. Legend: Winsorized Data, Data, Robust Normal, Normal, Kernel Density.]

(20)

[Figure: Acid Soluble Iron. Data (Red) on Kernel Density Envelope. Normal Curve (Grey). Annotations: Winsorizing Squeezes Some Data Points In; Robust Normal Is Calculated. Legend: Winsorized Data, Data, Robust Normal, Normal, Kernel Density.]

(21)

[Figure: QQ Plot for Acid Soluble Iron. Series: Raw Data, Robust Data, Normal Q.]

(22)

In summary:

From the Huber H15 process we now have:

• An Assigned Value X_a (robust measure of location).

• A “fit for purpose” standard deviation σ_rob (robust measure of dispersion).

• An estimate of uncertainty in the assigned value, U_a.

(23)

Sulfur Analysis in 150611

QQ Plots Reveal A Problem

(24)

[Figure: Elemental Sulfur (5%) QQ Plot. Series: Raw Data, Robust Data, Normal Q.]

Where’s The Sweet Spot??

(25)

(26)

(27)

[Figure: Total Sulfur (10%) QQ Plot. Series: Raw Data, Robust Data, Normal Q.]

(28)

Imagine if the discrepancy were not so obvious.

Not Statistically Discernible!

It is vitally important for Clients to submit Data for the CORRECT Analyte.

(29)

Reporting Data Below the LOD

A Word About Detection Limits

(30)

Detection Limits

“Definitions are not standardized!”

[Figure: number line with the Blank set to 0. Units: Standard Deviation of the Blank, from -3 to 11.]

Establishes the “Noise” of the instrument or method. Let’s call S_BLANK = “Noise”.

(31)

[Figure: number line with the Blank set to 0. Units: Standard Deviation of the Blank, from -3 to 11.]

Establishes the “Noise” of the instrument or method. Let’s call S_BLANK = “Noise”.

LOD (Limit Of Detection) = 3 x Noise. Above the “Noise” but still 50% chance of a false negative.

Detection Limits

(32)

[Figure: number line with the Blank set to 0. Units: Standard Deviation of the Blank, from -3 to 11.]

LOD (Limit Of Detection) = 3 x Noise.

Reporting Limit = 6 x Noise. Protects against false negatives.

Detection Limits

(33)

[Figure: number line with the Blank set to 0. Units: Standard Deviation of the Blank, from -3 to 11.]

LOD (Limit Of Detection) = 3 x Noise.

LOQ (Limit Of Quantitation) = 10 x Noise.

Protects against false negatives. Safe limit for reporting reliable quantities.

Detection Limits
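The multiples on these slides (3, 6 and 10 times the Noise) can be turned into a small calculation. The sketch below is illustrative only: the function names, the category labels and the blank readings are hypothetical, and, as the slides stress, other definitions of these limits are in use. It assumes the readings are blank-corrected, so each limit is a signal level above a blank set to 0.

```python
import statistics


def detection_limits(blank_readings):
    """Limits as multiples of the Noise (standard deviation of the blank)."""
    noise = statistics.stdev(blank_readings)
    return {
        "noise": noise,
        "LOD": 3 * noise,              # Limit Of Detection
        "reporting_limit": 6 * noise,  # Reporting Limit
        "LOQ": 10 * noise,             # Limit Of Quantitation
    }


def classify(signal, limits):
    """Hypothetical labels for where a blank-corrected signal falls."""
    if signal < limits["LOD"]:
        return "below LOD"
    if signal < limits["LOQ"]:
        return "detected, below LOQ"
    return "quantifiable"


# Made-up blank readings (illustration only).
blanks = [0.0, 0.1, -0.1]
limits = detection_limits(blanks)
print(limits)
print(classify(0.5, limits))
```

With these blanks the Noise is 0.1, so the LOD, Reporting Limit and LOQ fall at 0.3, 0.6 and 1.0 in the same units as the readings.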

(34)

[Figure: number line with the Blank set to 0. Units: Standard Deviation of the Blank, from -3 to 11.]

LOD (Limit Of Detection) = 3 x Noise.

LOQ (Limit Of Quantitation) = 10 x Noise.

Repeated measurement of values in here can produce a usable estimate.

This is similar to signal averaging. CYA, only useful in litigation.

Detection Limits

“Definitions are not standardized!”

If you are not comfortable with your result, do not report 0; report nothing! Got it!

(35)