Magruder Statistics & Data Analysis
Based Closely On:
“The International Harmonized Protocol for the Proficiency Testing of Analytical Chemistry Laboratories”, 2006 (IHP), Michael Thompson, Stephen L. R. Ellison and Roger Wood
AMC supported (Analytical Methods Committee of the RSC)
Uses ISO statistical models: ISO 13528 (2005) and ISO 5725-2 (1994).
Robust statistics used as described in the IHP and ISO 13528.
Duplicate analysis supports method precision calculations.
Proficiency testing often required for Laboratory Accreditation.
Independent documentation on how it all works.
IHP is free!
Makes full use of Web based data transfer.
Magruder Proficiency Testing Reports Overview
True Proficiency Testing
Analyte reports and report cards using your Method of choice.
Support for Guarantees.
Support for IA’s.
Individual Method Proficiency Testing
Method reports and report cards.
Method Precision Data
Duplicates allow calculation of Repeatability and Reproducibility.
Magruder Check Sample Program
Robust Statistics
The International Harmonized Protocol For The Proficiency Testing Of Analytical Chemistry Laboratories, 2006
ISO 13528 Statistical Methods for Use in Proficiency Testing by Interlaboratory Comparisons, 2005 – Algorithm A
Why Robust Statistics?
Most “real world” data distributions do not follow the Normal (Gaussian) model; they are more like “contaminated” Normals.
Distributions have “Fat Tails” and Outliers that skew the Mean and inflate the Standard Deviation (Normal estimators are very sensitive!).
Even Outliers contain information. We need to weight it properly.
We need a Robust estimate of Location for the data center.
We need a Robust estimate of the data Dispersion.
We need to identify and weight the “Reliable” data.
John Tukey, Peter Huber and Frank Hampel are credited with founding the discipline, all since Tukey’s landmark paper in 1960.
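To make the sensitivity point concrete, here is a minimal sketch comparing the classical estimators (mean, standard deviation) with simple robust ones (median, scaled median absolute deviation). The data values are made up for illustration, and the 1.483 scaling factor for the MAD is the usual Normal-consistency constant, not something taken from these slides.

```python
import statistics

def classical_vs_robust(values):
    """Return (mean, sd, median, scaled_mad) for a list of results."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    med = statistics.median(values)
    # Median absolute deviation, scaled so it estimates sd for Normal data.
    mad = statistics.median(abs(v - med) for v in values)
    return mean, sd, med, 1.483 * mad

# Hypothetical results: seven well-behaved values, then one gross outlier added.
clean = [5.0, 5.1, 4.9, 5.2, 4.8, 5.0, 5.1]
contaminated = clean + [9.0]

print("clean:       ", classical_vs_robust(clean))
print("contaminated:", classical_vs_robust(contaminated))
```

With one outlier the mean and standard deviation move a lot, while the median and scaled MAD barely change.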
But First: We must remove
The Pathological Data!
[Figure: Raw data with avg ± 2 sd limits. Annotations: “Needs fair representation” and “Unwarranted influence”.]
[Figure: avg ± 2 sd limits for the Raw data compared with the Robust data.]
[Figure: Robust avg ± 2 sd limits with Z markers.]
Contaminated Normal (per Frank Sikora, 2015)
[Figure: Observed Distribution on an SD axis from -8 to 8, showing the Reliable Data, the Contamination and the Fat Tails.]
Calculating Robust Statistics
We use Peter Huber’s H15 method and Winsorize the Data.
Sequentially brings the outer data in towards the Median.
Down weights Outliers and Fat tails.
Draws the Data towards a reliable standard Normal.
Iterate this process until the mean converges.
The new Mean Xa = Robust estimate of Location for the data center (Assigned Value).
The new Standard Deviation = σrob as a fit-for-purpose Robust estimate of the data Dispersion.
Uncertainty in $X_a$: $U_a = \sqrt{\sigma_{rob}^2 / n}$
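The iteration described above can be sketched as follows. This is a minimal illustration, assuming the Algorithm A form given in ISO 13528 (start from the median and 1.483 × MAD, Winsorize at 1.5 robust standard deviations, rescale the Winsorized standard deviation by 1.134). The function name `huber_h15`, the tolerance, and the uncertainty taken as the robust standard deviation over the square root of n are assumptions for illustration, not the program's actual code.

```python
import math
import statistics

def huber_h15(values, tol=1e-6, max_iter=100):
    """Return (assigned value, robust sd, uncertainty) by iterated Winsorizing."""
    x = list(values)
    n = len(x)
    # Starting estimates: median and scaled median absolute deviation.
    loc = statistics.median(x)
    scale = 1.483 * statistics.median(abs(v - loc) for v in x)
    if scale == 0:
        return loc, 0.0, 0.0  # more than half the results are identical
    for _ in range(max_iter):
        delta = 1.5 * scale
        # Winsorize: pull results outside loc +/- delta in to the limits.
        w = [min(max(v, loc - delta), loc + delta) for v in x]
        new_loc = statistics.fmean(w)
        new_scale = 1.134 * statistics.stdev(w)
        converged = abs(new_loc - loc) < tol and abs(new_scale - scale) < tol
        loc, scale = new_loc, new_scale
        if converged:
            break
    # Assumed form of the uncertainty in the assigned value.
    return loc, scale, scale / math.sqrt(n)

# Hypothetical results with one gross outlier.
results = [5.0, 5.1, 4.9, 5.2, 4.8, 5.0, 5.1, 9.0]
xa, sigma_rob, ua = huber_h15(results)
print(xa, sigma_rob, ua)
```

The outlier is not deleted. It is pulled in to the Winsorizing limit on each pass, so it still counts as a high result but no longer dominates the mean or the standard deviation.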
[Figure: Soluble Potash. Data (Red) on Kernel Density Envelope, Normal Curve (Grey). Legend: Winsorized Data, Data, Robust Normal, Normal, Kernel Density.]
Tools I use
Kernel Density Plot
Let’s call it a “more precise” Histogram
$$f(X, h) = \frac{1}{nh} \sum_{i=1}^{n} \Phi\left(\frac{X - X_i}{h}\right)$$
Φ is the Standard Normal density function
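A kernel density estimate of this kind places a small standard Normal bump of width h on every data point and averages them. The sketch below is a minimal illustration. The function names, the example data and the bandwidth value are assumptions, and the slides do not say how h is chosen.

```python
import math

def normal_density(u):
    """Standard Normal density function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_density(x, data, h):
    """Gaussian kernel density estimate at x for bandwidth h."""
    n = len(data)
    return sum(normal_density((x - xi) / h) for xi in data) / (n * h)

# Hypothetical results and bandwidth, evaluated on a coarse grid.
data = [4.8, 4.9, 5.0, 5.0, 5.1, 5.1, 5.2, 6.0]
h = 0.1
for x in [4.6, 4.8, 5.0, 5.2, 5.4, 5.6, 5.8, 6.0]:
    print(f"{x:4.1f}  {kernel_density(x, data, h):.3f}")
```

A smaller h gives a spikier curve that follows individual results, and a larger h gives a smoother envelope.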
[Figure: Soluble Potash, same plot. Winsorizing squeezes some data points in, and the Robust Normal is calculated.]
[Figure: QQ Plot for Soluble Potash. Series: Raw Data, Robust Data, Normal Q.]
Compare:
Raw Data (Green)
Robust Data (Red)
Normal Quantile (Blue)
The “Sweet Spot”!
Where the curves overlap is “Reliable” Data
X axis: Normal Theoretical Quantiles or Rank Based Z Value
Y axis: Data Quantiles
Raw Data is ranked.
Robust Data is ranked.
Normal Quantiles Calculated.
All plotted against the Rank Based Z Value
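The steps above can be sketched as follows. The plotting position (i − 0.5)/n used to turn ranks into Z values is an assumption, since the slides do not state which convention is used. Computing the Normal quantiles as the assigned value plus Z times the robust standard deviation is also an assumption, and the function names are hypothetical.

```python
from statistics import NormalDist

def rank_based_z(n):
    """Z value for each rank 1..n, using plotting position (i - 0.5) / n."""
    std_normal = NormalDist()  # mean 0, sd 1
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def qq_series(raw, robust, xa, sigma_rob):
    """Return the three QQ series, all keyed to the same rank-based Z values."""
    z = rank_based_z(len(raw))
    return {
        "z": z,
        "raw": sorted(raw),        # raw data, ranked
        "robust": sorted(robust),  # Winsorized data, ranked
        "normal": [xa + zi * sigma_rob for zi in z],
    }

# Hypothetical raw and Winsorized results with an assumed Xa and robust sd.
raw = [5.0, 5.1, 4.9, 5.2, 4.8, 5.0, 5.1, 9.0]
robust = [5.0, 5.1, 4.9, 5.2, 4.8, 5.0, 5.1, 5.33]
series = qq_series(raw, robust, xa=5.05, sigma_rob=0.19)
for zi, r, w, q in zip(series["z"], series["raw"], series["robust"], series["normal"]):
    print(f"{zi:6.2f}  {r:5.2f}  {w:5.2f}  {q:5.2f}")
```

Plotting the three data columns against the Z column gives the three curves of the QQ plot.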
Normal QQ Plot
The Blue Line:
Replace Y axis with Normalized Data values: $X_a + Z \cdot \sigma_{rob}$
Random Normal Data, zero centered, SD = 1: $0 + Z \cdot 1$
Acid Soluble Iron
[Figure: Acid Soluble Iron. Data (Red) on Kernel Density Envelope, Normal Curve (Grey). Winsorizing squeezes some data points in, and the Robust Normal is calculated. Legend: Winsorized Data, Data, Robust Normal, Normal, Kernel Density.]
[Figure: QQ Plot for Acid Soluble Iron. Series: Raw Data, Robust Data, Normal Q.]
In summary:
From the Huber H15 process we now have:
An Assigned Value $X_a$ (robust measure of location).
A “fit for purpose” standard deviation $\sigma_{rob}$ (robust measure of dispersion).
An estimate of uncertainty in the assigned value, $U_a$.
Sulfur Analysis in 150611
QQ Plots Reveal A Problem
[Figure: Elemental Sulfur (5%) QQ Plot. Series: Raw Data, Robust Data, Normal Q.]
Where’s the Sweet Spot??
[Figures: two untitled QQ plots with the same series.]
[Figure: Total Sulfur (10%) QQ Plot. Series: Raw Data, Robust Data, Normal Q.]
Imagine if the discrepancy were not so obvious.
Not Statistically Discernible!
It is vitally important for Clients to submit Data for the CORRECT Analyte.
Reporting Data Below the LOD
A Word About Detection Limits
Detection Limits
“Definitions are not standardized!”
[Figure: scale in Units of the Standard Deviation of the Blank, from -3 to 11, with the Blank set to 0.]
Blank, set to 0: establishes the “Noise” of the instrument or method. Let’s call $S_{BLANK}$ = “Noise”.
LOD (Limit Of Detection), 3 x Noise: above the “Noise” but still 50% chance of a false negative.
Reporting Limit, 6 x Noise: protects against false negatives.
LOQ (Limit Of Quantitation), 10 x Noise: safe limit for reporting reliable quantities.
Repeated measurement of values in here can produce a usable estimate.
This is similar to signal averaging. CYA, only useful in litigation.
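As a minimal sketch of the multipliers on these slides, the limits can be computed from replicate blank readings, taking the standard deviation of the blank as the “Noise”. The function name and the blank readings are hypothetical. The limits are expressed as signal above the blank (blank set to 0), and since definitions are not standardized, other programs may use different multipliers.

```python
import statistics

def detection_limits(blank_readings):
    """Return noise and the 3x, 6x and 10x limits, as signal above the blank."""
    noise = statistics.stdev(blank_readings)  # standard deviation of the blank
    return {
        "noise": noise,
        "LOD": 3 * noise,              # limit of detection
        "reporting_limit": 6 * noise,  # protects against false negatives
        "LOQ": 10 * noise,             # limit of quantitation
    }

# Hypothetical replicate blank readings, already blank-corrected to centre near 0.
blanks = [0.02, -0.01, 0.00, 0.01, -0.02, 0.01, -0.01]
for name, value in detection_limits(blanks).items():
    print(f"{name:16s} {value:.4f}")
```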
Detection Limits
“Definitions are not standardized!”
If you are not comfortable with your result, do not report 0. Report nothing! Got it!