WillettK ISTIBenchmarking EdinUni OCT2014

(1)

The International Surface Temperature Initiative and Benchmarking for Homogenisation Algorithms

Uni. Edinburgh, October 2014

Dr Kate Willett

(Met Office Hadley Centre, UK)

(2)

Talk Outline

• The ISTI – facilitating robust climate analysis

• Our inhomogeneous observing system

• Benchmarking Basics

• Creating a 'clean' synthetic world

• Creating a set of dirty/error filled worlds

• Assessing homogenisation algorithm skill against the benchmarks

• Where are we now...

(3)

The ISTI

Facilitation of

Robust Climate

Analysis

(4)

The real world observing

system is not perfect …

(5)

It's more like these …

Many more examples on www.surfacestations.org

(6)

Inhomogeneities: annual mean minimum

temperature at Reno, Nevada, USA

(7)

Underlying these are two absolutely

fundamental issues …

• A lack of adequate documentation (metadata) of the ubiquitous changes (station location, shelter, observing time etc.)

• A lack of traceability to known standards (metrology) and to original hard copy data sources for most historical records

(8)

No doubt that it is warming – the rate and temporal/spatial details are the issue

(NCDC Graphics Team (above), John Kennedy, Met Office Hadley Centre (right))

(9)

The ISTI

• A framework for creation of multiple robust independent estimates of land surface temperature to answer scientific questions and societal demands of the 21st Century

– DATABANK: open, transparent processing, traceability to standards and source

– BENCHMARKING: Consistent performance evaluation and methodological uncertainty estimates

– USER TOOLS: fit for purpose, visualisation, intercomparison

(10)

Step 1: Data rescue and

provision

www.surfacetemperatures.org/databank

Lawrimore et al., 2013: Responding to the Need for Better Global Temperature Data, EOS, 94 (6), 61–62, DOI: 10.1002/2013EO060002

Rennie et al., 2014: The International Surface Temperature Initiative Global Land Surface Databank: Monthly Temperature Data Version 1 Release Description and Methods. Geoscience Data Journal, DOI:

(11)

www.iedro.org

(12)

(13)

Benchmarking cycle

Willett et al., 2014: A framework for benchmarking of homogenisation algorithm performance on the global scale, Geoscientific

(14)

(15)

Our

Inhomogeneous

Observing System

(16)

Effects of Changes that

are not of Climate Origin

STATION MOVE: EXPOSURE AND MICROCLIMATE = abrupt change in mean and diurnal extremes - may affect seasonal cycle extremes

SHELTER CHANGE: EXPOSURE = abrupt change in diurnal extremes - may affect seasonal cycle extremes

OBSERVING PRACTICE CHANGE: SAMPLING = abrupt change possible in mean and extremes

INSTRUMENT CHANGE: CALIBRATION = abrupt change in mean and possibly extremes

LANDUSE CHANGE: EXPOSURE AND

(17)

Adjustment

Adjust mean

Adjust variance

By month? By season? By full homogeneous period?

Use candidate – neighbour fields?

Use SNHT, PHA, MISHMASH, SPLIDHOM etc?
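A minimal sketch of the simplest option listed above, adjusting the mean over a full homogeneous period using a candidate minus neighbour difference series. It assumes the break location is already known and that the earlier segment is shifted to match the most recent one; the function name `adjust_mean_shift` and the toy data are hypothetical, and this is not the procedure of SNHT, PHA or any other named algorithm.

```python
from statistics import mean

def adjust_mean_shift(candidate, neighbour, break_index):
    """Shift the segment before break_index so that the candidate-minus-neighbour
    difference series has the same mean on both sides of the break."""
    diff = [c - n for c, n in zip(candidate, neighbour)]
    # Offset needed to bring the earlier segment in line with the recent one.
    offset = mean(diff[break_index:]) - mean(diff[:break_index])
    adjusted = [c + offset for c in candidate[:break_index]] + list(candidate[break_index:])
    return adjusted, offset

# Toy data (assumed): the candidate tracks its neighbour, then jumps by +1.0 at index 4.
neighbour = [10.0, 10.5, 9.8, 10.2, 10.1, 9.9, 10.4, 10.0]
candidate = [n + (1.0 if i >= 4 else 0.0) for i, n in enumerate(neighbour)]
adjusted, offset = adjust_mean_shift(candidate, neighbour, break_index=4)
```

Using the difference series rather than the candidate alone removes the shared climate signal, so the estimated offset reflects the inhomogeneity rather than real variability.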

(18)

Confidence in

Adjustments Made?

Type I Error (false alarm, taking 'no changepoint' as the null hypothesis)

Detect and adjust when no actual changepoint occurred

Type II Error (missed adjustment)

Do not detect or adjust when there has been a changepoint

Missed adjustments vs false alarms: which is

worse?

What about adjustments in the wrong direction?

(19)

Benchmarking

Basics

(20)

'TRUTH' UNKNOWN

XTRUTH_{t,l} = S_{t,l} + L_{t,l} + V_{t,l} + M_{t,l}

XTRUTH = a climate element at time t and location l

S = seasonal cycle

L = long-term trends

V = variability (ENSO, NAO, Volcanoes, Solar Cycles...)

M = microclimate (topography, proximity to coast, prevailing wind, local environment...)

XOB_{t,l} = XTRUTH_{t,l} + ε_{t,l} + λ_{t,l}

XOB = observation at time t, location l and height h

ε = random error at time/place/height (recording error, instrument error etc.)

λ = systematic error at time/place/height *possibly correlated* (station move, exposure change, instrument change,

(21)

Benchmarking Cycle

Example use of benchmark data for USHCN

Create c.10 analog-error-worlds

– Simulate 'clean' spatio-temporal characteristics of actual stations underpinned by low frequency variability from a climate model to maintain plausible spatial correlation

– Add abrupt and gradual changepoints to approximate our best guess real world error structures

– Run homogenisation algorithms on the test data and assess ability to recover original 'clean' data

(22)

Creating a 'Clean'

Synthetic World

(23)

Team Creation:

How To...

● Real Station climatology = seasonal cycle (S)

● Real Station standard deviation = variability (V) and microclimate (M)

● GCM interpolated to station location = long-term trend (L) and variability (V)

● Vector Autoregressive (VAR) model = variability (V) and microclimate (M)

IMPORTANT:

● Retain serial correlation at least at lag 1

● Retain cross-correlations with neighbours

(24)

Station Cross-Correlation

(25)

The Neighbour Disconnect Method

Station 1 t-1

Station 2 t-1

Station 1 t=0

Station 2 t=0

(26)

Why use a GCM?

● Global, spatially correlated trend and variability fields

● Ability to play with different background trends (no warming vs burn everything)

● Ability to use other variables from the GCM to apply errors (e.g., humidity, wind speed, radiation etc.)

● Consistency across other variables should we want to

(27)

Voila!

● Vector Autoregression to simulate standardised anomaly time series for networks

● Multiply by real station standard deviations

● Add interpolated smoothed GCM trend

● Add back real station
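The recipe above can be sketched for a single station as follows. This is a simplified illustration under stated assumptions: a scalar AR(1) process stands in for the multi-station VAR, a linear trend stands in for the interpolated smoothed GCM trend, and the final step is assumed to add back the monthly climatology. The function name and all numbers are hypothetical.

```python
import math
import random

def simulate_clean_series(climatology, monthly_sd, trend_per_month, phi, n_months, seed=0):
    """Build a synthetic monthly series: climatology + trend + sd * AR(1) anomaly."""
    rng = random.Random(seed)
    # Scale the shocks so the standardised anomaly keeps unit variance (needs |phi| < 1).
    shock_sd = math.sqrt(1.0 - phi ** 2)
    anomaly = rng.gauss(0.0, 1.0)
    series = []
    for t in range(n_months):
        if t > 0:
            anomaly = phi * anomaly + rng.gauss(0.0, shock_sd)
        m = t % 12  # calendar month index
        series.append(climatology[m] + trend_per_month * t + monthly_sd[m] * anomaly)
    return series

# Assumed illustrative inputs, not values from the databank.
climatology = [2.0, 3.0, 6.0, 9.0, 13.0, 16.0, 18.0, 18.0, 15.0, 11.0, 6.0, 3.0]
monthly_sd = [1.5] * 12
series = simulate_clean_series(climatology, monthly_sd, trend_per_month=0.002,
                               phi=0.3, n_months=240)
```

Simulating in standardised anomaly space and rescaling afterwards keeps the temporal correlation structure separate from each station's own climatology and variance.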

(28)

Inhomogeneities: annual mean minimum

temperature at Reno, Nevada, USA

(29)

Creating a set of

Error-filled Worlds

(30)

Effects of Changes that

are not of Climate Origin

STATION MOVE: EXPOSURE AND MICROCLIMATE = abrupt change in mean and diurnal extremes - may affect seasonal cycle extremes

SHELTER CHANGE: EXPOSURE = abrupt change in diurnal extremes - may affect seasonal cycle extremes

OBSERVING PRACTICE CHANGE: SAMPLING = abrupt change possible in mean and extremes

INSTRUMENT CHANGE: CALIBRATION = abrupt change in mean and possibly extremes

LANDUSE CHANGE: EXPOSURE AND

(31)

Team Corruption

XERRORWORLD_{t,l} = XTRUTH_{t,l} + λERRORWORLD_{t,l}

Example error models applied to stations SURFACE

TEMPERATURE DATABANK

World 1: no breaks

World 2: few large breaks

World 3: many small breaks

World 4: abrupt and gradual breaks

World 5: many small complex breaks
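A minimal sketch of how such error worlds could be built by adding a systematic error term to a clean series. It covers abrupt step changes only (no gradual or seasonally varying breaks), applies each step from its start index onwards, and uses hypothetical break positions and sizes; none of these values come from the actual benchmark worlds.

```python
def apply_breaks(clean, breaks):
    """Return clean + systematic error, where breaks is a list of
    (start_index, size) step changes that persist to the end of the series."""
    corrupted = list(clean)
    for start, size in breaks:
        for t in range(start, len(corrupted)):
            corrupted[t] += size
    return corrupted

# Hypothetical error models for three of the worlds listed above.
clean = [10.0] * 120
error_worlds = {
    "world_1_no_breaks": [],
    "world_2_few_large_breaks": [(60, 1.5)],
    "world_3_many_small_breaks": [(20, 0.2), (45, -0.3), (70, 0.25), (95, -0.2)],
}
corrupted = {name: apply_breaks(clean, breaks) for name, breaks in error_worlds.items()}
```

Because the clean series is kept alongside each corrupted one, any homogenised output can later be compared against a known truth.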

(32)

Adjustment

Adjust mean

Adjust variance

By month? By season? By other variables e.g., humidity, wind, solar radiation?

Spatially clustered breaks

Seasonal cycles/features

(33)

Assessing

Homogenisation

Algorithm Skill

Against the

Benchmarks

(34)

Levels of Assessment

Level 1: Return of original climate features e.g., trends, climatology, variability (UNC)

Level 2: Ability to detect changepoints and characterise them (ALG)

Level 3: detailed assessment of strengths and weaknesses against specific types of inhomogeneities (ALG)

Level 4: Assess performance on benchmarks alongside results from real world homogenisation (how realistic are the benchmarks?) (BEN)

RETURN: Homogenised stations
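As one possible Level 1 check, the sketch below compares linear trends in the clean, corrupted and homogenised versions of a station. The score, the share of the trend error removed by homogenisation, is an assumed definition for illustration; the slides do not specify how trend recovery is measured, and the function names and toy series are hypothetical.

```python
def ols_trend(series):
    """Ordinary least squares slope of a series against its time index."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def trend_error_removed(clean, corrupted, homogenised):
    """Percentage of the corrupted-minus-clean trend error that homogenisation removed."""
    err_before = abs(ols_trend(corrupted) - ols_trend(clean))
    err_after = abs(ols_trend(homogenised) - ols_trend(clean))
    if err_before == 0.0:
        raise ValueError("corrupted series has no trend error to recover")
    return 100.0 * (1.0 - err_after / err_before)

# Toy example (assumed): a +1.0 step at the midpoint, of which 0.25 remains after adjustment.
clean = [0.01 * t for t in range(120)]
corrupted = [y + (1.0 if t >= 60 else 0.0) for t, y in enumerate(clean)]
homogenised = [y + (0.25 if t >= 60 else 0.0) for t, y in enumerate(clean)]
score = trend_error_removed(clean, corrupted, homogenised)
```

A score of 100 would mean the clean trend is fully recovered, and a negative score would mean the adjustment made the trend worse.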

(35)

Team Validation

[Figure: hit rate against false alarm rate, numbered points 1 to 8, regions labelled BETTER and WORSE. Further panels: Trend recovery percentage; Contingency Table]
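A hedged sketch of how contingency-table counts might be derived for one station when the true changepoints are known. The matching rule (a detection counts as a hit if it falls within a tolerance of an unmatched true break) and the tolerance of two time steps are assumptions. The second score is the false alarm ratio, the fraction of detections that are false, which is used here because a false alarm rate in the strict sense also needs a count of correct non-detections.

```python
def score_detections(true_breaks, detected_breaks, tolerance=2):
    """Count hits, misses and false alarms, matching each detection to at most
    one true break within `tolerance` time steps."""
    unmatched = set(true_breaks)
    hits = 0
    false_alarms = 0
    for d in sorted(detected_breaks):
        candidates = [b for b in unmatched if abs(b - d) <= tolerance]
        if candidates:
            unmatched.remove(min(candidates, key=lambda b: abs(b - d)))
            hits += 1
        else:
            false_alarms += 1
    return {"hits": hits, "misses": len(unmatched), "false_alarms": false_alarms}

def rates(counts):
    """Hit rate and false alarm ratio from the counts above."""
    truths = counts["hits"] + counts["misses"]
    detections = counts["hits"] + counts["false_alarms"]
    hit_rate = counts["hits"] / truths if truths else float("nan")
    false_alarm_ratio = counts["false_alarms"] / detections if detections else float("nan")
    return hit_rate, false_alarm_ratio

# Hypothetical break positions (month indices).
counts = score_detections(true_breaks=[24, 60, 100], detected_breaks=[25, 61, 80])
hit_rate, false_alarm_ratio = rates(counts)
```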

(36)

Other Benchmarks

(37)

Daily Benchmarks for the USA using a GAM

(38)

Monthly T and Precip benchmarks for small

networks

COST HOME

(39)

Monthly USHCN Tmax, Tmin and Tmean

benchmarks

(40)

Come and play, eventually

www.surfacetemperatures.org

~30000 stations, 10 blind worlds, 4 open worlds, monthly mean T


(41)

Methods

A_t = Φ_1 A_{t-1} + Z_t

A = matrix of 1 to J stations for all times 1 to T

Φ_1 = autoregressive parameter matrix

Z_t = matrix of residuals (random shocks)

Φ_1 = Γ(1) Γ(0)^{-1}

Γ(0) = covariance matrix of all stations

Γ(1) = covariance matrix of all stations at lag 1

Corr(x,y) = a * exp((-b * d_hor) + (-c * d_vert))

Cross-Correlation: a=0.989, b=0.0009, c=0.2505

Cross-Correlation at lag 1: a=0.3, b=0.0015, c=0.8579

Z_t = A_t - Φ_1 A_{t-1}

Σ_Z = Γ(0) - Φ_1 Γ(1)^T
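The relations above can be worked through for a toy two-station network. This is a sketch under assumptions: the stations are 100 distance units apart horizontally (the slides do not state the units) with no height difference, anomalies are standardised so the diagonal of Γ(0) is 1, and the lag-1 autocorrelation of each station is taken as the lag-1 fit at zero distance (a = 0.3). The helper functions are hypothetical and only handle the 2 x 2 case.

```python
import math

def distance_correlation(d_hor, d_vert, a, b, c):
    """Exponential decay of correlation with horizontal and vertical distance."""
    return a * math.exp(-b * d_hor - c * d_vert)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inverse_2x2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

d_hor, d_vert = 100.0, 0.0  # assumed separation
r0 = distance_correlation(d_hor, d_vert, a=0.989, b=0.0009, c=0.2505)
r1 = distance_correlation(d_hor, d_vert, a=0.3, b=0.0015, c=0.8579)
gamma0 = [[1.0, r0], [r0, 1.0]]  # lag-0 covariance of standardised anomalies
gamma1 = [[0.3, r1], [r1, 0.3]]  # lag-1 covariance

# Phi_1 = Gamma(1) Gamma(0)^-1 and Sigma_Z = Gamma(0) - Phi_1 Gamma(1)^T
phi1 = matmul(gamma1, inverse_2x2(gamma0))
phi_gamma1_t = matmul(phi1, transpose(gamma1))
sigma_z = [[gamma0[i][j] - phi_gamma1_t[i][j] for j in range(2)] for i in range(2)]
```

For a full network the same two matrix expressions apply, with a general linear solver replacing the 2 x 2 inverse.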

(42)

VAR Parameters

A_t = Φ_1 A_{t-1} + Z_t

A = matrix of 1 to J stations for all times 1 to T

Φ_1 = autoregressive parameter matrix

Z_t = matrix of error terms/shocks (residuals)

vec(Φ_1) = (I ⊗ Γ(0))^{-1} vec(Γ(1))

Γ(0) = covariance matrix of all stations

Γ(1) = covariance matrix of all stations at lag 1

For simulation, use exponential decay function to proxy correlation by distance (get fit from real data)

e.g. f <- function(x, a, b) { a * exp(-b * x) }

Cross-Correlation: a=0.9162, b=0.0005

Cross-Correlation at lag 1: a=0.4162, b=0.0005

HELP: Fit for lag 1 correlation not strong. Is this sensible?
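One simple way to obtain the decay parameters a and b is a straight-line fit to the logarithm of the correlations, since ln(corr) = ln(a) - b * x. The sketch below assumes all correlations are positive and uses synthetic, noise-free input generated from the lag-0 values quoted above; the slides do not say how the original fit was made, so this is an illustration rather than the method used. It is written in Python, whereas the one-line function on the slide is R.

```python
import math

def fit_exponential_decay(distances, correlations):
    """Least squares fit of corr = a * exp(-b * x) via a linear fit to log(corr).
    Requires strictly positive correlations."""
    logs = [math.log(r) for r in correlations]
    n = len(distances)
    x_mean = sum(distances) / n
    y_mean = sum(logs) / n
    sxx = sum((x - x_mean) ** 2 for x in distances)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(distances, logs))
    slope = sxy / sxx
    a = math.exp(y_mean - slope * x_mean)
    b = -slope
    return a, b

# Synthetic check data (assumed), generated from a = 0.9162, b = 0.0005.
distances = [50.0, 100.0, 250.0, 500.0, 1000.0, 2000.0]
correlations = [0.9162 * math.exp(-0.0005 * d) for d in distances]
a_fit, b_fit = fit_exponential_decay(distances, correlations)
```

With real station pairs the points scatter around the curve, and a weak lag-1 relationship would show up as a poor straight-line fit in log space.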

(43)

Obtaining realistic

spatially correlated shocks

Z_t = A_t - Φ_1 A_{t-1}

Shock terms need to follow the observed distribution and be spatially correlated

● Create global networks and neighbourhoods across the globe

● Create covariance for each network+neighbourhood using distance as a proxy

● Run a factorisation algorithm to create global fields of spatially correlated shocks that are MVN
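The last step above can be illustrated with a Cholesky factorisation, which is one common choice; the slides do not name the factorisation actually used, so treat this as an assumption. Given a covariance matrix for a small neighbourhood, multiplying its lower-triangular factor by independent standard normal draws gives multivariate normal shocks with that covariance. The three-station covariance matrix is hypothetical.

```python
import math
import random

def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov must be symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L

def correlated_shocks(cov, n_times, seed=0):
    """Draw n_times vectors of spatially correlated Gaussian shocks."""
    rng = random.Random(seed)
    L = cholesky(cov)
    n = len(cov)
    shocks = []
    for _ in range(n_times):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent draws
        shocks.append([sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)])
    return shocks

# Hypothetical covariance for three neighbouring stations (e.g. from a distance proxy).
cov = [[1.0, 0.8, 0.5],
       [0.8, 1.0, 0.6],
       [0.5, 0.6, 1.0]]
shocks = correlated_shocks(cov, n_times=5000, seed=1)
```

Working neighbourhood by neighbourhood keeps each covariance matrix small enough to factorise, which matters when the full network has tens of thousands of stations.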


(45)

The Random Shock Part:

Spatially Correlated


(47)

The VAR Part:

Spatio-temporally Correlated

Station 1 t-1

Station 2 t-1

Station 1 t=0

(48)

Results: Station AR(1)

correlation

Station

Autocorrelation at lag 1:

(49)

Results: Station to

Neighbours cross-correlation

Station Cross correlation with 40 nearest neighbours:

CLEAN too high compared to REAL world

Real world contaminated by random error, inhomogeneity, missing data.

(50)

Results

Mean SD and AR(1) Corr of

(51)

Remaining Issues

St Dev of Difference series too low: Compare with USCRN which is clean but very short

AR(1) of station is too high: adjust distance function

Cross-correlations possibly too high: adjust distance function (maybe vary with latitude?)

Have not used all 30000+ stations: figure out how

Errors

www.surfacetemperatures.org
