• No results found

Efficient Curve Fitting Techniques

N/A
N/A
Protected

Academic year: 2021

Share "Efficient Curve Fitting Techniques"

Copied!
13
0
0

Loading.... (view fulltext now)

Full text

(1)

Life Conference and Exhibition 2011 Life Conference and Exhibition 2011 Stuart Carroll, Christopher Hursey

Efficient Curve Fitting

© 2010 The Actuarial Profession www.actuaries.org.uk

Efficient Curve Fitting Techniques

20-22 November

Agenda

• Background

• Outline of the problem and issues to consider

• The solution

• Theoretical justification

• Further considerations

• Practical issues

• Outcome

• Questions or comments

1

(2)

We wished to use an internal model running in excess of one million scenarios

scenarios

Many different products And many different risk factors

It was not feasible to evaluate liabilities accurately in every scenario

Therefore the decision was taken to formula fit liability values Use polynomial approximation formulae

Fitted to a limited number of evaluation points

We need to capture all material risk dependenciesWe need to capture all material risk dependencies

Each risk factor leads to a marginal risk function in one variable Non-linearity between risk factors leads to non linearity functions in

two or more variables

Marginal risk functions and non linearity functions are summed to give the final approximation formula.

2

Outline of the problem and issues to consider

Requirements

• Repeatable process that is objective

• Measurable performance

• Risk management exercise, not just compliance – We wish to model the full distribution

• Sufficient number of sample points for accuracy – Implies more sample points

• Reasonable run times

• Reasonable run times

– Implies fewer sample points

• Need to resolve the tension between the previous two points.

(3)

Outline of the problem and issues to consider

Testing and Validation

• Measures of goodness of fit – Least squares

– Maximum absolute error – Error dependency

– Error Bias

• Out of sample testing

– How many to be statistically significant

• Analysis of change

• Understanding the results.

4

Outline of the problem and issues to consider

Other considerations

• Determining the limits of the model

• Range for each risk factor

– Allow for movements in roll-forward – Four standard deviations?

• What about plan B?

– Must distinguish between analysis phase and

d ti h

production phase

– What if goodness of fit fails once in production?

5

(4)

Lapses Lapses

A simple example

• Consider the variation in liabilit al e ith respect

PV LiabilitiesPV Liabilities

liability value with respect to a single risk factor

• The least squares

quadratic is fitted using all evaluation points

• Initial fit is reasonable but uses too many calculations

-80 -60 -40 -20 0 20 40 60 80

Percent Actual Results

-80 -60 -40 -20 0 20 40 60 80

Percent

Actual Results Approxim ation Results

uses too many calculations to be feasible for the

production phase

• How do we reduce the number of fitting points?

6

The Solution

The Error Curve

• The error curve is cubic Approximation Errors - All points 8

• Three roots at three points of intersection

• Those three points define a unique quadratic function

• Fit to the three points of intersection only

-8 -6 -4 -2 0 2 4 6

-80% -60% -40% -20% 0% 20% 40% 60% 80%

Percent

Error (£m)

Approximation Errors - Three points

6 8

intersection only

same approximation function same error curve

• No loss in performance for greatly reduced effort.

-8 -6 -4 -2 0 2 4

-80% -60% -40% -20% 0% 20% 40% 60% 80%

Percent

Error (£m)

(5)

The Solution

Implications

• In general, we wish to approximate our unknown function by an (n-1)-order polynomial

• The least squares (n-1)-order polynomial approximation will intersect the unknown function n times

• If those n points of intersection can be determined, they provide the optimal fitting points, or “nodes” for our approximation functionpp

• We can achieve as good a fit using n nodes as using an infinite number of nodes

• Similarly, we can determine the points of maximum error

Allows us to estimate maximum error

more powerful than estimating maximum sample error. 8

The Solution

More points is not necessarily better

• What happens when we add points to try and improve the fit?

• The error curve changes

Maximum error has increased

Sum squared error is worse

Approximation Errors - Three points

-4 -2 0 2 4 6 8

-80% -60% -40% -20% 0% 20% 40% 60% 80%

Error (£m)

Approximation Errors - Five points

-4 -2 0 2 4 6 8

-80% -60% -40% -20% 0% 20% 40% 60% 80%

Error (£m)

• Adding more points has led to a deterioration in the fit

Why?

9

-8 -6

Percent -8 -6

Percent

(6)

Rationale

• This is not a regression problem but an approximation problem

problem

• For a given order of polynomial approximation function there exists a unique optimal error curve, i.e. one that deviates least from zero in the least squares sense

• Identifying solutions to the optimum error curve allows us to use the absolute minimum number of nodes that are required to uniquely define the optimum approximation required to uniquely define the optimum approximation function

• Adding further nodes shifts the error curve and, by definition, results in a sub-optimal approximation.

10

The Solution

Determining the fitting points

• We introduce a single assumption: our unknown function can be accurately represented by an n-order polynomial

• Powers of (n+1) and above are vanishingly small

• Under this assumption the n points of intersection depend only on the range, or “domain”, over which the least squares best fit is performed

• Similarly the turning points or points of maximum error

• Similarly, the turning points, or points of maximum error, also depend only on the domain

• Usefully, all our points of interest are now independent of the unknown function and can be determined analytically.

(7)

We wish to approximate a liability function with a quadratic (n=3)

The Solution

An example

Quadratic fit to cubic

25000

function with a quadratic (n 3)

Assuming powers of four and above are immaterial, we are attempting to

approximate an unknown cubic

the least squares quadratic best fit to any cubic, intersects that cubic at three points

0 5000 10000 15000 20000

-80 -60 -40 -20 0 20 40 60 80

)

(L L (L L) 3(L L)

Errors

1000 1500 2000

12 2

) ( 2 1 1

L P L

2 ) ( 5 3 2

)

( 2 1 2 1

3 , 2

L L L

P L

where L1and L1represent the upper and lower limits of the domain.

and has turning points

2 ) ( 5 1 2

)

( 2 1 2 1

2 , 1

L L L

M L

-2000

-1500 -1000 -500 0 500

-80 -60 -40 -20 0 20 40 60 80

The Solution

More examples

Quadratic least squares fit using 1,200 points and the resulting error curves

Quadratic fit to cubic 12000

Quadratic fit to cubic 15000

Quadratic fit to cubic 30000

-2000 0 2000 4000 6000 8000 10000

-80 -60 -40 -20 0 20 40 60 80

-10000 -5000 0 5000 10000

-80 -60 -40 -20 0 20 40 60 80

0 5000 10000 15000 20000 25000

-80 -60 -40 -20 0 20 40 60 80

Errors

1500 2000

Errors

1500 2000

Errors

1500 2000

13

Points of intersection do not depend on the function.

-2000 -1500 -1000 -500 0 500 1000

-80 -60 -40 -20 0 20 40 60 80

-2000 -1500 -1000 -500 0 500 1000

-80 -60 -40 -20 0 20 40 60 80

-2000 -1500 -1000 -500 0 500 1000

-80 -60 -40 -20 0 20 40 60 80

(8)

Implications

• We can make an assumption about the order of polynomial f

that accurately represents the unknown function

• Given this assumption, the nodes that give optimum least squares fit can be determined without any prior analysis or knowledge of the unknown function

• Goodness of fit tests will fail for one of two reasons – a good enough fit is not possible ora good enough fit is not possible, or

– the initial assumption is invalid

• In either case. simply revise the assumption and try again

• As long as the initial assumption holds, once identified, the fitting nodes and points of maximum error remain fixed.

14

Theoretical Justification

Weierstrauss Approximation Theorem - Any continuous function on a closed and bounded interval can be approximated on that interval by polynomials to any degree of accuracy

The optimum nodes can be shown to correspond to the roots of the Legendre polynomials for all n

Properties of Legendre polynomials Orthogonal

Series convergence on [a, b]

p y y g y

Minimum deviation from zero

We effectively use a truncated Legendre series

) ( )

( )

(

1

0

x L a x L a x

f n n

n

k k k

(9)

Further Considerations

Non Linearity

The same theory can be extended and applied to non

Lapses

120 140

Interest Rates

180 200

extended and applied to non linearity functions in two or more variables

Construct a combined risk surface by adding marginal risk functions

Compare with actual combined risk surface to evaluate non li it

0 20 40 60 80 100

-50%

-40.00% -30.00%

-20.00% -10.00%

0.00% 10.00%

20.00

% 30.00%

40.00

% 50.00%

0 20 40 60 80 100 120 140 160

-3% -2.400%

-1.800% -1.200%

-0.600% 0.000%

0.600%

1.200% 1.800%

2.400% 3.000%

150 200 250 Combined Risk Surface from Marginals

150 200 250 Actual Combined Risk Surface

linearity

If there is no non linearity between two risk factors then there should be no cross terms in the approximation formula.

16

-50%

-25.00%

0.00%

25.00%

50.00%

-3%-1.500%0.000%1.500%

3.000%

0 50 100

-50%

-25.00%

0.00%

25.00%

50.00%

-3% -1.500%0.000%1.500%3.000%

0 50 100

Further Considerations

Non Linearity

Non-linearity is the difference between the combined impact of

60 Non linearity surface

between the combined impact of two or more risk factor and the sum of those same risk factors.

Using a combination of terms in xy, x2y, xy2and x2y2, we can construct a two factor 2ndorder polynomial approximation

- 3 -2.1 -1.2 -0.3 0.6 1.5 2.4

-40 - 30 - 20 -10 0 10 20 30 40 50

Approxim ation

17

Least squares fit found from 14,641 nodes.

-50%-35.00%-20.00% -5.00% 10.00% 25.00% 40.00% -3%-1.800%-0.600%0.600%1.800%3.000%

-30 -20 -10 0 10 20 30

-50%-35.00%-20.00% -5.00% 10.00% 25.00% 40.00% -3%-1.800%-0.600%0.600%1.800%3.000%

-8 -6 -4 -2 0 2 4 6 8

-50%-35.00%-20.00% -5.00% 10.00% 25.00% 40.00% -3%-1.800%-0.600%0.600%1.800%3.000%

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5

-3 -2.1 -1.2 -0.3 0.6 1.5 2.4

-40 -30 -20 - 10 0 10 20 30 40 50 60

(10)

Error Surface Error Surface

Non Linearity

Optimal fitting points are given by the intersection of the non

20 -15 -10 -5 0 5 10 15 20 25 30 35 40 45 50 55 60

20 -15 -10 -5 0 5 10 15 20 25 30 35 40 45 50 55

by the intersection of the non- 60

linearity surface and the least squares non linearity

approximation function

No unique solution

It can be shown that the intersection of the one factor

l ti id l ti t

-3.00 -2.75 -2.50 -2.25 -2.00 -1.75 -1.50 -1.25 -1.00 -0.75 -0.50 -0.25 0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75 3.00 -60 -55 -50 -45 -40 -35 -30 -25 -20

-3.00 -2.75 -2.50 -2.25 -2.00 -1.75 -1.50 -1.25 -1.00 -0.75 -0.50 -0.25 0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75 3.00 -60 -55 -50 -45 -40 -35 -30 -25

solutions provide one solution to -20

the two factor problem

Same approximation function results from fitting to four nodes as from fitting to 14,641 nodes.

18

Further Considerations

Non Linearity

• Can show that the roots to the Legendre polynomials provide optimum nodes single factor polynomials of any order

• Can also show that the intersection of the single factor solutions provide optimum nodes to the following:

– Two factor 2ndorder polynomials – Three factor 2Three factor 2ndorder polynomialsorder polynomials – Two factor 3rdorder polynomials

• Attempts at a general proof for multifactor polynomials have so far been unsuccessful.

(11)

Further Considerations

Constrained solutions

Errors 1500

Consider a quadratic least squares fit over a non-symmetrical domain

Quadratic fit to cubic

25000

-1500 -1000 -500 0 500 1000

-60 -50 -40 -30 -20 -10 0 10 20 30 40

Resulting error at zero may be undesirable

Can solve directly with constraint that error at zero is nil

0 5000 10000 15000 20000

-60 -50 -40 -30 -20 -10 0 10 20 30 40

Quadratic fit to cubic Errors

20

Resulting fit is optimal given the constraint.

Quad at c t to cub c

0 5000 10000 15000 20000 25000

-60 -50 -40 -30 -20 -10 0 10 20 30 40

Errors

-1500 -1000 -500 0 500 1000 1500

-60 -50 -40 -30 -20 -10 0 10 20 30 40

Quadratic fit to cubic

15000 20000 25000

Errors

500 1000 1500 2000 Quadratic fit to cubic

15000 20000 25000

Errors

500 1000 1500 2000 Quadratic fit to cubic

15000 20000 25000

Errors

500 1000 1500 2000 Quadratic fit to cubic

15000 20000 25000

Errors

500 1000 1500 2000

Further Considerations

What about Plan B?

• Minimise the maximum C

0 5000 10000

-80 -60 -40 -20 0 20 40 60 80 -2000

-1500 -1000 -500 0

-80 -60 -40 -20 0 20 40 60 80

0 5000 10000

-80 -60 -40 -20 0 20 40 60 80 -2000

-1500 -1000 -500 0

-80 -60 -40 -20 0 20 40 60 80

0 5000 10000

-80 -60 -40 -20 0 20 40 60 80 -2000

-1500 -1000 -500 0

-80 -60 -40 -20 0 20 40 60 80

0 5000 10000

-80 -60 -40 -20 0 20 40 60 80 -2000

-1500 -1000 -500 0

-80 -60 -40 -20 0 20 40 60 80

error using Chebyshev nodes

• Adjust the domain to use nested and coincident nodes

• Use a Dampening function

Same domain

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

Same domain - no coincedent nodes

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

Different domains - 2 nodes coincident

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

3rd Order Legendre 4th Order Legendre

p g

Error curve is next term in Legendre series

Coefficient is the error at the extreme.

21

3rd Order Legendre

3rd Order Legendre 4th Order Legendre

(12)

• Stochastic values

Approximation error must be minimised

Take the average of two results close to, and equidistant from, each node

Similarly for each test point

• Error accumulation

The errors in marginal risk functions combine to form an error f

surface

These errors may accumulate to exceed materiality limits Lower materiality limits in the marginal risk functions

• Fit to the true non linearity surface

Adjust for marginal errors in non linearity before fitting.

22

Outcome

• Efficiency maximised

Fit exactly n points for n formula coefficients

• Process is repeatable and objective

Based on theories widely accepted and used in engineering, physics and animation

• Less reliance on samples for performance measurement

Maximum error is targeted and measuredg

Model limitations can be determined with some confidence

• Better understanding of results

Fitting errors are predictable and explainable

• Greater confidence in the internal model results.

(13)

Questions or comments?

Expressions of individual views by members of The Actuarial Profession and its staff are encouraged.

The views expressed in this presentation are those of the presenter.

© 2010 The Actuarial Profession www.actuaries.org.uk 24

References

Related documents

Temeljni je dio rada prikaz funkcioniranja grupa solidarne razmjene u Hrvatskoj i usporedba socio-ekonomskih obilježja članova grupa i poljoprivrednika koji ih

Online community: A group of people using social media tools and sites on the Internet OpenID: Is a single sign-on system that allows Internet users to log on to many different.

 $he am!litude and linearity o" the transducer out!ut sinal are usually not "irst% e usually not "irst% order concerns in transducer desin& but t.. order concerns

Subexponential tails of discounted aggregate claims in a time- dependent renewal risk model.. The subexponential product convolution of two

Activation of TrkB receptors by brain-derived neurotrophic factor (BDNF) followed by MAPK/ERK signaling increases dendritic spine density and the proportion of mature spines

1. The first step called pre-processing and runs different tasks to enhance the input fingerprint image. The next step deals with the extraction of minutiae. In the third step

Previous studies have considered only medical or direct expenditure while calculating the out-of-pocket expenditure on maternity care even though the indirect or non-medical costs

The purpose of this study was to evaluate the diagnostic utility of real-time elastography (RTE) in differentiat- ing between reactive and metastatic cervical lymph nodes (LN)