Derivatives of complex models

2.3.1 Adjoint method

An outline of the adjoint method is given here, following the format of Hascoët et al. (2005). We have a model, η(·), which takes inputs, x, and returns outputs, y. The model has p inputs and r outputs in total. If we require the partial derivatives of all the outputs of the model with respect to all the inputs, then the Jacobian matrix is required:

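$$
\text{Jacobian} = \frac{\partial y}{\partial x} =
\begin{pmatrix}
\dfrac{\partial y^{(1)}}{\partial x^{(1)}} & \cdots & \dfrac{\partial y^{(1)}}{\partial x^{(p)}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial y^{(r)}}{\partial x^{(1)}} & \cdots & \dfrac{\partial y^{(r)}}{\partial x^{(p)}}
\end{pmatrix}. \qquad (2.8)
$$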

The model, η(·), is complex and therefore likely to be composed of multiple subroutines. The subroutines themselves are then generally made up of smaller routines and eventually we are left with elementary functions on individual lines of code, which, assuming they are differentiable, can be differentiated by applying the chain rule. We denote these elementary functions by f₁, . . . , f_K where f₁ is the first function executed when we run η(·) and f_K is the last. We can therefore express the model η(·) in the following way:

$$
\eta(x) = f_K \circ f_{K-1} \circ \cdots \circ f_1(x), \qquad (2.9)
$$

and so we have K elementary functions in total which make up the model. Function f₁ will act on, and therefore likely affect, the input variables, x, and so we denote by z_k the state vector: the vector of all variable values after the first k functions have been executed. In this way we set z_0 = x, the x vector remains fixed, and we have z_k = f_k(z_{k−1}). We use this notation for the description of the method only; in practice z_k is not stored but overwritten by z_{k+1}. This is particularly important for the reverse mode, which we discuss later in this section.
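The composition in (2.9) and the overwriting of the state vector can be sketched in a few lines. This is a minimal illustration only: the three elementary functions `f1`, `f2`, `f3` and the two-element state are hypothetical and are not taken from any model discussed in the source.

```python
import math

# Hypothetical elementary functions f_1, f_2, f_3 (K = 3), each mapping
# the previous state z_{k-1} to the next state z_k. They are assumptions
# made for illustration, not part of the original text.
def f1(z):
    return [z[0] * z[1], z[1]]

def f2(z):
    return [math.sin(z[0]), z[1] ** 2]

def f3(z):
    return [z[0] + z[1], z[0] * z[1]]

ELEMENTARY = [f1, f2, f3]

def eta(x):
    """Run the model eta(x) = f_K o ... o f_1(x)."""
    z = list(x)        # z_0 = x; copying keeps the input vector x fixed
    for f in ELEMENTARY:
        z = f(z)       # z_k overwrites z_{k-1}; earlier states are lost
    return z

x = [0.5, 2.0]
y = eta(x)
```

After the loop only the final state survives, which is exactly why the reverse mode has to either store or recompute the intermediate states.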

To differentiate the whole model with respect to x and obtain η′(x), the chain rule can then be applied to (2.9) as follows:

$$
\eta'(x) = f_K'(z_{K-1}) \cdot f_{K-1}'(z_{K-2}) \cdots f_1'(z_0 = x). \qquad (2.10)
$$

Each f_k′(z_{k−1}) is an intermediate Jacobian matrix of partial derivatives. The resulting final Jacobian matrix, η′(x), can be very large, as it contains the derivatives of all the outputs with respect to all the inputs. Often, however, a model user may only be interested in the derivatives of a subset of outputs with respect to the most active variables; in this case the model can be differentiated so that only specific elements of the Jacobian matrix are returned, which results in a more efficient differentiated model.
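As a rough sketch of the chain-rule product in (2.10), the code below accumulates the full Jacobian of a hypothetical three-function toy model by left-multiplying the intermediate Jacobians as the model runs. The functions `f1`, `f2`, `f3`, their hand-written Jacobians `J1`, `J2`, `J3`, and the helper `matmul` are all assumptions made for illustration.

```python
import math

# Hypothetical elementary functions and their hand-derived Jacobians.
def f1(z): return [z[0] * z[1], z[1]]
def f2(z): return [math.sin(z[0]), z[1] ** 2]
def f3(z): return [z[0] + z[1], z[0] * z[1]]

def J1(z): return [[z[1], z[0]], [0.0, 1.0]]
def J2(z): return [[math.cos(z[0]), 0.0], [0.0, 2.0 * z[1]]]
def J3(z): return [[1.0, 1.0], [z[1], z[0]]]

STEPS = [(f1, J1), (f2, J2), (f3, J3)]

def matmul(A, B):
    """Plain matrix-matrix product of two nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def full_jacobian(x):
    """Return (y, eta'(x)) by accumulating f_K' ... f_1' as in (2.10)."""
    z = list(x)
    n = len(z)
    J = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for f, jac in STEPS:
        J = matmul(jac(z), J)   # left-multiply by f_k'(z_{k-1})
        z = f(z)                # then advance the state to z_k
    return z, J

y, J = full_jacobian([0.5, 2.0])
```

Every step here is a matrix-matrix product, which is the cost that the tangent and adjoint modes avoid by propagating a single vector instead.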

If we require the derivatives of all r outputs with respect to one of the inputs, i, i.e. the i^th column of the full Jacobian matrix (2.8), we can apply the tangent mode:

$$
\frac{\partial y}{\partial x^{(i)}}
= \eta'(x = z_0)\,\frac{\partial x}{\partial x^{(i)}}
= f_K'(z_{K-1}) \cdot f_{K-1}'(z_{K-2}) \cdots f_1'(z_0)\,\frac{\partial x}{\partial x^{(i)}}. \qquad (2.11)
$$

Multiplying a matrix by a vector is computationally cheaper than multiplying a matrix by a matrix, and so equation (2.11) should be computed from right to left. This is straightforward, as the state of the model after the first function, f₁, is required before the state of the model after the second function, f₂, and so on. This is known as the tangent linear model and can be applied alongside the running of the standard calculations in η(·). For this reason, the tangent mode is also known as the forward mode. If we require the derivatives of all r outputs with respect to multiple inputs, the tangent mode can be applied in multiple directions, in which case it is called the tangent multidirectional mode. This can be done in one source transformation of the original code. The computational cost of the tangent mode is therefore proportional to the number of inputs with respect to which derivatives are required.
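The right-to-left evaluation of (2.11) can be sketched as follows, assuming a hypothetical three-function toy model with hand-written Jacobians (`f1`..`f3`, `J1`..`J3`, `matvec` and `tangent` are illustrative names, not from the source). The seed vector ∂x/∂x⁽ⁱ⁾ is the i-th unit vector, and it is pushed through each intermediate Jacobian alongside the ordinary model run.

```python
import math

# Hypothetical elementary functions and their hand-derived Jacobians.
def f1(z): return [z[0] * z[1], z[1]]
def f2(z): return [math.sin(z[0]), z[1] ** 2]
def f3(z): return [z[0] + z[1], z[0] * z[1]]

def J1(z): return [[z[1], z[0]], [0.0, 1.0]]
def J2(z): return [[math.cos(z[0]), 0.0], [0.0, 2.0 * z[1]]]
def J3(z): return [[1.0, 1.0], [z[1], z[0]]]

STEPS = [(f1, J1), (f2, J2), (f3, J3)]

def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def tangent(x, i):
    """Return (y, dy/dx^(i)): column i of the Jacobian (0-based index)."""
    z = list(x)
    dz = [1.0 if k == i else 0.0 for k in range(len(z))]  # seed dx/dx^(i)
    for f, jac in STEPS:
        dz = matvec(jac(z), dz)  # needs z_{k-1}, which is still available
        z = f(z)                 # only now is z_{k-1} overwritten by z_k
    return z, dz

y, col0 = tangent([0.5, 2.0], 0)
_, col1 = tangent([0.5, 2.0], 1)
```

One call yields one column of the Jacobian, so obtaining derivatives with respect to several inputs takes one such sweep per input, in line with the cost statement above.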

An alternative method to the tangent linear model is the adjoint method.

An adjoint model is defined as the transpose of the Jacobian matrix (Marotzke et al., 1999). If we require the derivatives of one output, j, with respect to all p inputs, i.e. the j^th row of the full Jacobian matrix (2.8), then we can use the adjoint method and the transpose of η′(x):

$$
\frac{\partial y^{(j)}}{\partial x}
= \bigl(\eta'(x = z_0)\bigr)^T \frac{\partial y}{\partial y^{(j)}}
= \bigl(f_1'(z_0)\bigr)^T \bigl(f_2'(z_1)\bigr)^T \cdots \bigl(f_K'(z_{K-1})\bigr)^T \frac{\partial y}{\partial y^{(j)}}. \qquad (2.12)
$$

As with the tangent mode, computing (2.12) from right to left is much more efficient than from left to right, but now the state of the model after the second function, f₂, is required before the state of the model after the first function, f₁. Hence this method is called the reverse mode. We have used the notation z_k = f_k(z_{k−1}) but, as discussed earlier in this section, in the model itself executing f_k will cause z_k to overwrite z_{k−1}. To calculate the derivatives as in (2.12), the first step is to calculate (f_K′(z_{K−1}))^T, and to do this we require the state vector after K − 1 functions have been executed: z_{K−1}. To generate this we therefore need to execute z_{K−1} = f_{K−1} ∘ . . . ∘ f₁(z₀). We would expect that the function output, y, will be required in addition to the derivatives, so there is no additional computational expense in executing these functions here.

The next step in the calculation of (2.12) requires z_{K−2} and, although this has already been calculated, it has since been overwritten by z_{K−1}. Therefore either the standard f functions must be computed again, up to f_{K−2}, or z_{K−2} must be stored when it is first calculated so that it can be recalled when required. The same choice arises for each earlier state z_k, working back to z₀. If the latter option is chosen then the computational time is lower, but the memory and storage requirements are greater and can in some cases cause problems. Regardless of whether the forward or reverse mode is adopted, the computing resource required to run such a model is greater than that required by its corresponding simulator, which is the standard version of the model.
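The two options described above, storing the intermediate states or recomputing them, can be sketched on a hypothetical three-function toy model. All names here (`f1`..`f3`, `J1`..`J3`, `mat_t_vec`, `run_to`, `adjoint_store`, `adjoint_recompute`) are assumptions made for illustration and do not correspond to any particular automatic differentiation tool.

```python
import math

# Hypothetical elementary functions and their hand-derived Jacobians.
def f1(z): return [z[0] * z[1], z[1]]
def f2(z): return [math.sin(z[0]), z[1] ** 2]
def f3(z): return [z[0] + z[1], z[0] * z[1]]

def J1(z): return [[z[1], z[0]], [0.0, 1.0]]
def J2(z): return [[math.cos(z[0]), 0.0], [0.0, 2.0 * z[1]]]
def J3(z): return [[1.0, 1.0], [z[1], z[0]]]

STEPS = [(f1, J1), (f2, J2), (f3, J3)]

def mat_t_vec(A, v):
    """Transposed matrix-vector product A^T v."""
    return [sum(A[r][c] * v[r] for r in range(len(A)))
            for c in range(len(A[0]))]

def run_to(x, k):
    """Recompute z_k = f_k o ... o f_1(x) from scratch."""
    z = list(x)
    for f, _ in STEPS[:k]:
        z = f(z)
    return z

def adjoint_store(x, j):
    """Reverse mode, storing z_0 .. z_{K-1} during the forward sweep."""
    z, stored = list(x), []
    for f, _ in STEPS:
        stored.append(z)         # keep z_{k-1} before it is replaced
        z = f(z)
    bar = [1.0 if k == j else 0.0 for k in range(len(z))]  # dy/dy^(j)
    for (_, jac), z_prev in zip(reversed(STEPS), reversed(stored)):
        bar = mat_t_vec(jac(z_prev), bar)
    return z, bar

def adjoint_recompute(x, j):
    """Reverse mode, recomputing each z_{k-1} from z_0 when needed."""
    K = len(STEPS)
    y = run_to(x, K)
    bar = [1.0 if k == j else 0.0 for k in range(len(y))]
    for k in range(K, 0, -1):
        z_prev = run_to(x, k - 1)            # extra forward work
        bar = mat_t_vec(STEPS[k - 1][1](z_prev), bar)
    return y, bar

y, row0 = adjoint_store([0.5, 2.0], 0)
_, row0_again = adjoint_recompute([0.5, 2.0], 0)
```

Both functions return row j of the Jacobian. The first holds K state vectors in memory, while the second holds one at a time but repeats the forward calculations, which reflects the time against storage trade-off described in the text.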

Both modes can be applied through the use of automatic differentiation, assuming that the model, described by a computer program, is written in a high-level programming language such as Fortran. For a comprehensive account of automatic differentiation see Griewank (2003). Both the tangent and reverse modes calculate partial derivatives with respect to model inputs, and so throughout the remainder of this document an adjoint model will simply refer to a differentiated model. When such an adjoint model is run, it produces the partial derivatives with respect to the model inputs in addition to the standard model output.

In document Using derivative information in the statistical analysis of computer models (Page 31-34)