The theory in Chapter 2 gives the basis for implementing an automatic Legendre transform. The Legendre transform is in itself relatively straightforward; the major challenge lies in generating the gradients. The gradient of the Legendre transform is also needed in order to transform more than one variable set. Note that any gradient of the Legendre transform depends on the gradients of the untransformed function to the same order of differentiation. The gradient operations are easily achieved by the procedure described in the previous section.
It is certainly desirable to generate the result within the grammar described in Chapter 6, but this is unfortunately not possible because the gradient of the Legendre transform requires the inverse of a matrix. The grammar has no concept of a (matrix) inverse, and it would be impossible to introduce a general inversion procedure into the grammar because of its multidimensional nature. The problem can, however, be avoided by generating the inverse outside the grammar and then using the result in the context of RGrad. It must be stressed that this is a special operation provided in order to facilitate the gradient of the Legendre transform, and it should not be considered a part of RGrad or of the grammar itself. The operation takes place in the background, and the inverse method is not available to the end user. For practical purposes, the distinction is not visible in the implementation, but it requires a few hacks that are not welcome to the programmer.
The first and second order gradients do not fall into the same calculation scheme as the third and higher order gradients. This is because of certain simplifications, which indirectly affect the higher order gradients as well. The simplifications can potentially reduce the runtime of the final code by one order of magnitude. Since these simplifications are so important, and the number of expressions is quite low, all the first and second order gradients are always calculated, regardless of what the user actually needs. The first argument to the legendre method is the variable to transform, and because of the pre-calculation of the first and second order gradients, all variables that are used in the gradient calculations at a later stage must be given as arguments as well.
Even though the first and second order gradients are made ready in the background, they are not added to the graph, which means that the code required to calculate these gradients is not generated unless the user explicitly asks for it. The transformed expressions behave just like ordinary expressions, with one important exception: they cannot be transformed with respect to the same variable again. Inverse Legendre transforms are therefore not possible. Transforming a function twice with respect to the same variable should give back the original function, but this will not happen here. The reason is that the variable transformations are internally based upon the original variables only. The Legendre transform is required to find the gradient of the function with respect to the transformed variable, and this gradient is multiplied by the variable. In a repeated transform, the gradient calculation will work, but the variable will not be the transformed variable, and the result will therefore be erroneous. Unfortunately, it is not possible to prevent this from happening within the existing code; see Section 7.2.3.
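Rule 2.23:

$$
\begin{aligned}
g_{\sigma\sigma\sigma} f_{rr} &= g_{r\sigma\sigma} \\
g_{r\sigma\sigma} f_{rr} + g_{\sigma\sigma} f_{rrr} &= 0 \\
g_{\sigma\sigma\sigma} f_{rr} f_{rr} + g_{\sigma\sigma} f_{rrr} &= 0 \\
g_{\sigma\sigma\sigma} f_{rr} f_{rr} - f_{rr}^{-1} f_{rrr} &= 0
\end{aligned}
$$

Combine and order:

$$
g_{\sigma\sigma\sigma} = f_{rr}^{-1} f_{rrr} f_{rr}^{-1} f_{rr}^{-1}
$$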
The equations shown above come from Appendix A, and show that the matrix inverse $f_{rr}^{-1}$ arises from solving a system of linear equations. This system of equations is lower triangular and is therefore quite trivial to solve, but it is impossible to calculate the gradient analytically because of the inverse. Implicit differentiation must therefore be used, and the system of equations must then be solved afterwards. This gives some extra challenges in creating an algorithm for providing the gradients. The solution is to avoid solving the expression before it is exported. By doing so, the inverse does not cause problems in the differentiation. It is, however, important to keep track of the expression, so that solving it afterwards does not pose a problem. General implicit expressions are not created, because the expressions at hand have a simple structure and are therefore easy to solve. There are no intentions of implementing implicit expressions in RGrad at this point; the expressions created here are therefore treated with special code made for the sole purpose of generating gradients of Legendre transforms.
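As a sanity check of the third-order result, the scalar case can be worked out by hand. The example below is only an illustration: the function $f(r) = e^{r}$ and the sign convention $g = f - \sigma r$ with $\sigma = f_r$ are assumptions, chosen so that $g_{\sigma\sigma} = -f_{rr}^{-1}$ as used in the derivation above.

```latex
\begin{aligned}
f(r) &= e^{r}, \qquad \sigma = f_r = e^{r}, \qquad r = \ln\sigma \\
g(\sigma) &= f - \sigma r = \sigma - \sigma\ln\sigma \\
g_{\sigma} &= -\ln\sigma = -r \\
g_{\sigma\sigma} &= -\sigma^{-1} = -f_{rr}^{-1} \\
g_{\sigma\sigma\sigma} &= \sigma^{-2}
  = e^{-r}\, e^{r}\, e^{-r}\, e^{-r}
  = f_{rr}^{-1} f_{rrr} f_{rr}^{-1} f_{rr}^{-1}
\end{aligned}
```

For a scalar the ordering of the factors is irrelevant; in the matrix case the ordering is assumed to determine which indices are contracted.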
By looking at the expressions derived in Chapter 2 and Appendix A, the expressions are seen to be divided into three different parts. The first part is the term which is multiplied by the solution of the implicit expression. This is either the number 1, in which case the solution is trivial, or the second order gradient, which is a symmetric matrix with the same size as the element used in the differentiation. The rest of the expression is ordered by the sign of the elements and divided into two parts. All terms with the same sign as the main term will get a negative sign when the expression is solved, whilst the other terms will get a positive sign.
The three parts are each stored in arrays. The first array will at most include one object, but it is nevertheless given the same structure as the other arrays. The terms involved in the expression are either a sum of different terms, or an inner product between two, and only two, elements. Since the expressions have such a simple structure, a simple procedure using sub-arrays is implemented to collect the terms of the expression. The objects on the first level of the array are added together. On the second level, there will be an inner product. This structure is also maintained during differentiation. As an example, the expression
$$
f\,\alpha = \beta + \gamma - \delta - \varepsilon\phi \tag{7.1}
$$
is stored this way:
[α]
[β , γ]
[δ , [ε, ϕ]]
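The storage scheme can be illustrated with a minimal Python sketch. It is not taken from RGrad: the function names, the use of strings as element names, and the restriction to scalar values are all assumptions made for the illustration. A nested list of two names stands for an inner product, and an empty first array stands for the trivial coefficient 1.

```python
def evaluate_term(term, values):
    """Evaluate one entry of an array: a single element or a two-element inner product."""
    if isinstance(term, list):
        left, right = term  # an inner product always has exactly two elements
        return values[left] * values[right]
    return values[term]


def solve_implicit(main, plus, minus, values):
    """Solve f * main = sum(plus) - sum(minus) for f (scalar illustration only)."""
    rhs = sum(evaluate_term(t, values) for t in plus)
    rhs -= sum(evaluate_term(t, values) for t in minus)
    coefficient = values[main[0]] if main else 1.0  # empty first array means 1
    return rhs / coefficient


# Equation 7.1 stored as three arrays (hypothetical numeric values).
values = {"alpha": 2.0, "beta": 3.0, "gamma": 5.0,
          "delta": 1.0, "epsilon": 2.0, "phi": 0.5}
f = solve_implicit(["alpha"], ["beta", "gamma"],
                   ["delta", ["epsilon", "phi"]], values)
```

In the real case the first array holds a symmetric matrix, so the division is replaced by an inner product with the inverse.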
Parallel to these arrays, expression equivalents are generated and stored inside the object. Because the derivation of the gradients is different for Legendre transformed expressions, these expressions are used only when output is exported. For each Legendre transformed expression it may be necessary to create several ordinary expressions. The reason is that each inner product must be calculated before the result can be added to the rest of the terms in the expression. Creating these inner products is not straightforward, but since all the elements involved in the expression are created internally, the number of outcomes is kept at a minimum.
The two elements in the inner product have only one non-broadcasted dimension in common. This is the dimension which the sum will be calculated over. The dimension is determined by searching through the dimensions of the two elements.
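The search for the summation dimension can be sketched as follows. How RGrad represents dimensions is not shown here, so the representation below (a dictionary mapping a dimension name to a flag that is True when the dimension is broadcast) and the function name are purely hypothetical.

```python
def summation_dimension(dims_a, dims_b):
    """Return the single dimension that is non-broadcast in both elements.

    Returns None when the elements share no such dimension, in which case
    the inner product reduces to a plain multiplication.
    """
    common = [name for name, broadcast in dims_a.items()
              if not broadcast and name in dims_b and not dims_b[name]]
    if len(common) > 1:
        raise ValueError("inner product is ambiguous: %r" % common)
    return common[0] if common else None


# Hypothetical elements: 'j' is the only dimension that is real in both.
a = {"i": False, "j": False, "k": True}
b = {"j": False, "k": False, "l": True}
dim = summation_dimension(a, b)
```

Raising an error when more than one candidate is found is a design choice of the sketch; it reflects the statement that the two elements have exactly one such dimension in common.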
There is also the possibility that the inner product is a scalar, because the sum given by Rule 2.23 may have no dimensions, and consequently there will be no sum. The λ-function is in this case relatively simple and involves one single multiplication only. All implicit equations must be solved in order to construct the final expression. Instead of broadcasting every term in the expression, an extra expression is constructed. In this expression, the λ-function will consist of only plus and minus, depending on which side of the equal sign the term was in the original expression. The terms are collected accordingly in two separate arrays, and it is relatively easy to construct the λ-function and the expression. The broadcasting of this expression must finally be adjusted to fit the inner product with the inverse in order to complete the final expression.
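The last two steps, forming the signed sum and taking the inner product with the inverse, can be sketched for a small case. The sketch assumes that the implicit equation has the form x · M = b, where M is the symmetric second order gradient and x is the unknown row vector. The function names are hypothetical, and the explicit 2×2 inverse is used only to keep the example self-contained.

```python
def signed_sum(plus_terms, minus_terms, size):
    """Add the terms of the positive array and subtract those of the negative array."""
    total = [0.0] * size
    for term in plus_terms:
        total = [acc + x for acc, x in zip(total, term)]
    for term in minus_terms:
        total = [acc - x for acc, x in zip(total, term)]
    return total


def inverse_2x2(m):
    """Explicit inverse of a 2x2 matrix (illustration only)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def solve_with_inverse(main, plus_terms, minus_terms):
    """Solve x * main = sum(plus) - sum(minus) for the row vector x."""
    n = len(main)
    rhs = signed_sum(plus_terms, minus_terms, n)
    inv = inverse_2x2(main)
    # Inner product of the signed sum with the inverse: x_j = sum_i rhs_i * inv_ij
    return [sum(rhs[i] * inv[i][j] for i in range(n)) for j in range(n)]


x = solve_with_inverse([[2.0, 1.0], [1.0, 3.0]], [[5.0, 10.0]], [[1.0, 2.0]])
```

Multiplying the result by the original matrix reproduces the signed sum, which is a convenient way to check the solution.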
Since all the expressions are made simultaneously, it is possible to ensure that the units of the gradients of the Legendre transform are consistent, and possible errors are reported in the proper place; see Section 7.3. It is also possible to ensure that the dimensions and broadcasting are correct.
7.5.1 Calculating gradients of Legendre transforms
The interface to create gradients of the Legendre transformed expressions is the same as for ordinary expressions, but the interpretation is not the same. As mentioned earlier, the first and second order gradients are implemented by hand, while higher order gradients follow the scheme derived in Chapter 2:
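$$
\begin{aligned}
h_{\sigma} f_{rr}\,\mathrm{d}r &= h_{r}\,\mathrm{d}r & h_{\theta}\,\mathrm{d}t + h_{\sigma} f_{tr}\,\mathrm{d}t &= h_{t}\,\mathrm{d}t \\
h_{r} &= h_{\sigma} f_{rr} & h_{t} &= h_{\theta} + h_{\sigma} f_{tr}
\end{aligned}
$$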
The two different types of expressions originate from Rules 2.23 and 2.24, which are repeated above, and this in turn means that two different methods are required for constructing the gradient. An expression can only be transformed with respect to one variable at a time. This means that if the gradient is calculated with respect to the transformed variable, one method is called, while a different method is needed for gradients with respect to untransformed variables. These methods only apply Rule 2.23 or Rule 2.24; a separate method performs the actual gradient calculations. The main differentiation method successively goes through the three arrays within the object and creates new objects of the correct type. The structure described in the previous section is used both to differentiate and to construct the new objects. Since the expression is implicit, the differentiation will also be implicit. If the first term of Equation 7.1 is present, differentiation gives two new terms according to the product rule of differentiation. As an example, the third-order gradient of U, created from A by transforming T, is shown below. The second order gradient of U with respect to −S and V can be written as:
$$
U_{-SV} A_{TT} = A_{TV}
$$

This will be stored internally as:

$[A_{TT}] \quad [A_{TV}] \quad [\,]$

When $U_{-SV}$ is differentiated with respect to V, this becomes:

$$
U_{-SVV} A_{TT} + U_{-SV} A_{TTV} = A_{TVV}
$$

$[A_{TT}] \quad [A_{TVV}] \quad [[U_{-SV}, A_{TTV}]]$
The first term is of the same kind as the original, while the second term is moved to the right-hand side of the equation and stored as an inner product in the third array. Differentiation of the last two parts of the transformed expressions is straightforward. The only task to consider is to call the correct method for differentiating the transformed expressions.
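The implicit differentiation of the three arrays can be sketched in Python. This is not the RGrad implementation: the class and function names are hypothetical, and the sketch simply appends the differentiation variable to the subscripts of each gradient. It does not model the choice between Rules 2.23 and 2.24.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Grad:
    """A named gradient such as A_TT: a base symbol and its subscripts."""
    base: str
    wrt: Tuple[str, ...] = ()

    def diff(self, var: str) -> "Grad":
        return Grad(self.base, self.wrt + (var,))

    def __str__(self) -> str:
        return self.base + "_" + "".join(self.wrt) if self.wrt else self.base


Term = Union[Grad, Tuple[Grad, Grad]]  # single element or inner product


def diff_term(term: Term, var: str) -> List[Term]:
    """Differentiate one entry; an inner product gives two terms (product rule)."""
    if isinstance(term, tuple):
        left, right = term
        return [(left.diff(var), right), (left, right.diff(var))]
    return [term.diff(var)]


@dataclass
class ImplicitExpr:
    unknown: Grad      # the gradient the expression is solved for
    main: List[Grad]   # first array: at most one coefficient
    plus: List[Term]   # second array: positive after solving
    minus: List[Term]  # third array: negative after solving

    def diff(self, var: str) -> "ImplicitExpr":
        plus = [t for term in self.plus for t in diff_term(term, var)]
        minus = [t for term in self.minus for t in diff_term(term, var)]
        if self.main:
            # Product rule on unknown * main: the term with the old unknown
            # is moved to the right-hand side as an inner product.
            minus.append((self.unknown, self.main[0].diff(var)))
        return ImplicitExpr(self.unknown.diff(var), list(self.main), plus, minus)


# U_-SV * A_TT = A_TV, differentiated with respect to V.
expr = ImplicitExpr(Grad("U", ("-S", "V")), [Grad("A", ("T", "T"))],
                    [Grad("A", ("T", "V"))], [])
d_expr = expr.diff("V")
```

Under these assumptions, `d_expr` holds A_TT in the first array, A_TVV in the second, and the inner product of U_-SV and A_TTV in the third, which matches the arrays shown in the example above.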