The generation of gradients is the most complicated operation performed on the expressions (see Section 6.2). To save computation time, each gradient is therefore computed only once; if the same gradient is required several times, a copy of the previously computed gradient is returned. As explained in Section 6.2, the chain rule requires that the λ-function is differentiated with respect to all of its variables, which generates a new expression for each derivative that does not evaluate to zero. Each new expression must then be multiplied by the corresponding gradient. The result is a sum of products, which in turn gives a new expression representing the gradient.
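The compute-once policy can be illustrated with a minimal Ruby sketch. The class `Node`, its `gradient` method and the placeholder `build_gradient` are hypothetical names chosen for this illustration and are not part of RGrad; the sketch only shows the caching pattern, under the assumption that a gradient can be keyed by the name of the variable.

```ruby
# Hypothetical stand-in for a differentiable object; not the RGrad API.
class Node
  attr_reader :name, :builds

  def initialize(name)
    @name = name
    @gradient_cache = {} # variable name => previously built gradient
    @builds = 0          # counts how often the expensive step actually ran
  end

  # Returns a copy of the cached gradient, building it on the first request only.
  def gradient(variable)
    cached = @gradient_cache.fetch(variable) do
      @builds += 1
      @gradient_cache[variable] = build_gradient(variable)
    end
    cached.dup
  end

  private

  # Placeholder for the expensive symbolic differentiation.
  def build_gradient(variable)
    "d#{@name}/d#{variable}"
  end
end

e = Node.new("e")
first = e.gradient("x")
second = e.gradient("x") # served from the cache, returned as a fresh copy
```

Returning a copy rather than the cached object itself mirrors the text: the caller may adjust the copy without touching the stored gradient.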
The objects given as arguments to an expression each have a corresponding element in the λ-function. An object from the argument list and the corresponding element in the λ-function are differentiated simultaneously: the λ-function is differentiated with respect to the variable corresponding to the element, and the gradient is calculated for that element. If the gradient of the element exists and the derivative of the λ-function is not zero, a new expression is constructed.
What the new expression looks like is determined by the outcome of the differentiation of the λ-function. There are two alternatives for differentiating the λ-function: internally, or by calling external programs. The internal method works directly on the internal representation of the λ-function and produces a new object in the same representation.
However, since the λ-function is a symbolic algebraic expression, it makes sense to differentiate it with a Computer Algebra System (CAS). The possibility of using an external CAS has therefore been implemented as an alternative to the internal differentiation. In principle, any CAS that takes command line arguments can be used for this purpose. The only requirement is to translate the internal representation to a string that can be read by the CAS, and to interpret the result afterwards. This has been demonstrated for two different computer algebra systems, namely Maple and Ginsh. The latter is a front-end to GiNaC (Bauer et al., 2002). The great advantage of these programs is that they render expressions with fewer operations than the internal differentiation∗. The numerical results are the same to within machine precision, but since fewer operations are needed the final code will run faster.
The disadvantages are mainly compatibility issues, and the fact that every external program comes with some kind of license, whether commercial (Maple) or not (Ginsh). Another major disadvantage is that the overhead of making these external calls slows down the gradient calculations significantly. This problem can probably be solved, or at least reduced, by linking the computation kernels into Ruby. This should improve the computation time significantly, since the external library is then loaded only once per run instead of once per differentiation.
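The round trip through an external CAS can be sketched as follows for Ginsh. The helper names `ginsh_diff_input` and `ginsh_diff` are hypothetical and not taken from the implementation; the sketch assumes that the λ-function has already been translated to a string in Ginsh syntax, that the `ginsh` executable is on the search path, and that the input is passed on standard input.

```ruby
require "open3"

# Builds the Ginsh input for d(expr)/d(var). Both arguments are plain strings
# that are assumed to be valid Ginsh syntax already.
def ginsh_diff_input(expr, var)
  "diff(#{expr}, #{var});\n"
end

# Sends the input to a fresh ginsh process and returns its reply as a string,
# or nil when ginsh is not installed or reports a failure.
def ginsh_diff(expr, var)
  out, status = Open3.capture2("ginsh", stdin_data: ginsh_diff_input(expr, var))
  status.success? ? out.strip : nil
rescue Errno::ENOENT
  nil
end

input = ginsh_diff_input("x*a*y", "a")
```

Each call to `ginsh_diff` starts a new process, which is the per-differentiation overhead discussed above. The returned string would still have to be parsed back into the internal representation, a step that is not shown here.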
A new expression based on the differentiated λ-function must be created unless the differentiation evaluates to zero, or the gradient of the corresponding object from the argument list does not exist. The new λ-function gives rise to three possibilities: it can be another λ-function, it may reduce to a single object, or it may reduce to a numerical value.
∗ This is always the case for Maple with the optimization option tryhard. It is, however, not always true for Ginsh or for Maple without optimization. The reason is probably that a CAS focuses on generating human-readable code, which is not always the fastest computationally.
Whenever a λ-function is differentiated and the number of variables in the derivative is smaller than in the parent function, some of the corresponding objects in the argument list are not needed in the argument list of the differentiated expression. The differentiation may also change the order of the variables. To remove any obsolete objects and to determine the new argument order, the new λ-function is traversed in the order in which it is evaluated, and each unique object that appears in the λ-function is collected. When this operation is complete, the corresponding objects are picked from the argument list, and a new argument list is created without obsolete elements and with the correct ordering.
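The traversal can be sketched in Ruby as follows. The nested-array form of the λ-function (operator first, operands after) and the helper names `variables_in_order` and `prune_arguments` are assumptions made for this illustration; the actual internal representation is not shown in this section.

```ruby
# Collects the unique variables of a λ-body in the order they are evaluated.
# The body is a nested array [operator, operand, ...]; symbols in operand
# position are variables, numbers are ignored.
def variables_in_order(body, seen = [])
  case body
  when Symbol
    seen << body unless seen.include?(body)
  when Array
    body.drop(1).each { |operand| variables_in_order(operand, seen) }
  end
  seen
end

# Picks the matching objects from the original argument list, dropping
# obsolete ones and following the order found in the new λ-body.
def prune_arguments(body, params, args)
  lookup = params.zip(args).to_h
  variables_in_order(body).map { |v| lookup.fetch(v) }
end

params = [:_x, :_a, :_y]
args = ["x[d, nil]", "a", "y[d, nil]"]
derivative_wrt_a = [:*, :_y, :_x] # _a has vanished and the order has changed
new_args = prune_arguments(derivative_wrt_a, params, args)
```

For a body that contains no variables at all, such as `[:/, 1, 2]`, the resulting argument list is empty.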
If the traversal of the λ-function yields an empty argument list, either an error has occurred or the λ-function has reduced to a numerical value, most likely a fraction.
The reason that fractions appear is that the internal representation uses operator overloading and lets Ruby handle numerical operations, with one important exception: integer fractions would be truncated by Ruby; for instance, 1/2 would become 0. To avoid truncation errors internally, these divisions are not converted to floating point numbers until the expression is exported. This special case is treated in exactly the same manner as the case where an ordinary numeric value is the only object left after differentiation. It does not make sense to create a structure to hold a single scalar value, and possibly copy it into several dimensions. Scalar values should therefore be integrated into the λ-function. The only problem is that the λ-function has not yet been generated. To solve this, a temporary object is created, and the scalar value is substituted into the λ-function when the latter is created.
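The truncation problem is easy to reproduce in plain Ruby. The use of `Rational` below is only one possible way to keep a fraction exact until export and is an assumption of this sketch; the text does not state how the unevaluated division is stored internally.

```ruby
truncated = 1 / 2        # Integer division truncates
exact = Rational(1, 2)   # the fraction is kept exact
exported = exact.to_f    # conversion to floating point is deferred to export
```

Here `truncated` is the integer 0, while `exported` is the floating point value 0.5.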
When the derivative of the λ-function is not a numeric value, a new expression is created based on the new λ-function and the original expression. As already mentioned, a new argument list is created based on the new λ-function and the original argument list.
The objects in this list must be copied from the original expression to make sure that the originals are not destroyed. The broadcasting of the objects is kept when they are copied, but it is not necessarily correct in the setting of the newly created expression and must therefore be adjusted. In particular, the broadcasting must be adjusted if the differentiation of the λ-function results in the removal of objects from the argument list. In that case it may be possible to reduce the number of dimensions of the new object compared to the original, which saves both computational effort and memory in the final code. If a dimension is removed from the new expression, this dimension must be replaced by broadcasting when the expression is used. According to the chain rule, the new expression must be multiplied by the corresponding gradient, as explained in Section 6.3 on page 84. This means that the broadcasting of the new expression may need to be adjusted so that this multiplication is possible.
The gradient calculation of an expression can be shown with the following example:
e = RGrad::expr(x[d, nil], a, y[d, nil]) { |_x, _a, _y| _x * _a * _y }
The symbols x, a and y represent any differentiable objects. Their gradients are assumed to be known and are here called gx, ga and gy, respectively. The gradient of e will create a total of four new expressions, which are shown below. The objects dx, da and dy result from the differentiation of the λ-function, as required by the chain rule.
dx = RGrad::expr(a, y[d, nil]) { |_a, _y| _a * _y }
da = RGrad::expr(x, y) { |_x, _y| _x * _y }
dy = RGrad::expr(x[d, nil], a) { |_x, _a| _x * _a }
Notice that the broadcasting of da has been removed, and that the reduction in rank is compensated for when the expression is used to calculate the gradient of e:
ge = RGrad::expr(gx[d, nil, d], dx[d, d, nil],
                 ga, da[d, nil, nil],
                 gy[d, nil, d], dy[d, d, nil]) { |_gx, _dx, _ga, _da, _gy, _dy|
  _gx * _dx + _ga * _da + _gy * _dy
}
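The example can also be written in index notation. This is an interpretation rather than something stated in the text: it assumes that each `d` in a bracket marks an index of extent d, that `nil` marks a dimension filled by broadcasting, that a carries two indices, and that the differentiation variables v carry the index k.

```latex
\begin{align}
  e_{ij} &= x_i \, a_{ij} \, y_i \\
  (dx)_{ij} &= a_{ij} \, y_i, \qquad
  (da)_{i} = x_i \, y_i, \qquad
  (dy)_{ij} = x_i \, a_{ij} \\
  (ge)_{ijk} = \frac{\partial e_{ij}}{\partial v_k}
  &= (gx)_{ik} \, (dx)_{ij} + (ga)_{ijk} \, (da)_{i} + (gy)_{ik} \, (dy)_{ij}
\end{align}
```

Under these assumptions, da depends on the index i only, which is why it has lower rank than dx and dy and must be broadcast over the two remaining dimensions when ge is formed.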
The last option for the differentiation of the λ-function occurs when it reduces to a single object. When this happens it is not necessary to create a new object, since a copy of the remaining object can be used instead. The broadcasting of the new object must of course be updated to the new context, and it must be augmented according to the dimensions of the arguments given to the gradient method.
The new object, copied or created, is matched with the corresponding gradient. Since the gradient of an object is calculated only once, the broadcast dimensions must be reset in the context where the object is used. Once all pairs of a gradient and an expression from the differentiated λ-function have been collected, a new expression representing the gradient can be constructed. The λ-function for this expression is constructed from the collected objects. As explained in Section 6.2, this λ-function is really a sum of products. It is built by using arrays to collect the objects on two levels: the objects at the first level are added, while the objects at the second level are multiplied. At the same time the function is simplified by inserting scalar values directly into the λ-function, as explained earlier.
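The two-level collection can be sketched in Ruby as follows. The helper `sum_of_products` and the rendering of the λ-body as a string are assumptions of this illustration; only the structure (outer level added, inner level multiplied, scalars folded in directly) is taken from the text.

```ruby
# Builds the body of a λ-function from a two-level array:
# the outer level holds terms that are added, the inner level holds
# factors that are multiplied. Numeric factors are folded into one
# coefficient and written directly into the body.
def sum_of_products(terms)
  terms.map do |factors|
    numbers, symbols = factors.partition { |f| f.is_a?(Numeric) }
    coefficient = numbers.inject(1, :*)
    parts = symbols.map(&:to_s)
    # A coefficient of 1 is omitted unless it is the only factor.
    parts.unshift(coefficient.to_s) unless coefficient == 1 && !parts.empty?
    parts.join("*")
  end.join(" + ")
end

body = sum_of_products([[:_gx, :_dx], [:_ga, :_da], [:_gy, 2]])
```

In this example the third derivative is assumed to have reduced to the scalar 2, so no separate object is created for it and the value appears directly in the body.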
The expression representing the gradient must be summed over the same dimensions as the original if the original expression was a summed expression. If the number of dimensions of this new expression has increased compared to the original expression, the broadcasting of the sum must be adjusted accordingly.
Every time a gradient is calculated with respect to a new variable, the differentiation of the λ-function will render the same expressions. It therefore makes sense to create these expressions once, and only use copies later. This approach was attempted, but it generated a bug in the broadcasting which could not be resolved at the time. It was therefore decided to implement a method which recognizes these duplicate expressions and replaces them afterwards. What matters in this procedure is that the exact same result is not calculated more than once in the final code. Several copies of the same object will appear in the model graph, and when these objects are exported, the unique objects are distinguished by comparing the names of all the objects (see Chapter 8 for more details). It is therefore sufficient to change the names of the duplicate expressions that are created to make sure that each expression is calculated only once. A string representation is used to compare the different expressions with each other. This string is based on the argument list, the broadcasting and the λ-function of the expression, and will therefore be unique for all unique expressions. When two identical expressions are found, the name of the second is simply replaced by the name of the first. This automatically affects the names of all copies of the object as well, and it also affects the string representation of the expressions which depend on this object. Consequently, several passes must be run to be sure that all duplicate expressions are removed. In practice the optimization is run in an indefinite while loop until no more expressions are replaced. In practical calculations three to four passes are normally enough, but up to eighteen passes have been registered.
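The renaming procedure can be sketched in Ruby as a fixed-point loop. The record `Expr`, the alias table and the helpers `canonical`, `signature` and `merge_duplicates` are hypothetical names for this illustration; in particular, the sketch keeps an alias table instead of renaming the objects in place, and it makes no claim about how the string representation is actually built.

```ruby
# Hypothetical expression record: a name, the names of its arguments,
# their broadcasting, and the λ-body as text.
Expr = Struct.new(:name, :args, :broadcast, :body)

# Follows the alias table until a name that has not been replaced is reached.
def canonical(aliases, name)
  name = aliases[name] while aliases.key?(name)
  name
end

# String representation based on argument list, broadcasting and λ-body.
def signature(expr, aliases)
  [expr.args.map { |a| canonical(aliases, a) }, expr.broadcast, expr.body].inspect
end

# Replaces the name of every duplicate by the name of the first identical
# expression, and repeats until a full pass makes no replacement.
def merge_duplicates(exprs)
  aliases = {} # duplicate name => name of the first identical expression
  passes = 0
  loop do
    passes += 1
    first_seen = {}
    changed = false
    exprs.each do |e|
      next if aliases.key?(e.name)
      sig = signature(e, aliases)
      if first_seen.key?(sig)
        aliases[e.name] = first_seen[sig]
        changed = true
      else
        first_seen[sig] = e.name
      end
    end
    break unless changed
  end
  [aliases, passes]
end

exprs = [
  Expr.new(:p1, [:s1], [[:d]], "_s"),
  Expr.new(:p2, [:s2], [[:d]], "_s"),
  Expr.new(:s1, [:x, :y], [[:d], [:d]], "_x*_y"),
  Expr.new(:s2, [:x, :y], [[:d], [:d]], "_x*_y")
]
aliases, passes = merge_duplicates(exprs)
```

In this small example p1 and p2 only become identical after s2 has been replaced by s1, so the first pass merges s2, the second merges p2, and a third pass is needed to confirm that nothing more changes.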