OTL-circuit Function - Improve the Active Subspace Method by Partitioning the Parameter Space

This OTL-circuit function models an output transformerless push-pull circuit, which can be found at http://www.sfu.ca/~ssurjano/otlcircuit.html.

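$$V_m = \frac{(V_{b1} + 0.74)\,\beta (R_{c2} + 9)}{\beta (R_{c2} + 9) + R_f} + \frac{11.35\, R_f}{\beta (R_{c2} + 9) + R_f} + \frac{0.74\, R_f\, \beta (R_{c2} + 9)}{R_{c1}\,\bigl(\beta (R_{c2} + 9) + R_f\bigr)}, \tag{4.4}$$

where $V_{b1} = 12 R_{b2} / (R_{b1} + R_{b2})$.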

Its inputs and their distributions are detailed in the table below.

| Variable | Symbol | Distribution U(min, max) |
|---|---|---|
| Resistance b1 | $R_{b1}$ | U(50, 150) |
| Resistance b2 | $R_{b2}$ | U(25, 70) |
| Resistance f | $R_f$ | U(0.5, 3) |
| Resistance c1 | $R_{c1}$ | U(1.2, 2.5) |
| Resistance c2 | $R_{c2}$ | U(0.25, 1.2) |
| Current gain | $\beta$ | U(50, 300) |

This is a 6-dimensional function.
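To make equation (4.4) and the input table concrete, here is a minimal NumPy sketch that evaluates the function and draws uniform inputs from the table. The names `otl_circuit`, `sample_inputs`, `LOWER` and `UPPER` are hypothetical, and the 8000-point sample simply mirrors the sample size used in the experiments; this is not the thesis's own code.

```python
import numpy as np

# Box bounds from the input table (order: Rb1, Rb2, Rf, Rc1, Rc2, beta).
LOWER = np.array([50.0, 25.0, 0.5, 1.2, 0.25, 50.0])
UPPER = np.array([150.0, 70.0, 3.0, 2.5, 1.2, 300.0])


def otl_circuit(x):
    """Midpoint voltage V_m of the OTL circuit, equation (4.4)."""
    rb1, rb2, rf, rc1, rc2, beta = x
    vb1 = 12.0 * rb2 / (rb1 + rb2)
    a = beta * (rc2 + 9.0)   # recurring term beta * (Rc2 + 9)
    d = a + rf               # shared denominator beta * (Rc2 + 9) + Rf
    return ((vb1 + 0.74) * a / d
            + 11.35 * rf / d
            + 0.74 * rf * a / (rc1 * d))


def sample_inputs(n, rng):
    """Draw n points uniformly from the box defined by the table."""
    return rng.uniform(LOWER, UPPER, size=(n, LOWER.size))


rng = np.random.default_rng(0)
X = sample_inputs(8000, rng)
V = np.array([otl_circuit(x) for x in X])
```

As a hand check, at $R_{b1}=100$, $R_{b2}=50$, $R_f=1$, $R_{c1}=2$, $R_{c2}=1$, $\beta=100$ we get $V_{b1}=4$ and $V_m = 5121.35/1001 \approx 5.116$.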

A first glance at figure 4.7 reveals the same pattern as for the higher-dimensional functions, namely that the curves are relatively flat. Moreover, the curves oscillate more than those for the other test functions, which suggests that we may not have enough training and/or testing points. This is to be expected, because the parameter space of the OTL-circuit function is the largest among all the test functions used in this thesis; using the same number (8000) of sample and testing points may therefore not capture the structure of the function surface as well as for the other three test functions. Moreover, figure 4.8 shows more clearly that, for the first time, the adaptive methods are outperformed by the random methods. This result leads us to examine the limitations of the adaptive methods further.

Our hypothesis is that the points chosen by the adaptive methods are clustered together, so that the adaptive algorithms capture less of the function's structure than the random algorithms do. To test this, we normalise the value of each variable and plot the frequencies side by side. Figure 4.9 shows the distribution of the points along each axis: the left column shows the six variables of the points generated by algorithm 6, and the right column those generated by algorithm 3. The right column shows that the variables are approximately uniformly distributed, which makes sense because the random method draws its points from a uniform distribution. In contrast, the left column shows that the first, third and fourth variables are clearly clustered. This confirms that the points chosen by the adaptive algorithms tend to cluster together for certain functions. The next question is why the adaptive algorithms produce a clustered set of points.
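The normalise-and-count check described above can be sketched as follows. Each variable is mapped to [0, 1] with the table bounds, binned, and scored with a Pearson chi-square statistic against equal bin counts, so that clustered variables stand out numerically rather than only visually. The chi-square score, the bin count and the artificially clustered sample (every point sharing one $R_{b1}$ value) are our assumptions for illustration, not part of the thesis's procedure.

```python
import numpy as np

# Box bounds from the input table (order: Rb1, Rb2, Rf, Rc1, Rc2, beta).
LOWER = np.array([50.0, 25.0, 0.5, 1.2, 0.25, 50.0])
UPPER = np.array([150.0, 70.0, 3.0, 2.5, 1.2, 300.0])


def normalise(X):
    """Map each variable to [0, 1] so that the six axes are comparable."""
    return (X - LOWER) / (UPPER - LOWER)


def variable_histograms(X, bins=10):
    """Frequency counts of each normalised variable (one row per variable)."""
    Z = normalise(X)
    return np.array([np.histogram(Z[:, j], bins=bins, range=(0.0, 1.0))[0]
                     for j in range(Z.shape[1])])


def uniformity_statistic(counts):
    """Pearson chi-square statistic per variable against equal bin counts;
    large values flag variables whose points are clustered."""
    expected = counts.sum(axis=1, keepdims=True) / counts.shape[1]
    return ((counts - expected) ** 2 / expected).sum(axis=1)


rng = np.random.default_rng(0)
X_random = rng.uniform(LOWER, UPPER, size=(8000, 6))
X_clustered = X_random.copy()
X_clustered[:, 0] = 100.0  # hypothetical: all points share one Rb1 value

stat_random = uniformity_statistic(variable_histograms(X_random))
stat_clustered = uniformity_statistic(variable_histograms(X_clustered))
```

For uniform samples the statistic stays near its degrees of freedom (9 with 10 bins), whereas the collapsed first variable scores 72000, the maximum possible for 8000 points in 10 bins.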

FIGURE 4.7: MSE of 6D OTL-circuit function; 100 gradients evaluated

We believe that this problem occurs when the underlying function has both regions with strong local ridge characteristics and regions with weak or even no ridge structure. For example, assume that a function has some regions of the parameter space that can be represented perfectly by a one-dimensional active subspace; that is, in these regions the function varies only along the gradient direction. In this case, we can in theory construct a perfect response surface from a sufficient number of sample points. This means that instead of having $f(x) \approx g(W_1^T y)$, we have

$$f(x) = g(W_1^T y).$$

Let us also assume that some regions are not perfectly ridge-like but nearly so. In other words, in these regions the function varies strongly along the gradient direction but also varies, less dramatically but consistently, along the other directions. This still gives a one-dimensional active subspace, but it also introduces noise into the response surface. We illustrate these two situations with the functions $f(x,y) = x^2$ and $f(x,y) = x^2 + 0.1y^2$, both for $x \in [-1, 1]$, $y \in [-1, 1]$.

FIGURE 4.8: MSE of 6D OTL-circuit function; 100 gradients evaluated

For the function $f(x,y) = x^2$ (figure 4.10), the gradient is $\nabla f = [2x, 0]^T$, which tells us that the direction $[1, 0]^T$ spans the active subspace; we call it the active direction here. We then draw 10 sample points and construct the response surface shown in figure 4.11, where the points are interpolated exactly by the regression line. However, if we do the same for $f(x,y) = x^2 + 0.1y^2$ (figure 4.12), figure 4.13 shows that the points no longer lie exactly on the regression line, which indicates noise. This noise is small because it comes only from the $0.1y^2$ term, which is at most 0.1 for $y \in [-1, 1]$. Because the variation along the x-axis is still dominant, the active subspace is still one-dimensional.
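The gradient argument above can be checked numerically. The sketch below uses the usual Monte Carlo active-subspace estimate (the leading eigenvector of the averaged gradient outer product) and then fits a response surface $g$ along that direction through 10 uniform samples, as in figures 4.11 and 4.13. The helper names are hypothetical, and taking $g$ to be a quadratic least-squares fit is our assumption; the thesis's regression model may differ.

```python
import numpy as np


def active_direction(grads):
    """Leading eigenvector of C = (1/M) * sum_i grad_i grad_i^T."""
    C = grads.T @ grads / grads.shape[0]
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, np.argmax(eigvals)]


def ridge_fit_error(f, grad_f, n=10, seed=0):
    """Fit f ~ g(w1^T x) with a quadratic g; return w1 and the max abs residual."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, 2))   # x, y in [-1, 1]
    w1 = active_direction(np.array([grad_f(p) for p in X]))
    t = X @ w1                                # active variable
    fx = np.array([f(p) for p in X])
    coeffs = np.polyfit(t, fx, 2)
    return w1, np.max(np.abs(fx - np.polyval(coeffs, t)))


# f(x, y) = x^2, gradient [2x, 0]
w_a, err_a = ridge_fit_error(lambda p: p[0] ** 2,
                             lambda p: np.array([2.0 * p[0], 0.0]))
# f(x, y) = x^2 + 0.1 y^2, gradient [2x, 0.2y]
w_b, err_b = ridge_fit_error(lambda p: p[0] ** 2 + 0.1 * p[1] ** 2,
                             lambda p: np.array([2.0 * p[0], 0.2 * p[1]]))
```

Both estimated directions are (up to sign) close to $[1, 0]^T$; for $x^2$ the residual is at rounding level, while for $x^2 + 0.1y^2$ it is visibly nonzero, matching the noise seen in figure 4.13.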

FIGURE 4.9: Distribution of distances of points used for OTL-circuit function

Now suppose that a function has several regions that vary as $f(x,y) = x^2$ or $f(x,y) = y^2$, but also some regions whose response surface behaves like $f(x,y) = x^2 + 0.1y^2$. Then our adaptive algorithms will always select points in the $x^2 + 0.1y^2$ regions, because the points in the other regions have a squared error of 0 (ignoring rounding error). However, selecting these points and constructing a new region on them does not increase the overall accuracy of the method, because the newly formed region still has the same active subspace. Hence, we obtain a response surface very similar to the previous one, and its error remains larger than that of the more ridge-like regions. As a result, the adaptively selected points should be separated by shorter distances overall than randomly selected points.
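A toy version of this argument: split $[-1, 1]^2$ into a left half where $f = x^2$ and a right half where $f = x^2 + 0.1y^2$, fit a quadratic ridge along the shared active direction $[1, 0]^T$ in each half, and greedily keep the 20 points with the largest squared error. The piecewise function, the two-region split and the top-k rule are illustrative assumptions standing in for the adaptive algorithms; this is not a reproduction of algorithm 6.

```python
import numpy as np


def piecewise_f(X):
    """x^2 on the left half (exact ridge), x^2 + 0.1 y^2 on the right (near ridge)."""
    x, y = X[:, 0], X[:, 1]
    return np.where(x < 0.0, x ** 2, x ** 2 + 0.1 * y ** 2)


def squared_errors(X, fx, w):
    """Squared residuals of a quadratic ridge fit g(w^T x) within one region."""
    t = X @ w
    coeffs = np.polyfit(t, fx, 2)
    return (fx - np.polyval(coeffs, t)) ** 2


rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
fx = piecewise_f(X)
w = np.array([1.0, 0.0])  # shared active direction (assumed, not estimated)

left = X[:, 0] < 0.0
err = np.empty(len(X))
err[left] = squared_errors(X[left], fx[left], w)
err[~left] = squared_errors(X[~left], fx[~left], w)

# Greedy error-driven selection: keep the 20 points with the largest error.
selected = np.argsort(err)[::-1][:20]
```

Every selected point comes from the right half, even though refitting a ridge there cannot remove the error contributed by the $0.1y^2$ term.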

FIGURE 4.10: Surface of $f(x,y) = x^2$

FIGURE 4.11: Regression along the active subspace of $f(x,y) = x^2$

FIGURE 4.12: Surface of $f(x,y) = x^2 + 0.1y^2$

FIGURE 4.13: Regression along the active subspace of $f(x,y) = x^2 + 0.1y^2$

We next reproduce this problem using the function $f(x,y) = x\exp(-x^2 - y^2)$, restricted to $x, y \in [-5, 5]$; its surface is shown in figure 4.14. The surface has a hole (a minimum at $(-1/\sqrt{2}, 0)$) and a bump (a maximum at $(1/\sqrt{2}, 0)$) near the centre of the domain, and the remainder of the surface is relatively flat. Therefore, we expect all the points selected by the adaptive methods to lie inside the hole or the bump, particularly around the tip of the hole and/or the bump, which are associated with the highest error. Selecting 20 points with both the adaptive method (algorithm 6) and the random method (algorithm 3) gives figure 4.15. As shown, the adaptive method places almost all of its points inside the bump, particularly around its tip.
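The clustering can be reproduced with a toy error-driven selector on this function. The sketch is a hypothetical stand-in for the adaptive methods (algorithm 6 itself is not reproduced): it estimates the active direction from gradients at 2000 uniform samples, fits one global quadratic ridge $g(w_1^T x)$, and keeps the 20 samples with the largest squared error. The mean pairwise distance then quantifies how clustered that selection is compared with 20 random samples.

```python
import numpy as np


def f(X):
    return X[:, 0] * np.exp(-X[:, 0] ** 2 - X[:, 1] ** 2)


def grad_f(X):
    x, y = X[:, 0], X[:, 1]
    e = np.exp(-x ** 2 - y ** 2)
    return np.column_stack([(1.0 - 2.0 * x ** 2) * e, -2.0 * x * y * e])


def mean_pairwise_distance(P):
    """Mean Euclidean distance over all pairs of rows of P."""
    diff = P[:, None, :] - P[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    n = len(P)
    # d.sum() counts each pair twice; n * (n - 1) is twice the number of pairs.
    return d.sum() / (n * (n - 1))


rng = np.random.default_rng(2)
X = rng.uniform(-5.0, 5.0, size=(2000, 2))
fx = f(X)

# Active direction from the averaged gradient outer product.
G = grad_f(X)
eigvals, eigvecs = np.linalg.eigh(G.T @ G / len(G))
w1 = eigvecs[:, np.argmax(eigvals)]

# Global quadratic ridge approximation and its squared error at each sample.
t = X @ w1
err = (fx - np.polyval(np.polyfit(t, fx, 2), t)) ** 2

adaptive = X[np.argsort(err)[::-1][:20]]  # hypothetical error-greedy pick
random_pick = X[rng.choice(len(X), size=20, replace=False)]
```

In this toy setting all of the error-driven picks lie within a distance of 2 of the origin, and their mean pairwise distance is much smaller than that of the random picks, which is the short-distance signature described above.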

Note that algorithm 7 (with Gaussian) should give similar results, because the types of functions discussed above admit one-dimensional active subspaces; it therefore suffers from the same problem as algorithm 6.

FIGURE 4.14: Surface of $f(x,y) = x\exp(-x^2 - y^2)$
