3.3 Clusterwise linear regression based on nonsmooth optimization (CLR(Opt))
3.3.4 Computation of initial solutions
1, · · · , xk−1, yk−1) to the (k − 1)-CLR problem (3.3.2) consider the following set of hyperplanes:
\[ C_k = \left\{ (u, v) \in \mathbb{R}^{n+1} : E_{ab}(u, v) > r^{k-1}_{ab} \ \ \forall (a, b) \in A \right\}. \]
The set $C_k$ contains all hyperplanes which do not attract any point from the set $A$. Over this set the function $\bar f_k$ is constant and attains its global maximum value (3.3.5). Therefore any hyperplane from the set $C_k$ is a stationary point of the function $\bar f_k$. This means that if a starting point is chosen in this set, most local methods will be unable to escape it and will decrease neither the auxiliary nor the overall fit function. It is therefore crucial to select starting points in the set $\bar C_k$, the complement of the closure of $C_k$:
\[ \bar C_k = \left\{ (u, v) \in \mathbb{R}^{n+1} : \exists\, (a, b) \in A \text{ such that } E_{ab}(u, v) < r^{k-1}_{ab} \right\}. \]
Clearly, $\bar C_k$ is the set of hyperplanes which attract at least one data point from the set $A$. Any hyperplane from this set decreases the value of the auxiliary CLR function. Our aim is to find hyperplanes which provide a significant decrease in the value of this function. Next we describe how such hyperplanes can be found.
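The distinction between the two sets can be checked numerically. The following is a minimal sketch only, assuming scalar inputs ($n = 1$) and the squared error $E_{ab}(u, v) = (ua + v - b)^2$; the function names and the data are hypothetical and are not taken from the text.

```python
def error(a, b, u, v):
    """E_ab(u, v) for scalar a and squared error (assumption: n = 1, p = 2)."""
    return (u * a + v - b) ** 2


def residuals(points, lines):
    """r_ab^{k-1}: smallest error of each point over the current k-1 lines."""
    return [min(error(a, b, x, y) for x, y in lines) for a, b in points]


def attracts_some_point(points, lines, u, v):
    """True if E_ab(u, v) < r_ab^{k-1} for at least one data point."""
    r = residuals(points, lines)
    return any(error(a, b, u, v) < r_ab for (a, b), r_ab in zip(points, r))


# Hypothetical data: four points on b = a and four on b = 2a + 1.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0),
          (0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
lines = [(1.0, 0.0)]  # current (k-1)-solution: the single line b = a

near = attracts_some_point(points, lines, 1.0, 3.0)    # passes through (2, 5)
far = attracts_some_point(points, lines, 0.0, 100.0)   # far from every point
print(near, far)
```

Here the line through $(2, 5)$ attracts at least one point, whereas the distant horizontal line attracts none and is therefore a poor starting point for a local method.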
Let $\bar A_0$ be the set of all data points $(a, b) \in A$ which do not lie on any of the hyperplanes $(x^1, y_1), \ldots, (x^{k-1}, y_{k-1})$. If $\bar A_0 = \emptyset$ then the hyperplanes $(x^1, y_1), \ldots, (x^{k-1}, y_{k-1})$ approximate the set $A$ perfectly. Therefore we assume that $\bar A_0 \neq \emptyset$. Take any $(a, b) \in \bar A_0$ and assume that this point belongs to the cluster determined by the linear regression coefficients $(x^j, y_j)$, where $j \in \{1, \ldots, k-1\}$. Define another hyperplane $(x_{ab}, y_{ab})$ parallel to the hyperplane $(x^j, y_j)$ and passing through the point $(a, b)$. Then $x_{ab} = x^j$ and $y_{ab} = b - \langle x^j, a \rangle$. Since $E_{ab}(x_{ab}, y_{ab}) = 0 < r^{k-1}_{ab}$, it is clear that $(x_{ab}, y_{ab}) \in \bar C_k$ for all $(a, b) \in \bar A_0$. The value $\tilde f_{k-1}$ of the function $f_{k-1}$ over $A$ with hyperplanes $(x^1, y_1, \ldots, x^{k-1}, y_{k-1})$ is:
\[ \tilde f_{k-1} = \sum_{(a,b) \in A} r^{k-1}_{ab} \]
and the value $\tilde f_k$ of the function $f_k$ over $A$ with hyperplanes $(x^1, y_1, \ldots, x^{k-1}, y_{k-1})$ and $(x_{ab}, y_{ab})$ is:
\[ \tilde f_k = \bar f_k(x_{ab}, y_{ab}) = \sum_{(c,d) \in A} \min\left\{ r^{k-1}_{cd},\ E_{cd}(x_{ab}, y_{ab}) \right\}. \]
The difference between these two values is:
\[ d(x_{ab}, y_{ab}) = \tilde f_{k-1} - \tilde f_k = \sum_{(c,d) \in A} \max\left\{ 0,\ r^{k-1}_{cd} - E_{cd}(x_{ab}, y_{ab}) \right\}. \]
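Note that $d(x_{ab}, y_{ab}) > 0$ for any data point $(a, b) \in \bar A_0$. Let $\gamma_1 \in [0, 1]$ be a given number. Define the number
\[ \bar d_1 = \max\left\{ d(x_{ab}, y_{ab}) : (a, b) \in \bar A_0 \right\} \tag{3.3.7} \]
and the set
\[ \bar A_1 = \left\{ (a, b) \in \bar A_0 : d(x_{ab}, y_{ab}) \geq \gamma_1 \bar d_1 \right\}. \tag{3.3.8} \]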
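This set contains all data points whose hyperplanes $(x_{ab}, y_{ab})$ provide a decrease of at least the threshold $\gamma_1 \bar d_1$. For $\gamma_1 = 0$ we have $\bar A_1 = \bar A_0$, and for $\gamma_1 = 1$ the set $\bar A_1$ contains only the data points providing the largest decrease $\bar d_1$ of the $k$-th CLR function.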
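The computation of $d(x_{ab}, y_{ab})$, $\bar d_1$ and $\bar A_1$ can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: scalar inputs ($n = 1$), squared error, and a hypothetical tolerance `tol` for deciding whether a point lies on a current line. The function name `promising_points` and the data are likewise hypothetical.

```python
def error(a, b, u, v):
    """E_ab(u, v) for scalar a and squared error (assumption: n = 1, p = 2)."""
    return (u * a + v - b) ** 2


def promising_points(points, lines, gamma1, tol=0.0):
    """Return (A1, d1): points whose parallel line decreases the fit by >= gamma1 * d1."""
    # Closest current line (cluster) and residual r_ab^{k-1} for every point.
    closest = [min(lines, key=lambda ln: error(a, b, *ln)) for a, b in points]
    r = [error(a, b, *ln) for (a, b), ln in zip(points, closest)]
    decreases = []
    for (a, b), (x_j, _), r_ab in zip(points, closest, r):
        if r_ab <= tol:          # point lies on a current line: not in A0
            continue
        u, v = x_j, b - x_j * a  # line parallel to cluster j through (a, b)
        d = sum(max(0.0, r_cd - error(c, e, u, v))
                for (c, e), r_cd in zip(points, r))
        decreases.append(((a, b), d))
    if not decreases:            # A0 is empty: the current lines fit A exactly
        return [], 0.0
    d1 = max(d for _, d in decreases)
    return [p for p, d in decreases if d >= gamma1 * d1], d1


# Hypothetical data: four points on b = a and four on b = 2a + 1.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0),
          (0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
lines = [(1.0, 0.0)]

a1, d1 = promising_points(points, lines, gamma1=0.8)
best, _ = promising_points(points, lines, gamma1=1.0)
print(d1, a1, best)
```

On this data the four off-line points give decreases 16, 24, 27 and 24, so $\bar d_1 = 27$; with $\gamma_1 = 0.8$ three points are retained, and with $\gamma_1 = 1$ only $(2, 5)$ remains.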
For each $(a, b) \in \bar A_1$ compute the set $B_{ab}$ as follows:
\[ B_{ab} = \left\{ (c, d) \in A : E_{cd}(x_{ab}, y_{ab}) < r^{k-1}_{cd} \right\}. \tag{3.3.9} \]
The set $B_{ab}$ contains all points from the set $A$ attracted by the clusterwise linear regression function $(x_{ab}, y_{ab})$. We compute $(\bar x_{ab}, \bar y_{ab})$ as a linear regression function approximating the set $B_{ab}$. This additional step, which updates the clusterwise linear regression function $(x_{ab}, y_{ab})$, allows one to improve the initial solution determined by the cluster $B_{ab}$.
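The construction of $B_{ab}$ and the subsequent refit can be illustrated with a short sketch. It assumes scalar inputs ($n = 1$) and squared error, so that the regression step reduces to ordinary least squares in closed form; for general $n$ a least-squares solver would take its place. The helper names and the `fallback_slope` parameter are hypothetical.

```python
def error(a, b, u, v):
    """E_ab(u, v) for scalar a and squared error (assumption: n = 1, p = 2)."""
    return (u * a + v - b) ** 2


def attracted(points, r, u, v):
    """Points strictly closer to the line (u, v) than to the current lines, as in (3.3.9)."""
    return [(c, d) for (c, d), r_cd in zip(points, r) if error(c, d, u, v) < r_cd]


def fit_line(cluster, fallback_slope=0.0):
    """Closed-form least squares for b ~ u*a + v on a non-empty cluster."""
    m = len(cluster)
    mean_a = sum(a for a, _ in cluster) / m
    mean_b = sum(b for _, b in cluster) / m
    s_aa = sum((a - mean_a) ** 2 for a, _ in cluster)
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in cluster)
    # If all a coincide, every slope is optimal; keep the given one.
    u = s_ab / s_aa if s_aa > 0.0 else fallback_slope
    return u, mean_b - u * mean_a


# Hypothetical data: four points on b = a and four on b = 2a + 1.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0),
          (0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
lines = [(1.0, 0.0)]
r = [min(error(a, b, x, y) for x, y in lines) for a, b in points]

a, b = 2.0, 5.0                    # a point of A1 assigned to the line b = a
x_ab, y_ab = 1.0, b - 1.0 * a      # parallel line through (a, b)
b_ab = attracted(points, r, x_ab, y_ab)
refit = fit_line(b_ab, fallback_slope=x_ab)
print(b_ab, refit)
```

In this example the parallel line attracts three of the four points generated by $b = 2a + 1$, and the refit recovers slope 2 and intercept 1, which illustrates why the update step can improve on the parallel hyperplane.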
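Now we can define the following set of hyperplanes:
\[ \bar A_2 = \left\{ (u, v) : u \in \mathbb{R}^n,\ v \in \mathbb{R} \text{ and } \exists\, (a, b) \in \bar A_1 \text{ s.t. } u = \bar x_{ab},\ v = \bar y_{ab} \right\}. \tag{3.3.10} \]
The set $\bar A_2$ contains all hyperplanes computed using points $(a, b) \in \bar A_1$. Next we compute the value $\hat f_k(u, v)$ of the overall fit function $f_k$ over $A$ with hyperplanes $(x^1, y_1, \ldots, x^{k-1}, y_{k-1})$ and $(u, v) = (\bar x_{ab}, \bar y_{ab})$:
\[ \hat f_k(u, v) = \bar f_k(\bar x_{ab}, \bar y_{ab}) = \sum_{(c,d) \in A} \min\left\{ r^{k-1}_{cd},\ E_{cd}(\bar x_{ab}, \bar y_{ab}) \right\} \]
and the following number:
\[ \hat f_{k,\min} = \min\left\{ \hat f_k(u, v) : (u, v) \in \bar A_2 \right\} \geq 0. \tag{3.3.11} \]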
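Let $\gamma_2 \in [1, \infty[$ be a given number. Define the following set:
\[ \bar A_3 = \left\{ (u, v) \in \bar A_2 : \hat f_k(u, v) \leq \gamma_2 \hat f_{k,\min} \right\}. \tag{3.3.12} \]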
All hyperplanes from the set $\bar A_3$ are considered as initial solutions for solving Problem (3.3.6). The number $\gamma_2 \hat f_{k,\min}$ serves as a threshold: if the value of the auxiliary CLR function at $(u, v) \in \bar A_2$ is greater than this threshold, the hyperplane is not considered a promising initial solution for minimizing the auxiliary CLR function, since the value of this function at that point is significantly larger than its best value over $\bar A_2$. If $\gamma_2 = 1$ then only the hyperplanes from $\bar A_2$ with the lowest value of the auxiliary CLR function are chosen, and if $\gamma_2$ is sufficiently large then $\bar A_3 = \bar A_2$.
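The effect of the threshold $\gamma_2 \hat f_{k,\min}$ can be seen in a small sketch. It assumes scalar inputs ($n = 1$) and squared error; the candidate lines below are hypothetical stand-ins for the elements of $\bar A_2$, chosen so that the arithmetic is exact, and the function names are not from the text.

```python
def error(a, b, u, v):
    """E_ab(u, v) for scalar a and squared error (assumption: n = 1, p = 2)."""
    return (u * a + v - b) ** 2


def auxiliary_value(points, r, u, v):
    """Value of the auxiliary function: sum of min{r_cd, E_cd(u, v)} over all points."""
    return sum(min(r_cd, error(c, d, u, v)) for (c, d), r_cd in zip(points, r))


def select_initial_solutions(points, r, candidates, gamma2):
    """Keep candidates whose auxiliary value is at most gamma2 times the best one."""
    values = [auxiliary_value(points, r, u, v) for u, v in candidates]
    f_min = min(values)
    return [cand for cand, f in zip(candidates, values) if f <= gamma2 * f_min]


# Hypothetical data: three points on b = a and three points above that line.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0),
          (0.0, 2.0), (1.0, 3.0), (2.0, 4.5)]
lines = [(1.0, 0.0)]
r = [min(error(a, b, x, y) for x, y in lines) for a, b in points]

# Hypothetical candidate lines standing in for the set A2.
candidates = [(1.0, 2.0), (1.0, 2.5), (1.25, 2.0)]
strict = select_initial_solutions(points, r, candidates, gamma2=1.0)
loose = select_initial_solutions(points, r, candidates, gamma2=10.0)
print(strict, loose)
```

The three candidates have auxiliary values 0.25, 0.5 and 0.0625, so $\gamma_2 = 1$ keeps only the last one while $\gamma_2 = 10$ keeps all three.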
Thus, an algorithm for finding good initial solutions for solving Problem (3.3.6) can be summarized as follows:
Algorithm 2 An algorithm for finding initial solutions to solve Problem (3.3.6).
1: (Initialization) Select the numbers $\gamma_1 \in [0, 1]$ and $\gamma_2 \in [1, \infty[$.
2: Determine the set $\bar A_0$ and compute the number $\bar d_1$ using (3.3.7).
3: Compute the set $\bar A_1$ using (3.3.8).
4: For each $(a, b) \in \bar A_1$ compute the set $B_{ab}$ using (3.3.9), update the clusterwise regression function $(x_{ab}, y_{ab})$ to $(\bar x_{ab}, \bar y_{ab})$ and compute the set $\bar A_2$ applying (3.3.10).
5: Compute the number $\hat f_{k,\min}$ using (3.3.11) and the set $\bar A_3$ using (3.3.12). Any hyperplane $(u, v) \in \bar A_3$ is an initial solution for solving Problem (3.3.6).
Next we describe an algorithm for solving the auxiliary CLR problem (3.3.6). For a given hyperplane $(u, v)$ define the following set:
\[ B(u, v) = \left\{ (a, b) \in A : E_{ab}(u, v) < r^{k-1}_{ab} \right\}. \]
The set $B(u, v)$ contains all points from the set $A$ which are attracted by the linear regression function $(u, v)$. It is obvious that $B(u, v) \neq \emptyset$ for all $(u, v) \in \bar A_3$.
Algorithm 3 An algorithm for solving Problem (3.3.6).
1: (Initialization) Select numbers $\gamma_1 \in [0, 1]$, $\gamma_2 \in [1, \infty[$ and apply Algorithm 2 to compute the set $\bar A_3$.
2: Select the initial linear regression function $(u^0, v_0) \in \bar A_3$, compute the set $B(u^0, v_0)$ and set $l := 0$.
3: Solve the following linear regression problem:
\[ \text{minimize } \varphi(u, v) = \sum_{(a,b) \in B(u^l, v_l)} E_{ab}(u, v) \quad \text{subject to } u \in \mathbb{R}^n,\ v \in \mathbb{R} \tag{3.3.13} \]
and obtain regression coefficients $(\tilde u^l, \tilde v_l)$.
4: Compute the set $B(\tilde u^l, \tilde v_l)$.
5: (Stopping criterion) If $B(\tilde u^l, \tilde v_l) = B(u^l, v_l)$, then set $(\bar u, \bar v) := (u^l, v_l)$ and stop. $(\bar u, \bar v)$ is a solution to Problem (3.3.6).
6: Otherwise set $u^{l+1} := \tilde u^l$, $v_{l+1} := \tilde v_l$, $B(u^{l+1}, v_{l+1}) := B(\tilde u^l, \tilde v_l)$, $l := l + 1$ and go to Step 3.
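The iteration of Algorithm 3 can be sketched as follows. This is a minimal illustration, assuming scalar inputs ($n = 1$) and squared error so that Step 3 has a closed-form least-squares solution. The iteration cap `max_iter`, the empty-set guard and the starting line are hypothetical additions for the example; in particular, the starting line is chosen by hand rather than taken from $\bar A_3$.

```python
def error(a, b, u, v):
    """E_ab(u, v) for scalar a and squared error (assumption: n = 1, p = 2)."""
    return (u * a + v - b) ** 2


def attracted(points, r, u, v):
    """B(u, v): points strictly closer to (u, v) than to the current lines."""
    return [(c, d) for (c, d), r_cd in zip(points, r) if error(c, d, u, v) < r_cd]


def fit_line(cluster, fallback_slope=0.0):
    """Closed-form least squares for b ~ u*a + v on a non-empty cluster."""
    m = len(cluster)
    mean_a = sum(a for a, _ in cluster) / m
    mean_b = sum(b for _, b in cluster) / m
    s_aa = sum((a - mean_a) ** 2 for a, _ in cluster)
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in cluster)
    u = s_ab / s_aa if s_aa > 0.0 else fallback_slope
    return u, mean_b - u * mean_a


def solve_auxiliary(points, r, u, v, max_iter=100):
    """Alternate between refitting on B(u, v) and recomputing B until it stops changing."""
    current = attracted(points, r, u, v)                     # Step 2
    for _ in range(max_iter):                                # cap is an added safeguard
        if not current:                                      # guard: nothing attracted
            break
        u_new, v_new = fit_line(current, fallback_slope=u)   # Step 3
        new = attracted(points, r, u_new, v_new)             # Step 4
        if new == current:                                   # Step 5: return (u^l, v_l)
            break
        u, v, current = u_new, v_new, new                    # Step 6
    return u, v


# Hypothetical data: four points on b = a and four on b = 2a + 1.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0),
          (0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
lines = [(1.0, 0.0)]
r = [min(error(a, b, x, y) for x, y in lines) for a, b in points]

u_bar, v_bar = solve_auxiliary(points, r, 1.0, 4.0)   # start: parallel line through (3, 7)
print(u_bar, v_bar)
```

Starting from the line through $(3, 7)$, the first refit uses two points and already yields slope 2 and intercept 1; the attracted set then grows to four points, and the next refit leaves it unchanged, so the iteration stops.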
Figure 3.2: Illustration of the incremental CLR algorithm.