Inverse Linear Programming - Learning Probabilistic Graphical Models for Image Segmentation

Given some observations, the aim of solving an inverse problem is to determine the factors that produced them. For instance, computed tomography solves an inverse problem by reconstructing a physical volume for which only some measurements were observed.

Similarly, in inverse optimization, given an observed optimal solution, one computes the model parameters that would result in that solution. In this sense one can think of learning as an inverse problem: given observations (mean parameters, ground truth data), we want to compute model parameters (canonical parameters) that lead to the given observations when solving an inference problem.

In inverse optimization, however, a feasible solution to the inverse problem may be found that is not an optimal one. The optimal one can be very difficult or impossible to find based only on the observed solution. In order to find the optimal model parameters that produced the observed solution, the given feasible parameters should be perturbed as little as possible so that they correspond to an optimal solution. This is what we refer to as inverse programming, or inverse linear programming when the problem to solve is a linear program (LP), as it is in our case when applied to the learning problem. More precisely, we define inverse linear programming as in [ZL96] as follows:

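Definition 2.6.1. Let the linear program (LP) be given by

$$\min_{x} \ \langle \hat{c}, x \rangle \quad \text{s.t.} \quad Ax \ge b \ \text{ and } \ l \le x \le u, \tag{2.105}$$

where $\hat{c}, x, l, u \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, and $A \in \mathbb{R}^{m \times n}$. Let $\hat{x}$ be a feasible solution to the given LP. We seek a perturbed cost vector $\tilde{c} \in \mathbb{R}^n$ such that $\hat{x}$ becomes an optimal solution to the adjusted LP given by

$$\min_{x} \ \langle \tilde{c}, x \rangle \quad \text{s.t.} \quad Ax \ge b \ \text{ and } \ l \le x \le u. \tag{2.106}$$

Then the inverse LP is expressed as

$$\min_{\tilde{c}} \ \|\tilde{c} - \hat{c}\|_1 \quad \text{s.t.} \quad \hat{x} \text{ is an optimal solution to (2.106)}. \tag{2.107}$$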

Note that for the definition above we have to know the solution $\hat{x}$ that is to become optimal in order to solve the inverse LP. This definition for the $\ell_1$ norm, $\|\tilde{c} - \hat{c}\|_1 = \sum_i |\tilde{c}_i - \hat{c}_i|$, was extended to the $\ell_\infty$ norm, $\|\tilde{c} - \hat{c}\|_\infty = \max_i |\tilde{c}_i - \hat{c}_i|$, in the later paper by Zhang [ZL99] and to the weighted case (weighted norm) in [AO01]. What is important is that whenever the original problem is an LP, the inverse problem is an LP too [AO01]. The feasible region of the inverse problem is formulated using the complementary slackness conditions and the constraints of the dual problem.

Next we derive the inverse linear program given the original LP.

Given the primal LP as in (2.105), we first define its dual

$$\max_{y, \lambda, \psi} \ \langle b, y \rangle + \langle l, \lambda \rangle - \langle u, \psi \rangle \tag{2.108a}$$

$$\text{s.t.} \quad A^\top y + \lambda - \psi = \hat{c}, \quad y \ge 0, \ \lambda \ge 0, \ \psi \ge 0, \tag{2.108b}$$

where $y \in \mathbb{R}^m$ is the dual variable associated with the constraint $Ax \ge b$, and $\lambda, \psi \in \mathbb{R}^n$ are the dual variables associated with the bound constraints $x \ge l$ and $x \le u$, respectively. Linear programming optimality conditions state that the primal and dual solutions $x$ and $(y, \lambda, \psi)$ are optimal if they are both feasible for the corresponding problems and the complementary slackness conditions are satisfied, i.e.

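a) $(Ax)_i > b_i \implies y_i = 0$ for all $i \in [m]$, (2.109a)

b) $x_j > l_j \implies \lambda_j = 0$ for all $j \in [n]$, and (2.109b)

c) $x_j < u_j \implies \psi_j = 0$ for all $j \in [n]$. (2.109c)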
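As a small illustration of these conditions, the following Python sketch checks whether a given primal point and a given dual triple together certify optimality for an LP of the form (2.105). The function name `certifies_optimality`, the tolerance `tol`, and the toy data are assumptions made for illustration and do not come from the original text. The dual constraint is evaluated with the transpose of $A$ so that the dimensions match.

```python
import numpy as np


def certifies_optimality(A, b, l, u, c, x, y, lam, psi, tol=1e-9):
    """Check primal feasibility, dual feasibility and complementary slackness."""
    primal_feasible = (
        np.all(A @ x >= b - tol) and np.all(x >= l - tol) and np.all(x <= u + tol)
    )
    dual_feasible = (
        np.allclose(A.T @ y + lam - psi, c, atol=tol)
        and np.all(y >= -tol) and np.all(lam >= -tol) and np.all(psi >= -tol)
    )
    # For feasible points, "slack > 0 implies dual = 0" is the same as
    # requiring every product slack * dual to vanish.
    slackness = (
        np.all(np.abs((A @ x - b) * y) <= tol)
        and np.all(np.abs((x - l) * lam) <= tol)
        and np.all(np.abs((u - x) * psi) <= tol)
    )
    return bool(primal_feasible and dual_feasible and slackness)


# Toy LP (assumed data): min <c, x>  s.t.  x1 + x2 >= 1,  0 <= x <= 1.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
l, u = np.zeros(2), np.ones(2)
c = np.array([1.0, 2.0])

# A hand-picked dual triple (assumed for the example).
y, lam, psi = np.array([1.0]), np.array([0.0, 1.0]), np.zeros(2)

ok_opt = certifies_optimality(A, b, l, u, c, np.array([1.0, 0.0]), y, lam, psi)
ok_other = certifies_optimality(A, b, l, u, c, np.array([0.0, 1.0]), y, lam, psi)
```

In this toy instance the first point passes all three checks. The second point is rejected because the chosen dual triple violates condition b) in its second component; the check only tests the supplied triple and does not search for another certificate.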

As stated in Definition 2.6.1, we want $\hat{x}$ to be an optimal solution to the perturbed problem (2.106). The primal and the dual of the perturbed problem are the primal and the dual of the original problem with $\hat{c}$ replaced by $\tilde{c}$; we call them the perturbed primal and the perturbed dual. Due to the optimality conditions, $\hat{x}$ is an optimal solution to the perturbed primal if and only if there exists a feasible solution to the perturbed dual such that the complementary slackness conditions (2.109) are satisfied. From the constraints of the dual problem (2.108) and the complementary slackness conditions, we can define the following index sets for $\hat{x}$:

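$$B := \{ i \in [m] : (A\hat{x} - b)_i = 0 \}, \tag{2.110a}$$

$$L := \{ j \in [n] : (\hat{x} - l)_j = 0 \}, \tag{2.110b}$$

$$U := \{ j \in [n] : (\hat{x} - u)_j = 0 \}, \tag{2.110c}$$

$$S := \{ j \in [n] : l_j < \hat{x}_j < u_j \}. \tag{2.110d}$$

Complementary slackness forces $y_i = 0$ for $i \notin B$, $\lambda_j = 0$ for $j \notin L$, and $\psi_j = 0$ for $j \notin U$, so the constraints in the perturbed dual can be written as

$$(A^\top y + \lambda - \tilde{c})_L = 0, \tag{2.111a}$$

$$(A^\top y - \psi - \tilde{c})_U = 0, \tag{2.111b}$$

$$(A^\top y - \tilde{c})_S = 0, \tag{2.111c}$$

$$y_B \ge 0, \quad \lambda_L \ge 0, \quad \psi_U \ge 0. \tag{2.111d}$$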
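The index sets can be read off directly from $\hat{x}$. The sketch below computes them with NumPy under a numerical tolerance; the helper name `index_sets`, the tolerance, and the three-variable toy data are assumptions for illustration. Indices are 0-based, and the interior set $S$ is taken strictly between the lower bound $l$ and the upper bound $u$.

```python
import numpy as np


def index_sets(A, b, l, u, x_hat, tol=1e-9):
    """Return the index sets B, L, U, S of x_hat as 0-based integer arrays."""
    B = np.flatnonzero(np.abs(A @ x_hat - b) <= tol)   # tight rows of Ax >= b
    L = np.flatnonzero(np.abs(x_hat - l) <= tol)       # variables at lower bound
    U = np.flatnonzero(np.abs(x_hat - u) <= tol)       # variables at upper bound
    S = np.flatnonzero((x_hat > l + tol) & (x_hat < u - tol))  # strictly inside
    return B, L, U, S


# Toy data (assumed): one constraint x1 + x2 + x3 >= 1.5 with 0 <= x <= 1.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.5])
l, u = np.zeros(3), np.ones(3)
x_hat = np.array([0.0, 1.0, 0.5])

B, L, U, S = index_sets(A, b, l, u, x_hat)
```

Here the single constraint is tight, the first variable sits at its lower bound, the second at its upper bound, and the third lies strictly between the bounds, so each of the four sets contains exactly one index.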

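Let us denote $c = \tilde{c} - \hat{c}$. Then the inverse problem as defined in (2.107) is to minimize the $\ell_1$ norm of $c$ such that the constraints (2.111) are satisfied. To this end, let us write the $\ell_1$ norm as $\|c\|_1 = \langle \mathbf{1}, c^+ + c^- \rangle$, where $c^+ = \max\{c, 0\}$ and $c^- = -\min\{c, 0\}$ are taken componentwise. Then we can define the inverse LP by the constraints (2.111), where instead of $\tilde{c}$ we use $\tilde{c} = \hat{c} + c = \hat{c} + c^+ - c^-$:

Inverse LP:

$$\min_{c^+,\, c^- \ge 0} \ \langle \mathbf{1}, c^+ + c^- \rangle \quad \text{s.t.} \tag{2.112a}$$

$$(A^\top y - c^+ + c^- + \lambda - \hat{c})_L = 0, \tag{2.112b}$$

$$(A^\top y - c^+ + c^- - \psi - \hat{c})_U = 0, \tag{2.112c}$$

$$(A^\top y - c^+ + c^- - \hat{c})_S = 0, \tag{2.112d}$$

$$y_B \ge 0, \quad \lambda_L \ge 0, \quad \psi_U \ge 0. \tag{2.112e}$$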

The problem is clearly an LP, which brings us to the conclusion that the inverse of an LP is an LP too. In fact, for some LPs, such as those originating from the minimum cut, assignment, and shortest path problems, the inverse problem under the $\ell_1$ norm is an LP of the same kind as the original problem.
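To make the construction concrete, here is a minimal Python sketch that assembles the inverse LP for a tiny instance and solves it with `scipy.optimize.linprog`. It is one possible encoding, not the author's implementation: the function name `inverse_lp_l1`, the variable ordering, the tolerance, and the toy data are assumptions. Dual variables outside their active index sets are fixed to zero through their bounds, which reproduces the index-restricted constraints, and the dual constraint uses the transpose of $A$ so that the dimensions match.

```python
import numpy as np
from scipy.optimize import linprog


def inverse_lp_l1(A, b, l, u, c_hat, x_hat, tol=1e-9):
    """Smallest l1 change of c_hat that makes the feasible point x_hat optimal."""
    m, n = A.shape
    # Active sets of x_hat: tight constraints, lower bounds, upper bounds.
    in_B = np.abs(A @ x_hat - b) <= tol
    in_L = np.abs(x_hat - l) <= tol
    in_U = np.abs(x_hat - u) <= tol

    # Variable order: [c_plus (n), c_minus (n), y (m), lam (n), psi (n)].
    cost = np.concatenate([np.ones(2 * n), np.zeros(m + 2 * n)])

    # One equality per j:
    # (A^T y)_j - c_plus_j + c_minus_j + lam_j - psi_j = c_hat_j.
    I = np.eye(n)
    A_eq = np.hstack([-I, I, A.T, I, -I])

    # Duals are nonnegative on their active set and fixed to zero elsewhere.
    nonneg, zero = (0.0, None), (0.0, 0.0)
    bounds = [nonneg] * (2 * n)
    bounds += [nonneg if active else zero for active in in_B]
    bounds += [nonneg if active else zero for active in in_L]
    bounds += [nonneg if active else zero for active in in_U]

    res = linprog(cost, A_eq=A_eq, b_eq=c_hat, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    c_plus, c_minus = res.x[:n], res.x[n:2 * n]
    return c_hat + c_plus - c_minus, res.fun


# Toy data (assumed): min <c, x>  s.t.  x1 + x2 >= 1,  0 <= x <= 1.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
l, u = np.zeros(2), np.ones(2)
c_hat = np.array([1.0, 2.0])
x_hat = np.array([0.0, 1.0])  # feasible, but not optimal for c_hat

c_tilde, dist = inverse_lp_l1(A, b, l, u, c_hat, x_hat)

# Sanity check: solve the perturbed forward LP and compare objective values.
fwd = linprog(c_tilde, A_ub=-A, b_ub=-b, bounds=list(zip(l, u)), method="highs")
gap = c_tilde @ x_hat - fwd.fun
```

For this toy instance a hand calculation gives a minimal perturbation of 1 in the $\ell_1$ norm, attained for example by $\tilde{c} = (2, 2)$ or $\tilde{c} = (1, 1)$. The minimizer is not unique, so the solver may return any such $\tilde{c}$; the `gap` check confirms that $\hat{x}$ attains the optimal value of the perturbed LP.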
