8.3 Generalized Inverses
8.3.3 Generalized inverse
A matrix $G$ is termed a generalized inverse of the $m \times n$ matrix $A$ if it satisfies the first of the Moore–Penrose conditions: $AGA = A$. It is usually denoted by $A^{-}$ and is necessarily an $n \times m$ matrix. The Moore–Penrose inverse of $A$, $A^{+}$, is clearly a generalized inverse of $A$. We know that the Moore–Penrose inverse is unique, but the proof of this required an appeal to each of the four MP conditions (see §8.3.1). This suggests that a generalized inverse satisfying only the first condition is not necessarily unique. This is indeed the case and it can be shown (see Abadir and Magnus (2005) §10.5) that any generalized inverse can be written in the form $A^{-} = A^{+} + Q - A^{+}AQAA^{+}$ where $Q$ is any $n \times m$ matrix, i.e., $Q$ is arbitrary. It is easily verified that $AA^{-}A = A$, recalling that $AA^{+}A = A$ (the first of the MP conditions). Note that taking $Q = A^{+}$ gives $A^{-} = A^{+}$ as the generalized inverse, recalling that $A^{+}AA^{+} = A^{+}$ (the second of the MP conditions).
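The verification mentioned above can be written out in a few lines. The following is a sketch of the algebra, using only the first two MP conditions and the form of $A^{-}$ quoted from Abadir and Magnus:

```latex
\begin{align*}
AA^{-}A &= A\,(A^{+} + Q - A^{+}AQAA^{+})\,A \\
        &= AA^{+}A + AQA - (AA^{+}A)\,Q\,(AA^{+}A) \\
        &= A + AQA - AQA \;=\; A, \\[4pt]
\text{and with } Q = A^{+}:\quad
A^{-}   &= A^{+} + A^{+} - A^{+}(AA^{+}A)A^{+} \\
        &= 2A^{+} - A^{+}AA^{+} \;=\; A^{+}.
\end{align*}
```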
R has no ready-made function to produce generalized inverses but the form above can be used, together with ginv(.) in the MASS library, to produce a generalized inverse (which of course will be different for different choices of Q).
The primary role of generalized inverses is in discussing solutions of the system of linear equations in x, Ax = y. There may be many solutions for x if A is singular or non-square.
Example 8.6:
(i) A non-square matrix not of full rank
> library(MASS)
> options(digits=2)
> A<-matrix(c(3,2,1,4,2,0,5,
+ 2,-1,-1,0,1),4,3,byrow=T)
> A
[,1] [,2] [,3]
[1,] 3 2 1
[2,] 4 2 0
[3,] 5 2 -1
[4,] -1 0 1
> ### First need the
> ### MP-Inverse of A
> M<-ginv(A)
> M
[,1] [,2] [,3] [,4]
[1,] 6.9e-18 0.06 0.11 -0.06
[2,] 1.7e-01 0.06 -0.06 0.11
[3,] 3.3e-01 0.06 -0.22 0.28
> ### Next generate an
> ### arbitrary 3x4
> ### matrix Q
> set.seed(137)
> Q<-matrix(c(sample(1:12,
+ replace=T)),3,4)
> Q
[,1] [,2] [,3] [,4]
[1,] 8 10 11 4
[2,] 5 5 10 9
[3,] 11 12 8 10
> ### Now calculate
> ### Generalized Inverse
> G<-M+Q-M%*%A%*%Q%*%A%*%M
> G
[,1] [,2] [,3] [,4]
[1,] -0.56 1.5 2.6 3.9
[2,] -5.72 -3.8 3.1 7.1
[3,] -1.89 2.8 2.6 6.3
> ### check G satisfies
> ### the first condition
> A%*%G%*%A
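> ### Next generate a
> ### different arbitrary
> ### 3x4 matrix Q
> set.seed(163)
> Q<-matrix(c(sample(1:12,
+ replace=T)),3,4)
> G<-M+Q-M%*%A%*%Q%*%A%*%M
> ### check G satisfies
> ### the first condition
> A%*%G%*%A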
This example shows two of the arbitrarily many generalized inverses of A, obtained from two random versions of Q generated with the function sample(.) (type help(sample) for more information on this function). The seeds used for the R random number generator were 137 and 163, so the results can be reproduced if desired.
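The construction used in part (i) can be wrapped in a small function. The sketch below is illustrative only: gen_inverse is a hypothetical helper name (it is not part of MASS or base R), and Q1 and Q2 are arbitrary matrices chosen here rather than the sampled ones used above.

```r
library(MASS)  # provides ginv(), the Moore-Penrose inverse

# Hypothetical helper: build a generalized inverse of A from an arbitrary Q
gen_inverse <- function(A, Q) {
  M <- ginv(A)                       # Moore-Penrose inverse A+
  stopifnot(all(dim(Q) == dim(M)))   # Q must be n x m, like A+
  M + Q - M %*% A %*% Q %*% A %*% M
}

A <- matrix(c( 3, 2,  1,
               4, 2,  0,
               5, 2, -1,
              -1, 0,  1), 4, 3, byrow = TRUE)

# Two arbitrary 3 x 4 choices of Q (assumed values, for illustration)
Q1 <- matrix(1:12, 3, 4)
Q2 <- matrix(c(2, -1, 0, 5, 3, 1, -4, 7, 0, 6, 1, -2), 3, 4)
G1 <- gen_inverse(A, Q1)
G2 <- gen_inverse(A, Q2)

# Both satisfy AGA = A up to rounding error ...
ok1 <- isTRUE(all.equal(A %*% G1 %*% A, A))
ok2 <- isTRUE(all.equal(A %*% G2 %*% A, A))
# ... although G1 and G2 are different matrices
differ <- !isTRUE(all.equal(G1, G2))
```

Comparing with all.equal(.) rather than == allows for the small floating-point errors that ginv(.) and the matrix products introduce.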
(ii) A non-square matrix of full column rank
> options(digits=2)
> set.seed(163)
8.3.3.1 Solutions of linear equations
The linear equation in $x$, $Ax = y$, may possess a unique solution, no solution or arbitrarily many solutions. For example, the equation $\begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix} x = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ has no solution, since it is not possible for both $x_1 + x_2 = 1$ and $2x_1 + 2x_2 = 1$ to be true simultaneously, so it is said to be inconsistent. If the equation $Ax = y$ has a solution it is said to be consistent. The equation $(1, 1)x = 1$ (where $x$ is a $2 \times 1$ column vector) has the solutions $x = (1 - q, q)'$ for any value of $q$, so there are arbitrarily many solutions.
If $Ax = y$ is consistent then there is a solution, $x^{*}$ say, so that $Ax^{*} = y$. So $y = Ax^{*} = AA^{-}Ax^{*} = AA^{-}y$. Conversely, if $AA^{-}y = y$ then let $x^{*} = A^{-}y$, so $Ax^{*} = AA^{-}y = y$ and thus the equation is consistent. Thus a necessary and sufficient condition for the equation $Ax = y$ to be consistent is that $AA^{-}y = y$. This provides a way of checking whether a system of linear equations is consistent. In practice, of course, this would be checked by using the Moore–Penrose inverse $A^{+}$ because of the ease of computation.
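The consistency condition can be checked numerically with ginv(.). The following is a minimal sketch: is_consistent is a hypothetical helper name and the tolerance tol is an assumed value, needed because the equality can only hold up to rounding error.

```r
library(MASS)

# Hypothetical helper: TRUE if Ax = y has at least one solution,
# judged by whether A A+ y equals y to within tol
is_consistent <- function(A, y, tol = 1e-8) {
  y <- as.matrix(y)
  max(abs(A %*% ginv(A) %*% y - y)) < tol
}

A1 <- matrix(c(1, 1,
               2, 2), 2, 2, byrow = TRUE)
c1 <- is_consistent(A1, c(1, 1))   # x1 + x2 = 1 and 2x1 + 2x2 = 1
c2 <- is_consistent(A1, c(1, 2))   # second equation is twice the first

A2 <- matrix(c(1, 1), 1, 2)
c3 <- is_consistent(A2, 1)         # the single equation (1, 1)x = 1
```

Here c1 is FALSE (the inconsistent system discussed above), while c2 and c3 are TRUE.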
Clearly if $A$ is a square non-singular matrix then $A^{-} = A^{-1}$ and so $AA^{-}y = AA^{-1}y = y$; the equation is therefore consistent and $x = A^{-1}y$ is the unique solution (since $A^{-1}$ is unique).
Further, if $Ax = y$ is consistent and if $A$ is $m \times n$ with $m > n$ and $\rho(A) = n$, i.e., it has full column rank, then $A'A$ is a non-singular $n \times n$ matrix and so possesses an inverse; consequently premultiplying both sides of the equation by $(A'A)^{-1}A'$ gives a solution as $x = (A'A)^{-1}A'y$.
Notes:
(a) This argument is only valid if $Ax = y$ is consistent; if it is not, the argument rests on a false premise. That is, $A$ may have full column rank and yet $x = (A'A)^{-1}A'y$ is not a solution of the equation (and indeed the equation has no solutions). This is illustrated in the first of the examples below.
(b) In the full column rank case $(A'A)^{-1}A' = A^{+}$; see key result (ii) in §8.3.1 on Page 125, so the solution can be written as $x = A^{+}y$.
(c) If $m = n$, i.e., $A$ is square and (having full column rank) therefore non-singular, then $(A'A)^{-1} = A^{-1}(A')^{-1}$ and so this reduces to $x = A^{-1}y$ as the (unique) solution.
(d) It will be seen that in general the solution $x = A^{+}y$ is unique when $m > n$ and $\rho(A) = n$, provided the equation is consistent (i.e., has any solutions at all).
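Notes (a) and (b) can be illustrated numerically. The sketch below uses a small full column rank matrix and right-hand sides chosen here purely for illustration; y is built as a multiple of the columns of A so that the first system is consistent by construction.

```r
library(MASS)

A <- matrix(c(1, 1,
              2, 2,
              3, 4), 3, 2, byrow = TRUE)   # columns are linearly independent
y <- A %*% c(2, -1)                        # consistent by construction

x_normal <- solve(t(A) %*% A, t(A) %*% y)  # (A'A)^{-1} A'y
x_mp     <- ginv(A) %*% y                  # A+ y

same   <- isTRUE(all.equal(x_normal, x_mp))     # note (b)
solves <- isTRUE(all.equal(A %*% x_mp, y))

# With an inconsistent right-hand side the formula still returns a vector,
# but that vector does not satisfy the equation (note (a))
y_bad <- c(1, 1, 1)
x_bad <- solve(t(A) %*% A, t(A) %*% y_bad)
fails <- !isTRUE(all.equal(A %*% x_bad, as.matrix(y_bad)))
```

Using solve(t(A) %*% A, t(A) %*% y) avoids forming the explicit inverse of $A'A$, which is usually preferable numerically.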
Suppose $Ax = y$ is consistent (so that $AA^{-}y = y$). If $A^{-}$ is any generalized inverse of $A$ then $AA^{-}A = A$, so $AA^{-}Ax = Ax$; hence if $Ax = y$ we have $A(A^{-}y) = y$ and so $x = A^{-}y$ is a solution of $Ax = y$. Conversely, suppose that $x = Gy$ is a solution of $Ax = y$ whenever the equation is consistent. Let $a_j$ be the $j$th column of $A$ and consider the equations $Ax = a_j$. Each of these has a solution $x = e_j$, the $j$th unit vector, i.e., a vector with $j$th element 1 and all others 0, and so the equations are consistent. Therefore the equations $Ax = a_j$ have a solution $x = Ga_j$ for all $j$, so $AGa_j = a_j$ for all $j$ and thus $AGA = A$.
Recalling (see §8.3.3) that any generalized inverse $A^{-}$ can be written in the form $A^{-} = A^{+} + Q - A^{+}AQAA^{+}$, and provided the equation is consistent, i.e., $AA^{+}y = y$, we can write any solution of $Ax = y$ in the form $x = (A^{+} + Q - A^{+}AQAA^{+})y = A^{+}y + Qy - A^{+}AQ(AA^{+}y) = A^{+}y + (I_n - A^{+}A)q$, where $q$ is any conformable vector (i.e., $n \times 1$), writing $q$ for $Qy$.
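The family of solutions $A^{+}y + (I_n - A^{+}A)q$ can be explored numerically. This is a sketch only: it reuses the rank 2 matrix A of Example 8.6 (i), builds a consistent y by construction, and the name solution and the values of q are assumptions made for illustration.

```r
library(MASS)

A <- matrix(c( 3, 2,  1,
               4, 2,  0,
               5, 2, -1,
              -1, 0,  1), 4, 3, byrow = TRUE)   # 4 x 3, rank 2
y <- A %*% c(1, 1, 1)                           # consistent by construction

Ap <- ginv(A)
n  <- ncol(A)

# Hypothetical helper: the solution indexed by an arbitrary n x 1 vector q
solution <- function(q) Ap %*% y + (diag(n) - Ap %*% A) %*% q

x0 <- solution(c(0, 0, 0))    # q = 0 gives A+ y
x1 <- solution(c(5, -2, 7))   # an arbitrary q gives another solution

both_solve <- isTRUE(all.equal(A %*% x0, y)) &&
              isTRUE(all.equal(A %*% x1, y))
distinct   <- !isTRUE(all.equal(x0, x1))
```

Because this A does not have full column rank, $I_n - A^{+}A$ is non-zero and different choices of q can give different solutions, as distinct confirms here.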
If $A$ has full column rank then $A^{+}A = I_n$ (see key result (ii) in §8.3.1 on Page 125) and thus the solution above reduces to $x = A^{+}y$ and is unique even if $A$ is non-square.
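If $A$ has full row rank then $AA^{+} = I_m$ (see key result (i) in §8.3.1 on Page 125) and so $AA^{+}y = y$ for any $y$; the equation $Ax = y$ is therefore consistent for any $y$ and has a solution $x = A^{+}y$. If the QR decomposition of $A'$ is given by $A' = QR$ (where $Q$ now denotes the factor with orthonormal columns, not the arbitrary matrix used earlier) then $A^{+} = A'(AA')^{-1} = QR(R'Q'QR)^{-1} = QRR^{-1}(R')^{-1} = Q(R')^{-1}$ as stated in §8.2.7 on Page 123.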
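The QR route to $A^{+}$ in the full row rank case can be checked with base R's qr(.), qr.Q(.) and qr.R(.). The matrix A and vector y below are assumed values chosen for illustration, and Qm and Rm are names introduced here to avoid a clash with the arbitrary matrix Q used earlier.

```r
library(MASS)

A <- matrix(c(1, 2, 0, 1,
              0, 1, 1, 3), 2, 4, byrow = TRUE)   # 2 x 4, full row rank

qrd <- qr(t(A))        # QR decomposition of A'
Qm  <- qr.Q(qrd)       # n x m matrix with orthonormal columns
Rm  <- qr.R(qrd)       # m x m upper triangular matrix

Ap_qr <- Qm %*% solve(t(Rm))                     # Q (R')^{-1}
agree <- isTRUE(all.equal(Ap_qr, ginv(A)))       # matches ginv(A)

# Full row rank gives A A+ = I_m, so A+ y solves Ax = y for any y
y      <- c(4, -1)
solves <- isTRUE(all.equal(A %*% (Ap_qr %*% y), as.matrix(y)))
```

The signs of the columns of Qm and rows of Rm are not unique, but they cancel in the product, so the comparison with ginv(A) is unaffected.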
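If the equation $Ax = y$ has no solutions or has many different solutions, the question arises as to which is the best approximate solution or is the best exact solution. One method is to base the choice on the least squares criterion. Consider the quantity $(Ax - y)'(Ax - y)$ and choose a value of $x$ which minimises this (whether or not the equation is consistent). Let $x^{*} = A^{+}y$; then $(Ax - y) = A(x - x^{*}) - (I_m - AA^{+})y$. Since $AA^{+}$ is symmetric and $AA^{+}A = A$, we have $A'(I_m - AA^{+}) = 0$, so the cross-product terms vanish and $(Ax - y)'(Ax - y) = (x - x^{*})'A'A(x - x^{*}) + y'(I_m - AA^{+})y \geq y'(I_m - AA^{+})y$, with equality when $x = x^{*}$. Thus $x^{*} = A^{+}y$ is a least squares solution of the equation $Ax = y$ and it may be an approximate solution or it may be an exact solution.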
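The least squares property can be checked numerically on an inconsistent system. The sketch below uses assumed illustrative values for A and y; rss is a hypothetical helper name for the criterion $(Ax - y)'(Ax - y)$, and qr.solve(.) is used only as an independent least squares solver for comparison.

```r
library(MASS)

A <- matrix(c(1, 1,
              2, 2,
              3, 4), 3, 2, byrow = TRUE)
y <- c(1, 1, 1)                           # Ax = y is inconsistent here

rss <- function(x) sum((A %*% x - y)^2)   # (Ax - y)'(Ax - y)

x_star <- ginv(A) %*% y                   # candidate least squares solution
x_ls   <- qr.solve(A, y)                  # QR-based least squares fit

agree <- isTRUE(all.equal(as.vector(x_star), as.vector(x_ls)))

# Random perturbations of x_star should never reduce the criterion
set.seed(1)
no_better <- all(replicate(100, rss(x_star + rnorm(2)) >= rss(x_star)))
```

The random perturbation check is not a proof, only a quick sanity test of the inequality derived above.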
Summary:
• The linear equation Ax = y may have a unique exact solution, many exact solutions or no exact solutions.
• If it has a solution (i.e., is consistent) then one solution is given by $x = A^{-}y$.
• A necessary and sufficient condition for it to have any solutions is that $AA^{-}y = y$.
Essentially, this amounts to saying the solution might be $x = A^{-}y$ (but check that it works to see whether there are any exact solutions at all).
• If A has full row rank then it has at least one solution for every value of y.
• If A has full column rank and if the equation is consistent then $x = A^{+}y$ is the unique solution.
• Irrespective of whether the equation is consistent, $x = A^{+}y$ is a least squares solution, i.e., it minimises $(Ax - y)'(Ax - y)$.
Example 8.7: (Details of the calculations are left to the exercises.)
(i) A matrix of full column rank but the equation is not consistent.
Let $A = \begin{pmatrix} 1 & 1 \\ 2 & 2 \\ 3 & 4 \end{pmatrix}$ and $y = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$; then clearly the columns of $A$ are linearly independent and so $\rho(A) = 2$, but $AA^{+}y \neq y$, so $Ax = y$ is not consistent. This is easily seen because $Ax = y$ implies that $x_1 + x_2 = 1$ and $2x_1 + 2x_2 = 1$ (and $3x_1 + 4x_2 = 1$) and the first two equations cannot both be true. Note that if y =
$AA^{+}y = y$ and so the equation has a solution and indeed it is unique and given by $A^{+}y$ (see exercises below).
(ii) A 3 × 4 matrix of rank 2.
so the equation is consistent. Thus it has at least one solution; one of these is provided by $x = A^{+}y =$
Verification that this does provide a solution and other details of the calculations are left to the exercises.