Pukelsheim Optimal DoE

Optimal Design of Experiments

service because they continue to be important resources for mathematical scientists.

Editor-in-Chief

Robert E. O'Malley, Jr., University of Washington

Editorial Board

Richard A. Brualdi, University of Wisconsin-Madison

Nicholas J. Higham, University of Manchester

Leah Edelstein-Keshet, University of British Columbia

Herbert B. Keller, California Institute of Technology

Andrzej Z. Manitius, George Mason University

Hilary Ockendon, University of Oxford

Ingram Olkin, Stanford University

Peter Olver, University of Minnesota

Ferdinand Verhulst, Mathematisch Instituut, University of Utrecht

Classics in Applied Mathematics

C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences

Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras with Applications and Computational Methods

James M. Ortega, Numerical Analysis: A Second Course

Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques

F. H. Clarke, Optimization and Nonsmooth Analysis

George F. Carrier and Carl E. Pearson, Ordinary Differential Equations

Leo Breiman, Probability

R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding

Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences

Olvi L. Mangasarian, Nonlinear Programming

*Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One, Part Two, Supplement. Translated by G. W. Stewart

Richard Bellman, Introduction to Matrix Analysis

U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary Value Problems for Ordinary Differential Equations

K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations

Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems

J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations

Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability

Cornelius Lanczos, Linear Differential Operators

Richard Bellman, Introduction to Matrix Analysis, Second Edition

Beresford N. Parlett, The Symmetric Eigenvalue Problem

Richard Haberman, Mathematical Models: Mechanical Vibrations, Population Dynamics, and Traffic Flow

*First time in print.

Peter W. M. John, Statistical Design and Analysis of Experiments

Tamer Basar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second Edition

Emanuel Parzen, Stochastic Processes

Petar Kokotovic, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis and Design

Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New Statistical Methodology

James A. Murdock, Perturbations: Theory and Methods

Ivar Ekeland and Roger Temam, Convex Analysis and Variational Problems

Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II

J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables

David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications

F. Natterer, The Mathematics of Computerized Tomography

Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging

R. Wong, Asymptotic Approximations of Integrals

O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation

David R. Brillinger, Time Series: Data Analysis and Theory

Joel N. Franklin, Methods of Mathematical Economics: Linear and Nonlinear Programming, Fixed-Point Theorems

Philip Hartman, Ordinary Differential Equations, Second Edition

Michael D. Intriligator, Mathematical Optimization and Economic Theory

Philippe G. Ciarlet, The Finite Element Method for Elliptic Problems

Jane K. Cullum and Ralph A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. I: Theory

M. Vidyasagar, Nonlinear Systems Analysis, Second Edition

Robert Mattheij and Jaap Molenaar, Ordinary Differential Equations in Theory and Practice

Shanti S. Gupta and S. Panchapakesan, Multiple Decision Procedures: Theory and Methodology of Selecting and Ranking Populations

Eugene L. Allgower and Kurt Georg, Introduction to Numerical Continuation Methods

Leah Edelstein-Keshet, Mathematical Models in Biology

Heinz-Otto Kreiss and Jens Lorenz, Initial-Boundary Value Problems and the Navier-Stokes Equations

J. L. Hodges, Jr. and E. L. Lehmann, Basic Concepts of Probability and Statistics, Second Edition

George F. Carrier, Max Krook, and Carl E. Pearson, Functions of a Complex Variable: Theory and Technique

Friedrich Pukelsheim, Optimal Design of Experiments

Optimal Design of Experiments

Friedrich Pukelsheim

University of Augsburg

Augsburg, Germany

Society for Industrial and Applied Mathematics

Philadelphia

This SIAM edition is an unabridged republication of the work first published by John Wiley & Sons, Inc., New York, 1993.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

Library of Congress Cataloging-in-Publication Data:

Pukelsheim, Friedrich, 1948-

Optimal design of experiments / Friedrich Pukelsheim. -- Classic ed.

p. cm. -- (Classics in applied mathematics ; 50)

Originally published: New York : J. Wiley, 1993.

Includes bibliographical references and index.

ISBN 0-89871-604-7 (pbk.)

1. Experimental design. I. Title. II. Series.

QA279.P85 2006

519.5'7--dc22

Partial royalties from the sale of this book are placed in a fund to help students attend SIAM meetings and other SIAM-related activities. This fund is administered by SIAM, and qualified individuals are encouraged to write directly to SIAM for guidelines.

SIAM is a registered trademark.

1.23. Optimal Estimators in Classical Linear Models, 24 1.24. Experimental Designs and Moment Matrices, 25 1.25. Model Matrix versus Design Matrix, 27

1.26. Geometry of the Set of All Moment Matrices, 29 1.27. Designs for Two-Way Classification Models, 30 1.28. Designs for Polynomial Fit Models, 32

Exercises, 33

2. Optimal Designs for Scalar Parameter Systems 35

2.1. Parameter Systems of Interest and Nuisance Parameters, 35 2.2. Estimability of a One-Dimensional Subsystem, 36

2.3. Range Summation Lemma, 37 2.4. Feasibility Cones, 37

2.5. The Ice-Cream Cone, 38

2.6. Optimal Estimators under a Given Design, 41

2.7. The Design Problem for Scalar Parameter Subsystems, 41 2.8. Dimensionality of the Regression Range, 42

2.9. Elfving Sets, 43

2.10. Cylinders that Include the Elfving Set, 44

2.11. Mutual Boundedness Theorem for Scalar Optimality, 45 2.12. The Elfving Norm, 47

2.13. Supporting Hyperplanes to the Elfving Set, 49 2.14. The Elfving Theorem, 50

2.15. Projectors for Given Subspaces, 52

2.16. Equivalence Theorem for Scalar Optimality, 52 2.17. Bounds for the Optimal Variance, 54

2.18. Eigenvectors of Optimal Moment Matrices, 56

2.19. Optimal Coefficient Vectors for Given Moment Matrices, 56 2.20. Line Fit Model, 57

2.21. Parabola Fit Model, 58 2.22. Trigonometric Fit Models, 58

2.23. Convexity of the Optimality Criterion, 59 Exercises, 59

3. Information Matrices 61

3.1. Subsystems of Interest of the Mean Parameters, 61 3.2. Information Matrices for Full Rank Subsystems, 62 3.3. Feasibility Cones, 63

3.4. Estimability, 64

3.5. Gauss-Markov Estimators and Predictors, 65 3.6. Testability, 67

3.7. F-Test of a Linear Hypothesis, 67 3.8. ANOVA, 71

3.9. Identifiability, 72 3.10. Fisher Information, 72 3.11. Component Subsets, 73 3.12. Schur Complements, 75

3.13. Basic Properties of the Information Matrix Mapping, 76 3.14. Range Disjointness Lemma, 79

3.15. Rank of Information Matrices, 81

3.16. Discontinuity of the Information Matrix Mapping, 82 3.17. Joint Solvability of Two Matrix Equations, 85

3.18. Iterated Parameter Subsystems, 85 3.19. Iterated Information Matrices, 86 3.20. Rank Deficient Subsystems, 87

3.21. Generalized Information Matrices for Rank Deficient Subsystems, 88

3.22. Generalized Inverses of Generalized Information Matrices, 90

3.23. Equivalence of Information Ordering and Dispersion Ordering, 91

3.24. Properties of Generalized Information Matrices, 92

3.25. Contrast Information Matrices in Two-Way Classification Models, 93

Exercises, 96

4. Loewner Optimality 98

4.1. Sets of Competing Moment Matrices, 98

4.2. Moment Matrices with Maximum Range and Rank, 99 4.3. Maximum Range in Two-Way Classification Models, 99 4.4. Loewner Optimality, 101

4.5. Dispersion Optimality and Simultaneous Scalar Optimality, 102

4.6. General Equivalence Theorem for Loewner Optimality, 103 4.7. Nonexistence of Loewner Optimal Designs, 104

4.8. Loewner Optimality in Two-Way Classification Models, 105

4.9. The Penumbra of the Set of Competing Moment Matrices

4.10. Geometry of the Penumbra, 108

4.11. Existence Theorem for Scalar Optimality, 109 4.12. Supporting Hyperplanes to the Penumbra, 110

4.13. General Equivalence Theorem for Scalar Optimality, 111 Exercises, 113

5. Real Optimality Criteria 114

5.1. Positive Homogeneity, 114

5.2. Superadditivity and Concavity, 115

5.3. Strict Superadditivity and Strict Concavity, 116 5.4. Nonnegativity and Monotonicity, 117

5.5. Positivity and Strict Monotonicity, 118 5.6. Real Upper Semicontinuity, 118 5.7. Semicontinuity and Regularization, 119 5.8. Information Functions, 119

5.9. Unit Level Sets, 120

5.10. Function-Set Correspondence, 122 5.11. Functional Operations, 124

5.12. Polar Information Functions and Polar Norms, 125 5.13. Polarity Theorem, 127

5.14. Compositions with the Information Matrix Mapping, 129 5.15. The General Design Problem, 131

5.16. Feasibility of Formally Optimal Moment Matrices, 132 5.17. Scalar Optimality, Revisited, 133

Exercises, 134

6. Matrix Means 135

6.1. Classical Optimality Criteria, 135 6.2. D-Criterion, 136 6.3. A-Criterion, 137 6.4. E-Criterion, 137 6.5. T-Criterion, 138 6.6. Vector Means, 139 6.7. Matrix Means, 140

6.8. Diagonality of Symmetric Matrices, 142 6.9. Vector Majorization, 144

6.10. Inequalities for Vector Majorization, 146 6.11. The Hölder Inequality, 147

6.12. Polar Matrix Means, 149

6.13. Matrix Means as Information Functions and Norms, 151 6.14. The General Design Problem with Matrix Means, 152 6.15. Orthogonality of Two Nonnegative Definite Matrices, 153 6.16. Polarity Equation, 154

6.17. Maximization of Information versus Minimization of Variance, 155

Exercises, 156

7. The General Equivalence Theorem 158

7.1. Subgradients and Subdifferentials, 158 7.2. Normal Vectors to a Convex Set, 159 7.3. Full Rank Reduction, 160

7.4. Subgradient Theorem, 162

7.5. Subgradients of Isotonic Functions, 163 7.6. A Chain Rule Motivation, 164

7.7. Decomposition of Subgradients, 165 7.8. Decomposition of Subdifferentials, 167 7.9. Subgradients of Information Functions, 168 7.10. Review of the General Design Problem, 170

7.11. Mutual Boundedness Theorem for Information Functions, 171 7.12. Duality Theorem, 172

7.13. Existence Theorem for Optimal Moment Matrices, 174 7.14. The General Equivalence Theorem, 175

7.15. General Equivalence Theorem for the Full Parameter Vector, 176

7.16. Equivalence Theorem, 176

7.17. Equivalence Theorem for the Full Parameter Vector, 177 7.18. Merits and Demerits of Equivalence Theorems, 177 7.19. General Equivalence Theorem for Matrix Means, 178 7.20. Equivalence Theorem for Matrix Means, 180

7.21. General Equivalence Theorem for E-Optimality, 180 7.22. Equivalence Theorem for E-Optimality, 181

7.23. E-Optimality, Scalar Optimality, and Eigenvalue Simplicity, 183

7.24. E-Optimality, Scalar Optimality, and Elfving Norm, 183 Exercises, 185

8. Optimal Moment Matrices and Optimal Designs 187

8.1. From Moment Matrices to Designs, 187

8.2. Bound for the Support Size of Feasible Designs, 188 8.3. Bound for the Support Size of Optimal Designs, 190 8.4. Matrix Convexity of Outer Products, 190

8.5. Location of the Support Points of Arbitrary Designs, 191 8.6. Optimal Designs for a Linear Fit over the Unit Square, 192

8.7. Optimal Weights on Linearly Independent Regression Vectors, 195

8.8. A-Optimal Weights on Linearly Independent Regression Vectors, 197

8.9. C-Optimal Weights on Linearly Independent Regression Vectors, 197

8.10. Nonnegative Definiteness of Hadamard Products, 199 8.11. Optimal Weights on Given Support Points, 199 8.12. Bound for Determinant Optimal Weights, 201 8.13. Multiplicity of Optimal Moment Matrices, 201

8.14. Multiplicity of Optimal Moment Matrices under Matrix Means, 202

8.15. Simultaneous Optimality under Matrix Means, 203 8.16. Matrix Mean Optimality for Component Subsets, 203 8.17. Moore-Penrose Matrix Inversion, 204

8.18. Matrix Mean Optimality for Rank Deficient Subsystems, 205

8.19. Matrix Mean Optimality in Two-Way Classification Models, 206

Exercises, 209

9. D-, A-, E-, T-Optimality 210

9.1. D-, A-, E-, T-Optimality, 210 9.2. G-Criterion, 210

9.3. Bound for Global Optimality, 211 9.4. The Kiefer-Wolfowitz Theorem, 212

9.5. D-Optimal Designs for Polynomial Fit Models, 213 9.6. Arcsin Support Designs, 217

9.7. Equivalence Theorem for A-Optimality, 221 9.8. L-Criterion, 222

9.9. A-Optimal Designs for Polynomial Fit Models, 223 9.10. Chebyshev Polynomials, 226

9.12. Scalar Optimality in Polynomial Fit Models, I, 229

9.13. E-Optimal Designs for Polynomial Fit Models, 232 9.14. Scalar Optimality in Polynomial Fit Models, II, 237 9.15. Equivalence Theorem for T-Optimality, 240

9.16. Optimal Designs for Trigonometric Fit Models, 241 9.17. Optimal Designs under Variation of the Model, 243

Exercises, 245

10. Admissibility of Moment and Information Matrices 247

10.1. Admissible Moment Matrices, 247 10.2. Support Based Admissibility, 248 10.3. Admissibility and Completeness, 248

10.4. Positive Polynomials as Quadratic Forms, 249 10.5. Loewner Comparison in Polynomial Fit Models, 251 10.6. Geometry of the Moment Set, 252

10.7. Admissible Designs in Polynomial Fit Models, 253

10.8. Strict Monotonicity, Unique Optimality, and Admissibility, 256 10.9. E-Optimality and Admissibility, 257

10.10. T-Optimality and Admissibility, 258

10.11. Matrix Mean Optimality and Admissibility, 260 10.12. Admissible Information Matrices, 262

10.13. Loewner Comparison of Special C-Matrices, 262 10.14. Admissibility of Special C-Matrices, 264

10.15. Admissibility, Minimaxity, and Bayes Designs, 265 Exercises, 266

11. Bayes Designs and Discrimination Designs 268

11.1. Bayes Linear Models with Moment Assumptions, 268 11.2. Bayes Estimators, 270

11.3. Bayes Linear Models with Normal-Gamma Prior Distributions, 272

11.4. Normal-Gamma Posterior Distributions, 273 11.5. The Bayes Design Problem, 275

11.6. General Equivalence Theorem for Bayes Designs, 276 11.7. Designs with Protected Runs, 277

11.8. General Equivalence Theorem for Designs with Bounded Weights, 278

11.9. Second-Degree versus Third-Degree Polynomial Fit Models, I, 280

11.10. Mixtures of Models, 283

11.11. Mixtures of Information Functions, 285

11.12. General Equivalence Theorem for Mixtures of Models, 286 11.13. Mixtures of Models Based on Vector Means, 288

11.14. Mixtures of Criteria, 289

11.15. General Equivalence Theorem for Mixtures of Criteria, 290 11.16. Mixtures of Criteria Based on Vector Means, 290

11.17. Weightings and Scalings, 292

11.18. Second-Degree versus Third-Degree Polynomial Fit Models, II, 293

11.19. Designs with Guaranteed Efficiencies, 296

11.20. General Equivalence Theorem for Guaranteed Efficiency Designs, 297

11.21. Model Discrimination, 298

11.22. Second-Degree versus Third-Degree Polynomial Fit Models, III, 299

Exercises, 302

12. Efficient Designs for Finite Sample Sizes 304

12.1. Designs for Finite Sample Sizes, 304 12.2. Sample Size Monotonicity, 305

12.3. Multiplier Methods of Apportionment, 307 12.4. Efficient Rounding Procedure, 307

12.5. Efficient Design Apportionment, 308 12.6. Pairwise Efficiency Bound, 310 12.7. Optimal Efficiency Bound, 311 12.8. Uniform Efficiency Bounds, 312 12.9. Asymptotic Order $O(n^{-1})$, 314

12.10. Asymptotic Order $O(n^{-2})$, 315

12.11. Subgradient Efficiency Bounds, 317

12.12. Apportionment of D-Optimal Designs in Polynomial Fit Models, 320

12.13. Minimal Support and Finite Sample Size Optimality, 322 12.14. A Sufficient Condition for Completeness, 324

12.15. A Sufficient Condition for Finite Sample Size D-Optimality, 325

12.16. Finite Sample Size D-Optimal Designs in Polynomial Fit Models, 328

13. Invariant Design Problems 331

13.1. Design Problems with Symmetry, 331 13.2. Invariance of the Experimental Domain, 335

13.3. Induced Matrix Group on the Regression Range, 336 13.4. Congruence Transformations of Moment Matrices, 337 13.5. Congruence Transformations of Information Matrices, 338 13.6. Invariant Design Problems, 342

13.7. Invariance of Matrix Means, 343 13.8. Invariance of the D-Criterion, 344 13.9. Invariant Symmetric Matrices, 345

13.10. Subspaces of Invariant Symmetric Matrices, 346 13.11. The Balancing Operator, 348

13.12. Simultaneous Matrix Improvement, 349 Exercises, 350

14. Kiefer Optimality 352

14.1. Matrix Majorization, 352

14.2. The Kiefer Ordering of Symmetric Matrices, 354 14.3. Monotonic Matrix Functions, 357

14.4. Kiefer Optimality, 357

14.5. Heritability of Invariance, 358

14.6. Kiefer Optimality and Invariant Loewner Optimality, 360 14.7. Optimality under Invariant Information Functions, 361 14.8. Kiefer Optimality in Two-Way Classification Models, 362 14.9. Balanced Incomplete Block Designs, 366

14.10. Optimal Designs for a Linear Fit over the Unit Cube, 372 Exercises, 379

15. Rotatability and Response Surface Designs 381

15.1. Response Surface Methodology, 381 15.2. Response Surfaces, 382

15.3. Information Surfaces and Moment Matrices, 383

15.4. Rotatable Information Surfaces and Invariant Moment Matrices, 384

15.5. Rotatability in Multiway Polynomial Fit Models, 384 15.6. Rotatability Determining Classes of Transformations, 385 15.7. First-Degree Rotatability, 386

15.9. Rotatable First-Degree Moment Matrices, 388 15.10. Kiefer Optimal First-Degree Moment Matrices, 389 15.11. Two-Level Factorial Designs, 390

15.12. Regular Simplex Designs, 391

15.13. Kronecker Products and Vectorization Operator, 392 15.14. Second-Degree Rotatability, 394

15.15. Rotatable Second-Degree Symmetric Matrices, 396 15.16. Rotatable Second-Degree Moment Matrices, 398 15.17. Rotatable Second-Degree Information Surfaces, 400 15.18. Central Composite Designs, 402

15.19. Second-Degree Complete Classes of Designs, 403 15.20. Measures of Rotatability, 405

15.21. Empirical Model-Building, 406 Exercises, 406

Comments and References 408

1. Experimental Designs in Linear Models, 408 2. Optimal Designs for Scalar Parameter Systems, 410 3. Information Matrices, 410

4. Loewner Optimality, 412 5. Real Optimality Criteria, 412 6. Matrix Means, 413

7. The General Equivalence Theorem, 414

8. Optimal Moment Matrices and Optimal Designs, 417 9. D-, A-, E-, T-Optimality, 418

10. Admissibility of Moment and Information Matrices, 421 11. Bayes Designs and Discrimination Designs, 422

12. Efficient Designs for Finite Sample Sizes, 424 13. Invariant Design Problems, 425

14. Kiefer Optimality, 426

15. Rotatability and Response Surface Designs, 426

Biographies 428

1. Charles Loewner 1893-1968, 428

2. Gustav Elfving 1908-1984, 430

3. Jack Kiefer 1924-1981, 430

Bibliography 432

Index 448

Preface to the Classics Edition

Research into the optimality theory of the design of statistical experiments originated around 1960. The first papers concentrated on one specific optimality criterion or another. Before long, when interrelations between these criteria were observed, the need for a unified approach emerged. Invoking tools from convex optimization theory, the optimal design problem is indeed amenable to a fairly complete solution. This is the topic of Optimal Design of Experiments, and over the years the material developed here has proved comprehensive, useful, and stable. It is a pleasure to see the book reprinted in the SIAM Classics in Applied Mathematics series.

Ever since the inception of optimal design theory, the determinant of the moment matrix of a design was recognized as a very specific criterion function. In fact, determinant optimality in polynomial fit models permits an analysis other than the one presented here, based on canonical moments and classical polynomials. This alternate part of the theory is developed by H. DETTE and W.J. STUDDEN in their monograph The Theory of Canonical Moments with Applications in Statistics, Probability, and Analysis, and the references listed there complement and update the bibliography given here.

Since the book's initial publication in 1993, its results have been put to good use in deriving optimal designs on the circle, optimal mixture designs, or optimal designs in other linear statistical models. However, many practical design problems of applied statistics are inherently nonlinear. Even then, local linearization may open the way to apply the present results, thus aiding in identifying good, practical designs.

FRIEDRICH PUKELSHEIM

Augsburg, Germany October 2005

Preface

... dans ce meilleur des [modèles] possibles ... tout est au mieux.

Candide (1759), Chapitre I, VOLTAIRE

The working title of the book was a bit long, Optimality Theory of Experimental Designs in Linear Models, but focused on two pertinent points. The setting is the linear model, the simplest statistical model, where the results are strongest. The topic is design optimality, de-emphasizing the issue of design construction. A more detailed Outline of the Book follows the Contents.

The design literature is full of fancy nomenclature. In order to circumvent expert jargon I mainly speak of a design being $\phi$-optimal for $K'\theta$ in $\Xi$, that is, being optimal under an information function $\phi$, for a parameter system of interest $K'\theta$, in a class $\Xi$ of competing designs. The only genuinely new notions that I introduce are Loewner optimality (because it refers to the Loewner matrix ordering) and Kiefer optimality (because it pays due homage to the man who was a prime contributor to the topic).

The design problems originate from statistics, but are solved using special tools from linear algebra and convex analysis, such as the information matrix mapping of Chapter 3 and the information functions of Chapter 5. I have refrained from relegating these tools into a set of appendices, at the expense of some slowing of the development in the first half of the book. Instead, the auxiliary material is developed as needed, and it is hoped that the exposition conveys some of the fascination that grows out of merging three otherwise distinct mathematical disciplines.

The result is a unified optimality theory that embraces an amazingly wide variety of design problems. My aim is not encyclopedic coverage, but rather to outline typical settings such as D-, A-, and E-optimal polynomial regression designs, Bayes designs, designs for model discrimination, balanced incomplete block designs, or rotatable response surface designs. Pulling together formerly separate entities to build a greater community will always face opponents who fear an assault on their way of thinking. On the contrary, my intention is constructive, to generate a frame for those design problems that share a common goal. The goal of investigating optimal, theoretical designs is to provide a gauge for identifying efficient, practical designs.

Il meglio è l'inimico del bene.

Dictionnaire Philosophique (1770), Art Dramatique, VOLTAIRE

ACKNOWLEDGMENTS

The writing of this book became a pleasure when I began experiencing encouragement from so many friends and colleagues, ranging from good advice on how to survive a book project, to the tedious work of weeding out wrong theorems. Above all I would like to thank my Augsburg colleague Norbert Gaffke who, with his vast knowledge of the subject, helped me several times to overcome paralyzing deadlocks. The material of the book called for a number of research projects which I could only resolve by relying on the competence and energy of my co-authors. It is a privilege to have cooperated with Norman Draper, Sabine Rieder, Jim Rosenberger, Bill Studden, and Ben Torsney, whose joint efforts helped shape Chapters 15, 12, 11, 9, 8, respectively.

Over the years, the manuscript has undergone continuous mutations, as a reaction to the suggestions of those who endured the reading of the early drafts. For their constructive criticism I am grateful to Ching-Shui Cheng, Holger Dette, Berthold Heiligers, Harold Henderson, Olaf Krafft, Rudolf Mathar, Wolfgang Näther, Ingram Olkin, Andrej Pázman, Norbert Schmitz, Shayle Searle, and George Styan. The additional chores of locating typos, detecting doubly used notation, and searching for missing definitions were undertaken by Markus Abt, Wolfgang Bischoff, Kenneth Nordström, Ingolf Terveer, and the students of various classes I taught from the manuscript. Their labor turned a manuscript that initially was everywhere dense in error into one which I hope is finally everywhere dense in content.

Adalbert Wilhelm carried out most of the calculations for the numerical examples; Inge Dötsch so cheerfully kept retyping what seemed in final form. Ingo Eichenseher and Gerhard Wilhelms contributed the public domain postscript driver dvilw to produce the exhibits. Sol Feferman, Timo Mäkeläinen, and Dooley Kiefer kindly provided the photographs of Loewner, Elfving, and Kiefer in the Biographies. To each I owe a debt of gratitude.

Finally I wish to thank the Volkswagen-Stiftung, Hannover, for supporting sabbatical leaves with the Departments of Statistics at Stanford University (1987) and at Penn State University (1990), and for granting an Akademie-Stipendium to help finish the project.

FRIEDRICH PUKELSHEIM Augsburg, Germany

List of Exhibits

1.1 The statistical linear model, 3 1.2 Convex cones in the plane $\mathbb{R}^2$, 11

1.3 Orthogonal decompositions induced by a linear mapping, 14 1.4 Orthogonal and oblique projections, 24

1.5 An experimental design worksheet, 28 1.6 A worksheet with run order randomized, 28

1.7 Experimental domain designs, and regression range designs, 32 2.1 The ice-cream cone, 38

2.2 Two Elfving sets, 43 2.3 Cylinders, 45

2.4 Supporting hyperplanes to the Elfving set, 50

2.5 Euclidean balls inscribed in and circumscribing the Elfving set, 55

3.1 ANOVA decomposition, 71

3.2 Regularization of the information matrix mapping, 81 3.3 Discontinuity of the information matrix mapping, 84 4.1 Penumbra, 108

5.1 Unit level sets, 121

6.1 Conjugate numbers, p + q = pq, 148 7.1 Subgradients, 159

7.2 Normal vectors to a convex set, 160 7.3 A hierarchy of equivalence theorems, 178

8.1 Support points for a linear fit over the unit square, 194 9.1 The Legendre polynomials up to degree 10, 214

9.2 Polynomial fits over [-1;1]: $\phi$-optimal designs for $\theta$ in T, 218

9.3 Polynomial fits over [-1;1]: $\phi$-optimal designs for $\theta$ in ..., 219

9.4 Histogram representation of the design, 220 9.5 Fifth-degree arcsin support, 220

9.6 Polynomial fits over [-1;1]: $\phi$-optimal designs for $\theta$ in T, 224

9.7 Polynomial fits over [-1;1]: $\phi$-optimal designs for $\theta$ in ..., 225

9.8 The Chebyshev polynomials up to degree 10, 226 9.9 Lagrange polynomials up to degree 4, 228 9.10 E-optimal moment matrices, 233

9.11 Polynomial fits over [-1;1]: $\phi$-optimal designs for $\theta$ in T, 236

9.12 Arcsin support efficiencies for individual parameters, 240 10.1 Cuts of a convex set, 254

10.2 Line projections and admissibility, 259 10.3 Cylinders and admissibility, 261

11.1 Discrimination between a second- and a third-degree model, 301

12.1 Quota method under growing sample size, 306 12.2 Efficient design apportionment, 310

12.3 Asymptotic order of the E-efficiency loss, 317 12.4 Asymptotic order of the D-efficiency loss, 322

12.5 Nonoptimality of the efficient design apportionment, 323 12.6 Optimality of the efficient design apportionment, 329

13.1 Eigenvalues of moment matrices of symmetric three-point designs, 334

14.1 The Kiefer ordering, 355

14.3 Uniform vertex designs, 373 14.4 Admissible eigenvalues, 375

15.1 Eigenvalues of moment matrices of central composite designs, 405

Interdependence of Chapters

1 Experimental Designs in Linear Models

2 Optimal Designs for Scalar Parameter Systems

3 Information Matrices

4 Loewner Optimality

5 Real Optimality Criteria

6 Matrix Means

7 The General Equivalence Theorem

8 Optimal Moment Matrices and Optimal Designs

9 D-, A-, E-, T-Optimality

10 Admissibility of Moment and Information Matrices

11 Bayes Designs and Discrimination Designs

12 Efficient Designs for Finite Sample Sizes

13 Invariant Design Problems

14 Kiefer Optimality

15 Rotatability and Response Surface Designs

Outline of the Book

CHAPTERS 1, 2, 3, 4: LINEAR MODELS AND INFORMATION MATRICES

Chapters 1 and 3 are basic. Chapter 1 centers around the Gauss-Markov Theorem, not only because it justifies the introduction of designs and their moment matrices in Section 1.24. Equally important, it permits us to define in Section 3.2 the information matrix for a parameter system of interest $K'\theta$ in a way that best supports the general theory. The definition is extended to rank deficient coefficient matrices $K$ in Section 3.21. Because of this dual purpose, the Gauss-Markov Theorem is formulated as a general result of matrix algebra. First results on optimal designs are presented in Chapter 2, for parameter subsystems that are one-dimensional, and in Chapter 4, in the case where optimality can be achieved relative to the Loewner ordering among information matrices. (This is rare; see Section 4.7.) These results also follow from the General Equivalence Theorem in Chapter 7, whence Chapters 2 and 4 are not needed for their technical details.
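As a small numerical illustration of the information matrix for a parameter subsystem, the sketch below assumes the full rank case with a positive definite moment matrix M, where the information matrix takes the form (K' M^{-1} K)^{-1}. The function name, the two-point design, and the line fit regression function f(x) = (1, x)' are illustrative choices, not taken from the book.

```python
import numpy as np

def information_matrix(M, K):
    """(K' M^{-1} K)^{-1}, assuming M is positive definite and K has full column rank."""
    M = np.asarray(M, dtype=float)
    K = np.asarray(K, dtype=float)
    # Solve M X = K rather than forming the inverse of M explicitly.
    X = np.linalg.solve(M, K)
    return np.linalg.inv(K.T @ X)

# Line fit model f(x) = (1, x)'; the design puts weight 1/2 on x = 0 and on x = 1.
points = np.array([0.0, 1.0])
weights = np.array([0.5, 0.5])
F = np.column_stack([np.ones_like(points), points])  # rows are f(x_i)'
M = F.T @ (weights[:, None] * F)                      # M = sum_i w_i f(x_i) f(x_i)'

K_full = np.eye(2)                   # full parameter vector
K_slope = np.array([[0.0], [1.0]])   # slope only

C_full = information_matrix(M, K_full)    # equals M itself
C_slope = information_matrix(M, K_slope)  # 1 x 1 matrix

# For a component subset, the same value is the Schur complement in M.
schur = M[1, 1] - M[1, 0] * M[0, 1] / M[0, 0]
print(C_slope[0, 0], schur)
```

With K the identity the information matrix reduces to the moment matrix, and for the slope alone it agrees with the Schur complement, which is the form discussed in Section 3.12.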

CHAPTERS 5, 6: INFORMATION FUNCTIONS

Chapters 5 and 6 are reference chapters, developing the concavity properties of prospective optimality criteria. In Section 5.8, we introduce information functions which by definition are required to be positively homogeneous, superadditive, nonnegative, nonconstant, and upper semicontinuous. Information functions submit themselves to pleasing functional operations (Section 5.11), of which polarity (Section 5.12) is crucial for the sequel. The most important class of information functions are the matrix means $\phi_p$ with parameter $p \in [-\infty; 1]$. They are the topic of Chapter 6, starting from the classical D-, A-, E-criterion as the special cases $p = 0, -1, -\infty$, respectively.
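The matrix means can be sketched numerically through the eigenvalues of the information matrix. The block below assumes a positive definite s x s matrix C and the standard convention that the mean of order p is ((1/s) trace C^p)^{1/p}, with the determinant criterion (p = 0) and the smallest eigenvalue criterion (p = -infinity) as limiting cases. The function name and the example matrix are illustrative.

```python
import numpy as np

def matrix_mean(C, p):
    """Matrix mean of order p for a symmetric positive definite matrix C, p in [-inf, 1]."""
    lam = np.linalg.eigvalsh(np.asarray(C, dtype=float))
    if p == -np.inf:
        return float(lam.min())                        # E-criterion: smallest eigenvalue
    if p == 0:
        return float(np.exp(np.mean(np.log(lam))))     # D-criterion: det(C)^(1/s)
    return float(np.mean(lam ** p) ** (1.0 / p))       # includes A (p = -1) and T (p = 1)

C = np.diag([1.0, 4.0])
d_value = matrix_mean(C, 0)        # geometric mean of the eigenvalues
a_value = matrix_mean(C, -1)       # harmonic mean of the eigenvalues
e_value = matrix_mean(C, -np.inf)  # smallest eigenvalue
t_value = matrix_mean(C, 1)        # arithmetic mean of the eigenvalues
print(e_value, a_value, d_value, t_value)
```

For a fixed matrix the values are nondecreasing in p, which the four printed numbers illustrate.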

CHAPTERS 7, 8, 12: OPTIMAL APPROXIMATE DESIGNS AND EFFICIENT DISCRETE DESIGNS

The General Equivalence Theorem 7.14 is the key result of optimal design theory, offering necessary and sufficient conditions for a design's moment matrix $M$ to be $\phi$-optimal for $K'\theta$ in $\mathcal{M}$. The generic result of this type is due to Kiefer and Wolfowitz (1960), concerning D-optimality for $\theta$ in $M(\Xi)$. The present theorem is more general in three respects: in allowing for the competing moment matrices to form a set $\mathcal{M}$ which is compact and convex, rather than restricting attention to the largest possible set $M(\Xi)$ of all moment matrices; in admitting parameter subsystems $K'\theta$ rather than concentrating on the full parameter vector $\theta$; and in permitting as optimality criterion any information function $\phi$, rather than restricting attention to the classical D-criterion. Specifying these quantities gives rise to a number of corollaries which are discussed in the second half of Chapter 7. The first half is a self-contained exposition of arguments which lead to a proof of the General Equivalence Theorem, based on subgradients and normal vectors to a convex set. Duality theory of convex analysis might be another starting point; here we obtain a duality theorem as an intermediate step, as Theorem 7.12. Yet another approach would be based on directional derivatives; however, their calculus is quite involved when it comes to handling a composition $\phi \circ C_K$ like the one underlying the optimal design problem.

Chapter 8 deals with the practical consequences which the General Equivalence Theorem implies about the support points $x_i$ and the weights $w_i$ of an optimal design $\xi$. The theory permits a weight $w_i$ to be any real number between 0 and 1, prescribing the proportion of observations to be drawn under $x_i$. In contrast, a design for sample size $n$ replaces $w_i$ by an integer $n_i$, as the replication number for $x_i$. In Chapter 12 we propose the efficient design apportionment as a systematic and easy way to pass from $w_i$ to $n_i$. This discretization procedure is the most efficient one, in the sense of Theorem 12.7. For growing sample size $n$, the efficiency loss relative to the optimal design stays bounded of asymptotic order $n^{-1}$; in the case of differentiability, the order improves to $n^{-2}$.

CHAPTERS 9, 10, 11: INSTANCES OF DESIGN OPTIMALITY

D-, A-, and E-optimal polynomial regression designs over the interval $[-1; 1]$ are characterized and exhibited in Chapter 9. Chapter 10 discusses admissibility of the moment matrix of a polynomial regression design, and of the contrast information matrix of a block design in a two-way classification model. Prominent as these examples may be, it is up to Chapter 11 to exploit the power of the General Equivalence Theorem to its fullest. Various sets of competing moment matrices are considered, such as Ma for mixture model designs, $\{(M, \ldots, M) : M \in \mathcal{M}\}$ for mixture criteria designs, and a corresponding set for designs with guaranteed efficiencies. And they are evaluated using an information function that is a composition of a set of $m$ information functions, together with an information function on the nonnegative orthant $\mathbb{R}^m_+$.

CHAPTERS 13, 14, 15: OPTIMAL INVARIANT DESIGNS

As with other statistical problems, invariance considerations can be of great help in reducing the dimensionality and complexity of the general design problem, at the expense of handling some additional theoretical concepts. The foundations are laid in Chapter 13, investigating various groups and their actions as they pertain to an experimental domain design $\tau$, a regression range design $\xi$, a moment matrix $M(\xi)$, an information matrix $C_K(M)$, or an information function $\phi(C)$. The idea of "increased symmetry" or "greater balancedness" is captured by the matrix majorization ordering of Section 14.1. This concept is brought together with the Loewner matrix ordering to create the Kiefer ordering of Section 14.2: An information matrix $C$ is at least as good as another matrix $D$, $C \gg D$, when relative to the Loewner ordering, $C$ is above some intermediate matrix which is majorized by $D$. The concept is due to Kiefer (1975) who introduced it in a block design setting and called it universal optimality. We demonstrate its usefulness with balanced incomplete block designs (Section 14.9), optimal designs for a linear fit over the unit cube (Section 14.10), and rotatable designs for response surface methodology (Chapter 15).

The final Comments and References include historical remarks and mention the relevant literature. I do not claim to have traced every detail to its first contributor and I must admit that the book makes no mention of many other important design topics, such as numerical algorithms, orthogonal arrays, mixture designs, polynomial regression designs on the cube, sequential and adaptive designs, designs for nonlinear models, robust designs, etc.


Errata

Page ±Line Text Correction

31 32 91 156 157 169 169 203 217 222 241 270 330 347 357 361 378 390 xxix + 12 Exh. 1.7 -11 -2 +11 +13 -7 -12 _7 -8 _2 +4 +3 -7 + 15 + 11 +9 +13,-3 Section 1.26

lower right: interchange

B~B, BB~ X = \C\ i E >0, GKCDCK'G' : s x k Section 1.25 1/2 and 1/6 B~ BK, BkB ~ \X\ = C j E >0, GKCDCK'G' + F : s x (s — k) d s i [in denominator] r [in numerator] Exhibit 9.4 K NND(s) Exhibit 9.2 Ks NND(k) a(jk) Ilm b(jk) lI1+m


CHAPTER 1

Experimental Designs in Linear Models

This chapter provides an introduction to experimental designs for linear models. Two linear models are presented. The first is classical, having a dispersion structure in which the dispersion matrix is proportional to the identity matrix. The second model is more general, with a dispersion structure that does not impose any rank or range assumptions. The Gauss-Markov Theorem is formulated to cover the general model. The classical model provides the setting to introduce experimental designs and their moment matrices. Matrix algebra is reviewed as needed, with particular emphasis on nonnegative definite matrices, projectors, and generalized inverses. The theory is illustrated with two-way classification models, and models for a line fit, parabola fit, and polynomial fit.

1.1. DETERMINISTIC LINEAR MODELS

Many practical and theoretical problems in science treat relationships of the type

\[ y = g(t, \theta), \]

where the observed response or yield, $y$, is thought of as a particular value of a real-valued model function or response function, $g$, evaluated at the pair of arguments $(t, \theta)$. This decomposition reflects the distinctive role of the two arguments: The experimental conditions $t$ can be freely chosen by the experimenter from a given experimental domain $T$, prior to running the experiment. The parameter system $\theta$ is assumed to lie in a parameter domain $\Theta$, and is not known to the experimenter. This is paraphrased by saying that the experimenter controls $t$, whereas "nature" determines $\theta$.

The choice of the function $g$ is central to the model-building process. One of the simplest relationships is the deterministic linear model

\[ y = f(t)'\theta, \]

where $f(t) = (f_1(t), \ldots, f_k(t))'$ and $\theta = (\theta_1, \ldots, \theta_k)'$ are vectors in $k$-dimensional Euclidean space $\mathbb{R}^k$. All vectors are taken to be column vectors, a prime indicates transposition. Hence $f(t)'\theta$ is the usual Euclidean scalar product, $f(t)'\theta = \sum_{j \le k} f_j(t)\theta_j$. Linearity pertains to the parameter system $\theta$, not to the experimental conditions $t$.

Linearity shifts the emphasis from the model function $g$ to the regression function $f$. Assuming that the experimenter knows both the regression function $f$ and the experimental conditions $t$, a compact notation results upon introducing the $k \times 1$ regression vector $x = f(t)$, and the regression range $\mathcal{X} = \{f(t) : t \in T\} \subseteq \mathbb{R}^k$. From an applied point of view the experimental domain $T$ plays a more primary role than the regression range $\mathcal{X}$, but the latter is expedient for a consistent development. The linear model, in its deterministic form discussed so far, thus takes the simple form $y = x'\theta$.

1.2. STATISTICAL LINEAR MODELS

In many experiments the response can be observed only up to an additive random error $e$, distorting the model to become

\[ y = x'\theta + e. \]

Because of random error, repeated experimental runs typically lead to different observed responses, even if the regression vector $x$ and the parameter system $\theta$ remain identical. Therefore any evaluation of the experiment can involve a statement on the distribution of the response only, rather than on any one of its specific values. A (statistical) linear model thus treats response and error as real-valued random variables $Y$ and $E$, governed by a probability distribution $P$ and satisfying the relationship

\[ Y = x'\theta + E. \]

In this model, the term $E$ may subsume quite diverse sources of error, ranging from random errors resulting from inaccuracies in the measuring devices, to systematic errors that are due to inappropriateness of a model function

EXHIBIT 1.1 The statistical linear model. The response $Y$ decomposes into the deterministic mean effect $x'\theta$ plus the random error $E$.

1.3. CLASSICAL LINEAR MODELS WITH MOMENT ASSUMPTIONS

To proceed, we need to be more specific about the underlying distributional assumptions. For point estimation, the distributional assumptions solely pertain to expectation and variance relative to the underlying distribution $P$,

\[ \mathrm{E}_P[Y] = x'\theta, \qquad \mathrm{V}_P[Y] = \sigma^2. \]

For this reason $\theta$ is called the mean parameter vector, while the model variance $\sigma^2 > 0$ provides an indication of the variability inherent in the observation $Y$. Another way of expressing this is to say that the random error $E$ has mean value zero and variance $\sigma^2$, neither of which depends on the regression vector $x$ nor on the parameter vector $\theta$ of the mean response.

The $k \times 1$ parameter vector $\theta$ and the scalar parameter $\sigma^2$ comprise a total of $k + 1$ unknown parameter components. Clearly, for any reasonable inference, the number $n$ of observations must be at least equal to $k + 1$. We consider a set of $n$ observations,

\[ Y_i = x_i'\theta + E_i \qquad (i = 1, \ldots, n), \]

with possibly different regression vectors $x_i$ in experimental run $i$. The joint distribution of the $n$ responses $Y_i$ is specified by assuming that they are uncorrelated.

Considerable simplicity is gained by using vector notation. Let

\[ Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \qquad X = \begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix}, \qquad E = \begin{pmatrix} E_1 \\ \vdots \\ E_n \end{pmatrix} \]

denote the $n \times 1$ response vector $Y$, the $n \times k$ model matrix $X$, and the $n \times 1$ error vector $E$, respectively. (Henceforth the random quantities $Y$ and $E$ are $n \times 1$ vectors rather than scalars!) The $(i,j)$th entry $x_{ij}$ of the matrix $X$ is the same as the $j$th component of the regression vector $x_i$, that is, the regression vector $x_i$ appears as the $i$th row of the model matrix $X$. The model equation thus becomes

\[ Y = X\theta + E. \]

With $I_n$ as the $n \times n$ identity matrix, the model is succinctly represented by the expectation vector and dispersion matrix of $Y$,

\[ \mathrm{E}_P[Y] = X\theta, \qquad \mathrm{D}_P[Y] = \sigma^2 I_n, \]

and is termed the classical linear model with moment assumptions.

In other words, the mean vector $\mathrm{E}_P[Y]$ is given by the linear relationship $X\theta$ between the regression vectors $x_1, \ldots, x_n$ and the parameter vector $\theta$, while the dispersion matrix $\mathrm{D}_P[Y]$ is in its classical, that is, simplest, form of being proportional to the identity matrix.

1.4. CLASSICAL LINEAR MODELS WITH NORMALITY ASSUMPTION

For purposes of hypothesis testing and interval estimation, assumptions on the first two moments do not suffice and the entire distribution of $Y$ is required. Hence in these cases there is a need for a classical linear model with normality assumption,

\[ Y \sim \mathrm{N}_{X\theta,\, \sigma^2 I_n}, \]

in which $Y$ is assumed to be normally distributed with mean vector $X\theta$ and dispersion matrix $\sigma^2 I_n$. If the model matrix $X$ is known then the normal distribution $P = \mathrm{N}_{X\theta,\, \sigma^2 I_n}$ is determined by $\theta$ and $\sigma^2$. We display these parameters by writing $\mathrm{E}_{\theta,\sigma^2}[\cdots]$ in place of $\mathrm{E}_P[\cdots]$, etc. Moreover, the letter $P$ soon signifies a projection matrix.

1.5. TWO-WAY CLASSIFICATION MODELS

The two-sample problem provides a simple introductory example. Consider two populations with mean responses $\alpha_1$ and $\alpha_2$. The observed responses from the two populations are taken to have a common variance $\sigma^2$ and to be uncorrelated. With replications $j = 1, \ldots, n_i$ for populations $i = 1, 2$ this yields a linear model

\[ Y_{ij} = \alpha_i + E_{ij}. \]

Assembling the components into $n \times 1$ vectors, with $n = n_1 + n_2$, we get

\[ Y = X\theta + E. \]

Here the $n \times 2$ model matrix $X$ and the parameter vector $\theta$ are given by

\[ X = \begin{pmatrix} 1_{n_1} & 0 \\ 0 & 1_{n_2} \end{pmatrix}, \qquad \theta = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}, \]

where $1_m$ is the $m \times 1$ vector of ones, with regression vectors $x_1 = (1, 0)'$ and $x_2 = (0, 1)'$ repeated $n_1$ and $n_2$ times. The experimental design consists of the replication numbers $n_1$ and $n_2$, telling the experimenter how many responses are to be observed from which population. It is instructive to identify the quantities of this example with those of the general theory. The experimental domain $T$ is simply the two-element set $\{1, 2\}$ of population labels. The regression function takes values $f(1) = (1, 0)'$ and $f(2) = (0, 1)'$ in $\mathbb{R}^2$, inducing the set $\mathcal{X} = \{(1, 0)', (0, 1)'\}$ as the regression range.

The generalization from two to $a$ populations leads to the one-way classification model. The model is still $Y_{ij} = \alpha_i + E_{ij}$, but the subscript ranges turn into $i = 1, \ldots, a$ and $j = 1, \ldots, n_i$. The mean parameter vector becomes $\theta = (\alpha_1, \ldots, \alpha_a)'$, and the experimental domain is $T = \{1, \ldots, a\}$. The regression function $f$ maps $i$ into the $i$th Euclidean unit vector $e_i$ of $\mathbb{R}^a$, with $i$th entry one and zeros elsewhere. Hence the regression range is $\mathcal{X} = \{e_1, \ldots, e_a\}$. Further generalization is aided by a suitable terminology. Rather than speaking of different populations, $i = 1, \ldots, a$, we say that the "factor" population takes "levels" $i = 1, \ldots, a$. More factors than one occur in multiway classification models.

The two-way classification model with no interaction may serve as a prototype. Suppose level $i$ of a first factor "A" has mean effect $\alpha_i$, while level $j$ of a second factor "B" has mean effect $\beta_j$. Assuming that the two effects are additive, the model reads

\[ Y_{ij\ell} = \alpha_i + \beta_j + E_{ij\ell}, \]

with replications $\ell = 1, \ldots, n_{ij}$, for levels $i = 1, \ldots, a$ of factor A and levels $j = 1, \ldots, b$ of factor B. The design problem now consists of choosing the replication numbers $n_{ij}$. An extreme, but feasible, choice is $n_{ij} = 0$, that is, no observation is made at level $i$ of factor A combined with level $j$ of factor B. The parameter vector $\theta$ is the $k \times 1$ vector $(\alpha_1, \ldots, \alpha_a, \beta_1, \ldots, \beta_b)'$, with $k = a + b$. The experimental domain is the discrete rectangle $T = \{1, \ldots, a\} \times \{1, \ldots, b\}$. The regression function $f$ maps $(i, j)$ into $\binom{e_i}{d_j}$, where $e_i$ is the $i$th Euclidean unit vector of $\mathbb{R}^a$ and $d_j$ is the $j$th Euclidean unit vector of $\mathbb{R}^b$. We return to this model in Section 1.27.
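The regression function of the two-way classification model can be made concrete with a short Python sketch. It stacks the unit vectors $e_i$ and $d_j$ into $f(i, j)$ and repeats each regression vector $n_{ij}$ times to form a model matrix. The function names and the replication numbers below are illustrative assumptions, not notation from the text.

```python
def unit_vector(dim, index):
    """Return the index-th Euclidean unit vector of R^dim (1-based index)."""
    return [1 if pos == index else 0 for pos in range(1, dim + 1)]


def regression_vector(i, j, a, b):
    """f(i, j): the unit vector e_i of R^a stacked on top of d_j of R^b."""
    return unit_vector(a, i) + unit_vector(b, j)


def model_matrix(replications, a, b):
    """Stack f(i, j)' as rows, repeated n_ij times for each cell (i, j)."""
    rows = []
    for i in range(1, a + 1):
        for j in range(1, b + 1):
            n_ij = replications[i - 1][j - 1]
            rows.extend(regression_vector(i, j, a, b) for _ in range(n_ij))
    return rows


# Hypothetical design: a = 2 levels of factor A, b = 3 levels of factor B.
# Two cells receive no observations (n_ij = 0), which is feasible.
n = [[1, 2, 0],
     [1, 0, 1]]
X = model_matrix(n, a=2, b=3)
for row in X:
    print(row)
```

Each row has exactly two entries equal to one, one among the first $a$ columns and one among the last $b$ columns, reflecting the additivity of the two effects.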

So far, the experimental domain has been a finite set; next it is going to be an interval of the real line R.

1.6. POLYNOMIAL FIT MODELS

Let us first look at a line fit model,

\[ Y_{ij} = \alpha + \beta t_i + E_{ij}. \]

Intercept $\alpha$ and slope $\beta$ form the parameter vector $\theta$ of interest, whereas the experimental conditions $t_i$ come from an interval $T \subseteq \mathbb{R}$. For the sake of concreteness, we think of $t \in T$ as a "dose level". The design problem then consists of determining how many and which dose levels $t_1, \ldots, t_\ell$ are to be observed, and how often. If the experiment calls for $n_i$ replications of dose level $t_i$, the subscript ranges in the model are $j = 1, \ldots, n_i$ for $i = 1, \ldots, \ell$. Here the regression function has values $f(t) = (1, t)'$, generating a line segment embedded in the plane $\mathbb{R}^2$ as regression range $\mathcal{X}$.

The parabola fit model has mean response depending on the dose level quadratically,

\[ Y_{ij} = \alpha + \beta t_i + \gamma t_i^2 + E_{ij}. \]

This changes the regression function to $f(t) = (1, t, t^2)'$, and the regression range $\mathcal{X}$ turns into the segment of a parabola embedded in the space $\mathbb{R}^3$.

These are special instances of polynomial fit models of degree $d \ge 1$, the model equation becoming

\[ Y_{ij} = \theta_0 + \theta_1 t_i + \theta_2 t_i^2 + \cdots + \theta_d t_i^d + E_{ij}. \]

The regression range $\mathcal{X}$ is a one-dimensional curve embedded in $\mathbb{R}^k$, with $k = d + 1$. Often the description of the experiment makes it clear that the experimental condition is a single real variable $t$; a linear model for a line fit (parabola fit, polynomial fit of degree $d$) is then referred to as a first-degree model (second-degree model, $d$th-degree model).
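As an illustration of the polynomial fit models above, the following Python sketch evaluates the regression function $f(t) = (1, t, \ldots, t^d)'$, builds a model matrix in which dose level $t_i$ appears $n_i$ times, and computes the mean vector $X\theta$. The dose levels, replication numbers, and parameter values are hypothetical choices made only for the example.

```python
def regression_vector(t, degree):
    """f(t) = (1, t, ..., t^degree)' for a polynomial fit model."""
    return [t ** power for power in range(degree + 1)]


def model_matrix(dose_levels, replications, degree):
    """One row f(t_i)' per observation; dose level t_i is repeated n_i times."""
    rows = []
    for t, n_i in zip(dose_levels, replications):
        rows.extend(regression_vector(t, degree) for _ in range(n_i))
    return rows


def mean_response(X, theta):
    """Mean vector X theta of the responses."""
    return [sum(x_j * th_j for x_j, th_j in zip(row, theta)) for row in X]


# Hypothetical parabola fit (d = 2) with three dose levels in T = [-1; 1].
X = model_matrix(dose_levels=[-1.0, 0.0, 1.0], replications=[2, 1, 2], degree=2)
theta = [1.0, 0.5, -2.0]  # (alpha, beta, gamma)'
print(mean_response(X, theta))
```

The model is linear in $\theta$ even though the mean response is quadratic in the dose level, which is the sense of linearity used throughout.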

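This generalizes to the powerful class of $m$-way $d$th-degree polynomial fit models. In these models the experimental condition $t = (t_1, \ldots, t_m)'$ has $m$ components, that is, the experimental domain $T$ is a subset of $\mathbb{R}^m$, and the model function is a polynomial of degree $d$ in these $m$ variables. For instance, a two-way third-degree model is given by

\[
\begin{aligned}
Y_{ij} = {} & \theta_0 + \theta_1 t_{i1} + \theta_2 t_{i2} + \theta_{11} t_{i1}^2 + \theta_{12} t_{i1} t_{i2} + \theta_{22} t_{i2}^2 \\
& + \theta_{111} t_{i1}^3 + \theta_{112} t_{i1}^2 t_{i2} + \theta_{122} t_{i1} t_{i2}^2 + \theta_{222} t_{i2}^3 + E_{ij},
\end{aligned}
\]

with $\ell$ experimental conditions $t_i = (t_{i1}, t_{i2})' \in T \subseteq \mathbb{R}^2$, and with subscript ranges $j = 1, \ldots, n_i$ for $i = 1, \ldots, \ell$. As a second example consider the three-way second-degree model

\[
\begin{aligned}
Y_{ij} = {} & \theta_0 + \theta_1 t_{i1} + \theta_2 t_{i2} + \theta_3 t_{i3} + \theta_{11} t_{i1}^2 + \theta_{22} t_{i2}^2 + \theta_{33} t_{i3}^2 \\
& + \theta_{12} t_{i1} t_{i2} + \theta_{13} t_{i1} t_{i3} + \theta_{23} t_{i2} t_{i3} + E_{ij},
\end{aligned}
\]

with $\ell$ experimental conditions $t_i = (t_{i1}, t_{i2}, t_{i3})' \in T \subseteq \mathbb{R}^3$, and with subscript ranges $j = 1, \ldots, n_i$ for $i = 1, \ldots, \ell$. Both models have ten mean parameters. The two examples illustrate saturated models because they feature every possible power or cross product of the variables $t_1, \ldots, t_m$ up to degree $d$. In general, a saturated $m$-way $d$th-degree model has

\[ \binom{d + m}{m} \]

mean parameters. An instance of a nonsaturated two-way second-degree model is

with $\ell$ experimental conditions $t_i = (t_{i1}, t_{i2})' \in T \subseteq \mathbb{R}^2$, and with subscript ranges $j = 1, \ldots, n_i$ for $i = 1, \ldots, \ell$.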
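The parameter count of a saturated model can be checked by brute force. The Python sketch below enumerates every monomial of total degree at most $d$ in $m$ variables and compares the count with the binomial coefficient $\binom{d+m}{m}$. The representation of a monomial as a tuple of variable indices is an assumption made for this example only.

```python
from itertools import combinations_with_replacement
from math import comb


def saturated_monomials(m, d):
    """All monomials of total degree <= d in m variables.

    A monomial is a sorted tuple of variable indices: () is the constant
    term and (1, 1, 2) stands for t_1^2 * t_2.
    """
    monomials = []
    for degree in range(d + 1):
        monomials.extend(combinations_with_replacement(range(1, m + 1), degree))
    return monomials


for m, d in [(2, 3), (3, 2)]:
    terms = saturated_monomials(m, d)
    # The enumerated count and the binomial coefficient agree.
    print(m, d, len(terms), comb(d + m, m))
```

For the two-way third-degree and the three-way second-degree models both counts equal ten, in agreement with the text.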

The discussion of these examples is resumed in Section 1.27, after a proper definition of an experimental design.

1.7. EUCLIDEAN MATRIX SPACE

In a classical linear model, interest concentrates on inference for the mean parameter vector $\theta$. The performance of appropriate statistical procedures tends to be measured by dispersion matrices, moment matrices, or information matrices. This calls for a review of matrix algebra. All matrices used here are real.

First let us recall that the trace of a square matrix is the sum of its diagonal entries. Hence a square matrix and its transpose have identical traces. Another important property is that, under the trace operator, matrices commute provided they are conformable,

\[ \operatorname{trace} AB = \operatorname{trace} BA. \]

We often apply this rule to quadratic forms given by a symmetric matrix $A$, in using $x'Ax = \operatorname{trace} Axx' = \operatorname{trace} xx'A$, as is convenient in a specific context. Let $\mathbb{R}^{n \times k}$ denote the linear space of real matrices with $n$ rows and $k$ columns. The Euclidean matrix scalar product

\[ \langle A, B \rangle = \operatorname{trace} A'B \]

turns $\mathbb{R}^{n \times k}$ into a Euclidean space of dimension $nk$. For $k = 1$, we recover the Euclidean scalar product for vectors in $\mathbb{R}^n$. The symmetry of scalar products, $\operatorname{trace} A'B = \langle A, B \rangle = \langle B, A \rangle = \operatorname{trace} B'A$, reproduces the property that a square matrix and its transpose have identical traces. Commutativity under the trace operator yields $\langle A, B \rangle = \operatorname{trace} A'B = \operatorname{trace} BA' = \langle B', A' \rangle = \langle A', B' \rangle$, that is, transposition preserves the scalar products between the matrix spaces of reversed numbers of rows and columns, $\mathbb{R}^{n \times k}$ and $\mathbb{R}^{k \times n}$.
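The trace rules just stated are easy to verify numerically. The following Python sketch, using small hand-written helpers rather than any library, checks that $\operatorname{trace} AB = \operatorname{trace} BA$ for conformable rectangular matrices, that $x'Ax = \operatorname{trace} Axx'$ for a symmetric $A$, and that $\langle A, A \rangle$ is the sum of squared entries. The matrices are arbitrary illustrative choices.

```python
def matmul(A, B):
    """Product of two conformable matrices given as lists of rows."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def scalar_product(A, B):
    """Euclidean matrix scalar product <A, B> = trace A'B."""
    return trace(matmul(transpose(A), B))


A = [[1, 2], [3, 4], [5, 6]]      # 3 x 2
B = [[1, 0, 2], [-1, 3, 1]]       # 2 x 3
print(trace(matmul(A, B)), trace(matmul(B, A)))   # AB is 3 x 3, BA is 2 x 2

S = [[2, 1], [1, 3]]              # symmetric 2 x 2
x = [[1], [-2]]                   # column vector
quadratic_form = matmul(matmul(transpose(x), S), x)[0][0]
print(quadratic_form, trace(matmul(S, matmul(x, transpose(x)))))

print(scalar_product(A, A))       # sum of squared entries of A
```

Note that $AB$ and $BA$ have different sizes here, yet their traces coincide.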

In general, although not always, our matrices have at least as many rows as columns. Since we have to deal with extensive matrix products, this facilitates a quick check that factors properly conform. It is also in accordance with writing vectors of Euclidean space as columns. Notational conventions that are similarly helpful are to choose Greek letters for unknown parameters in a statistical model, and to use uppercase and lowercase letters to discriminate between a random variable and any one of its values, and between a matrix and any one of its entries.

Because of their role as dispersion operators, our matrices often are symmetric. We denote by Sym($k$) the subspace of symmetric matrices, in the space $\mathbb{R}^{k \times k}$ of all square, that is, not necessarily symmetric, matrices. Recall from matrix algebra that a symmetric $k \times k$ matrix $A$ permits an eigenvalue decomposition

\[ A = \sum_{j \le k} \lambda_j z_j z_j' = Z'\Delta_\lambda Z. \]

The real numbers $\lambda_1, \ldots, \lambda_k$ are the eigenvalues of $A$ counted with their respective multiplicities, and the vectors $z_1, \ldots, z_k \in \mathbb{R}^k$ form an orthonormal system of eigenvectors. In general, such a decomposition fails to be unique, since if the eigenvalue $\lambda_j$ has multiplicity greater than one then many choices for the eigenvectors $z_j$ become feasible.

The second representation of an eigenvalue decomposition, $A = Z'\Delta_\lambda Z$, assembles the pertinent quantities in a slightly different way. We define the operator $\Delta_\lambda$ by requiring that it creates a diagonal matrix with the argument vector $\lambda = (\lambda_1, \ldots, \lambda_k)'$ on the diagonal. The orthonormal vectors $z_1, \ldots, z_k$ form the $k \times k$ matrix $Z' = (z_1, \ldots, z_k)$, whence $Z'$ is an orthogonal matrix. The equality with the first representation now follows from

\[ Z'\Delta_\lambda Z = (z_1, \ldots, z_k) \begin{pmatrix} \lambda_1 z_1' \\ \vdots \\ \lambda_k z_k' \end{pmatrix} = \sum_{j \le k} \lambda_j z_j z_j'. \]

Matrices that have matrices or vectors as entries, such as $Z'$, are termed block matrices. They provide a convenient technical tool in many areas. The algebra of block matrices parallels the familiar algebra of matrices, and may be verified as needed.

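In the space Sym($k$), the subsets of nonnegative definite matrices, NND($k$), and of positive definite matrices, PD($k$), are central to the sequel. They are defined through

\[
\begin{aligned}
\mathrm{NND}(k) &= \{A \in \mathrm{Sym}(k) : x'Ax \ge 0 \text{ for all } x \in \mathbb{R}^k\}, \\
\mathrm{PD}(k) &= \{A \in \mathrm{Sym}(k) : x'Ax > 0 \text{ for all } x \ne 0\}.
\end{aligned}
\]

Of the many ways of characterizing nonnegative definiteness or positive definiteness, frequent use is made of the following.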

1.8. NONNEGATIVE DEFINITE MATRICES

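Lemma. Let $A$ be a symmetric $k \times k$ matrix with smallest eigenvalue $\lambda_{\min}(A)$. Then we have

\[
\begin{aligned}
A \in \mathrm{NND}(k) &\iff \lambda_{\min}(A) \ge 0 \iff \operatorname{trace} AB \ge 0 \text{ for all } B \in \mathrm{NND}(k), \\
A \in \mathrm{PD}(k) &\iff \lambda_{\min}(A) > 0 \iff \operatorname{trace} AB > 0 \text{ for all } 0 \ne B \in \mathrm{NND}(k).
\end{aligned}
\]

Proof. Assume $A \in \mathrm{NND}(k)$, and choose an eigenvector $z \in \mathbb{R}^k$ of norm 1 corresponding to $\lambda_{\min}(A)$. Then we obtain $0 \le z'Az = \lambda_{\min}(A) z'z = \lambda_{\min}(A)$. Now assume $\lambda_{\min}(A) \ge 0$, and choose an eigenvalue decomposition $A = \sum_{j \le k} \lambda_j z_j z_j'$. This yields $\operatorname{trace} AB = \sum_{j \le k} \lambda_j z_j'Bz_j \ge \lambda_{\min}(A) \operatorname{trace} B \ge 0$ for all $B \in \mathrm{NND}(k)$. To complete the circle, we verify $x'Ax \ge 0$ for all $x \in \mathbb{R}^k$ by choosing $B = xx'$.

For positive definiteness the arguments follow the same lines upon observing that we have $\operatorname{trace} B > 0$ provided $0 \ne B \in \mathrm{NND}(k)$.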

The set of all nonnegative definite matrices NND($k$) has a beautiful geometrical shape, as follows.

1.9. GEOMETRY OF THE CONE OF NONNEGATIVE DEFINITE MATRICES

Lemma. The set NND($k$) is a cone which is convex, pointed, and closed, and has interior PD($k$) relative to the space Sym($k$).

Proof. The proof is by first principles, recalling the definition of the properties involved. For $A \in \mathrm{NND}(k)$ and $\delta > 0$ evidently $\delta A \in \mathrm{NND}(k)$, thus NND($k$) is a cone. Next for $A, B \in \mathrm{NND}(k)$ we clearly have $A + B \in \mathrm{NND}(k)$ since

\[ x'(A + B)x = x'Ax + x'Bx \ge 0 \quad \text{for all } x \in \mathbb{R}^k. \]

Because NND($k$) is a cone, we may replace $A$ by $(1 - \alpha)A$ and $B$ by $\alpha B$, where $\alpha$ lies in the open interval $(0; 1)$. Hence with any two of its matrices $A$ and $B$ the set NND($k$) also includes the straight line segment $(1 - \alpha)A + \alpha B$ from $A$ to $B$, and this establishes convexity. If $A \in \mathrm{NND}(k)$ and also $-A \in \mathrm{NND}(k)$, then $A = 0$, whence the cone NND($k$) is pointed.

The remaining two properties, that NND($k$) is closed and has PD($k$) for its interior, are topological in nature. Let

\[ \mathcal{B} = \{B \in \mathrm{Sym}(k) : \operatorname{trace} B^2 \le 1\} \]

be the closed unit ball in Sym($k$) under the Euclidean matrix scalar product. Replacing $B \in \mathrm{Sym}(k)$ by an eigenvalue decomposition $\sum_j \lambda_j y_j y_j'$ yields $\operatorname{trace} B^2 = \sum_j \lambda_j^2$; thus $B \in \mathcal{B}$ has eigenvalues $\lambda_j$ satisfying $|\lambda_j| \le 1$. It follows that $B \in \mathcal{B}$ fulfills $x'Bx \le |x'Bx| \le \sum_j |\lambda_j| (x'y_j)^2 \le x'(\sum_j y_j y_j')x = x'x$ for all $x \in \mathbb{R}^k$.

A set is closed when its complement is open. Therefore we pick an arbitrary $k \times k$ matrix $A$ which is symmetric but fails to be nonnegative definite. By definition, $x'Ax < 0$ for some vector $x \in \mathbb{R}^k$. Define $\delta = -x'Ax/(2x'x) > 0$. For every matrix $B \in \mathcal{B}$, we then have

\[ x'(A + \delta B)x \le x'Ax + \delta x'x = \tfrac{1}{2} x'Ax < 0. \]

EXHIBIT 1.2 Convex cones in the plane $\mathbb{R}^2$. Left: the linear subspace generated by $x \in \mathbb{R}^2$ is a closed convex cone that is not pointed. Right: the open convex cone generated by $x, y \in \mathbb{R}^2$, together with the null vector, forms a pointed cone that is neither open nor closed.

Thus the set $A + \delta\mathcal{B}$ is included in the complement of NND($k$), and it follows that the cone NND($k$) is closed.

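Interior points are identified similarly. Let $A \in \operatorname{int} \mathrm{NND}(k)$, that is, $A + \delta\mathcal{B} \subseteq \mathrm{NND}(k)$ for some $\delta > 0$. If $x \ne 0$ then the choice $B = -xx'/x'x \in \mathcal{B}$ leads to

\[ x'Ax = x'(A + \delta B)x + \delta x'x \ge \delta x'x > 0. \]

Hence every matrix $A$ interior to NND($k$) is positive definite, $\operatorname{int} \mathrm{NND}(k) \subseteq \mathrm{PD}(k)$. It remains to establish the converse inclusion. Every matrix $A \in \mathrm{PD}(k)$ has $0 < \lambda_{\min}(A) = \delta$, say. For $B \in \mathcal{B}$ and $x \in \mathbb{R}^k$, we obtain $x'Bx \ge -x'x$, and

\[ x'(A + \delta B)x \ge x'Ax - \delta x'x \ge (\lambda_{\min}(A) - \delta)\, x'x = 0. \]

Thus $A + \delta\mathcal{B} \subseteq \mathrm{NND}(k)$ shows that $A$ is interior to NND($k$).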

There are, of course, convex cones that are not pointed but closed, or pointed but not closed, or neither pointed nor closed. Exhibit 1.2 illustrates two such instances in the plane R2.

1.10. THE LOEWNER ORDERING OF SYMMETRIC MATRICES

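True beauty shines in many ways, and order is one of them. We prefer to view the closed cone NND($k$) of nonnegative definite matrices through the partial ordering $\ge$, defined on Sym($k$) by

\[ A \ge B \iff A - B \in \mathrm{NND}(k), \]

which has come to be known as the Loewner ordering of symmetric matrices. The notation $B \le A$ in place of $A \ge B$ is self-explanatory. We also define the closely related variant $>$ by

\[ A > B \iff A - B \in \mathrm{PD}(k), \]

which is based on the open cone of positive definite matrices.

The geometric properties of the set NND($k$) of being conic, convex, pointed, and closed, translate into related properties for the Loewner ordering:

\[
\begin{aligned}
A \ge B &\implies \delta A \ge \delta B \text{ for all } \delta \ge 0, \\
A \ge B,\ C \ge D &\implies A + C \ge B + D, \\
A \ge B,\ B \ge A &\implies A = B, \\
A_m \ge B_m \text{ for all } m,\ A_m \to A,\ B_m \to B &\implies A \ge B.
\end{aligned}
\]

The third property in this list says that the Loewner ordering is antisymmetric. In addition, it is reflexive and transitive,

\[ A \ge A; \qquad A \ge B,\ B \ge C \implies A \ge C. \]

Hence the Loewner ordering enjoys the three properties that constitute a partial ordering.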

For scalars, that is, $k = 1$, the Loewner ordering reduces to the familiar total ordering of the real line. Or the other way around, the total ordering $\ge$ of the real line $\mathbb{R}$ is extended to the partial ordering $\ge$ of the matrix spaces Sym($k$), with $k > 1$. The crucial distinction is that, in general, two matrices may not be comparable. An example is furnished by

for which neither $A \ge B$ nor $B \ge A$ holds true.
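To see incomparability at work, the Python sketch below tests the Loewner ordering for symmetric $2 \times 2$ matrices through the smallest-eigenvalue criterion of Lemma 1.8, using the closed-form eigenvalues of a $2 \times 2$ matrix. The matrices `A`, `B`, `C` and the tolerance `tol` are hypothetical choices for this illustration and are not the example from the text.

```python
from math import sqrt


def min_eigenvalue_2x2(S):
    """Smallest eigenvalue of a symmetric 2 x 2 matrix [[a, b], [b, c]]."""
    a, b, c = S[0][0], S[0][1], S[1][1]
    return (a + c) / 2 - sqrt(((a - c) / 2) ** 2 + b ** 2)


def loewner_geq(A, B, tol=1e-12):
    """A >= B in the Loewner ordering, that is, A - B is nonnegative definite."""
    diff = [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]
    return min_eigenvalue_2x2(diff) >= -tol


A = [[2.0, 0.0], [0.0, 1.0]]
B = [[1.0, 0.0], [0.0, 2.0]]
C = [[1.0, 0.0], [0.0, 1.0]]

# A - B has eigenvalues 1 and -1, so neither difference is nonnegative definite.
print(loewner_geq(A, B), loewner_geq(B, A))
# Both A and B lie above C in the Loewner ordering.
print(loewner_geq(A, C), loewner_geq(B, C))
```

The pair `A`, `B` has a common lower bound `C` but no order relation between its members, which cannot happen on the real line.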

Order relations always call for a study of monotonic functions.

1.11. MONOTONIC MATRIX FUNCTIONS

We consider functions that have a domain of definition and a range that are equipped with partial orderings. Such functions are called isotonic when they are order preserving, and antitonic when they are order reversing. A function is called monotonic when it is isotonic or antitonic. Two examples may serve to illustrate these concepts.

A first example is supplied by a linear form $A \mapsto \operatorname{trace} AB$ on Sym($k$), determined by a matrix $B \in \mathrm{Sym}(k)$. If this linear form is isotonic relative to the Loewner ordering, then $A \ge 0$ implies $\operatorname{trace} AB \ge 0$, and Lemma 1.8 proves that the matrix $B$ is nonnegative definite. Conversely, if $B$ is nonnegative definite and $A \ge C$, then again Lemma 1.8 yields $\operatorname{trace}(A - C)B \ge 0$, that is, $\operatorname{trace} AB \ge \operatorname{trace} CB$. Thus a linear form $A \mapsto \operatorname{trace} AB$ is isotonic relative to the Loewner ordering if and only if $B$ is nonnegative definite. In particular the trace itself is isotonic, $A \mapsto \operatorname{trace} A$, as follows with $B = I_k$.

It is an immediate consequence that the Euclidean matrix norm $\|A\| = (\operatorname{trace} A^2)^{1/2}$ is an isotonic function from the closed cone NND($k$) into the real line. For if $A \ge B \ge 0$, then we have

\[ \|A\|^2 = \operatorname{trace} AA \ge \operatorname{trace} AB \ge \operatorname{trace} BB = \|B\|^2. \]

As a second example, matrix inversion $A \mapsto A^{-1}$ is claimed to be an antitonic mapping from the open cone PD($k$) into itself. For if $A \ge B > 0$ then we get

\[ 0 \le (A - B)B^{-1}(A - B) = AB^{-1}A - 2A + B \le AB^{-1}A - A. \]

Pre- and postmultiplication by $A^{-1}$ gives $A^{-1} \le B^{-1}$, as claimed.
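Both examples can be checked on a small case. The Python sketch below takes two positive definite $2 \times 2$ matrices with $A \ge B$, confirms that trace and squared Euclidean norm are ordered the same way, and confirms that the inverses are ordered the opposite way. The matrices and the helper names are assumptions made for this illustration.

```python
from math import sqrt


def min_eigenvalue_2x2(S):
    """Smallest eigenvalue of a symmetric 2 x 2 matrix [[a, b], [b, c]]."""
    a, b, c = S[0][0], S[0][1], S[1][1]
    return (a + c) / 2 - sqrt(((a - c) / 2) ** 2 + b ** 2)


def loewner_geq(A, B, tol=1e-12):
    """A >= B in the Loewner ordering (A - B nonnegative definite)."""
    diff = [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]
    return min_eigenvalue_2x2(diff) >= -tol


def inverse_2x2(S):
    """Inverse of a nonsingular 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def squared_norm(S):
    """trace S^2 for symmetric S, which equals the sum of squared entries."""
    return sum(entry ** 2 for row in S for entry in row)


A = [[3.0, 1.0], [1.0, 2.0]]
B = [[2.0, 1.0], [1.0, 1.0]]   # A - B is the identity matrix, so A >= B

print(loewner_geq(A, B))                                   # the assumed ordering
print(A[0][0] + A[1][1] >= B[0][0] + B[1][1])              # trace is isotonic
print(squared_norm(A) >= squared_norm(B))                  # norm is isotonic on NND
print(loewner_geq(inverse_2x2(B), inverse_2x2(A)))         # inversion is antitonic
```

A single example proves nothing, of course; it only shows the direction of each inequality on concrete numbers.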

A minimization problem relative to the Loewner ordering is taken up in the Gauss-Markov Theorem 1.19. Before turning to this topic, we review the role of matrices when they are interpreted as linear mappings.

1.12. RANGE AND NULLSPACE OF A MATRIX

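A rectangular matrix $A \in \mathbb{R}^{n \times k}$ may be identified with a linear mapping carrying $x \in \mathbb{R}^k$ into $Ax \in \mathbb{R}^n$. Its range or column space, and its nullspace or kernel are

\[
\begin{aligned}
\operatorname{range} A &= \{Ax : x \in \mathbb{R}^k\}, \\
\operatorname{nullspace} A &= \{x \in \mathbb{R}^k : Ax = 0\}.
\end{aligned}
\]

The range is a subspace of the image space $\mathbb{R}^n$. The nullspace is a subspace of the domain of definition $\mathbb{R}^k$. The rank and nullity of $A$ are the dimensions of the range of $A$ and of the nullspace of $A$, respectively.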

If the matrix $A$ is symmetric, then its rank coincides with the number of nonvanishing eigenvalues, and its nullity is the number of vanishing eigenvalues. Symmetry involves transposition, and transposition indicates the presence of a scalar product (because $A'$ is the unique matrix $B$ that satisfies $\langle Ax, y \rangle = \langle x, By \rangle$ for all $x, y$). In fact, Euclidean geometry provides the following vital connection that the nullspace of the transpose of a matrix is the orthogonal complement of its range. Let

\[ L^\perp = \{x \in \mathbb{R}^n : x'y = 0 \text{ for all } y \in L\} \]

denote the orthogonal complement of a subspace $L$ of the linear space $\mathbb{R}^n$.

EXHIBIT 1.3 Orthogonal decompositions induced by a linear mapping. Range and nullspace of a matrix $A \in \mathbb{R}^{n \times k}$ and of its transpose $A'$ orthogonally decompose the domain of definition $\mathbb{R}^k$ and the image space $\mathbb{R}^n$.

1.13. TRANSPOSITION AND ORTHOGONALITY

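Lemma. Let $A$ be an $n \times k$ matrix. Then we have

\[ \operatorname{nullspace} A' = (\operatorname{range} A)^\perp. \]

Proof. A few transcriptions establish the result:

\[
\begin{aligned}
\operatorname{nullspace} A' &= \{x \in \mathbb{R}^n : A'x = 0\} \\
&= \{x \in \mathbb{R}^n : x'Ay = 0 \text{ for all } y \in \mathbb{R}^k\} \\
&= (\operatorname{range} A)^\perp.
\end{aligned}
\]

Replacing $A'$ by $A$ yields $\operatorname{nullspace} A = (\operatorname{range} A')^\perp$. Thus any $n \times k$ matrix $A$ comes with two orthogonal decompositions, of the domain of definition $\mathbb{R}^k$, and of the image space $\mathbb{R}^n$. See Exhibit 1.3.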


1.14. SQUARE ROOT DECOMPOSITIONS OF A NONNEGATIVE DEFINITE MATRIX

As a first application of Lemma 1.13 we investigate square root decompositions of nonnegative definite matrices. If $V$ is a nonnegative definite $n \times n$ matrix, a representation of the form

\[ V = UU' \]

is called a square root decomposition of $V$, and $U$ is called a square root of $V$. Various such decompositions are easily obtained from an eigenvalue decomposition

\[ V = \sum_{j \le n} \lambda_j z_j z_j'. \]

For instance, a feasible choice is $U = (\pm\sqrt{\lambda_1} z_1, \ldots, \pm\sqrt{\lambda_n} z_n) \in \mathbb{R}^{n \times n}$. If $V$ has nonvanishing eigenvalues $\lambda_1, \ldots, \lambda_k$, other choices are $U = (\pm\sqrt{\lambda_1} z_1, \ldots, \pm\sqrt{\lambda_k} z_k) \in \mathbb{R}^{n \times k}$; here $V = UU'$ is called a full rank decomposition for the reason that the square root $U$ has full column rank.

Every square root $U$ of $V$ has the same range as $V$, that is,

\[ \operatorname{range} V = \operatorname{range} U. \]

To prove this, we use Lemma 1.13, in that the ranges of $V$ and $U$ coincide if and only if the nullspaces of $V$ and $U'$ are the same. But $U'z = 0$ clearly implies $Vz = 0$. Conversely, $Vz = 0$ entails $0 = z'Vz = z'UU'z = (U'z)'(U'z)$, and thus forces $U'z = 0$.

The range formula, for every $n \times k$ matrix $X$,

\[ \operatorname{range} X'VX = \operatorname{range} X'V, \]

is a direct consequence of a square root decomposition $V = UU'$, since $\operatorname{range} V = \operatorname{range} U$ implies $\operatorname{range} X'V = \operatorname{range} X'U = \operatorname{range} X'VX \subseteq \operatorname{range} X'V$.
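A small integer example illustrates that square roots are far from unique while the nullspace relation in the proof holds for each of them. The Python sketch below starts from a hypothetical $3 \times 2$ matrix `U`, forms $V = UU'$, obtains a second square root `U2 = UQ` from an orthogonal matrix `Q`, and checks that a vector `z` with $U'z = 0$ also satisfies $Vz = 0$. All matrices are illustrative assumptions.

```python
def matmul(A, B):
    """Product of two conformable matrices given as lists of rows."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


U = [[1, 0], [1, 1], [0, 2]]     # 3 x 2, full column rank
Q = [[0, 1], [-1, 0]]            # orthogonal: rotation by a right angle

V = matmul(U, transpose(U))      # nonnegative definite 3 x 3 matrix of rank 2
U2 = matmul(U, Q)                # a different square root of the same V
V2 = matmul(U2, transpose(U2))
print(V)
print(U2 != U, V2 == V)

# z is orthogonal to both columns of U, hence U'z = 0 and therefore Vz = 0.
z = [[2], [-2], [1]]
print(matmul(transpose(U), z), matmul(V, z))
```

Since $V$ has rank 2 and `U` has two columns, $V = UU'$ is a full rank decomposition in the sense of the text.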

Another application of Lemma 1.13 is to clarify the role of mean vectors and dispersion matrices in linear models.

1.15. DISTRIBUTIONAL SUPPORT OF LINEAR MODELS

Lemma. Let $Y$ be an $n \times 1$ random vector with mean vector $\mu$ and dispersion matrix $V$. Then we have

\[ Y \in \mu + \operatorname{range} V \quad \text{with probability 1}, \]

that is, the distribution of $Y$ is concentrated on the affine subspace that results if the linear subspace $\operatorname{range} V$ is shifted by the vector $\mu$.

Proof. The assertion is true if $V$ is positive definite. Otherwise we must show that $Y - \mu$ lies in the proper subspace $\operatorname{range} V$ with probability 1. In view of Lemma 1.13, this is the same as $Y - \mu \perp \operatorname{nullspace} V$ with probability 1. Here $\operatorname{nullspace} V$ may be replaced by any finite set $\{z_1, \ldots, z_k\}$ of vectors spanning it. For each $j = 1, \ldots, k$ we obtain

\[ \mathrm{E}_P\bigl[(z_j'(Y - \mu))^2\bigr] = z_j'Vz_j = 0, \]

thus $Y - \mu \perp z_j$ with probability 1. The exceptional nullsets may depend on the subscript $j$, but their union produces a global nullset outside of which $Y - \mu$ is orthogonal to $\{z_1, \ldots, z_k\}$, as claimed.

In most applications the mean vector $\mu$ is a member of the range of $V$. Then the affine subspace $\mu + \operatorname{range} V$ equals $\operatorname{range} V$ and is actually a linear subspace, so that $Y$ falls into the range of $V$ with probability 1. In a classical linear model as expounded in Section 1.3, the mean vector $\mu$ is of the form $X\theta$ with unknown parameter system $\theta$. Hence the containment $\mu = X\theta \in \operatorname{range} V$ holds true for all vectors $\theta$ provided

\[ \operatorname{range} X \subseteq \operatorname{range} V. \]

Such range inclusion conditions deserve careful study as they arise in many places. They are best dealt with using projectors, and projectors are natural companions of generalized inverse matrices.

1.16. GENERALIZED MATRIX INVERSION AND PROJECTIONS

For a rectangular matrix $A \in \mathbb{R}^{n \times k}$, any matrix $G \in \mathbb{R}^{k \times n}$ fulfilling $AGA = A$ is called a generalized inverse of $A$. The set of all generalized inverses of $A$,

\[ A^- = \{G \in \mathbb{R}^{k \times n} : AGA = A\}, \]

is an affine subspace of the matrix space $\mathbb{R}^{k \times n}$, being the solution set of an inhomogeneous linear matrix equation. If a relation is invariant to the choice of members in $A^-$, then we often replace the matrix $G$ by the set $A^-$. For instance, the defining property may be written as $AA^-A = A$.

A square and nonsingular matrix $A$ has its usual inverse $A^{-1}$ for its unique generalized inverse, $A^- = \{A^{-1}\}$. In this sense generalized matrix inversion is a generalization of regular matrix inversion.

Our explicit convention of treating $A^-$ as a set of matrices is a bit unusual, even though it is implicit in all of the work on generalized matrix inverses. Namely, often only results that are invariant to the specific choice of a generalized inverse are of interest. For example, in the following lemma, the product $X'GX$ is the same for every generalized inverse $G$ of $V$. We indicate this by inserting the set $V^-$ in place of the matrix $G$.

However, the central optimality result for experimental designs is of opposite type. The General Equivalence Theorem 7.14 states that a certain property holds true, not for every, but for some generalized inverse. In fact, the theorem becomes false if this point is missed. Our notation helps to alert us to this pitfall.

A matrix $P \in \mathbb{R}^{n \times n}$ is called a projector onto a subspace $\mathcal{K} \subseteq \mathbb{R}^n$ when $P$ is idempotent, that is, $P^2 = P$, and has $\mathcal{K}$ for its range.

Let us verify that the following characterizing interrelation between generalized inverses and projectors holds true:

\[ AGA = A \iff AG \text{ is a projector onto the range of } A. \]

For the direct part, note first that $AG$ is idempotent. Moreover the inclusions

\[ \operatorname{range} A = \operatorname{range} AGA \subseteq \operatorname{range} AG \subseteq \operatorname{range} A \]

show that the range of $AG$ and the range of $A$ coincide. For the converse part, we use that the projector $AG$ has the same range as the matrix $A$. Thus every vector $Ax$ with $x \in \mathbb{R}^k$ has a representation $AGy$ with $y \in \mathbb{R}^n$, whence $AGAx = AGAGy = AGy = Ax$. Since $x$ can be chosen arbitrarily, this establishes $AGA = A$.
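The interrelation between generalized inverses and projectors can be observed on a singular $2 \times 2$ matrix. In the Python sketch below, the matrix `A` and its two generalized inverses `G1` and `G2` are hypothetical examples; for each of them the code checks the defining property $AGA = A$ and the idempotency of $AG$.

```python
def matmul(A, B):
    """Product of two conformable matrices given as lists of rows."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[1, 1], [1, 1]]                    # singular, range spanned by (1, 1)'
G1 = [[1, 0], [0, 0]]                   # one generalized inverse of A
G2 = [[0.25, 0.25], [0.25, 0.25]]       # another one (symmetric)

for G in (G1, G2):
    AG = matmul(A, G)
    # First flag: AGA = A.  Second flag: AG is idempotent.
    print(AG, matmul(AG, A) == A, matmul(AG, AG) == AG)
```

The two products $AG$ differ, so the projector onto the range of $A$ obtained in this way depends on the choice of $G$; only the one built from `G2` is symmetric. Both have every column proportional to $(1, 1)'$, as the range condition requires.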

The intimate relation between range inclusions and projectors, alluded to in Section 1.15, can now be made more explicit.

1.17. RANGE INCLUSION LEMMA

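Lemma. Let $X$ be an $n \times k$ matrix and $V$ be an $n \times s$ matrix. Then we have

\[ \operatorname{range} X \subseteq \operatorname{range} V \iff VV^-X = X. \]

If $\operatorname{range} X \subseteq \operatorname{range} V$ and $V$ is a nonnegative definite $n \times n$ matrix, then the product

\[ X'V^-X \]

does not depend on the choice of generalized inverse for $V$, is nonnegative definite, and has the same range as $X'$ and the same rank as $X$.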
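The invariance asserted in the lemma can be illustrated numerically. The Python sketch below uses a hypothetical singular nonnegative definite matrix `V`, a matrix `X` whose range is included in that of `V`, and three different generalized inverses of `V`; it checks $VGX = X$ and evaluates $X'GX$ for each choice. The specific matrices are assumptions made for the example.

```python
def matmul(A, B):
    """Product of two conformable matrices given as lists of rows."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


V = [[1, 1], [1, 1]]          # nonnegative definite, rank 1
X = [[1], [1]]                # range X is included in range V

generalized_inverses = [
    [[1, 0], [0, 0]],
    [[0, 1], [0, 0]],          # not symmetric, still satisfies VGV = V
    [[0.25, 0.25], [0.25, 0.25]],
]

for G in generalized_inverses:
    assert matmul(matmul(V, G), V) == V          # G is a generalized inverse
    print(matmul(matmul(V, G), X) == X,          # V V^- X = X
          matmul(matmul(transpose(X), G), X))    # X' V^- X
```

The product $X'GX$ comes out as the same $1 \times 1$ matrix for all three choices, which is why the text may write $X'V^-X$ with the whole set $V^-$ inserted.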

Proof. The range of X is included in the range of V if and only if A' —