Statistical Methods in HYDROLOGY-Haan


Statistical Methods in Hydrology

Second Edition

Charles T. Haan

Iowa State Press


CHARLES T. HAAN is Regents Professor and Sarkeys Distinguished Professor, Emeritus, from the Department of Biosystems and Agricultural Engineering, Oklahoma State University, Stillwater.

© 1974 Iowa State University Press; © 2002 Iowa State Press

Iowa State Press

2121 State Avenue, Ames, Iowa 50014

Orders: 1-800-862-6657 Office: 1-515-292-0140 Fax: 1-515-292-3348

Web site: www.iowastatepress.com

Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by Iowa State Press, provided that the base fee of $.10 per copy is paid directly to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. For those organizations that have been granted a photocopy license by CCC, a separate system of payments has been arranged. The fee code for users of the Transactional Reporting Service is 0-8138-1503-7/2002 $.10.

Printed on acid-free paper in the United States of America

First edition, 1974 Second edition, 2002

Library of Congress Cataloging-in-Publication Data Haan, C. T. (Charles Thomas)

Statistical methods in hydrology / Charles T. Haan.-2nd ed. p. cm.

Includes bibliographical references and index. ISBN 0-8138-1503-7 (acid-free paper)

1. Hydrology-Statistical methods. I. Title. GB656.2.S7 H3 2002

551.48'07'27-dc21 2002000060

The last digit is the print number: 9 8 7 6 5 4 3 2 1


my constant companion, friend, helpmate, and source of encouragement for the past 34 years.

Secondly, I dedicate the book to my two daughters, Patti and Pam, and to my son Chris, his wife Rie, and their two children, Katrina and Daniel.

Thirdly, I dedicate the book to my parents, Charles and Dorothy, who gave me a start in life and taught me many of the values I hold dear.

Finally, the book is dedicated to the many graduate students that I have worked with. They have been a constant


CONTENTS

PREFACE TO THE SECOND EDITION
PREFACE TO THE FIRST EDITION
ACKNOWLEDGMENTS FOR THE SECOND EDITION
ACKNOWLEDGMENTS FOR THE FIRST EDITION

1 INTRODUCTION
   Hydrologic data

2 PROBABILITY AND PROBABILITY DISTRIBUTIONS-BASIC CONCEPTS
   Probability
   Total probability theorem
   Bayes theorem
   Counting
   Graphical presentation
   Random variables
   Univariate probability distributions
   Bivariate distributions
   Marginal distributions
   Conditional distributions
   Independence
   Derived distributions
   Mixed distributions
   Exercises

3 PROPERTIES OF RANDOM VARIABLES
   Moments and expectation-univariate distributions
   Measures of central tendency
   Arithmetic mean
   Geometric mean
   Median
   Mode
   Weighted mean
   Measures of dispersion
   Range
   Variance
   Measures of symmetry
   Measures of peakedness
   Moments and expectation-jointly distributed random variables
   Covariance
   Correlation coefficient
   Further properties of moments
   Sample moments
   Probability-weighted moments and L-moments
   Parameter estimation
   Unbiasedness
   Consistency
   Efficiency
   Sufficiency
   Method of moments
   Maximum likelihood
   Chebyshev inequality
   Law of large numbers
   Exercises

4 SOME DISCRETE PROBABILITY DISTRIBUTIONS AND THEIR APPLICATIONS
   Hypergeometric distribution
   Bernoulli processes
   Binomial distribution
   Geometric distribution
   Negative binomial distribution
   Poisson process
   Poisson distribution
   Exponential distribution
   Gamma distribution
   Summary of Poisson process
   Multinomial distribution
   Exercises

5 NORMAL DISTRIBUTION
   General normal distribution
   Reproductive properties
   Standard normal distribution
   Approximations for standard normal distribution
   Central limit theorem
   Constructing pdf curves for data
   Normal approximations for other distributions
   Binomial distribution
   Negative binomial distribution
   Poisson distribution
   Continuous distributions
   Exercises

6 CONTINUOUS PROBABILITY DISTRIBUTIONS
   Uniform distribution
   Triangular distribution
   Exponential distribution
   Gamma distribution
   Lognormal distribution
   Extreme value distributions
   Extreme Value Type I
   Extreme Value Type III Minimum (Weibull)
   Discussion
   Generalized extreme value distribution
   Beta distribution
   Pearson distributions
   Some important distributions of sample statistics
   Chi-square distribution
   The t distribution
   The F distribution
   Transformations
   Exercises

7 FREQUENCY ANALYSIS
   Probability plotting
   Historical data
   Outliers
   Analytical hydrologic frequency analysis
   Normal distribution
   Lognormal distribution
   Log Pearson type III distribution
   Extreme value type I distribution (Gumbel distribution)
   Other distributions
   General considerations
   Confidence intervals
   Treatment of zeros
   Truncation of low flows
   Use of paleohydrologic data
   Probable maximum flood
   Discussion of flood frequency determinations
   Regional frequency analysis
   Delineation of homogeneous regions
   Historical development
   Statistical methods
   Frequency distributions
   Regression-based procedures
   Index-flood method
   Regional index-flood relationship
   Regionalization using L-moments and the GEV distribution
   Regionalization using modeling
   Frequency analysis of precipitation data
   Frequency analysis of other hydrologic variables
   Exercises

8 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
   Confidence intervals
   Mean of a normal distribution
   Variance of a normal distribution
   One-sided confidence intervals
   Parameters of probability distributions
   Hypothesis testing
   $H_0: \mu = \mu_1$, $H_a: \mu = \mu_2$, normal distribution, known variance
   $H_0: \mu = \mu_1$, $H_a: \mu = \mu_2$, normal distribution, unknown variance
   $H_0: \mu = \mu_0$, $H_a: \mu \neq \mu_0$, normal distribution, known variance
   $H_0: \mu = \mu_0$, $H_a: \mu \neq \mu_0$, normal distribution, unknown variance
   Test for differences in means of two normal distributions
   Test of $H_0: \sigma^2 = \sigma_0^2$ versus $H_a: \sigma^2 \neq \sigma_0^2$, normal population
   Test of $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \neq \sigma_2^2$ for two normal populations
   Test for equality of variances from several normal distributions
   Testing the goodness of fit of data to probability distributions
   Chi-square goodness of fit test
   Distributional tests based on cumulative distributions
   Comparing two empirical distributions
   General comments on goodness of fit tests
   Exercises

9 SIMPLE LINEAR REGRESSION
   Simple regression
   Evaluating the regression
   Confidence intervals and tests of hypotheses
   Inferences on regression coefficients
   Confidence intervals on regression line
   Confidence intervals on standard error
   Extrapolation
   General considerations
   Exercises

10 MULTIPLE LINEAR REGRESSION
   Notation
   General linear model
   Confidence intervals and tests of hypotheses
   Confidence intervals on standard error
   Inferences on the regression coefficients
   Confidence intervals on the regression line
   Other inferences in regression
   Which line is best
   Extrapolation
   Autocorrelated errors
   Testing for serial correlation
   Corrective action
   Multicollinearity
   Detection of multicollinearity
   An application of multiple regression
   Transforming linear models
   Indicator variables in regression
   General comments
   Logistic regression

11 CORRELATION
   Inferences about population correlation coefficients
   Serial correlation
   Correlation and regional analysis
   Correlation and cause and effect
   Spurious correlation
   Exercises

12 MULTIVARIATE ANALYSIS
   Notation
   Principal components
   Regression on principal components
   Multivariate multiple regression
   Canonical correlation
   Cluster analysis
   Exercises

13 DATA GENERATION
   Univariate data generation
   Multivariate data generation
   Multivariate, correlated, normal random variables
   Multivariate, correlated, nonnormal random variables
   Applications of data generation
   Exercises

14 ANALYSIS OF HYDROLOGIC TIME SERIES
   Definitions
   Trend analysis
   Jumps
   Autocorrelation
   Periodicity
   Autoregressive integrated moving average models (ARIMA)
   Moving Average Processes (MA)
   Autoregressive processes
   Autoregressive Moving Average Models ARMA (p, q)
   Autoregressive Integrated Moving Average ARIMA (p, d, q)
   Estimate of noise variance $\sigma^2$
   Parameter estimation via least squares
   AR models
   MA models
   Parameter estimation via maximum likelihood

15 SOME STOCHASTIC HYDROLOGIC MODELS
   Purely random stochastic models
   First-order Markov process
   First-order Markov process with periodicity
   Higher-order autoregressive models
   Markov chain models
   Exercises

16 PROBABILISTIC METHODS FOR UNCERTAINTY, RISK, AND RELIABILITY ANALYSIS
   Sensitivity analysis
   Traditional or local sensitivity analysis
   Global sensitivity analysis
   Uncertainty analysis
   Reliability and risk analysis
   Uncertainty, risk, and reliability analysis methods
   First-order approximation method
   Simplified FOA estimates for some functional forms
   Monte Carlo simulation
   Corrected FOA method
   Correcting FOA mean and variance estimates of an individual function
   Second-order approximation method
   First-order reliability method
   Generic expectation functions
   Other methods
   Second-order reliability methods
   Point estimation methods
   Transform methods

17 GEOSTATISTICS
   Descriptive statistics
   Semivariogram models
   Combination semivariogram models
   Estimation
   An example
   Anisotropy
   Cokriging
   Local and global estimation
   Polygon declustering
   Cell declustering
   Point kriging
   Block kriging
   Estimation of cumulative distributions
   Uncertainty
   Modeling using geostatistics

APPENDIXES
   A.1. Common distributions
   Hydrologic data
   A.2. Monthly runoff (in.), Cave Creek near Fort Spring, Kentucky
   A.3. Peak discharge (cfs), Cumberland River at Cumberland Falls, Kentucky
   A.4. Peak discharge (cfs), Piscataquis River, Dover-Foxcroft, Maine
   A.5. Total Precipitation (in.) for week of March 1 to March 7, Ashland, Kentucky
   A.6. Flow and sediment load, Green River at Munfordville, Kentucky
   A.7. Streamflow (in.), Walnut Gulch near Tombstone, Arizona
   A.8. Monthly Rainfall (in.), Walnut Gulch near Tombstone, Arizona
   A.9. Annual discharge (cfs), Spray River, Banff, Canada
   A.10. Annual discharge (cfs), Piscataquis River, Dover-Foxcroft, Maine
   A.11. Annual discharge (cfs), Llano River, Junction, Texas
   Statistical tables
   A.12. Standard normal distribution
   A.13. Percentile values for the t distribution
   A.14. Percentile values for the chi-square distribution
   A.15. Percentile values for the F distribution
   A.16. Critical values for the Kolmogorov-Smirnov test statistic
   A.17. Durbin-Watson test bounds

BIBLIOGRAPHY

Preface to the Second Edition

SINCE THE publication of the first edition of this book, statistics has come to play an increasingly important role in hydrology. Advances in computing technology and data management have made the application of statistical techniques that were previously known but difficult to implement almost routine. User-friendly software for personal computers has made powerful statistical routines available to nearly all hydrologists. Generally, this software comes with user manuals or help files that lead a new user through the steps needed to use the programs. Unfortunately, these aids rarely indicate the assumptions inherent in the techniques, the limitations of the techniques, and the situations in which the techniques should or should not be used. They are also generally weak in instructing one on the interpretation of the results of the analysis. This software is a tool available for use in hydrology, but it replaces neither a sound hydrologic understanding of the problem at hand nor a basic understanding of the statistical technique being used.

This current edition should serve as a companion to many of the available software programs, not to explain how to use the software but to provide guidance on the proper routines to use for a particular problem and on the interpretation of the results of the analysis.

The basic philosophy of the current edition is the same as that of the first edition. Enough detail on particular statistical methods is presented for the reader to gain a working understanding of each technique. Certainly the treatment of any particular statistical technique is not exhaustive. Much theory and derivation are omitted and left to more in-depth treatments found in books dealing specifically with the various topics.

Two chapters have been added to the book. One of these chapters deals with uncertainty analysis and the other with geostatistics. Both of these topics have received great emphasis in the past decade. Uncertainty analysis is a growing concern as it is increasingly recognized that both statistical and deterministic analyses result in estimates that are far from absolute answers. Increasingly, attempts are made to evaluate how much uncertainty should be associated with various types of analyses. Rather than a point estimate of some quantity, confidence limits are sought, so that one can assert, with various degrees of confidence, bounds within which the sought-after quantity is thought to lie. Geostatistics has become of increasing importance as geographically referenced information becomes available and is used in geographical information systems (GISs) to produce hydrologic estimates.

The chapter on uncertainty was written by Aditya Tyagi, a former PhD candidate at Oklahoma State University and currently a water resources engineer with CH2M Hill. Jason Vogel, a research engineer and PhD candidate at Oklahoma State University, was a coauthor of the chapter on geostatistics.


Preface to the First Edition

THE RANDOM variability of such hydrologic variables as streamflow and precipitation has been recognized for centuries. The general field of hydrology was one of the first areas of science and engineering to use statistical concepts in an effort to analyze natural phenomena. Many papers have been published that amply demonstrate the value of statistical tools in analyzing and solving hydrologic problems. In spite of the long history and proven utility of statistical techniques in hydrology, relatively few comprehensive and basic treatments of statistical methods in hydrology have been published.

This book has been prepared to assist engineers and hydrologists in developing an elementary knowledge of some statistical tools that have been successfully applied to hydrologic problems. The intent of the book is to familiarize the reader with various statistical techniques, point out their strengths and weaknesses, and demonstrate their usefulness. The serious reader will want to supplement the material with formal courses or independent study of those individual topics that are of major interest. No single topic has been developed completely. Books have been written covering many of the topics discussed as single chapters in this presentation. Again, the purpose here is to develop understanding and illustrate the usefulness of the techniques. Most of the techniques are discussed in sufficient detail for a thorough understanding and application to problem situations. The philosophy of the presentation has been that one does not have to understand hydrodynamics to swim, even though it could help one to become a more proficient swimmer.

The book has not been written for statisticians or for those primarily interested in statistical theory. Rather, it has been prepared for hydrologists and engineers interested in learning how statistical models and methods can be valuable tools in the analysis and solution of many hydrologic and engineering problems. The basic premise has been taken (and justifiably so) that statisticians are competent, so many statistical results are presented without a rigorous proof of their validity. Proofs for most results can be found in mathematical statistics books, many of which are listed in the bibliography.

No prior knowledge of statistics is required if one starts with Chapter 2. Those with varying degrees of statistical knowledge may choose to start with later chapters. A knowledge of calculus is required throughout and some familiarity with matrices is needed for material in later chapters. Appendix D is a review of the basic matrix manipulation used in the book (not in this new edition).

This is not a statistical "cookbook" for hydrologists. It does not contain step-by-step calculation procedures for "standard" hydrologic problems. Basic statistical concepts are discussed and illustrated in enough detail so that one can develop his own computational procedures or methods.

Most of the computations in actual work situations would be done on digital computers. Computer programs have not been included because it is felt that most computer centers will have programs or programmers available. Likewise, computational techniques are not emphasized. For example, in the chapter on multiple regression, efficient techniques for matrix inversion are not presented, as it is felt that these techniques are readily available at most computer centers. The emphasis is thus retained on the statistical technique being used and not on the computational aspects of the problem.

Some liberties have been taken in that many terms are not precisely defined in a mathematical sense unless such a definition is warranted. Where terms are loosely defined, it is hoped that the meticulous reader will accept the general connotation of the terms for purposes of simplicity and to avoid placing emphasis on terms rather than concepts.

Many of the problems require sets of data. Those data may be supplied by the reader or selected from the data in Appendix C.

I am grateful to the Literary Executor of the late Sir Ronald A. Fisher, F.R.S., to Dr. Frank Yates, F.R.S., and to Longman Group Ltd., London, for permission to reprint Table E.5 from their book Statistical Tables for Biological, Agricultural and Medical Research, 6th Edition (1974) (not in this new edition).


Acknowledgments for the Second Edition

IT HAS been nearly a quarter century since I wrote the first edition of this book. During that time I have become indebted to many people. I have spent nearly this entire period with the Biosystems and Agricultural Engineering Department at Oklahoma State University. This Department has provided a wonderful atmosphere for intellectual growth and accomplishment. The faculty, staff, and students that I have been associated with have helped to create a working environment that was challenging, friendly, and one in which my only limitation was myself.

I am grateful to many individuals. Bill Barfield has continued to be a valued friend and coworker. Dan Storm, Bruce Wilson, and many graduate students have been especially instrumental in much of my research and teaching in the field of statistical hydrology.

My daughter, Dr. Patricia Haan, assistant professor in the Biological and Agricultural Engineering Department at Texas A&M University, has been very helpful in clarifying some points in the text and correcting errors.

Certainly my wife of 34 years, Jan, has been most supportive and forgiving as I have devoted far too much time to work.

As is true of all of us, I owe whatever I have accomplished to my Creator without Whom


Acknowledgments for the First Edition

MUCH OF the material presented in this book was developed for a course taught to students in the Agricultural Engineering and Civil Engineering Departments at the University of Kentucky. The suggestions and clarifications made by the students in this course over the past 8 years have been a great aid in attempting to make this book more understandable.

Special acknowledgment must be given to Dan Carey for his careful readings of the entire manuscript. These readings resulted in several corrections and clarifications. Several individuals have read parts of the book and made valuable suggestions for its improvement. Among those reviewing parts of the manuscript were Donn DeCoursey, David Allen, David Culver, and personnel of the U.S. Soil Conservation Service under the direction of Neil Bogner.

Several individuals in the Agricultural Engineering Department at the University of Kentucky offered valuable suggestions and considerable encouragement. Deserving special mention are Billy Barfield, Blaine Parker, and John Walker.

This undertaking has required sacrifice on the part of my family and especially my wife Janice. She not only typed the early drafts of the book but offered continued encouragement over the years as work and revisions were done on the book.

This manuscript was reproduced from photo-ready copy. The excellent typing involved in preparing this final draft as well as an earlier draft was done by Pat Owens. Buren Plaster drafted all of the figures.

Of course any failings and shortcomings of this book must be credited to me. My hope is that it will be found useful in at least partially meeting the need for an elementary treatment of statistical methods in hydrology. Whatever is accomplished along these lines I owe to our Father for giving me the will to see this project through and the ability to withstand the setbacks experienced along the way. Finally I express my appreciation to all of the members of the Agricultural Engineering Department at the University of Kentucky for their understanding during the preparation of this manuscript.


1. Introduction

MORE THAN 25 years ago I set about writing a book on the application of statistical techniques to hydrology. That book, published in 1977, became the first edition of this current work and was appropriately titled Statistical Methods in Hydrology. Although it was soundly criticized as a book of the general type "Statistics for ___," little more than a "relevant Schaum's Outline series" on statistics with a little hydrology thrown in (Burges 1978), the book has had a very wide reception, has gone through several printings, and has been widely quoted in the literature. However, as I have reflected on this critique over the years, and as I have used statistics to address problems in hydrology and observed others doing the same, I have come to the conclusion that it contained a large element of truth.

There is no shortage of very fine books at many levels of complexity on statistics. The theory of statistical procedures and the assumptions in statistical procedures are well explained and widely available. The same statistical techniques might be applied to hydrologic data or to the comparison of the value of the Japanese yen to the U.S. dollar. Statistical techniques are based in mathematics and probability. The units attached to the data being studied are immaterial from a statistical standpoint. What is important is the degree to which the data agree with the assumptions inherent in the statistical procedure being applied.

Similarly, there are many books on hydrology. Some of these books are quite general, some are quite theoretical, some are quite empirical, and none are really exhaustive. The problem with hydrology is that it is, in practice, very messy. For example, we can present in great detail the mathematical development of equations describing the overland flow of water on planes of various types, how flow profiles develop, and how runoff hydrographs result at the lower end of these planes. There exist very elegant solutions for these problems, although numerical procedures are often required to arrive at them. With rapid advances in computing technology, this is a rapidly diminishing problem.

The real problem, as I see it, is that we have developed an elegant solution to a nonexistent problem. In my lifetime I have observed many rainfall-runoff events and have rarely seen the type of flow described above except in artificial situations such as parts of parking lots or streets covering a tiny fraction of a drainage basin. If there is any overland flow, flow concentration develops before it goes very far, and the overland flow "planes" become very nonuniform.

Does that mean it is wrong to develop and present these idealized equations? Does that mean it is wrong to use models that contain these equations to develop runoff hydrographs? NO! It simply means that one must be aware of the relationships between the mathematics of the model and the actual hydrology that is occurring. Through proper selection of roughness coefficients and other coefficients in such models, good estimates of runoff hydrographs may result. Yet that does not mean that the model actually describes in exact detail the hydrologic processes that are occurring. We must not confuse actual hydrologic processes with models of these processes.

On numerous occasions I have seen those practicing hydrology confuse hydrologic models with actual hydrologic systems. The complexity, the nonhomogeneity, and the dynamic nature of actual hydrologic systems are not recognized. The uncertainty inherent in the parameters used to particularize a hydrologic model to a specific catchment or hydrologic problem is not recognized. The numbers produced by the model are taken as the true hydrologic response of the actual hydrologic system. More disturbing, the algorithms that make up the model are taken as true and exact representations of the hydrologic systems they purport to represent. Quite likely the one using the hydrologic model has great skill in modeling and in computers but little understanding of the complexity of hydrologic systems.

At this point one might be wondering why I have jumped on mathematical models when this book is about statistics. The answer lies in my experience over the years that statistical methods are often criticized for not being physically based and not representing what is actually occurring in the field. Yet all hydrologic models, not just statistical models, are susceptible to this criticism. Statistical models, like mathematical models, are often applied with little regard to the assumptions in the models. Some take model results as truth, especially if the statistical or mathematical technique is complex. Others reject model results because not all assumptions are met. So basically, in hydrology, we face the same dilemmas whether we use mathematical or statistical models.

No model describes the actual and complete hydrology of anything but the simplest of settings. Regardless of what approach we use toward solving an actual hydrologic problem, compromise must be made with the methodology employed. One can never turn professional judgment over to any particular hydrologic model whether the model is mathematical, statistical, or some combination of the two. Any model must be seen as an aid to judgment and not as a replacement for it.

There are no completely theoretical models and no completely statistical models. All models have components of both theory and statistics. Both are techniques for quantifying our understanding and our observations of hydrologic processes. The presence of theory or statistics may not be a formal presence, but it is there. This leads to the conclusion that all models have statistical components to some degree. Any constants that are estimated based on observations, even observations formalized into tables like Manning's n values, have been determined by formal or informal application of statistics. Any statistical model should be formulated based on some understanding of the system being modeled. This understanding may be brought into the model through a conceptual structure of the model. These conceptual components are what bring hydrology into the model as opposed to having a purely statistical model. In my view, one should not ignore hydrology when developing models for use in hydrology no matter how sophisticated the statistical techniques that are being used. To the extent that hydrologic knowledge is used in structuring a statistical model, the model may be said to contain conceptual components. Statistical models should not be developed by simply throwing data on every conceivable variable into some computerized statistical routine and hoping for the best.

As far as the hydrologist is concerned, statistics is not an end in itself. Statistics is a tool that may help one to understand hydrological data. That statistics is a tool for hydrology must be kept foremost in mind. It must also be kept in mind that statistics is just one of several tools available for application in hydrology.

Hydrologic processes are not driven by principles of statistics but by physical, chemical, and biological principles, the so-called "Laws of Nature". Often the hydrologic setting is of such complexity that the underlying component hydrologic processes cannot be expressed in such a way as to yield a suitable computational framework for describing the system. Perhaps the mix of surface soil properties, land uses, topography, and so forth is such that the setting of a particular hydrologic problem cannot be adequately described. Perhaps the complexity and heterogeneity of the system are such as to preclude deterministic modeling. Perhaps data are available on a response variable such as streamflow, water quality, or groundwater level, but not on the causative variables of rainfall, evaporation, infiltration, and so on. In such cases statistical techniques may be needed in an effort to uncover descriptive behavioral relationships among the data. Such relationships are descriptive relationships, not cause-effect relationships. They may support hypotheses concerning cause and effect but do not conclusively establish such relationships.

Over the past 20 years I have seen many inappropriate applications of statistics in hydrology. I have seen hydrologists stake their reputation as hydrologists on statements made based on poor knowledge of statistics. I have also seen statisticians make far-reaching conclusions with a very elementary knowledge of hydrology; here the argument goes "the data show . . ." The data are separated from their hydrologic reality and analyzed as pure numbers!

One thing that has compounded the problem of inappropriate use of statistics in hydrology (or any other field, I suspect) is the ready availability of powerful statistical software that is easy to use. I applaud the availability of this software but shudder at some of the applications that are made with it.

Sometimes a statistical procedure is improperly applied or applied in inappropriate circumstances. The numbers generated by a statistical analysis are then venerated as absolute truth. It would be better to apply a technique while recognizing and admitting its shortcomings, and then to use the results as a guide rather than religiously adopting the results and claiming they represent reality.

This long introduction has been composed to impart some of my hydrologic-statistical modeling philosophy and to alert the reader that this book will emphasize the assumptions inherent in statistical techniques and the consequences of violating these assumptions. Statistical techniques will be explained at the practical level without many derivations and proofs. References to these will be given. The book will be most useful to someone having at least an elementary knowledge of mathematical statistics and hydrology. This book addresses the interface of these two disciplines.

The question naturally arises as to what is meant by hydrology in this book. Hydrology broadly defined is the study of water. The Federal Council for Science and Technology (1962) defined hydrology as

the science that treats of the waters of the Earth, their occurrence, circulation, and distribution, their chemical and physical properties, and their reaction with their environment, including their relation to living things. The domain of hydrology embraces the full life history of water on the earth.

This definition is more or less the one used in this book. The definition is broad and includes topics some may consider to be more proper to geology, engineering, environmental science, biology, chemistry, paleontology, or some other science. Some may even feel it includes aspects which are nonscientific. Under this definition, the word "hydrology" as used in this book includes these other areas as well.

Statistics will be considered in a limited sense in the context of this book. Statistics will be defined as

a science devoted to developing an understanding of a system that can be used to make inferences about the system based on observation relative to that system.

Models are often used in developing this understanding and in making inferences. Model is a general term that will be taken to mean

a collection of physical laws and empirical observations written in mathematical terms and combined in such a way as to produce estimates based on a set of known and/or assumed conditions.

There are many ways of collecting physical laws and empirical observations and of combining them to produce a model. Models can generally be represented as

$$O = f(I, P) + e$$

where $O$ represents the outputs or quantities to be estimated; $f(\ldots)$ represents the mathematical structure of the model; $I$ represents inputs to the model, boundary conditions, and initial conditions; $P$ represents parameters that help particularize the model to a specific situation; and $e$ represents differences between what actually occurs, $O$, and what the model predicts, $O_p$.


There are many ways of classifying models. Some people draw sharp distinctions between statistical models and other models. In practice one cannot do a thorough modeling exercise without drawing on statistics in some way. Often some type of statistical work has to be done to come up with values for parameters for a model that might otherwise be considered a nonstatistical model. Thus, the parameters of the model become some function of observations. If another set of observations were used, presumably different parameter values would result. Since observations (data) in hydrology are generally thought of as random variables and any function of a random variable is a random variable, the parameters for the model effectively become random variables, and thus a statistical element enters a model that might otherwise not be considered a statistical model.
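The point that fitted parameters are themselves random variables can be illustrated with a short Python sketch. It assumes a hypothetical one-parameter model $O = P \cdot I + e$ (runoff as a fixed fraction $P$ of rainfall), generates two synthetic 20-storm records from the same assumed "true" $P$, and fits $P$ to each by least squares through the origin. The model, the data generator, and every number are illustrative assumptions, not taken from the text.

```python
import random

def fit_runoff_coefficient(rain, runoff):
    """Least-squares slope through the origin for the hypothetical model
    O = P * I + e, i.e. P_hat = sum(I * O) / sum(I * I)."""
    num = sum(i * o for i, o in zip(rain, runoff))
    den = sum(i * i for i in rain)
    return num / den

def synthetic_record(n, true_p, rng):
    """n hypothetical storms: rainfall I (mm) and runoff O = true_p * I + noise."""
    rain = [rng.uniform(20.0, 80.0) for _ in range(n)]
    runoff = [true_p * i + rng.gauss(0.0, 2.0) for i in rain]
    return rain, runoff

TRUE_P = 0.35            # assumed "true" parameter of the synthetic watershed
rng = random.Random(1)   # fixed seed so the illustration is repeatable

estimates = []
for label in ("record A", "record B"):
    rain, runoff = synthetic_record(20, TRUE_P, rng)
    p_hat = fit_runoff_coefficient(rain, runoff)
    predicted = [p_hat * i for i in rain]                     # O_p = f(I, P)
    residuals = [o - op for o, op in zip(runoff, predicted)]  # e = O - O_p
    estimates.append(p_hat)
    print(f"{label}: P_hat = {p_hat:.4f}, "
          f"mean residual = {sum(residuals) / len(residuals):.3f} mm")
```

Both records come from the same watershed model, yet the two estimates of $P$ differ; the parameter inherits the sampling variability of the data used to fit it.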

Broadly speaking, quantitative hydrologic models fall on a continuous spectrum of model "types" ranging from completely deterministic on the one hand to completely stochastic on the other. A completely deterministic model would be one arrived at through consideration of the underlying physical relationships and would require no experimental data for its application. Statistical models range in complexity from estimating the most likely outcome or result of an experiment to describing in detail a sequence (time series) of outcomes that mimic actual outcomes. All statistical approaches rely on observations. The mathematical techniques used to extract the information contained in the observations may be as simple as computing an average or so complex as to require thousands of stochastic simulations.

Most hydrologic models fall somewhere between the extremes of this model spectrum. Often such models are termed parametric models. A parametric model may be thought of as deterministic in the sense that once model parameters are determined, the model always produces the same output from a given input. On the other hand, a parametric model is stochastic in the sense that parameter estimates depend on observed data and will change as the observed data change. A stochastic model is one whose outputs are predictable only in a probabilistic sense. With a stochastic model, repeated use of a given set of model inputs produces outputs that are not the same but follow certain statistical patterns. A statistical model is one arrived at by applying statistical methods to a set of data to produce an estimation procedure. Multiple regression models are examples of statistical models. In this sense, all stochastic models are statistical models, but not all statistical models are stochastic models.

No matter how simple the hydrologic system or how complex the hydrologic model, the model is always an approximation to the system. There are no hydrologic models (deterministic, stochastic, or combined) that represent exactly anything but the most trivial of hydrologic systems. The digital computer has made possible great advances in all types of hydrologic models. These advancements are noteworthy for both stochastic and deterministic models and have led some hydrologists to vigorously adopt the philosophy that all hydrologic problems should be attacked stochastically and others the philosophy that they should be attacked deterministically. The purpose of this book is not to promote statistical or stochastic models but to present some basic statistical concepts that have been found useful as aids for the solution of hydrologic problems.

Many hydrologic problems can best be solved through the joint application of the various modeling methods. For instance, it may be possible to adequately predict the runoff hydrograph from a simple watershed deterministically given the rainfall input. It is unlikely, however, that rainfalls that will occur during the life of a water resources project will be deterministically predictable. Thus, one approach to project evaluation would be a stochastic simulation of rainfall, deterministic conversion of the rainfall to streamflow, and a statistical analysis of the resulting streamflows.
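A minimal Python sketch of this three-step approach follows. For illustration only, it assumes that annual maximum storm depths are lognormal (median 60 mm), and it uses the SCS curve-number runoff equation with an assumed curve number of 75 as the deterministic rainfall-to-runoff step. The text names no particular distribution or conversion method, and runoff depth stands in here for streamflow. The 25-year value is read from the simulated runoff as the value exceeded with probability 1/25.

```python
import math
import random
import statistics

def scs_runoff_mm(p_mm, cn):
    """SCS curve-number runoff depth (mm) for storm rainfall depth p_mm (mm)."""
    s = 25400.0 / cn - 254.0   # potential maximum retention, mm
    ia = 0.2 * s               # initial abstraction, mm
    if p_mm <= ia:
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm - ia + s)

rng = random.Random(42)
N_YEARS = 10_000
CN = 75  # assumed curve number for a hypothetical watershed

# Step 1 (stochastic): simulate one annual-maximum storm depth per year
# (assumed lognormal, median 60 mm; purely illustrative).
storms = [rng.lognormvariate(math.log(60.0), 0.3) for _ in range(N_YEARS)]

# Step 2 (deterministic): convert each storm to a runoff depth.
runoff = [scs_runoff_mm(p, CN) for p in storms]

# Step 3 (statistical): the 25-year value is exceeded with probability 1/25,
# i.e. it is the 96th percentile of the simulated annual maxima.
q25 = statistics.quantiles(runoff, n=25)[-1]
print(f"Estimated 25-year runoff depth: {q25:.1f} mm")
```

Reading the quantile directly from many simulated years avoids assuming a distribution for runoff itself; because the conversion is nonlinear, runoff generally does not follow the same distribution as rainfall.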

Regardless of the type of model that is used, model parameters must be determined in some way from observed hydrologic data. The validity and applicability of a model depend directly on the characteristics of the data used to estimate model parameters. A model can be no better than the data available for parameter estimation. The data used for parameter estimation must be representative of the situation in which the model is going to be used. Obviously, if one is attempting to model streamflow from an urban area, model parameters cannot be estimated from forested watersheds. Similarly, future hydrologic behavior of a watershed can be modeled based on past observations only if available historical data are representative of future conditions. If drastic land use changes are to be made, then the model parameters must be adjusted accordingly.

All techniques used for hydrologic analysis rely on assumptions. Often the strict validity of the analysis depends on how well the true system meets these assumptions. This is certainly true of statistical models and statistical methods applied to hydrologic systems.

There are no statistical procedures whose assumptions exactly match particular hydrologic systems. Likewise there are no hydrologic systems that exactly meet the assumptions made in any particular hydrologic model.

With this in mind one is forced to the conclusion that models cannot yield an exact solution to any realistic hydrologic problem. Models must be treated as a tool that can be used to gain insight and to arrive at potential outcomes in a given hydrologic setting, but the final decision regarding any hydrologic process rests with the hydrologist, not the models. The hydrologist may choose to adopt a solution generated from modeling considerations, but this decision must be based on the hydrologist's convictions that the solution is hydrologically sound and not simply on how well the model describes the data. How close the final real solution is to the model solution will certainly depend on how well the physical setting matches the assumptions of the modeling techniques employed. It is the hydrologist who must make the determination as to the relationship between the model result and hydrologic reality.

The fact that a statistical modeling procedure requires assumptions that are not strictly met in a particular hydrologic setting does not mean that statistically derived results are of no value. Again, the statistical modeling technique is used to provide insight into the problem at hand and not the final result. Even when it is known that certain assumptions are violated, useful information can often be obtained from a statistical modeling effort.

Throughout this book, assumptions that accompany the statistical technique being discussed will be set forth and discussed from a hydrologic standpoint. The potential problems associated with violating the assumptions will be discussed. One of the frustrations that is constantly faced in using statistical models to represent hydrologic systems is trying to determine if assumptions are met or to what extent assumptions are not met for a particular set of data and the effect of not meeting assumptions on conclusions reached using the method.

One might come away feeling that it is inappropriate to use statistics in hydrology. That is not the case at all. What is inappropriate is for an analyst to relegate absolute hydrologic authority to statistics while giving no weight to other tools available, such as mathematical models and common sense.

Deterministic hydrologic models, whether numerical or conceptual, suffer the same problems in terms of assumptions as do statistical models. Rarely are hydrologic models adequately tested over the full range of conditions for which they will be applied. Rarely are all of the assumptions associated with hydrologic models actually set forth. For instance, one assumption inherent in hydrologic models is that a basin's hydrologic response to a rare or extreme event can be modeled with the same algorithms used to model common or predominant events.

In hydrologic frequency analysis, the criticism is often justifiably leveled that estimating a rare flood (say, a 500-year flood) from a record of 20 or 30 years, none of which is extraordinarily large, is fraught with the possibility of error. The question is asked: how could relatively common flow levels have information embedded in them that would determine the magnitude of a 500-year event? To put it another way, by example: in Oklahoma most annual peak flows from smaller watersheds are generated from thunderstorms that arise over the Great Plains of the central United States. The really big floods may be the result of a hurricane sweeping in from the Gulf of Mexico and traveling over Oklahoma. How can flow data from thunderstorms predict flow magnitudes of hurricane-related floods?

But the same questions apply to deterministic hydrologic models. If a model is formulated and parameters estimated based on common flow levels, how can one be sure these same parameter values and algorithms apply to extreme events?

In both cases, flood frequency analysis and modeling, information is gained about the possible magnitude of the 500-year event. For certain neither estimate is exact! In addition to these estimates the hydrologist should do some field work, look at channel capacities, possibly look for evidence of extreme floods in the geologic past (paleohydrology), and rely on as much hydrologic reasoning as possible to arrive at the final estimate of the 500-year event. One should additionally attempt to place some type of uncertainty bands on the estimate.

What is being suggested is that responsibility for a hydrologic estimate rests squarely on the hydrologist rather than on some analytic technique. One cannot blame the log-Pearson type III distribution for making a bad flood frequency estimate. The problem is not the distribution itself (after all, the distribution is just a mathematical equation) but the inappropriate application of the distribution in making the estimate. One cannot blame a hydrologic model if a hydraulic structure fails because the flow estimated by the model was in error. One may conclude that the model was inappropriate, but it was the hydrologist who made the estimate using the model as a tool.

HYDROLOGIC DATA

Hydrologic data seem to be simultaneously abundant and scarce. We are deluged with data on rainfall, temperature, snowfall, and relative humidity from around the world on a daily basis in newspapers, radio and television reports, and on world-wide computer information networks. Many agencies worldwide collect and archive hydrologic data on streamflow, lake and reservoir levels, ground water elevations, water quality measures, and other aspects of the hydrologic cycle. These data are available in many different forms. Currently, access to hydrologic data is being rapidly improved as the data are made available over electronic networks.


Yet in the face of this apparent abundance, data on a particular aspect of the hydrologic cycle at a particular location for a particular time period are often inadequate or completely lacking. It is often the task of the hydrologist to use any data that can be found having some application to the problem at hand, hydrologic models of various kinds, plus their own hydrologic knowledge to explain past, present, or anticipated hydrologic behavior of the system under study. Statistical procedures are used to evaluate the data, transfer the data to the problem at hand, select models and model parameters, evaluate model predictions, organize one's personal conception of how available data and knowledge come to bear on the problem, make predictions of future behavior of the system, and many other aspects of hydrologic problem-solving.

Hydrologic data are generally presented as values at particular times, such as a river stage at a particular time, or values averaged over time, such as the annual flow for a stream for a particular year. Aggregating data into averages over time intervals may cause a loss of information if the variability of the process within the time period is of interest. Conversely, aggregation may make it possible to more clearly visualize long-term trends because short-term variations about the trend may be removed. The variability from observation to observation in a time series of hydrologic data may be very rapid and significant or very minor. Generally, systems having a lot of storage vary more slowly than systems lacking that storage. Figure 1.1 is a plot of the water surface elevation of the Great Salt Lake near Salt Lake City, Utah. This figure shows that during the period of this record, water level changes of about 20 feet have occurred, but year-to-year change is relatively slow, with the exception of 1982-1984, when a rise of about 4 feet per year occurred, and the late 1980s, when the level dropped rather quickly.

Figure 1.2 shows the annual peak discharge for the Kentucky River near Salvisa, Kentucky. There is little year-to-year carry-over or storage in this river system, so the flows vary more or less randomly from one year to the next.

Figure 1.3 shows the water surface elevation of Devils Lake in North Dakota. The behavior of this lake is puzzling in that it has gone from nearly 1440 feet in elevation in 1867 to 1401 feet in 1940 in an almost continuous decline, at which point an erratic but steady increase in elevation began until it reached 1447 feet in 1999.

Fig. 1.1. Water surface elevation of the Great Salt Lake near Salt Lake City, Utah.

Fig. 1.2. Annual peak flows on the Kentucky River near Salvisa, Kentucky.

Fig. 1.3. Water surface elevation of Devils Lake, North Dakota.

In the case of the Salt Lake data, a model that estimated the water level in one year based solely on the level the previous year might produce reasonable estimates. The form of such a model would be $y_t = y_{t-1}$, where $y_t$ is the water level at time $t$ and $y_{t-1}$ is the water level at the previous time $t - 1$. Such a model may give a better prediction of the lake level in year $t$ than would a model $y_t = \bar{y}$, where $\bar{y}$ is the average lake level. The opposite is the case in the Kentucky River peak flow data. Here $y_t = \bar{y}$ would be better than $y_t = y_{t-1}$. The previous year's flow is of little value in predicting the current year's flow.
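The comparison between the two predictors can be sketched in Python. The two series below are synthetic (an assumption, not the Great Salt Lake or Kentucky River records): a "lake-like" series in which each value keeps 95% of the previous departure from the mean, mimicking large storage, and a "river-like" series with no carry-over. For each, the sketch computes the mean squared error of the predictors $y_t = y_{t-1}$ and $y_t = \bar{y}$.

```python
import random
import statistics

def ar1_series(n, phi, rng, sigma=1.0):
    """Hypothetical series x_t = phi * x_{t-1} + e_t, with e_t ~ N(0, sigma^2)."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x

def one_step_mse(y):
    """Mean squared error of the predictors y_t = y_{t-1} and y_t = ybar."""
    ybar = statistics.fmean(y)
    persistence = statistics.fmean((y[t] - y[t - 1]) ** 2 for t in range(1, len(y)))
    mean_model = statistics.fmean((y[t] - ybar) ** 2 for t in range(1, len(y)))
    return persistence, mean_model

rng = random.Random(7)
results = {
    "lake-like (phi = 0.95)": one_step_mse(ar1_series(2000, 0.95, rng)),
    "river-like (phi = 0)": one_step_mse(ar1_series(2000, 0.0, rng)),
}
for name, (p, m) in results.items():
    print(f"{name}: MSE of y_t = y_(t-1) is {p:.2f}; MSE of y_t = ybar is {m:.2f}")
```

Here $\bar{y}$ is computed from the whole series, which gives the mean model a small in-sample advantage; even so, it should lose clearly on the persistent series and win on the series without carry-over.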

A model for Devils Lake would be difficult to surmise based simply on lake level data, because even a reasonable estimate of the long-term average lake level could not be determined from this record of over 100 years. Simply based on the data, one cannot determine the maximum elevation reached prior to 1867 or what elevation the lake might achieve in the absence of human interference after 1999. Presumably, physical and hydrologic information would shed some light on this problem. These considerations will be discussed in detail and quantified later in the book.


In selecting data for model parameter estimation, it is important to establish that the data are representative and homogeneous over time or can be adjusted for any nonhomogeneities that may be present. If anything has occurred to cause a change in the characteristic being analyzed, the data must either be adjusted to account for the change or analyzed in two sections: one before the change and one after.
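The second option, analyzing the record in two sections, can be sketched in a few lines of Python. The flow values and the 1986 change year (say, completion of an upstream dam) are hypothetical. The sketch shows that pooling the whole record produces a mean that describes neither the pre-change nor the post-change condition.

```python
import statistics

def split_at_change(years, values, change_year):
    """Split a record into the part before change_year and the part from
    change_year onward (the change is assumed to take effect in change_year)."""
    before = [v for yr, v in zip(years, values) if yr < change_year]
    after = [v for yr, v in zip(years, values) if yr >= change_year]
    return before, after

# Hypothetical annual peak flows (m^3/s); a dam is assumed completed in 1986.
years = list(range(1981, 1991))
peaks = [420, 515, 380, 610, 455, 210, 260, 190, 300, 240]

before, after = split_at_change(years, peaks, 1986)
print("pooled mean:", statistics.mean(peaks))
print("mean before change:", statistics.mean(before))
print("mean after change:", statistics.mean(after))
```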

Some common causes of nonhomogeneities are relocating gages (especially rain gages), diverting streamflows, constructing dams, watershed changes such as urbanization or deforestation, stream channel alterations, and possibly weather modification, as well as natural events of a catastrophic nature such as earthquakes, hurricane floods, and so forth. In some instances the data can be corrected for changes. One possible adjustment would be reverse reservoir routing to determine what streamflows would have been had a reservoir not been constructed. Some changes, such as gradual urbanization of a watershed, are difficult to correct.

The statement that the data must be representative means, for example, that data from only unusually wet or dry periods should not be used alone as this will bias the results of the analysis. If there are only a few years of record available for analysis, the chances are good that the data are not representative of the long-term variability that actually exists. Most stochastic models assume that the data being considered are homogeneous and representative.

The concept of the return period of hydrologic events plays an important role in hydrology. The return period of an event is defined as the average elapsed time between occurrences of an event with a certain magnitude or greater. For example, a 25-year peak discharge is a discharge that is equaled or exceeded on average once every 25 years over a long period of time. It does not mean that an exceedance occurs every 25 years, but that the average time between exceedances is 25 years. An exceedance is an event with a magnitude equal to or greater than a certain value. Sometimes the actual time between exceedances is called the recurrence interval. With this definition for recurrence interval, the average recurrence interval for a certain event is equal to the return period of that event. In this book, recurrence interval is used in the same sense as return period.

Of course, the concept of return period can also be applied to low flows, droughts, shortages, and so on. In this case the return period would be the average time between events with a certain magnitude or less. Such an event might still be called an exceedance in the sense that the severity of a drought exceeds some preset level.

Regardless of whether the return period is referring to an event greater than some value or to an event less than some value, the return period can be related to a probability of an exceedance. If an exceedance occurs on the average once every 25 years, then the probability or chance that the event occurs in any given year is 1/25 = 0.04, or 4%. Probability, $p$, of an event occurring in any one year and return period, $T$, in years, are thus related by

$$p = \frac{1}{T}$$

This is a fundamental definition in statistical hydrology.
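The relation $p = 1/T$ can be checked by simulation. The Python sketch below assumes, as an idealization, that years are independent and each has the same exceedance probability $p = 1/25$. It simulates a long run of years and averages the gaps between exceedances. The average gap comes out near 25 years even though individual gaps vary widely, which is the distinction the text draws between the return period and the actual time between exceedances.

```python
import random

def exceedance_probability(T):
    """Annual probability of an event with return period T years: p = 1 / T."""
    return 1.0 / T

def simulate_gaps(p, n_years, rng):
    """Simulate independent years with exceedance probability p and return
    the number of years between successive exceedances."""
    exceed_years = [t for t in range(n_years) if rng.random() < p]
    return [b - a for a, b in zip(exceed_years, exceed_years[1:])]

p = exceedance_probability(25)
rng = random.Random(2024)
gaps = simulate_gaps(p, 200_000, rng)
avg = sum(gaps) / len(gaps)
print(f"p = {p:.2f}; mean gap = {avg:.1f} yr; "
      f"shortest = {min(gaps)} yr; longest = {max(gaps)} yr")
```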

The concept of a random sample is used throughout this book. A sample might be thought of as a collection of objects selected from a larger collection of these same objects. The larger collection of objects, if it contains all of the objects possible, is called the population. For example, 20 years of peak flow data from a certain river is a sample of the possible peak flows on the river. A random sample is one that is selected in such a fashion that any other sample could have resulted with equal likelihood. If the 20 years of peak flow data are considered a random sample, then one is assuming that these 20 years of data are just as likely as any other possible 20 years of data and vice versa.

In some types of analysis it is assumed that the order of occurrence of the data is not important, only the data values are important. The traditional hydrologic frequency analysis is an example of this. If a sample contains elements that are independent of each other, then the order of occurrence of the data is not important. This is the same as saying that the magnitude of an element in the sample is not affected by the temporal pattern of the other elements in the sample. Each element in the sample might be thought of as a random sample of size 1.

On the other hand, there are situations where the order of occurrence of the events is important. In designing a storage reservoir to meet projected water demands, the fact that low flows tend to follow low flows makes it necessary to have a larger reservoir than would be required if the low flows occurred randomly throughout time. This is known as persistence and indicates the elements of the sample are not independent of each other. In this case the entire sequence of data values must be considered the random sample. That is, the sequence contained in the sample is assumed to be as likely as any other sequence. The individual events in the sample are not independent.
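Persistence of this kind is commonly quantified with the sample lag-1 serial correlation coefficient, $r_1$. A Python sketch using two synthetic series of flow departures from a mean illustrates this. One series is generated with each value keeping 80% of the previous departure, and the other has independent values; the generating model is an assumption made only for illustration. $r_1$ should be well above zero for the persistent series and near zero for the independent one.

```python
import random
import statistics

def lag1_autocorrelation(x):
    """Sample lag-1 serial correlation coefficient:
    r1 = sum((x_t - xbar) * (x_{t+1} - xbar)) / sum((x_t - xbar)^2)."""
    xbar = statistics.fmean(x)
    num = sum((x[t] - xbar) * (x[t + 1] - xbar) for t in range(len(x) - 1))
    den = sum((v - xbar) ** 2 for v in x)
    return num / den

rng = random.Random(3)
# Hypothetical persistent series: each departure keeps 80% of the previous one.
persistent = [0.0]
for _ in range(999):
    persistent.append(0.8 * persistent[-1] + rng.gauss(0.0, 1.0))
# Hypothetical independent series of the same length.
independent = [rng.gauss(0.0, 1.0) for _ in range(1000)]

r1_persistent = lag1_autocorrelation(persistent)
r1_independent = lag1_autocorrelation(independent)
print(f"r1, persistent series:  {r1_persistent:.2f}")
print(f"r1, independent series: {r1_independent:.2f}")
```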

If one wanted a random sample consisting of 7 observations of daily flows on a river during a particular year, the daily flows in a particular week of that year could not be used. This is because the flow on the second, third, and subsequent days of the week would be dependent on the flows on the preceding days. The flow on day 2, for example, would not represent all possible daily flows but would be highly dependent on the flow during day 1. To get a random sample of daily flows, each of the 365 daily flows would have to have an equal chance of being selected. The sample of flows during the 7 consecutive days could be considered a random sample of size 1 of weekly flows (if the week was randomly selected) but not a random sample of size 7 of daily flows.
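The effect can be seen in a small Python experiment on one synthetic year of daily flow departures. As an assumption, the series is generated with each day keeping 90% of the previous day's departure. The experiment compares the average variance of 7 days drawn at random from the year with that of 7 consecutive days starting on a random day. The random draws reproduce the year's variability, while the consecutive week understates it because its values are not independent.

```python
import random
import statistics

rng = random.Random(5)

# Hypothetical year of daily flow departures with strong day-to-day persistence.
daily = [0.0]
for _ in range(364):
    daily.append(0.9 * daily[-1] + rng.gauss(0.0, 1.0))

def average_within_sample_variance(draw, n_trials=2000):
    """Average sample variance of the 7-value samples produced by draw()."""
    return statistics.fmean(statistics.variance(draw()) for _ in range(n_trials))

def random_seven_days():
    # Each of the 365 days has an equal chance of being selected.
    return [daily[d] for d in rng.sample(range(365), 7)]

def one_random_week():
    # Seven consecutive days starting on a randomly chosen day.
    start = rng.randrange(365 - 6)
    return daily[start:start + 7]

var_random = average_within_sample_variance(random_seven_days)
var_week = average_within_sample_variance(one_random_week)
print(f"average variance, 7 random days:      {var_random:.2f}")
print(f"average variance, 7 consecutive days: {var_week:.2f}")
print(f"variance of the whole year:           {statistics.variance(daily):.2f}")
```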

In any hydrologic data there are errors of various kinds. The errors include measurement errors, data transmittal errors, processing errors, and others. The errors may be systematic errors and show up as a bias in the data or they may be random errors. In most error analysis it is assumed that the errors are random errors and follow the normal distribution. The treatment of hydrologic data contained in this book is not concerned so much with these types of errors as it is with sampling errors.

Sampling error is a misnomer in that there are no errors in the usual sense involved. Sampling errors should more properly be called sampling variability, sampling fluctuation, or sample uncertainty. What is meant by sampling error is simply that a random sample has statistical properties that are similar to the population parameters but only equal to the population properties as the sample size gets very large (or the entire population is sampled). If two samples are selected from the same population, their statistical properties will again be similar but equal to each other only as the sample size gets very large.

For example, we may desire to know the average annual rainfall at a given location. Assume we can measure exactly, that is without any measurement error, the rainfall at the desired location. Measurements are collected over a 5-year period and the average annual rainfall is calculated without error in the calculations. A second 5-year period elapses and data from this period are used to calculate the average annual rainfall. The two estimates will be different. Neither will equal the true average annual rainfall. The differences between the estimated values and the true value are the sampling errors. Note that we cannot exactly determine the sampling error since the true average is not known.

Thus, variability or uncertainty in the statistical properties of a population based on estimates of the properties from samples is called sampling error. It is clear that errors in the sense of mistakes, faulty data, or carelessness are not involved in sampling errors. Sampling error is simply an inherent property of random samples. If it weren't for sampling errors, this book and hundreds of others on statistics would not be needed, since populations would then be completely specified by any sample from that population.
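The rainfall example can be mimicked with a short Python simulation. Purely for illustration, it assumes that annual rainfall values are independent draws from a normal population with a hypothetical mean of 1000 mm and standard deviation of 180 mm. It estimates the mean from many independent n-year records and reports how widely those estimates scatter. Under these assumptions the scatter shrinks roughly as $\sigma/\sqrt{n}$, which is the sense in which sampling error decreases as the sample size grows.

```python
import random
import statistics

rng = random.Random(11)
MU, SIGMA = 1000.0, 180.0  # hypothetical "true" mean and s.d. of annual rainfall, mm

def spread_of_sample_means(n_years, n_trials=2000):
    """Standard deviation of the n-year average rainfall across many independent records."""
    means = [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n_years))
             for _ in range(n_trials)]
    return statistics.stdev(means)

spread = {}
for n in (5, 20, 80):
    spread[n] = spread_of_sample_means(n)
    print(f"{n:3d}-year records: s.d. of estimated mean = {spread[n]:.1f} mm "
          f"(sigma/sqrt(n) = {SIGMA / n ** 0.5:.1f} mm)")
```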

Example 1.1. The mean annual suspended sediment load for the Green River near Munfordville, Kentucky, can be estimated from the data contained in Appendix B. These data and the resulting estimated mean annual suspended sediment load may contain many types of errors. Systematic errors could result if the flow was sampled for sediment only when the depth of flow exceeded a preset stage, because low flows would then not be sampled. Generally, the sediment concentration in low flows is less than that in higher flows. Thus a built-in bias or systematic error is produced. Measurement errors could result from plugged samplers, samplers not properly aligned with the direction of flow, allowing the sampler to pick up some bed load, and a number of other reasons. Data transmittal errors and processing errors can result from mistakes in transcribing data from data forms, placing data in the wrong columns on spreadsheets or data entry forms, illegibly written data, and other sources.

Sampling error can be illustrated by assuming that the tabulated data are exactly correct (contain no systematic, measurement, transmittal, or processing errors). If the mean annual suspended sediment load is calculated for each successive 5-year period, the results are 640,827; 484,739; 497,604; and 460,392 tons per year. Under the no-error assumption, 4 different values of the mean annual suspended sediment discharge have been calculated, each of which contains no errors, yet no two of which are the same. The difference among the 4 estimates is caused by natural variability in the phenomenon (sediment) being sampled. This difference is called sampling error. If conditions on the watershed contributing to the Green River near Munfordville never changed and the climatic conditions did not change, then theoretically the sampling error could be made as small as desired by increasing the sample size beyond the 5 years used in this illustration. A practical limitation is imposed by the length of the available sediment load data record.
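Using only the four 5-year values quoted in Example 1.1, a short Python sketch summarizes how far apart the estimates are. The four periods are of equal length, so, assuming they are consecutive and non-overlapping as the example implies, their average equals the mean over the full 20 years.

```python
import statistics

# Five-year mean annual suspended sediment loads (tons/yr), Green River near
# Munfordville, Kentucky, as listed in Example 1.1.
five_year_means = [640_827, 484_739, 497_604, 460_392]

overall = statistics.mean(five_year_means)  # equals the 20-year mean (equal-length periods)
spread = max(five_year_means) - min(five_year_means)
cv = statistics.stdev(five_year_means) / overall
print(f"20-year mean: {overall:,.1f} tons/yr")
print(f"range of the 5-year estimates: {spread:,} tons/yr")
print(f"coefficient of variation among the 5-year estimates: {cv:.2f}")
```

Each 5-year value is an error-free calculation under the stated assumption, yet they span a range of more than a third of their average; that spread is sampling error.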

Much of the statistical machinery discussed in this book is concerned with sampling errors and the estimation of population characteristics from samples of data. The fact that sampling errors are inherent in random data does not mean, however, that statistical manipulations and sophistication can in any way overcome faulty data. The quality of any statistical analysis is no better than the quality of the data used. It can be worse but no better. Furthermore, statistical considerations should not be used to replace judgement and careful thought in analyzing hydrologic data. In many instances some intelligent thought is worth reams of computer output based on a statistical analysis of some data. Statistics should be regarded as a tool, an aid to understanding, but never as a replacement for useful thought.

Rarely will one find a hydrologic problem that exactly fulfills all of the requirements for the application of one statistical technique or another. Two choices are thus available. One can redefine the problem so that it meets the requirements of the statistical theory and thus produce an "exact" answer to the artificial problem. The second approach is to alter the statistical technique where possible and then apply it to the real problem, realizing that the results will be an approximate answer to the real problem. In this case the degree of the approximation depends on the severity of the violated assumptions. This latter approach is preferable and requires knowledge of available statistical techniques, of assumptions and theory underlying the techniques, and of the consequences of violating the assumptions. It is toward this latter approach that this book is oriented.

Most of the examples and exercises used in this book were selected for pedagogical reasons, not to promote a particular technique. Thus, when a problem involves fitting a normal distribution to annual peak flow, the purpose of the problem revolves around learning about the normal distribution and is not to demonstrate that a normal distribution is applicable to peak flows. Similarly, many examples and problems had to be simplified so that they could be realistically solved with attention focused on the statistical technique and not the many fascinating intricacies of most real problems. That is not to say the techniques do not apply to real problems; quite the contrary. However, most real problems involve multiple aspects, lots of data, and many considerations other than statistical ones. Rather than get involved in these other important aspects, many of the examples and problems are idealizations of real situations.

Because the exercises were selected as a learning aid, it will be instructive to at least read the problems at the end of each chapter. Many of the problems present useful results that supplement the material in that chapter.

Many actual problems in hydrology require considerable computation. Digital computers are used for this purpose. Special statistical-numerical procedures have been developed to simplify the computations involved and improve the accuracy of the results obtained from many of the analyses presented in this book. These procedures are not presented here. Rather, the emphasis is on the principles involved. Some statistical techniques such as geostatistics and multivariate techniques often require extensive calculation, and considerable efficiency is gained by using special-purpose programs incorporating numerical shortcuts and safeguards against roundoff errors.

Finally, there are many important areas of statistical analysis applicable to hydrology that are not included in this book. These omitted techniques for the most part require knowledge of the material contained in this book before they can be applied. Thus, this book is an introduction to statistical methods in hydrology. Furthermore, the book is not intended as a handbook or statistical "cookbook" for hydrologists. The purpose of this book is to enable the reader to better apply statistical methods to hydrologic problems through a knowledge of the methods, their foundations, and their limitations.
