Topic: BASIC GEOSTATISTICS
Introduction
Classical Statistical Concepts
Data Posting and Validation
Regionalized Variables
Kriging
Data Integration
Conditional Simulation
Public Domain Geostatistics Programs
Case Studies
Selected Readings
Geostatistics Glossary
THIS IS FOR THE BENEFIT OF MY COLLEAGUES IN PETROLEUM GEOSCIENCE IMPERIAL COLLEGE LONDON WHO WILL AND MUST GRADUATE IN SEPTEMBER 2008.
INTRODUCTION
Before undertaking any study of Geostatistics, it is necessary to become familiar with certain key concepts drawn from Classical Statistics, which form the basic building blocks of Geostatistics. Because the study of Statistics generally deals with quantities of data, rather than a single datum, we need some means to deal with that data in a manageable form. Much of Statistics deals with the organization, presentation, and summary of data. Isaaks and Srivastava (1989) remind us that “Data speaks most clearly when organized”.
This section reviews a number of classic statistical concepts that are frequently used during the course of geostatistical analysis. By understanding these concepts, we will gain the tools needed to analyze and describe data, and to understand the relationships between different variables.
STATISTICAL NOTATION
Statistical notation uses Roman or Greek letters in equations to represent similar concepts, with the distinction being that:
Greek notation describes Populations: measures of a population are called parameters.
Roman notation describes Samples: measures of a sample are called statistics.
The following table lists the Greek letters and their significance within the realm of statistics.
Letter Name   Upper & Lower Case   Statistical Notation
alpha         Α α
beta          Β β
gamma         Γ γ
delta         Δ δ
epsilon       Ε ε
zeta          Ζ ζ
eta           Η η
theta         Θ θ
iota          Ι ι
kappa         Κ κ
lambda        Λ λ
mu            Μ μ                  μ: Mean of a Population
nu            Ν ν
xi            Ξ ξ
omicron       Ο ο
pi            Π π
rho           Ρ ρ                  ρ: Correlation Coefficient
sigma         Σ σ                  Σ: Summation; σ: Standard Deviation of a Population
tau           Τ τ
upsilon       Υ υ
phi           Φ φ
chi           Χ χ                  (compare x̄, the Mean of a Sample, written with a Roman x)
psi           Ψ ψ
omega         Ω ω
It is important to note that in some cases, a letter may take on a different meaning, depending on whether the letter is upper case or lower case. Certain Roman letters take on additional importance as part of the standard notation of Statistics or Geostatistics.
Letter   Statistical Notation
E        Event
F        Distribution
f        Frequency; probability function for a random variable
h        Lag distance (distance between two sample points)
m        Sample mean
N        Population size
n        Sample size (or number of observations in a data set)
O        Observed frequencies
o        Outcomes
P        Probability
p        Proportion
s        Standard deviation of a sample
V        Variance
X        Random variable
x        A single value of a random variable
MEASUREMENT SYSTEMS
Because the conclusions of a quantitative study are based in part on inferences drawn from measurements, it is important to consider the nature of the measurement systems from which data are collected. Measurements are numerical values that reflect the amount or magnitude of some property. The manner in which numerical values are assigned determines the measurement scale, and thereby determines the type of data analysis (Davis, 1986).
There are four measurement scales, each more rigorously defined than its predecessor and thus containing more information. The first two are the nominal and ordinal scales, in which we classify observations into exclusive categories. The other two scales, interval and ratio, are the ones we normally think of as “measurements,” because they involve determinations of the magnitude of an observation (Davis, 1986).
Nominal Scale
This measurement classifies observations into mutually exclusive categories of equal rank, such as “red,” “green,” or “blue.” Symbols like “A,” “B,” “C,” or numbers are also often used. In geostatistics, we may wish to predict facies occurrence, and may therefore code the facies as 1, 2, and 3 for sand, siltstone, and shale, respectively. Using this scale, there is no connotation that 2 is “twice as much” as 1, or that 3 is “greater than” 2.
Ordinal Scale
Observations are sometimes ranked hierarchically. A classic example taken from geology is Mohs’ scale of hardness, in which mineral rankings extend from one to ten, with higher ranks signifying increased hardness. The step between successive states is not equal in this scale. In the petroleum industry, kerogen types are based on an ordinal scale, indicative of stages of organic diagenesis.
Interval Scale
This scale is so named because the width of successive intervals is constant. The most commonly cited example of an interval scale is temperature. A change from 10 to 20 degrees C is the same as the change from 110 to 120 degrees C. This scale is commonly used for many measurements. An interval scale does not have a natural zero, or a point where the magnitude is nonexistent. Thus, it is possible to have negative values. Within the petroleum industry, reservoir properties are measured along a continuum, but there are practical limits for the measurements. (It would be hard to conceive of negative porosity, permeability, or thickness, or of porosity greater than 100%.)
Ratio Scale
Ratio scales not only have equal increments between steps, but also have a true zero point. Ratio scales represent the highest form of measurement, and all types of mathematical and statistical operations can be performed with them. Many geological measurements are based on a ratio scale, because they have units of length, volume, mass, and so forth.
For most of our geostatistical studies, we will be primarily concerned with the analysis of interval and ratio data. Typically, no distinction is made between the two, and they may occur intermixed in the same problem. For example, in trend surface analysis, the dependent variable may be measured on a ratio scale, whereas the geographic coordinates are on an interval scale.
POPULATIONS AND SAMPLES
INTRODUCTION
Statistical analysis is built around the concepts of “populations” and “samples.” A population consists of a well-defined set of elements (either finite or infinite). More specifically, a population is the entire collection of those elements.
Commonly, such elements are measurements or observations made on items of a specific type (porosity or permeability, for example). A finite population might consist of all the wells drilled in the Gulf of Mexico in 1999, whereas the infinite population might be all wells drilled in the Gulf of Mexico, past, present, and future.
A sample is a subset of elements drawn from the population (Davis, 1986). Samples are studied in order to make inferences about the population itself.
Parameters, Data, And Statistics
Populations possess certain numerical characteristics (such as the population mean) which are known as parameters. Data are measured or observed values obtained by sampling the population. A statistic is similar to a parameter, but it applies to numerical characteristics of the sample data.
Within the population, a parameter consists of a fixed value, which does not change. Statistics are used to estimate parameters or test hypotheses about the parent population (Davis, 1986). Unlike the parameter, the value of a statistic is not fixed, and may change from one sample to another drawn from the same population.
Remember that values from Populations (parameters) are often assigned Greek letters, while the values from Samples (statistics) are assigned Roman letters.
Random Sampling
Samples should be acquired from the population in a random manner. Random sampling is defined by two properties.
First, a random sample must be unbiased, so that each item in the population has the same chance of being chosen as any other item in the population. Second, the random sample must be independent, so that selecting one item from the population has no influence on the selection of other items in the population.
Random sampling produces an unbiased and independent result, so that, as the sample size increases, we have a better chance of understanding the true nature (distribution) of the population.
One way to determine whether random samples are being drawn is to analyze sampling combinations. The number of different samples of n measurements that can be drawn from a population of N elements is given by the equation:

$$C_N^n = \frac{N!}{n!\,(N-n)!}$$

Where:
$C_N^n$ = the number of combinations of samples
N = the number of elements in the population
n = the number of elements in the sample
If the sampling is conducted in a manner such that each of the $C_N^n$ samples has an equal chance of being selected, the sampling program is said to be random and the result is a random sample (Mendenhall, 1971).
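A minimal Python sketch of the combination count, using the standard-library `math.comb` and `itertools.combinations`. The six-well population and the helper name `count_samples` are hypothetical illustrations. Listing every possible sample makes it concrete that a random sampling program gives each of the $C_N^n$ samples the same chance, $1/C_N^n$.

```python
import itertools
import math

def count_samples(N: int, n: int) -> int:
    """Number of distinct samples of size n from a population of size N."""
    # math.comb(N, n) evaluates N! / (n! (N - n)!)
    return math.comb(N, n)

# Hypothetical population of six wells, labelled A-F
population = ["A", "B", "C", "D", "E", "F"]
n = 2

samples = list(itertools.combinations(population, n))
C = count_samples(len(population), n)

print(C)             # number of possible samples of two wells
print(len(samples))  # explicit enumeration agrees with the formula
# Under random sampling, each of the C samples is chosen with probability 1 / C
print(1 / C)
```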
Sampling Methods
The method of sampling affects our ability to draw inferences about our data (such as estimation of values at unsampled locations) because we must know the probability of an observation in order to arrive at a statistical inference.
Replacement
The issue of replacement plays an important role in our sampling strategy. For example, if we were to draw samples of cards from a population consisting of a deck, we could either:
Draw a card from the deck, add its value to our hand, then draw another card; or
Draw a card from the deck, note its value, and put it back in the deck, then draw a card from the deck again.
In the first case, we sample without replacement; in the second case, we sample with replacement. Sampling without replacement prevents us from sampling that value again, while sampling with replacement allows us the chance to pick that same value again in our sample.
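The two card-drawing strategies map directly onto two methods of Python's standard `random.Random` class: `sample` draws without replacement and `choices` draws with replacement. This is a small illustrative sketch; the deck encoding and the seed are arbitrary choices.

```python
import random

# A standard 52-card deck encoded as (rank, suit) pairs, ranks 1-13
deck = [(rank, suit) for suit in "SHDC" for rank in range(1, 14)]

rng = random.Random(42)  # seeded so the run is reproducible

# Without replacement: a card, once drawn, cannot be drawn again
hand = rng.sample(deck, k=5)

# With replacement: every draw is made from the full deck
draws = rng.choices(deck, k=5)

# Drawing 60 times with replacement from 52 cards must repeat at least one card
many = rng.choices(deck, k=60)

print(hand)
print(draws)
print(len(set(many)), "distinct cards in 60 draws with replacement")
```

Sampling without replacement can never return more distinct items than the population holds, which is why `sample` raises an error if `k` exceeds the deck size, while `choices` accepts any `k`.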
Oilfield Applications to Sampling
When observations having certain characteristics are systematically excluded from the sample, whether deliberately or inadvertently, the sampling is considered biased. In the oil industry, we face this situation quite frequently. Suppose, for example, we are interested in the pore volume of a particular reservoir unit for pay estimation. Typically, we use a threshold or porosity cutoff when making the calculation, thus deliberately excluding low-porosity rock and biasing the sample toward higher porosity values.
Similarly, the process of drilling wells in a reservoir necessarily involves sampling without replacement.
Furthermore, any sample data set will provide only a sparse and incomplete picture of the entire reservoir. The sampling routine (also known as the drilling program) is highly biased and dependent, and rightly so: any drilling program will be biased toward high porosity, high permeability, high structural position, and ultimately, high production. Moreover, the success or failure of nearby wells will influence further drilling. Because the sample data set represents a minuscule subset of the population, we will never really know the actual population distribution function of the reservoir. (We will discuss bias in more detail in our discussion of summary statistics.)
However, despite these limitations, our task is to infer properties about the entire reservoir from our sample data set. To accomplish this, we need to use various statistical tools to understand and summarize the properties of the samples to make inferences about the population (reservoir).
TRIALS, EVENTS, AND PROBABILITY
INTRODUCTION
In statistical parlance, a trial is an experiment that produces an outcome which consists of either a success or a failure. An event is a collection of possible outcomes of a trial. Probability is a measure of the likelihood that an event will occur, or a measure of that event’s relative frequency. The following discussion introduces events and their relation to one another, then provides an overview of probability.
EVENTS
An event is a collection of possible outcomes, and this collection may contain zero or more outcomes, depending on how many trials are conducted. Events can be classified by their relationship to one another:
Independent Events
Events are classified as Independent if the occurrence of event A has no bearing on the occurrence of event B.
Dependent Events
Events are classified as Dependent if the occurrence of event A influences the occurrence of event B.
Mutually Exclusive Events
Events are Mutually Exclusive if the occurrence of either event precludes the occurrence of the other. Two independent events, each with a nonzero probability of occurring, cannot be mutually exclusive.
PROBABILITY
Probability is a measure of the likelihood that an event will occur, or a measure of that event’s relative frequency. The measure of probability is scaled from 0 to 1, where:
0 represents no chance of occurrence, and 1 represents certainty that the event will occur.
Probability is just one tool that enables the statistician to use information from samples to make inferences or describe the population from which the samples were obtained (Mendenhall, 1971). In this discussion, we will review discrete and conditional probabilities.
Discrete Probability
All of us have an intuitive concept of probability. For example, if asked to guess whether it will rain tomorrow, most of us would reply with some confidence that rain is either likely or unlikely. Another way of expressing the estimate is to use a numerical scale, such as a percentage scale. Thus, you might say that there is a 30% chance of rain tomorrow, and imply that there is a 70% chance it will not rain.
The chance of rain is an example of discrete probability; it either will or it will not rain. The probability distribution for a discrete random variable is a formula, table, or graph providing the probability associated with each value of the random variable (Mendenhall, 1971; Davis, 1986). For a discrete experiment in which all outcomes are equally likely, probability can be defined by the following:

$$P(E) = \frac{\text{number of outcomes corresponding to event } E}{\text{total number of possible outcomes}}$$
Where:
P = the probability of a particular outcome, and E = the event
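The counting definition above can be written as a short Python sketch, assuming equally likely outcomes. It uses `fractions.Fraction` so the result stays exact; the die example and the helper name `classical_probability` are illustrative, not from the source.

```python
from fractions import Fraction

def classical_probability(outcomes, event) -> Fraction:
    """P(E) = (outcomes in E) / (all possible outcomes), for equally likely outcomes."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

die = [1, 2, 3, 4, 5, 6]
p_even = classical_probability(die, lambda face: face % 2 == 0)
print(p_even)  # 1/2
```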
Consider the following classic example of discrete probability, used almost universally in statistics texts.
Coin Toss Experiment
Coin tossing is a clear-cut example of discrete probability. The event has two states and must occupy one or the other; except for the vanishingly small possibility that the coin will land precisely on edge, it must come up either heads or tails (Davis, 1986; Mendenhall, 1971).
The experiment is conducted by tossing two unbiased coins. When a single coin is tossed, it has two possible outcomes: heads or tails. Because each outcome is equally likely, the probability of obtaining a head is ½. This does not imply that every other toss results in a head, but given enough tosses, heads will appear approximately one-half of the time.
Now let us look at the two-coin example. The sample points for this experiment with their respective probabilities are given below (taken from Mendenhall, 1971).
Sample Point   Coin 1   Coin 2   P(Ei)   y
E1             H        H        ¼       2
E2             H        T        ¼       1
E3             T        H        ¼       1
E4             T        T        ¼       0
Let y equal the number of heads observed. We assign the value y = 2 to sample point E1, y = 1 to sample point E2, and so on. The probability of each value of y may be calculated by adding the probabilities of the sample points in the numerical event. The numerical event y = 0 contains one sample point, E4; y = 1 contains two sample points, E2 and E3; while y = 2 contains one sample point, E1.
The Probability Distribution Function for y, where y = Number of Heads
y   Sample Points in y   p(y)
0   E4                   ¼
1   E2, E3               ½
2   E1                   ¼
Thus, for this experiment there is a 25% chance of observing two heads from a single toss of the two coins. The histogram contains three classes for the random variable y, corresponding to y = 0, y = 1, and y = 2. Because p(0) = ¼, the theoretical relative frequency for y = 0 is ¼; p(1) = ½, hence the theoretical relative frequency for y = 1 is ½, and so on. The histogram is shown in Figure 1.
Figure 1
If you were to draw a sample from this population by throwing two balanced coins, say, 100 times, and recording the number of heads observed each time, a histogram of the 100 measurements would appear very similar to that of Figure 1. If you repeated the experiment with 1000 tosses, the similarity would be even more pronounced.
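The convergence of the observed histogram toward Figure 1 can be checked with a small simulation. This Python sketch assumes fair coins generated by the standard `random` module; the function name `two_coin_experiment` and the seed are illustrative.

```python
import random
from collections import Counter

def two_coin_experiment(n_tosses: int, seed: int = 0) -> dict:
    """Toss two fair coins n_tosses times; return the relative frequency of y = number of heads."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_tosses):
        # Each coin contributes 1 for a head and 0 for a tail
        y = rng.randint(0, 1) + rng.randint(0, 1)
        counts[y] += 1
    return {y: counts[y] / n_tosses for y in (0, 1, 2)}

theoretical = {0: 0.25, 1: 0.5, 2: 0.25}
for n in (100, 1000, 100_000):
    freq = two_coin_experiment(n)
    print(n, freq)
```

As the number of tosses grows, the printed relative frequencies settle closer to the theoretical values ¼, ½, ¼.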
Conditional Probability
The concept of conditional probability is key to oil and gas exploration, because each well that is drilled makes more information available and allows us to revise our estimates of the probability of further outcomes or events. Two events are often related in such a way that the probability of occurrence of one event depends upon whether the other event has or has not occurred. Such a dependence on a prior event describes the concept of Conditional Probability: the chance that a particular event will occur depends on whether another event occurred previously.
For example, suppose an experiment consists of observing weather on a specific day. Let event A = ‘snow’ and B = ‘temperature below freezing’. Obviously, events A and B are related, but the probability of snow, P(A), is not the same as the probability of snow given the prior information that the temperature is below freezing. The probability of snow, P(A), is the fraction of the entire population of observations which result in snow. Now examine the sub-population of observations resulting in B, temperature below freezing, and the fraction of these resulting in snow, A. This fraction, called the conditional probability of A given B, may equal P(A), but we would expect the chance of snow, given freezing temperatures, to be larger.
In statistical notation, the conditional probability that event A will occur given that event B has occurred already is written as:

$$P(A \mid B)$$

where the vertical bar in the parentheses means “given” and events appearing to the right of the bar have occurred (Mendenhall, 1971).
Thus, we define the conditional probability of A given B (provided P(B) > 0) as:

$$P(A \mid B) = \frac{P(AB)}{P(B)}$$

and we define the conditional probability of B given A (provided P(A) > 0) as follows:

$$P(B \mid A) = \frac{P(AB)}{P(A)}$$
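To make the snow and freezing example concrete, the sketch below computes both conditional probabilities from a joint frequency table. The 200 observations are invented purely for illustration; `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical tally of 200 daily weather observations (illustrative numbers only)
#                 freezing (B)   not freezing
#   snow (A)           30              2
#   no snow            40            128
n_total = 200
n_A_and_B = 30        # snow and below freezing
n_A = 30 + 2          # all snow days
n_B = 30 + 40         # all below-freezing days

P_A = Fraction(n_A, n_total)
P_B = Fraction(n_B, n_total)
P_AB = Fraction(n_A_and_B, n_total)

P_A_given_B = P_AB / P_B   # P(A|B) = P(AB) / P(B)
P_B_given_A = P_AB / P_A   # P(B|A) = P(AB) / P(A)

print(P_A, P_A_given_B)    # 4/25 3/7
print(P_B_given_A)         # 15/16
```

With these made-up counts, the chance of snow rises from 0.16 overall to about 0.43 once freezing temperatures are known, which is the behavior the text anticipates.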
Bayes’ Theorem on Conditional Probability
Bayes’ Theorem allows the conditional probability of an event to be updated as newer information becomes available. Quite often, we wish to find the conditional probability of an event, A, given that event B occurred at some time in the past. Bayes’ Theorem for the probability of causes follows easily from the definition of conditional probability:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A')\,P(A')}$$
Where:
P(A | B) = the probability that event A will occur, given that event B has already occurred
P(B | A) = the probability that event B will occur, given that event A has already occurred
P(A) = the probability that event A will occur
P(B | A') = the probability that event B will occur, given that event A has not occurred
P(A') = the probability that event A will not occur
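Bayes’ Theorem translates line for line into a short function. The numbers in this Python sketch are hypothetical and chosen only to show how a prior probability is revised by new evidence: A is taken to be “channel sand present” and B “seismic attribute is anomalous.”

```python
def bayes_posterior(p_A: float, p_B_given_A: float, p_B_given_notA: float) -> float:
    """P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|A')P(A')]."""
    p_notA = 1.0 - p_A                        # P(A')
    numerator = p_B_given_A * p_A
    denominator = numerator + p_B_given_notA * p_notA
    return numerator / denominator

# Hypothetical values (not from any field study)
prior = 0.30         # P(A): prior proportion of channel sand
hit_rate = 0.80      # P(B|A): anomaly observed where sand is present
false_alarm = 0.20   # P(B|A'): anomaly observed where sand is absent

posterior = bayes_posterior(prior, hit_rate, false_alarm)
print(round(posterior, 3))  # 0.632
```

Observing the anomaly raises the probability of channel sand from 0.30 to about 0.63 under these assumed likelihoods.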
A practical geostatistical application of Bayes’ Theorem is described by Doyen et al. (1994) in an article entitled Bayesian Sequential Indicator Simulation of Channel Sands in the Oseberg Field, Norwegian North Sea.
Additive Law of Probability
Another approach to probability problems is based upon the classification of compound events, event relations, and two probability laws. The first is the Additive Law of Probability, which applies to unions.
The probability of the union A ∪ B is equal to:

$$P(A \cup B) = P(A) + P(B) - P(AB)$$

If A and B are mutually exclusive, P(AB) = 0 and

$$P(A \cup B) = P(A) + P(B)$$
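The Additive Law can be checked directly with Python sets, where `|` is union and `&` is intersection. This sketch uses a single throw of a fair die; the events A, B, and C are illustrative choices.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one throw of a fair die

def P(event: set) -> Fraction:
    """Classical probability of an event, assuming equally likely faces."""
    return Fraction(len(event), len(sample_space))

A = {2, 4, 6}   # even face
B = {4, 5, 6}   # face greater than 3
C = {1, 3}      # odd face below 4; mutually exclusive with A

# Additive law: P(A ∪ B) = P(A) + P(B) - P(AB)
assert P(A | B) == P(A) + P(B) - P(A & B)
print(P(A | B))  # 2/3

# Mutually exclusive events: P(AC) = 0, so P(A ∪ C) = P(A) + P(C)
assert P(A & C) == 0
assert P(A | C) == P(A) + P(C)
```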
Multiplicative Law of Probability
The second law of probability is called the Multiplicative Law of Probability, which applies to intersections.
Given two events, A and B, the probability of the intersection, AB, is equal to

$$P(AB) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$$

If A and B are independent, then

$$P(AB) = P(A)\,P(B)$$
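The card-drawing example from the discussion of replacement shows both forms of the Multiplicative Law. In this Python sketch, A is “the first card is an ace” and B is “the second card is an ace”; exact fractions are used throughout.

```python
from fractions import Fraction

# P(A): the first card drawn from a 52-card deck is an ace
p_first_ace = Fraction(4, 52)

# Without replacement the events are dependent: P(AB) = P(A) P(B|A)
p_second_ace_given_first = Fraction(3, 51)
p_two_aces_without = p_first_ace * p_second_ace_given_first

# With replacement the events are independent: P(AB) = P(A) P(B)
p_two_aces_with = p_first_ace * Fraction(4, 52)

print(p_two_aces_without)  # 1/221
print(p_two_aces_with)     # 1/169
```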
RANDOM VARIABLES AND THEIR PROBABILITY DISTRIBUTIONS
INTRODUCTION
Geoscientists are often tasked with estimating the value of a reservoir property at a location where that property has not been previously measured. The estimation procedure must rely upon a model describing how the phenomenon behaves at unsampled locations. Without a model, there is only sample data, and no inference can be made about the values at locations that were not sampled. The underlying model and its behavior form one of the essential elements of the geostatistical framework.
Random variables and their probability distributions form the foundation of the geostatistical method. Unlike many other estimation methods (such as linear regression, inverse distance, or least squares) that do not state the nature of their model, geostatistical estimation methods clearly identify the basis of the models used (Isaaks and Srivastava, 1989). In this section, we define the random variable and briefly review the essential concepts of important probability distributions. The random variable is further explained later, in Spatial Correlation Analysis and Modeling.
THE PROBABILISTIC APPROACH
Deterministic models are applicable only when the process that generated the data is known in sufficient detail to enable an accurate description of the entire population to be made from only a few sample values. Unfortunately, few reservoir processes are understood well enough to permit application of deterministic models. Although we know the physics or chemistry of the fundamental processes, the variables we study in reservoir data sets are often the product of complex interactions that are not fully quantifiable. These processes include, for example, depositional mechanisms, tectonic processes, and diagenetic alterations.
For most reservoir data sets, we must accept that there is an unavoidable degree of uncertainty about how the attribute behaves between sample locations (Isaaks and Srivastava, 1989). Thus, a probabilistic approach is required. The random function models introduced herein recognize this fundamental uncertainty and provide us with tools to estimate values at unsampled locations. The following discussion describes the two kinds of random variables; we then discuss the probability distributions or functions associated with each type of random variable.
RANDOM VARIABLE DEFINED
A random variable can be defined as a numerical outcome of an experiment whose values are generated randomly according to some probabilistic mechanism. A random variable associates a unique numerical value with every outcome, so the value of the random variable will vary with each trial as the experiment is repeated.
The throwing of a die, for example, produces values randomly from the set {1, 2, 3, 4, 5, 6}. The coin toss is another experiment that produces numbers randomly. (In the case of a coin toss, however, we first need to assign numerical values to the outcomes, for example 0 to “heads” and 1 to “tails”; then we can draw randomly from the set {0, 1}.)
TWO CLASSES OF RANDOM VARIABLES
There are two different classes of random variables, distinguished by the set of values the variable can take: the discrete and the continuous random variable. We will discuss each in turn.
Discrete Random Variables
A discrete random variable may be identified by the number and nature of the values it assumes; it may assume only a finite or countable set of distinct values (distinct values being the operative phrase here, e.g., 0, 1, 2, 3, 4, 5, as opposed to each and every number between 0 and 1, which would produce an infinite number of values).
In most practical problems, discrete random variables represent count (or enumerated) data, such as point counts of minerals in a thin section. The die and coin toss experiments also generate discrete random variables.
Discrete random variables are characterized by a probability distribution, which may be described by a formula, table, or graph that provides the probability associated with each value of the discrete random variable. The probability distribution function of discrete random variables may be plotted as a histogram.
Refer to Figure 1 (Probability histogram) as an example histogram for a discrete random variable.
Frequency Tables and Histograms
Discrete random variables are often recorded in a frequency table, and displayed as a histogram. A frequency table records how often data values fall within certain intervals or classes. A histogram is a graphical representation of the frequency table.
It is common to use a constant class width for a histogram, so that the height of each bar is proportional to the number of values within that class. Because data can be ranked in ascending order, they can also be represented as a cumulative frequency histogram, which shows the total number of values below certain cutoffs rather than the number of values in each class.
Table 1 Frequency and Cumulative Frequency tables of 100 values, X, with a class width of one (modified from Isaaks and Srivastava, 1989).
Class      Frequency      Frequency      Cumulative   Cumulative
Interval   (Occurrences)  (Percentage)   Number       Percentage
0-1        1              1              1            1
1-2        1              1              2            2
2-3        0              0              2            2
3-4        0              0              2            2
4-5        3              3              5            5
5-6        2              2              7            7
6-7        2              2              9            9
7-8        13             13             22           22
8-9        16             16             38           38
9-10       11             11             49           49
10-11      13             13             62           62
11-12      17             17             79           79
12-13      13             13             92           92
13-14      4              4              96           96
>14        4              4              100          100
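The cumulative columns of a frequency table are running sums of the class frequencies. This Python sketch rebuilds them from the Table 1 frequencies with `itertools.accumulate`; the class labels and counts are transcribed from the table, and the printed layout is just one possible format.

```python
from itertools import accumulate

# Class intervals and frequencies transcribed from Table 1 (class width of one)
classes = ["0-1", "1-2", "2-3", "3-4", "4-5", "5-6", "6-7", "7-8",
           "8-9", "9-10", "10-11", "11-12", "12-13", "13-14", ">14"]
frequency = [1, 1, 0, 0, 3, 2, 2, 13, 16, 11, 13, 17, 13, 4, 4]

total = sum(frequency)
# Running total: number of values below the upper limit of each class
cumulative = list(accumulate(frequency))

print(f"{'Class':>6} {'Freq':>5} {'%':>6} {'Cum':>5} {'Cum %':>6}")
for label, f, c in zip(classes, frequency, cumulative):
    print(f"{label:>6} {f:>5} {100 * f / total:>6.1f} {c:>5} {100 * c / total:>6.1f}")
```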
Figures 2a and 2b display frequency and cumulative frequency histograms of the data in Table 1 (modified from Isaaks and Srivastava, 1989).
Figure 2a
(Sometimes, the histograms are converted to continuous curves by drawing a line through the midpoints of the tops of the bars. This may be convenient for comparing continuous and discrete random variables, but it can also confuse the presentation.)
Continuous Random Variables
These variables are defined by an infinitely large number of possible values (much like a segment of a number-line, which can be repeatedly subdivided into smaller and smaller intervals to create an infinite number of increments).
In most practical problems, continuous random variables represent measurement data, such as the length of a line, or the thickness of a pay zone.
The probability density function of the continuous random variable may be plotted as a continuous curve. Although such curves may assume a variety of shapes, a very large number of random variables observed in nature approximate a bell-shaped curve. A statistician would say that such a curve approximates a normal distribution (Mendenhall, 1971).
Probability Distributions Of The Discrete Random Variable
The probability distribution of a discrete random variable consists of the relative frequencies with which a random variable takes each of its possible values. Four common probability distributions for discrete random variables are: Binomial, Negative Binomial, Poisson, and Hypergeometric. Each of these distributions is discussed using practical geological examples taken from Davis (1986).
Binomial Probability Distribution
Binomial distributions apply only to a special type of discrete random variable, called a binary variable. Binary variables can have only two values, such as ON or OFF, SUCCESS or FAILURE, 0 or 1. (Often, values such as ON and SUCCESS are assigned the numerical value 1, and OFF and FAILURE the value 0.) Similarly, binomial distributions are valid only for trials in which there are only two possible outcomes for each trial. Furthermore, the total number of trials must be fixed beforehand, all of the trials must have the same probability of success, and the outcome of each trial must not be influenced by the outcomes of previous trials. The number of heads obtained in a fixed number of coin tosses, or the number of sixes obtained in a fixed number of die throws, follows a binomial distribution.
We’ll consider how the binomial distribution can be applied to the following oilfield example.
Problem: Forecast the probability of success of a drilling program.
Assumptions: Each wildcat is classified as either:
0 = Failure (dry hole)
1 = Success (discovery)
The binomial distribution is appropriate when a fixed number of wells will be drilled during an exploratory program or during a single period (budget cycle) for which the forecast is made.
In this case, each well that is drilled in turn is presumed to be independent; this means that the success or failure of one hole does not influence the outcome of the next. Thus, the probability of discovery remains unchanged as successive wildcats are drilled. (As Davis (1986) points out, this assumption is difficult to justify in most cases, because a discovery or failure influences the selection of subsequent drilling locations.)
The probability p that a wildcat well will discover gas or oil is estimated using an industry-wide success ratio for drilling in similar areas, or based on the company’s own success ratio. Sometimes the success ratio is a subjective “guess.” From p, the binomial model can be developed for exploratory drilling as follows:
$p$: The probability that a hole will be successful.
$1 - p$: The probability of failure.
$P = (1 - p)^n$: The probability that n successive wells will be dry.
$P = (1 - p)^{n-1}\,p$: The probability that the nth hole will be a discovery, but the preceding (n - 1) holes will be dry.
$P = n\,(1 - p)^{n-1}\,p$: The probability of drilling exactly one discovery well in a series of n wildcat holes, where the discovery can occur in any of the n wildcats.
$P = (1 - p)^{n-r}\,p^r$: The probability that (n - r) dry holes will be drilled, followed by r discoveries.
However, the (n - r) dry holes and the r discoveries may be arranged in combinations, or equivalently, in n!/[(n - r)! r!] different ways, resulting in the equation:

$$P = \frac{n!}{(n - r)!\,r!}\,(1 - p)^{n-r}\,p^r$$

This is the probability that r discoveries will be made in a drilling program of n wildcats. It is an expression of the binomial distribution, and gives the probability that r successes will occur in n trials, when the probability of success in a single trial is p.
For example, suppose we want to find the probability of success associated with a 5-well exploration program in a virgin basin where the success ratio is anticipated to be about 10%. What is the probability that the entire exploration program will be a total failure, with no discoveries?
The terms of the equation are n = 5, r = 0, and p = 0.10, so that:

$$P = \frac{5!}{5!\,0!}\,(0.90)^5\,(0.10)^0 = (1)(0.59049)(1) \approx 0.59$$

Where:
P = the probability of exactly r discoveries
r = the number of discovery wells
p = anticipated success ratio
n = the number of holes drilled in the exploration program
The probability of no discoveries resulting from the exploratory effort is almost 60%. Using either the binomial equation or a table for the binomial distribution, Figure 3 (Discrete distribution giving the probability of making r discoveries in a five-well drilling program when the success ratio (probability of discovery) is 10%; modified from Davis, 1986) shows the probabilities associated with all possible outcomes of the five-well drilling program.
Figure 3
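The full set of outcomes plotted in Figure 3 can be reproduced from the binomial equation. This Python sketch uses the standard-library `math.comb`; the function name `binomial_pmf` is an illustrative choice, and n = 5 and p = 0.10 are the values from the example above.

```python
import math

def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r discoveries in n wildcats with success ratio p."""
    # n! / ((n - r)! r!) arrangements, each with probability (1 - p)^(n - r) p^r
    return math.comb(n, r) * (1 - p) ** (n - r) * p ** r

n_wells, p = 5, 0.10
for r in range(n_wells + 1):
    print(r, round(binomial_pmf(r, n_wells, p), 5))
```

The probability of zero discoveries is 0.9⁵ ≈ 0.59, matching the hand calculation, and the six probabilities sum to 1.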
Negative Binomial Probability Distribution
Other discrete distributions can be developed for experimental situations with different basic assumptions. We can develop a Negative Binomial Probability Distribution to find the probability that x dry holes will be drilled before r discoveries are made.
Problem: Drill as many holes as needed to discover two new fields in a virgin basin.
Assumption: The same conditions that govern the binomial distribution are assumed, except that the number of “trials” is not fixed.
The probability distribution governing such an experiment is the negative binomial. Thus we can investigate the probability that it will require 2, 3, 4, …, up to n exploratory wells before two discoveries are made. The expanded form of the negative binomial equation is

$$P = \frac{(r + x - 1)!}{(r - 1)!\,x!}\,p^r\,(1 - p)^x$$

Where:
P = the probability that exactly x dry holes are drilled before the rth discovery
r = the number of discovery wells
x = the number of dry holes
p = regional success ratio
If the regional success ratio is 10%, the probability that a two-hole exploration program will meet the company’s goal of two discoveries can be calculated with r = 2, x = 0, and p = 0.10:

$$P = \frac{1!}{1!\,0!}\,(0.10)^2\,(0.90)^0 = 0.01$$
The calculated probabilities are low because they relate to the likelihood of obtaining two successes and exactly x dry holes (in this case, x = 0). It may be more appropriate to consider the probability that more than x dry holes must be drilled before the goal of r discoveries is achieved. We do this by first calculating the cumulative form of the negative binomial. This gives the probability that the goal of two successes will be achieved in (x + r) or fewer holes, as shown in Figure 4 (Discrete distribution giving the cumulative probability that two discoveries will be made by or before a specified hole is drilled, when the success ratio is 10% (modified from Davis, 1986)).
Each of these probabilities is then subtracted from 1.0 to yield the desired probability distribution illustrated in Figure 5 (Discrete distribution giving the
probability that more than a specified number of holes must be drilled to make two discoveries, when the success ratio is 10% (modified from Davis, 1986)).
Figure 5
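The negative binomial probabilities, their cumulative form, and the exceedance probabilities (1.0 minus the cumulative values) described above can be sketched as follows. This is an illustrative Python sketch assuming the expanded form P = (r + x - 1)! / ((r - 1)! x!) p^r (1 - p)^x; the function names are our own.

```python
from math import comb


def neg_binomial_pmf(x: int, r: int, p: float) -> float:
    """Probability that exactly x dry holes are drilled before the r-th discovery."""
    return comb(r + x - 1, x) * p**r * (1 - p) ** x


def neg_binomial_cdf(x: int, r: int, p: float) -> float:
    """Probability that r discoveries are made in (x + r) or fewer holes."""
    return sum(neg_binomial_pmf(k, r, p) for k in range(x + 1))


r, p = 2, 0.10  # goal of two discoveries, 10% regional success ratio
for x in range(11):
    cumulative = neg_binomial_cdf(x, r, p)
    # Subtracting from 1.0 gives the probability that MORE than x + r holes are needed
    print(f"{x + r:2d} holes: cumulative = {cumulative:.4f}, more needed = {1 - cumulative:.4f}")
```

For a two-hole program (x = 0) the probability is 0.01. Within five holes the cumulative probability is about 0.081, which matches the binomial probability of at least two discoveries in five wells (1 - 0.59049 - 0.32805 = 0.08146).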
Poisson Probability Distribution
A Poisson random variable is typically a count of the number of events that occur within a certain time interval or spatial area. The Poisson probability distribution seems to be a reasonable approach to apply to a series of geological events. For example, the historical record of earthquakes in California, the record of volcanic eruptions in the Mediterranean, or the incidence of landslides related to El Nino along the California coast can be characterized by Poisson distributions. The Poisson probability model assumes that:
events occur independently,
the probability that an event will occur does not change with time,
the length of the observation period is fixed in advance,
the probability that an event will occur in an interval is proportional to the length of the interval, and
the probability of more than one event occurring at the same time is vanishingly small.
When the probability of success becomes very small, the Poisson Distribution can be used to approximate the binomial distribution with parameters n and p. This is a discrete probability distribution regarded as the limiting case of the binomial when:
n, the number of trials becomes very large, and
p, the probability of success on any one trial becomes very small.
The equation in this case is
$$p(X) = \frac{\lambda^{X} e^{-\lambda}}{X!}$$
Where
p(X) = probability of occurrence of the discrete random variable X
λ = rate of occurrence
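To see the Poisson distribution acting as the limiting case of the binomial, the sketch below compares the two for many trials and a small probability of success. The values n = 1000 and p = 0.002 are arbitrary and chosen only for illustration. The variable `lam` stands for λ because `lambda` is a reserved word in Python.

```python
from math import comb, exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k events when the rate of occurrence is lam."""
    return lam**k * exp(-lam) / factorial(k)


def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 1000, 0.002  # many trials, small probability of success (illustrative)
lam = n * p         # the single Poisson parameter: lambda = np = 2.0
for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))
```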
Note that the rate of occurrence, λ, is the only parameter of the distribution. The Poisson distribution does not require either n or p directly, because we use the product np = λ instead, which is given by the rate of occurrence of events.
Hypergeometric Probability Distributions
The binomial distribution would not be appropriate for calculating the probability of discovery when prospects are drawn from a small, finite population, because the chance of success then changes with each wildcat well. For example, we can use Statistics to argue two distinctly contradictory cases:
Discovery of one reservoir increases the odds against finding another (fewer fields remaining).
Drilling a dry hole increases the probability that the remaining untested features will prove productive.
What we need is to find all possible combinations of producing and dry features within the population, then enumerate those combinations that yield the desired number of discoveries.
The probability distribution generated by sampling without replacement is called a hypergeometric distribution. Consider the following:
Problem: An offshore concession contains 10 seismic anomalies, with a historical success ratio of 40%. Our limited budget will permit only three anomalies to be drilled. Assume that four of the structures are productive, so that the discovery of one reservoir increases the odds against finding another. What are the probabilities of the possible numbers of discoveries?
The probability of making x discoveries in a drilling program consisting of n holes, when sampling from a population of N prospects of which S are believed to contain commercial reservoirs, is
$$P = \frac{\binom{S}{x}\binom{N - S}{n - x}}{\binom{N}{n}}$$
Where:
x = the number of discoveries
n = the number of holes drilled
S = the number of prospects containing commercial reservoirs
N = the total number of prospects
This expression represents the number of combinations of reservoirs, taken by the number of discoveries, times the number of combinations of barren
anomalies, taken by the number of dry holes, all divided by the number of combinations of all prospects taken by the total number of holes in the drilling program (Davis, 1989).
Applying this to our offshore concession example containing ten seismic anomalies, from which four are likely to be reservoirs, what are the probabilities associated with a three-well drilling program?
The probability of total failure, with no discoveries among the three structures is about 17%.
The probability of one discovery is about 50%.
A histogram of all possible outcomes of this exploration strategy is shown in Figure 6 (Discrete distribution giving the probability of n discoveries in three holes drilled on ten prospects, when four of the ten contain reservoirs (modified from Davis, 1986)). Note that the probability of at least one success is (1.00 - 0.17), or 83%.
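The hypergeometric probabilities for the three-well program can be reproduced with the sketch below. It uses only the Python standard library, and the function name `hypergeom_pmf` is hypothetical. The sketch enumerates every possible number of discoveries, which is the content of the Figure 6 histogram.

```python
from math import comb


def hypergeom_pmf(x: int, n: int, S: int, N: int) -> float:
    """P(x discoveries) when n holes are drilled, without replacement,
    on N prospects of which S contain commercial reservoirs."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)


N, S, n = 10, 4, 3  # ten anomalies, four reservoirs, three-well program
probs = [hypergeom_pmf(x, n, S, N) for x in range(n + 1)]

for x, prob in enumerate(probs):
    print(f"{x} discoveries: {prob:.3f}")
print(f"At least one discovery: {1 - probs[0]:.3f}")
```

The printed values are 0.167, 0.500, 0.300 and 0.033 for zero through three discoveries, and 0.833 for at least one discovery. These agree with the 17%, 50% and 83% figures above.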
Frequency Distributions Of Continuous Random Variables
Frequency distributions of continuous random variables follow a theoretical probability distribution or probability density function that can be represented by a continuous curve. These functions can take on a variety of shapes. Rather than displaying the functions as a curve, the distributions may be displayed as a histogram, as shown in Figures 7a, 7b, 7c, and 7d (Examples of some continuous variable probability distributions).
In this section, we will discuss the following common distribution functions: the normal probability distribution and the lognormal distribution.
Normal Probability Distribution
It is often assumed that random variables follow a normal probability density function, and many statistical (and geostatistical) methods are based on this supposition. The Central Limit Theorem is the foundation of the normal probability distribution.
Central Limit Theorem
The Central Limit Theorem (CLT) states that under rather general conditions, as the sample size increases, the sums and means of samples drawn from a population of any distribution will approximate a normal distribution (Sokal and Rohlf, 1969; Mendenhall, 1971). Formally:
Central Limit Theorem: If random samples of n observations are drawn from a population with finite mean, μ, and standard deviation, σ, then, as n grows larger, the sample mean, ȳ, will be approximately normally distributed with mean equal to μ and standard deviation σ/√n. The approximation will become more and more accurate as n becomes large (Mendenhall, 1971).
The Central Limit Theorem consists of three statements:
1. The mean of the sampling distribution of means is equal to the mean of the population from which the samples were drawn.
2. The variance of the sampling distribution of means is equal to the variance of the population from which the samples were drawn, divided by the size of the samples.
3. If the original population is distributed normally (i.e., it is bell shaped), the sampling distribution of means will also be normal. If the original population is not normally distributed, the sampling distribution of means will increasingly approximate a normal distribution as sample size increases (i.e., when increasingly large samples are drawn).
The significance of the Central Limit Theorem is twofold:
1. It explains why some measurements tend to possess (approximately) a normal distribution.
2. The most important contribution of the CLT is in statistical inference. Many algorithms that are used to make estimations or simulations require knowledge about the population density function. If we can accurately predict its behavior using only a few parameters, then our predictions should be more reliable. If the CLT applies, then knowing only the sample mean and sample standard deviation, the sampling distribution of the mean can be closely approximated by a normal distribution.
However, the disturbing feature of the CLT, and most approximation procedures, is that we must have some idea as to how large the sample size, n, must be in order for the approximation to yield useful results. Unfortunately, there is no clear-cut answer to this question, because the appropriate value of n depends upon the population probability distribution as well as the use we make of the approximation. Fortunately, the CLT tends to work very well, even for small samples, but this is not always true.
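A quick simulation illustrates the first two statements of the CLT for a population that is far from normal. This is an illustrative sketch only. It assumes an exponential parent population with mean 1 and variance 1 (strongly right-skewed), a sample size of n = 30, and a fixed random seed so the run is repeatable.

```python
import random
import statistics


def sample_means(n, trials, seed=0):
    """Means of `trials` random samples of size n drawn from an exponential population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]


n = 30
means = sample_means(n, trials=5000)
print(statistics.fmean(means))      # close to the population mean, 1.0
print(statistics.pvariance(means))  # close to sigma**2 / n = 1/30, about 0.033
```

A histogram of `means` should look roughly bell-shaped even though the individual exponential values are highly skewed.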
Properties of the Normal Distribution
Formally, the Normal Probability Density Function is represented by the following expression:
$$Z = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Y - \mu}{\sigma}\right)^{2}}$$
Where
Z is the height of the ordinate (y-axis) of the curve and represents the density of the function. It is the dependent variable in the expression, being a function of the variable Y.
There are two constants in the equation: π, well known to be approximately 3.14159, making 1/√(2π) equal 0.39894, and e, the base of the Napierian or natural logarithms, whose value is approximately 2.71828.
There are two parameters in the normal probability density function. These are the parametric mean, μ, and the standard deviation, σ, which determine the location and shape of the distribution (these parameters are discussed under Summary Statistics). Thus, there is not just one normal distribution; rather, there is an infinity of such curves, because the parameters can assume an infinity of values (Sokal and Rohlf, 1969).
Figure 8a (Illustration of how changes in the two parameters of the normal distribution affect the shape and position of histograms. Left (μ = 4, σ = 1). Right (μ = 8, σ = 0.5)) illustrates the impact of the parameters on the shape of a probability distribution histogram.
The histogram (or curve) is symmetrical about the mean. Therefore the mean, median and mode (described later under this subtopic) of the normal distribution occur at the same point. Figure 8b (Bell curve) shows that the curve of a Gaussian normal distribution can be described by the position of its maximum, which corresponds to its mean (μ), and by its points of inflection. The distance between μ and either point of inflection represents the standard deviation, σ, sometimes referred to as the mean variation. The square of the mean variation is the variance.
In a normal frequency distribution, the standard deviation may be used to characterize the sample distribution under the bell curve. According to Sokal and Rohlf (1969), 68.3% of all sample values fall within -1σ to +1σ of the mean, 95.4% of the sample values fall within -2σ and +2σ of the mean, and 99.7% of the values are contained within -3σ and +3σ of the mean. This bears repeating, in a different format:
μ ± 1σ (1 standard deviation) contains 68.27% of the data
μ ± 2σ (2 standard deviations) contains 95.45% of the data
μ ± 3σ (3 standard deviations) contains 99.73% of the data
How are the percentages calculated? The direct calculation of any portion of the area under the normal curve requires an integration of the function shown in the above expression. Fortunately, for those who have forgotten their calculus, the integration has been recorded in tabular form (Sokal and Rohlf, 1969). These tables can be found in most standard statistical books; for example, see Statistical Tables and Formulas, Table 1 (Hald, 1952).
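Instead of using a table, the percentages above can be recovered from the error function, because the area under a normal curve between μ - kσ and μ + kσ equals erf(k/√2). The sketch below uses only Python's `math` module, and the function names are our own.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(y: float, mu: float, sigma: float) -> float:
    """Height Z of the normal density curve at Y = y."""
    return exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def fraction_within(k: float) -> float:
    """Area under the normal curve within k standard deviations of the mean."""
    return erf(k / sqrt(2))


print(f"peak height for sigma = 1: {normal_pdf(0.0, 0.0, 1.0):.5f}")  # 0.39894
for k in (1, 2, 3):
    print(f"mean +/- {k} sigma: {100 * fraction_within(k):.2f}%")
```

The loop prints 68.27%, 95.45% and 99.73%. The peak height for σ = 1 is the constant 1/√(2π) = 0.39894 noted above.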
Application of the Normal Distribution
The normal frequency distribution is the most widely used distribution in statistics. There are three important applications of the density function (Sokal and Rohlf, 1969).
1. Sometimes we need to know whether a given sample is normally distributed before we can apply certain tests. To test whether a sample comes from a normal distribution we must calculate the expected
frequencies for a normal curve of the same mean and standard deviation, then compare the two curves.
2. Knowing whether a sample comes from a normal distribution may confirm or reject underlying hypotheses about the nature of the phenomenon studied.
3. Finally, if we assume a normal distribution, we may make predictions based upon this assumption. For the geosciences, this means a better and unbiased estimation of reservoir parameters between the well data.
Normal Approximation to the Binomial Distribution
Recall that approximately 95% of the measurements associated with a normal distribution lie within two standard deviations of the mean and almost all lie within three standard deviations. The binomial probability distribution will be nearly symmetrical if it is able to spread out a distance equal to two standard deviations on either side of the mean. Therefore, to determine the normal approximation for n trials, each of which results in 0 or 1 success with probabilities q and p, respectively, we calculate:
$$\mu = np, \qquad \sigma = \sqrt{npq}$$
If the interval μ ± 2σ lies within the binomial bounds, 0 and n, the approximation will be reasonably good (Mendenhall, 1971).
Lognormal Distribution
Many variables in the geosciences do not follow a normal distribution, but are highly skewed, such as the distribution in Figure 7b and the one shown in Figure 9 (Schematic histogram of sizes and numbers of oil field discoveries, in hundred-thousand-barrel equivalents).
The histogram illustrates that most fields are small, with decreasing numbers of larger fields, and a few rare giants that exceed all others in volume. If the histograms of Figure 7b and Figure 9 are converted to logarithmic forms (that is, we use Yi = log Xi instead of Yi = Xi for each observation), the distribution becomes nearly normal. Such variables are said to be lognormal.
Transformation of Lognormal data to Normal
The data can be converted into logarithmic form by a process known as
transformation. Transforming the data to a standardized normal distribution (i.e., zero mean and unit variance) simplifies data handling and eases comparison to different data sets.
Data which display a lognormal distribution, for example, can be transformed to resemble a normal distribution by applying the formula ln(z) to each z variate in the data set prior to conducting statistical analysis. The success of the
transformation can be judged by observing its frequency distribution before and after transformation. The distribution of the transformed data should be markedly less skewed than the lognormal data. The transformed values may be back-transformed prior to reporting results.
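The transform, the before-and-after check on skewness, and the back-transform can be sketched as below. The data values are hypothetical and were chosen so that their logarithms are evenly spaced. The `skewness` helper is our own and uses the population form of the coefficient of skewness (see Measures of Shape).

```python
import math
import statistics


def skewness(values):
    """Average cubed standardized deviation (population form)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return statistics.fmean(((v - mu) / sigma) ** 3 for v in values)


data = [1, 2, 4, 8, 16, 32, 64]        # hypothetical, strongly right-skewed values
logged = [math.log(z) for z in data]    # ln(z) applied to each variate
back = [math.exp(y) for y in logged]    # back-transform before reporting results

print(f"skewness before: {skewness(data):.3f}, after: {skewness(logged):.3f}")
```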
Because of its frequent use in geology, the lognormal distribution is extremely important. If we look at the transformed variable Yi rather than Xi itself, the
properties of the lognormal distribution can be explained simply by reference to the normal distribution.
In terms of the original (untransformed) variable Xi, the mean of Y corresponds to the nth root of the product of the Xi:
$$GM = \sqrt[n]{\prod_{i=1}^{n} X_i}$$
Where:
GM is the geometric mean
∏ is analogous to Σ, except that all the elements in the series are multiplied rather than added together (Davis, 1986).
In practice, it is simpler to convert the measurements into logarithms and compute the mean and variance. If the geometric mean and variance are required, compute the antilogs of the mean and variance of Y (Ȳ and s²Y). If you work with the data in the transformed state, all of the statistical procedures that are appropriate for ordinary variables are applicable to the log-transformed variables (Davis, 1986).
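The equivalence between the nth root of the product and the antilog of the mean logarithm can be checked numerically. This is a minimal sketch with hypothetical values. `statistics.geometric_mean` and `math.prod` are standard-library functions available from Python 3.8.

```python
import math
import statistics


def geometric_mean_via_logs(values):
    """Antilog of the mean of the natural logarithms."""
    return math.exp(statistics.fmean(math.log(v) for v in values))


data = [1, 2, 4, 8]  # hypothetical positive values

gm_logs = geometric_mean_via_logs(data)
gm_root = math.prod(data) ** (1 / len(data))  # nth root of the product of the Xi
gm_lib = statistics.geometric_mean(data)
print(gm_logs, gm_root, gm_lib)  # all approximately 2.8284, the 4th root of 64
```

Working in logarithms also avoids the overflow that can occur when the product of many large values is formed directly.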
The characteristics of the lognormal distribution are discussed in a monograph by Aitchison and Brown (1969) and in the geological context by Koch and Link (1981).
Random Error
Random errors for normal distributions are additive, which means that errors of opposite sign tend to cancel one another, and the final measurement is near the true value. Random errors for lognormal distributions are multiplicative rather than additive, and thus produce an intermediate product near the geometric mean.
UNIVARIATE DATA ANALYSIS INTRODUCTION
There are several ways in which to summarize a univariate (single attribute) distribution. Quite often we will simply compute the mean and the variance, or plot its histogram. However, these statistics are very sensitive to extreme values (outliers) and do not provide any spatial information, which is the heart of a geostatistical study. In this section, we will describe a number of different methods that can be used to analyse data for a single variable.
SUMMARY STATISTICS
The summary statistics represented by a histogram can be grouped into three categories: measures of location, measures of spread, and measures of shape.
Measures of Location
Measures of location provide information about where the various parts of the data distribution lie, and are represented by the following:
Minimum: Smallest value.
Maximum: Largest value.
Median: Midpoint of all observed data values, when arranged in ascending order. Half the values are above the median, and half are below. This statistic represents the 50th percentile of the cumulative frequency histogram and is not generally affected by an occasional erratic data point.
Mode: The most frequently occurring value in the data set. This value falls within the tallest bar on the histogram.
Quartiles: In the same way that the median splits the data into halves, the quartiles split the data in quarters. Quartiles represent the 25th, 50th and 75th percentiles on the cumulative frequency histogram.
Mean: The arithmetic average of all data values. (This statistic is quite sensitive to extreme high or low values. A single erratic value or outlier can significantly bias the mean.) We use the following formula to determine the mean of a Population:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} Z_i$$
where:
μ = population mean
N = number of observations (population size)
ΣZi = sum of individual observations
We can determine the mean of a Sample in a similar manner. The formula below for the sample mean is comparable to the above formula, except that population notations have been replaced with those for samples.
$$\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i$$
where:
z̄ = sample mean
n = number of observations (sample size)
Σzi = sum of individual observations
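The measures of location listed above are all available in Python's standard `statistics` module. The porosity values below are hypothetical. Software packages use slightly different conventions for quartiles. `statistics.quantiles` uses its "exclusive" method by default, so quartiles reported by other tools may differ a little.

```python
import statistics


def location_measures(values):
    """Minimum, maximum, median, mode, quartiles and arithmetic mean of a data set."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "minimum": min(values),
        "maximum": max(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "quartiles": (q1, q2, q3),
        "mean": statistics.fmean(values),
    }


porosity = [2, 4, 4, 5, 7, 9, 10, 11]  # hypothetical porosity values, percent
print(location_measures(porosity))
```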
Measures of Spread
Measures of spread describe the variability of the data values, and are represented by the following:
Variance: Average squared difference of the observed values from the mean. Because the variance involves squared differences, this statistic is very sensitive to abnormally high/low values.
$$\sigma^{2} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^{2}$$
Kachigan (1986) notes that the above formula is only appropriate for defining the variance of a population of observations. If this same formula were applied to a sample for the purpose of estimating the variance of the parent population from which the sample was drawn, it would tend to underestimate the population variance. This underestimation occurs because, as repeated samples are drawn from the population and the variance is calculated from each, the squared deviations are measured from the sample mean (x̄) rather than from the population mean (μ). The resulting average of these variances would be lower than the true value of the population variance (assuming we were able to measure every single member of the population).
We can avoid this bias by taking the sum of squared deviations and dividing that sum by the number of observations less one (n - 1). Thus, the sample estimate of population variance is obtained using the following formula:
$$s^{2} = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^{2}$$
Standard Deviation: Square root of the variance.
$$\sigma = \sqrt{\sigma^{2}} \ \text{(population)}, \qquad s = \sqrt{s^{2}} \ \text{(sample)}$$
This measure is used to show the extent to which the data are spread around the mean, such that a small standard deviation would indicate that the data are clustered near the mean. For example, if we had a mean equal to 10 and a standard deviation of 1.3, then, for approximately normal data, we could predict that roughly two-thirds (about 68%) of our data would fall between (10 - 1.3) and (10 + 1.3), or between 8.7 and 11.3. The standard deviation is often used instead of the variance, because its units are the same as the units of the attribute being described.
Interquartile Range: Difference between the upper (75th percentile) and the lower (25th percentile) quartile. Because this measure does not use the mean as the center of distribution, it is less sensitive to abnormally high/low values.
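The population and sample formulas for variance and standard deviation, together with the interquartile range, map onto standard-library functions as sketched below (hypothetical data). `pvariance` and `pstdev` divide by N, while `variance` and `stdev` divide by n - 1, as in the bias-corrected sample formula.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

pop_var = statistics.pvariance(data)   # divides by N
samp_var = statistics.variance(data)   # divides by n - 1
pop_sd = statistics.pstdev(data)
samp_sd = statistics.stdev(data)

q1, _, q3 = statistics.quantiles(data, n=4)  # 'exclusive' quartile method
iqr = q3 - q1

print(f"population variance = {pop_var}, sample variance = {samp_var:.3f}")
print(f"population SD = {pop_sd}, sample SD = {samp_sd:.3f}, IQR = {iqr}")
```

Here the sample variance (32/7, about 4.571) is larger than the population variance (4), which is the n - 1 correction at work.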
Figures 1a and 1b illustrate histograms of porosity with a mean of about 15%, but different variances.
Outliers or “Spurious” Data
Another statistic to consider is the Z-score, a summary statistic expressed in units of standard deviation. Data whose Z-scores have absolute values greater than a specified cutoff “appear” to be anomalous and are termed outliers. The typical cutoff is 2.5 standard deviations from the mean. The formula is
$$Z_{score} = \frac{Z_i - \mu}{\sigma}$$
This statistic serves as a caution, signifying either bad data, or a true local anomaly, which must be taken into account in the final analysis.
Note: The Z-score transform does not change the shape of the histogram. The transform re-scales the histogram to a mean equal to 0 and a variance equal to 1. If the histogram is skewed before being transformed, it retains the same shape after the transform. The X-axis is now in terms of standard deviation units about the mean of zero.
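Z-scores and the 2.5 standard deviation cutoff can be sketched as follows. The data set is hypothetical and contains one deliberately suspicious value. The sketch uses the population standard deviation, matching the σ in the Z-score formula.

```python
import statistics


def z_scores(values):
    """Re-express each value in standard deviation units about the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def flag_outliers(values, cutoff=2.5):
    """Return values whose absolute Z-score exceeds the cutoff."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > cutoff]


data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]  # hypothetical; 50 looks suspect
print(flag_outliers(data))  # [50]
```

Flagging a value is only a caution: as noted above, it may be bad data or a genuine local anomaly.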
Measures of Shape
Measures of shape describe the appearance of the histogram and are represented by the following:
Coefficient of Skewness: Average cubed difference between the data values and the mean, divided by the cube of the standard deviation. This measure is very sensitive to abnormally high/low values:
$$CS = \frac{\frac{1}{n}\sum_{i=1}^{n} (Z_i - \mu)^{3}}{\sigma^{3}}$$
where:
μ is the mean
σ is the standard deviation
n is the number of data values
The coefficient of skewness allows us to quantify the symmetry of the data distribution, and tells us when a few exceptional values (possibly outliers?) exert a disproportionate effect upon the mean.
positive: long tail of high values (median < mean)
negative: long tail of low values (median > mean)
zero: a symmetrical distribution
Figures 2a, 2b, and 2c illustrate histograms with negative, symmetrical, and positive skewness.
Coefficient of Variation: Often used as an alternative to skewness as a measure of asymmetry for positively skewed distributions with a minimum at zero. It is defined as the ratio of the standard deviation to the mean. A value of CV > 1 probably indicates the presence of some high erratic values (outliers).
$$CV = \frac{\sigma}{\mu}$$
where:
σ is the standard deviation
μ is the mean
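Both shape measures can be computed in a few lines. This is a minimal sketch with hypothetical data, using the population form of the standard deviation, as in the formulas above.

```python
import statistics


def coefficient_of_skewness(values):
    """Average cubed deviation from the mean, divided by the cube of the standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return statistics.fmean((v - mu) ** 3 for v in values) / sigma**3


def coefficient_of_variation(values):
    """Ratio of the standard deviation to the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


data = [1, 2, 3, 10]  # hypothetical: one high value gives a long upper tail
print(coefficient_of_skewness(data))   # positive skewness
print(coefficient_of_variation(data))
```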
SUMMARY OF UNIVARIATE STATISTICAL MEASURES AND DISPLAYS
Advantages
Easy to calculate.
Provides information in a very condensed form.
Can be used as parameters of a distribution model (e.g., normal distribution defined by sample mean and variance).
Limitations
Summary statistics are too condensed, and do not carry enough information about the shape of the distribution.
Certain statistics are sensitive to abnormally high/low values that properly belong to the data set (e.g., the mean, variance, and CS).
Offers only a limited description, especially if our real interest is in a multivariate data set (attributes are correlated).
BIVARIATE STATISTICAL MEASURES AND DISPLAYS INTRODUCTION
Methods for bivariate description not only provide a means to describe the relationship between two variables, but are also the basis for tools used to analyze the spatial content of a random function (to be described in the Spatial Correlation and Modeling Analysis section). The bivariate summary methods described in this section only measure the linear relationship between the variables - not their spatial features.
THE RELATIONSHIP BETWEEN VARIABLES
Bivariate analysis seeks to determine the extent to which one variable is related to another variable. We can reason that if one variable is indeed related to another, then information about the first variable might help us to predict the behavior of the second. If, on the other hand, our analysis of these two variables shows absolutely no relationship between the two, then we might need to discard one from the pair in favor of a different variable that will be more predictive of the other variable's behavior.
The relationship between two variables can be described as complementary, parallel, or reciprocal. Thus, we might observe a simultaneous increase in value between two variables, or a simultaneous decrease. We might even see a simultaneous decrease in the value of one variable while the other increases. An alternative way of characterizing the relationship between two variables would be to describe their behaviors in terms of variance. In this case, we observe how the value of one variable may change (or vary) in a manner that leaves the relationship between the two variables intact. (For example, if the relationship was defined by a 1:10 ratio, then as the value of one variable changed, the other would vary by 10 times that amount, thus preserving the relationship.)
Dependent and Independent Variables
Where a relationship between variables does exist, we can characterize each variable as being either dependent or independent. We use the behavior of the independent (or predictor) variable to determine how the dependent (or criterion) variable will react. For instance, we might expect that an increase in the value of the independent variable would result in a corresponding increase in the value of the dependent variable.
COMMON BIVARIATE METHODS
The most commonly used bivariate statistical methods include:
Scatterplots
Covariance
Product Moment Correlation Coefficient
Linear Regression
We will discuss each of these methods in turn, below.
SCATTERPLOTS
The most common bivariate plot is the Scatterplot, Figure 1 (Scatterplot of Porosity (dependent variable) versus Acoustic Impedance (independent variable)).
This plot follows a common convention, in which the dependent variable (e.g., porosity) is plotted on the Y-axis (ordinate) and the independent variable (e.g., acoustic impedance) is plotted on the X-axis (abscissa). This type of plot serves several purposes:
detects a linear relationship,
identifies potential outliers,
provides an overall data quality control check.
This plot displays an inverse relationship between porosity and acoustic impedance, that is, as porosity increases, acoustic impedance decreases. This display should be generated before calculating bivariate summary statistics, like the covariance or correlation coefficient, because many factors affect these statistical measures. Thus, a high or low value has no real meaning until verified visually.
A common geostatistical application of the scatterplot is the h-scatterplot. (In geostatistics, h commonly refers to the lag distance between sample points.) These plots are used to show how continuous the data values are over a certain
distance in a particular direction. If the data values at locations separated by h
are identical, they will fall on a line x = y, a 45-degree line of perfect correlation. As the data becomes less and less similar, the cloud of points on the
h-Scatterplot becomes fatter and more diffuse. A later section will present more detail on the h-scatterplot.
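Building the point pairs for an h-scatterplot is straightforward when samples lie along a regularly spaced transect. The sketch below assumes that layout, so the lag h is expressed as a whole number of sample spacings. Irregularly spaced data would instead require a lag tolerance, which is not shown. The function name and data are hypothetical.

```python
def h_scatter_pairs(values, lag):
    """Pair each value with the value `lag` sample spacings further along the transect.

    Plot the first element of each pair on the X-axis and the second on the Y-axis.
    """
    if lag < 1:
        raise ValueError("lag must be a positive whole number of sample spacings")
    return [(values[i], values[i + lag]) for i in range(len(values) - lag)]


porosity = [12.1, 12.4, 13.0, 12.8, 14.2, 15.0, 14.6, 13.9]  # hypothetical transect
for h in (1, 3):
    print(h, h_scatter_pairs(porosity, h))
```

Identical values at separation h give pairs with equal coordinates, which plot on the 45-degree line x = y. As the values become less similar, the pairs scatter more widely around that line.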
COVARIANCE
Covariance is a statistic that measures how two variables (e.g., porosity and acoustic impedance) vary together across all data pairs. This statistic is a very important tool used in Geostatistics to measure spatial correlation or dissimilarity between variables, and forms the basis for the correlogram and variogram (detailed later).
The magnitude of the covariance statistic is dependent upon the magnitude of the two variables. For example, if the Xi values are multiplied by the factor k, a scalar,
then the covariance increases by a factor of k. If both variables are multiplied by
k, then the covariance increases by k2. This is illustrated in the table below.
Variables          Covariance
X and Y            3035.63
X*10 and Y         30356.3
X*10 and Y*10      303563
The covariance formula is:
$$COV_{x,y} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \mu_x)(Y_i - \mu_y)$$
where:
Xi is the X variable
Yi is the Y variable
μx is the mean of X
μy is the mean of Y
n is the number of X and Y data pairs
It should be emphasized that the covariance is strongly affected by extreme pairs (outliers).
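The scaling behaviour shown in the table can be demonstrated with a short sketch. The data pairs are hypothetical. The function divides by n to match the formula above. Note that `statistics.covariance`, added in Python 3.10, divides by n - 1 instead.

```python
import math
import statistics


def covariance(xs, ys):
    """Average product of deviations from the two means (divides by n)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return statistics.fmean((x - mx) * (y - my) for x, y in zip(xs, ys))


impedance = [6500, 7000, 7600, 8100, 8900]  # hypothetical X values
porosity = [24.0, 22.5, 19.0, 17.5, 15.0]   # hypothetical Y values

c = covariance(impedance, porosity)
c_x10 = covariance([10 * v for v in impedance], porosity)
c_both = covariance([10 * v for v in impedance], [10 * v for v in porosity])
print(c, c_x10, c_both)  # scaling one variable by 10 scales COV by 10; both, by 100
```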
Product Moment Correlation Coefficient
The product moment correlation coefficient (ρ) is more commonly called simply the correlation coefficient, and is a statistic that measures the linear relation between all points of two variables (e.g., porosity and velocity). This linear relationship is assigned a value that ranges from -1 to +1, depending on the degree of correlation:
+1 = perfect positive correlation
0 = no correlation (a totally random relation)
-1 = perfect inverse correlation
Figure 2 illustrates scatterplots showing positive correlation, no correlation, and inverse correlation between two variables.
The numerator for the correlation coefficient is the covariance. This value is divided by the product of the standard deviations for variables X and Y. This normalizes the covariance, thus removing the impact of the magnitude of the data values. Like the covariance, outliers adversely affect the correlation coefficient. The Correlation Coefficient formula (for a population) is:
$$\text{Corr. Coeff.}_{x,y} = \rho_{x,y} = \frac{\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu_x)(Y_i - \mu_y)}{\sigma_x \sigma_y}$$
where:
Xi is the X variable
Yi is the Y variable
μx is the mean of X
μy is the mean of Y
σx is the standard deviation of X
σy is the standard deviation of Y
n is the number of X and Y data pairs
As with other statistical formulas, Greek notation (ρ) is used to signify the measure of a population, while Roman notation (r) is used for samples.
Rho Squared
The square of the correlation coefficient, ρ² (also referred to as r²), is a measure of the variance accounted for in a linear relation. This measure tells us about the extent to which two variables covary; that is, it tells us how much of the variance seen in one variable can be predicted by the variance found in the other variable. Thus, a value of ρ = -0.83 between porosity and acoustic impedance tells us that as porosity increases in value, acoustic impedance decreases, which has a real physical meaning. However, only about 70% (more precisely, (-0.83)² = 0.6889, or 68.89%) of the variability in porosity is explained by its relationship with acoustic impedance. In keeping with statistical notation, the Greek symbol ρ² is used to denote the squared correlation coefficient of a population, while its Roman equivalent, r², refers to that of a sample.
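Dividing the covariance by the two standard deviations removes the effect of scale, as the sketch below shows. It uses the population form of the formula, and the data are hypothetical. Because the n factors cancel, the sample form gives the same value of r.

```python
import statistics


def correlation(xs, ys):
    """Product moment correlation coefficient: covariance / (sigma_x * sigma_y)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = statistics.fmean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))


impedance = [6500, 7000, 7600, 8100, 8900]  # hypothetical X values
porosity = [24.0, 22.5, 19.0, 17.5, 15.0]   # hypothetical Y values

r = correlation(impedance, porosity)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
print(correlation([10 * v for v in impedance], porosity))  # unchanged by rescaling X
```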
Linear Regression
Linear regression is another method we use to indicate whether a linear
relationship exists between two variables. This is a useful tool, because once we establish a linear relationship, we may later be able to interpolate values between points, extrapolate values beyond the data points, detect trends, and detect points that deviate away from the trend.
Figure 3 (Scatterplot of inverse linear relationship between porosity and acoustic impedance, with a correlation coefficient of -0.83) shows a simple display of this relationship.
When two variables are strongly correlated (a correlation coefficient near +1 or -1), we can predict a linear relationship between the two. A regression line drawn through the points of the scatterplot helps us to recognize the relationship between the variables. A positive slope (from lower left to upper right) indicates a positive or direct relationship between variables. A negative slope (from upper left to lower right) indicates a negative or inverse relationship. In the example illustrated in Figure 3, the porosity clearly tends to decrease as acoustic impedance increases.
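The regression line can be fitted directly from the standard least-squares results for the slope and intercept, b = ρ(σy/σx) and a = μy - bμx. This is our own illustrative Python sketch with hypothetical data. Because the slope uses the ratio σy/σx, population and sample standard deviations give the same line.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares line Y = a + bX, with b = r * (sy / sx) and a = mean_y - b * mean_x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    r = statistics.fmean((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)
    b = r * sy / sx
    a = my - b * mx
    return a, b


impedance = [6500, 7000, 7600, 8100, 8900]  # hypothetical X (independent variable)
porosity = [24.0, 22.5, 19.0, 17.5, 15.0]   # hypothetical Y (dependent variable)

a, b = fit_line(impedance, porosity)
print(f"porosity = {a:.2f} + ({b:.5f}) * impedance")  # negative slope: inverse relation
print(a + b * 7300)  # interpolated porosity at a hypothetical impedance of 7300
```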
The regression equation has the following general form:
$$Y = a + bX_i$$
where:
Y is the dependent variable, or the variable to be estimated (e.g., porosity)
Xi is the independent variable, or the estimator (e.g., velocity)
b is the slope, defined as b = ρ(σy/σx), where ρ is the correlation coefficient between X and Y
σx is the standard deviation of X
σy is the standard deviation of Y
a is a constant, which defines the ordinate (Y-axis) intercept:
$$a = \mu_y - b\,\mu_x$$
μx is the mean of X