Probability Distributions

Academic year: 2021


PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Tue, 05 Oct 2010 15:08:33 UTC

Articles

Probability distribution

Continuous Distributions

Beta distribution
Burr distribution
Cauchy distribution
Chi-square distribution
Dirichlet distribution
F-distribution
Gamma distribution
Exponential distribution
Erlang distribution
Kumaraswamy distribution
Inverse Gaussian distribution
Laplace distribution
Lévy distribution
Log-logistic distribution
Log-normal distribution
Logistic distribution
Normal distribution
Pareto distribution
Student's t-distribution
Uniform distribution (continuous)
Weibull distribution

Discrete distributions

Bernoulli distribution
Beta-binomial distribution
Binomial distribution
Uniform distribution (discrete)
Geometric distribution
Hypergeometric distribution
Negative binomial distribution

Multivariate distributions

Multinomial distribution
Multivariate normal distribution
Wishart distribution

References

Article Sources and Contributors
Image Sources, Licenses and Contributors

Article Licenses


Probability distribution

In probability theory and statistics, a probability distribution identifies either the probability of each value of a random variable (when the variable is discrete), or the probability of the value falling within a particular interval (when the variable is continuous).[1] The probability distribution describes the range of possible values that a random variable can attain and the probability that the value of the random variable is within any (measurable) subset of that range.

The Normal distribution, often called the "bell curve".

When the random variable takes values in the set of real numbers, the probability distribution is completely described by the cumulative distribution function, whose value at each real x is the probability that the random variable is smaller than or equal to x.

The concept of the probability distribution, and of the random variables that distributions describe, underlies the mathematical discipline of probability theory and the science of statistics. There is spread or variability in almost any value that can be measured in a population (e.g. height of people, durability of a metal, sales growth, traffic flow); almost all measurements are made with some intrinsic error; and in physics many processes are described probabilistically, from the kinetic properties of gases to the quantum mechanical description of fundamental particles. For these and many other reasons, simple numbers are often inadequate for describing a quantity, while probability distributions are often more appropriate.

Many different probability distributions arise in applications. Two of the most important are the normal distribution and the categorical distribution. The normal distribution, also known as the Gaussian distribution, has a familiar "bell curve" shape and approximates many different naturally occurring distributions over real numbers. The categorical distribution describes the result of an experiment with a fixed, finite number of outcomes. For example, the toss of a fair coin follows a categorical distribution, where the possible outcomes are heads and tails, each with probability 1/2.

Formal definition

In the measure-theoretic formalization of probability theory, a random variable is defined as a measurable function $X$ from a probability space $(\Omega, \mathcal{F}, P)$ to a measurable space $(\mathcal{X}, \mathcal{A})$. A probability distribution is the pushforward measure $X_{*}P = P X^{-1}$ on $(\mathcal{X}, \mathcal{A})$.

Probability distributions of real-valued random variables

Because a probability distribution Pr on the real line is determined by the probability of a real-valued random variable X being in a half-open interval (-∞, x], the probability distribution is completely characterized by its cumulative distribution function:

$$F(x) = \Pr[X \le x] \quad \text{for all } x \in \mathbb{R}.$$

Discrete probability distribution

A probability distribution is called discrete if its cumulative distribution function only increases in jumps. More precisely, a probability distribution is discrete if there is a finite or countable set whose probability is 1.

For many familiar discrete distributions, the set of possible values is topologically discrete in the sense that all its points are isolated points. However, there are discrete distributions for which this countable set is dense on the real line. Discrete distributions are characterized by a probability mass function $p$ such that

$$\Pr[X = x] = p(x).$$

Continuous probability distribution

By one convention, a probability distribution is called continuous if its cumulative distribution function is continuous and, therefore, the probability measure of singletons is zero: $\Pr[X = x] = 0$ for all $x$. Another convention reserves the term continuous probability distribution for absolutely continuous distributions. These distributions can be characterized by a probability density function: a non-negative Lebesgue integrable function $f$ defined on the real numbers such that

$$F(x) = \Pr[X \le x] = \int_{-\infty}^{x} f(t)\,dt.$$

Discrete distributions and some continuous distributions (like the Cantor distribution) do not admit such a density.

Terminology

The support of a distribution is the smallest closed interval/set whose complement has probability zero. It may be understood as the points or elements that are actual members of the distribution.

A discrete random variable is a random variable whose probability distribution is discrete. Similarly, a continuous random variable is a random variable whose probability distribution is continuous.

Simulated sampling

The following algorithm lets one sample from a probability distribution (either discrete or continuous). This algorithm assumes that one has access to the inverse of the cumulative distribution (easy to calculate with a discrete distribution, can be approximated for continuous distributions) and a computational primitive called "random()" which returns an arbitrary-precision floating-point-value in the range of [0,1).

define function sampleFrom(cdfInverse (type="function")):
    // input:
    //   cdfInverse(x) - the inverse of the CDF of the probability distribution
    //     example: if distribution is Gaussian, one can use a Taylor approximation of the inverse of erf(x)
    //     example: if distribution is discrete, see explanation below pseudocode
    // output:
    //   type="real number" - a value sampled from the probability distribution represented by cdfInverse

    r = random()

    while(r == 0):    // make sure r is not equal to 0; discontinuity possible
        r = random()

    return cdfInverse(r)

For discrete distributions, the function cdfInverse (the inverse of the cumulative distribution function) can be calculated from samples as follows: for each element in the sample range (the discrete values along the x-axis), calculate the total number of samples up to and including it, then normalize these running totals. The resulting discrete function is the CDF, and it can be turned into an object which acts like a function: calling cdfInverse(query) returns the smallest x-value such that the CDF is greater than or equal to the query.

define function dataToCdfInverse(discreteDistribution (type="dictionary")):
    // input:
    //   discreteDistribution - a mapping from possible values to frequencies/probabilities
    //     example: {0 -> 1-p, 1 -> p} would be a Bernoulli distribution with chance=p
    //     example: setting p=0.5 in the above example, this is a fair coin where P(X=1)->"heads" and P(X=0)->"tails"
    // output:
    //   type="function" - a function that represents (CDF^-1)(x)

    define function cdfInverse(x):
        integral = 0
        go through mapping (key->value) in sorted order, adding value to integral...
        stop when integral > x (or integral >= x, doesn't matter)
        return last key we added

    return cdfInverse
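The two pseudocode routines above can be sketched in runnable Python as follows. This is a minimal illustration rather than a reference implementation: the names sample_from and data_to_cdf_inverse, the optional rng argument, and the use of bisect for the lookup are assumptions made here, and Python's random() returns ordinary double-precision values rather than arbitrary-precision ones.

```python
import bisect
import itertools
import random


def sample_from(cdf_inverse, rng=random):
    """Draw one value by inverse transform sampling."""
    r = rng.random()
    while r == 0.0:  # skip 0, where the inverse CDF may be discontinuous
        r = rng.random()
    return cdf_inverse(r)


def data_to_cdf_inverse(discrete_distribution):
    """Build the inverse CDF of a dict {value: frequency or probability}."""
    keys = sorted(discrete_distribution)
    total = sum(discrete_distribution.values())
    # Normalized running totals: cdf[i] = P(X <= keys[i]).
    cdf = list(itertools.accumulate(discrete_distribution[k] / total for k in keys))

    def cdf_inverse(x):
        # Smallest key whose CDF value is >= x. The min() guards against
        # floating-point round-off leaving cdf[-1] slightly below 1.
        i = bisect.bisect_left(cdf, x)
        return keys[min(i, len(keys) - 1)]

    return cdf_inverse


# Fair coin: 0 -> "tails", 1 -> "heads".
coin = data_to_cdf_inverse({0: 0.5, 1: 0.5})
rng = random.Random(0)  # fixed seed so the run is repeatable
flips = [sample_from(coin, rng) for _ in range(10_000)]
heads_fraction = sum(flips) / len(flips)
```

With a fair coin, heads_fraction should come out close to 1/2. The binary search replaces the linear scan of the pseudocode but returns the same key.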

Note that often, mathematics environments and computer algebra systems will have some way to represent probability distributions and sample from them. This functionality might even have been developed in third-party libraries. Such packages greatly facilitate such sampling, most likely have optimizations for common distributions, and are likely to be more elegant than the above bare-bones solution.

Some properties

• The probability density function of the sum of two independent random variables is the convolution of each of their density functions.

• The probability density function of the difference of two independent random variables is the cross-correlation of their density functions.

• Probability distributions are not a vector space – they are not closed under linear combinations, as these do not preserve non-negativity or total integral 1 – but they are closed under convex combination, thus forming a convex subset of the space of functions (or measures).
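The convolution property has a direct discrete analogue: the probability mass function of the sum of two independent discrete variables is the convolution of their mass functions. The following sketch illustrates that analogue for two fair dice; the helper name convolve_pmf and the dictionary representation are choices made for this example only.

```python
from fractions import Fraction


def convolve_pmf(p, q):
    """PMF of X + Y for independent X ~ p and Y ~ q (dicts value -> probability)."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0) + px * qy
    return out


die = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = convolve_pmf(die, die)  # distribution of the sum of two fair dice
```

Exact fractions are used so that the total probability can be checked to be exactly 1.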

Common probability distributions

The following is a list of some of the most common probability distributions, grouped by the type of process that they are related to. For a more complete list, see list of probability distributions, which groups by the nature of the outcome being considered (discrete, continuous, multivariate, etc.)

Note also that all of the univariate distributions below are singly-peaked; that is, it is assumed that the values cluster around a single point. In practice, actually-observed quantities may cluster around multiple values. Such quantities can be modeled using a mixture distribution.


Related to real-valued quantities that grow linearly (e.g. errors, offsets)

• Normal distribution (aka Gaussian distribution), for a single such quantity; the most common continuous distribution

• Multivariate normal distribution (aka multivariate Gaussian distribution), for vectors of correlated outcomes that are individually Gaussian-distributed

Related to positive real-valued quantities that grow exponentially (e.g. prices, incomes, populations)

• Log-normal distribution, for a single such quantity whose log is normally distributed

• Pareto distribution, for a single such quantity whose log is exponentially distributed; the prototypical power law distribution
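The link between the Pareto and exponential distributions can be checked by simulation. The sketch below assumes the usual parameterization with minimum value x_min and tail index alpha, under which log(X / x_min) is exponentially distributed with rate alpha; the function name and parameter values are illustrative only.

```python
import math
import random


def pareto_from_exponential(x_min, alpha, rng):
    """Draw a Pareto(x_min, alpha) value as x_min * exp(E), with E ~ Exponential(alpha)."""
    e = rng.expovariate(alpha)
    return x_min * math.exp(e)


rng = random.Random(1)
x_min, alpha = 1.0, 2.0
samples = [pareto_from_exponential(x_min, alpha, rng) for _ in range(20_000)]

# Power-law tail: P(X > x) = (x_min / x) ** alpha for x >= x_min.
empirical_tail = sum(s > 2.0 for s in samples) / len(samples)
theoretical_tail = (x_min / 2.0) ** alpha
```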

Related to real-valued quantities that are assumed to be uniformly distributed over a (possibly unknown) region

• Discrete uniform distribution, for a finite set of values (e.g. the outcome of a fair die)

• Continuous uniform distribution, for continuously-distributed values

Related to Bernoulli trials (yes/no events, with a given probability)

Basic distributions

• Bernoulli distribution, for the outcome of a single Bernoulli trial (e.g. success/failure, yes/no)

• Binomial distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a fixed total number of independent occurrences

• Negative binomial distribution, for binomial-type observations but where the quantity of interest is the number of failures before a given number of successes occurs

• Geometric distribution, for binomial-type observations but where the quantity of interest is the number of failures before the first success; a special case of the negative binomial distribution

Related to sampling schemes over a finite population

• Binomial distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a fixed number of total occurrences, using sampling with replacement

• Hypergeometric distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a fixed number of total occurrences, using sampling without replacement

• Beta-binomial distribution, for the number of "positive occurrences" (e.g. successes, yes votes, etc.) given a fixed number of total occurrences, sampling using a Polya urn scheme (in some sense, the "opposite" of sampling without replacement)


Related to categorical outcomes (events with K possible outcomes, with a given probability for each outcome)

• Categorical distribution, for a single categorical outcome (e.g. yes/no/maybe in a survey); a generalization of the Bernoulli distribution

• Multinomial distribution, for the number of each type of categorical outcome, given a fixed number of total outcomes; a generalization of the binomial distribution

• Multivariate hypergeometric distribution, similar to the multinomial distribution, but using sampling without replacement; a generalization of the hypergeometric distribution

Related to events in a Poisson process (events that occur independently with a given rate)

• Poisson distribution, for the number of occurrences of a Poisson-type event in a given period of time

• Exponential distribution, for the time before the next Poisson-type event occurs

Useful for hypothesis testing related to normally-distributed outcomes

• Chi-square distribution, the distribution of a sum of squared standard normal variables; useful e.g. for inference regarding the sample variance of normally-distributed samples (see chi-square test)

• Student's t distribution, the distribution of the ratio of a standard normal variable and the square root of a scaled chi squared variable; useful for inference regarding the mean of normally-distributed samples with unknown variance (see Student's t-test)

• F-distribution, the distribution of the ratio of two scaled chi squared variables; useful e.g. for inferences that involve comparing variances or involving R-squared (the squared correlation coefficient)
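As a small illustration of the first definition in this list, a chi-square variable with k degrees of freedom can be simulated by summing k squared standard normal draws; its mean is then k. The function name and the choice k = 3 are assumptions made for this sketch.

```python
import random


def chi_square_draw(k, rng):
    """One chi-square(k) draw, built as a sum of k squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(2)
k = 3
draws = [chi_square_draw(k, rng) for _ in range(20_000)]
sample_mean = sum(draws) / len(draws)  # should be close to k
```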

Useful as conjugate prior distributions in Bayesian inference

• Beta distribution, for a single probability (real number between 0 and 1); conjugate to the Bernoulli distribution and binomial distribution

• Gamma distribution, for a non-negative scaling parameter; conjugate to the rate parameter of a Poisson distribution or exponential distribution, the precision (inverse variance) of a normal distribution, etc.

• Dirichlet distribution, for a vector of probabilities that must sum to 1; conjugate to the categorical distribution and multinomial distribution; generalization of the beta distribution

• Wishart distribution, for a symmetric non-negative definite matrix; conjugate to the inverse of the covariance matrix of a multivariate normal distribution; generalization of the gamma distribution

See also

• Copula (statistics)

• Cumulative distribution function

• Histogram

• Inverse transform sampling

• Likelihood function

• List of statistical topics

• Probability density function

• Random variable

• Riemann–Stieltjes integral application to probability theory

Notes

[1] Everitt, B.S. (2006) The Cambridge Dictionary of Statistics, Third Edition. pp. 313–314. Cambridge University Press, Cambridge. ISBN 0521690277


External links

• An 8-foot-tall (2.4 m) Probability Machine (named Sir Francis) comparing stock market returns to the randomness of the beans dropping through the quincunx pattern (http://www.youtube.com/watch?v=AUSKTk9ENzg), from Index Funds Advisors IFA.com (http://www.ifa.com), youtube.com

• Interactive Discrete and Continuous Probability Distributions (http://www.socr.ucla.edu/htmls/SOCR_Distributions.html), socr.ucla.edu

• A Compendium of Common Probability Distributions (http://www.causascientia.org/math_stat/Dists/Compendium.pdf)

• A Compendium of Distributions (http://www.vosesoftware.com/content/ebook.pdf), vosesoftware.com

• Statistical Distributions - Overview (http://www.xycoon.com/contdistroverview.htm), xycoon.com

• Probability Distributions (http://www.sitmo.com/eqcat/8) in Quant Equation Archive, sitmo.com

• A Probability Distribution Calculator (http://www.covariable.com/continuous.html), covariable.com

• Sourceforge.net (http://sourceforge.net/projects/distexplorer/), Distribution Explorer: a mixed C++ and C# Windows application that allows you to explore the properties of 20+ statistical distributions, and calculate CDF, PDF & quantiles. Written using open-source C++ from the Boost.org (http://www.boost.org) Math Toolkit library.

• Explore different probability distributions and fit your own dataset online - interactive tool (http://www.xjtek.com/anylogic/demo_models/111/), xjtek.com

Continuous Distributions

Beta distribution

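Beta (summary)

[Figures: probability density function; cumulative distribution function]

parameters: $\alpha > 0$ shape (real), $\beta > 0$ shape (real)
support: $x \in [0, 1]$
pdf: $\dfrac{x^{\alpha-1}(1-x)^{\beta-1}}{\mathrm{B}(\alpha,\beta)}$
cdf: $I_x(\alpha,\beta)$
mean: $\dfrac{\alpha}{\alpha+\beta}$
median: no closed form
mode: $\dfrac{\alpha-1}{\alpha+\beta-2}$ for $\alpha > 1, \beta > 1$
variance: $\dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
skewness: $\dfrac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}$
ex.kurtosis: see text
entropy: see text
mgf: $1 + \sum_{k=1}^{\infty}\left(\prod_{r=0}^{k-1}\dfrac{\alpha+r}{\alpha+\beta+r}\right)\dfrac{t^k}{k!}$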

In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval (0, 1) parameterized by two positive shape parameters, typically denoted by α and β. It is the special case of the Dirichlet distribution with only two parameters. Just as the Dirichlet distribution is the conjugate prior of the multinomial distribution and categorical distribution, the beta distribution is the conjugate prior of the binomial distribution and Bernoulli distribution. In Bayesian statistics, it can be seen as the likelihood of the parameter p of a binomial distribution from observing α − 1 independent events with probability p and β − 1 with probability 1 − p.

Characterization

Probability density function

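The probability density function of the beta distribution is:

$$f(x;\alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1} = \frac{1}{\mathrm{B}(\alpha,\beta)}\,x^{\alpha-1}(1-x)^{\beta-1},$$

where $\Gamma$ is the gamma function. The beta function, B, appears as a normalization constant to ensure that the total probability integrates to unity.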

Cumulative distribution function

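The cumulative distribution function is

$$F(x;\alpha,\beta) = \frac{\mathrm{B}_x(\alpha,\beta)}{\mathrm{B}(\alpha,\beta)} = I_x(\alpha,\beta),$$

where $\mathrm{B}_x(\alpha,\beta)$ is the incomplete beta function and $I_x(\alpha,\beta)$ is the regularized incomplete beta function.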

Properties

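The expected value ($\mu$), variance (second central moment), skewness (standardized third central moment), and kurtosis excess (from the standardized fourth central moment) of a beta-distributed random variable X with parameters α and β are:

$$\operatorname{E}(X) = \frac{\alpha}{\alpha+\beta}, \qquad \operatorname{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$

The skewness is

$$\frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}.$$

The kurtosis excess is:

$$\frac{6\left[(\alpha-\beta)^2(\alpha+\beta+1) - \alpha\beta(\alpha+\beta+2)\right]}{\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)}.$$

In general, the k-th raw moment is given by

$$\operatorname{E}(X^k) = \frac{\mathrm{B}(\alpha+k,\beta)}{\mathrm{B}(\alpha,\beta)} = \frac{(\alpha)_k}{(\alpha+\beta)_k},$$

where $(x)_k$ is a Pochhammer symbol representing rising factorial. It can also be written in a recursive form as

$$\operatorname{E}(X^k) = \frac{\alpha+k-1}{\alpha+\beta+k-1}\,\operatorname{E}(X^{k-1}).$$

One can also show that

$$\operatorname{E}(\log X) = \psi(\alpha) - \psi(\alpha+\beta),$$

where $\psi$ is the digamma function.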
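The recursive form of the raw moments lends itself to a quick numerical cross-check of the closed-form mean, variance and kurtosis excess. The sketch below assumes the standard recursion E(X^k) = (α + k − 1)/(α + β + k − 1) · E(X^(k−1)) with E(X^0) = 1, and uses exact rational arithmetic; the function names and the test values α = 2, β = 3 are choices made for this example.

```python
from fractions import Fraction


def raw_moment(k, a, b):
    """k-th raw moment of Beta(a, b) via the recursion on k."""
    a, b = Fraction(a), Fraction(b)
    m = Fraction(1)
    for j in range(1, k + 1):
        m *= (a + j - 1) / (a + b + j - 1)
    return m


def beta_summary(a, b):
    """Mean, variance and kurtosis excess computed from raw moments."""
    m1, m2, m3, m4 = (raw_moment(k, a, b) for k in range(1, 5))
    var = m2 - m1 ** 2
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4  # 4th central moment
    return m1, var, mu4 / var ** 2 - 3


a, b = 2, 3
mean, var, ex_kurt = beta_summary(a, b)

# Closed forms for comparison.
closed_mean = Fraction(a, a + b)
closed_var = Fraction(a * b, (a + b) ** 2 * (a + b + 1))
closed_kurt = Fraction(
    6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2)),
    a * b * (a + b + 2) * (a + b + 3),
)
```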

Quantities of information

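Given two beta distributed random variables, X ~ Beta(α, β) and Y ~ Beta(α', β'), the information entropy of X is [1]

$$H(X) = \ln \mathrm{B}(\alpha,\beta) - (\alpha-1)\psi(\alpha) - (\beta-1)\psi(\beta) + (\alpha+\beta-2)\psi(\alpha+\beta),$$

where $\psi$ is the digamma function. The cross entropy is

$$H(X,Y) = \ln \mathrm{B}(\alpha',\beta') - (\alpha'-1)\psi(\alpha) - (\beta'-1)\psi(\beta) + (\alpha'+\beta'-2)\psi(\alpha+\beta).$$

It follows that the Kullback–Leibler divergence between these two beta distributions is

$$D_{\mathrm{KL}}(X,Y) = \ln\frac{\mathrm{B}(\alpha',\beta')}{\mathrm{B}(\alpha,\beta)} - (\alpha'-\alpha)\psi(\alpha) - (\beta'-\beta)\psi(\beta) + (\alpha'-\alpha+\beta'-\beta)\psi(\alpha+\beta).$$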

Shapes

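The beta density function can take on different shapes depending on the values of the two parameters:

• $\alpha = 1, \beta = 1$ is the uniform [0,1] distribution
• $\alpha < 1, \beta < 1$ is U-shaped (red plot)
• $\alpha < 1, \beta \ge 1$ or $\alpha = 1, \beta > 1$ is strictly decreasing (blue plot)
    • $\alpha = 1, \beta > 2$ is strictly convex
    • $\alpha = 1, \beta = 2$ is a straight line
    • $\alpha = 1, 1 < \beta < 2$ is strictly concave
• $\alpha = 1, \beta < 1$ or $\alpha > 1, \beta \le 1$ is strictly increasing (green plot)
    • $\alpha > 2, \beta = 1$ is strictly convex
    • $\alpha = 2, \beta = 1$ is a straight line
    • $1 < \alpha < 2, \beta = 1$ is strictly concave
• $\alpha > 1, \beta > 1$ is unimodal (purple & black plots)

Moreover, if $\alpha = \beta$ then the density function is symmetric about 1/2 (red & purple plots).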

Parameter estimation

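Let

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

be the sample mean and

$$v = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2$$

be the sample variance. The method-of-moments estimates of the parameters are

$$\hat{\alpha} = \bar{x}\left(\frac{\bar{x}(1-\bar{x})}{v} - 1\right), \qquad \hat{\beta} = (1-\bar{x})\left(\frac{\bar{x}(1-\bar{x})}{v} - 1\right).$$

When the distribution is required over an interval other than [0, 1], say $[\ell, h]$, then replace $\bar{x}$ with $\frac{\bar{x}-\ell}{h-\ell}$ and $v$ with $\frac{v}{(h-\ell)^2}$ in the above equations.[2][3]

There is no closed-form expression for the maximum likelihood estimates of the parameters.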
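A minimal sketch of the method-of-moments estimator follows. It assumes the standard estimates obtained by matching the sample mean m and sample variance v to the beta mean and variance, namely α ≈ m(m(1 − m)/v − 1) and β ≈ (1 − m)(m(1 − m)/v − 1); the function name, seed and test parameters are illustrative.

```python
import random


def beta_method_of_moments(data):
    """Method-of-moments estimates (alpha, beta) for data on [0, 1]."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n  # population-style variance
    common = mean * (1 - mean) / var - 1  # estimates alpha + beta
    return mean * common, (1 - mean) * common


rng = random.Random(3)
data = [rng.betavariate(2.0, 5.0) for _ in range(50_000)]
alpha_hat, beta_hat = beta_method_of_moments(data)
```

For a sample this large the estimates should land near the true values 2 and 5; for small samples the method can be noticeably less accurate than maximum likelihood.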

Related distributions

• If X has a beta distribution, then T = X/(1 − X) has a "beta distribution of the second kind", also called the beta prime distribution.

• The connection with the binomial distribution is mentioned below.

• The Beta(1,1) distribution is identical to the standard uniform distribution.

• If X has the Beta(3/2,3/2) distribution and R > 0 is a real parameter, then Y := 2RX – R has the Wigner semicircle distribution.

• If X and Y are independently distributed Gamma(α, θ) and Gamma(β, θ) respectively, then X / (X + Y) is distributed Beta(α, β).

• If X and Y are independently distributed Beta(α,β) and F(2β, 2α) (Snedecor's F distribution with 2β and 2α degrees of freedom), then Pr(X ≤ α/(α + xβ)) = Pr(Y > x) for all x > 0.

• The beta distribution is a special case of the Dirichlet distribution for only two parameters.

• The Kumaraswamy distribution resembles the beta distribution.

• If $X \sim U(0, 1]$ has a uniform distribution, then $X^2 \sim \mathrm{Beta}(1/2, 1)$, which is a special case of the Beta distribution called the power-function distribution.

• Binomial opinions in subjective logic are equivalent to Beta distributions.

• Beta(1/2,1/2) is the Jeffreys prior for a proportion and is equivalent to arcsine distribution.

Beta(i, j) with integer values of i and j is the distribution of the i-th order statistic (the i-th smallest value) of a sample of i + j − 1 independent random variables uniformly distributed between 0 and 1. The cumulative probability from 0 to x is thus the probability that the i-th smallest value is less than x, in other words, it is the probability that at least i of the random variables are less than x, a probability given by summing over the binomial distribution with its p parameter set to x. This shows the intimate connection between the beta distribution and the binomial distribution.
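The order-statistic argument above says that, for integer i and j and n = i + j − 1, the Beta(i, j) cumulative probability at x equals the probability that at least i of n uniform variables fall below x. The sketch below compares that binomial sum with a direct numerical integration of the beta density; the midpoint rule, the step count and the function names are assumptions of this example.

```python
import math


def beta_cdf_via_binomial(i, j, x):
    """P(at least i of n = i + j - 1 uniforms are below x)."""
    n = i + j - 1
    return sum(math.comb(n, k) * x ** k * (1 - x) ** (n - k) for k in range(i, n + 1))


def beta_cdf_numeric(a, b, x, steps=20_000):
    """Midpoint-rule integral of the Beta(a, b) density from 0 to x."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    h = x / steps
    total = 0.0
    for s in range(steps):
        t = (s + 0.5) * h
        total += t ** (a - 1) * (1 - t) ** (b - 1)
    return norm * total * h


via_binomial = beta_cdf_via_binomial(2, 3, 0.3)
via_integral = beta_cdf_numeric(2, 3, 0.3)
```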

Applications

Rule of succession

A classic application of the beta distribution is the rule of succession, introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. It states that, given s successes in n conditionally independent Bernoulli trials with probability p, p should be estimated as $\frac{s+1}{n+2}$. This estimate may be regarded as the expected value of the posterior distribution over p, namely Beta(s + 1, n − s + 1), which is given by Bayes' rule if one assumes a uniform prior over p (i.e., Beta(1, 1)) and then observes that p generated s successes in n trials.
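The posterior update behind the rule of succession can be sketched in a few lines. The code assumes the conjugate update Beta(a, b) → Beta(a + s, b + n − s) for s successes in n trials and the beta mean a/(a + b); the function names and the example counts are illustrative.

```python
from fractions import Fraction


def beta_posterior(prior_alpha, prior_beta, successes, trials):
    """Conjugate update of a Beta prior after observing Bernoulli trials."""
    return prior_alpha + successes, prior_beta + (trials - successes)


def beta_mean(alpha, beta):
    """Expected value of Beta(alpha, beta) for integer parameters."""
    return Fraction(alpha, alpha + beta)


s, n = 7, 10
posterior = beta_posterior(1, 1, s, n)  # uniform prior Beta(1, 1)
estimate = beta_mean(*posterior)        # equals (s + 1) / (n + 2)
```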

Bayesian statistics

Beta distributions are used extensively in Bayesian statistics, since beta distributions provide a family of conjugate prior distributions for binomial (including Bernoulli) and geometric distributions. The Beta(0,0) distribution is an improper prior and is sometimes used to represent ignorance of parameter values.

Task duration modeling

The beta distribution can be used to model events which are constrained to take place within an interval defined by a minimum and maximum value. For this reason, the beta distribution, along with the triangular distribution, is used extensively in PERT, critical path method (CPM) and other project management / control systems to describe the time to completion of a task. In project management, shorthand computations are widely used to estimate the mean and standard deviation of the beta distribution:

$$\mu(X) = \frac{a + 4b + c}{6}, \qquad \sigma(X) = \frac{c - a}{6},$$

where a is the minimum, c is the maximum, and b is the most likely value.

Using this set of approximations is known as three-point estimation. The approximations are exact only for particular values of α and β, specifically when[4]:

$$\alpha = 3 - \sqrt{2}, \qquad \beta = 3 + \sqrt{2},$$

or vice versa.

These are notably poor approximations for most other beta distributions, exhibiting average errors of 40% in the mean and 549% in the variance.[5][6][7]
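The quality of the three-point approximation can be explored numerically. The sketch below assumes the usual PERT shorthand, mean ≈ (a + 4b + c)/6 and standard deviation ≈ (c − a)/6, and compares it with the exact beta mean and standard deviation on [0, 1], using the mode (α − 1)/(α + β − 2) as the most likely value b. The parameter pair α = 3 − √2, β = 3 + √2 is used as a case where the shorthand is exact, and Beta(2, 5) as a case where it is not; all function names are illustrative.

```python
import math


def pert_estimates(a, b, c):
    """Three-point (PERT) shorthand for mean and standard deviation."""
    return (a + 4 * b + c) / 6, (c - a) / 6


def beta_exact(alpha, beta):
    """Exact mode, mean and standard deviation of Beta(alpha, beta) on [0, 1]."""
    mode = (alpha - 1) / (alpha + beta - 2)
    mean = alpha / (alpha + beta)
    sd = math.sqrt(alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))
    return mode, mean, sd


# Case where the shorthand matches the exact values.
mode1, mean1, sd1 = beta_exact(3 - math.sqrt(2), 3 + math.sqrt(2))
approx_mean1, approx_sd1 = pert_estimates(0.0, mode1, 1.0)

# Case where it is only approximate.
mode2, mean2, sd2 = beta_exact(2.0, 5.0)
approx_mean2, approx_sd2 = pert_estimates(0.0, mode2, 1.0)
```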

Information theory

One exemplary use of the beta distribution in information theory is the information-theoretic performance analysis of a communication system. In sensor array systems, the distribution of the inner product of two vectors is frequently used for performance estimation. Assume that s and v are isotropic i.i.d. vectors in the (M − 1)-dimensional nullspace of h, where s, v and h are in C^M and the elements of h are i.i.d. complex Gaussian random values. Then the absolute value of the inner product of s and v, |s^H v|, is Beta(1, M − 2) distributed.

Four parameters

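A beta distribution with the two shape parameters α and β is supported on the range [0,1]. It is possible to alter the location and scale of the distribution by introducing two further parameters representing the minimum and maximum values of the distribution,[8] written here as a and c to match the task-duration notation above.

The probability density function of the four parameter beta distribution is given by

$$f(y;\alpha,\beta,a,c) = \frac{(y-a)^{\alpha-1}(c-y)^{\beta-1}}{(c-a)^{\alpha+\beta-1}\,\mathrm{B}(\alpha,\beta)}, \qquad a \le y \le c.$$

The standard form can be obtained by letting

$$x = \frac{y-a}{c-a}.$$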

References

[1] A. C. G. Verdugo Lazo and P. N. Rathie. "On the entropy of continuous probability distributions," IEEE Trans. Inf. Theory, IT-24:120–122, 1978.

[2] Engineering Statistics Handbook (http://www.itl.nist.gov/div898/handbook/eda/section3/eda366h.htm)

[3] Brighton Webs Ltd. Data & Analysis Services for Industry & Education (http://www.brighton-webs.co.uk/distributions/beta.asp)

[4] Grubbs, Frank E. (1962). Attempts to Validate Certain PERT Statistics or ‘Picking on PERT’. Operations Research 10(6), p. 912–915.

[5] Keefer, Donald L. and Verdini, William A. (1993). Better Estimation of PERT Activity Time Parameters. Management Science 39(9), p. 1086–1091.

[6] Keefer, Donald L. and Bodily, Samuel E. (1983). Three-point Approximations for Continuous Random variables. Management Science 29(5), p. 595–609.

[7] DRMI Newsletter, Issue 12, April 8, 2005 (http://www.nps.edu/drmi/docs/1apr05-newsletter.pdf)


External links

• Weisstein, Eric W., " Beta Distribution (http://mathworld.wolfram.com/BetaDistribution.html)" from MathWorld.

• "Beta Distribution" (http://demonstrations.wolfram.com/BetaDistribution/) by Fiona Maclachlan, the Wolfram Demonstrations Project, 2007.

• Beta Distribution – Overview and Example (http://www.xycoon.com/beta.htm), xycoon.com

• Beta Distribution (http://www.brighton-webs.co.uk/distributions/beta.asp), brighton-webs.co.uk

• Beta Distributions (http://isometricland.com/geogebra/geogebra_beta_distributions.php) – Applet showing beta distributions in action.


Burr distribution

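Burr (summary)

[Figures: probability density function; cumulative distribution function]

parameters: $c > 0$, $k > 0$
support: $x > 0$
pdf: $\dfrac{c\,k\,x^{c-1}}{(1+x^c)^{k+1}}$
cdf: $1 - (1+x^c)^{-k}$
mean: $k\,\mathrm{B}(k - 1/c,\; 1 + 1/c)$, where B() is the beta function
median: $\left(2^{1/k} - 1\right)^{1/c}$
mode: $\left(\dfrac{c-1}{kc+1}\right)^{1/c}$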

In probability theory, statistics and econometrics, the Burr Type XII distribution or simply the Burr distribution is a continuous probability distribution for a non-negative random variable. It is also known as the Singh-Maddala distribution and is one of a number of different distributions sometimes called the "generalized log-logistic distribution". It is most commonly used to model household income (see Household income in the U.S.).

The Burr distribution has probability density function:[1][2]

$$f(x; c, k) = \frac{c\,k\,x^{c-1}}{(1+x^c)^{k+1}}$$

and cumulative distribution function:

$$F(x; c, k) = 1 - (1+x^c)^{-k}.$$
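Because the Burr cumulative distribution function can be inverted in closed form, samples are easy to generate by inverse transform sampling. The sketch below assumes the Burr Type XII form F(x) = 1 − (1 + x^c)^(−k) with shape parameters c and k, whose inverse is x = ((1 − u)^(−1/k) − 1)^(1/c); the function names and parameter values are illustrative.

```python
import random


def burr_cdf(x, c, k):
    """Burr Type XII cumulative distribution function."""
    return 1.0 - (1.0 + x ** c) ** (-k)


def burr_quantile(u, c, k):
    """Inverse of burr_cdf for 0 <= u < 1."""
    return ((1.0 - u) ** (-1.0 / k) - 1.0) ** (1.0 / c)


rng = random.Random(4)
c, k = 2.0, 3.0
samples = [burr_quantile(rng.random(), c, k) for _ in range(20_000)]

median_theory = burr_quantile(0.5, c, k)  # (2 ** (1 / k) - 1) ** (1 / c)
fraction_below = sum(s <= median_theory for s in samples) / len(samples)
```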

See also

Log-logistic distribution

References

[1] Maddala, G.S. (1983, 1996). Limited-Dependent and Qualitative Variables in Econometrics. Cambridge University Press.

[2] Tadikamalla, Pandu R. (1980), "A Look at the Burr and Related Distributions" (http://links.jstor.org/


Cauchy distribution

Not to be confused with the Lorenz curve.

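Cauchy–Lorentz (summary)

[Figures: probability density function (the purple curve is the standard Cauchy distribution); cumulative distribution function]

parameters: $x_0$ location (real), $\gamma > 0$ scale (real)
support: $x \in (-\infty, +\infty)$
pdf: $\dfrac{1}{\pi\gamma\left[1 + \left(\frac{x-x_0}{\gamma}\right)^2\right]}$
cdf: $\dfrac{1}{\pi}\arctan\left(\dfrac{x-x_0}{\gamma}\right) + \dfrac{1}{2}$
mean: not defined
median: $x_0$
mode: $x_0$
variance: not defined
skewness: not defined
ex.kurtosis: not defined
entropy: $\ln(4\pi\gamma)$
mgf: not defined
cf: $\exp(x_0\,i\,t - \gamma\,|t|)$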

The Cauchy–Lorentz distribution, named after Augustin Cauchy and Hendrik Lorentz, is a continuous probability distribution. As a probability distribution, it is known as the Cauchy distribution, while among physicists, it is known as the Lorentz distribution, Lorentz(ian) function, or Breit–Wigner distribution.


Its importance in physics is due to its being the solution to the differential equation describing forced resonance.[1] In mathematics, it is closely related to the Poisson kernel, which is the fundamental solution for the Laplace equation in the upper half-plane. In spectroscopy, it is the description of the shape of spectral lines which are subject to homogeneous broadening in which all atoms interact in the same way with the frequency range contained in the line shape. Many mechanisms cause homogeneous broadening, most notably collision broadening, and Chantler–Alda radiation.[2]

Characterization

Probability density function

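The Cauchy distribution has the probability density function

$$f(x; x_0, \gamma) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x-x_0}{\gamma}\right)^2\right]} = \frac{1}{\pi}\left[\frac{\gamma}{(x-x_0)^2 + \gamma^2}\right],$$

where $x_0$ is the location parameter, specifying the location of the peak of the distribution, and $\gamma$ is the scale parameter, which specifies the half-width at half-maximum (HWHM). $\gamma$ is also equal to half the interquartile range. Cauchy himself exploited such a density function in 1827, with infinitesimal scale parameter, in defining a Dirac delta function.

The amplitude (height) of the above Lorentzian function is given by

$$f(x_0; x_0, \gamma) = \frac{1}{\pi\gamma}.$$

The special case when $x_0 = 0$ and $\gamma = 1$ is called the standard Cauchy distribution with the probability density function

$$f(x; 0, 1) = \frac{1}{\pi(1+x^2)}.$$

In physics, a three-parameter Lorentzian function is often used, as follows:

$$f(x; x_0, \gamma, I) = \frac{I}{1 + \left(\frac{x-x_0}{\gamma}\right)^2} = I\left[\frac{\gamma^2}{(x-x_0)^2 + \gamma^2}\right],$$

where $I$ is the height of the peak.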

Cumulative distribution function

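The cumulative distribution function (cdf) is:

$$F(x; x_0, \gamma) = \frac{1}{\pi}\arctan\left(\frac{x-x_0}{\gamma}\right) + \frac{1}{2}$$

and the inverse cumulative distribution function of the Cauchy distribution is

$$F^{-1}(p; x_0, \gamma) = x_0 + \gamma\,\tan\left[\pi\left(p - \tfrac{1}{2}\right)\right].$$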
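The closed-form inverse makes the Cauchy distribution a convenient test case for inverse transform sampling. The sketch below assumes the standard forms F(x) = arctan((x − x0)/γ)/π + 1/2 and F⁻¹(p) = x0 + γ tan(π(p − 1/2)); the function names and parameter values are illustrative.

```python
import math
import random


def cauchy_cdf(x, x0, gamma):
    """Cumulative distribution function of Cauchy(x0, gamma)."""
    return math.atan((x - x0) / gamma) / math.pi + 0.5


def cauchy_quantile(p, x0, gamma):
    """Inverse CDF of Cauchy(x0, gamma) for 0 < p < 1."""
    return x0 + gamma * math.tan(math.pi * (p - 0.5))


x0, gamma = 2.0, 3.0
median = cauchy_quantile(0.5, x0, gamma)           # the location x0
upper_quartile = cauchy_quantile(0.75, x0, gamma)  # x0 + gamma

rng = random.Random(7)
draws = [cauchy_quantile(rng.random(), x0, gamma) for _ in range(5)]
```

The quartiles sit at x0 ± γ, which is why γ equals half the interquartile range.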

Properties

The Cauchy distribution is an example of a distribution which has no mean, variance or higher moments defined. Its mode and median are well defined and are both equal to x0.

When U and V are two independent normally distributed random variables with expected value 0 and variance 1, then the ratio U/V has the standard Cauchy distribution.
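The ratio property can be checked by simulation. Since the standard Cauchy distribution has median 0 and quartiles at −1 and +1, the empirical quartiles of U/V should land near those values. The function name, seed and sample size below are choices made for this sketch, and the quartiles are taken by simple index lookup rather than interpolation.

```python
import random


def ratio_of_normals(rng):
    """One draw of U / V for independent standard normal U and V."""
    v = 0.0
    while v == 0.0:  # guard against an (essentially impossible) exact zero
        v = rng.gauss(0.0, 1.0)
    return rng.gauss(0.0, 1.0) / v


rng = random.Random(5)
draws = sorted(ratio_of_normals(rng) for _ in range(20_000))
n = len(draws)
q1, med, q3 = draws[n // 4], draws[n // 2], draws[3 * n // 4]
```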

If X1, ..., Xn are independent and identically distributed random variables, each with a standard Cauchy distribution, then the sample mean (X1 + ... + Xn)/n has the same standard Cauchy distribution (the sample median, which is not affected by extreme values, can be used as a measure of central tendency). To see that this is true, compute the characteristic function of the sample mean:

$$\varphi_{\bar{X}}(t) = \operatorname{E}\left(e^{i\bar{X}t}\right) = \prod_{j=1}^{n}\operatorname{E}\left(e^{iX_j t/n}\right) = \left(e^{-|t|/n}\right)^n = e^{-|t|},$$

where $\bar{X}$ is the sample mean. This is the characteristic function of a single standard Cauchy variable. This example serves to show that the hypothesis of finite variance in the central limit theorem cannot be dropped. It is also an example of a more generalized version of the central limit theorem that is characteristic of all stable distributions, of which the Cauchy distribution is a special case.

The Cauchy distribution is an infinitely divisible probability distribution. It is also a strictly stable distribution. The standard Cauchy distribution coincides with the Student's t-distribution with one degree of freedom.

Like all stable distributions, the location-scale family to which the Cauchy distribution belongs is closed under linear transformations with real coefficients. In addition, the Cauchy distribution is the only univariate distribution which is closed under linear fractional transformations with real coefficients. In this connection, see also McCullagh's parametrization of the Cauchy distributions.

Characteristic function

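Let X denote a Cauchy distributed random variable. The characteristic function of the Cauchy distribution is given by

$$\varphi_X(t; x_0, \gamma) = \operatorname{E}\left(e^{iXt}\right) = \int_{-\infty}^{\infty} f(x; x_0, \gamma)\,e^{ixt}\,dx = \exp(i x_0 t - \gamma|t|),$$

which is just the Fourier transform of the probability density. It follows that the probability density may be expressed in terms of the characteristic function by:

$$f(x; x_0, \gamma) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \varphi_X(t; x_0, \gamma)\,e^{-ixt}\,dt.$$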

Explanation of undefined moments

Mean

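If a probability distribution has a density function f(x) then the mean is

$$\int_{-\infty}^{\infty} x f(x)\,dx. \qquad (1)$$

The question is now whether this is the same thing as

$$\int_{0}^{\infty} x f(x)\,dx - \int_{-\infty}^{0} |x| f(x)\,dx. \qquad (2)$$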

If at most one of the two terms in (2) is infinite, then (1) is the same as (2). But in the case of the Cauchy distribution, both the positive and negative terms of (2) are infinite. This means (2) is undefined. Moreover, if (1) is construed as a Lebesgue integral, then (1) is also undefined, since (1) is then defined simply as the difference (2) between positive and negative parts.

However, if (1) is construed as an improper integral rather than a Lebesgue integral, then (2) is undefined, and (1) is not necessarily well-defined. We may take (1) to mean

$$\lim_{a\to\infty}\int_{-a}^{a} x f(x)\,dx,$$

and this is its Cauchy principal value, which is zero, but we could also take (1) to mean, for example,

$$\lim_{a\to\infty}\int_{-2a}^{a} x f(x)\,dx,$$

which is not zero, as can be seen easily by computing the integral (for the standard Cauchy distribution it equals $-\ln(2)/\pi$).

Various results in probability theory about expected values, such as the strong law of large numbers, will not work in such cases.

Second moment

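Without a defined mean, it is impossible to consider the variance or standard deviation of a standard Cauchy distribution, as these are defined with respect to the mean. But the second moment about zero can be considered. It turns out to be infinite:

$$\operatorname{E}(X^2) \propto \int_{-\infty}^{\infty}\frac{x^2}{1+x^2}\,dx = \int_{-\infty}^{\infty}dx - \int_{-\infty}^{\infty}\frac{1}{1+x^2}\,dx = \infty - \pi = \infty.$$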

Estimation of parameters

Since the mean and variance of the Cauchy distribution are not defined, attempts to estimate these parameters will not be successful. For example, if N samples are taken from a Cauchy distribution, one may calculate the sample mean as:

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i.$$

Although the sample values $x_i$ will be concentrated about the central value $x_0$, the sample mean does not become less variable as more samples are taken, because of the persistent chance of encountering sample points with a large absolute value. In fact, the distribution of the sample mean will be equal to the distribution of the samples themselves, i.e., the sample mean of a large sample is no better (or worse) an estimator of $x_0$ than any single observation from the sample. Similarly, calculating the sample variance will result in values that grow larger as more samples are taken.

Therefore, more robust means of estimating the central value $x_0$ and the scaling parameter $\gamma$ are needed. One simple method is to take the median value of the sample as an estimator of $x_0$ and half the sample interquartile range as an estimator of $\gamma$. Other, more precise and robust methods have been developed.[3] For example, the truncated mean of the middle 24% of the sample order statistics produces an estimate for $x_0$ that is more efficient than using either the sample median or the full sample mean.[4][5] However, due to the fat tails of the Cauchy distribution, the efficiency of the estimator decreases if more than the middle 24% of the sample is used.[4][5]
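The simple quantile-based estimators (sample median for the location, half the interquartile range for the scale) can be sketched as follows. This is an illustrative sketch using only the Python standard library; the helper names `sample_cauchy` and `quantile_estimates`, the seed, and the parameter values 3 and 2 are assumptions chosen for the example, not part of the source.

```python
import math
import random
import statistics

def sample_cauchy(x0, gamma, n, rng):
    """Draw n Cauchy(x0, gamma) variates by inverting the CDF."""
    return [x0 + gamma * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

def quantile_estimates(xs):
    """Return (sample median, half the sample interquartile range)."""
    q1, q2, q3 = statistics.quantiles(xs, n=4)
    return q2, 0.5 * (q3 - q1)

rng = random.Random(12345)          # fixed seed so the run is repeatable
xs = sample_cauchy(x0=3.0, gamma=2.0, n=20001, rng=rng)
x0_hat, gamma_hat = quantile_estimates(xs)
sample_mean = statistics.fmean(xs)  # for comparison; it need not be near 3

print(f"median estimate of location: {x0_hat:.3f}")
print(f"half-IQR estimate of scale:  {gamma_hat:.3f}")
print(f"sample mean (unreliable):    {sample_mean:.3f}")
```

The quantile estimates should land close to the true values, while the sample mean can be far off on any given run.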

Maximum likelihood can also be used to estimate the parameters $x_0$ and $\gamma$. However, this tends to be complicated by the fact that it requires finding the roots of a high-degree polynomial, and there can be multiple roots that represent local maxima.[6] Also, while the maximum likelihood estimator is asymptotically efficient, it is relatively inefficient for small samples.[7] The log-likelihood function for the Cauchy distribution for sample size n is:

$$\hat\ell(x_0, \gamma \mid x_1, \dots, x_n) = n \log \gamma - n \log \pi - \sum_{i=1}^{n} \log\left(\gamma^2 + (x_i - x_0)^2\right).$$

Maximizing the log-likelihood with respect to $x_0$ and $\gamma$ produces the following system of equations:

$$\sum_{i=1}^{n} \frac{x_i - x_0}{\gamma^2 + (x_i - x_0)^2} = 0, \qquad \sum_{i=1}^{n} \frac{\gamma^2}{\gamma^2 + (x_i - x_0)^2} - \frac{n}{2} = 0.$$

Solving just for $x_0$ requires solving a polynomial of degree 2n − 1,[6] and solving just for $\gamma$ requires solving a polynomial of degree n in $\gamma^2$. It is also worthwhile to note that $\sum_{i=1}^{n} \gamma^2 / \left(\gamma^2 + (x_i - x_0)^2\right)$ is a monotone function in $\gamma$ and that the solution $\gamma$ must satisfy $\min_i |x_i - x_0| \le \gamma \le \max_i |x_i - x_0|$. Therefore, whether solving for one parameter or for both parameters simultaneously, a numerical solution on a computer is typically required. The benefit of maximum likelihood estimation is asymptotic efficiency; estimating $x_0$ using the sample median is only about 81% as asymptotically efficient as estimating $x_0$ by maximum likelihood.[5][8] The truncated sample mean using the middle 24% order statistics is about 88% as asymptotically efficient an estimator of $x_0$ as the maximum likelihood estimate.[5] When Newton's method is used to find the solution for the maximum likelihood estimate, the middle 24% order statistics can be used as an initial solution for $x_0$.
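To illustrate the numerical side, the sketch below solves only the scale equation, sum of γ²/(γ² + (x_i − x_0)²) = n/2, with the location x_0 treated as known. It uses bisection rather than Newton's method, which is a simplification chosen here because the left-hand side is monotone in γ and the root is bracketed by the smallest and largest |x_i − x_0|. The function names, seed, and sample size are assumptions for the example.

```python
import math
import random

def scale_score(gamma, xs, x0):
    """Sum of gamma^2 / (gamma^2 + (x_i - x0)^2) minus n/2; increasing in gamma."""
    g2 = gamma * gamma
    return sum(g2 / (g2 + (x - x0) ** 2) for x in xs) - len(xs) / 2.0

def mle_scale(xs, x0, iterations=200):
    """Bisection for the scale MLE, with the location x0 treated as known."""
    devs = [abs(x - x0) for x in xs]
    lo, hi = min(devs), max(devs)   # the root is bracketed by these bounds
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if scale_score(mid, xs, x0) < 0.0:
            lo = mid                # score too small: gamma must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(7)
true_gamma = 2.0
xs = [true_gamma * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(5000)]
gamma_hat = mle_scale(xs, x0=0.0)
print(f"scale MLE with known location: {gamma_hat:.3f}")
```

Estimating both parameters jointly is harder because the location equation can have several roots; a robust starting value such as the sample median is then the usual precaution.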

Multivariate Cauchy distribution

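A random vector X = (X1, …, Xk)′ is said to have the multivariate Cauchy distribution if every linear combination of its components Y = a1X1 + … + akXk has a Cauchy distribution. That is, for any constant vector $a \in \mathbb{R}^k$, the random variable Y = a′X should have a univariate Cauchy distribution.[9] The characteristic function of a multivariate Cauchy distribution is given by:

$$\varphi_X(t) = e^{i x_0(t) - \gamma(t)},$$

where $x_0(t)$ and $\gamma(t)$ are real functions with $x_0(t)$ a homogeneous function of degree one and $\gamma(t)$ a positive homogeneous function of degree one.[9] More formally:[9]

$$x_0(at) = a\, x_0(t)$$

and

$$\gamma(at) = |a|\, \gamma(t)$$

for all t.

An example of a bivariate Cauchy distribution can be given by:[10]

$$f(x, y; x_0, y_0, \gamma) = \frac{1}{2\pi} \left[ \frac{\gamma}{\left((x - x_0)^2 + (y - y_0)^2 + \gamma^2\right)^{1.5}} \right].$$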

Note that in this example, even though there is no analogue to a covariance matrix, x and y are not statistically independent.[10]

Related distributions

• The ratio of two independent standard normal random variables is a standard Cauchy variable, a Cauchy(0,1). Thus the Cauchy distribution is a ratio distribution.

• The standard Cauchy(0,1) distribution arises as a special case of Student's t distribution with one degree of freedom.

• Relation to stable distribution: if X ~ Stable(1, 0, γ, μ), then X ~ Cauchy(μ, γ).
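The ratio-of-normals relation in the list above can be checked by simulation. A standard Cauchy variable has quartiles −1, 0 and 1, so the sample quartiles of the simulated ratios should fall near those values. This is a minimal sketch; the seed and sample size are arbitrary choices made for the example.

```python
import random
import statistics

rng = random.Random(2024)

# Ratio of two independent standard normal draws, repeated many times.
ratios = [rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0) for _ in range(40001)]

# Quartiles of a standard Cauchy(0, 1) variable are -1, 0 and 1.
q1, q2, q3 = statistics.quantiles(ratios, n=4)
print(f"sample quartiles: {q1:.3f}, {q2:.3f}, {q3:.3f}")
```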

Relativistic Breit–Wigner distribution

In nuclear and particle physics, the energy profile of a resonance is described by the relativistic Breit–Wigner distribution, while the Cauchy distribution is the (non-relativistic) Breit–Wigner distribution.


See also

• McCullagh's parametrization of the Cauchy distributions

• Lévy flight and Lévy process

• Slash distribution

• Wrapped Cauchy distribution

References

[1] http://webphysics.davidson.edu/Projects/AnAntonelli/node5.html Note that the intensity, which follows the Cauchy distribution, is the square of the amplitude.

[2] E. Hecht (1987). Optics (2nd ed.). Addison-Wesley. p. 603.

[3] Cane, Gwenda J. (1974). "Linear Estimation of Parameters of the Cauchy Distribution Based on Sample Quantiles" (http://www.jstor.org/stable/2285535). Journal of the American Statistical Association 69 (345): 243–245.

[4] Rothenberg, Thomas J.; Fisher, Franklin M.; Tilanus, C. B. (1966). "A note on estimation from a Cauchy sample". Journal of the American Statistical Association 59 (306): 460–463.

[5] Bloch, Daniel (1966). "A note on the estimation of the location parameters of the Cauchy distribution" (http://www.jstor.org/pss/2282794). Journal of the American Statistical Association 61 (316): 852–855.

[6] Ferguson, Thomas S. (1978). "Maximum Likelihood Estimates of the Parameters of the Cauchy Distribution for Samples of Size 3 and 4" (http://www.jstor.org/pss/2286549). Journal of the American Statistical Association 73 (361): 211.

[7] Cohen Freue, Gabriella V. (2007). "The Pitman estimator of the Cauchy location parameter" (http://faculty.ksu.edu.sa/69424/USEPAP/ Coushy dist.pdf). Journal of Statistical Planning and Inference 137: 1901.

[8] Barnett, V. D. (1966). "Order Statistics Estimators of the Location of the Cauchy Distribution" (http://www.jstor.org/pss/2283210). Journal of the American Statistical Association 61 (316): 1205.

[9] Ferguson, Thomas S. (1962). "A Representation of the Symmetric Bivariate Cauchy Distribution" (http://www.jstor.org/pss/2237984). Journal of the American Statistical Association: 1256.

[10] Molenberghs, Geert; Lesaffre, Emmanuel (1997). "Non-linear Integral Equations to Approximate Bivariate Densities with Given Marginals and Dependence Function" (http://www3.stat.sinica.edu.tw/statistica/oldpdf/A7n310.pdf). Statistica Sinica 7: 713–738.

External links

• Earliest Uses: The entry on Cauchy distribution has some historical information. (http://jeff560.tripod.com/c.html)

• Weisstein, Eric W., "Cauchy Distribution (http://mathworld.wolfram.com/CauchyDistribution.html)" from MathWorld.

• GNU Scientific Library – Reference Manual (http://www.gnu.org/software/gsl/manual/gsl-ref.html#SEC294)


Chi-square distribution

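[Figure: Probability density function]

[Figure: Cumulative distribution function]

notation: $\chi^2(k)$ or $\chi^2_k$
parameters: $k \in \mathbb{N}_1$ (degrees of freedom)
support: x ∈ [0, +∞)
pdf: $\frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}$
cdf: $\frac{1}{\Gamma(k/2)}\, \gamma(k/2,\, x/2)$
mean: k
median: $\approx k \left(1 - \frac{2}{9k}\right)^3$
mode: max{ k − 2, 0 }
variance: 2k
skewness: $\sqrt{8/k}$
ex.kurtosis: 12 / k
entropy: $\frac{k}{2} + \ln\left(2\,\Gamma(k/2)\right) + \left(1 - \frac{k}{2}\right) \psi(k/2)$
mgf: $(1 - 2t)^{-k/2}$ for t < ½
cf: $(1 - 2it)^{-k/2}$ [1]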

In probability theory and statistics, the chi-square distribution (also chi-squared or χ²-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. It is one of the most widely used probability distributions in inferential statistics, e.g. in hypothesis testing, or in construction of confidence intervals.[2][3][4][5]

The best-known situations in which the chi-square distribution is used are the common chi-square tests for goodness of fit of an observed distribution to a theoretical one, and of the independence of two criteria of classification of qualitative data. Many other statistical tests also lead to a use of this distribution, like Friedman's analysis of variance by ranks.

The chi-square distribution is a special case of the gamma distribution.

Definition

If X1, …, Xk are independent, standard normal random variables, then the sum of their squares

$$Q = \sum_{i=1}^{k} X_i^2$$

is distributed according to the chi-square distribution with k degrees of freedom. This is usually denoted as

$$Q \sim \chi^2(k) \quad \text{or} \quad Q \sim \chi^2_k.$$

The chi-square distribution has one parameter: k, a positive integer that specifies the number of degrees of freedom (i.e. the number of Xi's).
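The definition translates directly into a simulation: draw k standard normal values, square them, and add. The sketch below is illustrative only (the helper name `chi_square_draw`, the seed, and k = 5 are assumptions); it compares the sample mean and variance with the values k and 2k listed in the summary box.

```python
import random
import statistics

def chi_square_draw(k, rng):
    """One chi-square(k) variate: sum of squares of k standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(1)
k = 5
draws = [chi_square_draw(k, rng) for _ in range(50000)]

mean_hat = statistics.fmean(draws)     # should be close to k
var_hat = statistics.variance(draws)   # should be close to 2k
print(f"sample mean {mean_hat:.3f} (k = {k}), sample variance {var_hat:.3f} (2k = {2 * k})")
```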

Characteristics

Further properties of the chi-square distribution can be found in the box at right.

Probability density function

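The probability density function (pdf) of the chi-square distribution is

$$f(x; k) = \begin{cases} \dfrac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}, & x \ge 0; \\ 0, & \text{otherwise,} \end{cases}$$

where Γ(k/2) denotes the Gamma function, which has closed-form values at the half-integers.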

For derivations of the pdf in the cases of one and two degrees of freedom, see Proofs related to chi-square distribution.

Cumulative distribution function

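Its cumulative distribution function is:

$$F(x; k) = \frac{\gamma(k/2,\, x/2)}{\Gamma(k/2)} = P(k/2,\, x/2),$$

where γ(k,z) is the lower incomplete Gamma function and P(k,z) is the regularized Gamma function. In the special case of k = 2 this function has a simple form:

$$F(x; 2) = 1 - e^{-x/2}.$$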

Tables of this distribution — usually in its cumulative form — are widely available and the function is included in many spreadsheets and all statistical packages. For a closed form approximation for the CDF, see under Noncentral chi-square distribution.


Additivity

It follows from the definition of the chi-square distribution that the sum of independent chi-square variables is also chi-square distributed. Specifically, if $\{X_i\}_{i=1}^{n}$ are independent chi-square variables with $\{k_i\}_{i=1}^{n}$ degrees of freedom, respectively, then Y = X1 + ⋯ + Xn is chi-square distributed with k1 + ⋯ + kn degrees of freedom.

Information entropy

The information entropy is given by

$$H = \frac{k}{2} + \ln\left(2\,\Gamma(k/2)\right) + \left(1 - \frac{k}{2}\right) \psi(k/2),$$

where ψ(x) is the Digamma function.

Noncentral moments

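The moments about zero of a chi-square distribution with k degrees of freedom are given by[6][7]

$$\mathrm{E}(X^m) = k (k+2)(k+4) \cdots (k + 2m - 2) = 2^m\, \frac{\Gamma(m + k/2)}{\Gamma(k/2)}.$$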

Cumulants

The cumulants are readily obtained by a (formal) power series expansion of the logarithm of the characteristic function:

$$\kappa_n = 2^{n-1} (n-1)!\, k.$$

Asymptotic properties

By the central limit theorem, because a chi-square variable is the sum of k independent random variables, its distribution converges to a normal distribution for large k (k > 50 is “approximately normal”).[8] Specifically, if X ~ χ²(k), then as k tends to infinity, the distribution of $(X - k)/\sqrt{2k}$ tends to a standard normal distribution. However, convergence is slow as the skewness is $\sqrt{8/k}$ and the excess kurtosis is 12/k.

Other functions of the chi-square distribution converge more rapidly to a normal distribution. Some examples are:

• If X ~ χ²(k) then $\sqrt{2X}$ is approximately normally distributed with mean $\sqrt{2k - 1}$ and unit variance (result credited to R. A. Fisher).

• If X ~ χ²(k) then $\sqrt[3]{X/k}$ is approximately normally distributed with mean $1 - 2/(9k)$ and variance $2/(9k)$ (Wilson and Hilferty, 1931).
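The Wilson and Hilferty cube-root approximation can be turned into an approximate CDF with nothing more than the normal CDF. The sketch below assumes the standard form of the approximation (cube root of X/k approximately normal with mean 1 − 2/(9k) and variance 2/(9k)) and compares it against the exact k = 2 case, where the chi-square CDF is 1 − exp(−x/2). Function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def chi2_cdf_wilson_hilferty(x, k):
    """Approximate chi-square CDF from the cube-root normal approximation."""
    mean = 1.0 - 2.0 / (9.0 * k)
    sd = math.sqrt(2.0 / (9.0 * k))
    return normal_cdf(((x / k) ** (1.0 / 3.0) - mean) / sd)

def chi2_cdf_k2(x):
    """Exact chi-square CDF for k = 2 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0)

for x in (1.39, 5.99):
    approx = chi2_cdf_wilson_hilferty(x, 2)
    exact = chi2_cdf_k2(x)
    print(f"x = {x}: approximation {approx:.4f}, exact {exact:.4f}")
```

Even at k = 2, far from the large-k regime, the two values agree to about two decimal places.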

Related distributions

A chi-square variable with k degrees of freedom is defined as the sum of the squares of k independent standard normal random variables.

If Y is a k-dimensional Gaussian random vector with mean vector μ and rank k covariance matrix C, then $X = (Y - \mu)^{T} C^{-1} (Y - \mu)$ is chi-square distributed with k degrees of freedom.

The sum of squares of statistically independent unit-variance Gaussian variables which do not have mean zero yields a generalization of the chi-square distribution called the noncentral chi-square distribution.

If Y is a vector of k i.i.d. standard normal random variables and A is a k×k idempotent matrix with rank k−n then the quadratic form $Y^{T} A Y$ is chi-square distributed with k−n degrees of freedom.

The chi-square distribution is also naturally related to other distributions arising from the Gaussian. In particular,

• Y is F-distributed, Y ~ F(k1,k2), if $Y = \frac{X_1 / k_1}{X_2 / k_2}$ where X1 ~ χ²(k1) and X2 ~ χ²(k2) are statistically independent.

• If X is chi-square distributed, then $\sqrt{X}$ is chi distributed.

• If X1 ~ χ²(k1) and X2 ~ χ²(k2) are statistically independent, then X1 + X2 ~ χ²(k1 + k2). If X1 and X2 are not independent, then X1 + X2 need not be chi-square distributed.

Generalizations

The chi-square distribution is obtained as the sum of the squares of k independent, zero-mean, unit-variance Gaussian random variables. Generalizations of this distribution can be obtained by summing the squares of other types of Gaussian random variables. Several such distributions are described below.

Chi-square distributions

Noncentral chi-square distribution

The noncentral chi-square distribution is obtained from the sum of the squares of independent Gaussian random variables having unit variance and nonzero means.

Generalized chi-square distribution

The generalized chi-square distribution is obtained from the quadratic form z′Az where z is a zero-mean Gaussian vector having an arbitrary covariance matrix, and A is an arbitrary matrix.

Gamma, exponential, and related distributions

The chi-square distribution X ~ χ²(k) is a special case of the gamma distribution, in that X ~ Γ(k/2, 2) (using the shape parameterization of the gamma distribution).

Because the exponential distribution is also a special case of the Gamma distribution, we also have that if X ~ χ²(2), then X ~ Exp(1/2), i.e. X is exponentially distributed with rate 1/2.

The Erlang distribution is also a special case of the Gamma distribution and thus we also have that if X ~ χ²(k) with even k, then X is Erlang distributed with shape parameter k/2 and scale parameter 2 (equivalently, rate parameter 1/2).

Applications

The chi-square distribution has numerous applications in inferential statistics, for instance in chi-square tests and in estimating variances. It enters the problem of estimating the mean of a normally distributed population and the problem of estimating the slope of a regression line via its role in Student’s t-distribution. It enters all analysis of variance problems via its role in the F-distribution, which is the distribution of the ratio of two independent chi-squared random variables divided by their respective degrees of freedom.

Following are some of the most common situations in which the chi-square distribution arises from a Gaussian-distributed sample.

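• If X1, …, Xn are i.i.d. N(μ, σ²) random variables, then $\sum_{i=1}^{n} (X_i - \bar{X})^2 \sim \sigma^2 \chi^2_{n-1}$ where $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$.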

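• The box below shows probability distributions with names starting with chi for some statistics based on Xi ~ Normal(μi, σ²i), i = 1, ⋯, k, independent random variables:

Name                                  Statistic
chi-square distribution               $\sum_{i=1}^{k} \left( \frac{X_i - \mu_i}{\sigma_i} \right)^2$
noncentral chi-square distribution    $\sum_{i=1}^{k} \left( \frac{X_i}{\sigma_i} \right)^2$
chi distribution                      $\sqrt{\sum_{i=1}^{k} \left( \frac{X_i - \mu_i}{\sigma_i} \right)^2}$
noncentral chi distribution           $\sqrt{\sum_{i=1}^{k} \left( \frac{X_i}{\sigma_i} \right)^2}$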

Table of χ² value vs P value

The P-value is the probability of observing a test statistic at least as extreme in a Chi-square distribution. Accordingly, since the cumulative distribution function (CDF) for the appropriate degrees of freedom (df) gives the probability of having obtained a value less extreme than this point, subtracting the CDF value from 1 gives the P-value. The table below gives a number of P-values matching to χ² for the first 10 degrees of freedom. A P-value of 0.05 or less is usually regarded as statistically significant.

Degrees of freedom (df) against χ² value [9]; the last row gives the P value (probability) for each column.

df         χ² value
1          0.004   0.02   0.06   0.15   0.46   1.07   1.64   2.71   3.84   6.64  10.83
2          0.10    0.21   0.45   0.71   1.39   2.41   3.22   4.60   5.99   9.21  13.82
3          0.35    0.58   1.01   1.42   2.37   3.66   4.64   6.25   7.82  11.34  16.27
4          0.71    1.06   1.65   2.20   3.36   4.88   5.99   7.78   9.49  13.28  18.47
5          1.14    1.61   2.34   3.00   4.35   6.06   7.29   9.24  11.07  15.09  20.52
6          1.63    2.20   3.07   3.83   5.35   7.23   8.56  10.64  12.59  16.81  22.46
7          2.17    2.83   3.82   4.67   6.35   8.38   9.80  12.02  14.07  18.48  24.32
8          2.73    3.49   4.59   5.53   7.34   9.52  11.03  13.36  15.51  20.09  26.12
9          3.32    4.17   5.38   6.39   8.34  10.66  12.24  14.68  16.92  21.67  27.88
10         3.94    4.86   6.18   7.27   9.34  11.78  13.44  15.99  18.31  23.21  29.59
P value    0.95    0.90   0.80   0.70   0.50   0.30   0.20   0.10   0.05   0.01   0.001

Columns with P value from 0.95 to 0.10 are labelled Nonsignificant; columns with P value from 0.05 to 0.001 are labelled Significant.
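Entries of the table can be reproduced by evaluating one minus the chi-square CDF. The sketch below computes the regularized lower incomplete gamma function P(a, x) from its power series, which is one common way to evaluate it and is adequate for the moderate arguments in the table; it is an illustration with hypothetical function names, not a replacement for a statistics library.

```python
import math

def regularized_lower_gamma(a, x, max_terms=500):
    """P(a, x) = x^a e^(-x) / Gamma(a + 1) * sum_n x^n / ((a+1)(a+2)...(a+n))."""
    if x <= 0.0:
        return 0.0
    term = 1.0
    total = 1.0
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < 1e-16 * total:   # series has converged to double precision
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))

def chi2_p_value(x, df):
    """Upper-tail probability of a chi-square(df) variable at x."""
    return 1.0 - regularized_lower_gamma(df / 2.0, x / 2.0)

for df, x in ((1, 3.84), (2, 5.99), (10, 18.31)):
    print(f"df = {df}, chi-square = {x}: P value = {chi2_p_value(x, df):.4f}")
```

Each of the three rows printed corresponds to the 0.05 column of the table.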


See also

• Cochran's theorem

• Degrees of freedom (statistics)

• Fisher's method for combining independent tests of significance

• Generalized chi-square distribution

• High-dimensional space

• Inverse-chi-square distribution

• Noncentral chi-square distribution

• Normal distribution

• Pearson's chi-square test

• Proofs related to chi-square distribution

• Wishart distribution

References

Footnotes

[1] M.A. Sanders. "Characteristic function of the central chi-square distribution" (http://www.planetmathematics.com/CentralChiDistr.pdf). Retrieved 2009-03-06.

[2] Abramowitz, Milton; Stegun, Irene A., eds. (1965), "Chapter 26" (http://www.math.sfu.ca/~cbm/aands/page_940.htm), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, New York: Dover, pp. 940, MR0167642, ISBN 978-0486612720.

[3] NIST (2006). Engineering Statistics Handbook - Chi-Square Distribution (http://www.itl.nist.gov/div898/handbook/eda/section3/eda3666.htm)

[4] Johnson, N.L.; S. Kotz, N. Balakrishnan (1994). Continuous Univariate Distributions (Second Ed., Vol. 1, Chapter 18). John Wiley and Sons. ISBN 0-471-58495-9.

[5] Mood, Alexander; Franklin A. Graybill, Duane C. Boes (1974). Introduction to the Theory of Statistics (Third Edition, p. 241-246). McGraw-Hill. ISBN 0-07-042864-6.

[6] Chi-square distribution (http://mathworld.wolfram.com/Chi-SquaredDistribution.html), from MathWorld, retrieved Feb. 11, 2009

[7] M. K. Simon, Probability Distributions Involving Gaussian Random Variables, New York: Springer, 2002, eq. (2.35), ISBN 978-0-387-34657-1

[8] Box, Hunter and Hunter. Statistics for experimenters. Wiley. p. 46.

[9] Chi-Square Test (http://www2.lv.psu.edu/jxm57/irp/chisquar.html) Table B.2. Dr. Jacqueline S. McLaughlin at The Pennsylvania State University. In turn citing: R.A. Fisher and F. Yates, Statistical Tables for Biological Agricultural and Medical Research, 6th ed., Table IV

Notations

• Wilson, E.B.; Hilferty, M.M. (1931) The distribution of chi-square. Proceedings of the National Academy of Sciences, Washington, 17, 684–688.

External links

• Earliest Uses of Some of the Words of Mathematics: entry on Chi square has a brief history (http://jeff560.tripod.com/c.html)

• Course notes on Chi-Square Goodness of Fit Testing (http://www.stat.yale.edu/Courses/1997-98/101/chigf.htm) from Yale University Stats 101 class.

• Mathematica demonstration showing the chi-squared sampling distribution of various statistics, e.g. Σx², for a normal population (http://demonstrations.wolfram.com/StatisticsAssociatedWithNormalSamples/)

• Simple algorithm for approximating cdf and inverse cdf for the chi-square distribution with a pocket calculator (http://www.jstor.org/stable/2348373)


Dirichlet distribution

Several images of the probability density of the Dirichlet distribution when K=3 for various parameter vectors α. Clockwise from top left:

α=(6, 2, 2), (3, 7, 5), (6, 2, 6), (2, 3, 4).

In probability and statistics, the Dirichlet distribution (after Johann Peter Gustav Lejeune Dirichlet), often denoted $\mathrm{Dir}(\alpha)$, is a family of continuous multivariate probability distributions parametrized by a vector $\alpha$ of positive reals. It is the multivariate generalization of the beta distribution, and conjugate prior of the categorical distribution and multinomial distribution in Bayesian statistics. That is, its probability density function returns the belief that the probabilities of K rival events are $x_i$ given that each event has been observed $\alpha_i - 1$ times.

The support of the Dirichlet distribution (i.e. the set of values for which the density is non-zero) is the set of K-dimensional vectors $x$ whose entries are real numbers in the range (0,1) and sum to 1. These can be viewed as the probabilities of a K-way categorical event. Another way to express this is that the domain of the Dirichlet distribution is itself a set of probability distributions, specifically the K-dimensional discrete distributions. Note that the technical term for the set of points in the support of a K-dimensional Dirichlet distribution is the open standard (K − 1)-simplex, which is a generalization of a triangle, embedded in the next-higher dimension. For example, with K = 3, the support looks like an equilateral triangle embedded in a downward-angle fashion in three-dimensional space, with vertices at (1,0,0), (0,1,0) and (0,0,1), i.e. touching each of the coordinate axes at a point 1 unit away from the origin.

A very common special case is the symmetric Dirichlet distribution, where all of the elements making up the vector $\alpha$ have the same value. In this case, the distribution can be parametrized by a single scalar value $\alpha$, called the concentration parameter. When this value is 1, the symmetric Dirichlet distribution is equivalent to a uniform distribution over the open standard (K − 1)-simplex, i.e. it is uniform over all points in its support. Values of the concentration parameter above 1 prefer variates that are dense, evenly-distributed distributions, i.e. all probabilities returned are similar to each other. Values of the concentration parameter below 1 prefer sparse distributions, i.e. most of the probabilities returned will be close to 0, and the vast majority of the mass will be concentrated in a few of the probabilities.


Probability density function

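The Dirichlet distribution of order K ≥ 2 with parameters α1, ..., αK > 0 has a probability density function with respect to Lebesgue measure on the Euclidean space $\mathbb{R}^{K-1}$ given by

$$f(x_1, \dots, x_{K-1}; \alpha_1, \dots, \alpha_K) = \frac{1}{\mathrm{B}(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1}$$

for all x1, ..., xK–1 > 0 satisfying x1 + ... + xK–1 < 1, where xK is an abbreviation for 1 – x1 – ... – xK–1. The density is zero outside this open (K − 1)-dimensional simplex.

The normalizing constant is the multinomial beta function, which can be expressed in terms of the gamma function:

$$\mathrm{B}(\alpha) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{K} \alpha_i\right)}, \qquad \alpha = (\alpha_1, \dots, \alpha_K).$$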

Properties

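Let $X = (X_1, \dots, X_K) \sim \mathrm{Dir}(\alpha)$, meaning that the first K – 1 components have the above density and

$$X_K = 1 - X_1 - \cdots - X_{K-1}.$$

Define $\alpha_0 = \sum_{i=1}^{K} \alpha_i$. Then

$$\mathrm{E}[X_i] = \frac{\alpha_i}{\alpha_0}, \qquad \mathrm{Var}[X_i] = \frac{\alpha_i (\alpha_0 - \alpha_i)}{\alpha_0^2 (\alpha_0 + 1)};$$

in fact, the marginals are Beta distributions:

$$X_i \sim \mathrm{Beta}(\alpha_i,\, \alpha_0 - \alpha_i).$$

Furthermore, if $i \ne j$,

$$\mathrm{Cov}[X_i, X_j] = \frac{-\alpha_i \alpha_j}{\alpha_0^2 (\alpha_0 + 1)}$$

(note that the matrix so defined is singular). The mode of the distribution is the vector (x1, ..., xK) with

$$x_i = \frac{\alpha_i - 1}{\alpha_0 - K}, \qquad \alpha_i > 1.$$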

Conjugate to multinomial

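The Dirichlet distribution is conjugate to the multinomial distribution in the following sense: if

$$\beta \mid X = (\beta_1, \dots, \beta_K) \mid X \sim \mathrm{Mult}(X),$$

where βi is the number of occurrences of i in a sample of n points from the discrete distribution on {1, ..., K} defined by X, then

$$X \mid \beta \sim \mathrm{Dir}(\alpha + \beta).$$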

This relationship is used in Bayesian statistics to estimate the hidden parameters, X, of a categorical distribution (discrete probability distribution) given a collection of n samples. Intuitively, if the prior is represented as Dir(α), then Dir(α + β) is the posterior following a sequence of observations with histogram β.
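The update from prior Dir(α) to posterior Dir(α + β) is just element-wise addition of the observed counts. The sketch below makes that concrete; the function names, the uniform prior, and the six observations are assumptions invented for the example. The posterior mean uses the standard Dirichlet mean, each parameter divided by the sum of all parameters.

```python
from collections import Counter

def dirichlet_posterior(alpha, observations):
    """Add the histogram of observed categories (labelled 1..K) to the prior parameters."""
    counts = Counter(observations)
    return [a + counts.get(i, 0) for i, a in enumerate(alpha, start=1)]

def dirichlet_mean(alpha):
    """Mean of Dir(alpha): each parameter divided by the sum of all parameters."""
    total = sum(alpha)
    return [a / total for a in alpha]

prior = [1.0, 1.0, 1.0]            # symmetric prior over K = 3 categories
data = [1, 3, 3, 2, 3, 3]          # observed categories; histogram is (1, 1, 4)
posterior = dirichlet_posterior(prior, data)

print("posterior parameters:", posterior)
print("posterior mean:", dirichlet_mean(posterior))
```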


Entropy

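If X is a Dir(α) random variable, then we can use the exponential family differential identities to get an analytic expression for the expectation of $\log X_i$ and its associated covariance matrix. Writing $\alpha_0 = \sum_{i=1}^{K} \alpha_i$:

$$\mathrm{E}[\log X_i] = \psi(\alpha_i) - \psi(\alpha_0)$$

and

$$\mathrm{Cov}[\log X_i, \log X_j] = \psi'(\alpha_i)\, \delta_{ij} - \psi'(\alpha_0),$$

where $\psi$ is the digamma function, $\psi'$ is the trigamma function, and $\delta_{ij}$ is the Kronecker delta. The formula for $\mathrm{E}[\log X_i]$ yields the following formula for the information entropy of X:

$$H(X) = \log \mathrm{B}(\alpha) + (\alpha_0 - K)\, \psi(\alpha_0) - \sum_{j=1}^{K} (\alpha_j - 1)\, \psi(\alpha_j).$$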

Aggregation

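If $X = (X_1, \dots, X_K) \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K)$, then $X' = (X_1, \dots, X_i + X_j, \dots, X_K) \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_i + \alpha_j, \dots, \alpha_K)$. This aggregation property may be used to derive the marginal distribution of $X_i$ mentioned above.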

Neutrality

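If $X = (X_1, \dots, X_K) \sim \mathrm{Dir}(\alpha)$, then the vector X is said to be neutral[1] in the sense that $X_1$ is independent of $\left( \frac{X_2}{1 - X_1}, \dots, \frac{X_K}{1 - X_1} \right)$ and similarly for $X_2, \dots, X_{K-1}$.

Observe that any permutation of X is also neutral (a property not possessed by samples drawn from a generalized Dirichlet distribution).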

The derivation of the neutrality property:

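Let $(X_1, \dots, X_K) \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K)$. And let

$Y_1 = X_1$,   $Y_2 = \frac{X_2}{1 - X_1}$,   $Y_3 = \frac{X_3}{1 - X_1}$,   $\dots$,   $Y_{K-1} = \frac{X_{K-1}}{1 - X_1}$.

For the purpose of convenience, we set $Y_K = 1 - \sum_{i=2}^{K-1} Y_i$, which equals $X_K / (1 - X_1)$. Here we aim to derive that $(Y_2, \dots, Y_K)$ also follow a Dirichlet distribution as $\mathrm{Dir}(\alpha_2, \dots, \alpha_K)$.

We start the derivation with change of variables from $(X_1, \dots, X_{K-1})$ to $(Y_1, \dots, Y_{K-1})$. Since $x_1 = y_1$ and $x_i = y_i (1 - y_1)$ for $i = 2, \dots, K-1$, the Jacobian can be calculated easily:

$$\left| \frac{\partial (x_1, \dots, x_{K-1})}{\partial (y_1, \dots, y_{K-1})} \right| = (1 - y_1)^{K-2}.$$

Thus, the probability density function of $(Y_1, \dots, Y_{K-1})$ is the following:

$$f(y_1, \dots, y_{K-1}) = \frac{1}{\mathrm{B}(\alpha)}\, y_1^{\alpha_1 - 1} (1 - y_1)^{\sum_{i=2}^{K} \alpha_i - 1} \prod_{i=2}^{K} y_i^{\alpha_i - 1}.$$

From the above equation, it is clear that the derived probability density function is a joint distribution of two independent parts, a Beta distributed part (in $y_1$) and a Dirichlet distributed part (in $y_2, \dots, y_K$). Integrating out $y_1$ gives the stated result.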


Related distributions

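• If, for $i \in \{1, 2, \dots, K\}$,

$$Y_i \sim \mathrm{Gamma}(\text{shape} = \alpha_i,\ \text{scale} = 1) \text{ independently,}$$

then

$$V = \sum_{i=1}^{K} Y_i \sim \mathrm{Gamma}\left(\text{shape} = \sum_{i=1}^{K} \alpha_i,\ \text{scale} = 1\right),$$

and

$$(X_1, \dots, X_K) = (Y_1 / V, \dots, Y_K / V) \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K).$$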

Though the Xis are not independent from one another, they can be seen to be generated from a set of independent gamma random variables. Unfortunately, since the sum is lost in forming X, it is not possible to recover the original gamma random variables from these values alone. Nevertheless, because independent random variables are simpler to work with, this reparametrization can still be useful for proofs about properties of the Dirichlet distribution.

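The following is a derivation of the Dirichlet distribution from the Gamma distribution.

Let Yi, i = 1, 2, ..., K be independent variables following Gamma distributions with shape parameters αi and the same scale parameter θ,

$$Y_i \sim \mathrm{Gamma}(\alpha_i, \theta), \qquad i = 1, 2, \dots, K;$$

then the joint distribution of Yi, i = 1, 2, ..., K is

$$f(y_1, \dots, y_K) = \prod_{i=1}^{K} \frac{y_i^{\alpha_i - 1} e^{-y_i / \theta}}{\Gamma(\alpha_i)\, \theta^{\alpha_i}}.$$

Through the change of variables, set

$$\gamma = \sum_{i=1}^{K} y_i, \qquad x_i = \frac{y_i}{\gamma} \quad (i = 1, \dots, K-1).$$

Then, it's easy to derive that

$$y_i = \gamma x_i \quad (i = 1, \dots, K-1), \qquad y_K = \gamma \left(1 - \sum_{i=1}^{K-1} x_i\right).$$

Then, the Jacobian is

$$\left| \frac{\partial (y_1, \dots, y_K)}{\partial (\gamma, x_1, \dots, x_{K-1})} \right| = \gamma^{K-1}.$$

It means

$$f(\gamma, x_1, \dots, x_{K-1}) = \gamma^{K-1} f(y_1, \dots, y_K).$$

So, writing $\alpha_0 = \sum_{i=1}^{K} \alpha_i$ and $x_K = 1 - \sum_{i=1}^{K-1} x_i$,

$$f(\gamma, x_1, \dots, x_{K-1}) = \gamma^{\alpha_0 - 1} e^{-\gamma / \theta}\, \theta^{-\alpha_0} \prod_{i=1}^{K} \frac{x_i^{\alpha_i - 1}}{\Gamma(\alpha_i)}.$$

By integrating out γ, we can get the Dirichlet distribution as the following.

$$f(x_1, \dots, x_{K-1}) = \theta^{-\alpha_0} \prod_{i=1}^{K} \frac{x_i^{\alpha_i - 1}}{\Gamma(\alpha_i)} \int_{0}^{\infty} \gamma^{\alpha_0 - 1} e^{-\gamma / \theta}\, d\gamma.$$

According to the Gamma distribution,

$$\int_{0}^{\infty} \gamma^{\alpha_0 - 1} e^{-\gamma / \theta}\, d\gamma = \Gamma(\alpha_0)\, \theta^{\alpha_0}.$$

Finally, we get the following Dirichlet distribution

$$f(x_1, \dots, x_{K-1}) = \frac{\Gamma(\alpha_0)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \prod_{i=1}^{K} x_i^{\alpha_i - 1},$$

where $x_K$ is $(1 - x_1 - x_2 - \cdots - x_{K-1})$.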

Random number generation

Gamma distribution

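A fast method to sample a random vector $x = (x_1, \dots, x_K)$ from the K-dimensional Dirichlet distribution with parameters $(\alpha_1, \dots, \alpha_K)$ follows immediately from this connection. First, draw K independent random samples $y_1, \dots, y_K$ from gamma distributions each with density

$$\mathrm{Gamma}(\alpha_i, 1) = \frac{y_i^{\alpha_i - 1} e^{-y_i}}{\Gamma(\alpha_i)},$$

and then set

$$x_i = \frac{y_i}{\sum_{j=1}^{K} y_j}.$$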
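The gamma-based sampler can be sketched in a few lines with the standard library, whose `random.gammavariate(alpha, beta)` takes a shape and a scale. The function name `sample_dirichlet`, the seed, and the parameter vector (6, 2, 2) are choices made for this example. The sample means are compared with the Dirichlet means, each parameter divided by the parameter sum.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing independent Gamma(alpha_i, 1) draws."""
    ys = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(ys)
    return [y / total for y in ys]

rng = random.Random(42)
alpha = [6.0, 2.0, 2.0]
draws = [sample_dirichlet(alpha, rng) for _ in range(20000)]

# Component-wise sample means; expected values are 0.6, 0.2, 0.2.
means = [sum(d[i] for d in draws) / len(draws) for i in range(len(alpha))]
print("sample means:", [round(m, 3) for m in means])
```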

Marginal beta distributions

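A less efficient algorithm[2] relies on the univariate marginal and conditional distributions being beta and proceeds as follows. Simulate $x_1$ from a $\mathrm{Beta}\left(\alpha_1, \sum_{i=2}^{K} \alpha_i\right)$ distribution. Then simulate $x_2, \dots, x_{K-1}$ in order, as follows.

For $j = 2, \dots, K-1$, simulate $\phi_j$ from a $\mathrm{Beta}\left(\alpha_j, \sum_{i=j+1}^{K} \alpha_i\right)$ distribution, and let $x_j = \left(1 - \sum_{i=1}^{j-1} x_i\right) \phi_j$. Finally, set $x_K = 1 - \sum_{i=1}^{K-1} x_i$.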
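The sequential beta construction can be sketched as a loop that repeatedly breaks off a beta-distributed fraction of whatever mass remains. This assumes the usual form of the algorithm, in which component j takes a Beta(α_j, sum of the later α's) fraction of the remainder and the last component receives what is left; the function name and test parameters are invented for the example.

```python
import random

def sample_dirichlet_beta(alpha, rng):
    """Draw one Dirichlet(alpha) vector from successive beta-distributed fractions."""
    xs = []
    remaining = 1.0                      # mass not yet assigned to a component
    for j in range(len(alpha) - 1):
        phi = rng.betavariate(alpha[j], sum(alpha[j + 1:]))
        xs.append(remaining * phi)       # take a fraction phi of what is left
        remaining -= xs[-1]
    xs.append(remaining)                 # the last component gets the rest
    return xs

rng = random.Random(3)
alpha = [2.0, 3.0, 4.0]
draws = [sample_dirichlet_beta(alpha, rng) for _ in range(20000)]
means = [sum(d[i] for d in draws) / len(draws) for i in range(len(alpha))]
print("sample means:", [round(m, 3) for m in means])
```

With parameters (2, 3, 4) the component means should be near 2/9, 3/9 and 4/9.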


Intuitive interpretations of the parameters

String cutting

One example use of the Dirichlet distribution is if one wanted to cut strings (each of initial length 1.0) into K pieces with different lengths, where each piece had a designated average length, but allowing some variation in the relative sizes of the pieces. The α/α0 values specify the mean lengths of the cut pieces of string resulting from the distribution. The variance around this mean varies inversely with α0.

Pólya's urn

Consider an urn containing balls of K different colors. Initially, the urn contains α1 balls of color 1, α2 balls of color 2, and so on. Now perform N draws from the urn, where after each draw, the ball is placed back into the urn with an additional ball of the same color. In the limit as N approaches infinity, the proportions of different colored balls in the urn will be distributed as Dir(α1,...,αK).[3]

For a formal proof, note that the proportions of the different colored balls form a bounded $[0,1]^K$-valued martingale, hence by the martingale convergence theorem, these proportions converge almost surely and in mean to a limiting random vector. To see that this limiting vector has the above Dirichlet distribution, check that all mixed moments agree.

Note that each draw from the urn modifies the probability of drawing a ball of any one color from the urn in the future. This modification diminishes with the number of draws, since the relative effect of adding a new ball to the urn diminishes as the urn accumulates increasing numbers of balls. This "diminishing returns" effect can also help explain how large α values yield Dirichlet distributions with most of the probability mass concentrated around a single point on the simplex.


See also

• Beta distribution

• Binomial distribution

• Categorical distribution

• Generalized Dirichlet distribution

• Latent Dirichlet allocation

• Dirichlet process

• Multinomial distribution

• Multivariate Polya distribution

References

[1] Connor, Robert J.; Mosimann, James E (1969). "Concepts of Independence for Proportions with a Generalization of the Dirichlet Distribution" (http://jstor.org/stable/2283728). Journal of the American statistical association (American Statistical Association) 64 (325): 194–206. doi:10.2307/2283728. .

[2] A. Gelman and J. B. Carlin and H. S. Stern and D. B. Rubin (2003). Bayesian Data Analysis (2nd ed.). pp. 582. ISBN 1-58488-388-X.

[3] Blackwell, David; MacQueen, James B. (1973). "Ferguson distributions via Polya urn schemes". Ann. Stat. 1 (2): 353–355. doi:10.1214/aos/1176342372.

External links

• Dirichlet Distribution (http://www.cis.hut.fi/ahonkela/dippa/node95.html)

• Estimating the parameters of the Dirichlet distribution (http://research.microsoft.com/~minka/papers/dirichlet/minka-dirichlet.pdf)


F-distribution

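Fisher-Snedecor

[Figure: Probability density function]

[Figure: Cumulative distribution function]

parameters: $d_1 > 0,\ d_2 > 0$ deg. of freedom
support: $x \in [0, +\infty)$
pdf: $\frac{\sqrt{\frac{(d_1 x)^{d_1}\, d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}}{x\, \mathrm{B}\left(\frac{d_1}{2}, \frac{d_2}{2}\right)}$
cdf: $I_{\frac{d_1 x}{d_1 x + d_2}}\left(\frac{d_1}{2}, \frac{d_2}{2}\right)$
mean: $\frac{d_2}{d_2 - 2}$ for $d_2 > 2$
mode: $\frac{d_1 - 2}{d_1} \cdot \frac{d_2}{d_2 + 2}$ for $d_1 > 2$
variance: $\frac{2\, d_2^2\, (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)}$ for $d_2 > 4$
skewness: $\frac{(2 d_1 + d_2 - 2) \sqrt{8 (d_2 - 4)}}{(d_2 - 6) \sqrt{d_1 (d_1 + d_2 - 2)}}$ for $d_2 > 6$
ex.kurtosis: see text
mgf: does not exist, raw moments defined elsewhere[1][2]
cf: defined elsewhere[1][2]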

In probability theory and statistics, the F-distribution is a continuous probability distribution.[1][2][3][4] It is also known as Snedecor's F distribution or the Fisher-Snedecor distribution (after R.A. Fisher and George W. Snedecor). The F-distribution arises frequently as the null distribution of a test statistic, especially in likelihood-ratio tests, perhaps most notably in the analysis of variance; see F-test.

Characterization

A random variate of the F-distribution arises as the ratio of two chi-squared variates, each divided by its degrees of freedom:

$$\frac{U_1 / d_1}{U_2 / d_2},$$

where

• U1 and U2 have chi-square distributions with d1 and d2 degrees of freedom respectively, and

• U1 and U2 are independent (see Cochran's theorem for an application).
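The characterization can be checked by simulation. The sketch below draws each chi-square variate as a gamma variate with shape k/2 and scale 2, forms the ratio of the two variates after dividing each by its degrees of freedom, and compares the sample mean with the known F mean d2/(d2 − 2). The helper names, the seed, and the choice d1 = 5, d2 = 20 are assumptions for the example.

```python
import random
import statistics

def chi_square_draw(k, rng):
    """Chi-square(k) variate drawn as Gamma(shape k/2, scale 2)."""
    return rng.gammavariate(k / 2.0, 2.0)

def f_draw(d1, d2, rng):
    """F(d1, d2) variate: ratio of independent chi-squares over their degrees of freedom."""
    return (chi_square_draw(d1, rng) / d1) / (chi_square_draw(d2, rng) / d2)

rng = random.Random(99)
d1, d2 = 5, 20
draws = [f_draw(d1, d2, rng) for _ in range(100000)]

mean_hat = statistics.fmean(draws)
print(f"sample mean {mean_hat:.4f}, theoretical mean {d2 / (d2 - 2):.4f}")
```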

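The probability density function of an F(d1, d2) distributed random variable is given by

$$f(x; d_1, d_2) = \frac{\sqrt{\dfrac{(d_1 x)^{d_1}\, d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}}{x\, \mathrm{B}\left(\dfrac{d_1}{2}, \dfrac{d_2}{2}\right)}$$

for real x ≥ 0, where d1 and d2 are positive integers, and B is the beta function. The cumulative distribution function is

$$F(x; d_1, d_2) = I_{\frac{d_1 x}{d_1 x + d_2}}\left(\frac{d_1}{2}, \frac{d_2}{2}\right),$$

where I is the regularized incomplete beta function.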

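The expectation, variance, and other details about the F(d1, d2) distribution are given in the sidebox; for $d_2 > 8$, the excess kurtosis is

$$\gamma_2 = 12\, \frac{d_1 (5 d_2 - 22)(d_1 + d_2 - 2) + (d_2 - 4)(d_2 - 2)^2}{d_1 (d_2 - 6)(d_2 - 8)(d_1 + d_2 - 2)}.$$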

where

The F-distribution is a particular parametrization of the beta prime distribution, which is also called the beta distribution of the second kind.


Generalization

A generalization of the (central) F-distribution is the noncentral F-distribution.

Related distributions and properties

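• If $X \sim \mathrm{F}(d_1, d_2)$ then $Y = \lim_{d_2 \to \infty} d_1 X$ has the chi-square distribution $\chi^2_{d_1}$.

• $\mathrm{F}(d_1, d_2)$ is equivalent to the scaled Hotelling's T-square distribution $\frac{d_2}{d_1 (d_1 + d_2 - 1)}\, T^2(d_1,\, d_1 + d_2 - 1)$.

• If $X \sim \mathrm{F}(d_1, d_2)$ then $\frac{1}{X} \sim \mathrm{F}(d_2, d_1)$.

• If $X \sim t(n)$ has a Student's t-distribution then $X^2 \sim \mathrm{F}(1, n)$.

• If $X \sim \mathrm{F}(d_1, d_2)$ and $Y = \frac{d_1 X / d_2}{1 + d_1 X / d_2}$ then $Y \sim \mathrm{Beta}(d_1/2,\, d_2/2)$ has a Beta-distribution.

• If $Q_X(p)$ is the quantile $p$ for $X \sim \mathrm{F}(d_1, d_2)$ and $Q_Y(p)$ is the quantile $p$ for $Y \sim \mathrm{F}(d_2, d_1)$ then $Q_X(p) = \frac{1}{Q_Y(1 - p)}$.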

References

[1] Johnson, Norman Lloyd; Samuel Kotz, N. Balakrishnan (1995). Continuous Univariate Distributions, Volume 2 (Second Edition, Section 27). Wiley. ISBN 0-471-58494-0.

[2] Abramowitz, Milton; Stegun, Irene A., eds. (1965), "Chapter 26" (http://www.math.sfu.ca/~cbm/aands/page_946.htm), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, New York: Dover, pp. 946, MR0167642, ISBN 978-0486612720, .

[3] NIST (2006). Engineering Statistics Handbook - F Distribution (http://www.itl.nist.gov/div898/handbook/eda/section3/eda3665.htm)

[4] Mood, Alexander; Franklin A. Graybill, Duane C. Boes (1974). Introduction to the Theory of Statistics (Third Edition, p. 246-249). McGraw-Hill. ISBN 0-07-042864-6.

External links

• Table of critical values of the F-distribution (http://www.itl.nist.gov/div898/handbook/eda/section3/eda3673.htm)

• Earliest Uses of Some of the Words of Mathematics: entry on F-distribution contains a brief history (http://jeff560.tripod.com/f.html)


Gamma distribution

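Gamma

[Figure: Probability density function]

[Figure: Cumulative distribution function]

parameters: $k > 0$ shape, $\theta > 0$ scale
support: $x \in [0, \infty)$
pdf: $\frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\, \theta^k}$
cdf: $\frac{\gamma(k,\, x/\theta)}{\Gamma(k)}$
mean: $k\theta$
median: no simple closed form
mode: $(k - 1)\theta$ for $k \ge 1$
variance: $k\theta^2$
skewness: $2/\sqrt{k}$
ex.kurtosis: $6/k$
entropy: $k + \ln\theta + \ln\Gamma(k) + (1 - k)\psi(k)$
mgf: $(1 - \theta t)^{-k}$ for $t < 1/\theta$
cf: $(1 - \theta i t)^{-k}$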

In probability theory and statistics, the gamma distribution is a two-parameter family of continuous probability distributions. It has a scale parameter θ and a shape parameter k. If k is an integer, then the distribution represents an Erlang distribution, i.e., the sum of k independent exponentially distributed random variables, each of which has a mean of θ (which is equivalent to a rate parameter of $\theta^{-1}$).

The gamma distribution is frequently a probability model for waiting times; for instance, in life testing, the waiting time until death is a random variable that is frequently modeled with a gamma distribution.[1] Gamma distributions were fitted to rainfall amounts from different storms, and differences in amounts from seeded and unseeded storms were reflected in differences in estimated k and θ parameters.[2]

Characterization

A random variable X that is gamma-distributed with scale θ and shape k is denoted

$$X \sim \Gamma(k, \theta).$$

Probability density function

The probability density function of the gamma distribution can be expressed in terms of the gamma function, parameterized by a shape parameter k and a scale parameter θ. Both k and θ are positive.

The equation defining the probability density function of a gamma-distributed random variable x is

$$f(x; k, \theta) = \frac{x^{k-1} e^{-x/\theta}}{\theta^{k}\,\Gamma(k)} \quad \text{for } x > 0 \text{ and } k, \theta > 0.$$

(This parameterization is used in the infobox and the plots.)

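Alternatively, the gamma distribution can be parameterized in terms of a shape parameter α = k and an inverse scale parameter β = 1/θ, called a rate parameter:

$$g(x; \alpha, \beta) = \frac{\beta^{\alpha}\, x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)} \quad \text{for } x > 0.$$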

If α is a positive integer, then

$$\Gamma(\alpha) = (\alpha - 1)!$$

Both parametrizations are common because either can be more convenient depending on the situation.
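As a minimal sketch of how the two parameterizations relate, the following Python functions evaluate the density in the shape/scale form and in the shape/rate form. The function names `gamma_pdf` and `gamma_pdf_rate` are illustrative only, and the sketch assumes the standard density x^(k−1) e^(−x/θ) / (θ^k Γ(k)) on x > 0, returning 0 elsewhere.

```python
import math

def gamma_pdf(x, k, theta):
    """Density of the gamma distribution with shape k and scale theta."""
    if k <= 0 or theta <= 0:
        raise ValueError("k and theta must be positive")
    if x <= 0:
        # This sketch treats the support as x > 0.
        return 0.0
    # Work in log space so that a large k does not overflow Gamma(k).
    log_density = ((k - 1) * math.log(x) - x / theta
                   - k * math.log(theta) - math.lgamma(k))
    return math.exp(log_density)

def gamma_pdf_rate(x, alpha, beta):
    """Same density, parameterized by shape alpha and rate beta = 1/theta."""
    return gamma_pdf(x, alpha, 1.0 / beta)

# Shape 3 with scale 2 is the same distribution as shape 3 with rate 0.5.
print(gamma_pdf(2.0, 3.0, 2.0), gamma_pdf_rate(2.0, 3.0, 0.5))
```

Converting the rate form to the scale form in one place keeps a single formula to maintain; which form is more convenient depends on the application.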

Illustration of the Gamma PDF for parameter values over k and x with θ set to 1, 2, 3, 4, 5 and 6. One can see each θ layer by itself here [3] as well as by k [4] and x [5].

Cumulative distribution function

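The cumulative distribution function is the regularized gamma function:

$$F(x; k, \theta) = \int_0^x f(u; k, \theta)\,du = \frac{\gamma(k, x/\theta)}{\Gamma(k)},$$

where $\gamma(k, x/\theta)$ is the lower incomplete gamma function.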

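It can also be expressed as follows, if k is a positive integer (i.e., the distribution is an Erlang distribution):[6]

$$F(x; k, \theta) = 1 - \sum_{i=0}^{k-1} \frac{(x/\theta)^{i}}{i!}\, e^{-x/\theta} = e^{-x/\theta} \sum_{i=k}^{\infty} \frac{(x/\theta)^{i}}{i!}.$$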
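For integer shape, the closed form 1 − e^(−x/θ) Σ_{i<k} (x/θ)^i / i! can be checked against a direct numerical integral of the density. The sketch below is illustrative rather than a reference implementation: the names `erlang_cdf` and `gamma_cdf_numeric` are hypothetical, and the midpoint rule is used only as a simple sanity check.

```python
import math

def erlang_cdf(x, k, theta):
    """CDF of the gamma distribution for positive integer shape k (Erlang case)."""
    if x <= 0:
        return 0.0
    z = x / theta
    term = 1.0   # holds z**i / i!, starting at i = 0
    total = 1.0  # running sum of the terms for i = 0 .. k-1
    for i in range(1, k):
        term *= z / i
        total += term
    return 1.0 - math.exp(-z) * total

def gamma_cdf_numeric(x, k, theta, steps=10_000):
    """Midpoint-rule integral of the density from 0 to x (sanity check only)."""
    h = x / steps
    acc = 0.0
    for j in range(steps):
        u = (j + 0.5) * h
        acc += u ** (k - 1) * math.exp(-u / theta)
    return acc * h / (theta ** k * math.gamma(k))

# The two values should agree closely for integer k.
print(erlang_cdf(4.0, 3, 2.0), gamma_cdf_numeric(4.0, 3, 2.0))
```

For k = 1 the sum has a single term and the expression reduces to the exponential CDF 1 − e^(−x/θ).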

Properties

Summation

If $X_i$ has a $\Gamma(k_i, \theta)$ distribution for i = 1, 2, ..., N, then

$$\sum_{i=1}^{N} X_i \sim \Gamma\!\left(\sum_{i=1}^{N} k_i,\ \theta\right),$$

provided all $X_i$ are independent.
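The summation property states that independent gamma variables with shapes k_i and a common scale θ add up to a gamma variable with shape Σ k_i and the same scale. A rough Monte Carlo sketch can illustrate this by comparing the sample mean and variance of the sum with (Σ k_i)θ and (Σ k_i)θ². It checks only the first two moments, not the full distribution; the helper name `simulate_gamma_sum`, the sample size, and the seed are arbitrary choices. Python's `random.gammavariate(alpha, beta)` takes a shape and a scale, matching the (k, θ) convention.

```python
import random
import statistics

def simulate_gamma_sum(shapes, theta, n=50_000, seed=0):
    """Draw n samples of X_1 + ... + X_N with X_i ~ Gamma(shape k_i, scale theta)."""
    rng = random.Random(seed)
    # gammavariate(alpha, beta): alpha is the shape, beta is the scale.
    return [sum(rng.gammavariate(k, theta) for k in shapes) for _ in range(n)]

shapes, theta = [1.5, 2.5], 2.0
samples = simulate_gamma_sum(shapes, theta)
k_total = sum(shapes)

# Sample moments next to the moments predicted for Gamma(k_total, theta).
print("mean:    ", statistics.fmean(samples), "expected", k_total * theta)
print("variance:", statistics.variance(samples), "expected", k_total * theta ** 2)
```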

The gamma distribution exhibits infinite divisibility.

Scaling

If

$$X \sim \Gamma(k, \theta),$$

then for any α > 0,

$$\alpha X \sim \Gamma(k, \alpha\theta).$$
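The scaling property, read here as "if X ~ Γ(k, θ) then αX ~ Γ(k, αθ) for α > 0", follows from a change of variables. A short derivation, assuming the shape/scale density f(x; k, θ) = x^(k−1) e^(−x/θ) / (θ^k Γ(k)) and writing Y = αX:

```latex
\begin{align*}
f_Y(y) &= \frac{1}{\alpha}\, f\!\left(\frac{y}{\alpha};\, k, \theta\right)
        = \frac{1}{\alpha} \cdot
          \frac{(y/\alpha)^{k-1}\, e^{-y/(\alpha\theta)}}{\theta^{k}\,\Gamma(k)} \\
       &= \frac{y^{k-1}\, e^{-y/(\alpha\theta)}}{(\alpha\theta)^{k}\,\Gamma(k)}
        = f(y;\, k, \alpha\theta), \qquad y > 0 .
\end{align*}
```

The shape k is unchanged and only the scale is multiplied by α, which is why θ is called a scale parameter.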

Exponential family

The gamma distribution is a two-parameter exponential family with natural parameters k − 1 and −1/θ, and corresponding natural statistics ln(X) and X.
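The exponential-family structure can be made explicit by taking the logarithm of the density. The following is a sketch assuming the shape/scale density given earlier; the symbols η₁, η₂, T₁, T₂ and A are notation introduced here for illustration.

```latex
\begin{align*}
f(x; k, \theta)
  &= \exp\!\big[(k-1)\ln x \;-\; x/\theta \;-\; k\ln\theta \;-\; \ln\Gamma(k)\big] \\
  &= \exp\!\big[\eta_1 T_1(x) + \eta_2 T_2(x) - A(\eta_1, \eta_2)\big],
\end{align*}
\begin{align*}
\eta_1 &= k - 1, & T_1(x) &= \ln x, \\
\eta_2 &= -1/\theta, & T_2(x) &= x, \\
A(\eta_1, \eta_2) &= \ln\Gamma(\eta_1 + 1) - (\eta_1 + 1)\ln(-\eta_2).
\end{align*}
```

Here the natural parameter k − 1 pairs with the statistic ln x, and −1/θ pairs with x.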

Information entropy

The information entropy is given by

$$k + \ln\theta + \ln\Gamma(k) + (1 - k)\,\psi(k),$$

where ψ(k) is the digamma function.
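As a numerical sanity check, the closed form k + ln θ + ln Γ(k) + (1 − k)ψ(k) can be compared with a direct estimate of −∫ f ln f. The sketch below makes several assumptions: the function names are hypothetical, the digamma function is approximated by a central difference of `math.lgamma` because the Python standard library has no digamma, and the integral is truncated at a finite upper limit and evaluated with the midpoint rule.

```python
import math

def digamma(k, h=1e-5):
    """Central-difference approximation to psi(k) = d/dk ln Gamma(k)."""
    return (math.lgamma(k + h) - math.lgamma(k - h)) / (2 * h)

def gamma_entropy(k, theta):
    """Closed-form differential entropy in nats, shape k and scale theta."""
    return k + math.log(theta) + math.lgamma(k) + (1 - k) * digamma(k)

def gamma_entropy_numeric(k, theta, upper=100.0, steps=100_000):
    """Midpoint-rule estimate of -integral of f ln f over (0, upper)."""
    h = upper / steps
    acc = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        log_f = ((k - 1) * math.log(x) - x / theta
                 - k * math.log(theta) - math.lgamma(k))
        acc -= math.exp(log_f) * log_f
    return acc * h

# The two estimates should agree closely for moderate k and theta.
print(gamma_entropy(3.0, 2.0), gamma_entropy_numeric(3.0, 2.0))
```

For k = 1 the digamma term drops out and the expression reduces to 1 + ln θ, the entropy of an exponential distribution with mean θ.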

One can also show that (if we use the shape parameter k and the inverse scale parameter β),

References
