Sampling a Probability Density Function - Visuelle Analyse von hochdimensionalen Räumen

Using our framework, the user may define an arbitrary one- or two-dimensional probability density function (PDF) as input by simply drawing it on the screen. This input is then interpreted as a finite list of values representing a PDF. Directly interpolating these values into a continuous function and sampling that function with common statistical methods, however, can be computationally expensive depending on the given function. For that reason, we develop an alternative algorithm that allows us to quickly sample any function the user may define as input.

5.5.1 Input Functions

Before we can assume that the finite list of discrete values that the user provides as input represents an actual probability density function ρ, we have to show that the input, after interpolation, adheres to the definition of a PDF.

First, the user interface is designed in such a way as to only allow values greater than or equal to 0, thus fulfilling:

$$\rho(x) \ge 0, \quad x \in \mathbb{R}. \tag{5.1}$$

Second, the input is limited to a user-defined range, given by a lower bound α and an upper bound ω. Our system further assumes that all values beyond that range are 0. This yields:

$$\lim_{x \to -\infty} \rho(x) = \lim_{x \to \infty} \rho(x) = 0. \tag{5.2}$$

Finally, a user-drawn function is considered to be implicitly normalized:

$$\int_{-\infty}^{\infty} \rho(x)\,dx = 1. \tag{5.3}$$

(5.1), (5.2) and (5.3) together define a continuous PDF [BSMM01, p.772].

We therefore know that the given input, after interpolation, adheres to the mathematical definition of a continuous PDF.
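The three conditions can be enforced on a drawn list of values with a few lines of code. The following is a minimal sketch, not the framework's actual implementation: the function name `normalize_drawn_values` is hypothetical, and it assumes that "implicitly normalized" means rescaling the list so that its entries sum to 1, with each entry read as the probability mass of one array cell.

```python
def normalize_drawn_values(values):
    """Turn raw drawn values into non-negative cell masses that sum to 1.

    Assumption: each entry is treated as the probability mass of one cell,
    so normalizing means dividing by the sum of the clamped values.
    """
    # Condition (5.1): no negative densities.
    clamped = [max(0.0, float(v)) for v in values]
    total = sum(clamped)
    if total == 0.0:
        raise ValueError("the drawn function must be positive somewhere")
    # Condition (5.3): total probability mass of 1.
    return [v / total for v in clamped]


rho_arr = normalize_drawn_values([1, 3, -2, 4])
print(rho_arr)  # [0.125, 0.375, 0.0, 0.5]
```

Condition (5.2) needs no code here, since the list only covers the user-defined range and everything outside it is taken to be 0.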

5.5.2 Sampling a Discrete Function

When generating random samples from a continuous PDF ρ, we make use of ρ’s cumulative probability function P. A cumulative probability function [Bul79, pp.36] is defined as

$$P(t) := \int_{-\infty}^{t} \rho(x)\,dx, \quad \forall\, t, x \in \mathbb{R}.$$

Since, in our case, the continuous function ρ is represented by a list of l discrete values ρ_arr, we can calculate P_arr[x] as

$$P_{arr}[x] := \sum_{i=0}^{x} \rho_{arr}[i], \quad \forall\, x \in [0, l-1] \cap \mathbb{N}.$$
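The discrete cumulative function is simply a running sum. A minimal sketch in Python, assuming the input list already sums to 1 (the function name `cumulative_array` is hypothetical):

```python
from itertools import accumulate


def cumulative_array(rho_arr):
    """Return P_arr, where P_arr[x] is the sum of rho_arr[0..x] inclusive."""
    return list(accumulate(rho_arr))


rho_arr = [0.125, 0.375, 0.0, 0.5]
p_arr = cumulative_array(rho_arr)
print(p_arr)  # [0.125, 0.5, 0.5, 1.0]
```

The result is non-decreasing, and its last entry equals the total probability mass.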

Generating a random variable r with distribution ρ_arr is equivalent to selecting an array index i ∈ [0, l − 1] and mapping that index onto [α, ω]:

$$r := \alpha + i\,(\omega - \alpha),$$

where α and ω are the lower and upper bounds, respectively.
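As printed, the mapping multiplies the raw index by the width of the range, so it only lands in [α, ω] if i is first rescaled to [0, 1]. The sketch below makes that rescaling explicit by dividing by l − 1. This divisor and the function name `index_to_value` are assumptions rather than something the text states; dividing by l, or mapping to cell centres, would be equally plausible readings.

```python
def index_to_value(i, l, alpha, omega):
    """Map an array index i in [0, l - 1] linearly onto [alpha, omega].

    Assumption: index 0 maps to alpha and index l - 1 maps to omega.
    Real-valued indices are accepted as well.
    """
    if l < 2:
        raise ValueError("need at least two array cells")
    return alpha + (i / (l - 1)) * (omega - alpha)


print(index_to_value(0, 5, -2.0, 2.0))  # -2.0
print(index_to_value(2, 5, -2.0, 2.0))  # 0.0
print(index_to_value(4, 5, -2.0, 2.0))  # 2.0
```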

Since most basic random number generators can only generate uniformly distributed random variables, we cannot directly reproduce the arbitrary distribution described by ρ_arr. Therefore, we have to map these uniformly distributed random variables onto a weighted representation of ρ_arr. This weighted representation is the cumulative probability function P_arr of ρ_arr.

Given a uniformly distributed random variable λ ∈ [0, 1), we select an index i so that

$$P_{arr}[i] \le \lambda \le P_{arr}[i+1]. \tag{5.4}$$

Figure 5.8: An index i is selected according to the position of a random variable λ in the cumulative probability function P_arr.
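The lookup in (5.4) can be done with a binary search over the cumulative array. The sketch below is one possible reading, not the thesis's own code: with an inclusive running sum, the cell whose probability mass covers λ is the first index whose cumulative value exceeds λ, and that is the index returned here. The function name `select_index` is hypothetical.

```python
import random
from bisect import bisect_right


def select_index(p_arr, lam):
    """Return the first index j with p_arr[j] > lam (inverse transform lookup).

    Assumption: p_arr is an inclusive, non-decreasing running sum ending at 1,
    so index j is returned with probability rho_arr[j].
    """
    j = bisect_right(p_arr, lam)
    # Guard against a last entry slightly below 1 due to rounding.
    return min(j, len(p_arr) - 1)


p_arr = [0.125, 0.5, 0.5, 1.0]       # running sum of [0.125, 0.375, 0.0, 0.5]
print(select_index(p_arr, 0.1))      # 0
print(select_index(p_arr, 0.3))      # 1
print(select_index(p_arr, 0.5))      # 3 (the zero-mass cell 2 is skipped)

# Drawing lam from a uniform generator yields indices distributed like rho_arr.
i = select_index(p_arr, random.random())
```

Binary search keeps each lookup at O(log l), which matters when many samples are drawn from a large array.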

5.5.3 Sampling an Interpolated Function

A set of random data samples generated from a discrete array of size l can only have one distinct data sample per array cell. In order to approximate subcell accuracy, we have to use interpolation.

It is important to note that in this case interpolation is not a trivial problem. We are not looking to re-create any function values of ρ, but are instead searching for a way to manipulate the discrete random data samples generated according to ρ_arr. Our goal is to match the distribution of the generated data samples to the continuous PDF ρ as closely as possible.

An intuitive approach to this is to add an offset, using a uniformly distributed random variable λ ∈ [0, 1], to the selected index i in order to generate a new index i_cont ∈ ℝ:

$$i_{cont} := i + \lambda - \tfrac{1}{2}.$$

This successfully closes definition gaps in our distribution, allowing us to generate any number of values for each array cell. However, the accuracy of this method can still be improved.
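In code, the uniform offset is a one-liner. The sketch below (with the hypothetical name `jitter_uniform`) spreads each selected index evenly over its own cell, which amounts to sampling a piecewise-constant version of the drawn function:

```python
import random


def jitter_uniform(i, lam):
    """Shift index i by a uniform offset in [-0.5, 0.5], given lam in [0, 1]."""
    return i + lam - 0.5


print(jitter_uniform(3, 0.0))  # 2.5
print(jitter_uniform(3, 0.5))  # 3.0
print(jitter_uniform(3, 1.0))  # 3.5

# Typical use: draw lam from a uniform generator for every sample.
i_cont = jitter_uniform(3, random.random())
```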

We found that a more precise interpolation can be achieved by emulating a technique that is commonly used in the field of signal processing. When reconstructing a continuous signal from an equidistant discrete sampling, convolving the samples with a tent filter is known to yield a result akin to linear interpolation.

Consider a triangular PDF t [Sau00]:

$$t(x) = \begin{cases} \max(0,\, x + 1), & \text{if } x \le 0 \\ \max(0,\, 1 - x), & \text{else.} \end{cases}$$

Figure 5.9: Convolution of ρ_arr (black) and t (yellow) results in a linearly interpolated function (green).

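Using an algorithm to generate random data samples from a triangular distribution [Sau00], we can calculate the interpolated index i_cont as:

$$i_{cont} := i + \begin{cases} -1 + \sqrt{2\lambda}, & \text{if } \lambda \le 0.5 \\ 1 - \sqrt{2(1 - \lambda)}, & \text{else} \end{cases}, \quad \forall\, \lambda \in [0, 1).$$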
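The offset is the inverse cumulative function of the tent density on [−1, 1]: the lower branch adds the square root to −1 and the upper branch subtracts it from 1, so the result always stays within one cell of the selected index. A minimal sketch, with the hypothetical function name `triangular_offset`:

```python
import math
import random


def triangular_offset(lam):
    """Map a uniform lam in [0, 1) to an offset in [-1, 1) with tent density."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    if lam <= 0.5:
        return -1.0 + math.sqrt(2.0 * lam)
    return 1.0 - math.sqrt(2.0 * (1.0 - lam))


print(triangular_offset(0.0))    # -1.0
print(triangular_offset(0.125))  # -0.5
print(triangular_offset(0.5))    # 0.0
print(triangular_offset(0.875))  # 0.5

# Interpolated index for a previously selected cell i = 3:
i_cont = 3 + triangular_offset(random.random())
```

Offsets near 0 are the most likely and offsets near ±1 the least likely, which is what produces the linear blend between neighbouring cells.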

In order to calculate the offset of our index, we overlay t onto ρ at the selected index i. This generates a linear probability gradient between i and its neighboring indices. In Figure 5.9 we can see that performing this operation for all indices is equivalent to a convolution of ρ and t. The resulting function, marked in green, is a continuous linear interpolation of ρ_arr. In consequence, the distribution described by the randomly generated data samples equals a linearly interpolated sampling of ρ_arr (Figure 5.10).

Figure 5.10: Linear interpolation of ρ_arr (green) as an approximation of what the user may have intended to draw (red).
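Putting the pieces of this section together, a complete sampler might look like the sketch below. It is an illustration under stated assumptions, not the thesis's implementation: the function name `sample_continuous_indices` is hypothetical, the cell is chosen as the first index whose cumulative value exceeds λ, and samples that fall up to one cell outside the array at either end are returned as-is, since the text does not say how boundaries are handled.

```python
import math
import random
from bisect import bisect_right
from itertools import accumulate


def sample_continuous_indices(rho_arr, n, seed=None):
    """Draw n real-valued indices following a linear interpolation of rho_arr."""
    rng = random.Random(seed)
    total = sum(rho_arr)
    # Normalized inclusive running sum (discrete cumulative function).
    p_arr = [c / total for c in accumulate(rho_arr)]
    last = len(rho_arr) - 1
    samples = []
    for _ in range(n):
        # Step 1: pick a cell by inverse transform lookup.
        i = min(bisect_right(p_arr, rng.random()), last)
        # Step 2: add a tent-distributed offset in [-1, 1).
        lam = rng.random()
        if lam <= 0.5:
            offset = -1.0 + math.sqrt(2.0 * lam)
        else:
            offset = 1.0 - math.sqrt(2.0 * (1.0 - lam))
        samples.append(i + offset)
    return samples


# A single spike at index 2 turns into a tent spanning indices 1 to 3.
samples = sample_continuous_indices([0, 0, 1, 0, 0], 20000, seed=0)
print(min(samples) >= 1.0, max(samples) <= 3.0)  # True True
```

Each sample costs one binary search and two uniform random numbers, independent of the shape of the drawn function.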

This method is not limited to the one-dimensional case; it can be extended analogously to n dimensions.
