Hypothesis Testing for Novelty Detection - System Identification Methods

2.2 System Identification Methods

2.3.1 Hypothesis Testing for Novelty Detection

2.3.1.1 Univariate Damage Features

The probability density function (PDF) of the normally distributed variable x ∈ X can be described by the mean value µ and the standard deviation σ, or the variance σ², as

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}} \tag{2.54}$$

where

$$\mu = E[x] = \sum_{x \in X} x\, p(x) \tag{2.55}$$

$$\sigma^2 = \mathrm{Var}[x] = E[(x-\mu)^2] = \sum_{x \in X} (x-\mu)^2\, p(x) \tag{2.56}$$
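As a minimal sketch, the normal PDF of equation (2.54) can be evaluated directly and compared against the `NormalDist` class from Python's standard `statistics` module. The function name `normal_pdf` and the numerical values of µ and σ are illustrative assumptions, not taken from the source.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Evaluate the normal PDF of equation (2.54)."""
    exponent = -0.5 * (x - mu) ** 2 / sigma ** 2
    return math.exp(exponent) / (sigma * math.sqrt(2.0 * math.pi))

# Illustrative values (assumed, not from the source).
mu, sigma = 10.0, 2.0
reference = NormalDist(mu, sigma)

for x in (6.0, 10.0, 13.0):
    # Both columns should agree up to floating-point rounding.
    print(x, normal_pdf(x, mu, sigma), reference.pdf(x))
```

The density is largest at x = µ and falls off symmetrically, which is the clustering behaviour the following paragraph quantifies.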

Normally distributed data tend to cluster around the mean. Numerically, the probabilities obey

P[|x − µ| ≤ σ] ≈ 0.68 (2.57a)

P[|x − µ| ≤ 2σ] ≈ 0.95 (2.57b)

P[|x − µ| ≤ 3σ] ≈ 0.997 (2.57c)

A natural measure of the distance from x to the mean µ is the distance |x − µ| measured in units of standard deviation

$$r = \frac{|x-\mu|}{\sigma} \tag{2.58}$$

A standardized normal variable $r_n = (x-\mu)/\sigma$ has zero mean and unit standard deviation, so the normalized PDF can be written as

$$p(r_n) = \frac{1}{\sqrt{2\pi}}\, e^{-r_n^2/2} \tag{2.59}$$

The error function is a finite integral of the Gaussian distribution defined as

$$\mathrm{erf}(r_n) = \frac{2}{\sqrt{\pi}} \int_0^{r_n} e^{-x^2}\,dx \tag{2.60}$$

The error function corresponds to the area under the standardized Gaussian PDF between $-\sqrt{2}\,r_n$ and $\sqrt{2}\,r_n$. That is, if x is a standardized Gaussian random variable, $P[|x| \le \sqrt{2}\,r_n] = \mathrm{erf}(r_n)$. Thus, the complementary probability $1 - \mathrm{erf}(r_n)$ is the probability that a sample is drawn with $|x| > \sqrt{2}\,r_n$.
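Substituting k = √2 r_n into the relation above gives P[|x − µ| ≤ kσ] = erf(k/√2) for a normal variable. The following sketch uses `math.erf` from the Python standard library to evaluate this for k = 1, 2, 3; the function name `coverage` is an illustrative choice.

```python
import math

def coverage(k):
    """P[|x - mu| <= k * sigma] for a normal variable, computed as erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"P[|x - mu| <= {k} sigma] = {coverage(k):.4f}")

# Complementary probability: chance that a sample falls outside +/- 3 sigma.
print(f"P[|x - mu| > 3 sigma] = {1.0 - coverage(3):.4f}")
```

Rounded to the precision used in (2.57a) to (2.57c), the three printed values are 0.68, 0.95, and 0.997.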

In this sense, for each CP, a PDF can be fitted over all data instances considered in training. For a normal distribution, decision boundaries can be defined by the upper and lower limits of the confidence interval. In terms of statistical process control (SPC), these decision boundaries are called the upper control limit (UCL) and the lower control limit (LCL). The confidence interval is an estimated range [α, β] of the PDF that is likely to include a value, for instance a CP, and that is calculated from a given set of sampled data. The confidence level 1 − a gives the probability P(W) ∈ [0, 1] that a CP lies within the confidence interval, so that

$$P(W) = P(\alpha < X_i < \beta) = 1 - a \tag{2.61}$$

where a is the significance level. The confidence interval can be conceived as a range of values on the horizontal axis of the PDF, while the confidence level is the area under the PDF curve within this interval. The upper and lower limits of the two-sided confidence interval (1 − a) for the mean value of a CP, $\mu_X$, are then defined as

$$LCL_n = \mu_X - t_{a/2,\,n_s-1}\,\sigma_X \tag{2.62a}$$

$$UCL_n = \mu_X + t_{a/2,\,n_s-1}\,\sigma_X \tag{2.62b}$$

taking into account the variance of the sampled normal distribution

$$\sigma_X^2 = \frac{1}{n_s-1}\sum_{i=1}^{n_s}(x_i-\bar{x})^2 \tag{2.63}$$

and the $(1-\frac{a}{2})$th percentile of the t-distribution with $n_s-1$ DOF, given by $t_{a/2,\,n_s-1}$.
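The control limits in (2.62a) and (2.62b) can be sketched with the Python standard library alone if the t-value is taken from a table and passed in by the caller. In the example below, the function name `control_limits`, the training values, and the new observation are hypothetical; the factor 2.262 is the tabulated two-sided 95 % t-value for 9 DOF.

```python
from statistics import mean, stdev

def control_limits(samples, t_crit):
    """Return (LCL, UCL) following equations (2.62a) and (2.62b).

    t_crit is the tabulated t-value for the chosen significance level and
    len(samples) - 1 degrees of freedom; it is supplied by the caller.
    """
    mu_x = mean(samples)
    # stdev() uses the (n_s - 1) normalization of equation (2.63).
    sigma_x = stdev(samples)
    return mu_x - t_crit * sigma_x, mu_x + t_crit * sigma_x

# Hypothetical training values of one CP (n_s = 10, hence 9 DOF).
training = [4.9, 5.1, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0, 5.0]
T_CRIT = 2.262  # tabulated two-sided 95 % t-value for 9 DOF

lcl, ucl = control_limits(training, T_CRIT)

# A new observation is flagged as novel if it falls outside [LCL, UCL].
new_value = 5.6
is_novel = not (lcl <= new_value <= ucl)
print(lcl, ucl, is_novel)
```

Passing the t-value as a parameter keeps the sketch free of third-party dependencies; a statistics library that provides the inverse t-distribution could replace the table lookup.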

The t-distribution arises when the mean of a sample drawn from a normal distribution is standardized with the sample standard deviation; the values of $t_{a/2,\,n_s-1}$ can be found in tables. In many applications, these factors are set equal to three, which places the control limits at a distance of ±3σ from the mean value.

If the PDF of a variable is unknown, percentiles can be used for selecting the decision boundaries as a function of the significance level a, where a = (1 − confidence level). In this case, the center line (CL) is the 50th percentile, and the LCL and UCL are the percentiles $X_{a/2\,\%}$ and $X_{1-a/2\,\%}$, respectively.

$$LCL_d = X_{a/2\,\%} \tag{2.64a}$$

$$UCL_d = X_{1-a/2\,\%} \tag{2.64b}$$
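A distribution-free version of the limits in (2.64a) and (2.64b) can be sketched as follows. The helper names `percentile` and `percentile_limits` are illustrative, and linear interpolation between order statistics is an assumed convention; the source does not state which percentile definition is used.

```python
def percentile(values, q):
    """q-th percentile (0 <= q <= 100), linearly interpolated between order statistics."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] * (1.0 - frac) + data[upper] * frac

def percentile_limits(values, a):
    """Return (LCL, CL, UCL) for significance level a, following (2.64a) and (2.64b)."""
    return (
        percentile(values, 100.0 * a / 2.0),          # LCL: (a/2) percentile
        percentile(values, 50.0),                     # CL: median
        percentile(values, 100.0 * (1.0 - a / 2.0)),  # UCL: (1 - a/2) percentile
    )

# Hypothetical training data: the integers 0..100, so each percentile equals its rank.
samples = list(range(101))
lcl, cl, ucl = percentile_limits(samples, a=0.05)
print(lcl, cl, ucl)
```

With a = 0.05, the limits enclose the central 95 % of the training data regardless of the shape of the underlying distribution.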

2.3.1.2 Multivariate Damage Features

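Multivariate features are composed of η normally distributed random variables, each with its own mean value and variance, so that $p_{x_i}(x_i) \sim N(\mu_i, \sigma_i^2)$. For this case, the mean value becomes a vector containing the mean values of the η individual variables, and the covariance becomes a matrix in $\mathbb{R}^{\eta\times\eta}$ [97]. The mean value is

$$\mu = E[x] = \int_{-\infty}^{+\infty} x\,p(x)\,dx \tag{2.65}$$

the covariance matrix is

$$\Sigma = \begin{bmatrix}
\sigma_1^2 & \sigma_{12}^2 & \cdots & \sigma_{1\eta}^2 \\
\sigma_{21}^2 & \sigma_2^2 & \cdots & \sigma_{2\eta}^2 \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{\eta 1}^2 & \sigma_{\eta 2}^2 & \cdots & \sigma_\eta^2
\end{bmatrix} \tag{2.66}$$

with

$$\Sigma = E[(x-\mu)(x-\mu)^T] = \int_{-\infty}^{+\infty} (x-\mu)(x-\mu)^T\,p(x)\,dx \tag{2.67}$$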

and the PDF for a multivariate normal distribution with a mean µ and a covariance matrix Σ is given by

$$p(x) = \frac{1}{(2\pi)^{\eta/2}\,|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right] \tag{2.68}$$

Multivariate normal data tend to cluster about the mean vector µ, forming an ellipsoidally shaped cloud, whose principal axes are the eigenvectors of the covariance matrix. The measure of the distance from x to the mean µ is provided by the Mahalanobis squared distance (MSD) from x to µ

$$r_M^2 = (x-\mu)^T\Sigma^{-1}(x-\mu) \tag{2.69}$$

As a result, $r_n = \Sigma^{-1/2}(x-\mu)$ is the multivariate analogue of $r_n = (x-\mu)/\sigma$. Contours of constant density are ellipsoids defined by the points x that satisfy

$$(x-\mu)^T\Sigma^{-1}(x-\mu) = \kappa^2 \tag{2.70}$$

These ellipsoids are centered at µ with axes $\pm\kappa\sqrt{\lambda_i^m}\,e_i^m$, where $\Sigma e_i^m = \lambda_i^m e_i^m$ for i = 1, 2, ..., η. For the η-dimensional normal distribution, the MSD follows a χ²-distribution, so that $\kappa^2 = \chi_p^2(a)$ can be defined, with $\chi_p^2(a)$ being the upper (100a)th percentile of the χ²-distribution with p = η degrees of freedom. This choice leads to contours that contain (1 − a) of the probability.
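For the special case η = 2, the MSD of (2.69) and the χ² threshold can both be written in closed form, which allows a dependency-free sketch of the resulting novelty test. The function names, the mean vector, the covariance matrix, and the test points below are hypothetical. The threshold formula −2 ln(a) holds only for two degrees of freedom; other dimensions require tabulated χ² values.

```python
import math

def mahalanobis_sq_2d(x, mu, cov):
    """Mahalanobis squared distance (2.69) for a 2-D feature, using the explicit 2x2 inverse."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    d1, d2 = x[0] - mu[0], x[1] - mu[1]
    return (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det

def chi2_threshold_2dof(a):
    """Upper (100a)th percentile of the chi-square distribution with 2 DOF: -2 ln(a)."""
    return -2.0 * math.log(a)

# Hypothetical training statistics of a two-dimensional feature.
mu = [0.0, 0.0]
cov = [[4.0, 0.0], [0.0, 1.0]]
kappa_sq = chi2_threshold_2dof(0.05)  # contour containing 95 % of the probability

for point in ([2.0, 1.0], [6.0, 0.0]):
    msd = mahalanobis_sq_2d(point, mu, cov)
    # A point outside the kappa^2 contour is treated as novel.
    print(point, msd, msd > kappa_sq)
```

The first point lies inside the 95 % ellipse and the second outside it, even though both are compared against the same scalar threshold; the covariance matrix accounts for the different spread along each axis.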

The multivariate equivalent of the confidence interval is the confidence region, which serves as the boundary for hypothesis testing. It can be shown that the confidence level of a multivariate CP is given by

$$P\left(n_s\,(x-\mu)^T\Sigma^{-1}(x-\mu) \le \frac{(n_s-1)\,\eta}{n_s-\eta}\,F_{\eta,\,n_s-\eta}(a)\right) = 1 - a \tag{2.71}$$

where $F_{\eta,\,n_s-\eta}(a)$ denotes critical values of the F-distribution, which can be found in tables. The F-distribution is employed when dealing with statistics formed by ratios of variance estimates and is used extensively in analysis of variance applications.
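As a small worked example, the bound on the right-hand side of the inequality in (2.71) can be computed once the F critical value has been read from a table. The function names and the values n_s = 20, η = 2 are hypothetical; 3.55 is the tabulated F-value for (2, 18) DOF at a = 0.05, and the quadratic-form values passed in are made up for illustration.

```python
def f_region_bound(n_s, eta, f_crit):
    """Right-hand side of the inequality in (2.71)."""
    return (n_s - 1) * eta / (n_s - eta) * f_crit

def inside_confidence_region(quad_form, n_s, eta, f_crit):
    """True if n_s times the quadratic form of (2.71) does not exceed the bound."""
    return n_s * quad_form <= f_region_bound(n_s, eta, f_crit)

N_S, ETA = 20, 2  # hypothetical sample size and feature dimension
F_CRIT = 3.55     # tabulated F-value for (2, 18) DOF at a = 0.05

bound = f_region_bound(N_S, ETA, F_CRIT)
print(bound)
print(inside_confidence_region(0.3, N_S, ETA, F_CRIT))  # 20 * 0.3 = 6 is below the bound
print(inside_confidence_region(0.5, N_S, ETA, F_CRIT))  # 20 * 0.5 = 10 is above the bound
```

The factor (n_s − 1)η/(n_s − η) approaches η as n_s grows, so for large training sets the bound is governed mainly by the F critical value.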

In document Transmissibility-based monitoring and combination of damage feature decisions within a holistic structural health monitoring framework (Page 80-83)