Analysis of Model Tasks from Selected Application Areas Using
a Computer (software R, Excel)
Task
The aim of this study is to assess a wastewater treatment plant (WWTP) with respect to the presence of synthetic musk compounds in the water that has passed through it.
The real sample consists of 60 fish from the carp family, specifically European chub (Leuciscus cephalus), caught in the Svratka River. Half of them were caught upstream of the WWTP (Group 1) and half downstream of it (Group 2). Fish tissue samples (specifically muscle) were analyzed for two nitromusk compounds, musk ambrette (AMB) and musk tibetene (TIB), and two polycyclic musk compounds, phantolide (PH) and traseolide (TR). Fish of approximately the same age were chosen for the analysis.
The measuring instrument had two limitations: the limit of detection $\mathrm{LOD} = x_D$ and the limit of quantification $\mathrm{LOQ} = x_Q$, with $x_D < x_Q$.
The data are doubly left-censored (the limit of detection and the limit of quantification play the role of censoring times). For each group we know the numbers of censored observations, i.e. the number $n_D$ of observations below the limit of detection and the number $n_Q$ of observations in the interval $(x_D, x_Q]$, as well as the number $n_O$ of non-censored observations and their actual values.
Data can be found in the article Fusek, Michalek, Zouhar, Vavrova: Statistical Analysis of Musk Compounds Concentrations in Fish Tissue Based on Doubly Left-Censored Samples. DEED 2013.
The model
Suppose that $X_1, \dots, X_n$ is a random sample from the exponential distribution with parameter $\lambda$, density $f(x, \lambda)$ and distribution function $F(x, \lambda)$, and that $X_{(1)}, \dots, X_{(n)}$ is the ordered random sample.
Double censoring with the censoring limits $x_D$ and $x_Q$:
$n_D$ is the number of censored observations in $[0, x_D]$,
$n_Q$ is the number of censored observations in $(x_D, x_Q]$,
$n_O$ is the number of non-censored observations $X_{(n-n_O+1)}, \dots, X_{(n)}$, all greater than $x_Q$.
The vector $(n_D, n_Q, n_O) \sim \mathrm{Mu}_3(n, \theta_D, \theta_Q, \theta_O)$ with $n = n_D + n_Q + n_O$ (the multinomial distribution); the marginal distributions are
$n_D \sim \mathrm{Bi}(n, \theta_D)$, $n_Q \sim \mathrm{Bi}(n, \theta_Q)$, $n_O \sim \mathrm{Bi}(n, \theta_O)$, where $\theta_D = F(x_D, \lambda)$, $\theta_Q = F(x_Q, \lambda) - F(x_D, \lambda)$, $\theta_O = 1 - F(x_Q, \lambda)$.
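The censoring scheme and the cell probabilities can be illustrated with a short Python sketch. It is not part of the original study (which names R and Excel as the software used); the function names `censor_counts` and `cell_probabilities` and all numeric values are hypothetical and serve only as an illustration.

```python
import math

def censor_counts(sample, x_d, x_q):
    """Count observations in [0, x_D], in (x_D, x_Q] and above x_Q."""
    n_d = sum(1 for x in sample if x <= x_d)
    n_q = sum(1 for x in sample if x_d < x <= x_q)
    n_o = sum(1 for x in sample if x > x_q)
    return n_d, n_q, n_o

def cell_probabilities(lam, x_d, x_q):
    """Return (theta_D, theta_Q, theta_O) for the exponential distribution."""
    def cdf(x):
        return 1.0 - math.exp(-lam * x)   # F(x, lambda)
    return cdf(x_d), cdf(x_q) - cdf(x_d), 1.0 - cdf(x_q)

# Hypothetical values for illustration only (not data from the study).
sample = [0.2, 0.4, 0.7, 0.9, 1.3, 1.8, 2.5]
x_d, x_q = 0.5, 1.0

counts = censor_counts(sample, x_d, x_q)
thetas = cell_probabilities(1.0, x_d, x_q)
expected = [len(sample) * t for t in thetas]   # E(n_D), E(n_Q), E(n_O)
print(counts, thetas, expected)
```

The three probabilities always sum to one, so the expected counts sum to the sample size $n$.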
The likelihood function and the empirical Fisher information
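The likelihood function is
$$L(\lambda, n_D, n_Q, x_{(n-n_O+1)}, \dots, x_{(n)}) = \frac{n!}{n_D!\, n_Q!}\, [F(x_D, \lambda)]^{n_D}\, [F(x_Q, \lambda) - F(x_D, \lambda)]^{n_Q} \prod_{j=n-n_O+1}^{n} f(x_{(j)}, \lambda).$$
The log-likelihood function is of the form
$$l(\lambda, n_D, n_Q, x_{(n-n_O+1)}, \dots, x_{(n)}) = \log\frac{n!}{n_D!\, n_Q!} + n_D \log[F(x_D, \lambda)] + n_Q \log[F(x_Q, \lambda) - F(x_D, \lambda)] + \sum_{j=n-n_O+1}^{n} \log f(x_{(j)}, \lambda).$$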
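With $F'_\lambda$ and $f'_\lambda$ denoting partial derivatives with respect to $\lambda$, the first and second derivatives of the log-likelihood are
$$\frac{\partial l}{\partial \lambda} = n_D \underbrace{\frac{F'_\lambda(x_D, \lambda)}{F(x_D, \lambda)}}_{H_D(x_D, \lambda)} + n_Q \underbrace{\frac{F'_\lambda(x_Q, \lambda) - F'_\lambda(x_D, \lambda)}{F(x_Q, \lambda) - F(x_D, \lambda)}}_{H_Q(x_Q, \lambda)} + \sum_{j=n-n_O+1}^{n} \frac{f'_\lambda(x_{(j)}, \lambda)}{f(x_{(j)}, \lambda)},$$
$$\frac{\partial^2 l}{\partial \lambda^2} = n_D \frac{\partial H_D(x_D, \lambda)}{\partial \lambda} + n_Q \frac{\partial H_Q(x_Q, \lambda)}{\partial \lambda} + \frac{\partial}{\partial \lambda} \sum_{j=n-n_O+1}^{n} \frac{f'_\lambda(x_{(j)}, \lambda)}{f(x_{(j)}, \lambda)}.$$
ML parameter estimate for the random sample from Ex(λ)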
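For the exponential distribution with distribution function $F(x) = 1 - \exp(-\lambda x)$ and density $f(x) = \lambda \exp(-\lambda x)$, the log-likelihood function is
$$l(\lambda, n_D, n_Q, x_{(n-n_O+1)}, \dots, x_{(n)}) = \log\frac{n!}{n_D!\, n_Q!} + n_D \log[1 - \exp(-\lambda x_D)] + n_Q \log[\exp(-\lambda x_D) - \exp(-\lambda x_Q)] + n_O \log(\lambda) - \lambda \sum_{i=n-n_O+1}^{n} x_{(i)},$$
and its first derivative is
$$\frac{\partial l}{\partial \lambda} = n_D \frac{x_D \exp(-\lambda x_D)}{1 - \exp(-\lambda x_D)} + n_Q \frac{x_Q \exp(-\lambda x_Q) - x_D \exp(-\lambda x_D)}{\exp(-\lambda x_D) - \exp(-\lambda x_Q)} + \frac{n_O}{\lambda} - \sum_{i=n-n_O+1}^{n} x_{(i)}. \tag{1}$$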
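The ML estimate of $\lambda$ is the root of the first derivative (1). A minimal Python sketch of one way to find it follows, assuming plain bisection as the root finder (the source does not say which numerical method was used). The function names, the starting bracket and the input values are hypothetical illustrations, not data from the study.

```python
import math

def log_lik(lam, n_d, n_q, x_obs, x_d, x_q):
    """Log-likelihood for Ex(lambda), without the constant log(n!/(n_D! n_Q!))."""
    e_d, e_q = math.exp(-lam * x_d), math.exp(-lam * x_q)
    return (n_d * math.log(-math.expm1(-lam * x_d))   # n_D log[1 - exp(-lam x_D)]
            + n_q * math.log(e_d - e_q)
            + len(x_obs) * math.log(lam) - lam * sum(x_obs))

def score(lam, n_d, n_q, x_obs, x_d, x_q):
    """First derivative of the log-likelihood, equation (1)."""
    e_d, e_q = math.exp(-lam * x_d), math.exp(-lam * x_q)
    return (n_d * x_d * e_d / (1.0 - e_d)
            + n_q * (x_q * e_q - x_d * e_d) / (e_d - e_q)
            + len(x_obs) / lam - sum(x_obs))

def mle(n_d, n_q, x_obs, x_d, x_q, lo=1e-8, hi=1.0, tol=1e-12):
    """Solve score(lambda) = 0 by bisection; the bracket [lo, hi] is an assumption."""
    args = (n_d, n_q, x_obs, x_d, x_q)
    while score(hi, *args) > 0:        # widen the bracket until the sign changes
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid, *args) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical inputs for illustration only.
x_d, x_q = 0.5, 1.0
n_d, n_q = 3, 2
x_obs = [1.2, 1.5, 2.0, 2.7, 3.1]      # non-censored values, all above x_Q

lam_hat = mle(n_d, n_q, x_obs, x_d, x_q)
print(lam_hat, score(lam_hat, n_d, n_q, x_obs, x_d, x_q))
```

Bisection is adequate here because the first derivative tends to $+\infty$ as $\lambda \to 0^+$ and is negative for large $\lambda$, so a sign change can be bracketed. The sketch assumes the data are scaled so that the score is positive at the lower end of the bracket.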
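$$\frac{\partial^2 l}{\partial \lambda^2} = n_D \left[ -\frac{x_D^2 \exp(-\lambda x_D)}{1 - \exp(-\lambda x_D)} - \frac{x_D^2 \exp(-2\lambda x_D)}{[1 - \exp(-\lambda x_D)]^2} \right] + n_Q \left[ \frac{x_D^2 \exp(-\lambda x_D) - x_Q^2 \exp(-\lambda x_Q)}{\exp(-\lambda x_D) - \exp(-\lambda x_Q)} - \frac{[x_Q \exp(-\lambda x_Q) - x_D \exp(-\lambda x_D)]^2}{[\exp(-\lambda x_D) - \exp(-\lambda x_Q)]^2} \right] - \frac{n_O}{\lambda^2}.$$
The empirical Fisher information is of the form
$$J_{\mathrm{empir}} = -\frac{\partial^2 l}{\partial \lambda^2} = n_D \left[ \frac{x_D^2 \exp(-\lambda x_D)}{1 - \exp(-\lambda x_D)} + \frac{x_D^2 \exp(-2\lambda x_D)}{[1 - \exp(-\lambda x_D)]^2} \right] - n_Q \left[ \frac{x_D^2 \exp(-\lambda x_D) - x_Q^2 \exp(-\lambda x_Q)}{\exp(-\lambda x_D) - \exp(-\lambda x_Q)} - \frac{[x_Q \exp(-\lambda x_Q) - x_D \exp(-\lambda x_D)]^2}{[\exp(-\lambda x_D) - \exp(-\lambda x_Q)]^2} \right] + \frac{n_O}{\lambda^2}.$$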
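As a sanity check on the algebra, the empirical Fisher information can be compared with a numerical derivative of the first derivative of the log-likelihood. The Python sketch below does this with a central finite difference; the function names, the step size `h` and all input values are hypothetical choices made for illustration.

```python
import math

def score(lam, n_d, n_q, x_obs, x_d, x_q):
    """First derivative of the log-likelihood for Ex(lambda)."""
    e_d, e_q = math.exp(-lam * x_d), math.exp(-lam * x_q)
    return (n_d * x_d * e_d / (1.0 - e_d)
            + n_q * (x_q * e_q - x_d * e_d) / (e_d - e_q)
            + len(x_obs) / lam - sum(x_obs))

def empirical_fisher(lam, n_d, n_q, n_o, x_d, x_q):
    """J_empir = minus the second derivative of the log-likelihood."""
    e_d, e_q = math.exp(-lam * x_d), math.exp(-lam * x_q)
    diff = e_d - e_q
    term_d = x_d**2 * e_d / (1.0 - e_d) + x_d**2 * e_d**2 / (1.0 - e_d)**2
    term_q = ((x_d**2 * e_d - x_q**2 * e_q) / diff
              - (x_q * e_q - x_d * e_d)**2 / diff**2)
    return n_d * term_d - n_q * term_q + n_o / lam**2

# Hypothetical inputs for illustration only.
lam, x_d, x_q = 0.8, 0.5, 1.0
n_d, n_q = 3, 2
x_obs = [1.2, 1.5, 2.0, 2.7, 3.1]

j_formula = empirical_fisher(lam, n_d, n_q, len(x_obs), x_d, x_q)

# Central finite difference of the score, with the sign flipped.
h = 1e-5
j_numeric = -(score(lam + h, n_d, n_q, x_obs, x_d, x_q)
              - score(lam - h, n_d, n_q, x_obs, x_d, x_q)) / (2.0 * h)
print(j_formula, j_numeric)
```

Note that $J_{\mathrm{empir}}$ depends on the non-censored observations only through their count $n_O$, which is why `empirical_fisher` does not take the observed values as an argument.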
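The expected counts are
$$E(n_D) = n\theta_D = n[1 - \exp(-\lambda x_D)],$$
$$E(n_Q) = n\theta_Q = n[\exp(-\lambda x_D) - \exp(-\lambda x_Q)],$$
$$E(n_O) = n\theta_O = n \exp(-\lambda x_Q).$$
The theoretical Fisher information is
$$J_{\mathrm{teor}} = -E\,\frac{\partial^2 l}{\partial \lambda^2} = n \frac{x_D^2 \exp(-2\lambda x_D)}{1 - \exp(-\lambda x_D)} + n x_Q^2 \exp(-\lambda x_Q) + n \frac{[x_Q \exp(-\lambda x_Q) - x_D \exp(-\lambda x_D)]^2}{\exp(-\lambda x_D) - \exp(-\lambda x_Q)} + \frac{n}{\lambda^2} \exp(-\lambda x_Q).$$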
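Since the empirical Fisher information is linear in the counts, the theoretical one should equal the empirical formula evaluated at the expected counts $n\theta_D$, $n\theta_Q$, $n\theta_O$. The Python sketch below checks this numerically; function names and input values are hypothetical and only illustrative.

```python
import math

def empirical_fisher(lam, n_d, n_q, n_o, x_d, x_q):
    """J_empir = minus the second derivative of the log-likelihood."""
    e_d, e_q = math.exp(-lam * x_d), math.exp(-lam * x_q)
    diff = e_d - e_q
    term_d = x_d**2 * e_d / (1.0 - e_d) + x_d**2 * e_d**2 / (1.0 - e_d)**2
    term_q = ((x_d**2 * e_d - x_q**2 * e_q) / diff
              - (x_q * e_q - x_d * e_d)**2 / diff**2)
    return n_d * term_d - n_q * term_q + n_o / lam**2

def theoretical_fisher(lam, n, x_d, x_q):
    """J_teor = expected value of J_empir under Ex(lambda)."""
    e_d, e_q = math.exp(-lam * x_d), math.exp(-lam * x_q)
    return n * (x_d**2 * e_d**2 / (1.0 - e_d)
                + x_q**2 * e_q
                + (x_q * e_q - x_d * e_d)**2 / (e_d - e_q)
                + e_q / lam**2)

# Hypothetical inputs for illustration only.
lam, n, x_d, x_q = 0.8, 30, 0.5, 1.0
e_d, e_q = math.exp(-lam * x_d), math.exp(-lam * x_q)
expected_counts = (n * (1.0 - e_d), n * (e_d - e_q), n * e_q)

j_teor = theoretical_fisher(lam, n, x_d, x_q)
j_check = empirical_fisher(lam, *expected_counts, x_d, x_q)

# With censoring limits close to zero almost nothing is censored, and the
# information approaches n / lambda^2, the value for an uncensored sample.
j_nearly_uncensored = theoretical_fisher(lam, n, 1e-6, 2e-6)
print(j_teor, j_check, j_nearly_uncensored, n / lam**2)
```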
Comparison of two censored exponential distributions
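We test the null hypothesis $H_0: \lambda_1 - \lambda_2 = 0$ against the alternative $H_1: \lambda_1 - \lambda_2 \neq 0$, where $\lambda_1$ and $\lambda_2$ are the parameters of the exponential distributions describing the data at the two locations. The test statistic $T$ is
$$T = \frac{\hat\lambda_1 - \hat\lambda_2}{\sqrt{\hat\sigma_1^2 + \hat\sigma_2^2}}, \tag{2}$$
where $\hat\sigma_1^2$ and $\hat\sigma_2^2$ are the estimated asymptotic variances of $\hat\lambda_1$ and $\hat\lambda_2$.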
The asymptotic variance $\sigma_k^2$ is
$$\sigma_k^2 = J_k^{-1}, \quad k = 1, 2, \tag{3}$$
where $J_k$ is the Fisher information, which can be calculated from the second derivative of the log-likelihood function.
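A minimal Python sketch of the test based on (2) and (3) follows. It assumes that $T$ is compared with the standard normal distribution (the usual large-sample reference for ML estimates; the source does not state the reference distribution explicitly) and that the p-value is two-sided. The function name `wald_test` and the numeric inputs are hypothetical.

```python
import math

def wald_test(lam1_hat, j1, lam2_hat, j2, alpha=0.05):
    """Compare two estimated exponential parameters.

    j1 and j2 are the Fisher informations of the two groups.
    Returns (T, p_value, H), where H = 1 means H0 is rejected.
    """
    var1, var2 = 1.0 / j1, 1.0 / j2                     # equation (3)
    t = (lam1_hat - lam2_hat) / math.sqrt(var1 + var2)  # equation (2)
    # Two-sided p-value under N(0, 1): 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    p_value = math.erfc(abs(t) / math.sqrt(2.0))
    h = 1 if p_value < alpha else 0
    return t, p_value, h

# Hypothetical estimates and Fisher informations, for illustration only.
t, p_value, h = wald_test(0.80, 45.0, 0.55, 60.0)
print(t, p_value, h)
```

In practice the Fisher informations would be evaluated at the ML estimates $\hat\lambda_1$ and $\hat\lambda_2$ of the respective groups.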
Results
No significant difference was found between Group 1 and Group 2 in the expected concentrations of phantolide, traseolide and musk ambrette. However, there is a significant difference between Group 1 and Group 2 in the expected concentration of musk tibetene.
Compound H p-value
PH 0 0.47
TR 0 0.58
AMB 0 1.00
TIB 1 0.00
Table: Comparison of the expected concentrations of musk compounds between Group 1 and Group 2. H = 0 (H = 1) denotes that the null hypothesis is not rejected (is rejected) at the 0.05 significance level.