• No results found

Let there be ๐‘›๐‘› subjects observed during time 0 to ๐‘‡๐‘‡. ๐‘Œ๐‘Œ๐‘–๐‘–๐‘–๐‘– is the outcome measured for the ๐‘–๐‘–-th subject at time ๐‘ก๐‘ก๐‘–๐‘–๐‘–๐‘– (0 โ‰ค ๐‘—๐‘— โ‰ค ๐‘‡๐‘‡). ๐‘‹๐‘‹๐‘–๐‘–๐‘˜๐‘˜ is the covariate for the ith subject at time ๐‘ก๐‘ก

๐‘–๐‘–๐‘˜๐‘˜ (0 โ‰ค ๐‘˜๐‘˜ โ‰ค ๐‘‡๐‘‡). Because the outcome is measured less frequently than the covariate, for each subject, when k=j both outcome and covariate are measured, and when there are no matching j for k only covariate is measured. The parametric or nonparametric mixed model cannot be applied to this type of data

directly because of the inconsistent measurement. We introduce a three-step estimation method to explore this type of data, which enables the use of all data with no loss of information by discarding or modifying the extra covariate measurements.

3.2.1 Step one โ€“ insert pseudo data

At time ๐‘ก๐‘ก๐‘–๐‘–๐‘˜๐‘˜ when only the covariate are measured, pseudo data points of the outcome will be inserted for every subject to create a new dataset. At times ๐‘ก๐‘ก๐‘–๐‘–๐‘ฃ๐‘ฃ and ๐‘ก๐‘ก๐‘–๐‘–๐‘–๐‘– (0 โ‰ค ๐‘ฃ๐‘ฃ < ๐‘ข๐‘ข โ‰ค ๐‘‡๐‘‡)both the outcome and the covariate are measured, and between these two time points ๐‘ก๐‘ก๐‘–๐‘–๐‘˜๐‘˜ (๐‘ฃ๐‘ฃ < ๐‘˜๐‘˜ < ๐‘ข๐‘ข), only covariate are measured. We assume that the change of outcome from time ๐‘ก๐‘ก๐‘–๐‘–๐‘ฃ๐‘ฃ to ๐‘ก๐‘ก๐‘–๐‘–๐‘–๐‘– is linear. By solving for ai and bi in function below we will get the straight line from ๐‘Œ๐‘Œ๐‘–๐‘–๐‘ฃ๐‘ฃ to ๐‘Œ๐‘Œ๐‘–๐‘–๐‘–๐‘– for

subject ๐‘–๐‘–.

๏ฟฝ๐‘Œ๐‘Œ๐‘–๐‘–๐‘ฃ๐‘ฃ = ๐‘Ž๐‘Ž๐‘–๐‘– ๐‘ก๐‘ก๐‘–๐‘–๐‘ฃ๐‘ฃ+ ๐‘๐‘๐‘–๐‘–

๐‘Œ๐‘Œ๐‘–๐‘–๐‘–๐‘– = ๐‘Ž๐‘Ž๐‘–๐‘– ๐‘ก๐‘ก๐‘–๐‘–๐‘–๐‘–+ ๐‘๐‘๐‘–๐‘– (3.2.1a) ๐‘Œ๐‘Œ๐‘–๐‘–๐‘˜๐‘˜ = ๐‘Ž๐‘Ž๐‘–๐‘–๐‘ก๐‘ก๐‘–๐‘–๐‘˜๐‘˜ + ๐‘๐‘๐‘–๐‘– (3.2.1b) Then by substituting ๐‘ก๐‘ก๐‘–๐‘–๐‘˜๐‘˜ (๐‘ฃ๐‘ฃ < ๐‘˜๐‘˜ < ๐‘ข๐‘ข) into the function above we get pseudo data ๐‘Œ๐‘Œ๐‘–๐‘–๐‘˜๐‘˜ that matches real measurements ๐‘‹๐‘‹๐‘–๐‘–๐‘˜๐‘˜ between times ๐‘ก๐‘ก๐‘–๐‘–๐‘ฃ๐‘ฃ and (๐‘ก๐‘ก๐‘–๐‘–๐‘–๐‘–0 โ‰ค ๐‘ฃ๐‘ฃ < ๐‘ข๐‘ข โ‰ค ๐‘‡๐‘‡) for subject ๐‘–๐‘–. This procedure is done repeatedly in the same way between all of the adjacent time points tij for each subject to get all of the pseudo data of Y at the time points when only the covariate is recorded. After this step, a new dataset (๐‘ก๐‘ก๐‘–๐‘–๐‘˜๐‘˜,๐‘Œ๐‘Œ๐‘–๐‘–๐‘˜๐‘˜, ๐‘‹๐‘‹๐‘–๐‘–๐‘˜๐‘˜) is created.

Missing data are common in longitudinal studies. If data are missing for subject ๐‘–๐‘– at time ๐‘ก๐‘ก๐‘–๐‘–๐‘ฃ๐‘ฃ or ๐‘ก๐‘ก๐‘–๐‘–๐‘–๐‘– (0 โ‰ค ๐‘ฃ๐‘ฃ < ๐‘ข๐‘ข โ‰ค ๐‘‡๐‘‡), there is no way to insert data for Y between these two time points using functions (3.4.1a) and (3.4.1b). Missing data will be inserted as pseudo data for Y in this situation.

3.2.2 Step two โ€“ raw estimates

For subjects at each time ๐‘ก๐‘ก๐‘–๐‘–๐‘˜๐‘˜ (0 โ‰ค ๐‘˜๐‘˜ โ‰ค ๐‘‡๐‘‡), a standard linear model (3.2.2) is fitted to get the raw estimates and standard errors (๐ต๐ต๐‘’๐‘’๐‘˜๐‘˜) of ฮฒ

Y(๐‘ก๐‘ก๐‘˜๐‘˜) = (X(๐‘ก๐‘ก๐‘˜๐‘˜) ฮฒ(๐‘ก๐‘ก๐‘˜๐‘˜) + e(๐‘ก๐‘ก๐‘˜๐‘˜), (3.2.2) where e(๐‘ก๐‘ก๐‘˜๐‘˜) is the error term. Here, sample sizes of the local linear regressions will not be the same, due to missing data.

3.2.3 Step three โ€“ smooth raw estimates

Local polynomial smoothing will be used to smooth the raw estimates from step two. This step is a modification of local polynomial kernel smoothing. Rather than using kernel functions to assign weights during smoothing, only the analytical weights that indicate the importance of the raw estimates will be used.

Let ๐‘๐‘ be the degree of the polynomial being fit. At time point ๐‘ก๐‘ก๐‘˜๐‘˜(0 โ‰ค ๐‘˜๐‘˜ โ‰ค ๐‘‡๐‘‡), the smoothed estimate ฮฒsmooth(k) are obtained by smoothing raw estimates in the local neighborhood

๐ผ๐ผโ„Ž (๐‘ก๐‘ก๐‘˜๐‘˜) = [๐‘ก๐‘ก๐‘˜๐‘˜โˆ’ โ„Ž, ๐‘ก๐‘ก๐‘˜๐‘˜+ โ„Ž]. By using only analytical weight ๐‘Š๐‘Š, the smoothed estimate ฮฒsmooth(k) at

๐‘ก๐‘ก๐‘˜๐‘˜ is the value of estimated ๐›ฝ๐›ฝฬ‚0, where ๐œท๐œท๏ฟฝ = (๐›ฝ๐›ฝฬ‚0, ยทยทยท,๐›ฝ๐›ฝฬ‚๐‘๐‘) minimizes

โˆ‘๐‘๐‘๐‘˜๐‘˜=0๏ฟฝ๐›ฝ๐›ฝ๐‘Ÿ๐‘Ÿ๐‘Ž๐‘Ž๐‘Ÿ๐‘Ÿ(๐‘˜๐‘˜)โˆ’ ๐›ฝ๐›ฝ0โˆ’ ๐›ฝ๐›ฝ1(๐‘ก๐‘ก๐‘˜๐‘˜โˆ’ ๐‘ก๐‘ก) โˆ’ ยทยทยท โˆ’ ๐›ฝ๐›ฝ๐‘๐‘(๐‘ก๐‘ก๐‘˜๐‘˜โˆ’ ๐‘ก๐‘ก)๐‘๐‘๏ฟฝ2๐‘Š๐‘Š. Weighted least squares theory leads to the solution

๐›ฝ๐›ฝฬ‚ = (๐‘ก๐‘กฬƒ๐‘‡๐‘‡ ๐‘Š๐‘Š ๐‘ก๐‘กฬƒ )โˆ’1๐‘ก๐‘กฬƒ๐‘‡๐‘‡ ๐‘Š๐‘Š ๐›ฝ๐›ฝ๏ฟฝ ๐‘Ÿ๐‘Ÿ๐‘Ž๐‘Ž๐‘Ÿ๐‘Ÿ where ๐œท๐œท๏ฟฝ๐‘Ÿ๐‘Ÿ๐‘Ž๐‘Ž๐‘Ÿ๐‘Ÿ is the vector of raw estimates from step two, and

๐’•๐’•๏ฟฝ = ๏ฟฝ 1โ‹ฎ 1 t1โˆ’ ๐‘ก๐‘ก๐‘˜๐‘˜ โ‹ฎ tnโˆ’ ๐‘ก๐‘ก๐‘˜๐‘˜ โ€ฆ โ‹ฑ โ€ฆ (t1โˆ’ ๐‘ก๐‘ก๐‘˜๐‘˜)p โ‹ฎ (tnโˆ’ ๐‘ก๐‘ก๐‘˜๐‘˜)p ๏ฟฝ

is an ๐‘›๐‘› ร— (๐‘๐‘ + 1) design matrix. ๐‘พ๐‘พ is an ๐‘›๐‘› ร— ๐‘›๐‘› diagonal matrix of analytical weights given by ๐‘พ๐‘พ = ๐‘‘๐‘‘๐‘–๐‘–๐‘Ž๐‘Ž๐‘‘๐‘‘{๐‘ค๐‘ค1 , โ€ฆ , ๐‘ค๐‘ค๐‘›๐‘›}.

Because pseudo data were inserted in step one, the raw estimates ๐›ฝ๐›ฝ๐‘Ÿ๐‘Ÿ๐‘Ž๐‘Ž๐‘Ÿ๐‘Ÿ (๐‘˜๐‘˜) do not have the same accuracy. The most accurate raw estimates will the ones that were estimated by fitting local linear regressions at ๐‘ก๐‘ก๐‘˜๐‘˜ (๐‘˜๐‘˜ = ๐‘—๐‘—) where ๐‘Œ๐‘Œ๐‘–๐‘–๐‘˜๐‘˜ and ๐‘‹๐‘‹๐‘–๐‘–๐‘–๐‘– were both measured, so the highest analytical weights are given to those raw estimates during smoothing. The raw estimates from the local linear regressions that used pseudo data but are close to the time of the real measurement are also given higher analytical weights than the raw estimates that are far from the time of real measurement. The measure of the time distance between a raw estimate at time tk using pseudo data and the adjacent raw estimate using data of real measurement will be defined as

๐ท๐ท๐‘˜๐‘˜ = ๐‘š๐‘š๐‘–๐‘–๐‘›๐‘› (|๐‘ก๐‘ก๐‘˜๐‘˜โˆ’ ๐‘ก๐‘ก๐‘™๐‘™๐‘Ž๐‘Ž๐‘™๐‘™๐‘ก๐‘ก ๐‘Ÿ๐‘Ÿ๐‘Ÿ๐‘Ÿ๐‘Ž๐‘Ž๐‘™๐‘™ ๐‘š๐‘š๐‘Ÿ๐‘Ÿ๐‘Ž๐‘Ž๐‘™๐‘™๐‘–๐‘–๐‘Ÿ๐‘Ÿ๐‘Ÿ๐‘Ÿ| + 1, |๐‘ก๐‘ก๐‘˜๐‘˜โˆ’ tnext real measure| + 1).

For example, at times ๐‘ก๐‘ก๐‘–๐‘–๐‘ฃ๐‘ฃ and ๐‘ก๐‘ก๐‘–๐‘–๐‘–๐‘– (0 โ‰ค v < u โ‰ค T)both outcome and covariate are measured and between these two time points ๐‘ก๐‘ก๐‘–๐‘–๐‘˜๐‘˜ (v < k < u), only the covariate is measured. ๐ท๐ท๐‘ฃ๐‘ฃ and ๐ท๐ท๐‘–๐‘– will equal 1 and ๐ท๐ท๐‘˜๐‘˜ will be calculated as

๐ท๐ท๐‘˜๐‘˜= ๐‘š๐‘š๐‘–๐‘–๐‘›๐‘› (|tkโˆ’ tv| + 1, |๐‘ก๐‘ก๐‘˜๐‘˜โˆ’ ๐‘ก๐‘ก๐‘–๐‘–| + 1).

We define the analytical weights in four different ways using ๐ท๐ท๐‘˜๐‘˜ and the standard error to reflect the importance of the raw estimate.

Type 1: ๐‘ค๐‘ค๐‘˜๐‘˜= 1 ๏ฟฝ๐ท๐ท๐‘˜๐‘˜ Type 2: ๐‘ค๐‘ค๐‘˜๐‘˜= 1 ๐ท๐ท๐‘˜๐‘˜ Type 3: ๐‘ค๐‘ค๐‘˜๐‘˜= 1 ๏ฟฝ๐ท๐ท๐‘˜๐‘˜ + 1 ๏ฟฝ๐‘™๐‘™๐‘Ÿ๐‘Ÿ๐‘˜๐‘˜ 23

Type 4: ๐‘ค๐‘ค๐‘˜๐‘˜= 1

๐ท๐ท๐‘˜๐‘˜ +

1 ๐ต๐ต๐‘’๐‘’๐‘˜๐‘˜

The performance of local polynomial smoothing also depends on the values chosen for bandwidth โ„Ž and order of polynomial ๐‘๐‘, but how to choose the best โ„Ž and ๐‘๐‘ for the proposed model is not the focus of this paper. We will use the value ๐‘๐‘=3 as recommended by Wand and Jones (1995) as this has been shown to be adequate. For data that are not equally spaced choosing certain bandwidth will result in less data that fall into the window where the data are sparser. In this situation, we will use a proportion of the data to which we will fit a local polynomial smoother like LOWESS. Because useful values of the smoothing parameter typically lie in the range 0.25 to 0.5 for most LOWESS applications, we will use 0.3.

Related documents