B. Study Design
3. Data analysis methods
a. Data checking
A preliminary review of the environmental exposure data (e.g., beach sand samples, environmental beach conditions, and meteorologic conditions) and the secondary health outcome data on physical symptoms and illness from all rounds of the NEEAR water study was performed to check for outliers and implausible data points. Suspected errors were evaluated and, if deemed actual errors by comparison with the responses of other participants and on the basis of substantive reasoning, were deleted from the data set. Data sets were also cleaned and evaluated for missing values. Beach sand
were recorded as continuous measures in data sets on secure servers at the EPA, Research Triangle Park, NC.
b. Overview of data analysis
We examined the association between beach sand exposure and physical health symptoms and illness using prospective data collected during the NEEAR water study (2003-2005 and 2007) from 7 recreational beach sites across the U.S. Data analysis was completed in two phases. First, we evaluated the association between self-reported exposure to beach sand (digging in the sand or building sand castles; burying one’s body in the sand) and the incidence of self-reported physical health symptoms and illness during 10-12 days of follow-up (Study Question 1, Hypothesis 1, and Specific Aim 1). Next, we evaluated the association between exposure to fecal indicators (Enterococcus,
Bacteroides, B. thetaiotaomicron, and F+-specific coliphage) measured in beach sand samples and the incidence of self-reported health symptoms and illness during 10-12 days of follow-up (Study Question 2, Hypothesis 2, and Specific Aim 2).
There were 2 hypotheses that corresponded to the 2 phases of research. For phase 1, we hypothesized that among people who were exposed to beach sand the incidence proportion of illness would be higher than the incidence proportion of illness among people unexposed to beach sand (including among a subgroup of children ≤10 years of
age). For phase 2, we hypothesized that among people who were exposed to beach sand with higher levels of fecal indicator organisms, the incidence proportion of illness would be higher than the incidence proportion among people who were exposed to beach sand with lower levels of fecal indicator organisms (including among a subgroup of children
≤10 years of age).
The primary outcomes were self-reports of physical symptoms and illness,
measured on a binary scale, with 1 indicating YES and 2 indicating NO to a physical symptom. Physical symptoms were grouped into 7 outcomes (illnesses): (1) GI illness; (2) diarrhea; (3) upper respiratory illness (URI); (4) rash; (5) eye irritation; (6) earache; and (7) infected cuts or wounds.
To exclude those with prevalent illness, participants who were ill in the 3 days prior to their beach visit were excluded from the analysis of the outcome with which they were afflicted. For example, a subject who reported GI symptoms at baseline was excluded from the GI analysis but remained eligible for other outcomes. We also examined different definitions of GI illness, including diarrhea alone (three or more loose stools in a 24-hour period).
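The outcome-specific exclusion rule can be illustrated with a minimal sketch. The record layout, the field name `baseline_illness`, and the outcome labels are hypothetical and are not taken from the NEEAR data sets; the point is only that eligibility is decided separately for each outcome.

```python
def eligible_for_outcome(participants, outcome):
    """Return participants who did NOT report `outcome` at baseline.

    `participants` is a list of dicts with a hypothetical field
    `baseline_illness`: the set of outcomes reported in the 3 days
    before the beach visit.
    """
    return [p for p in participants if outcome not in p["baseline_illness"]]


# Hypothetical participants, for illustration only.
participants = [
    {"id": 1, "baseline_illness": {"gi"}},
    {"id": 2, "baseline_illness": set()},
    {"id": 3, "baseline_illness": {"rash", "earache"}},
]

# Participant 1 is dropped from the GI analysis but kept for rash;
# participant 3 is dropped from the rash analysis but kept for GI.
gi_cohort = eligible_for_outcome(participants, "gi")
rash_cohort = eligible_for_outcome(participants, "rash")
```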
c. Data analysis (Specific Aims 1 & 2)
For phase 1 (Specific Aim 1) the hypothesis was that the incidence proportion of illness would be higher among those who were exposed to beach sand compared to the incidence proportion of illness among those unexposed to beach sand (including among a subgroup of children ≤10 years of age). For phase 2 (Specific Aim 2) the hypothesis was
that among people who were exposed to beach sand with higher levels of fecal indicator organisms, the incidence proportion of illness would be higher than the incidence proportion among people who were exposed to beach sand with lower levels of fecal indicator organisms (including among a subgroup of children ≤10 years of age).
Critical covariates for this analysis included age, sex, race/ethnicity, swimming, beach, contact with animals, contact with other persons with diarrhea, number of other visits to the beach, any other chronic illnesses (GI, skin, asthma), eating food while at the beach, eating raw or undercooked meat since the time of the beach interview, and eating
raw or undercooked eggs since the time of the beach interview. For respiratory and skin outcomes, the use of insect repellent and sun block was also considered. A variable was also created to control for a large festival that took place at Silver Beach, drawing over 17,000 visitors directly adjacent to the beach. A number of covariates, especially swimming status and beach site, were considered potentially important effect measure modifiers of exposure-illness associations.
We considered using log-linear binomial regression to model the association between beach sand quality measures and health effects; however, a small sample size led to problems with model convergence. We therefore used logistic regression models to estimate the association between the incidence of illness and Enterococcus qPCR CCE/g, Enterococcus Method 1600 CFU/g, Bacteroides qPCR CCE/g, and F+-specific coliphage PFU/g in sand, respectively: (1) among those who had contact with the sand; and (2) among all participants, with those who did not have contact with sand as the reference category. Generalized linear models (GLM) using an identity link and a binomial error structure were also considered to estimate the attributable risk (risk with sand contact minus risk without sand contact), which we also refer to as sand-associated illness. Robust variance estimates (the “sandwich” estimator of variance) were used in all models to account for the non-independence resulting from the household-cluster sample.
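To make the role of the robust variance concrete, the following is a minimal sketch of an unadjusted attributable risk (risk difference) with a household-clustered sandwich standard error. It assumes a single binary exposure and no covariates, so it corresponds only to the simplest identity-link model, not to the adjusted models described above. The function name and the record layout `(household_id, exposed, ill)` are hypothetical, and no small-sample correction is applied.

```python
from collections import defaultdict
from math import sqrt


def risk_difference_cluster_robust(records):
    """Unadjusted risk difference with a cluster-robust (sandwich) SE.

    `records` is an iterable of (household_id, exposed, ill) tuples,
    with exposed and ill coded 0/1 (hypothetical layout).
    Returns (risk_difference, robust_standard_error).
    """
    records = list(records)
    n1 = sum(1 for _, x, _ in records if x == 1)
    n0 = sum(1 for _, x, _ in records if x == 0)
    if n1 == 0 or n0 == 0:
        raise ValueError("both exposure groups must be non-empty")
    p1 = sum(y for _, x, y in records if x == 1) / n1
    p0 = sum(y for _, x, y in records if x == 0) / n0

    # Each person's contribution to (p1 - p0) is summed within the
    # household before squaring, so correlated outcomes among household
    # members are not treated as independent information.
    household_sum = defaultdict(float)
    for hh, x, y in records:
        if x == 1:
            household_sum[hh] += (y - p1) / n1
        else:
            household_sum[hh] -= (y - p0) / n0

    robust_var = sum(s * s for s in household_sum.values())
    return p1 - p0, sqrt(robust_var)


# Hypothetical data: four households of two people each.
records = [
    (1, 1, 1), (1, 1, 1),
    (2, 1, 0), (2, 0, 0),
    (3, 0, 0), (3, 0, 1),
    (4, 1, 0), (4, 0, 0),
]
rd, robust_se = risk_difference_cluster_robust(records)
```

When every participant is treated as their own household, this estimator reduces to the usual variance p1(1 - p1)/n1 + p0(1 - p0)/n0, which is a convenient check on the calculation.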
Incidence proportion ratios (IPR) were used to estimate the association between exposure data (i.e., self-reported contact with sand) and follow-up health outcome data (self-reported illness). At the start of analysis, a backwards elimination approach with a 5% change-in-estimate criterion for the IPRs was employed. Potential effect measure modifiers were evaluated using stratification and the Breslow-Day test of homogeneity. All analyses were completed using SAS version 9 © (SAS Institute Inc., Cary, NC, USA) and Stata version 9.2 © (StataCorp LP, College Station, TX, USA).
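The change-in-estimate idea can be sketched for a single candidate covariate as follows. This is an illustration under stated assumptions, not the procedure used in the analysis: it substitutes a Mantel-Haenszel stratified IPR for the regression-adjusted IPR, compares it with the crude IPR, and flags the covariate when the relative change is at least 5%. The function names and the stratum counts are hypothetical.

```python
def crude_ipr(strata):
    """Crude IPR from strata of (exposed_cases, exposed_total,
    unexposed_cases, unexposed_total), ignoring stratification."""
    a = sum(s[0] for s in strata)
    n1 = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    n0 = sum(s[3] for s in strata)
    return (a / n1) / (c / n0)


def mantel_haenszel_ipr(strata):
    """Mantel-Haenszel summary risk ratio across strata."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den


def change_in_estimate(strata, threshold=0.05):
    """Return (crude, adjusted, keep_covariate) for one covariate.

    The covariate is kept when dropping it would move the IPR by at
    least `threshold` relative to the adjusted estimate.
    """
    crude = crude_ipr(strata)
    adjusted = mantel_haenszel_ipr(strata)
    relative_change = abs(crude - adjusted) / adjusted
    return crude, adjusted, relative_change >= threshold


# Hypothetical counts stratified by swimming status (swimmers, non-swimmers).
strata = [
    (30, 300, 10, 100),
    (5, 100, 15, 300),
]
crude, adjusted, keep = change_in_estimate(strata)
```

In this made-up example the IPR is 1.0 within each stratum but 1.4 when the strata are pooled, so the covariate would be retained as a confounder.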
d. Statistical power calculation 1
This study involved hypothesis testing and the study size was fixed, so we estimated power. The first hypothesis was that among people who were exposed to beach sand the incidence proportion of illness would be higher than the incidence proportion of illness among people unexposed to beach sand.
For this power calculation we assumed that alpha was 0.05 and that our study size, based on data from a preliminary study by Wade et al. (2006) and from unpublished data, was 20,436. We were estimating beta. To examine statistical power for binary outcome variables, we used the formulae given by Clayton and Hills (1996), Friedman et al. (1998), and Selvin (1996). To simplify the power analysis, we considered the simple problem of comparing two levels of exposure (exposed and unexposed) with 8,975 participants in the exposed group (those digging in the sand) and 11,461 participants in the unexposed group (those not digging in the sand), as observed by Wade et al. (2006) in a preliminary study and from unpublished data. This power calculation reflected phase 1 plans for binary exposure categories. We considered other exposure classification methods. In phase 2 of the proposed research, exposure was represented by a continuous index (e.g., CFU, qPCR CCE, or PFU per g of sand). In addition, we assumed that individuals were independent. We assumed two-sided tests and a Type I error rate (significance level) of 5%.
Considering a binary outcome variable, onset of symptoms, and again using unpublished data provided by Tim Wade (EPA), we defined “onset” for a given symptom as a “Yes” self-report on a binary scale. Among the seven illnesses and physical symptoms studied, incidence ranged from 7.4% for GI illness to 0.4% for infected cuts, with an average incidence across all symptoms of 3.1%. For the purposes of power analysis, we considered the incidence distribution of each of the illnesses studied by Wade et al. (2006) and from unpublished data sets.
Table 1. Smallest Detectable Incidence Proportion Ratio

Incidence of illness and symptoms          Power
(from preliminary data)                 80%      90%
7.4% (GI illness)                       1.15     1.17
5.8% (Upper respiratory illness)        1.17     1.20
3.0% (Eye irritation)                   1.25     1.29
2.7% (Rash)                             1.26     1.31
1.5% (Earache)                          1.37     1.45
0.6% (Urinary tract infection)          1.66     1.81
0.4% (Infected cut)                     1.88     2.10
Table 1 shows the smallest detectable incidence proportion ratio (IPR) across the distribution of incidence for each illness, for 80% and 90% power. For GI illness the study has good power to detect an IPR of 1.17. In general, the study has good power to detect IPRs around 1.25.
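The logic of a "smallest detectable IPR" calculation can be sketched with a standard normal-approximation power formula for comparing two independent proportions. This is an assumption on our part: the cited texts contain several formulae, the exact one used is not stated, and the sketch treats the tabulated incidence as the incidence in the unexposed group, so its output is not guaranteed to match Table 1. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p0, ipr, n_exposed, n_unexposed, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    p0  : incidence proportion among the unexposed (assumed baseline)
    ipr : incidence proportion ratio under the alternative
    """
    p1 = p0 * ipr
    n = n_exposed + n_unexposed
    p_bar = (n_exposed * p1 + n_unexposed * p0) / n
    # Standard error under the null (pooled) and under the alternative.
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n_exposed + 1 / n_unexposed))
    se_alt = sqrt(p1 * (1 - p1) / n_exposed + p0 * (1 - p0) / n_unexposed)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf((abs(p1 - p0) - z_alpha * se_null) / se_alt)


def smallest_detectable_ipr(p0, n_exposed, n_unexposed, power=0.80, alpha=0.05):
    """Bisection search for the smallest IPR > 1 reaching `power`.

    Returns None when the target power is not reached even at the
    largest IPR considered (incidence in the exposed near 100%).
    """
    lo, hi = 1.0, min(100.0, 0.999 / p0)
    if power_two_proportions(p0, hi, n_exposed, n_unexposed, alpha) < power:
        return None
    for _ in range(100):
        mid = (lo + hi) / 2
        if power_two_proportions(p0, mid, n_exposed, n_unexposed, alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi


# Group sizes from the text; 7.4% is the GI illness incidence.
ipr_80 = smallest_detectable_ipr(0.074, 8975, 11461, power=0.80)
ipr_90 = smallest_detectable_ipr(0.074, 8975, 11461, power=0.90)
```

Because the group sizes are fixed, the search runs over the effect size rather than the sample size, which mirrors the fixed-study-size framing of this section.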
e. Statistical power calculation 2
For this power calculation we focused on children as a sub-group (≤10 years of age). The study involved hypothesis testing and the study size was fixed, so we estimated power. The hypothesis was that, among a subgroup of children ≤10 years of age, the incidence proportion of illness would be higher among those who were exposed to beach sand than among those who were not exposed to beach sand.
We assumed that alpha was 0.05, that our sub-group analysis study size was 4,712 based on unpublished data from Tim Wade, and that we were estimating beta. The 4,712 individuals consisted of children (10 years of age or younger) from each of the five 2003-2005 NEEAR water study beach sites. To examine statistical power for binary outcome variables, we used the formulae given by Clayton and Hills (1996), Friedman et al. (1998), and Selvin (1996). To simplify the power analysis, we considered the simple problem of comparing two levels of exposure (exposed and unexposed) with 3,566 children in the exposed group (those digging in the sand) and 720 children in the unexposed group (those not digging in the sand), as observed by Wade et al. (2006) in a preliminary study and from unpublished data. This power calculation reflected phase 1 plans to consider binary exposure categories (other exposure classification methods were considered). In phase 2 of the proposed research, exposure was represented by a
continuous index (e.g., CFU or qPCR CCE Enterococcus concentration per g of sand). We assumed that individuals were independent. We assumed two-sided tests and a Type I error rate (significance level) of 5%.
Considering a binary outcome variable, onset of symptoms, and again using data from our preliminary study and unpublished data, we defined “onset” for a given
symptom as a YES self-report on a binary scale. Across the seven illnesses and physical symptoms, incidence ranged from 8.6% for GI illness, to 0.2% for infected cuts, with an average incidence across all symptoms of 3.6%.
Table 2. Smallest Detectable Incidence Proportion Ratio

Incidence of illness and symptoms          Power
among children (from preliminary data)  80%      90%
8.6% (GI illness)                       1.56     1.70
8.2% (Upper respiratory illness)        1.58     1.73
3.2% (Rash)                             2.40     3.20
2.2% (Eye irritation)                   3.61     8.00
1.8% (Earache)                          6.00     N/A*
0.7% (Urinary tract infection)          N/A*     N/A*
0.2% (Infected cut)                     N/A*     N/A*

* Incidence of illness is too low to calculate 90% power.
We excluded participants with illness at baseline (prevalent cases). For the purposes of power analysis, we considered the incidence distribution of each of the illnesses studied by Wade et al. (2006). Table 2 shows the smallest detectable cumulative incidence proportion ratio (IPR) across the distribution of incidence for each illness, for 80% and 90% power. For GI illness among the sub-group of children ≤10 years old, the study has good power to detect IPRs of around 1.56. Preliminary unpublished data and data from Wade et al. (2006) show that the study may not have good power to detect IPRs for illnesses with an incidence below 2.2%. In general, this power calculation demonstrated that the sub-group study had good power to detect IPRs around 1.7.