MATERIALS AND METHODS

2.14 GENERAL STATISTICAL ANALYSIS

The following section describes the methods employed for the case-control study of young males with CHD, which is the main component of this study. Additional statistical methods, when employed, will be described in the appropriate section.

2.14.1 Data Management

The results were stored on the Australian National University’s previous VAX mainframe computer for statistical analysis using the Statistical Package for Social Sciences programme. Subsequently, these programmes were transferred onto a Sun computer and accessed through a Sun workstation.

Consultation with an applied statistician (Dr Owen Dent, Australian National University) was undertaken for confirmation of the appropriate planning of data collection, design of the research proposal, and methods for analysis of results.

2.14.2 Study Groups

The main study is a case-control study with 48 control and 53 case subjects unless otherwise stated. In the other associated studies undertaken, the number of subjects evaluated will be defined in the particular section.

2.14.2.1 Patients. Males with CAD proven by coronary angiography who were aged less than 50 years at the time of diagnosis. Most, but not all, were incident cases.

2.14.2.2 Controls. Males from a population control group who were matched to the study group for age, race, socioeconomic status and community.

2.14.2.3 Selection of the case-control study groups. The study groups were selected from specific populations and the method of selection is described below. For the preparatory studies, the description of the study groups will be provided within the specific section and appendix. The limitations of case-control studies are reviewed in Chapter 4. In addition, the problems associated with angiographic studies are discussed in Chapters 8 and 9.

The same exclusion criteria, as described in Chapter 3, were applied to the case and control groups. The admission criteria for the case group are described in section 42.4.

1. Selection of cases. The case group were selected because they had presented with CHD and not because they had undergone cardiac catheterisation. All case subjects, however, had undergone coronary angiography because of clinical indications or because the supervising cardiologist considered their young age at presentation to be an indication.

The case group had clinically stable CHD for approximately 3 months. Exclusion criteria included diabetes mellitus; a stroke, coronary artery bypass grafting (CABG), percutaneous transluminal coronary angioplasty (PTCA) or an acute myocardial infarction (AMI) within 3 months; any significant medical illness, liver and renal disease; those unable to cease aspirin or non-steroidal anti-inflammatory drugs; and any recent surgery or acute illness, in particular, infective or inflammatory disorders.

2. Selection of Controls. The control group were selected by the case group. The case group subjects were requested to ask a friend or work colleague to volunteer for the control group. Except for 4 subjects, all the controls were recruited in this way. These 4 control subjects were selected from hospital personnel in the same age and socioeconomic groups as the case subjects who had been unable to find a control. Of these, 3 controls were naive to the laboratory and one was familiar with the laboratory.

Even though the control subjects were personal friends or co-workers of the subjects in the case group, they were required to fit into specific selection and exclusion criteria. Those in the control group were required to have had no cardiac symptoms, normal resting supine ECGs and a normal maximal exercise treadmill test and exercise 12 lead ECG. The exercise test was performed after the control subjects volunteered and a normal result was a selection requirement. They were not selected on the presence or absence of CHD risk factors, although the possibility of a self-selection bias exists.

2.14.3 Variables

Information about each patient and control was documented as outlined (Tables 2.1 and 2.2).

2.14.4 Descriptive Statistics for the Measured Variables

A descriptive assessment of variables was obtained, including histograms plotted for visual inspection of the shape of the distribution and its variability (spread), and to look for unusual outlying measures. Normally distributed data were described by the mean and standard deviation (and variance) for the relevant variables of each group. Further descriptive measures, namely the median, mode, skewness and kurtosis, were determined for variables which may have skewed distributions. This allowed an assessment to be made of the distribution of the variables prior to the univariate, bivariate and multivariate analysis. Those variables measured on a continuous scale were described by:

1. Histogram (shape and spread of the distribution)

2. Mean and median as measures of central tendency

3. Standard deviation and variance

4. Skewness and kurtosis

5. 95% Confidence limits (indicating the limits for the mean of the population from which the sample was selected).

The above statistics were obtained for all variables in each study group when appropriate, using the Statistical Package for Social Sciences (SPSSX) program.
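The descriptive measures listed above can be illustrated with a minimal Python sketch. It is not the SPSSX procedure used in the study. The function name `describe` and the sample values are hypothetical, skewness and kurtosis use the simple moment definitions (statistical packages often apply small-sample corrections, so their output may differ slightly), and the confidence limits assume a normal approximation rather than the t distribution.

```python
import math
import statistics
from statistics import NormalDist


def describe(values, confidence=0.95):
    """Summary statistics for one continuous variable (illustrative only)."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)      # sample variance, n - 1 denominator
    sd = math.sqrt(variance)

    # Central moments about the mean (n denominator), used for shape measures.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n

    # Assumption: normal-approximation limits for the population mean.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd / math.sqrt(n)

    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "variance": variance,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,          # excess kurtosis, 0 for a normal curve
        "ci_lower": mean - half_width,
        "ci_upper": mean + half_width,
    }


if __name__ == "__main__":
    # Hypothetical measurements, not data from the study.
    for name, value in describe([4.1, 4.8, 5.0, 5.3, 5.9, 6.4, 9.7]).items():
        print(f"{name}: {value}")
```

For small samples the normal multiplier understates the width of the interval; a t-based multiplier would be the more usual choice.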

2.14.5 Measures of Association

Measures of association between variables across groups and between variables within groups were undertaken by the appropriate correlation and regression methods (following the appropriate transformation if the distribution was non-normal and the transformation gave a normal distribution) for continuous variables.

2.14.6 Univariate Analysis

2.14.6.1 Comparisons. The difference of the means between two groups that have a normal distribution was assessed with the Student t-test, and the paired t-test where appropriate. Testing the differences between the means of the same variable for more than two groups (normal distribution) was by the analysis of variance (ANOVA) programme in SPSSX.
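As an illustration of the two t statistics referred to above, the following Python sketch computes the pooled-variance Student t statistic for two independent groups and the paired t statistic for matched measurements. The function names and any data passed to them are hypothetical. The sketch returns only the statistic and its degrees of freedom; the significance level would be read from t tables or a statistical package.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Student t statistic for two independent groups, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / std_error
    return t, df


def paired_t_statistic(first, second):
    """Paired t statistic computed from the within-pair differences."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / std_error, n - 1
```

The pooled form assumes that both groups share a common variance, which is the assumption underlying the classical Student t-test.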

2.14.6.2 Variable Distributions. If the variables were not normally distributed, then pretest transformation of continuous variables was performed to obtain a normal distribution. If such a transformation was unable to convert the distribution into a Gaussian distribution, then distribution-free methods to test for differences between groups were to be used. This latter option was not required. The variables with a skewed distribution were all positively skewed (with the bulk of the values to the left and a long tail to the right) and were transformed to a normal distribution logarithmically. These variables were the LT50 of ADP, adrenaline and collagen, the rate of aggregation with ADP, adrenaline and collagen, the lag phase before the shape change with collagen induced aggregation (LAG), WCC, triglycerides, TXB2 produced in clotted whole blood and beta-thromboglobulin. When they were logarithmically transformed their frequency distribution approximated a normal curve.
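The effect of a logarithmic transformation on a distribution with a long right-hand tail can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: the function names are hypothetical, the values are invented, the natural logarithm is assumed, and all values must be strictly positive.

```python
import math


def skewness(values):
    """Moment coefficient of skewness (0 for a symmetric distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def log_transform(values):
    """Natural-log transform; only defined for strictly positive values."""
    return [math.log(x) for x in values]


if __name__ == "__main__":
    raw = [1, 2, 4, 8, 16, 32, 64]          # invented, long right tail
    print("skewness before:", skewness(raw))
    print("skewness after: ", skewness(log_transform(raw)))
```

In this invented example the values are evenly spaced on the log scale, so the transformed skewness is essentially zero. Real measurements would only approximate symmetry, which is why the transformed histogram should still be inspected.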

2.14.6.3 Categorised Data. The Chi-square test for testing of proportions and association of categorised data within groups was used when appropriate.
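A minimal sketch of the Pearson Chi-square statistic for a contingency table is given below. The function name and the counts are hypothetical, and no continuity correction is applied. The statistic would be compared with the tabulated Chi-square value for the returned degrees of freedom.

```python
def chi_square(table):
    """Pearson Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column classifications are independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


if __name__ == "__main__":
    # Invented counts: rows are cases and controls, columns are factor present/absent.
    print(chi_square([[10, 20], [20, 10]]))
```

The approximation is usually considered unreliable when expected counts are small, in which case an exact test is the common alternative.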

2.14.6.4 Regression to the Mean. Regression to the mean refers to the phenomenon that a variable which is extreme on its first measurement will tend to be closer to the centre of the distribution on later measurements550. The possible influence of this is more fully discussed in section 9.4.3, where within-person variability is reviewed as a potential limiting factor for the study.

2.14.7 Bivariate Analysis

2.14.7.1 Linear Associations. Bivariate scatter diagrams were examined for evidence of nonlinear relationships. Pearson's product moment correlation coefficients were reported as the measure of association between variables if there was a linear relation.
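The product moment correlation coefficient can be written out directly from its definition, as in the following sketch. The function name `pearson_r` is hypothetical and the sketch is illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation coefficient between two variables."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)
```

The coefficient measures linear association only, which is why the scatter diagram is inspected first: a strong curved relationship can give a coefficient close to zero.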

2.14.7.2 Comparisons of means. Analysis of variance was used to examine the distinct groups which were qualitatively different with respect to the dependent variable. It provides a method of simultaneous comparison of means in order to decide if a relationship exists between variable(s) within a group and across the separate groups.

2.14.7.3 Variable Distribution. Distribution-free methods can be used for comparison of variables which do not have a normal distribution. More commonly, in this study, linear bivariate regression analysis with appropriate transformations for continuous variables was performed. Logarithmic transformation of the continuous variables resulted in normalisation of the values for the variables indicated in section 2.14.6.2.

2.14.8 Multivariate Analysis

2.14.8.1 Background. The simplest approach to evaluate the relationship of a characteristic to the occurrence of future disease is to classify individuals in regard to the characteristic, and subsequently compare classes by their incidence rates of disease in a prospective study.162 An adequate, although less satisfactory alternative is to evaluate the relationship(s) in a cross-sectional case-control study, which also may involve classification of some of the continuous variables to be examined into discontinuous intervals. Unfortunately, problems arise with classification because most biological characteristics are continuous variables, and classifying into discontinuous intervals results in loss of information and implies that associated risk is also discontinuous. More importantly, several characteristics can be simultaneously involved in influencing the development of disease.

An alternative to the classification of variables, when examining for any relationship between a characteristic and disease, is to apply appropriate mathematical models. Regression analysis is one such model, and examines the predictive relationship between continuous numerical variables. The variable considered to be influenced by other factors is the dependent variable, and those potentially exerting that influence are the independent variables. The identification of the appropriate independent variables is required in advance, along with the dependent variable, so that the number of potential independent variables is known before the analysis. The independent variable, within each group, is free to take on different values, and is also numerical.

In this study, a correlation matrix to assess dependent-independent and dependent-dependent relationships was examined. The number of variables to be examined was limited to those considered essential and with substantial preexisting justification for examination. This restriction is necessary in order to decrease the likelihood of a type I statistical error, and was also preferred for the regression analysis.

2.14.8.2 Regression analysis. Selected clinical and laboratory variables which were significantly correlated with a predefined dependent variable were used in multivariate linear regression analysis. The application of multivariable linear regression analysis provides, for each variable, an estimate of its independent contribution to the variance of the dependent variable. However, regression analysis does not deal with the problem of simultaneous influence of other variables that may be interrelated.

The logistic regression model can be used to yield a probability of the risk which groups have in relation to the independent variable, knowing their dependent variables. This estimated probability has no greater discriminating power than multiple linear regression, but has the advantage of easier interpretation.

2.14.8.3 Definition. Multivariate linear regression analysis will be used to identify subsets of independent variables that are most useful for predicting the dependent variable. This is achieved by developing an equation that summarises the relationship between a dependent variable and a set of independent variables1177,1178.

2.14.8.4 Procedure for selecting the independent variables to enter the regression model. The procedure for choosing the independent variables will be by stepwise forward and backward selection. First the variable will be chosen by forward selection. The variable which is considered for entry into the statistical model is that with the largest positive or negative correlation with the dependent variable. If the criteria for selection are present, the second variable is then selected based on the highest partial correlation. The first variable is also examined to see whether it should be removed according to the removal criterion. After each step, variables already in the equation are examined for removal.

The entry criterion used will be a p value equal to or less than 0.05 for the correlation coefficient. The removal criterion will be a minimum F value of 2.71 in order to remain in the model, and a maximum probability of F being 0.10 for the variable. Variable selection will end when no more variables meet the entry and removal criteria.
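The stepwise procedure described above can be sketched in Python as follows. This is a simplified illustration and not the SPSSX implementation. All function names are hypothetical, and the regression is fitted by ordinary least squares through the normal equations. Entry is assumed to be decided by an F-to-enter threshold of 3.84 (approximately the 5% point of F with 1 and many degrees of freedom) rather than by computing the p value itself. The removal threshold of 2.71 is taken from the text.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def residual_ss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(design, y))


def partial_f(rss_with, rss_without, df_resid):
    """F statistic for the single variable that separates the two models."""
    return (rss_without - rss_with) / (rss_with / df_resid)


def stepwise(predictors, y, f_enter=3.84, f_remove=2.71, max_steps=100):
    """Return the names of the predictors retained by stepwise selection."""
    n, selected = len(y), []

    def rss_of(names):
        return residual_ss([predictors[name] for name in names], y)

    for _ in range(max_steps):
        changed = False
        # Forward step: enter the candidate with the largest F, if it qualifies.
        base, df = rss_of(selected), n - len(selected) - 2
        scores = {name: partial_f(rss_of(selected + [name]), base, df)
                  for name in predictors if name not in selected}
        if scores:
            best = max(scores, key=scores.get)
            if scores[best] >= f_enter:
                selected.append(best)
                changed = True
        # Backward step: remove the weakest variable if its F is too small.
        if selected:
            full, df = rss_of(selected), n - len(selected) - 1
            scores = {name: partial_f(full, rss_of([s for s in selected if s != name]), df)
                      for name in selected}
            worst = min(scores, key=scores.get)
            if scores[worst] < f_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected
```

Here `predictors` is a dictionary mapping a variable name to its list of values and `y` is the list of values of the dependent variable. Keeping the entry threshold above the removal threshold prevents a variable from being entered and removed in alternation. The sketch assumes that no model fits the data perfectly, since a zero residual sum of squares would make the F ratio undefined.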

2.14.8.5 Summary statistics. The summary statistics reported will include the partial regression coefficient (B), the standard error of the regression coefficient (SE B), the standardised regression coefficient (Beta) and the coefficient of determination (Rsq).

The partial regression coefficient (B) is the coefficient for a particular variable after adjustment for other independent variables in the equation. This coefficient does not give the relative magnitude or a measure of relative importance of the association of the independent variable with the dependent variable after adjustment, unless the units of all the variables entered into the regression equation are the same.
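The dependence of B on the units of measurement, and the way the standardised coefficient (Beta) removes it, can be shown with a short sketch. It uses the standard relation Beta = B x SD(independent variable) / SD(dependent variable). The function name and the values are hypothetical.

```python
import statistics


def standardised_coefficient(b, x_values, y_values):
    """Convert a regression coefficient B into the unit-free coefficient Beta."""
    return b * statistics.stdev(x_values) / statistics.stdev(y_values)


if __name__ == "__main__":
    y = [10.0, 20.0, 30.0, 40.0]
    x_small = [1.0, 2.0, 3.0, 4.0]               # invented variable in one unit
    x_large = [1000.0, 2000.0, 3000.0, 4000.0]   # same variable in a unit 1000 times smaller
    # B differs a thousandfold between the two scales (10 versus 0.01) ...
    print(standardised_coefficient(10.0, x_small, y))
    # ... but Beta is unchanged by the change of unit.
    print(standardised_coefficient(0.01, x_large, y))
```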

The coefficient of determination (Rsq) gives a measure of the goodness of fit for a variable in a linear regression model. The statistic is the square of the correlation coefficient between the observed value of the dependent variable and the value predicted from the line of best fit on the scattergram. If all the observations fall on the regression line, Rsq=1. If there is no linear relationship between the dependent and independent variables, Rsq=0. Partitioning the sum of squares of the dependent variable gives the proportion of the variation in the dependent variable "explained" by the model and the independent variable.
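The partitioning of the sum of squares can be sketched for the simplest case of one independent variable. The function names are hypothetical, and the sketch fits the least-squares line and then computes Rsq as one minus the ratio of the residual to the total sum of squares.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a single independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def r_squared(x, y):
    """Proportion of the variation in y explained by the fitted line."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1.0 - ss_residual / ss_total
```

With all observations on a straight line the residual sum of squares is zero and the function returns 1. When the fitted slope is zero the residual and total sums of squares coincide and it returns 0.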

2.15 ETHICAL CONSIDERATIONS IN HUMAN EXPERIMENTATION

In document Abnormal platelet reactivity as a risk factor for premature coronary heart disease in males (Page 63-68)