2.10 Statistical Techniques

2.10.1 Z-Score Normalization

The simplest DOS metric would be an un-normalized value of one of the measured parameters, e.g., StO2. Another common metric is tumor-to-normal ratio-normalized data, StO2 T/N, wherein a measured StO2 value is divided by the average StO2 of a known normal tissue region. This StO2 T/N quantity improves upon the un-normalized StO2 by accounting for systemic inter-subject variability (see Example 1 in Figure 2.14). However, healthy breast tissue also exhibits significant intra-subject heterogeneity in these quantities [99; 230; 56], and this variation is not accounted for by tumor-to-normal ratio normalization. Thus, StO2 T/N provides no indication of whether a value lies within the expected range of normal tissue StO2 because of heterogeneity or whether it differs significantly from the healthy tissue. To resolve this issue, the normalization technique must account for both the mean and the standard deviation of the normal tissue StO2 (see Example 2 in Figure 2.14). A z-score normalization scheme has previously been developed for this purpose with respect to differentiating malignant and healthy tissue [56; 55] (see Chapter 5).
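To make the distinction concrete, the sketch below compares the tumor-to-normal ratio with a z-score computed from summary statistics for two hypothetical patients whose tumor StO2 and mean normal StO2 are identical but whose normal-tissue standard deviations differ, in the spirit of Example 2 in Figure 2.14. The numbers and function names are invented for illustration, and the z-score here is computed on raw values (without the log transform described below) to keep the arithmetic transparent.

```python
def tumor_to_normal_ratio(tumor_value: float, normal_mean: float) -> float:
    """StO2 T/N: tumor value divided by the mean of the normal region."""
    return tumor_value / normal_mean


def z_score(tumor_value: float, normal_mean: float, normal_sd: float) -> float:
    """Number of normal-tissue standard deviations the tumor value lies from the normal mean."""
    return (tumor_value - normal_mean) / normal_sd


# Hypothetical patients: same tumor StO2 (%) and same mean normal StO2 (%),
# but patient B has much more heterogeneous normal tissue.
patients = {
    "A": {"tumor": 60.0, "normal_mean": 75.0, "normal_sd": 3.0},
    "B": {"tumor": 60.0, "normal_mean": 75.0, "normal_sd": 15.0},
}

for name, p in patients.items():
    ratio = tumor_to_normal_ratio(p["tumor"], p["normal_mean"])
    z = z_score(p["tumor"], p["normal_mean"], p["normal_sd"])
    print(f"Patient {name}: T/N = {ratio:.2f}, z = {z:.1f}")
```

Both hypothetical patients share T/N = 0.80, but patient A's tumor lies five normal-tissue standard deviations below the normal mean, while patient B's lies within one standard deviation of it.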

Briefly, the natural logarithm of each data point is taken first, because the log-transformed data for each parameter were empirically determined to be more normally distributed, i.e., Gaussian, across healthy tissue than the raw data [56]. Gaussian distributions, which are fully characterized by a mean and a standard deviation, enable the use of z-scores, which measure how many standard deviations a given data point lies from the mean [257]. These z-scores provide a quantitative measure of how likely it is that a data point belongs to a given distribution; in this case, they measure whether a tumor physiological parameter is significantly different from the same quantity in healthy tissue.
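Under the Gaussian assumption, a z-score can be converted into a tail probability, which is one way to read "how likely it is that a data point belongs to a given distribution". The sketch below computes the two-sided probability that a draw from a standard normal distribution lies at least |z| from the mean; the function name and the two-sided convention are illustrative choices, not necessarily those used in this work.

```python
import math


def two_sided_tail_probability(z: float) -> float:
    """P(|Z| >= |z|) for Z ~ N(0, 1), via the complementary error function.

    For a standard normal variable, P(|Z| >= a) = erfc(a / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


for z in (0.5, 1.0, 1.96, 3.0):
    print(f"z = {z:4.2f} -> two-sided tail probability = {two_sided_tail_probability(z):.3f}")
```

A magnitude near 1.96 corresponds to roughly a 5% two-sided tail probability, so larger magnitudes are increasingly unlikely under the healthy-tissue distribution.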

To transform raw tumor data into z-score data, the mean and standard deviation of a normal (healthy) region of tissue are used, as in Equation 2.190:

$$Z_j = \frac{\ln X_j - \langle \ln X_j^{\mathrm{Norm}} \rangle}{\sigma\left[\ln X_j^{\mathrm{Norm}}\right]}. \tag{2.190}$$

Here, $X_j$ is the un-normalized $j$th measured parameter in the tumor region; $X_j^{\mathrm{Norm}}$ is the un-normalized $j$th measured parameter in a normal (healthy) region of either the tumor-bearing breast or the contralateral breast; $\langle \ln X_j^{\mathrm{Norm}} \rangle$ represents the mean over all points in the normal (healthy) region; and $\sigma[\ln X_j^{\mathrm{Norm}}]$ represents the standard deviation over all points in the normal (healthy) region. $Z_j$ is then the tumor-region z-score relative to the healthy tissue for the $j$th parameter.
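Equation 2.190 translates directly into code. The sketch below is a minimal implementation under stated assumptions: all tumor and normal-region values are strictly positive (so the logarithm is defined), the normal region contains at least two points, and the standard deviation is the sample (n - 1) estimator, since the text does not specify which estimator was used. The function name `log_z_scores` and the example values are hypothetical.

```python
import math
import statistics
from typing import Sequence


def log_z_scores(tumor_values: Sequence[float], normal_values: Sequence[float]) -> list[float]:
    """Z-score of each tumor point relative to the log-transformed normal region (Eq. 2.190).

    Assumes all values are strictly positive and that normal_values has at least 2 points.
    Uses the sample standard deviation (n - 1 denominator); this is an assumption.
    """
    log_normal = [math.log(x) for x in normal_values]
    mean_log_normal = statistics.fmean(log_normal)  # <ln X_j^Norm>
    sd_log_normal = statistics.stdev(log_normal)    # sigma[ln X_j^Norm]
    if sd_log_normal == 0.0:
        raise ValueError("normal region has zero spread; z-scores are undefined")
    return [(math.log(x) - mean_log_normal) / sd_log_normal for x in tumor_values]


# Hypothetical StO2 values (%) for a single parameter j
normal_sto2 = [72.0, 75.0, 78.0, 74.0, 76.0]
tumor_sto2 = [60.0, 65.0, 75.0]
print([round(z, 2) for z in log_z_scores(tumor_sto2, normal_sto2)])
```

Each output is the number of healthy-tissue standard deviations (in log space) that a tumor point lies from the healthy mean: with these hypothetical numbers, the first two tumor values come out strongly negative and the third is close to zero.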

Thus, every tumor data point is measured in units of standard deviations from the mean of a given parameter in healthy tissue. In addition to transforming all parameters to approximately the same magnitude, this method better accounts for systemic inter-subject variations by taking the difference of each parameter from the mean value of the normal (healthy) tissue. It also more fully accounts for intra-subject variation in healthy tissue by normalizing with the healthy tissue standard deviation. Finally, this technique transforms the normal tissue measurements such that they more closely obey a Gaussian (normal) distribution, which improves statistical robustness [206; 136; 141]. A concrete example of the benefit of this statistical transformation scheme is shown in Figure 2.15 for Early time-point tissue oxygen saturation in the ACRIN-6691 multi-site trial subject cohort (see Chapter 3).

Figure 2.14: Normalization to Account for Inter- and Intra-Subject Heterogeneity. Example 1 - Here, Patient A and Patient B are indistinguishable if only the un-normalized StO2 tumor values are considered. However, if tumor-to-normal ratio normalization is performed by dividing the mean tumor tissue value by the mean normal tissue value, it appears that Patient A's tumor is hyperoxic relative to the normal tissue, while Patient B's tumor is hypoxic. Thus, tumor-to-normal normalization accounts for systemic inter-subject variation. Example 2 - Here, Patient A and Patient B are indistinguishable with either un-normalized or tumor-to-normal data. However, if the normal tissue standard deviation is included, it is clear that Patient B has much more heterogeneous normal tissue. Accounting for this variation, via a z-score, enables a more accurate measure of whether each tumor oxygen saturation falls within the expected range of the normal tissue. Thus, z-score normalization can account for both inter- and intra-subject heterogeneity.

Figure 2.15: Histograms of the Early Normal Tissue StO2. A) Fractional histograms of the un-normalized StO2 of the normal tissue on the tumor-bearing breast at the Early time-point for each subject. Each line represents a different subject. B) Fractional histograms of the z-score normalized, log-transformed StO2 data of the normal tissue on the tumor-bearing breast at the Early time-point for each subject. Each line represents a different subject. Note that with z-score normalization, the distributions for all subjects share the same mean and are approximately Gaussian; this effect is consistent across all measured parameters and time-points. Features that obey Gaussian distributions typically produce more statistically robust models.
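The alignment shown in panel B of Figure 2.15 follows from standardizing each subject's log-transformed normal-tissue data against that subject's own log-space mean and standard deviation: whatever the subject's baseline level or spread, the standardized values have mean 0 and standard deviation 1. The sketch below checks this for two hypothetical subjects with very different StO2 baselines; the data, the function name `standardize_log`, and the use of the sample standard deviation are all assumptions.

```python
import math
import statistics
from typing import Sequence


def standardize_log(values: Sequence[float]) -> list[float]:
    """Log-transform positive values, then z-score them against their own mean and sample SD."""
    logs = [math.log(v) for v in values]
    mu = statistics.fmean(logs)
    sd = statistics.stdev(logs)
    return [(x - mu) / sd for x in logs]


# Hypothetical normal-tissue StO2 (%) for two subjects with different baselines and spreads
subjects = {
    "subject_1": [62.0, 65.0, 68.0, 70.0, 66.0, 64.0],
    "subject_2": [78.0, 85.0, 90.0, 81.0, 88.0, 83.0],
}

for name, sto2 in subjects.items():
    z = standardize_log(sto2)
    # round() then + 0.0 so a tiny negative rounding residue prints as 0.00 rather than -0.00
    z_mean = round(statistics.fmean(z), 6) + 0.0
    z_sd = statistics.stdev(z)
    print(f"{name}: raw mean = {statistics.fmean(sto2):.1f}, z mean = {z_mean:.2f}, z SD = {z_sd:.2f}")
```

The raw means differ by almost 20 percentage points, yet both standardized distributions are centered on 0 with unit spread, which is what allows the per-subject histograms in panel B to overlap.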

In document Diffuse Optical Biomarkers Of Breast Cancer (Page 88-91)