The Breast Cancer Data Sets - Computer-Aided, Multi-Modal, and Compression Diffuse Optical St

7.2 Methods

7.2.1 The Breast Cancer Data Sets

The hand-held techniques described in Section 2.3.2 do not use (or give) spatial information about the lesion, beyond the probe placement. As cancers are known to be heterogeneous [252, 253] in gene expression and metabolism [254], this approach has obvious limitations. Furthermore, in our discussion of lesion mean optical properties, we have so far ignored how the lesion was localized and segmented from healthy tissue in the first place. Tomographic reconstructions for each of several chromophores in 3D present multi-parameter signatures of malignant lesions that are generally difficult to fully grasp. Additionally, DOT reconstructions sometimes have image artifacts, especially when only a single chromophore is visualized and/or when lesions are near instrument boundaries.

Our data set consists of 3D tomograms of total hemoglobin concentration (Hbt), blood oxygen saturation (StO2), and reduced scattering coefficient (µs′) in 35 biopsy-confirmed cancer-bearing breasts. DOT images from these subjects were collected with a parallel plate optical imaging system described in previous work [31].

The present data sample is somewhat smaller than that reported by Choe et al. [31] (total: 51) as we excluded subjects with multiple (1) or benign (10) lesions. Additionally, a few subjects with very little healthy tissue or large reconstruction artifacts in the optical field of view were excluded from both test and training sets (5). Table 7.1 contains the demographics and clinical diagnosis of the population used in this analysis.

The cancers in this analysis had an average volume of 6.7 ± 5.2 cm³, corresponding to 841 ± 656 image voxels. The average size of the entire breast was 374 ± 231 cm³, corresponding to 4.7 × 10⁴

#    Diagnosis      Age [yrs]   BMI [kg/m²]   Tumor Size [cm3]
8    IDC            44 ± 11     27 ± 6.2      2.9 ± 1.2
2    DCIS           60 ± 4.9    29 ± 6.6      0.7 ± 0.28
2    ILC            62 ± 3.5    22 ± 2        1.4 ± 0.35
22   IDC & DCIS     49 ± 10     28 ± 7        1.8 ± 0.97
1    DCIS & LCIS    39 ± 0      19 ± 0        5 ± 0
35   All            49 ± 11     27 ± 6.5      2.1 ± 1.2

Table 7.1: Demographic breakdown of cancers in this study. IDC: Invasive Ductal Carcinoma; DCIS: Ductal Carcinoma In Situ; ILC: Invasive Lobular Carcinoma; LCIS: Lobular Carcinoma In Situ; BMI: Body Mass Index. Numeric data are given as mean ± standard deviation. 16 subjects were pre-menopausal and 19 were post-menopausal.

[Figure 7.2 panels. Top row: Voxels [%] versus Hbt [µM], Sat. [%], and µs′ [1/cm]. Bottom row: Voxels [%] versus zHbt, zSat., and zµs′.]

Figure 7.2: Intra-subject data normalization brings inter-subject data distributions close to a normal distribution. The top row shows, for the full population, absolute values of Hbt, StO2, and µs′ [785 nm]; the bottom row shows the population distribution of Z-transformed variables after intra-subject normalization; see Eqn. 7.1 and note that each subject is normalized individually. Each trace represents the healthy region of one subject. For clarity of presentation, the vertical axis is normalized to the total number of voxels in each subject. Section 7.5 shows similar plots for Hb and HbO2.

image voxels (mean ± standard deviation). Note that, for each parameter, traditional regional averaging analysis of these data, as described above, reduces these ∼5 × 10⁴ data points per subject to two numbers (cancer and healthy region averages). Figure 7.1 shows sample intra-subject spatial heterogeneity of these regions and Figure 7.2 plots the distribution of Hbt, StO2, and µs′ [785 nm] for the healthy regions of all subjects.
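The contrast between regional averaging and per-voxel, intra-subject normalization can be illustrated with a minimal Python sketch. Eqn. 7.1 is not reproduced in this section, so the form used here is an assumption: a plain Z-score of each voxel against the mean and standard deviation of the same subject's healthy region. The actual transform may differ (for example, it may operate on transformed data). The function names and the toy Hbt values are hypothetical.

```python
from statistics import mean, pstdev

def region_means(values, is_tumor):
    """Collapse one subject's voxel values to (tumor mean, healthy mean)."""
    tumor = [v for v, t in zip(values, is_tumor) if t]
    healthy = [v for v, t in zip(values, is_tumor) if not t]
    return mean(tumor), mean(healthy)

def z_transform(values, is_tumor):
    """Z-score every voxel against this subject's own healthy-region statistics.

    Assumed form (not taken from Eqn. 7.1): z = (x - mean_healthy) / std_healthy.
    """
    healthy = [v for v, t in zip(values, is_tumor) if not t]
    mu, sigma = mean(healthy), pstdev(healthy)
    return [(v - mu) / sigma for v in values]

# Toy Hbt values [µM] for six voxels of one hypothetical subject.
hbt = [18.0, 20.0, 22.0, 20.0, 30.0, 34.0]
is_tumor = [False, False, False, False, True, True]

# Regional averaging keeps only two numbers per parameter.
tumor_mean, healthy_mean = region_means(hbt, is_tumor)

# Intra-subject normalization keeps one value per voxel.
z_hbt = z_transform(hbt, is_tumor)
```

Because each subject is normalized against its own healthy tissue, the healthy-region Z-values of every subject have zero mean and unit standard deviation by construction, which is what allows distributions from different subjects to be overlaid.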

We demonstrate our new statistical analysis method with a leave-one-out cross-validation (e.g., as described by Hastie et al. [266]), in which 34 of our 35 subjects serve as the training set and the remaining subject provides the test data. Permuting these sets, such that each subject serves as the test set once, provides 35 training/test data combinations and enables an estimation of classification accuracy. Note that gold-standard segmentation of the DOT images into tumor and healthy regions is required for the training set and for validating the test set classification (i.e., to assess how well the classifier performed compared to the gold standard).
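The leave-one-out bookkeeping described above can be sketched in a few lines of Python. This shows only how the 35 training/test combinations are formed; the classifier itself is not shown, and the function name and integer subject labels are hypothetical.

```python
def leave_one_out_splits(subject_ids):
    """Yield (training_ids, test_id) pairs, holding out each subject once."""
    for i, test_id in enumerate(subject_ids):
        training_ids = subject_ids[:i] + subject_ids[i + 1:]
        yield training_ids, test_id

subjects = list(range(1, 36))  # hypothetical subject labels 1..35
splits = list(leave_one_out_splits(subjects))

for training_ids, test_id in splits:
    # A real analysis would fit the classifier on the training subjects'
    # gold-standard-segmented voxels, then score its prediction on the
    # held-out subject against that subject's gold-standard segmentation.
    assert test_id not in training_ids
```

Splitting by subject, and not by voxel, keeps all voxels of the test subject out of the training set.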

Figure 7.3: Example of masks applied to segment breast tissue for CAD; a slice through the center of the tumor is shown. The background matching fluid (blue) is first segmented from the remainder of the tissue and the chest wall removed from the data set (the chest wall is not shown in this figure). The remaining data is the breast tissue (cyan). This is further segmented into the tumor region (red). The healthy region (yellow) is defined as that breast tissue at least 2 cm away from the lateral sides of the image and from the tumor; voxels within 1 cm of the source and detector planes were also excluded.

spatial localization of the cancers; a full description of the procedure utilized to identify cancer regions is given by Choe et al. [31]. Briefly, a traditional clinical imaging method, typically MRI, was used to approximately locate each tumor. We then selected nearby regions of high optical contrast as the starting point for a region-growing algorithm to identify the spatial extent of the tumor. A 2 cm border region about the tumor (e.g., Figure 7.3) and voxels within 1 cm of the source and detector planes were excluded from the training data; the remainder of the breast is defined as healthy tissue. We exclude these boundary regions to reduce the effect of physiological changes near the tumor, errors in tumor positioning, and optode artifacts. In the training set, we assume perfect segmentation into malignant and healthy tissue. In the test set, gold-standard segmentation is used only to determine the accuracy of our malignancy prediction.
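The exclusion rule for healthy training tissue can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Choe et al.: every voxel in the toy grid is treated as breast tissue, the source and detector plates are taken to be the first and last planes along the second axis, distances are measured between voxel centers, and "within 1 cm" is read as strictly less than 1 cm. The additional exclusion near the lateral sides shown in Figure 7.3 is omitted. The function name, grid shape, and 0.5 cm voxel size are hypothetical.

```python
import math

def healthy_mask(shape, voxel_cm, tumor_voxels,
                 tumor_margin_cm=2.0, plate_margin_cm=1.0):
    """Return the set of (i, j, k) voxels kept as healthy training tissue."""
    nx, ny, nz = shape
    depth_cm = (ny - 1) * voxel_cm  # plate separation along axis 1 (assumed)
    healthy = set()
    for i in range(nx):
        for j in range(ny):
            y_cm = j * voxel_cm
            # Drop voxels closer than plate_margin_cm to either plate.
            if y_cm < plate_margin_cm or depth_cm - y_cm < plate_margin_cm:
                continue
            for k in range(nz):
                voxel = (i, j, k)
                if voxel in tumor_voxels:
                    continue
                # Distance to the nearest tumor voxel, in cm.
                d_cm = voxel_cm * min(math.dist(voxel, t) for t in tumor_voxels)
                if d_cm >= tumor_margin_cm:
                    healthy.add(voxel)
    return healthy

# Toy slice: 13 x 9 x 1 voxels of 0.5 cm, with a single tumor voxel at the center.
tumor = {(6, 4, 0)}
healthy = healthy_mask((13, 9, 1), 0.5, tumor)
```

The brute-force nearest-tumor search is adequate for a toy grid; for full tomograms of roughly 5 × 10⁴ voxels, a distance transform would be the more practical choice.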
