

OPTICAL BIOPSY TECHNICAL DEVELOPMENT

CORE-CUT BIOPSY

9.2.4 Data Processing and Analysis Results

Data Numbers:

Fifty female patients were recruited into the study, all of whom were undergoing breast surgery under general anaesthetic. A total of 147 data sets were collected, covering most of the common types of normal and pathological breast tissue. The totals for each tissue type are presented in Table 9.1.

Histological type                 Number of data sets
Normal breast tissue                       86
Fibroadenoma                               13
Other benign breast disease                19
Carcinoma-in-situ                           6
Invasive carcinoma                         23
Total                                     147

Table 9.1 - Total numbers of data sets for each tissue type

For clarity of presentation, the wide variety of histological subtypes has been condensed into broad headings. Normal breast tissue was either fibro-fatty or glandular. Benign breast tissue covered a variety of tissue types, ranging from sclerosing adenosis to atypical ductal hyperplasia. The breast cancer spectra included both ductal and lobular types; no other, less common types of breast cancer were encountered. There are clearly more non-cancer data sets than in-situ or invasive cancer data sets. This simply reflects the bias of the biopsy methods outlined above towards obtaining normal or benign spectra. All of the cancer spectra were obtained via the core-cut method, which was used for approximately 40% of biopsies (Table 9.2).

Two spectra were obtained for each ‘optical biopsy’ and compared with each other, using a Pearson correlation, to confirm reproducibility at each biopsy site. No pair of spectra was found to be statistically different, so each pair was averaged before input into the various analysis techniques outlined below.
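The reproducibility check and averaging step can be sketched as follows. This is a minimal Python sketch, not the software used in the study. It assumes each spectrum is a list of intensities sampled at the same wavelengths, and the acceptance threshold `min_r` is a hypothetical value, because the text does not state the criterion that was applied.

```python
import math
from typing import List, Sequence, Tuple


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length spectra."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("spectra must have equal length and at least 2 points")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant spectrum")
    return cov / math.sqrt(var_a * var_b)


def combine_repeats(
    first: Sequence[float], second: Sequence[float], min_r: float = 0.95
) -> Tuple[List[float], float]:
    """Check two repeat spectra from one biopsy site and return their mean.

    min_r is a HYPOTHETICAL acceptance threshold chosen for illustration.
    """
    r = pearson_r(first, second)
    if r < min_r:
        raise ValueError(f"repeat spectra poorly correlated (r = {r:.3f})")
    mean = [(x + y) / 2 for x, y in zip(first, second)]
    return mean, r


if __name__ == "__main__":
    averaged, r = combine_repeats([1.0, 2.0, 3.0, 4.0], [1.2, 2.1, 2.9, 4.2])
    print(f"r = {r:.3f}", averaged)
```

A correlation close to 1 shows that the two spectra have the same shape. On its own it is not a significance test for a difference between them.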

Type of biopsy                        Number    Percentage
Tumour bed                              64        43.5%
Core-cut                                58        39.5%
‘Open’ biopsy for benign lesions        25        17%
Total                                  147       100%

Table 9.2 - Numbers for each type of biopsy
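As an arithmetic check, Tables 9.1 and 9.2 agree on a total of 147 data sets, and the percentages in Table 9.2 follow from the counts. The short Python sketch below uses only the figures given in the two tables.

```python
from typing import Dict

TISSUE_COUNTS: Dict[str, int] = {
    "Normal breast tissue": 86,
    "Fibroadenoma": 13,
    "Other benign breast disease": 19,
    "Carcinoma-in-situ": 6,
    "Invasive carcinoma": 23,
}
BIOPSY_COUNTS: Dict[str, int] = {
    "Tumour bed": 64,
    "Core-cut": 58,
    "Open biopsy for benign lesions": 25,
}


def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each count as a percentage of the total, to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    # Both tables describe the same 147 data sets.
    assert sum(TISSUE_COUNTS.values()) == sum(BIOPSY_COUNTS.values()) == 147
    print(percentages(BIOPSY_COUNTS))
    # {'Tumour bed': 43.5, 'Core-cut': 39.5, 'Open biopsy for benign lesions': 17.0}
```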

Data Processing:

For the purely computer-based analysis methods (ANN and HCA), all spectra were first normalised to the same total value, i.e. the same area under the curve. This eliminated variations due to optical coupling, transmission through the different optical probes and fibres, and the number of light pulses emitted to obtain the spectrum. The normalised spectra were then divided into 21 wavelength bands, each 20 nm wide, spanning the spectral range 330-750 nm, and an average value (or intensity) was calculated for each band. In addition to these averaged intensities, a series of gradient values was used, obtained by determining the slope of the curve between various set points. A total of 10 gradients were calculated. Together with the 21 intensity values, these made up all of the input parameters for the Artificial Intelligence analysis.
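The feature extraction described above can be sketched in Python. This sketch rests on several assumptions. Each spectrum is taken to be sampled on an ascending wavelength grid covering 330-750 nm. The area is computed with the trapezoidal rule. The gradient set points in `GRADIENT_POINTS` are hypothetical, because the text does not list them. The sketch does not reproduce the software used in the study.

```python
from typing import List, Sequence, Tuple

BAND_START_NM = 330.0
BAND_WIDTH_NM = 20.0
N_BANDS = 21  # 21 x 20 nm bands span 330-750 nm

# HYPOTHETICAL gradient end points (nm): the text only says "various set points".
GRADIENT_POINTS: List[Tuple[float, float]] = [
    (340, 400), (400, 460), (460, 520), (520, 580), (580, 640),
    (640, 700), (700, 740), (340, 740), (420, 540), (540, 660),
]


def normalise_area(wl: Sequence[float], y: Sequence[float]) -> List[float]:
    """Scale a spectrum so that its area (trapezoidal rule) equals 1."""
    area = sum(
        (wl[i + 1] - wl[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(wl) - 1)
    )
    return [v / area for v in y]


def value_at(wl: Sequence[float], y: Sequence[float], target: float) -> float:
    """Linearly interpolate y at wavelength target (wl must be ascending)."""
    for i in range(len(wl) - 1):
        if wl[i] <= target <= wl[i + 1]:
            t = (target - wl[i]) / (wl[i + 1] - wl[i])
            return y[i] + t * (y[i + 1] - y[i])
    raise ValueError(f"{target} nm is outside the measured range")


def band_means(wl: Sequence[float], y: Sequence[float]) -> List[float]:
    """Mean intensity in each 20 nm band; the last band also includes 750 nm."""
    means = []
    for b in range(N_BANDS):
        lo = BAND_START_NM + b * BAND_WIDTH_NM
        hi = lo + BAND_WIDTH_NM
        last = b == N_BANDS - 1
        vals = [v for x, v in zip(wl, y) if lo <= x < hi or (last and x == hi)]
        if not vals:
            raise ValueError(f"no samples in band {lo:.0f}-{hi:.0f} nm")
        means.append(sum(vals) / len(vals))
    return means


def gradients(wl: Sequence[float], y: Sequence[float]) -> List[float]:
    """Slope of the spectrum between each pair of set points."""
    return [
        (value_at(wl, y, b) - value_at(wl, y, a)) / (b - a)
        for a, b in GRADIENT_POINTS
    ]


def feature_vector(wl: Sequence[float], y: Sequence[float]) -> List[float]:
    """21 band intensities followed by 10 gradients (31 inputs in total)."""
    norm = normalise_area(wl, y)
    return band_means(wl, norm) + gradients(wl, norm)
```

Normalising before taking gradients means that the slopes, like the band intensities, are independent of the overall signal level.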

For the artificial neural network analysis, the data were split into a training group (80%) and a testing group (20%). This was repeated five times, with a different 20% held out for testing each time, so that every spectrum was tested. The output was either malignant (positive) or non-malignant (negative). A commercially available neural network package was used (‘Brain Maker Pro’, California Scientific Software). Results are presented below.
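The rotating 80/20 scheme can be sketched as a five-fold split in which each spectrum appears in exactly one test set. This is an illustrative Python sketch. The random shuffle and its seed are assumptions, and it does not describe how the neural network package itself partitioned the data.

```python
import random
from typing import Iterator, List, Tuple


def rotating_splits(
    n_spectra: int, n_rounds: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each spectrum is tested exactly once.

    With n_rounds=5 each test set holds about 20% of the spectra and the
    remaining 80% is used for training. The shuffle and seed are assumptions.
    """
    order = list(range(n_spectra))
    random.Random(seed).shuffle(order)
    folds = [order[k::n_rounds] for k in range(n_rounds)]
    for k in range(n_rounds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


if __name__ == "__main__":
    for train, test in rotating_splits(147):
        print(len(train), len(test))
```

The text does not say whether the split was stratified. A stratified split would keep the proportion of malignant spectra roughly equal across the five test sets, which matters when only 29 of the 147 data sets are malignant.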

Hierarchical cluster analysis relies on large data sets to develop its classification algorithms. To maximise the training data, a ‘leave-one-out’ method was therefore employed: all but one of the data sets were used for training and the remaining spectrum was then tested. This was repeated for every spectrum, and the statistics were determined from the sum of all the tests. The software used was also commercially available (‘VERT’, Sandia National Laboratories). As mentioned in chapter 8, HCA differs from ANN in that more than two outputs are possible. Although it would have been possible to attempt to classify every histological subtype, this would have required far more data than had been collected so far. It was therefore decided to simplify the system to only three possible outcomes for the test spectra, i.e. malignant, non-malignant and unclassified. The third outcome denotes a spectrum that does not fall into any recognisable pattern based on the training set. Again, results are presented below.
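The leave-one-out procedure, together with the idea of an ‘unclassified’ outcome, can be illustrated with the sketch below. It deliberately substitutes a much simpler nearest-centroid classifier with a distance cut-off for the hierarchical cluster analysis that was actually used. It is not VERT. The classifier, the Euclidean distance and the `max_dist` threshold are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence

UNCLASSIFIED = "unclassified"


def nearest_centroid(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[str],
    x: Sequence[float],
    max_dist: float,
) -> str:
    """Stand-in classifier, NOT hierarchical cluster analysis.

    Assigns x to the class with the closest mean feature vector, or returns
    "unclassified" if even the closest class mean is further than max_dist.
    """
    best_label, best_d = UNCLASSIFIED, math.inf
    for label in sorted(set(train_y)):
        members = [v for v, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        d = math.dist(x, centroid)
        if d < best_d:
            best_label, best_d = label, d
    return best_label if best_d <= max_dist else UNCLASSIFIED


def leave_one_out(
    xs: List[List[float]], ys: List[str], max_dist: float
) -> List[str]:
    """Train on all spectra but one, test the held-out one, repeat for all."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predictions.append(nearest_centroid(train_x, train_y, xs[i], max_dist))
    return predictions
```

In this sketch, the fraction of held-out spectra that do not come back as `unclassified` plays the role of the ‘% classified’ figure reported for HCA.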

The model-based analysis differed from the other two in that a certain amount of pre-processing was employed, as outlined in detail in section 8.4.3. In effect, the spectra were standardised to the same value at 800 nm and to the same gradient between 650 and 800 nm. They were also corrected for haemoglobin absorption and, for the sentinel nodes, for blue dye contamination. These corrections made it easier to identify the true differences due to elastic scattering, rather than those due to absorption or to the set-up of the system. Following this pre-processing, a significantly reduced set of input parameters was assessed using a linear discriminant analysis software package (Systat 9.01 - SPSS Software). An 80/20 split was again employed for training and testing.
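One possible reading of the standardisation step is sketched below. Each spectrum is first scaled to a common value at 800 nm. A linear term pivoting about 800 nm is then added, so that the gradient between 650 and 800 nm matches a common reference. This is an interpretation, not the procedure of section 8.4.3. The reference value and slope are hypothetical parameters, and neither the haemoglobin nor the blue-dye correction is modelled.

```python
from typing import List, Sequence


def value_at(wl: Sequence[float], y: Sequence[float], target: float) -> float:
    """Linearly interpolate y at wavelength target (wl must be ascending)."""
    for i in range(len(wl) - 1):
        if wl[i] <= target <= wl[i + 1]:
            t = (target - wl[i]) / (wl[i + 1] - wl[i])
            return y[i] + t * (y[i + 1] - y[i])
    raise ValueError(f"{target} nm is outside the measured range")


def standardise(
    wl: Sequence[float],
    y: Sequence[float],
    ref_value: float = 1.0,   # HYPOTHETICAL common value at hi_nm
    ref_slope: float = 0.0,   # HYPOTHETICAL common gradient lo_nm..hi_nm
    lo_nm: float = 650.0,
    hi_nm: float = 800.0,
) -> List[float]:
    """Scale to ref_value at hi_nm, then set the lo_nm..hi_nm slope to ref_slope."""
    # 1. Scale so that the value at hi_nm equals ref_value.
    scale = ref_value / value_at(wl, y, hi_nm)
    scaled = [v * scale for v in y]
    # 2. Measure the slope between lo_nm and hi_nm after scaling.
    slope = (value_at(wl, scaled, hi_nm) - value_at(wl, scaled, lo_nm)) / (
        hi_nm - lo_nm
    )
    # 3. Add a linear term that is zero at hi_nm, so the value there is kept
    #    while the slope between lo_nm and hi_nm becomes ref_slope.
    return [v + (ref_slope - slope) * (x - hi_nm) for x, v in zip(wl, scaled)]
```

Pivoting the linear correction about 800 nm lets the two constraints be applied independently: the second step cannot disturb the value fixed in the first.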

Analysis Results:

The aim of this study was to evaluate the ability of the OB system to discriminate between cancerous and non-cancerous tissue, and to develop the subsequent analysis methods. For the purpose of testing the system, therefore, both in-situ and invasive cancer spectra were classified as ‘malignant’ and all other spectra as ‘non-malignant’. Table 9.3 below shows the results of the in-vivo testing of breast tissue, with separate figures for each type of analysis. The percentages represent the ability to determine whether a test spectrum was malignant: a malignant spectrum was deemed positive and a non-malignant spectrum negative. Sensitivity and specificity are defined in the standard way:

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)

where TP, TN, FP and FN represent true positives, true negatives, false positives and false negatives respectively, as determined by the corresponding histopathology.
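The labelling rule and the two formulas can be expressed as a short Python sketch. The histology names follow Table 9.1. The counts in the usage example are illustrative only and are not data from this study. The sketch handles binary predictions only; the text does not state how spectra left unclassified by HCA entered its sensitivity and specificity.

```python
from typing import Sequence, Tuple

# Following the text: in-situ and invasive carcinoma both count as "malignant".
MALIGNANT_TYPES = {"Carcinoma-in-situ", "Invasive carcinoma"}


def is_malignant(histology: str) -> bool:
    """Binary ground truth derived from the histological type."""
    return histology in MALIGNANT_TYPES


def sensitivity_specificity(
    truth: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP); True = malignant."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Illustrative counts only: 10 malignant and 20 non-malignant spectra.
    truth = [True] * 10 + [False] * 20
    predicted = [True] * 7 + [False] * 3 + [False] * 17 + [True] * 3
    print(sensitivity_specificity(truth, predicted))  # (0.7, 0.85)
```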

                               ANN      HCA      MBA
Sensitivity                    69%      67%      94%
Specificity                    85%      79%      92%
% classified (HCA only)        n/a      91.5%    n/a

Table 9.3 - Results for classification of spectra by different analysis techniques. ANN - Artificial Neural Network; HCA - Hierarchical Cluster Analysis; MBA - Model-Based Analysis