• No results found

MICROARRAY IMAGE SIMULATION METROD

3.3 M icroarray S im u lation Process

W hen evaluating the performance o f the segm entation algorithms, it is hard to discover the exact gene expression levels or pixel level intensities. Hence, we utilized the simulation m odule to obtain different qualities o f m icroarray images. The “ground true information” for each simulated image was pre-assigned before analysis. In order to obtain a com prehensive understanding o f different algorithm s' performance, we analyzed the simulated image based on the gene expression level in order to observe the expression sensitivity o f different algorithms will be directly analyzed. In our research, we mainly focused on high quality resolution simulation since now adays every high density oligonucleotide microarray is produced in an advanced experimental environment. Bubble, scratch and other layout disadvantages are rarely generated.

Therefore, we could control the gene signal values and intensity values to implement the simulation for each microarray image group. All the simulated images for each quality microarray group will share the same signal and intensity inputs. Next, these images were analyzed separately with both SBC and G C O S to generate two sets of intensity values and used the built in G COS gene signal o f the M A S5 algorithm to

4 2

compute the corresponding signal values from those two sets of intensity values. In this way. we could com pare the gene expression results between these two methods. This feature was more desirable than just to validate the intensity accuracy because the gene expression level has a m ore influential m eaning in practice than the intensities have.

W hen the simulation step was completed, a simulated microarray image TIFF file was generated and written into a specific DAT file. Next, the DAT file and TIFF file were imported into both SBC and G COS in order to obtain tw o sets o f intensity and signal values. Therefore, not only was the intensity accuracy evaluated, but also the signal accuracy was validated using both SBC and G C O S m ethods in the evaluation process. This whole process is illustrated in Figure 3.3.

(itos (it <IS SBC vet: 12 ( A f f y m e t r i x I n t e n s i t y ) * . C C 1 3 ( S B C I n t e n s i t y ) M A S S V C H P 2 { A f f y m e t r i x S i g n a l ) M i c r o j r r a y I m a g e S i m u l a t e d M i c r o a r r o y I m a g e V C L L 1 ( T r u e I n t e n s i t y ) V C H P 1 T r u e S i g n a l ) ' . C H P 3 ( A f f y m e t r i x S i g n a l )

43

1. Write simulated microarrav image into DAT file.

In what follows, we defined each step o f the process, after obtaining the simulated microarray images, the Affymetrix G COS will analyze the images. However, the Affymetrix G C O S software cannot analyze the m icroarray image type file (JPG, TIFF and so on) directly. In order to allow the Affym etrix G C O S to analyze the simulated image, we needed to rewrite the simulated microarray im age into the *.DAT file. The *.DAT file contains the 16-bit grey level image pixels data, header information, layout information and so on. It is a special data format designed by Affymetrix. The first section in DAT file is the header information stored in DAT file. The pixel data matrix is stored as 16-bit unsigned integer value at byte 512 following the header. The new simulated microarray image has everything as the original im age except has for the image pixels data. Thus, we extracted the new simulated pixels data and wrote them into the original DAT file and remained any other layout inform ation the same. In this case, w'e got a new DAT file corresponded to the new simulated microarray image.

2. Analyze the new DAT file by GCOS.

W hen the simulated image was written into the specific DAT file format, the new DAT file was im ported into GCOS. G C O S automatically analyzed and generated the CEL2 file and CH P2 file, wfiich presented the intensity and expression information for that specific DAT file. In such a way, one set o f intensity values and signal values from GCOS was obtained.

3. Analyze the new simulated image by SBC.

SBC is the segmentation based contour algorithm im plem ented in JAVA platform. The input o f SBC is a batch file, which contains both image and grid location information

4 4

together. N ew simulated image and its corresponding grid information are written into this batch file and imported into SBC. The intensity results were saved as the output in a TXT file, which contains four columns. They were the num bers o f pixels assigned as the foreground, foreground/background mean, cell intensity and cell background median. In addition, the SBC generated a segmented image show n in Figure 3.4 with the boundary colored in white for each cell.

.b bWb ii'ai b b b.T ■ B£Wry#jis a ^ la ■ f* ■ i

Mm

i\»m

iVfe?; il ■ P i ■ "v > ■ ■ ■ ■ k.j b : i HJi

m m mm m mm *,■ r L • 1 tm m wm w ■ ■ ■ m m h m * * - -w% a m * .•« ,' ■ ■ mm • u mm mm *, ■ k | b

st m

B 'B j tf B .s te a a ■ ■ ■ wm e

m

■ b ^ a jT b * i £ b a a it w ; £ i i

mtfn.ni-Sm

B ’a ' ^ a q u . ' - _-w_r=-..._.---' ■ - ^ B a L'H]BBa«»iBL 1 flT* B P

?p

B yWff ■ ^ ' / • c-bb' NUB _ k b bSEHbM1 ' i f l B B I ^ B B B B B n ' B 1 1 ^ 1 1 1 1 ' B B e iJ w 'tl B B S f Z * tiB 'B V £ .5 lB B B B B * fl'fl 2 ' I B StlrJ.B B B B B B 21 B iA tf'S j p BL'krm m ■iiv.'ai *♦, «k.\ fc.w.?* B B B B B £ B ' M ^ . f B » ‘: 2 B '« B ^ B B " i " ( C « f l I > > * ■ b£ bh ■hjgjJWTff.c a a B ^ B ^ I « r r > : - _ B v t B a a S M ^ f m S f ^ e i i m a t f a £ r . ' B ^ i ' w B j a 1* a b n 'B J e T iB b b *!*£*■

a m

a .m 'tf a ' * sk ByfET1

*s

b ; o b P s m * b b b b i k b V v j b . ^ b ^ - S I b 'b ^ b i ^ b '

Figure 3.4 Segmented image by SBC

4. The SBC intensity output is written into CEL file to get the corresponding gene expression signal values.

SBC algorithm will generate intensity output in T X T file after image segmentation. Since our objective was to detect the signal expression sensitivity for each segmentation algorithm, we needed to calculate the gene expression values by MAS5 based on the intensity values obtained from SBC. However, the G C O S requires special CEL file format as the input to MAS5 algorithm. In another words, w e needed to write

45

the SBC intensity T X T file output into this specified CEL file. The C E L file format was an ASCII text file, which was divided into m any sections begin w ith a section name defined by a line. It contained the information for each p robe cell. It included the layout coordinates o f each cell, the intensity value for each cell, the num ber o f pixels included for each cell and the standard deviation o f each cell, etc.

Therefore. CEL3 was generated from the intensity values obtained from SBC. Next, we used M AS5 to analyze these intensity values stored in CEL3 file to get the corresponding gene expression signal values stored in CHP3 file.

We started with the picture that we want to segm ent and using the segmentation method, em bedded into the G COS, we created a CELL1 file, which stores the results of the intensity calculations on the pixel values o f the DAT file. Using the data acquisition module in the G C O S, we extracted the gene values, the so called “true-values” in CHP1 file. On the CELL1 file, we applied the algorithm, and w e generated a new picture. This picture was segm ented using the segmentation module in the G C O S and generated CELL2 file. It was segmented using the SBC m ethod and generated CELL3 file. To eliminate any bias in our analysis, we processed both C E L L 2 and CELL3 files with the data acquisition module inside the G C O S and obtained gene values in CH P2 file from Affymetrix and gene values in CHP3 file from SBC. These tw o sets o f gene signal values were com pared with the "true values” and statistical analysis perform ed to determine which segmentation m ethod performs better.

Hence, after modifying the original simulation algorithm, we were able to generate Affymetrix type o f images by using the obtained intensity value, the so-called "true values.” Previous studies in the literature did not have such inform ation; therefore.

4 6

the statistical analysis that could be done was rather limited. A fter applying our modification to the algorithm, we were able to generate any num ber o f pictures we needed for an experiment. To these pictures w e could apply and any segmentation method in existence, and we could analyze which segm entation m ethod offered values closer to the "true values.” This simulation method also enables us to identify weak or strong points within each segmentation method. In our research, we only com pared the performance o f two methods: the SBC and the G C O S m ethods since G C O S was the currently known m ethod used in Affymetrix microarray analysis.

Related documents