Materials & Methods

2.5 Image and Data Analysis

2.5.1 Image processing

Images from the confocal and fluorescence microscopes were processed using FIJI. Where the muscles in an image were too bright, the contrast was adjusted using Image > Adjust > Window/Level. Scale bars were added using Analyze > Tools > Scale Bar.

Pictures of whole third instar larvae were assembled from six images. The images were carefully overlapped in Microsoft PowerPoint and grouped together. The edges were filled with black bars to give the composite a rectangular shape. The scale bar was added to one of the images in FIJI as described above.

2.5.2 Microarray Analysis

The microarray data used in this project were analysed by Helen White-Cooper using RMA normalization with the affy package in R. The normalized data were provided in an Excel file and used for further analyses in this project.

2.5.3 Whole larvae Sequencing Analysis

Data analysis was performed using Galaxy and GenePattern (Reich et al. 2006). Reads were mapped to the reference genome dm3 (UCSC) with Bowtie v1.1.1 (Langmead et al. 2009). The mapped reads were processed with the Tuxedo suite v2.0.2 (Trapnell et al. 2012) through Cufflinks, Cuffmerge and Cuffdiff. The Cuffdiff output, which contained the significant changes in transcript expression, splicing and promoter use, was imported into Excel. All programs were run with their default settings.

2.5.4 Larval carcasses Sequencing Analysis

Data analysis was performed with self-made scripts written in bash and awk (Aho et al. 1978; Bourne 1978). Reads were mapped to the reference genome r6.13 (FlyBase), with BDGP6.87 as the GTF file, using TopHat v2.1.1 (Trapnell, Pachter, and Salzberg 2009) with Bowtie v1.1.1 (Langmead et al. 2009). The mapped reads were then processed with the Tuxedo suite v2.2.1 (Trapnell et al. 2012) through Cufflinks, Cuffmerge, Cuffquant and Cuffdiff. The Cuffdiff output was imported into Excel for further analysis. All programs were run with their default settings.

2.5.5 Ovaries Sequencing Analysis

First round of small RNA sequencing

piPipes (Han, Wang, Zamore, et al. 2015) was used to analyse the data. Its small RNA-seq pipeline reports the abundance, length distribution, nucleotide composition and 5’-to-5’ distance of piRNAs assigned to genomic annotations, including individual piRNA clusters and transposon families.

The general workflow for analysing small RNA data with piPipes is as follows. All reads aligning to ribosomal RNA sequences were removed. The remaining reads were mapped to miRNA hairpin sequences to assess the 5’- and 3’-end heterogeneity of mature miRNAs. The reads that remained were mapped to the reference genome dm3 (UCSC) with Bowtie v1.2.0 (Langmead et al. 2009) and then assigned to different genomic features according to their coordinates. piPipes drew graphs of length distribution, nucleotide composition and ‘Ping-Pong’ amplification for each genomic feature. The piRNA read coverage was normalized per 1 million mapped miRNA reads.

Second round of small RNA sequencing

For the second round of small RNA-seq data, self-made bash and awk scripts were used for the analysis (Aho et al. 1978; Bourne 1978). The FASTQ files were first processed through the annotation pipeline provided by the Julius Brennecke group, which gave information about quality, read coverage and normalization depth. The collapsed annotation FASTA file was then analysed further with the self-made scripts, in the following steps:

The fastx-toolkit v0.0.14 (http://hannonlab.cshl.edu/fastx_toolkit/) was used to convert the files to FASTA, remove artifacts, clip adapters and trim sequence lengths.

Bowtie v1.1.1 (Langmead et al. 2009) was used to map reads to different genomic features: rRNA, miRNA, transposons, the Drosophila genome r6.13 (FlyBase), tRNAs, exons, introns, 5’ UTRs and 3’ UTRs. Samtools (Li, Handsaker, et al. 2009) and Bedtools v2.26.0 (http://quinlanlab.org) were used to convert the mapped data into a .bed file.

Reads mapping to transposons were counted by manipulating the .bed file from earlier analyses. In short, reads were grouped by length, strand type, transposon and count. Length distributions from different genomic features were produced by manipulating the .bed file with awk and bash. Both read mappings and length distributions were imported to Excel to calculate the normalized reads.
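The grouping step described above can be sketched in Python. This is a minimal illustration, not the awk and bash scripts used in this project. It assumes a six-column .bed file in which the first column holds the transposon name and each line represents one read; the function name count_reads and the example records are hypothetical.

```python
from collections import Counter


def count_reads(bed_lines):
    """Count reads per (transposon, strand, length) from BED6 lines.

    Assumption: column 1 is the transposon name (reads were mapped to
    transposon sequences) and every line stands for a single read.
    """
    counts = Counter()
    for line in bed_lines:
        # Skip blank lines and optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        transposon = fields[0]
        start, end = int(fields[1]), int(fields[2])
        strand = fields[5]
        length = end - start  # BED intervals are 0-based and half-open
        counts[(transposon, strand, length)] += 1
    return counts


# Hypothetical example records, for illustration only.
example_bed = [
    "roo\t10\t35\tread1\t255\t+",
    "roo\t40\t65\tread2\t255\t+",
    "roo\t100\t126\tread3\t255\t-",
    "copia\t5\t26\tread4\t255\t+",
]
grouped = count_reads(example_bed)
for (transposon, strand, length), n in sorted(grouped.items()):
    print(transposon, strand, length, n, sep="\t")
```

The resulting table has one row per transposon, strand and read length, which is a convenient shape for import into a spreadsheet.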

The .bed file from reads mapping uniquely to the Drosophila genome was used to produce BigWig files to load tracks on the UCSC genome browser (www.genome.ucsc.edu). These tracks contained the unique reads mapping across the genome.

Oxidized small RNA sequencing

The same script used for the non-oxidized samples was used for the oxidized samples. The length distribution was calculated from reads of 23-29 nt mapping across all known transposons. All data were imported into Excel for read normalization.

IP-sequencing

The same script was also used to analyse the IP-sequencing data. The length distribution was calculated from reads of 23-29 nt mapping across all known transposons. All data were imported into Excel for read normalization.
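As an illustration of the length distribution restricted to 23-29 nt, the following Python sketch sums read counts per length over all transposons and strands. It assumes the reads have already been grouped into a table keyed by (transposon, strand, length); that input layout, the function name and the numbers are hypothetical and do not come from the project's scripts.

```python
def pirna_length_distribution(grouped_counts, min_len=23, max_len=29):
    """Sum read counts per length over all transposons and strands.

    Only lengths within [min_len, max_len] are kept; lengths in that
    range with no reads are reported as 0.
    """
    totals = {length: 0 for length in range(min_len, max_len + 1)}
    for (_transposon, _strand, length), n in grouped_counts.items():
        if min_len <= length <= max_len:
            totals[length] += n
    return totals


# Hypothetical grouped counts, for illustration only.
grouped_counts = {
    ("roo", "+", 25): 120,
    ("roo", "-", 26): 80,
    ("copia", "+", 21): 40,   # outside 23-29 nt, so ignored
    ("copia", "-", 25): 30,
}
distribution = pirna_length_distribution(grouped_counts)
for length, n in distribution.items():
    print(length, n, sep="\t")
```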

Read Normalization

Different normalization methods were used for the different small RNA sequencing data sets. Non-oxidized data were normalized to 1 million mapped miRNA reads, whereas oxidized and IP-seq data were normalized to 10 million uniquely mapped reads due to the absence of miRNAs in these samples.
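The two normalization schemes amount to multiplying each raw count by a library-specific factor. The normalization in this project was done in Excel; the Python sketch below only illustrates the arithmetic, and the function names and read numbers are hypothetical.

```python
def normalize_counts(raw_counts, reference_reads, scale):
    """Scale raw counts to `scale` reads of the chosen reference class."""
    if reference_reads <= 0:
        raise ValueError("reference_reads must be positive")
    factor = scale / reference_reads
    return {feature: count * factor for feature, count in raw_counts.items()}


def normalize_to_mirna(raw_counts, mapped_mirna_reads):
    """Non-oxidized libraries: per 1 million mapped miRNA reads."""
    return normalize_counts(raw_counts, mapped_mirna_reads, 1_000_000)


def normalize_to_unique(raw_counts, uniquely_mapped_reads):
    """Oxidized and IP-seq libraries: per 10 million uniquely mapped reads."""
    return normalize_counts(raw_counts, uniquely_mapped_reads, 10_000_000)


# Hypothetical numbers, for illustration only.
raw = {"roo": 500, "copia": 50}
non_oxidized = normalize_to_mirna(raw, mapped_mirna_reads=2_000_000)
oxidized = normalize_to_unique(raw, uniquely_mapped_reads=20_000_000)
print(non_oxidized)
print(oxidized)
```

Because the two schemes use different reference classes and scales, normalized values are comparable within a scheme but not directly between the two.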

2.5.6 FlyMine as a data analysis tool

Lists of differentially expressed genes from sequencing were uploaded to FlyMine (Lyne et al. 2007) to examine gene expression in Drosophila, gene ontology and pathway enrichment. A selection of genes based on these data was further analysed by qRT-PCR for confirmation.

FlyMine was also used for the analysis of intron-containing genes. A Python script, kindly provided by Rachel Lyne, was used to filter out genes with introns and to report the total intron length, number of introns, smallest intron and largest intron for each gene. Total intron length is the sum of all introns of all isoforms for each gene. The number of introns is the sum of all unique introns of all isoforms for each gene. The script was adjusted by Rachel Lyne to also report the gene length (sum of all isoforms, including introns), number of transcripts, shortest transcript and longest transcript for each provided list. Microsoft Excel was used to calculate the median values from all the lists.
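The per-gene intron metrics defined above can be illustrated with a short Python sketch. This is not the script provided by Rachel Lyne, and it does not query FlyMine. It assumes each isoform is given as a list of exon (start, end) coordinates in half-open form, and it reads "sum of all introns of all isoforms" literally, so an intron shared by two isoforms contributes twice to the total length but once to the intron count. That reading, the function names and the coordinates are assumptions.

```python
def introns_of(exons):
    """Return introns as (start, end) gaps between consecutive exons."""
    ordered = sorted(exons)
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start > prev_end
    ]


def intron_summary(isoforms):
    """Summarise the introns of one gene; return None if it has none."""
    all_introns = [intron for exons in isoforms for intron in introns_of(exons)]
    if not all_introns:
        return None
    lengths = [end - start for start, end in all_introns]
    return {
        "total_intron_length": sum(lengths),        # summed over all isoforms
        "number_of_introns": len(set(all_introns)),  # unique introns only
        "smallest_intron": min(lengths),
        "largest_intron": max(lengths),
    }


# Hypothetical gene with two isoforms, for illustration only.
gene_isoforms = [
    [(0, 100), (200, 300), (1000, 1100)],  # introns of 100 and 700 nt
    [(0, 100), (200, 300)],                # shares the 100 nt intron
]
summary = intron_summary(gene_isoforms)
print(summary)
print(intron_summary([[(0, 500)]]))  # single-exon gene has no introns
```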

2.5.7 Statistical Analysis

Error bars for the mobility assay represent the standard error of the mean. Student’s t-test was used for this assay. Graphs were generated in either Microsoft Excel or R, and statistical tests were performed in Microsoft Excel. Asterisks were added in Microsoft PowerPoint.
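For reference, the standard error of the mean and the t statistic can be computed as in the Python sketch below. The tests in this project were run in Excel; the sketch assumes an unpaired, two-sample test with equal variances, which the text does not specify, and it returns only the statistic and degrees of freedom rather than a P-value. The function names and values are hypothetical.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def student_t(sample_a, sample_b):
    """Unpaired, equal-variance Student's t statistic and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t = diff / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical mobility scores, for illustration only.
control = [1, 2, 3, 4, 5]
mutant = [2, 3, 4, 5, 6]
sem_control = standard_error(control)
t_stat, dof = student_t(control, mutant)
print(sem_control, t_stat, dof)
```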

The medians of total intron length and gene length of the down-regulated and up-regulated genes in the 1.5-, 2-, 4- and 16-fold categories were compared with those of the non-differentially expressed genes using the Mann-Whitney test. A P-value below 0.05 was considered statistically significant.
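A Mann-Whitney comparison of this kind can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the procedure used in this project: it uses midranks for ties and a two-sided normal approximation with tie correction and without continuity correction, so P-values can differ slightly from those of other software, especially for small samples. The function name and intron lengths are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided P) using a normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    n = n1 + n2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u_x, u_y, 1.0  # all values identical
    z = (u_x - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, u_y, p_value


# Hypothetical total intron lengths (nt), for illustration only.
down_regulated = [1, 2, 3, 4, 5]
not_changed = [6, 7, 8, 9, 10]
u_down, u_same, p = mann_whitney_u(down_regulated, not_changed)
print(u_down, u_same, p, p < 0.05)
```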

In document Understanding the importance of RNA export factor, Nxt1, during the metamorphosis and in ovaries in Drosophila melanogaster (Page 72-77)