Using MATLAB: Bioinformatics Toolbox for Life Sciences

Academic year: 2021

(1)

Using MATLAB: Bioinformatics Toolbox for Life Sciences

MR. SARAWUT WONGPHAYAK

BIOINFORMATICS PROGRAM, SCHOOL OF BIORESOURCES AND TECHNOLOGY, AND SCHOOL OF INFORMATION TECHNOLOGY,

KING MONGKUT’S UNIVERSITY OF TECHNOLOGY THONBURI

(2)

Goal of Presentation

• To introduce the use and advantages of MATLAB and the Bioinformatics Toolbox.

• To show how MATLAB and the Bioinformatics Toolbox will be applied to teaching, study, and research in our Bioinformatics program.

(3)

Outline

• Introduction to MATLAB and the Bioinformatics Toolbox

• Bioinformatics Toolbox functions

• Example of a study that used MATLAB and the Bioinformatics Toolbox

(4)

What is MATLAB?

• MATLAB is short for “Matrix Laboratory”.

• MATLAB is a tool for numerical computation with matrices and vectors.

• It is powerful and easy to use.

• It integrates computation, visualization, and programming.

• It can be used on almost all platforms.

(5)

MATLAB is widely used in academic bioinformatics applications

• Teaching: bioinformatics graduate and undergraduate courses

  • MIT, Harvard, Stanford, Cornell, Carnegie Mellon, …

• Research: recent papers use MATLAB for:

  • Sequencing: base-calling algorithm design

  • Microarray analysis: statistical modeling of microarrays, image analysis

  • Proteomics: mass spectrometry data classification

  • Systems biology: flux analysis, simulation of metabolic pathways, interaction network identification

Robert Henson, The MathWorks, Inc.

(6)

Bioinformatics Toolbox

• The toolbox provides access to genomic and proteomic data formats, analysis techniques, and specialized visualizations for genomic and proteomic sequence and microarray analysis.

• Most functions are implemented in the open MATLAB® language, enabling you to customize the algorithms or develop your own.

(7)

Bioinformatics Toolbox

For more information on related products, visit www.mathworks.com/products/bioinfo

[Diagram: required and related products, showing MATLAB® and the Statistics, Bioinformatics, Database, Image Processing, Neural Network, Optimization, and Signal Processing Toolboxes]

(9)

Key Features

• Support for genomic, proteomic, and gene expression file formats

• Internet database access

• Sequence analysis

• Microarray analysis and visualization

• Phylogenetic analysis

• Mass spectrometry preprocessing and visualization

(10)

File Formats and Database Access

• Sequence data:

  • FASTA, PDB, and SCF

• Microarray data:

  • Affymetrix DAT, EXP, CEL, CHP, and CDF files; SPOT format data; ImaGene results format data; and GenePix GPR and GAL files

• Direct interfaces to major Web-based databases

• Support for other industry-specific file formats

  • Microsoft Excel
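To make the idea of a sequence file format concrete, here is a minimal sketch of reading FASTA records. It is written in Python only to illustrate the format and is not the toolbox's reader; the function name `read_fasta` and the example records are made up for this illustration.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# Hypothetical two-record input, held in memory instead of a file.
example = io.StringIO(">seq1 demo record\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(read_fasta(example))
```

Each record starts with a `>` header line, and the sequence lines that follow are joined into one string.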

(12)

Sequence Analysis

• The Bioinformatics Toolbox provides several MATLAB-based sequence alignment functions, as well as graphical tools for viewing sequence alignment results.

• Sequence utilities and statistics

• Protein feature analysis

• Sequence Tool (GUI)

• Sequence alignment

(13)

Sequence Utilities and Statistics

• You can manipulate and analyze your sequences to gain a deeper understanding of your data.

• Bioinformatics Toolbox routines let you:

  • Convert DNA or RNA sequences to amino acid sequences using the genetic code

  • Perform statistical analysis on sequences and search for specific patterns within a sequence

  • Apply restriction enzymes and proteases to perform in-silico digestion of sequences, or create random sequences for test cases
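Two of the operations listed above, translation with the genetic code and in-silico digestion, can be sketched in a few lines. This is a Python illustration of the underlying computation, not the toolbox's code; the names `translate` and `digest` are hypothetical, the standard genetic code is assumed, and the digestion handles one strand only.

```python
# Standard genetic code in the usual compact TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(seq):
    """Translate a DNA or RNA sequence, reading frame 1; '*' marks a stop codon."""
    seq = seq.upper().replace("U", "T")
    # Unknown codons (for example, containing N) become 'X'.
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )

def digest(seq, site, cut_offset):
    """Cut seq at every occurrence of site; cut_offset is the cut position inside the site."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

protein = translate("ATGGCCTAA")
# EcoRI recognizes GAATTC and cuts after the first G.
fragments = digest("AAGAATTCTTGAATTCGG", "GAATTC", 1)
```

Here `protein` is `"MA*"` and `fragments` is `["AAG", "AATTCTTG", "AATTCGG"]`.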

(14)

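Example: Sequence Statistics

>> mitochondria = getgenbank('NC_001807','SequenceOnly',true);  % fetch the sequence from GenBank

>> ntdensity(mitochondria)  % plot nucleotide density along the sequence

>> basecount(mitochondria,'chart','pie');  % count the bases and show them as a pie chart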
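The statistics behind plots like these are simple counts. The Python sketch below shows one way to compute base counts and a sliding-window GC fraction. It illustrates the idea only and is not how the toolbox functions are implemented; the function names and the window size are assumptions.

```python
from collections import Counter

def base_count(seq):
    """Count A, C, G, and T in a nucleotide sequence (other symbols are ignored)."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}

def gc_fraction_windows(seq, window):
    """Return the GC fraction of every full window of the given length."""
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]

counts = base_count("AACGTTTG")
density = gc_fraction_windows("GGCCAATT", 4)
```

For these toy inputs, `counts` is `{"A": 2, "C": 1, "G": 2, "T": 3}` and `density` falls from 1.0 to 0.0 as the window slides from the GC-rich start to the AT-rich end.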

(15)

Example: Sequence Statistics

>> codoncount(mitochondria)

>> for frame = 1:3
       figure('color',[1 1 1])
       subplot(2,1,1);
       % codon usage in the forward reading frame
       codoncount(mitochondria,'frame',frame,'figure',true);
       title(sprintf('Codons for frame %d',frame));
       subplot(2,1,2);
       % codon usage in the same frame of the reverse strand
       codoncount(mitochondria,'reverse',true,'frame',frame,'figure',true);
       title(sprintf('Codons for reverse frame %d',frame));
   end
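Counting codons per reading frame amounts to stepping through the sequence three bases at a time from a given offset. A minimal Python sketch of that idea follows. It is not the toolbox's `codoncount`; the function name `codon_count` and its arguments are hypothetical, and the reverse option is assumed to mean the reverse complement strand.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def codon_count(seq, frame=1, reverse=False):
    """Count codons in reading frame 1, 2, or 3 of the forward or reverse strand."""
    seq = reverse_complement(seq) if reverse else seq.upper()
    start = frame - 1
    # Step through complete codons only; trailing bases are dropped.
    return Counter(seq[i:i + 3] for i in range(start, len(seq) - 2, 3))

forward = codon_count("ATGATGCCC", frame=1)
shifted = codon_count("ATGATGCCC", frame=2)
backward = codon_count("ATGATGCCC", frame=1, reverse=True)
```

Shifting the frame by one base changes every codon, which is why the slide plots each frame separately.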

(16)

Protein Feature Analysis

• Calculate properties of a peptide sequence

• Determine the amino acid composition of protein sequences

>> aacount(ND2AASeq, 'chart','bar')

>> atomiccomp(ND2AASeq)

ans =
    C: 1818
    H: 3574
    N: 420
    O: 817
    S: 25

>> molweight(ND2AASeq)

ans =
  3.8960e+004
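As a rough illustration of how such peptide properties can be computed, the Python sketch below counts residues and estimates molecular weight by summing average residue masses and adding one water molecule. This is a common textbook approach and may not match the toolbox's exact method; the mass table holds commonly tabulated average values and should be treated as approximate, and the function names are made up.

```python
from collections import Counter

# Average residue masses in daltons (amino acid minus one water); approximate values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528

def aa_count(protein):
    """Return the number of occurrences of each residue."""
    return dict(Counter(protein.upper()))

def mol_weight(protein):
    """Estimate molecular weight: sum of residue masses plus one water for the termini."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER_MASS

composition = aa_count("MKKA")
glycylglycine = mol_weight("GG")
```

For the dipeptide `"GG"` this gives about 132.12 Da.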

(17)

Sequence Tool

(18)

Sequence Alignment

• The Bioinformatics Toolbox offers analysis methods for pairwise sequence alignment and sequence profile alignment.

• These analysis methods include:

  • MATLAB implementations of standard algorithms for local and global sequence alignment, such as the Needleman-Wunsch, Smith-Waterman, and profile hidden Markov model algorithms

  • Graphical representations of alignment results matrices

  • Standard scoring matrices, such as the PAM and BLOSUM families of matrices
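For readers who have not seen the Needleman-Wunsch algorithm, the following Python sketch shows the dynamic-programming idea behind global alignment. It is not the toolbox's implementation. It assumes a simple match/mismatch score and a linear gap penalty instead of a PAM or BLOSUM matrix, and the function name and default scores are chosen for illustration only.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against nothing costs one gap per symbol.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

nw_score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
```

With these scores, aligning `"GATTACA"` against `"GATACA"` yields six matches and one gap, for a total of 4. Real protein alignments would replace the match/mismatch rule with a scoring matrix lookup.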

(19)

Example: Sequence Alignment

Globally align the two amino acid sequences using the Needleman-Wunsch algorithm.

>> [Score, Alignment] = nwalign(humanProteinORF, mouseProteinORF);

Locally align the two amino acid sequences using the Smith-Waterman algorithm.

>> [LocalScore, LocalAlignment] = swalign(humanProtein,...
                                          mouseProtein)

>> showalignment(LocalAlignment)
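Local alignment differs from global alignment in two ways: cell scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The Python sketch below illustrates that idea under assumed scores (match 2, mismatch -1, linear gap -2). It is not the toolbox's `swalign`, and the function name is hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Locally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The floor at 0 lets an alignment restart anywhere.
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the score reaches 0.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

sw_score, local_a, local_b = smith_waterman("TTTGATTACATTT", "CCCGATTACACCC")
```

In this toy case the shared core `GATTACA` is recovered and the unrelated flanking bases are ignored.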

(21)

Microarray Normalization

• The Bioinformatics Toolbox provides several methods for normalizing microarray data, including lowess, global mean, and median absolute deviation (MAD) normalization.

• Filtering functions let you clean raw data before running analysis and visualization routines.
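Two of the normalization methods named above can be sketched with the standard library alone. The definitions used here are assumptions: global mean normalization is taken to mean dividing each value by the array mean, and MAD normalization is taken to mean centering on the median and scaling by the median absolute deviation. The toolbox's exact formulas may differ, and the function names are made up.

```python
from statistics import mean, median

def global_mean_normalize(values):
    """Scale values so that their mean becomes 1."""
    m = mean(values)
    return [v / m for v in values]

def mad_normalize(values):
    """Center on the median and scale by the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # Note: mad is 0 when more than half the values are identical; not handled here.
    return [(v - med) / mad for v in values]

scaled = global_mean_normalize([2.0, 4.0, 6.0])
robust = mad_normalize([1.0, 2.0, 3.0, 4.0, 100.0])
```

The second example shows why a median-based scale is useful: the outlier 100.0 does not distort the scaling of the other values, which come out as -2, -1, 0, and 1.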

(22)

Data Visualization

• Together, the Bioinformatics Toolbox, the Statistics Toolbox, and MATLAB provide an integrated set of visualization tools.

>> maloglog

>> maimage

>> maboxplot

(23)

Data Visualization

>> mairplot

>> clustergram

>> kmeans

>> princomp

>> cluster

(25)

Phylogenetic Analysis

• The Bioinformatics Toolbox enables you to create and edit phylogenetic trees.

• You can calculate pairwise distances between aligned or unaligned nucleotide or amino acid sequences using a broad range of similarity metrics, such as:

  • Jukes-Cantor,

  • p-distance,

  • alignment-score, or

  • a user-defined distance method.
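Two of the metrics listed above have short closed forms for aligned nucleotide sequences. The p-distance is the fraction of compared sites that differ, and the Jukes-Cantor distance corrects it for multiple substitutions: d = -(3/4) ln(1 - 4p/3). The Python sketch below illustrates both. It is not the toolbox's code, the function names are hypothetical, and skipping gapped sites is an assumption.

```python
import math

def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences, ignoring gapped sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(a, b):
    """Jukes-Cantor distance for nucleotides: -(3/4) * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # the correction is undefined at or beyond saturation
    return -0.75 * math.log(1 - 4 * p / 3)

p = p_distance("ACGTACGT", "ACGTACGA")
d = jukes_cantor("ACGTACGT", "ACGTACGA")
```

With one difference in eight sites, `p` is 0.125 and `d` is slightly larger (about 0.137), reflecting the correction for unobserved substitutions.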

(26)

Phylogenetic Analysis

• Through the graphical user interface (GUI), you can prune, reorder, and rename branches; explore distances; and read or write Newick-formatted files.
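The Newick format mentioned above writes a tree as nested parentheses, with optional names and branch lengths. As a small illustration, the Python sketch below serializes a toy tree. The tuple layout `(name, branch_length, children)` and the function name `to_newick` are assumptions made for this example and have nothing to do with the toolbox's tree objects.

```python
def to_newick(node):
    """Serialize a (name, branch_length, children) tuple as Newick text without the final ';'."""
    name, length, children = node
    text = ""
    if children:
        # Internal node: children are comma-separated inside parentheses.
        text = "(" + ",".join(to_newick(child) for child in children) + ")"
    text += name
    if length is not None:
        text += f":{length}"
    return text

# Hypothetical four-leaf tree with an unnamed root and one unnamed internal node.
tree = ("", None, [
    ("A", 0.1, []),
    ("B", 0.2, []),
    ("", 0.05, [("C", 0.3, []), ("D", 0.4, [])]),
])
newick = to_newick(tree) + ";"
```

This produces `(A:0.1,B:0.2,(C:0.3,D:0.4):0.05);`.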

(27)

Mass Spectrometry Data Analysis

• The mass spectrometry functions are designed for preprocessing and classification of raw data from SELDI-TOF and MALDI-TOF spectrometers.

  • Reading raw data into MATLAB

  • Preprocessing raw data

  • Spectrum analysis

(29)

THE CHALLENGE

• To accurately predict the clinical outcome for breast cancer patients

THE SOLUTION

• Use MathWorks products to develop a tool that lets clinicians make a prognosis based on the gene expression profile of the patient’s primary tumor

THE RESULTS

• Accurate prediction of disease outcome

• Fast, effective response to scientists’ needs

• Flexibility to adjust algorithms whenever necessary

(30)
(31)

Enables you to develop your own functions

(32)

Summary

• The Bioinformatics Toolbox is appropriate for use in life sciences studies, including:

  • Sequence analysis

  • Microarray analysis and visualization

  • Phylogenetic analysis

  • Mass spectrometry preprocessing and visualization

(33)

Thank you for your attention

ACKNOWLEDGEMENT
