Using MATLAB: Bioinformatics Toolbox for Life Sciences
MR. SARAWUT WONGPHAYAK
BIOINFORMATICS PROGRAM, SCHOOL OF BIORESOURCES AND TECHNOLOGY, AND SCHOOL OF INFORMATION TECHNOLOGY,
KING MONGKUT’S UNIVERSITY OF TECHNOLOGY THONBURI
Goal of Presentation
To introduce the use and advantages of MATLAB and the Bioinformatics Toolbox.
MATLAB and the Bioinformatics Toolbox will be applied to teaching, study, and research in our Bioinformatics program.
Outline
Introduction to MATLAB and Bioinformatics Toolbox
Bioinformatics Toolbox functions
Example of a study that used MATLAB and the Bioinformatics Toolbox
What is MATLAB?
MATLAB is short for “Matrix Laboratory”.
MATLAB is a tool for numerical computation with matrices and vectors.
It is powerful and easy to use.
It integrates computation, visualization, and programming.
It can be used on almost all platforms.
MATLAB is widely used in academic bioinformatics applications
Teaching: bioinformatics graduate and undergraduate courses
MIT, Harvard, Stanford, Cornell, Carnegie Mellon, …
Research: recent papers use MATLAB for:
Sequencing
Base calling algorithm design
Microarray analysis
Statistical modeling of microarrays, image analysis
Proteomics
Mass spectrometry data classification
Systems biology
Flux analysis, simulation of metabolic pathways, interaction network identification
Source: Robert Henson, The MathWorks, Inc.
Bioinformatics Toolbox
The toolbox provides access to genomic and proteomic data formats, analysis techniques, and specialized visualizations for genomic and proteomic sequence and microarray analysis.
Most functions are implemented in the open MATLAB® language, enabling you to customize the algorithms or develop your own.
Required products: MATLAB® and Statistics Toolbox
Related products: Database, Image Processing, Neural Network, Optimization, and Signal Processing Toolboxes
For more information on related products, visit www.mathworks.com/products/bioinfo
Key Features
Support for genomic, proteomic, and gene expression file formats
Internet database access
Sequence Analysis
Microarray Analysis and Visualization
Phylogenetic Analysis
Mass Spectrometry Preprocessing and Visualization
File Formats and Database Access
Sequence data: FASTA, PDB, and SCF
Microarray data: Affymetrix DAT, EXP, CEL, CHP, and CDF files; SPOT format data; ImaGene results format data; and GenePix GPR and GAL files
Direct interface with major Web-based databases
Support for other industry-specific file formats, such as Microsoft Excel
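As a minimal sketch of the file-format support, the lines below write a short FASTA record to a temporary file and read it back with the toolbox functions fastawrite and fastaread. The sequence, header text, and file name are made-up placeholders, not data from this presentation. The commented getgenbank call shows the Web database route, which needs an Internet connection.

```matlab
% Write a small FASTA file, then read it back (no network access needed).
fastaFile = [tempname '.fasta'];     % hypothetical temporary file name
demoSeq   = 'ATGGCCATTGTAATGGGCCGCTGA';   % invented toy sequence
fastawrite(fastaFile, 'demo_seq toy example', demoSeq);

record = fastaread(fastaFile);       % struct with Header and Sequence fields
disp(record.Header)
disp(record.Sequence)
delete(fastaFile);                   % clean up the temporary file

% Web database access follows the same pattern, for example:
% record = getgenbank('NC_001807');
```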
Sequence Analysis
The Bioinformatics Toolbox provides several MATLAB-based sequence alignment functions, as well as graphical tools for viewing sequence alignment results.
Sequence Utilities and Statistics
Protein Feature Analysis
Sequence Tool (GUI)
Sequence Alignment
Sequence Utilities and Statistics
You can manipulate and analyze your sequences to gain a deeper understanding of your data.
Bioinformatics Toolbox routines let you:
Convert DNA or RNA sequences to amino acid sequences using the genetic code
Perform statistical analysis on the sequences and search for specific patterns within a sequence
Apply restriction enzymes and proteases to perform in-silico digestion of sequences, or create random sequences for test cases
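The routines listed above can be sketched on a toy sequence as follows. The DNA string is invented for illustration, and the cut pattern 'GGCC' with cut position 2 is an assumed example site rather than anything taken from the slides.

```matlab
% Toy DNA sequence, invented for illustration only.
dna = 'ATGGCCATTGTAATGGGCCGCTGA';

protein = nt2aa(dna);             % translate using the standard genetic code
revComp = seqrcomplement(dna);    % reverse complement of the strand
counts  = basecount(dna);         % struct with fields A, C, G, T
startAt = strfind(dna, 'ATG');    % positions of a specific pattern (base MATLAB)

% In-silico digestion: cut after the 2nd base of every GGCC site.
fragments = restrict(dna, 'GGCC', 2);

% Random 30-base DNA sequence for use as a test case.
testSeq = randseq(30);

disp(protein)
disp(fragments)
```

Concatenating the fragments should give back the original sequence, which is a quick sanity check on any digestion.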
Example: Sequence Statistics
>> mitochondria = getgenbank('NC_001807','SequenceOnly',true);
>> ntdensity(mitochondria)
>> basecount(mitochondria,'chart','pie');
>> codoncount(mitochondria)
>> for frame = 1:3
       figure('color',[1 1 1])
       subplot(2,1,1);
       codoncount(mitochondria,'frame',frame,'figure',true);
       title(sprintf('Codons for frame %d',frame));
       subplot(2,1,2);
       codoncount(mitochondria,'reverse',true,'frame',frame,'figure',true);
       title(sprintf('Codons for reverse frame %d',frame));
   end
Protein Feature Analysis
Calculate properties of a peptide sequence
Determine the amino acid composition of protein sequences
>> aacount(ND2AASeq, 'chart','bar')
>> atomiccomp(ND2AASeq)
ans =
    C: 1818
    H: 3574
    N: 420
    O: 817
    S: 25
>> molweight(ND2AASeq)
ans =
  3.8960e+004
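The same three calls can be tried without downloading any data. The short peptide below is a made-up placeholder, so its composition and weight are not related to the ND2 protein shown above.

```matlab
% Hypothetical 7-residue peptide used only to exercise the functions.
peptide = 'MAIVMGR';

composition = aacount(peptide);     % struct with one field per amino acid letter
atoms       = atomiccomp(peptide);  % struct with fields C, H, N, O, S
weight      = molweight(peptide);   % molecular weight of the peptide

fprintf('Methionine count: %d\n', composition.M);
fprintf('Carbon atoms:     %d\n', atoms.C);
fprintf('Molecular weight: %.1f\n', weight);
```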
Sequence Tool
Sequence Alignment
The Bioinformatics Toolbox offers a comprehensive list of analysis methods for performing pairwise sequence and sequence profile alignment.
These analysis methods include:
MATLAB implementations of standard algorithms for local and global sequence alignment, such as the Needleman-Wunsch, Smith-Waterman, and profile-hidden Markov model algorithms
Graphical representations of alignment results matrices
Standard scoring matrices, such as the PAM and BLOSUM families of matrices
Example: Sequence Alignment
Globally align the two amino acid sequences using the Needleman-Wunsch algorithm.
>> [Score, Alignment] = nwalign(humanProteinORF, mouseProteinORF);
Locally align the two amino acid sequences using the Smith-Waterman algorithm.
>> [LocalScore, LocalAlignment] = swalign(humanProtein, mouseProtein)
>> showalignment(LocalAlignment)
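For a self-contained version of this example, two short literal peptides can stand in for the human and mouse proteins. The sequences below are placeholders, and the BLOSUM62 and gap-open settings in the last call are just one possible choice of scoring options.

```matlab
% Two short placeholder peptides (not the human/mouse proteins above).
seqA = 'VSPAGMASGYD';
seqB = 'IPGKASYD';

% Global alignment (Needleman-Wunsch) with default scoring.
[globalScore, globalAln] = nwalign(seqA, seqB);

% Local alignment (Smith-Waterman) with default scoring.
[localScore, localAln] = swalign(seqA, seqB);

% The alignment is a 3-row character array: sequence, match line, sequence.
disp(globalAln)
disp(localAln)

% Scoring options can be changed, for example a different matrix and gap penalty.
[score62, aln62] = nwalign(seqA, seqB, 'ScoringMatrix', 'BLOSUM62', 'GapOpen', 10);
```

With identical scoring settings, the local score is never lower than the global score, because the local method is free to drop poorly matching ends.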
Microarray Normalization
The Bioinformatics Toolbox provides several methods for normalizing microarray data, including lowess, global mean, and median absolute deviation (MAD) normalization.
Filtering functions let you clean raw data before running analysis and visualization routines.
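To make the arithmetic behind two of these methods concrete, here is a hand-written sketch in base MATLAB. It is not the toolbox implementation (the toolbox offers functions such as manorm and malowess for this purpose), and the intensity matrix is invented, with one column per array.

```matlab
% Invented intensities: rows are spots, columns are two arrays.
% The second array is uniformly 1.5 times brighter than the first.
raw = [100 150; 200 300; 400 600; 800 1200];

% Global mean normalization: divide each column by its mean intensity.
meanNorm = bsxfun(@rdivide, raw, mean(raw, 1));

% MAD normalization on log2 intensities: center each column on its median,
% then scale by the median absolute deviation so the spreads match.
logInt   = log2(raw);
centered = bsxfun(@minus, logInt, median(logInt, 1));
madVals  = median(abs(centered), 1);
madNorm  = bsxfun(@rdivide, centered, madVals);

disp(meanNorm)
disp(madNorm)
```

Because the two invented arrays differ only by a constant factor, both normalizations make their columns identical, which is the behavior one would hope for.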
Data Visualization
Together, the Bioinformatics Toolbox, the Statistics Toolbox, and MATLAB provide an integrated set of visualization tools.
>> maloglog
>> maimage
>> maboxplot
>> mairplot
>> clustergram
>> kmeans
>> princomp
>> cluster
Phylogenetic Analysis
The Bioinformatics Toolbox enables you to create and edit phylogenetic trees.
You can calculate pairwise distances between aligned or unaligned nucleotide or amino acid sequences using a broad range of similarity metrics, such as Jukes-Cantor, p-distance, alignment-score, or a user-defined distance method.
Through the graphical user interface (GUI), you can prune, reorder, and rename branches; explore distances; and read or write Newick-formatted files.
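A minimal sketch of the distance-to-tree workflow, assuming three short made-up nucleotide sequences of equal length. The sequence names and the choice of average linkage are illustrative assumptions.

```matlab
% Three invented, already-aligned nucleotide sequences.
names = {'seqA', 'seqB', 'seqC'};
seqs  = {'ATGGCCATTGTAATGGGCCG', ...
         'ATGGCCATTGTTATGGGCCG', ...
         'ATGACCATTGTTATGGGACG'};

% Pairwise Jukes-Cantor distances (one value per sequence pair).
dist = seqpdist(seqs, 'Method', 'Jukes-Cantor', 'Alphabet', 'NT');

% Build a tree from the distances using average linkage.
tree = seqlinkage(dist, 'average', names);

% Newick string for the tree, as would be written to a Newick file.
newick = getnewickstr(tree);
disp(newick)

% view(tree) opens the interactive tree tool for pruning and renaming.
```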
Mass Spectrometry Data Analysis
The mass spectrometry functions are designed for preprocessing and classification of raw data from SELDI-TOF and MALDI-TOF spectrometers.
Reading raw data into MATLAB
Preprocessing raw data
Spectrum analysis
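The preprocessing step might look like the sketch below. It runs on a synthetic spectrum built in the code rather than on real SELDI-TOF or MALDI-TOF data, and it uses the default settings of msbackadj, mslowess, and msnorm, which may need tuning for real instruments.

```matlab
% Synthetic spectrum: decaying baseline, two peaks, and a small ripple.
mz       = linspace(2000, 12000, 4000)';
baseline = 50 * exp(-(mz - 2000) / 4000);
peaks    = 100 * exp(-((mz - 4000) / 15).^2) + 60 * exp(-((mz - 8000) / 25).^2);
ripple   = 2 * sin(mz / 3);            % deterministic stand-in for noise
y        = baseline + peaks + ripple;

yBack   = msbackadj(mz, y);            % estimate and subtract the baseline
ySmooth = mslowess(mz, yBack);         % lowess smoothing of the signal
yNorm   = msnorm(mz, ySmooth);         % normalize the spectrum intensity

plot(mz, y, mz, yNorm);
legend('raw', 'preprocessed');
xlabel('m/z'); ylabel('intensity');
```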
Example of a study that used MATLAB and the Bioinformatics Toolbox
THE CHALLENGE
To accurately predict the clinical outcome for breast cancer patients
THE SOLUTION
Use MathWorks products to develop a tool that lets clinicians make a prognosis based on the gene expression profile of the patient’s primary tumor
THE RESULTS
Accurate prediction of disease outcome
Fast, effective response to scientists’ needs
Flexibility to adjust algorithms whenever necessary
Ability to develop your own functions
Summary
The Bioinformatics Toolbox is well suited to life sciences studies, supporting:
Sequence Analysis
Microarray Analysis and Visualization
Phylogenetic Analysis
Mass Spectrometry Preprocessing and Visualization
Thank you for your attention