Identifying Behavioral Strategies through Large Scale Phenotyping and Statistical Analysis
Stephen Helms, Ph.D.
FOM Institute AMOLF, Amsterdam, Netherlands
March 12, 2014 – SURFsara Data & Computing Infrastructure Event
Leon Avery (VCU), Greg Stephens (VU Amsterdam/OIST), Tom Shimizu (AMOLF)
How Do We Understand Complex Systems With Many Parts?
(Also a general “big data” question!)
• A model complex system
• Traditional approaches for understanding complex biological systems
• Statistical approach for understanding biological systems
• Data and computation problems
• Proposed computational platform
• Outlook for the future
A Simple Model Nervous System: C. elegans
Stimuli
• Smell (volatile odors)
• Taste (soluble chemicals)
• Feel (touch, heat)
The Worm
• ~1000 total cells
• 302 neurons
• 95 muscles
• ~20000 genes
Response
• Movement
• Neural activity
• Biochemical reactions
A Biologist’s Toolbox
Genetics
• Break individual parts, see what happens
Biochemistry
• Look at how parts chemically interact
Cell Biology
• See where the parts are
End result:
• A list of lots of details about what individual genes and proteins are doing
Idea:
Finding Simple Models Through Quantitative, Comparative Studies
• Build quantitative models that are just complicated enough to explain the phenotypes we can observe and care about
• Compare models across multiple strains and species to see what phenotypes biology cares about
• The molecular and cellular details can be filled in later using traditional approaches
• Model system: Motile behavior
– Behavior is the output of all the complicated systems
C. elegans Behavior
• Undulatory motion
• Occasional reversals
• Occasional sharp “omega” turns
• Continuous turning
Gray and Lissmann (1964) J. Exp. Biol. 41:135-54.
Croll (1975) J. Zool. 176:159–176.
Croll (1975) Adv. Parasitol. 13:71–122.
Pierce-Shimomura et al. (1999) J. Neurosci. 19:9557-69.
Iino, Y. & Yoshida, K. (2009) J. Neurosci. 29:5370-80.
Helms (2013) Figshare. http://dx.doi.org/10.6084/m9.figshare.705155
Experimental Overview
Record video of freely moving worms for up to 30 minutes
Sampling Behavioral Variability: Individual, Intra- and Inter-Species
Up to 20 individuals
Holovachov, O. et al. (2009) Nematology 11(6):927-950.
Chiang, J.-T.A. et al. (2006) J. Exp. Biol. 209(10):1859-73.
Andersen, E.C. et al. (2012) Nat. Genet. 44(3):285-90.
Building Quantitative Models
Deterministic dynamics
• Correlation functions
• Phase spaces
• Fitting linear models
Stochastic components
• Distributions

• Monte Carlo simulations
• Comparison with statistics of data
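The modeling steps listed on this slide can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the series is synthetic placeholder data standing in for a measured behavioral variable, and a first-order linear model with Gaussian noise stands in for whatever models are used in the actual analysis. It fits the linear model, runs a Monte Carlo realization of the fit, and compares the autocorrelation of the simulation with that of the data.

```python
import random
import statistics


def autocorrelation(series, max_lag):
    """Normalized autocorrelation C(lag) for lag = 0..max_lag."""
    mean = statistics.fmean(series)
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(len(series) - lag)) / variance
        for lag in range(max_lag + 1)
    ]


def fit_linear_model(series):
    """Least-squares fit of x[t+1] = a * x[t] + noise; returns (a, noise_std)."""
    past, future = series[:-1], series[1:]
    a = sum(p * f for p, f in zip(past, future)) / sum(p * p for p in past)
    residuals = [f - a * p for p, f in zip(past, future)]
    return a, statistics.pstdev(residuals)


def simulate(a, noise_std, n_steps, rng):
    """One Monte Carlo realization of the linear model, started at zero."""
    x, out = 0.0, []
    for _ in range(n_steps):
        x = a * x + rng.gauss(0.0, noise_std)
        out.append(x)
    return out


rng = random.Random(0)

# Placeholder "data": a synthetic series, NOT real worm measurements.
data = simulate(0.9, 1.0, 5000, rng)

# Deterministic part (a) and stochastic part (noise_std) estimated from the data.
a_hat, sigma_hat = fit_linear_model(data)

# Simulate the fitted model and compare its statistics with the data.
simulated = simulate(a_hat, sigma_hat, 5000, rng)
data_acf = autocorrelation(data, 5)
model_acf = autocorrelation(simulated, 5)
for lag, (d, m) in enumerate(zip(data_acf, model_acf)):
    print(f"lag {lag}: data {d:.2f}  model {m:.2f}")
```

If the fitted model is adequate, the two autocorrelation columns should agree within sampling noise; a systematic mismatch would suggest the model is not yet complicated enough.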
Comparing Quantitative Models
[Figure; axis label: Parameter]
Data Challenges
Storage
• Videos are large
– 240 GB/h raw
– 12 GB/h compressed
• Using ~1 TB of storage for a proof of concept project
• Want to scale up:
– Number of individuals: 10-fold
– Sampling rate: 3-fold
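The storage figures above can be combined into a rough back-of-the-envelope estimate. This is a sketch under stated assumptions: storage is taken to grow linearly with both the number of individuals and the sampling rate, and the compression ratio is taken to stay the same after scaling. Neither assumption comes from the slides.

```python
# Figures taken from the slide
RAW_GB_PER_HOUR = 240
COMPRESSED_GB_PER_HOUR = 12
PILOT_STORAGE_TB = 1.0        # "~1 TB" for the proof of concept
INDIVIDUALS_FACTOR = 10       # planned scale-up in number of individuals
SAMPLING_RATE_FACTOR = 3      # planned scale-up in sampling rate

# Compression ratio implied by the raw and compressed data rates
compression_ratio = RAW_GB_PER_HOUR / COMPRESSED_GB_PER_HOUR

# Assumption: storage scales linearly with both factors
scale_factor = INDIVIDUALS_FACTOR * SAMPLING_RATE_FACTOR
projected_storage_tb = PILOT_STORAGE_TB * scale_factor

# Size of one 30-minute recording at the current compressed rate
video_minutes = 30
compressed_gb_per_video = COMPRESSED_GB_PER_HOUR * video_minutes / 60

print(f"compression ratio: {compression_ratio:.0f}x")
print(f"scale-up factor: {scale_factor}x")
print(f"projected storage: ~{projected_storage_tb:.0f} TB")
print(f"compressed size per 30-minute video: {compressed_gb_per_video:.0f} GB")
```

Under these assumptions the scale-up is roughly 30-fold, so the ~1 TB pilot would grow to roughly 30 TB.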
Processing
• >3-fold slower than data collection on a desktop computer
• Results in:
– A backlog of data to analyze
– A long delay before experiments can be interpreted
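A minimal sketch of how such a backlog builds up. The recording volume (10 hours of video per day) and the compute time (a desktop running 24 hours per day) are illustrative assumptions, not project figures; the slowdown of 3 is the lower bound of the ">3-fold" stated above.

```python
def backlog_hours(recorded_hours_per_day, slowdown, compute_hours_per_day, days):
    """Hours of unprocessed video left after `days`.

    Processing one hour of video is assumed to take `slowdown` hours of compute.
    """
    backlog = 0.0
    capacity = compute_hours_per_day / slowdown  # video hours processable per day
    for _ in range(days):
        backlog += recorded_hours_per_day
        backlog -= min(backlog, capacity)
    return backlog


# Hypothetical scenario: 10 h of video recorded per day, desktop busy around the clock
for days in (1, 5, 20):
    print(days, "days ->", backlog_hours(10, 3, 24, days), "h of video waiting")
```

In this scenario the desktop clears 8 hours of video per day, so the backlog grows by 2 hours of video every recording day; with a lighter recording load (below 8 hours per day) it would keep up.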
Sharing
• Videos are too big to regularly transfer around
• Extracted data is also big
– 2 GB for the proof of concept project
• Limited ability for others to explore the data themselves
Proposal:
Centrally located data processing and analysis services at SARA
SARA
• Video storage
• Video processing
• Standard analyses
• Loading large (>10 GB) videos
• Processing 10^4–10^6 frames / video
Experimental Users (AMOLF, VCU, etc.)
• Generate videos
• Visualize data
• Develop analyses
Theory Users (VU, OIST, etc.)
• Visualize data
• Develop analyses
Data transfers
• Exchange datasets and analysis results (few GBs, weekly)
• Upload videos, download datasets (hundreds of GBs, daily at peak)
• Download datasets (tens of GBs, weekly)
How SURFnet/SURFsara/eScience Center Are Helping
Storage
• SURFsara will provide up to 20 TB of storage for the video data
Processing
• SURFsara will provide computing resources
– Cloud or grid
• eScience Center is helping with migrating analysis code to run on HPC infrastructure
Sharing
• SURFnet is connecting the involved institutes with SURFsara using high-speed lightpath connections
– FOM Institute AMOLF
– VU
– Okinawa Institute of Science and Tech.
– Virginia Commonwealth University
Growth Prospects
• Open source aspects of C. elegans community
– WormBook - textbook
– WormBase - genetics
– WormAtlas - anatomy
– etc.
• As an analysis service available to other researchers
– Motility is widely used as a simple phenotype by C. elegans researchers
• Collaborative development of new analysis methods
– Other researchers developing statistical analysis approaches for worm behavior
• Integration of neuronal imaging data
– Ongoing experiments in the systems biology group at AMOLF
These Are General Challenges
Advances in imaging sensors
• Increasing temporal and spatial resolution → more data
Advances in experimental techniques
• Increasing experimental throughput → more data, access to statistical approaches
Lack of compression
• Distortion of data due to compression artifacts is a major concern among experimentalists
Acknowledgements
• Enlighten Your Research 4 and Global Teams
– Nicole Gregoire (SURFnet)
– Sylvia Kuijpers (SURFnet)
– Jan Bot (SURFsara)
– Frank Seinstra (eScience Center)
• eScience Center
– Rob van Nieuwpoort
– Elena Ranguelova
• Everyone else involved @ SURFnet, SURFsara
• Local ICT members