New Applications of Computer Analysis to Biomedical Data Sets
QB3 Seminar
May 28, 2015
9:00am – 5:00pm
Byers Hall 212
Overview
This seminar will explore a range of new techniques that use large data sets to uncover correlations, causation, and trends in different fields of research, with the aim of finding new applications to biomedical research.
Objectives
The primary objective is to bring together specialists in mathematics, computer science, and data analysis with biomedical researchers whose research can scale to very large datasets.
• Explore new ways to view and analyze diverse data with large-scale computation
• Discuss current problems in biomedical research and the opportunities with large data sets
Speakers and Outline of Discussion Topics
I. Emerging techniques in mathematics and machine-aided learning and discovery:
- James Brase, LLNL, Application of graph analytics and machine learning
- Gunnar Carlsson, Stanford & Ayasdi, Topological data analysis
- Steve Libby, LLNL, Dynamics in complex networks
- Felice Lightstone, LLNL, Ligand Binding in Proteins
- Larry Smarr, CalIT2 UCSD, Human Gut Microbiome Ecology and Function using Metagenomics and Supercomputers
- Geoff Sanders, LLNL, Structure and analysis of networks
- George Sugihara, UCSD, Understanding Correlation and Causation in a Nonlinear World
II. New opportunities in biomedical research:
- David Agard, QB3 UCSF: Computational challenges in very low signal to noise light & electron microscopic imaging
- Nevan Krogan, QB3 UCSF: Protein Interaction Networks – Toward a Deeper Understanding of Virus-Host Interactions?
- Ryan Hernandez, QB3 UCSF: Evolutionary Forces Driving Patterns of Human Genetic Variation
- Andrej Sali, QB3 UCSF: Integrative modeling of biomolecular assemblies and pathways
- Brian Shoichet, QB3 UCSF: Computational Screening of Organic Molecules
Follow-up Discussion
Discussion Schedule (tentative):
8:30am – 9:00am     Light Breakfast
9:00am – 9:10am     Reg Kelly, Welcome
                    Steven Beckwith, Goals for the meeting
9:10am – 10:40am    George Sugihara, Understanding Correlation and Causation in a Nonlinear World
                    Andrej Sali, Integrative modeling of biomolecular assemblies and pathways
                    Gunnar Carlsson, Topological Data Analysis
10:40am – 11:00am   Break
11:00am – 12:30pm   Brian Shoichet, Computational Screening of Organic Molecules
                    Jim Brase, Application of graph analytics and machine learning
                    David Agard, Computational challenges in very low signal to noise light and electron microscopic imaging
12:30pm – 1:30pm    Lunch (provided)
1:30pm – 3:00pm     Felice Lightstone, Ligand Binding in Proteins
                    Ryan Hernandez, Evolutionary Forces Driving Patterns of Human Genetic Variation
                    Larry Smarr, Human Gut Microbiome Ecology and Function using Metagenomics and Supercomputers
3:00pm – 3:30pm     Break
3:30pm – 5:00pm     Geoff Sanders, Structure and analysis of networks
                    Nevan Krogan, Protein Interaction Networks
                    Steve Libby, Dynamics in complex networks
5:00pm – 6:00pm     Reception
Speakers and Discussion Topics
David Agard, QB3 UCSF, (http://www.msg.ucsf.edu/agard/):
Discussion topic: Computational challenges in very low signal to noise light and electron microscopic imaging
• Live samples are readily damaged during fluorescent microscopic imaging. An example of a new deconvolution strategy that provides useful reconstructions at <1% of the signal will be discussed.
• Electron beam damage, beam induced movement, and sample heterogeneity severely limit resolution in cryoEM of biological samples, requiring weeks to months of computation.
David Agard is a Professor in the Department of Biochemistry and Biophysics at UCSF. His lab focuses on structural and mechanistic studies of cell protein folding machinery and cytoskeleton formation, as well as on improving hardware technology and algorithms for light and electron microscopy.
Jim Brase, Lawrence Livermore National Laboratory:
Discussion topic: Applications of high performance computing to biology: graph analytics and machine learning
• Applications of graph analytics and machine learning for the intelligence community.
• Applications of high performance computing to biology (BAASiC). The LLNL initiative will have both data analytics and simulation components. Network models for complex molecular interactions are an important component.
Jim Brase is the Deputy Associate Director for Computation at Lawrence Livermore National Laboratory (LLNL). He leads LLNL research in the application of large-scale data analytics and simulation to national security missions. Jim’s research interests are in the areas of machine learning, agent-based modeling, statistical methodologies, and applications to cybersecurity. He has also led programs in laser and imaging research at LLNL.
Gunnar Carlsson, Ayasdi & Stanford University (math.stanford.edu/~gunnar/, http://www.ayasdi.com/company/leadership/):
Discussion topic: Topological Data Analysis
• Ayasdi’s software reveals patterns and relationships in complex, high-dimensional datasets using Topological Data Analysis (TDA). TDA makes complex data simpler, enabling users to better understand the nature of the data set and detect subtle patterns that often remain hidden from other techniques.
Gunnar Carlsson is renowned for his advances in the branch of mathematics called topology, the study of shapes. He is President and co-founder of Ayasdi, a young company in the Bay Area. He has an undergraduate degree from Harvard University and a doctorate from Stanford, where he was Chair of the Department of Mathematics from 1995 to 1998. Although topology has been a field of study since the 1700s, Gunnar was one of the first to apply topology to solve complex real world problems. In the early 2000s, this work led to $10M in research grants from the National Science Foundation (NSF) and DARPA to study the application of Topological Data Analysis (TDA) to problems of interest within the U.S. government. In 2008, based on the success of these efforts, Gunnar, along with two other Stanford mathematicians, co-founded Ayasdi.
Gunnar has taught at the University of Chicago, the University of California, and Princeton University, and since 1991 he has been a professor of mathematics at Stanford University, where he has been a thought leader in topology. He is married, has three grown sons, two of whom are mathematicians, and lives in Palo Alto.
Ryan Hernandez, QB3 UCSF, (dbts.ucsf.edu/hernandez_lab/):
Discussion topic: Evolutionary Forces Driving Patterns of Human Genetic Variation
Ryan Hernandez, PhD, is an Assistant Professor in the Department of Bioengineering and Therapeutic Sciences, and a member of the Institute for Quantitative Biosciences (QB3) and the Institute for Human Genetics. Ryan has a BA in mathematics from Pitzer College, and a PhD in Biometry from Cornell University. Before joining UCSF in 2010, Ryan was a postdoctoral scholar in the Department of Human Genetics at the University of Chicago. The Hernandez Lab at UCSF studies patterns of genetic variation using the tools of population genomics. Ryan has played a leading role in the analysis of several large-scale genetic variation data sets from a wide range of complex populations including humans, medically relevant non-human primates (e.g., rhesus macaque), as well as domesticated species (e.g., cow and rice). Ryan has also developed widely used software for complex computer simulations of population genetic data.
Nevan Krogan, QB3 UCSF, (kroganlab.ucsf.edu):
Discussion topic: Protein Interaction Networks – Toward a Deeper Understanding of Virus-Host Interactions
Nevan Krogan was born and raised in Regina, Saskatchewan, Canada and obtained his undergraduate degree from the University of Regina. As a graduate student at the University of Toronto, Dr. Krogan led a project that systematically identified protein complexes in the model organism Saccharomyces cerevisiae through an affinity tagging-purification/mass spectrometry strategy. This work led to the characterization of 547 complexes, comprising over 4000 proteins, and represents the most comprehensive protein-protein interaction map to date in any organism. To complement this physical interaction data, Dr. Krogan developed an approach, termed E-MAP (or epistatic miniarray profile), which allows for high-throughput generation and quantitative analysis of genetic interaction data. Dr. Krogan’s lab at UCSF focuses on applying these global proteomic and genomic approaches to formulate hypotheses about various biological processes, including transcriptional regulation, DNA repair/replication and RNA processing. His lab is now developing and applying methodologies to map genetic and physical interactions between pathogenic organisms, including HIV, Mtb, and Dengue, and their hosts, which is providing insight into the human pathways and complexes that are hijacked during the course of infection.
Steve Libby, Lawrence Livermore National Laboratory:
Discussion topic: Dynamics in complex networks
• Relationship to statistical physics
• A concrete LLNL example addressing a specific question
• Why this is hard
• Analysis and computational approaches
• Challenges and open problems – Where is the field going?
Steve Libby is the Theory and Modeling Group Leader in the Physics Division at LLNL. His current research focuses on high energy density physics (inertial confinement fusion and short wavelength lasers) as well as applications of atomic physics to novel quantum coherent technologies. In the latter arena he is currently developing high precision cold atom based gravity sensors for both security and fundamental physics applications. He received his B.A. from Harvard University in 1972, and his Ph.D. in Physics from Princeton University in 1977 as a student of David J. Gross. He performed postdoctoral work at the Yang Institute for Theoretical Physics at SUNY at Stony Brook, and was subsequently a Research Assistant Professor at Brown University. During this period, he worked on quantum chromodynamics, co-developing key factorization theorems, and the theory of the quantum Hall effect, co-discovering the scaling theory. Beginning in 1986 at LLNL, he focused on x-ray laser research, becoming the design group and program leader. He was also a Consulting Professor at Stanford University from 1992-1994. He has also served at DOE as the LLNL ‘Science Council’ member and in the Advanced Strategic Computing Office. Libby is a Fellow of the American Physical Society. In addition, he holds a certificate in International Security from Stanford University.
Felice Lightstone, Lawrence Livermore National Laboratory, (bbs.llnl.gov/FeliceLightstone.html):
Discussion topic: Ligand Binding in Proteins
• Creating metrics for screening ligand binding in proteins on HPC platforms.
• Calculating energy landscapes and kinetic parameters for ligand binding and comparing against experimental results.
• Relating ligand binding to adverse drug reactions and creating predictive models for drug development.
Felice Lightstone is the Group Leader for the Biochemical and Biophysical Systems Group in the Physical and Life Sciences Directorate at Lawrence Livermore National Laboratory. Her group uses multidisciplinary approaches, ranging from molecular biology through proteomics to modeling, to investigate microbial communities, pathogenesis, and drug development. The unifying objective in her group is to understand protein-mediated activities in cells.
Geoff Sanders, Lawrence Livermore National Laboratory:
Discussion topic: Structure and analysis of networks and graphs
• Overview of social networks and graphs
• Operations on graphs – What kind of questions can be addressed?
• Graph construction from different kinds of measurements
• Analysis methods – structural, spectral, random walks ...
• Moving to dynamic graphs
• Challenges and open problems – Where is the field going?
Geoff Sanders is a postdoctoral researcher in the Computational Mathematics group at the Center for Applied Scientific Computing. Geoff’s current research focus is on developing multilevel linear algebra techniques for the efficient computation of eigenpairs for large, scale-free graph matrices. A native of Reno, Nevada, Geoff earned his Bachelor’s degree in Mathematics at the University of California, San Diego in 2002, and his Master’s in Applied Mathematics (2005) and PhD (2008) at the University of Colorado, Boulder.
Andrej Sali, QB3, UCSF, (salilab.org/index.html):
Discussion topic: Integrative modeling of biomolecular assemblies and pathways
• Networks and spatial structures of biomolecular interactions for the function and workings of living cells.
• Hybrid approaches to the structural characterization of large and dynamic assemblies and their networks, integrating data from diverse experiments: X-ray crystallography, NMR spectroscopy, electron microscopy, chemical cross-linking, the yeast two-hybrid system, and various chemical genetics and proteomics approaches.
• Approach to structure and/or network determination as an optimization problem with three components: representation of the assembly, the scoring function, and the optimization method.
• Key challenges in translating experimental data into restraints on the structure and/or network, combining these spatial and/or network restraints into a single scoring function (preferably using a Bayesian approach), optimizing the scoring function, and analyzing the resulting ensemble of solutions.
• The approach will be illustrated by several applications to specific biological systems, including the structure determination of the nuclear pore complex and the mapping of the gulonate pathway.
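The three components named in the bullets above (representation, scoring function, optimization method) can be illustrated with a deliberately tiny sketch. It is not the speaker's software or method: the one-dimensional point representation, the three distance restraints, the step size, and the function names `score` and `optimize` are all assumptions made for illustration.

```python
# Toy sketch only: representation, scoring function, and optimizer are
# hypothetical stand-ins, not taken from the seminar material.

def score(coords, restraints):
    """Scoring function: sum of squared violations of distance restraints.

    Each restraint is (i, j, d): points i and j should be distance d apart.
    """
    return sum((abs(coords[i] - coords[j]) - d) ** 2 for i, j, d in restraints)


def optimize(coords, restraints, step=0.05, iterations=200):
    """Optimization method: plain gradient descent on the score."""
    coords = list(coords)
    for _ in range(iterations):
        grads = [0.0] * len(coords)
        for i, j, d in restraints:
            diff = coords[i] - coords[j]
            dist = abs(diff)
            if dist == 0:
                continue  # gradient undefined when points coincide; skip
            g = 2.0 * (dist - d) * (diff / dist)
            grads[i] += g
            grads[j] -= g
        coords = [c - step * g for c, g in zip(coords, grads)]
    return coords


# Representation: three components, each reduced to a position on a line.
start = [0.0, 0.5, 1.0]
# Hypothetical restraints, as if derived from experimental measurements.
restraints = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)]

final = optimize(start, restraints)
final_score = score(final, restraints)
```

The restraints fix only relative positions, so any translation or mirror image of `final` scores equally well. Even this toy therefore has a family of equivalent solutions rather than a single answer, which is why the last challenge in the list is analyzing the resulting ensemble.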
Andrej Sali received his BSc degree in chemistry from the University of Ljubljana, Slovenia, in 1987; and his PhD from Birkbeck College, University of London, UK, in 1991, under the supervision of Professor Tom L. Blundell, where he developed the MODELLER program for comparative modeling of protein structures. He was then a postdoc with Professor Martin Karplus at Harvard University, studying lattice Monte Carlo models of protein folding. From 1995 to 2002, he was first an Assistant Professor and then an Associate Professor at The Rockefeller University. In 2003, he moved to the University of California, San Francisco, as a Professor of Computational Biology in the Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences (QB3). Dr. Sali develops and applies computational methods for determining and modulating structures and functions of proteins and their assemblies.
Brian Shoichet, QB3 UCSF, (http://www.bkslab.org):
Discussion topic: Computational Screening of Organic Molecules
• Screening (docking) libraries of 6 million organic molecules against multi-conformational protein structures, seeking ligands to modulate activity. Individual targets include JMJD2C, a cancer target; the beta2-adrenergic and muscarinic receptors (asthma); the Mu-opioid receptor (pain); and the orphan G Protein-Coupled Receptors GPR68 (learning) and GPR65.
• Expanding screening methods to the entire orphan GPCR-ome (~120 receptors).
• Creating networks that relate proteins by the ligands they recognize, as opposed to their sequence similarity or protein-protein interactions. Based on these networks, predict ligands that will act across multiple targets. Quantitative comparisons of bioinformatics networks and chemistry-based networks.
Brian Shoichet is a Professor in the Dept. of Pharmaceutical Chemistry at UCSF. His lab develops and tests both target-directed and ligand-target-directed drug discovery methods, using cycles of computation and experiment.
Larry Smarr, CalIT2 UCSD, (lsmarr.calit2.net):
Discussion topic: Human Gut Microbiome Ecology and Function using Metagenomics and Supercomputers
• Developing a software pipeline using supercomputers to convert 10-20 billion sequenced DNA bases of human gut microbiome for each of 300 people to logarithmic relative abundances of thousands of microbial species in each person
• From this, compute relative abundances of 10,000 protein families in each of 60 people
• Visual analytics on 64Mpixel walls to discover major ecological shifts from health to disease across the population
• Advanced data analytics in collaboration with Ayasdi and Dell Analytics to discover orders of magnitude shifts in protein
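The final step of the pipeline described in the first bullet, turning per-species counts into logarithmic relative abundances, might look like the minimal sketch below. It is not the speaker's pipeline: the species names, the counts, the pseudocount, and the function name `log_relative_abundance` are hypothetical.

```python
import math


def log_relative_abundance(counts, pseudocount=1.0):
    """Convert raw per-species counts to log10 relative abundances.

    The pseudocount is an assumption of this sketch, added so that species
    with zero observed reads do not produce log10(0).
    """
    adjusted = {name: count + pseudocount for name, count in counts.items()}
    total = sum(adjusted.values())
    return {name: math.log10(value / total) for name, value in adjusted.items()}


# Hypothetical read counts for one person's sample.
sample_counts = {"species_a": 899, "species_b": 89, "species_c": 9}

abundances = log_relative_abundance(sample_counts)
```

On a log10 scale, a difference of 1.0 between two samples corresponds to a tenfold change in relative abundance, so order-of-magnitude shifts show up as whole-number differences.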
Larry Smarr spent 25 years developing computational general relativistic astrophysics at UTexas, Princeton, Harvard, UIUC and LLNL. He was the founding director of both the National Center for Supercomputing Applications at UIUC and the California Institute for Telecommunications and Information Technology at UCSD/UCI.
George Sugihara, Scripps Institution of Oceanography, UCSD, (https://scripps.ucsd.edu/profiles/gsugihara):
Discussion topic: Understanding Correlation and Causation in a Nonlinear World
• In nonlinear systems lack of correlation does not imply lack of causation.
• Mirage correlations are ubiquitous and arise because associations among variables depend on the changing system state. State dependence (interdependence) is the defining hallmark of nonlinear systems.
• One cannot understand nonlinear dynamic systems with static linear tools. Equilibrium models are a mismatch. Treating systems piecewise as if the pieces are separable and independent (reductionism) is a mismatch.
• Equation-free “empirical dynamic” models (EDM) offer an alternative that allows systems to be studied as they are… as an interdependent ever-changing whole.
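The first two bullets can be made concrete with a small static toy. It is not empirical dynamic modeling and not the speaker's analysis: the variables `driver` and `response` and the quadratic relationship between them are assumptions chosen so that the arithmetic is easy to check.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical driver variable, symmetric about zero.
driver = [-3, -2, -1, 0, 1, 2, 3]
# The response is fully determined by the driver, but nonlinearly.
response = [x ** 2 for x in driver]

r_full = pearson(driver, response)               # over the whole range
r_negative = pearson(driver[:3], response[:3])   # driver in {-3, -2, -1}
r_positive = pearson(driver[4:], response[4:])   # driver in {1, 2, 3}
```

Over the whole range the correlation vanishes even though the response depends entirely on the driver. Within each half the correlation is strong but of opposite sign, so which association is observed depends on which part of the state space happens to be sampled.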
George Sugihara is the McQuown professor at Scripps Institution of Oceanography. His lab focuses on understanding complex systems, including specific applications (of algebraic topology, dynamical systems theory and nonlinear forecasting) to ecosystems, climate, finance, gene expression, etc. He is a former Managing Director at Deutsche Bank and was solicited (but declined) to be Chief Scientist for NOAA (asst. secretary level, Dept. of Commerce). His lab has been involved in various NAS/NRC studies on climate, fisheries, and systemic risk in the financial sector.