• No results found

Primetime for KNIME:

N/A
N/A
Protected

Academic year: 2021

Share "Primetime for KNIME:"

Copied!
21
0
0

Loading.... (view fulltext now)

Full text

(1)

Primetime for KNIME:

Towards an Integrated Analysis and Visualization

Environment for RNAi Screening Data

F. Oliver Gathmann, Ph. D.

Director IT, Cenix BioScience

Presentation for:

KNIME User Group Meeting 2011

(2)

Overview

Explain “RNAi Screening”

IT infrastructure for HT-HCS

(High-Throughput, High-Content Screening)

Workflow software evolution at Cenix: past,

(3)

How RNAi works

siRNA

Unwinding of siRNA

Target mRNA recognition Target mRNA

RISC

Degradation of mRNA

Explain “RNAi Screening”

First Take Home Message:

RNAi allows you to investigate the function of genes by knocking them down selectively

(4)

The Drug Discovery Pipeline

Target Discovery Target Validation in vitro Target Validation in vivo Lead

IdentificationOptimizationLead ADME/Tox Phase IClinical Phase IIClinical Phase III RegistrationClinical Target Discovery (in vitro)

Target Discovery (in vitro)

Direct LoF Screens

Direct LoF Screens Modifier ScreensModifier Screens

Target Validation (in vitro)

Target Validation (in vitro) Phenotypic Profiling

Phenotypic Profiling Phenotypic TitrationPhenotypic Titration

Explain “RNAi Screening”

Second Take Home Message:

“Early In The Drug Discovery Pipeline” means high-throughput and lots of data

(5)

Information Layers

Metabolic Pathway Gene Silencing Reagent Experiment Hit Phenotype

Explain “RNAi Screening”

Sequence, species, pathway annotations, transcripts

Structure, targeted Gene(s), stock and order information

Meta data (sample and control positions), production data

Cell images, morphology data

Phenotype annotations, knock down, reproducibility, significance Gene network, disease

conditions

Last Take Home Message:

(6)

Cenix

IT Infrastructure for HT-HCS

Database Server File Server Farm Automated Microscope Pipetting Robot Tube Handler LDAP Server Scientist Workstations LIMS Server

(7)

Terminology: “Workflows”

Process-centric: mapping a work process in the

physical world; focused on data

acquisition

Data-centric: mapping an algorithm; focused on

data

processing

Not always clear-cut, but still useful distinction

Workflow software evolution

“Process-centric Workflows”

vs.

(8)

Primordial Process Workflows: Design

(9)

Workflow software evolution

(10)

Data Analysis Workflows: Excel

In the beginning, there was Excel.

+

Advantages:

• Ubiquitous and easy to use

• Full flexibility for the end user (in theory, anyways)

Disadvantages:

• Hard to debug

• Nightmarish version control

• Slow and cumbersome

(11)

Data Analysis

Storage

Experiment Design Data Acquisition

Image Processing

Data Analysis Workflows: Excel

qPCR Autoscope Plate reader LIMS Server Job Server

File Server Database Server LIMS Client Excel Design experiment Submit experiment Post image data Engines Run image

processing job; store phenotype data

Load phenotype data files; run analysis; generate graphs Store image data Excel Img. Analysis Client Submit image processing job

Workflow software evolution

Store experiment data; track experiment; wait for image data

(12)

Data Analysis Workflows: Web Tools

Next: Web tools with tabular data as input

and output.

+

Advantages:

• Encapsulation of complex functionality

• Centralized administration

• Executed on server

Disadvantages:

• Low flexibility

• Frugal web interface

(13)

Data Analysis

Storage

Experiment Design Data Acquisition

Image Processing

Data Analysis Workflows: Web Tools

qPCR Autoscope Plate reader LIMS Server Job Server

File Server Database Server

LIMS Client Excel Engines

Load result data files; generate graphs Img. Analysis Client Web Tools Server Browser Spotfire Run analysis Download result data files

Workflow software evolution

Upload phenotype and design data files

(14)

Data Analysis Workflows: KNIME!

KNIME: A giant leap forward

– Flexible and easy to use and yet robust, scalable, performant and extensible!

Current KNIME infrastructure:

– Centrally administered Windows and Mac

installations, configured to point to a user-specific workspace on the file server

– Workflow curation policy: Versioned reference

workflows for each project, owned by power users

– Experiment meta data provided through database nodes, raw data through files

– Complex statistics implemented with (remote) R scripting nodes

(15)

Data Analysis

Storage

Experiment Design Data Acquisition

Image Processing

Data Analysis Workflows: KNIME!

qPCR Autoscope Plate reader LIMS Server Job Server

File Server Database Server KNIME Spotfire LIMS Client Excel Engines

Workflow software evolution

Img. Analysis Client

Load result data files; generate graphs

Run workflow on phenotype data and experiment design

(16)

Primetime: Requirements

Streamlining the Screening Pipeline

– Analysis has become the bottleneck: Potential for 10-20 % increase in overall throughput

Even “Higher” Content:

– More parameters using advanced analysis methods

– Single object rather than population data

– Integrate gene annotations and pathway data

Enable customers to explore and (re-)analyze

delivered data sets

– Selecting/weighing parameters

– Tight integration with Spotfire, including raw data

(17)

Data Analysis Storage Experiment Design Data Acquisition Image Processing qPCR Autoscope Plate reader LIMS Server KNIME Server Job Server

File Server Database Server KNIME Spotfire LIMS Client Excel Submit image analysis job; wait for phenotype data

Engines

Post phenotype data

Post phenotype data; run workflow on phenotype data and experiment design; post result data

Retrieve result data; run Spotfire Store image data;

launch image

processing workflow

Primetime: IRIS

Workflow software evolution

(18)

Primetime: Beyond IRIS

Use KNIME for process-centric workflows as well

This would require

– Standard interface to the LIMS server to drive the “business logic” (REST)

– Easily configurable User Interfaces to parameterize processing steps (something like RGG?)

(19)

Primetime: Beyond IRIS

KNIME “solutions”: Hide complexity of workflows

by exposing only a few “knobs” to the end user

Features:

– Again, a User Interface generator to make it easy for non-IT power users to create new solutions

– Ideally, a way to publish the “solution” to a server and run it remotely

(20)

Conclusions

KNIME has quickly become an integral part of the

HT-HCS screening pipeline at Cenix

Current work on the data analysis infrastructure

around KNIME is focused on tight integration with

the LIMS server, with Definiens for image

processing, and with Spotfire for data visualization

Further down the road, we plan to use KNIME for

all workflows at Cenix and to build pre-packaged

“solutions”

(21)

References

Related documents

Visual counting and debris sampling were used to assess (1) magnitude of plastic transport, (2) the spatial distribution across the river width, and (3) the plastic

You may either make one copy of the SOFTWARE solely for backup or archival purposes or transfer the SOFTWARE to a single server platform accessing only one database, provided you

Reservation Software, Hotel Reception Software (Front Office), Call Accounting, Hotel Point of Sales (Restaurant, Bar, Room Service, House Keeping or any other outlet),

Veracode’s Vendor Application Security Testing (VAST) Program provides the first comprehensive application security compliance program to large enterprise customers as well as

Third, using an evolutionary optimisation ap- proach, we effectively apply route randomisation while controlling its impact on hard real-time performance guarantees..

Methods based on checklists and in pre-defined evaluation models speed up the process and facilitate the consideration of multiple dimensions of the invention, from the

When the new Spotfire Server is in place, you need to install a Spotfire client and import the Library content and Information Model exported from the 10.1 database tables by

Server side image maps refer to another file on the server, so that if a person clicks on the image, the browser refers to that file for the location the link needs to go to.