Evolution of Connectivity in LabKey Server

Karl Lum

Partner, LabKey Software

(2)

Lowering the barrier to connect scientific data to LabKey Server

 Increased flexibility in routing data

 Cooperating with other systems

 Giving users more options with their data

Historical look at LabKey Server connectivity

Focus on some recent changes (REDCap, FreezerPro, ETL)

Future directions

(3)

 LabKey Software focused on Proteomics

 CPAS server processing MS2 runs through the data pipeline

 Data uploaded through the browser and results saved to the database

 Pipeline tasks could parse specific data formats

 Analysis of Flow data

 FCS files processed via the pipeline

(4)

2005 Connectivity Summary

[Diagram: Data Pipeline, Form Entry, LabKey Server, Java Module]

(5)

 Collaboration with SCHARP on the Atlas Portal

 Many data types associated with HIV/AIDS research

 Lots of study and assay data

 CRF and specimen data imported through the pipeline

 Assay data consisted of machine generated data files

 Assay framework and GPAT

 Imports data from spreadsheets or tab-separated text files

 No built-in specialized analysis or visualizations

(6)

 Needed many custom applications for Atlas

 Java modules were complex to build and maintain

 Build custom applications without the module overhead

 LabKey APIs & Simple Modules

 Lowered the extensibility barrier

 Insert, update, delete programmatically

 Module based assays allowed easy entry into the assay framework

 Lists

 Create tables in LabKey Server and integrate with existing data

 Easily import file based data through the browser

 Tools to infer fields from files
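
Field inference from an uploaded file can be pictured roughly as follows. This is a minimal sketch, not LabKey Server's actual inference logic: the type names, the order in which types are tried, and the sample columns are all assumptions made for illustration.

```python
import csv
import io
from datetime import datetime


def infer_type(values):
    """Return the narrowest type name that fits every non-empty value."""
    present = [v for v in values if v != ""]
    if not present:
        return "string"  # nothing to go on, so fall back to text
    # Try the strictest parser first; the type names are hypothetical.
    for name, parse in (
        ("integer", int),
        ("double", float),
        ("dateTime", datetime.fromisoformat),
    ):
        try:
            for v in present:
                parse(v)
            return name
        except ValueError:
            continue
    return "string"


def infer_fields(tsv_text):
    """Map each column header in tab-separated text to an inferred type."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, data = rows[0], rows[1:]
    return {
        col: infer_type([r[i] if i < len(r) else "" for r in data])
        for i, col in enumerate(header)
    }


# Made-up sample data, for illustration only.
sample = (
    "ParticipantId\tVisit\tWeight\tDrawDate\n"
    "P-001\t1\t71.5\t2008-03-14\n"
    "P-002\t2\t68\t2008-03-15\n"
)
fields = infer_fields(sample)
print(fields)
```

A column is only given a narrow type when every non-empty value parses, so one stray text value makes the whole column a string.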

(7)

2008 Connectivity Summary

[Diagram: Data Pipeline, Form Entry, LabKey Server, Java Module, Simple Module, File upload]

(8)

2008-2009 External Schemas

 Support for connecting to data sources outside the LabKey Server schema

 Relocating the data is no longer required

 LabKey Server security could be applied

 Editing of external tables through the LabKey Server UI can be enabled

 Supported data sources:

 SAS

 PostgreSQL

 Microsoft SQL Server

 Oracle

 MySQL

(9)

 LabKey Software continues to refine APIs

 Additional language bindings for Perl and Python

 Polish module based tools

 Remote connections

 LabKey Server as an external data source

 Connectivity through the LabKey Server API

 Folder level granularity

(10)

2012 Connectivity Summary

[Diagram: Data Pipeline, Form Entry, File upload, Client API, LabKey Server, Java Module, Simple Module, External SQL Data Sources]

(11)

REDCap

 Web application for building and managing online surveys and databases

 Developed and distributed by Vanderbilt University

 Popular in the academic and research community for designing clinical and translational research databases

(12)

 International Center of Excellence for Malaria Research (ICEMR) at the University of Washington

 Demographic and clinical data in REDCap

 Wanted their REDCap data integrated into their LabKey Server

 Visualizations

 Queries

 Integration with experimental data

(13)

 Data needed to be synchronized from REDCap to the LabKey Server

 REDCap API allowed programmatic and secure access to the projects of interest

 Data is extracted and saved in a format that can be imported into a LabKey Server study

 Scheduled automatic import
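
The extract-and-save step might look something like the sketch below. It assumes the standard REDCap record-export parameters (`token`, `content`, `format`, `type`) and writes tab-separated text. The URL, token, column names, and function names are hypothetical, and this is not the code LabKey Server itself runs. Scheduling is left out.

```python
import csv
import io
import json
from urllib import parse, request


def build_export_request(api_url, token):
    """Build (without sending) a POST asking REDCap for all records as flat JSON."""
    fields = {"token": token, "content": "record", "format": "json", "type": "flat"}
    body = parse.urlencode(fields).encode("utf-8")
    return request.Request(api_url, data=body, method="POST")


def records_to_tsv(records, columns):
    """Write exported records as tab-separated text with a fixed column order."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=columns,
        delimiter="\t",
        extrasaction="ignore",  # drop fields that are not in `columns`
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


def sync_once(api_url, token, columns):
    """Fetch records from REDCap and return them as TSV text (needs network access)."""
    with request.urlopen(build_export_request(api_url, token)) as resp:
        return records_to_tsv(json.load(resp), columns)


# Offline demonstration with made-up records shaped like a flat JSON export.
demo = [
    {"record_id": "1", "age": "34", "site": "A"},
    {"record_id": "2", "age": "41", "site": "B"},
]
tsv = records_to_tsv(demo, ["record_id", "age"])
print(tsv)
```

Keeping the request builder separate from the conversion step means the conversion can be checked offline against saved exports.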

(14)

FreezerPro

 Commercial web application for frozen specimen inventory management

 Supports various sample types

 Tracks location and availability of specimens

 Allows user defined fields

 Users can create custom reports and export data

(15)

 Novo Nordisk Type 1 Diabetes Research Center

 Uses FreezerPro to manage their research specimens

 Needed their specimen inventory integrated into LabKey Server

 Combine with experimental data

 Queries

(16)

 API access to the remote FreezerPro server

 LabKey Server keeps the FreezerPro credentials in encrypted secure storage

 Inventory information is imported directly into LabKey Server

 Uses the data pipeline

 Study specimen repository

 Users control field mapping, filtering, and the synchronization schedule
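
User-controlled field mapping and filtering could be sketched as below. All field names, the record shape, and the `map_and_filter` helper are hypothetical, and none of this reflects the real FreezerPro or LabKey Server schemas.

```python
def map_and_filter(records, field_map, keep):
    """Rename source fields per field_map, then keep only rows passing `keep`."""
    mapped = [
        {target: rec.get(source) for source, target in field_map.items()}
        for rec in records
    ]
    return [row for row in mapped if keep(row)]


# Hypothetical inventory records and field names, for illustration only.
inventory = [
    {"barcode": "S-100", "sample_type": "Serum", "freezer": "F1"},
    {"barcode": "S-101", "sample_type": "Plasma", "freezer": "F2"},
]
field_map = {"barcode": "GlobalUniqueId", "sample_type": "PrimaryType"}
specimens = map_and_filter(
    inventory, field_map, lambda row: row["PrimaryType"] == "Serum"
)
print(specimens)
```

Source fields missing from the mapping (here `freezer`) are simply not imported.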

(17)

 ETL stands for extract, transform, and load

 Developed as part of HIDRA (Hutch Integrated Data Repository & Archive)

 Goals of building a LabKey Server ETL Framework

 Provenance

 Understanding the origin of the data, knowing when and how it got there

 Auditing

 Security

(18)

 Built on top of Pipelines

 Functionality

 Query based ETLs

 Stored procedures

 Remote Sources

 Checkers (identify whether work is to be done)

 Scheduling

 Logging output

2013-2014 ETL Framework

(19)

 ETLs are module based

 An ETL consists of a set of Transform Steps

 Key components of a transform

 Source table or query

 Destination table

 Filter strategy

 Identifies rows to transform & if there is work to do

(20)

Filter Strategies

 Choose which rows to move to target table

 Select all

 Just get all the data, every time

 Last modified

 Rows with a date/time column newer than last run

 Records most recent value

 Run filter

 Checks a specified column, especially an incrementing integer column

 Any rows with higher value than last time are transformed

 Useful for rows written by previous ETLs
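
The three filter strategies can be modeled in a few lines. This is a sketch over in-memory rows, not the ETL framework's implementation. The function names, the `state` dictionary that remembers the last value seen, and the column names are all assumptions.

```python
from datetime import datetime


def select_all(rows, state):
    """Select-all strategy: every row, every run; no state is kept."""
    return list(rows)


def modified_since(rows, state, column="modified"):
    """Last-modified strategy: rows whose timestamp is newer than the last run's."""
    last = state.get("last_modified", datetime.min)
    picked = [r for r in rows if r[column] > last]
    if picked:
        # Record the most recent value for the next run.
        state["last_modified"] = max(r[column] for r in picked)
    return picked


def run_filter(rows, state, column="run_id"):
    """Run strategy: rows whose incrementing integer exceeds the last value seen."""
    last = state.get("last_run", -1)
    picked = [r for r in rows if r[column] > last]
    if picked:
        state["last_run"] = max(r[column] for r in picked)
    return picked


# Made-up source rows, for illustration only.
source = [
    {"id": 1, "modified": datetime(2014, 1, 1), "run_id": 1},
    {"id": 2, "modified": datetime(2014, 1, 2), "run_id": 2},
]
state = {}
first = modified_since(source, state)    # both rows are new
second = modified_since(source, state)   # nothing changed, so no work to do
source.append({"id": 3, "modified": datetime(2014, 1, 3), "run_id": 3})
third = modified_since(source, state)    # only the newly added row
print(len(first), len(second), [r["id"] for r in third])
```

An empty result doubles as the "no work to do" signal, which is how a filter strategy can also act as a checker.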

(21)

Target Options

 How to add data to target table

 truncate - delete all rows and add the selected ones

 append

 Add new rows to the target table

 Fails on duplicate primary keys

 merge

 Update or Insert

 Matches Primary Keys
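
The difference between the three target options is easiest to see on a small table. The sketch below models the target as a dictionary keyed by primary key. The `load` function and the `id` key column are hypothetical, and a real target would be a database table inside a transaction.

```python
def load(target, rows, option, key="id"):
    """Apply rows to target (a dict keyed by primary key) using the given option."""
    if option not in ("truncate", "append", "merge"):
        raise ValueError(f"unknown target option: {option}")
    # Stage changes on a copy so a failed append leaves the target untouched.
    staged = {} if option == "truncate" else dict(target)
    for row in rows:
        pk = row[key]
        if option != "merge" and pk in staged:
            raise KeyError(f"duplicate primary key: {pk}")
        staged[pk] = dict(row)  # insert, or update when merging
    target.clear()
    target.update(staged)
    return target


table = {1: {"id": 1, "value": "a"}}

load(table, [{"id": 2, "value": "b"}], "append")
after_append = sorted(table)

load(table, [{"id": 2, "value": "B"}, {"id": 3, "value": "c"}], "merge")
after_merge = {pk: row["value"] for pk, row in table.items()}

try:
    load(table, [{"id": 1, "value": "x"}], "append")
    append_failed = False
except KeyError:
    append_failed = True

load(table, [{"id": 9, "value": "z"}], "truncate")
after_truncate = sorted(table)

print(after_append, after_merge, append_failed, after_truncate)
```

Merge is the only option that tolerates a primary key that already exists in the target; append rejects it and truncate avoids the question by emptying the table first.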

(22)

Schedule Options

 When to run the transform

 Poll option

 Check at a defined interval

 Cron option

 Can be used to check at a particular time of day
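
The two schedule options differ in how the next run time is derived. The sketch below covers only a fixed polling interval and a once-a-day time; a real cron expression is far more expressive. The function names are hypothetical and unrelated to the ETL framework's own scheduler.

```python
from datetime import datetime, time, timedelta


def next_poll(last_check, interval):
    """Poll option: the next check is one fixed interval after the last one."""
    return last_check + interval


def next_daily(now, at):
    """Time-of-day option: the next occurrence of `at` strictly after `now`."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed, use tomorrow's
    return candidate


now = datetime(2014, 5, 1, 14, 30)
print(next_poll(now, timedelta(hours=1)))   # 2014-05-01 15:30:00
print(next_daily(now, time(2, 0)))          # 2014-05-02 02:00:00
print(next_daily(now, time(23, 0)))         # 2014-05-01 23:00:00
```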

(23)

Connectivity Summary

[Diagram: Data Pipeline, Form Entry, File upload, Client API, LabKey Server, Java Module, Simple Module, External SQL Data Sources]

(24)

 Other connection strategies LabKey is investigating

 DatStat

 Online data and study management software

 i2b2

 Informatics framework that enables clinical researchers to use existing clinical data for discovery research

 Caisis

 Open-source cancer data management system

(25)

Any questions?

Karl Lum
