• No results found

Data Vault & Pentaho in Healthcare. Kasper de Graaf, Aly Hollander

N/A
N/A
Protected

Academic year: 2021

Share "Data Vault & Pentaho in Healthcare. Kasper de Graaf, Aly Hollander"

Copied!
26
0
0

Loading.... (view fulltext now)

Full text

(1)

Data Vault & Pentaho in Healthcare

(2)

St. Antonius Ziekenhuis Nieuwegein / Utrecht ● 3 locations ● 5.000 employees ● 1.100 beds ● 33 specialties ● 250 specialists ● 150 junior doctors

(3)

Data Vault & Pentaho in Healthcare | 5 juni 2013 3 Santeon Santeon Hospitals 1. Canisius-Wilhelmina Ziekenhuis 2. Catharina Ziekenhuis 3. Martini Ziekenhuis

4. Medisch Spectrum Twente

5. Onze Lieve Vrouwe Gasthuis (OLVG)

(4)

Healthcare is all about data

patient files, diagnostics, R & D

Like an ordinary business, Not more difficult,

DOT,

Care activities, Appointments, Procurement

(5)

Data Vault & Pentaho in Healthcare | 5 juni 2013 5

EPR (Electronic Patient Record) system for Sint Antonius Ziekenhuis

Maintains and improves the quality of care Developed and maintained by ICT

department

Based on web technology, open standards and open source software

Supports primary health care process

Modulair system, can be tuned for various user profiles: doctors, nurses or other

health professionals

(6)

IntraZis, Data Warehouse & Pentaho

2010: more demand for (management-)information IntraZis is not suitable for extensive queries

→ ICT department starts a DWH and BI project MySQL en Pentaho was chosen

in the tradition of in-house development and open source

(7)

Data Vault & Pentaho in Healthcare | 5 juni 2013 7

(8)

Data Vault ETL-issues

Many objects to load (hubs, links, satellites) Automation (almost) required

We did NOT want to – Get rid of ETL tooling – Code the ETL ourselves

– Manage too many ETL objects

(9)

Data Vault & Pentaho in Healthcare | 5 juni 2013 9

Our Solution

Use meta data to drive generic ETL

(10)

Result: The Kettle Data Vault Framework

A set of generic ETL transformations Driven by meta data

– (currently in XLS; loaded to MySQL Database) A couple of configuration files

A couple of ETL jobs and transformations to tie it all together

(11)

Data Vault & Pentaho in Healthcare | 5 juni 2013 11 Staging Area CSV Files ETL ERP DBMS

Sources ETL Process Data Warehouse EUL

MySQL Files ETL: Kettle Data Vault Frame work Central DWH & Data Marts MySQL Data Vault ETL

The Architecture

(12)

Data Vault Size

Approx. 125 tables (excl error and helper tables) 40GB of data

Largest table: 42 mln rows Total rows: 160 mln

Refresh rate: twice a day

(13)

Data Vault & Pentaho in Healthcare | 5 juni 2013 13

Advantages of Data Vault for us

Full traceability of history (a DBC changes rapidly over time, we often see more than 30 versions)

Data Model is very extensible (incorporating new source systems)

Business rules are moved downstream (and change often)

(14)

The Tooling

Database: MySQL

ETL: Pentaho Data Integration (Kettle) BI: Pentaho

(15)

Data Vault & Pentaho in Healthcare | 5 juni 2013 15

Automation?

Staging physical database & loading: can be automated, but currently not part of the framework

Data vault design: manual

Data vault physical database: manual

Mapping from source to data vault (Excel sheet): manual Data vault population: automated using the framework Data marts & BI: manual

(16)

So what does this Framework do?

Automatically populate the entire data vault data warehouse

Generate logging

Error rows are inserted in special error tables Restartable (using the load_dts of the

(17)

Data Vault & Pentaho in Healthcare | 5 juni 2013 17

Some design decisions

Updateable views with generic column names

Compare satellite attributes using string comparison (concatenate all columns, with | (pipe) as delimiter)

'inject' the metadata using Kettle parameters

Generate and use an error table for each Data Vault table Check for ‘design errors’ (i.e. references to non-existent tables, connections, attributes)

(18)

Supported constructs

Hubs, Links, Satellites

Multi source hubs and links

Last_seen_dts (hubs and links)

Link attributes (attribute in a link that references a ‘hub’ that is not modeled, like

orderline)

(19)

Data Vault & Pentaho in Healthcare | 5 juni 2013 19

Not (yet) supported constructs

Composite business keys in a hub (can be solved using concatenation)

Link-to-link relationships Multi active satellites

(20)
(21)

Data Vault & Pentaho in Healthcare | 5 juni 2013 21

(22)
(23)

Data Vault & Pentaho in Healthcare | 5 juni 2013 23

(24)

Final remarks

PDI framework & data vault now operational for > 2 years, still growing and still going strong

Generic solution saves an enormous amount of time (both development and testing)

Generic solution is a bit harder to maintain and debug Luckily maintenance is now close to zero

(25)

Data Vault & Pentaho in Healthcare | 5 juni 2013 25

Want to try?

The PDI framework is open source!

Download a fully operational Virtual Machine at:

http://sourceforge.net/projects/pdidatavaultfw/

Developer: Edwin Weber

(26)

References

Related documents

| Apps Users Advanced Security Data Redaction Data Masking TDE Database Vault Privilege Analysis Database Vault Privileged User Controls|.

As the Enterprise Vault environment grows, user reliance on archived data also grows; ensuring users have reliable access to archived data becomes an Enterprise

These results suggest that the postoperative course of the patients who underwent RS might be milder than after LS in cases in which complications developed, whereas the post-

In addition to large companies in food industry such as Unilever, Ferrero, P & G and Nestle, there are also NGOs members such as WWF, Solidaridad and Oxfam (Nikoloyuk, et

In CFT, the seven Bt hybrids significantly differed from their non-Bt counterparts for leaf damage, number of exit holes, percent tunnel length, and grain yield.. Bt-hybrids had

The purpose of this two hour CE course is to provide an overview of the professional aspects of the Certified Nursing Assistant's (CNAs) role and to explore the importance

The high spatial resolution of fMRI (mm in a 3T clinical scanner) makes it a desirable choice in localizing regions of involvement when studying neuronal networks, especially in

The current study investigated 30 current Major League Baseball players, examining how their performance changed once they signed a long term deal during an arbitration year.. This