• No results found

THE RWE SERIES BIG DATA ANALYTICS & VISUALISATION RWE TRIA

N/A
N/A
Protected

Academic year: 2021

Share "THE RWE SERIES BIG DATA ANALYTICS & VISUALISATION RWE TRIA"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

RWE Series Part III: Big Data Analytics and Visualisation

Big Data in healthcare is a complex, rapidly

evolving and exciting area. It is an important

source of Real World Evidence (RWE) - data

which has been collected outside of a

conventional clinical trial.

In Part III of the RWE Series we distil this

wide-ranging and sometimes intimidating topic down

to its key features to understand how we can

take advantage of this data and maximise its

potential.

We pay special attention to the visualisation

methods for interpreting and presenting Big

Data.

This is the third part of a regular series available

at svmpharma.com

PART III:

BY GEM AUDDY

[email protected] SVMPHARMA LIMITED WINCHFIELD LODGE OLD POTBRIDGE RD WINCHFIELD HAMPSHIRE RG27 8BT

THE RWE

SERIES

Providing the background, the

research and the insights.

RWE TRIALS

BIG DATA ANALYTICS &

VISUALISATION

(2)

RWE Series Part III: Big Data Analytics and Visualisation

he capabilities and potential of Big Data have become a major focus of attention in recent years across many industries. There is no shortage of astounding facts about Big Data, for example, 90% of the data in the world today has been created in the last 2 years alone.1-3 Furthermore, there have been frequent

predictions that Big Data will solve persistent issues or lead to previously unimaginable cost reductions.4

However, it can be difficult to relate to and

contextualise these impressive facts and possibilities. First we need to gain a better understanding of the core principles of healthcare analytics and Big Data.

Big Data has been defined as ‘high volume, high velocity, and high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimisation’. This influential ‘3Vs’ model of Big Data was first envisioned in 2001 and has been referred to in the majority of articles since that time.5,6 More recently a fourth ‘V’ has been incorporated which emphasises the need for the

data to be accurate, this is known asdata veracity.7,8

This model provides an excellent starting point for exploring the topic, and as you will see, healthcare data analytics provides the most powerful examples of the characteristics of Big Data.

In this article we propose for the first time, a fifth ‘V’: visualisation. We will highlight the visualisation tools and techniques which are needed to drive meaningful outcomes from Big Data.

T

IT CAN BE DIFFICULT TO RELATE TO AND CONTEXTUALISE THESE IMPRESSIVE FACTS AND POSSIBILITIES

HEALTHCARE DATA ANALYTICS PROVIDES THE MOST POWERFUL EXAMPLES OF THE

CHARACTERISTICS OF BIG DATA

Big Data &

Healthcare

Analytics

V

ISUALISATION

“NEW HEALTH DATA IS GENERATED AND

CAPTURED CONTINUOUSLY”

“VISUALISATION TOOLS AND TECHNIQUES ARE VITAL IN IDENTIFYING

RELATIONSHIPS AND TO CONVEY INFORMATION”

“THE SHEER QUANTITY OF DATA IS THE EPONYMOUS CHARACTERISTIC OF BIG

DATA”

“HEALTHCARE DATA EXISTS IN MANY FORMS AND IS OFTEN

UNSTRUCTURED” “THE VERACITY OF DATA IS ITS

ACCURACY AND IS A KEY PART OF ANY DISCUSSION”

(3)

RWE Series Part III: Big Data Analytics and Visualisation

Volume

The sheer quantity of data that is captured, stored and processed is the eponymous characteristic of Big Data. Compared to other industries, the healthcare services have always generated vast quantities of data. This is through meticulous record keeping, compliance/ regulatory requirements, and comprehensive patient care notes.9 As we move towards universal electronic

health records in primary care and increasing digitisation of hospital data, we now have a remarkable volume of data.10 Recent technological

advancements in data storage capacity is now prepared for this, whilst networking capabilities can handle the movement of large quantities of data. In addition to clinical data, there has been a notable increase in the collection of consumer driven lifestyle data which can provide valuable insights into general health and wellbeing.11

Velocity

New health data is generated and captured continuously and at a rapid pace. There is a constant flow of new data. The health service is moving away from traditional static data such as x-rays films and scripts to more dynamic systems involving multi-dimensional and regular measurements. Real-time data monitoring is associated with critical care units and the

operating theatre but can increasingly be applied to care at home and the community. Technology allows for immediate data analysis in real time from any location and any number of patients. The general consensus is that real-time analytics of high volume data has the potential to revolutionise healthcare.12-14

Variety

Healthcare data exists in many forms including written notes and prescriptions, medical imaging, laboratory, pharmacy, insurance and administrative data in addition to sensor data. Increasingly health information can be found within dynamic web content including social media. Big Data can be

coded and stored in a structured form, for example, in the large datasets, Hospital Episode Statistics (HES) and the Clinical Practice Research Datalink (CPRD). However a significant amount of data exists in unstructured or semi-structured formats and the processing, storage and analysis of data is adapting to accommodate this.15

Veracity

Veracity of data is its accuracy and conformity to facts. Veracity is an essential part of any discussion on data and is particularly important in healthcare, where the implications of inaccurate data can be severe. To attain accurate analyses in Big Data, there needs to be a scaling up in granularity and performance of the architectures, platforms, algorithms, methodologies and tools. Big Data analysis has been compared to financial data in this regard.9 As Big Data in healthcare is measuring

real-world outcomes, it can offer a more accurate representation of real world practice than controlled trials.

Visualisation

Due to its volume, variety and velocity, Big Data analysis and presentation can be difficult. Traditional written, tabular and graphical formats can be inadequate in conveying the findings, and key details can be lost in summarising and aggregating the data. There a number of visualisation tools and solutions which have been proven to be valuable in identifying relationships and conveying information. The two most commonly used for Big Data in healthcare are Cytoscape and Gephi. Cytoscape initially began in bioinformatics (gene processing and molecular networks) whilst Gephi has often been used for consumer and social research. Both of these tools can be adapted to enhance healthcare data analytics as our examples will show. Network graphs consists of nodes (a set of objects representing entities in a dataset) and edges (connections between nodes that provide visual cues as to the degree of connectedness).16

(4)

RWE Series Part III: Big Data Analytics and Visualisation

Visualisation in Cystic Fibrosis

SVMPharma carried out a data visualisation exercise using Gephi to show the multi-disciplinary nature of Cystic Fibrosis (CF) by highlighting its co-morbidities and associated health problems. 15,000 Cystic Fibrosis patients who visited hospital from 2010-2014 were selected by their ICD-10 code (diagnosis code) from the Hospital Episodes Statistics (HES) dataset.

This provided 3,800 ICD-10 codes and over 20,000 different connections between them. The data was entered into Gephi, with each ICD-10 code represented by a node, and each different connection between them via a line. The radial layout was used in Gephi to create the image on the right (Image 5) which includes the entirety of the data.

To take a closer look at the data, the top 100 different connections leading from the CF nodes were identified. This is represented below (Image 8) with both the node size and connecting line representing the strength and volume of the connection, and the nodes positioned and colour-coded according to disease group.

The final visual conveys at a glance, the main co-morbidities of CF patients and associated health issues. This shows the strength of the relationships between CF and its secondary complications and directions the disease is likely to progress.

8. A Closer Look

Nodes Connections (Edges)

1. Data in Tables 2. Data In Gephi

3. Ranking (Colour) 4. Radial Layout

5. Entire Data

6. Top 100 7. Labelling

3,800 ICD-10 CODES AND OVER 20,000 DIFFERENT CONNECTIONS BETWEEN THEM

(5)

RWE Series Part III: Big Data Analytics and Visualisation

The Technology

To cope with Big Data, a number of hardware and software solutions are used and these have replaced the traditional Business Intelligence (BI) tools. The first step involves massively parallel computing and the processing of large datasets across clusters using a Hadoop based program. Specialist tools are available for rapid processing and querying including Spark for structured and semi-structured data, whilst unstructured data can be stored in the document database software MongoDB. Saiku allows users to create, visualize and analyse information in graphical & pivot table formats. These wholesale changes in data storage and processing architecture has been crucial and sets the platform for future innovation and expansion.

Big Data analytics in healthcare is almost ready to fulfil its potential: this is due

to the rapid digitisation of information, the adaptations in technological

architecture and the increasing capabilities for real-time analysis. However, to

maximise its potential we need to fully understand the key characteristics of

Big Data: volume, variety and velocity. A consideration of the veracity of the

data and visualisation techniques make up our ‘5Vs’ of Big Data in healthcare

and provide a useful framework to explore the key issues.

1. ScienceDaily. Big Data, for better or worse: 90% of world's data generated over last two years. 2013.

www.sciencedaily.com/releases/2013/05/130522085217.htm. 2. Mayer-Schönberger V, Cukier K. Big data: A revolution that will transform how we live, work, and think: Houghton Mifflin Harcourt; 2013.

3. Marr B. Big Data: The Eye-Opening Facts Everyone Should Know. 2014. www.linkedin.com/pulse/20140925030713-64875646-big-data-the-eye-opening-facts-everyone-should-know. 4. Gurnett J. Big data could dramatically cut £80bn NHS chronic care bill. 2014. http://www.theguardian.com/healthcare-network/emc-partner-zone/big-data-nhs-chronic-care.

5. Ward JS, Barker A. Undefined by data: a survey of big data definitions. arXiv preprint arXiv:13095821 2013.

6. Diebold FX. On the Origin (s) and Development of the Term'Big Data'. 2012.

7. Buhl HU, Röglinger M, Moser D-KF, Heidemann J. Big data. Business & Information Systems Engineering 2013; 5(2): 65-9.

8. Hitzler P, Janowicz K. Linked data, big data, and the 4th paradigm. Semantic Web 2013; 4(3): 233-5.

9. Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst 2014; 2(1): 1-10.

10. Takian A, Cornford T. NHS information: Revolution or evolution? Health Policy and Technology 2012; 1(4): 193-8. 11. Johnson K, Jimison H, Mandl K. Consumer Health

12. Blount M, Ebling MR, Eklund JM, et al. Real-Time Analysis for Intensive Care: Development and Deployment of the Artemis Analytic System. Engineering in Medicine and Biology Magazine, IEEE 2010; 29(2): 110-8.

13. Feldman B, Martin EM, Skotnes T. Big Data in Healthcare Hype and Hope. October 2012 Dr Bonnie 2012; 360. 14. Berry A, Milosevic Z. Real-Time Analytics for Legacy Data Streams in Health: Monitoring Health Data Quality. Enterprise Distributed Object Computing Conference (EDOC), 2013 17th IEEE International; 2013 9-13 Sept. 2013; 2013. p. 91-100.

15. Sawant N, Shah H. Big Data Storage Patterns. Big Data Application Architecture Q & A: Apress; 2013: 43-56. 16. Cherven K. Network Graph Analysis and Visualization with Gephi: Packt Publishing Ltd; 2013.

Front Cover Image by Calvinius [CC BY-SA 3.0

(http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

References

Related documents

3.1.3 Multiple-Clients DB Design: Multiple clients outsource their data to the service provider; the service provider evaluates the tables and extracts the common, uncommon,

In this paper we employed disaggregated bilateral data from Thailand and her five largest trading partners to investigate the short -run and the long-run response of the trade

Figure 1 shows the thickness values of the CdTe films deposited as a function of the different substrate temperatures and deposition times.. From Figure 1 can be appreciated

Unlike most other modern bullion coins, the American Gold Eagle is 22 karat, .9167 pure gold alloyed with silver and copper for superior durability.. Gold Eagles are a true

It is certainly possible that the answer is yes, and this for four reasons: (1) the ancient Lydians considered electrum as a separate metal 91 and would accept the coins as of equal

In conclusion, for the studied Taiwanese population of diabetic patients undergoing hemodialysis, increased mortality rates are associated with higher average FPG levels at 1 and

[mp=title, abstract, original title, name of substance word, subject heading word, keyword heading word, protocol supple- mentary concept word, rare disease supplementary concept

ECP Project Management Structure Board of Directors Science Council Industry Council Project Director Deputy Director CTO Integration Manager D ep ar tm en t o f En er g y