
Soma: Linked Data Infrastructure


Academic year: 2021


(1)

Soma:

(2)

What is Soma?

It’s Big Data Candy for the Cloud.

The Soma platform helps Data Scientists collaborate to discover and share new facts from large datasets hosted on shared infrastructure.

All this while lowering development & operations costs.

(3)

Meet our Customers

Expert

See themselves as “experts” or an authority on a subject. Want the big picture; like easy-to-use specialised applications with great visualisation.

Creative

People who see themselves as Data “artists”. Need to explain the meaning of the data. Good generalists who can code, with a flair for the visual or for data narrative.

Engineer

See themselves as “engineers”. Focused on the technical problem of managing data: how to get it, store it, and learn from it. Normally strong software developers with some O/R and statistics knowledge.

Researcher

See themselves as “scientists”. People with deep academic background in maths, machine learning & modeling complex processes. Reluctant coders.

(4)

Customers we support now

Creative

Need to explain the meaning of the data.

Good generalists who can code, with a flair for the visual or for data narrative.

Engineer

Focused on the technical problem of managing data. Normally strong software developers.

Researcher

People with deep academic background in science, maths, machine learning

(5)

What we deliver to customers

Creative

Now:

● Gitlab integration

● from gitlab

● Web facing applications

Researcher

Now:

● Discovery early adopters (early September)

● Discovery platform rollout

Engineer

Now:

● Big Data Cluster

● Container Management

November:

(6)

Fully operational big data station

Right Now

Mesos-based Cloud O/S

● Cluster of 88 CPUs and 295 GB of memory

● Distributed Application Scheduling

● Resource Scheduling

● Container Management

● DNS service discovery
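The resource scheduling item can be illustrated with a toy model of the two-level scheme Mesos uses. The master offers each agent's spare CPU and memory to a framework, and the framework's scheduler decides which pending tasks to launch on which offer. The sketch below uses a simple first-fit placement and is not Soma's actual scheduler. The `Offer`, `Task` and `schedule` names and all resource figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Spare resources one agent advertises to a framework (hypothetical model)."""
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    """A pending task and the resources it asks for."""
    name: str
    cpus: float
    mem_mb: int


def schedule(offers: list[Offer], tasks: list[Task]) -> dict[str, str]:
    """First-fit placement: launch each task on the first offer that still has
    enough CPU and memory. Tasks that fit nowhere are left out of the result
    and would wait for the next round of offers."""
    remaining = {o.agent: [o.cpus, o.mem_mb] for o in offers}
    placement: dict[str, str] = {}
    for task in tasks:
        for agent, (cpus, mem) in remaining.items():
            if task.cpus <= cpus and task.mem_mb <= mem:
                # Consume the resources so later tasks see what is left.
                remaining[agent] = [cpus - task.cpus, mem - task.mem_mb]
                placement[task.name] = agent
                break
    return placement
```

First-fit is only the simplest policy. Mesos leaves the placement decision to each framework, which is what allows batch jobs and long-running services to share one cluster.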

(7)

Deployment

● Gitlab

● Mesos Cluster

● Zookeeper Cluster

● HDFS Cluster

● Integrated DNS

● CI servers

● Docker Registry
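The source does not say how the integrated DNS is implemented. Assuming it is Mesos-DNS, the DNS-based service discovery tool from the Mesos ecosystem, each running task is reachable under a predictable name. Tasks get an A record at `task.framework.domain` and an SRV record at `_task._protocol.framework.domain`, with `mesos` as the default domain. The helper names below are hypothetical.

```python
import socket


def mesos_dns_names(task: str, framework: str, domain: str = "mesos",
                    protocol: str = "tcp") -> dict[str, str]:
    """Build the A and SRV record names Mesos-DNS uses for a task
    (assumes the cluster runs Mesos-DNS with the given domain)."""
    task, framework = task.lower(), framework.lower()
    return {
        "A": f"{task}.{framework}.{domain}",
        "SRV": f"_{task}._{protocol}.{framework}.{domain}",
    }


def resolve(task: str, framework: str) -> str:
    """Look up a task's IP address. Only works on a host whose resolver
    points at the cluster's DNS server."""
    return socket.gethostbyname(mesos_dns_names(task, framework)["A"])
```

With this convention, a client never needs a task's host or port hard-coded. It asks DNS for the name and follows the task wherever the scheduler places it.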

(8)
(9)

Gitlab

All applications MUST be in gitlab

Mesos Cluster and Container Manager

Let’s have a look at what is running right now:

(10)

“can mix both batch and real-time processing”

“process at batch and real-time velocity”

(11)
(12)

Source Control Management

Continuous Deployment

Service Monitoring

Always available key datasets

DBpedia

Semantic Web Dog Food
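Both datasets are linked data published as RDF and queried with SPARQL. The source does not give the address of Soma's hosted copies, so the minimal sketch below uses the public DBpedia endpoint as a stand-in. It assumes a Virtuoso-style endpoint that accepts a `format` parameter. The sketch shows how a query is encoded and how values are read from the standard SPARQL 1.1 JSON results format. Function names and the sample response are illustrative.

```python
import json
from urllib.parse import urlencode

# Stand-in endpoint: the public DBpedia service, not Soma's own copy.
ENDPOINT = "https://dbpedia.org/sparql"


def build_query_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode a SPARQL query as an HTTP GET URL that asks for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{endpoint}?{urlencode(params)}"


def column(results_json: str, var: str) -> list[str]:
    """Return the values bound to one variable in a SPARQL 1.1 JSON results
    document, skipping rows where the variable is unbound."""
    doc = json.loads(results_json)
    return [row[var]["value"] for row in doc["results"]["bindings"] if var in row]
```

To run a live query, fetch the URL (for example with `urllib.request.urlopen`) and pass the decoded body to `column`.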

(13)

1. Have a Gitlab account.

2. Ask Research Ops to add the Soma role to your project.

3. If you are accepted, you will be guided through “dockerizing” your Gitlab project.

4. Once accepted, every push to your master branch will be deployed and accessible online through Soma.
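Step 4 implies a hook that reacts to pushes. The source names neither the trigger nor the deployment target. The minimal sketch below assumes the trigger is a Gitlab push webhook and the target is Marathon, a Mesos framework for long-running services. It turns a push-event payload into a Marathon app definition, and only pushes to master produce a deployment. The registry host, resource sizes, image tagging scheme and `deploy_plan` name are hypothetical.

```python
from typing import Optional


def deploy_plan(event: dict, registry: str) -> Optional[dict]:
    """Map a Gitlab push-event payload to a Marathon app definition.
    Returns None when the push should not be deployed."""
    # Only pushes to the master branch are deployed.
    if event.get("object_kind") != "push" or event.get("ref") != "refs/heads/master":
        return None
    sha = event.get("checkout_sha")
    if not sha:  # checkout_sha is null when the push deletes the branch
        return None
    project = event["project"]["path_with_namespace"].lower()
    return {
        "id": "/" + project.replace("/", "-"),  # e.g. /research-ops-pivoty
        "instances": 1,
        "cpus": 0.5,  # hypothetical default sizing
        "mem": 512,   # MB
        "container": {
            "type": "DOCKER",
            # Assumes CI has pushed an image tagged with the short commit SHA.
            "docker": {"image": f"{registry}/{project}:{sha[:8]}"},
        },
    }
```

The returned definition could then be sent as JSON to Marathon's REST API (for example a PUT to `/v2/apps/{appId}`), which creates the app or rolls it to the new image.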

(14)

Integrated Discovery platform

SOMA Discover - A hosted discovery tool, based on the Smarter Data project, that lets users explore data and share results.

Other internal tools, such as Sig.ma and Social Lens, and further projects will follow.

(15)

Goals for Research Ops

Nurture a Data Engineering community at Insight with supportive experts, shared tools & best practices

Provide a shared analytics platform for Data Scientists at Insight (Soma)

Encourage new research and engagements with the wider

(16)

Nurture

Provide a structured approach to managing and releasing all Engineering IP (Code and Data) at Insight

○ Source control (Git)

○ Release management

○ Assist in IP management

Provide Quality Circles for Engineering practices

○ 2 Groups - Data Visualisation & Big Data, Workshops to

(17)

Provide

Build big data infrastructure for Insight

○ Soma platform

Support ongoing Hadoop development

○ Hadoop clusters, Dataspace support

Support Ad Hoc projects requiring scale

○ Cancer atlas

Provide “Big Data” Expertise to the Linked Data group

(18)

Problems being met

High cost in research when data scales to “Big Data” [P1]

○ Ad Hoc Maintenance of big data sets is expensive [P2]

○ Development complexity of valuable Big Data jobs is prohibitive [P3]

The high cost of operating Big Data infrastructure [P4]

○ Scarcity of hardware and lack of funds for new hardware [P5]

○ Inability to maintain a core operations team [P7]

(19)

Soma serving our customers

Soma Create - Serves data fresh from the source. Hosts large, queryable datasets that are both highly available & up to date, plus a service to mash them up.

Soma Engineer - Provides a Lambda architecture consuming, cleaning, processing and loading the data to the data layer.

Soma Discover - Useful blocks of processing that can be connected together through a graphical interface; works with many datastores.

Soma Expert - Vertical applications solving real-world problems; these apps are built by Insight’s Data Researchers and Data Creatives.
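Soma Engineer is described as a Lambda architecture. In that pattern, a batch layer periodically recomputes views from the full, immutable master dataset. A speed layer covers only the events that arrived since the last batch run, and queries merge the two. The minimal sketch below uses page-view counts purely as an example. The source does not say which views Soma computes, and the event shape and names are hypothetical.

```python
from collections import Counter

# Hypothetical event shape: {"page": <name>}.


def batch_view(master_dataset: list[dict]) -> Counter:
    """Batch layer: recompute the whole view from the immutable master dataset."""
    return Counter(event["page"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events since the last batch run."""

    def __init__(self) -> None:
        self.recent: Counter = Counter()

    def ingest(self, event: dict) -> None:
        self.recent[event["page"]] += 1

    def reset(self) -> None:
        # Call once a fresh batch view has absorbed these events.
        self.recent.clear()


def query(page: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the batch and real-time views at query time."""
    return batch[page] + speed.recent[page]
```

The batch view is always correct but stale. The speed layer is fresh but only covers a short window, so adding the two gives answers that are both complete and current.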

(20)


(21)

Goals

Soma to be a complete ecosystem to help researchers deliver “Big Data” distributed applications

Showcase Insight expertise

Standardize best practices for linked data at big-data scale

Deliver targeted applications & tools

(22)
(23)

Distributed O/S (Better than cloud)

We use Mesos-based infrastructure to provide

○ Scheduling Process Execution of Jobs/Applications across the cluster

○ Resource scheduling of the needed CPU/Memory/Storage for

(24)
(25)

Where we are now

What we have

Soma Engineer - Standard Mesos platform - Provides a Lambda architecture for consuming, cleaning, processing and loading data into the data layer.

Soma Discover - Smarter Data - An interactive, expressive query tool that creates data blocks & visualisations
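The idea of processing blocks that users connect together can be modelled as function composition. Each block takes rows and returns rows, and a pipeline feeds one block's output into the next. The minimal sketch below is not how Smarter Data is implemented, which the source does not describe. The block names (`where`, `sort_by`, `select`) and the row format are hypothetical.

```python
from functools import reduce
from typing import Callable

Row = dict
Block = Callable[[list[Row]], list[Row]]


def pipeline(*blocks: Block) -> Block:
    """Connect blocks left to right: each block's output rows feed the next."""
    return lambda rows: reduce(lambda acc, block: block(acc), blocks, rows)


def where(field: str, value) -> Block:
    """Filter block: keep rows whose field equals value."""
    return lambda rows: [r for r in rows if r.get(field) == value]


def sort_by(field: str, reverse: bool = False) -> Block:
    """Ordering block: sort rows on one field."""
    return lambda rows: sorted(rows, key=lambda r: r[field], reverse=reverse)


def select(*fields: str) -> Block:
    """Projection block: keep only the named fields of each row."""
    return lambda rows: [{f: r[f] for f in fields if f in r} for r in rows]
```

Because every block shares the same rows-in, rows-out signature, a GUI only has to record the order of the blocks. Swapping the data source for another datastore leaves the blocks unchanged.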

What we need help on

Soma Expert - Pivoty - A medical index built from standard HCLS datasets that uses a Pivot Browser

Soma Create - The Insight Standard Dataset - a shared
