Introduc)on to urika. Mul)threading. SPARQL Database. urika Appliance. XMT- 2 Programming. Use Cases

(1)

(2)

2



Introduc)on to uRiKA



Mul)threading



SPARQL Database



uRiKA Appliance



XMT-‐2 Programming



Use Cases

(3)

 

MTA-‐1 – 1998

 

Gallium arsenide: Proof of concept

 

First produc,on implementa,on of latency-‐tolerant mul,threading

 

MTA-‐2 – 2002

 

Transi5on the microprocessor to CMOS

 

Inven,on of scaling memory in a mul,threading system

 

XMT-‐1 – 2008

 

Integrated MTA and main Cray product line XT

 

First hybrid system to support X86 and mul,threading CPUs

 

XMT-‐2– 2011

 

Made system ready for enterprise produc5on

 

Largest shared memory machine in the world

 

First installa,on – CSCS, May 2011

 

YarcData uRIKA – 2012

 

Op5mized for Scalable Graph Analy5cs

(4)

4

 

Uses the same cabinets, boards, scalable interconnect, I/O and storage infrastructure,

user environment, and administra)ve tools…

…just changes the processor

 

Cabinet

 

24 blades per cabinet

 

Ver5cal airﬂow with op5onal liquid assist

 

Compute blades

 

4 Threadstorm processors

 

16-‐64 GB per processor

 

Cray XT service and I/O subsystem

 

PCIe connec5ons to storage and networks

 

Scalable Lustre global ﬁle system

 

Cray XT high-‐speed 3D torus network

 

Cray XT power and RAS systems

(5)

Telecom/Mobile

Life Sciences/Biology

Social Networking

Supply Chain

Healthcare/Medicine

Intelligence/Security

Internet/WWW

Finance

(6)

6

Big Data

•  Structured •  Semi-structured •  Unstructured

In-‐memory

Analy)cs

•

Structured and

Unstructured Data

•

Non-‐par))onable

datasets

•

Complex Queries

(“PaYern Matching”)

•

Example:

YarcData uRiKA

Data Warehouses

BI Tools

•

Structured Data

•

OLAP Cubes

•

Regular Queries,

Known Variables

•

Example: Oracle Exadata

Scale-‐out

Analy)cs

• 

Unstructured Data

• 

Highly Par))onable

Datasets

•

Keyword Search

• 

Example: Hadoop,

column stores

(7)

7

Graphs are hard to Partition

High cost to follow

relationships that

span Cluster Nodes

Graphs are not Predictable

Graphs are highly Dynamic

Network is 100 times

SLOWER than Memory*

Memory is 100 times

SLOWER than Processor*

High cost to follow

multiple competing paths

which cannot be

pre-fetched/cached

High cost to load multiple,

constantly changing

datasets into in-memory

graph models

?

Storage I/O is 1000 times

SLOWER than Memory I/O*

(8)

8

Massively Mul5-‐threaded

128 threads/processor

Large Shared Memory

Up to 512 TB

Highly Scalable I/O

Up to 350 TB/hr

Graphs are hard

to Partition

Graphs are not

Predictable

Graphs are highly

Dynamic

Real-time,

Interactive

Analytics on

Big Data

Graph Problems

(9)



Cray XMT-‐2 Hardware Engine



Originally designed for deep analysis of large datasets



Very large

scalable

shared memory

 

Architecture can support 512TB shared memory

 

Typical systems are 2 TB to 32 TB



Mul5threading

 

Unique highly mul5threaded architecture

 

128 hardware threads per processor

 

Extreme parallelism, hides memory latency



Mul)threaded Graph Database

 

Highly parallel in-‐memory RDF / SPARQL quad store

 

High performance inference engine

 

High performance parallel I/O



Industry Standard Front End

 

Based on Jena open source seman5c DB

 

All standard Linux infrastructure and languages

(10)

1 0

(11)



Rela)ve latency to memory con)nues to increase



Vector processors

amor,ze

memory latency



Cache-‐based microprocessors

reduce

memory latency



Mul5threaded processors

tolerate

memory latency



Mul)threading is most eﬀec)ve when:



Parallelism is abundant



Data locality is scarce



Large graph problems perform well on this architecture



Graph analysis



Seman5c databases

(12)

1 2



Many threads per processor core;

small thread state



Thread-‐level context switch at

every instruc)on cycle

Slide 12 registers program counter ALU

conventional

processor

multithreaded processor

(13)

•

Conven)onal processor

•

Mul)threaded processor

When one or a few threads stall, memory/ network bandwidth become idle

Although some threads stall, others keep issuing local/remote memory requests, keeping most precious resources busy

network

memory

(14)

1 4



To the programmer, a mul)ple processor XMT-‐2

looks like

a single processor,

except that the number of threads is

increased.

(15)

(16)

1 6



RDF triples databases are inherently graphical



Some researchers call seman)c databases “seman)c

graph databases”

Edmonton

Canada

North

America

country

object

730,000

Vancouver

in-country is-type has-population

(17)



Resource Descrip)on Framework (RDF)



Designed to enable seman5c web searching and integra5on of

disparate data sources



W3C standard formats



Every datum represented as subject/predicate/object



Ideally with each of those expressed with a URI



Standard ontologies in some domains



e.g.

, Open Biological and Biomedical Ontologies (OBO)



Examples:

<ncbitax:NCBITaxon_840261> rdf:type owl:Class

<ncbitax:NCBITaxon_195644> rdfs:subClassOf <ncbitax:NCBITaxon_185881>

<ncbitax:NCBITaxon_816681> rdfs:label "Characiformes sp. BOLD:AAG5151"@en

object

subject

(18)

1 8



SPARQL Protocol and RDF Query Language (SPARQL)



Enables matching of graph paderns in the seman5c DB



Reminiscent of SQL

# Lehigh University BenchMark (LUBM) Query 9

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#> SELECT ?X, ?Y, ?Z WHERE {?X rdf:type ub:Student . ?Y rdf:type ub:Faculty . ?Z rdf:type ub:Course . ?X ub:advisor ?Y . ?Y ub:teacherOf ?Z . ?X ub:takesCourse ?Z}

PREFIX == shorthand for a URI variables to be returned from

the query

“ﬁnd sets of (X, Y, Z) with a subject X of type Student, a subject Y of type Faculty, and a

subject Z of type Course, where X is an advisee of Y, Y teaches course Z, and X takes

(19)

(20)

2 0

20

Linux

Apache Tomcat/Jena

 

Data Center sees another

Linux server

 

Applica)ons see industry

standard interface



RDF, SPARQL, Java,

Gadgets…

 

Reuse Exis)ng Skillsets

 

Java, OSGI, App Server, SOA,

ESB, Web toolkit…

 

Standard languages

 

All applica5ons and ar5facts

built on uRiKA can be run on

other plalorms

SPARQL

RDF

Java

Gadgets /

Mashups

Graph Analy)c Plagorm

Shared-‐memory, Mul5-‐threaded, Scalable I/O

graph-‐op5mized hardware

Graph Analy)c Database

RDF datastore and SPARQL engine

Graph Analy)c Tools

(21)

SELECT ?p ?x WHERE { (?x type person) (?p sells “DVD”) (?x shops-at ?p) } interface to Web

generate low-level query

API Send query Receive results Parse, interpret query Set up query engine Translate, send results BGP ORDER DISTINCT OPTIONAL database UNION FILTER … SPARQL Ingest Applica)on Jena/Fuseki HTTP Interface SPARQL Engine

(22)

2 2

Import Graph Datasets User/App Visualiza)on Export Analy)c/ Rela)onship Results

Hadoop Other Big

Data Appliances

Existing Analytic Environment(s

) Data

(23)

(24)

2 4



Automa)c Parallelizing C / C++ Compiler

 

Compiler runs on Linux login node

 

The executable runs on the compute nodes

 

Provides automa5c parallelism if proper op5ons are included

 

Paralleliza5on direc5ves are used to provide “hints” to the compiler



Login Nodes run standard SuSe (SLES 11) Linux

 

All usual Linux languages and applica5ons may be used

 

A “hybrid” programming model is possible, and encouraged



Lightweight user communica)on (LUC) interface

 

Use LUC to build a client/server interface between the front-‐end and back-‐end

 

Symmetric; RPC-‐style interface in either direc5on



Lustre global ﬁle system

 

High-‐performance parallel ﬁle system used across Cray line

Slide 24

(25)

• 

Primary responsibility for machine virtualiza)on.

– 

Maps simple programming model (ﬂat memory, inﬁnite processors) to actual hardware.

– 

Performs general management of resources such as processors and hardware streams.

• 

Responsible for thread management

– 

Thread crea5on/destruc5on/blocking/unblocking.

– 

Work scheduling -‐ automa5c intra-‐process load balancing.

• 

Handles hardware traps and OS-‐induced swaps.

• 

Provides a highly-‐op)mized parallel memory allocator.

• 

Provides support for auxiliary func)ons.

– 

Tracing and proﬁling.

(26)

2 6

§

Appren)ce2

•

GUI applica5on for debugging performance problems

•

Consists of one or more reports based on how your program was

compiled and executed

•

Canal Report

•

Feedback from the compiler

•

Insight on how latent parallelism was exploited

•

Informa5on on expected resource u5liza5on and scheduling

•

Tview Report

•

Hardware counter plots

•

Actual performance of your applica5on

•

Run5me trap informa5on for detec5ng hotspots

•

Bprof Report

•

Proﬁle tables in terms of instruc5ons issued and memory

references

(27)

(28)

2 8

The Challenge

  Mul5ple massive datasets describing biological network

graphs in cancer cells from published literature and experimental data, constantly updated

  Un-‐par55onable, densely and irregularly connected graphs

  Medical research churns out massive quan55es of

literature, but most researchers don’t have 5me to read even their own specialty publica5ons

  Over 10,000 genomics publica5ons per month currently

uRiKA Solu)on

  uRiKA holds un-‐par55oned genomic network graph in

memory

  Contrast experimental models and theories with published

results to discover previously unknown rela5onships

  Interac5ve, real 5me access by mul5ple researchers

Confirmation of elevated VEGF levels by tissue microarray:

(29)



The Cancer Genome Atlas (TCGA)

data combined with literature

rela)onships from Medline



Protein domain interac5ons are extracted

from TCGA using Random Forest

classiﬁca5on



A Normalized Medline Distance (NMD)

between proteins is calculated from

Medline 5tles and abstracts



Protein data from Uniprot and Pfam are

also included



1.2 billion entries that generate 6B

triples



New data sets constantly being added-‐

muta5on data, pa5ent data, methyla5on

data, sta5s5cal analysis results and more…



Current database technology does

not have the performance for useful

full-‐scale queries across this data

GRAPH Genes

(TCGA) Literature (Medline)

Protein Families (PFAM) Protein Data

(UniProt)

Muta5on Correla5ons (Sta5s5cal)

Pa5ents (Health Records) Methyla5on

Query

(30)

3 0

Quickly adapt surveillance to new rules and paderns

The Challenge

  Con5nuous stream of possible compliance

events

  New compliance rules require constant

updates of signals and paderns

  Current solu5ons require oﬄine prepara5on

and are based on rigid rules

uRiKA Solu)on

  En5re rela5onship graph in memory

  New Paderns/templates can be iden5ﬁed

and added in real-‐5me

  Graphical interac5ve explora5on of

rela5onships between people, places, things, organiza5ons, communica5ons, etc.

Business Value

  A responsive, ﬂexible event detec5on

plalorm that adapts to new knowledge

(31)

Deploy beder trading strategies with enhanced Market Sensing

The Challenge

  Trading strategies that can react quickly and con5nuously to

market drivers

  Market drivers are news and other events which may

impact current trading outlook, posi5ons

  Enhanced market sensing requires seman5c analysis of

news and a scalable graph-‐based model to implement

uRiKA Solu)on

  Ability to hold en5re ‘market model’ in-‐memory as a

dependency graph

  Ability to ﬁlter news relevant to posi5ons based on

seman5c analysis

  Interac5ve, real-‐5me access from trader worksta5ons

Business Value

  Beder trading strategies that can be based on augmented

(32)

3 2

Discover “similar” Pa5ents to op5mize treatment

The Challenge

  Longitudinal, historical data spanning all events, symptoms,

diagnoses, diseases, treatments, prescrip5ons, etc of 10M pa5ents including gene5cs and family history

  Ad-‐hoc, constantly changing deﬁni5on of “similarity” based

on thousands of parameters

  Interac5ve, real-‐5me response during consulta5on

uRiKA Solu)on

  uRiKA holds en5re rela5onship graph in memory

  Iden5fy “similar” pa5ents based on ad-‐hoc physician

speciﬁed paderns

  Interac5ve, real-‐5me access by en5re physician community

Business Value

  Consistent selec5on of the most eﬀec5ve treatment for

each pa5ent by each doctor every 5me

Blood pressure Prior myocardial infarction

Hypertension Body mass index HDL Anti-hypertension meds

√

…

(33)

(34)

3 4

34



SPARQL by Example tutorial, by Lee Feigenbaum,

hdp://

www.cambridgeseman5cs.com/2008/09/sparql-‐by-‐example/



Search RDF data with SPARQL, by Philip McCarthy

http://www.ibm.com/developerworks/xml/library/j-sparql/



Seman5c Web for the Working Ontologist, by Dean Allemang

and James Hendler, ISBN

978-‐0123859655



Learning SPARQL, by Bob DuCharme, O’Reilly,

ISBN 978-‐1-‐449-‐30659-‐5



RDF:

www.w3.org/

RDF

/



SPARQL:

www.w3.org/TR/rdf-‐

sparql

-‐query/

Introduc)on to urika. Mul)threading. SPARQL Database. urika Appliance. XMT- 2 Programming. Use Cases

Introduc)on to uRiKA

Integrated MTA and main Cray product line XT

Uses the same cabinets, boards, scalable interconnect, I/O and storage infrastructure,

Scalable Lustre global ﬁle system

Unstructured Data

Datasets

Memory is 100 times

128 threads/processor

Very large

Highly parallel in-­‐memory RDF / SPARQL quad store

memory latency

Many threads per processor core;

To the programmer, a mul)ple processor XMT-­‐2

Designed to enable seman5c web searching and integra5on of

<ncbitax:NCBITaxon_195644> rdfs:subClassOf <ncbitax:NCBITaxon_185881>

Linux

All applica5ons and ar5facts

Hadoop Other Big

Existing Analytic Environment(s

All usual Linux languages and applica5ons may be used

High-­‐performance parallel ﬁle system used across Cray line

Handles hardware traps and OS-­‐induced swaps.

compiled and executed

Bprof Report

uRiKA Solu)on

The Cancer Genome Atlas (TCGA)

Protein data from Uniprot and Pfam are

full-­‐scale queries across this data

uRiKA Solu)on

Deploy beder trading strategies with enhanced Market Sensing

uRiKA Solu)on

Discover “similar” Pa5ents to op5mize treatment

uRiKA Solu)on

SPARQL by Example tutorial, by Lee Feigenbaum,

Learning SPARQL, by Bob DuCharme, O’Reilly,

Highly parallel in-‐memory RDF / SPARQL quad store

To the programmer, a mul)ple processor XMT-‐2

High-‐performance parallel ﬁle system used across Cray line

Handles hardware traps and OS-‐induced swaps.

full-‐scale queries across this data