Introduction to uRiKA
Multithreading
SPARQL Database
uRiKA Appliance
XMT-2 Programming
Use Cases
MTA-1 – 1998
Gallium arsenide: Proof of concept
First production implementation of latency-tolerant multithreading
MTA-2 – 2002
Transitioned the microprocessor to CMOS
Invention of scaling memory in a multithreading system
XMT-1 – 2008
Integrated MTA and main Cray product line XT
First hybrid system to support x86 and multithreading CPUs
XMT-2 – 2011
Made system ready for enterprise production
Largest shared memory machine in the world
First installation – CSCS, May 2011
YarcData uRiKA – 2012
Optimized for Scalable Graph Analytics
Uses the same cabinets, boards, scalable interconnect, I/O and storage infrastructure,
user environment, and administrative tools…
…just changes the processor
Cabinet
24 blades per cabinet
Vertical airflow with optional liquid assist
Compute blades
4 Threadstorm processors
16-64 GB per processor
Cray XT service and I/O subsystem
PCIe connections to storage and networks
Scalable Lustre global file system
Cray XT high-speed 3D torus network
Cray XT power and RAS systems
Telecom/Mobile
Life Sciences/Biology
Social Networking
Supply Chain
Healthcare/Medicine
Intelligence/Security
Internet/WWW
Finance
Big Data
• Structured • Semi-structured • Unstructured
In-memory Analytics
• Structured and Unstructured Data
• Non-partitionable datasets
• Complex Queries (“Pattern Matching”)
• Example: YarcData uRiKA
Data Warehouses, BI Tools
• Structured Data
• OLAP Cubes
• Regular Queries, Known Variables
• Example: Oracle Exadata
Scale-out Analytics
• Unstructured Data
• Highly Partitionable Datasets
• Keyword Search
• Example: Hadoop, column stores
Graphs are hard to Partition
High cost to follow relationships that span Cluster Nodes
Network is 100 times SLOWER than Memory*
Graphs are not Predictable
High cost to follow multiple competing paths which cannot be pre-fetched/cached
Memory is 100 times SLOWER than Processor*
Graphs are highly Dynamic
High cost to load multiple, constantly changing datasets into in-memory graph models
Storage I/O is 1000 times SLOWER than Memory I/O*
Graph Problems
• Graphs are hard to Partition
• Graphs are not Predictable
• Graphs are highly Dynamic
Massively Multi-threaded: 128 threads/processor
Large Shared Memory: up to 512 TB
Highly Scalable I/O: up to 350 TB/hr
Real-time, Interactive Analytics on Big Data
Cray XMT-2 Hardware Engine
Originally designed for deep analysis of large datasets
Very large scalable shared memory
Architecture can support 512 TB shared memory
Typical systems are 2 TB to 32 TB
Multithreading
Unique highly multithreaded architecture
128 hardware threads per processor
Extreme parallelism, hides memory latency
Multithreaded Graph Database
Highly parallel in-memory RDF / SPARQL quad store
High performance inference engine
High performance parallel I/O
Industry Standard Front End
Based on Jena open source semantic DB
All standard Linux infrastructure and languages
Relative latency to memory continues to increase
Vector processors amortize memory latency
Cache-based microprocessors reduce memory latency
Multithreaded processors tolerate memory latency
Multithreading is most effective when:
Parallelism is abundant
Data locality is scarce
Large graph problems perform well on this architecture
Graph analysis
Semantic databases
Many threads per processor core; small thread state
Thread-level context switch at every instruction cycle
[Figure: conventional processor vs. multithreaded processor, showing registers, program counter, and ALU]
• Conventional processor: when one or a few threads stall, memory/network bandwidth becomes idle
• Multithreaded processor: although some threads stall, others keep issuing local/remote memory requests, keeping the most precious resources busy
[Figure: processors issuing requests to memory and network]
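A rough way to see why many threads can hide memory latency is a simple saturation model: each thread issues for a few cycles, then stalls on memory, and the processor switches to any ready thread at no cost. The sketch below is only that model. The function name, the 1-cycle work interval, and the 100-cycle stall are assumptions chosen for illustration, not XMT-2 measurements; only the figure of 128 threads per processor comes from the slides.

```python
def issue_utilization(threads: int, work_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles in which some thread is ready to issue.

    Saturation model: every thread alternates between `work_cycles` of
    issuing instructions and `stall_cycles` of waiting on memory, and the
    processor can switch to any ready thread for free.
    """
    if threads < 1 or work_cycles < 1 or stall_cycles < 0:
        raise ValueError("threads and work_cycles must be >= 1, stall_cycles >= 0")
    per_thread = work_cycles / (work_cycles + stall_cycles)
    # Utilization grows linearly with thread count until the pipeline is full.
    return min(1.0, threads * per_thread)


if __name__ == "__main__":
    # Assumed numbers: 1 cycle of work per memory reference, 100-cycle stall.
    for n in (1, 8, 128):
        print(n, round(issue_utilization(n, 1, 100), 3))
```

Under these assumed numbers a single thread keeps the processor busy about 1% of the time, while enough concurrent threads saturate it. Real behavior also depends on memory and network bandwidth, which this model ignores.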
To the programmer, a multiple-processor XMT-2 looks like a single processor,
except that the number of threads is increased.
RDF triples databases are inherently graphical
Some researchers call semantic databases “semantic graph databases”
[Figure: example RDF graph with nodes Edmonton, Vancouver, Canada, North America, country, and 730,000, and edges labeled in-country, is-type, and has-population]
Resource Description Framework (RDF)
Designed to enable semantic web searching and integration of disparate data sources
W3C standard formats
Every datum represented as subject/predicate/object
Ideally with each of those expressed with a URI
Standard ontologies in some domains
e.g., Open Biological and Biomedical Ontologies (OBO)
Examples:
<ncbitax:NCBITaxon_840261> rdf:type owl:Class
<ncbitax:NCBITaxon_195644> rdfs:subClassOf <ncbitax:NCBITaxon_185881>
<ncbitax:NCBITaxon_816681> rdfs:label "Characiformes sp. BOLD:AAG5151"@en
In each example, the first term is the subject, the second the predicate, and the third the object.
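To make the "triples are a graph" point concrete, here is a minimal sketch that holds the three example statements as plain (subject, predicate, object) tuples and indexes them by subject. The tuple layout and the helper names `build_graph` and `nodes` are illustrative assumptions, not part of RDF or of any uRiKA interface.

```python
from collections import defaultdict

# The three example statements, written as (subject, predicate, object).
TRIPLES = [
    ("ncbitax:NCBITaxon_840261", "rdf:type", "owl:Class"),
    ("ncbitax:NCBITaxon_195644", "rdfs:subClassOf", "ncbitax:NCBITaxon_185881"),
    ("ncbitax:NCBITaxon_816681", "rdfs:label", '"Characiformes sp. BOLD:AAG5151"@en'),
]


def build_graph(triples):
    """Return {subject: [(predicate, object), ...]}.

    Subjects and objects act as graph nodes; predicates label the edges.
    """
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


def nodes(triples):
    """Return the set of all terms that appear as a subject or an object."""
    found = set()
    for subject, _, obj in triples:
        found.add(subject)
        found.add(obj)
    return found


if __name__ == "__main__":
    for subject, edges in build_graph(TRIPLES).items():
        for predicate, obj in edges:
            print(f"{subject} --{predicate}--> {obj}")
```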
SPARQL Protocol and RDF Query Language (SPARQL)
Enables matching of graph patterns in the semantic DB
Reminiscent of SQL
# Lehigh University BenchMark (LUBM) Query 9
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
SELECT ?X ?Y ?Z
WHERE {
  ?X rdf:type ub:Student .
  ?Y rdf:type ub:Faculty .
  ?Z rdf:type ub:Course .
  ?X ub:advisor ?Y .
  ?Y ub:teacherOf ?Z .
  ?X ub:takesCourse ?Z
}
PREFIX == shorthand for a URI
SELECT == variables to be returned from the query
“find sets of (X, Y, Z) with a subject X of type Student, a subject Y of type Faculty, and a subject Z of type Course, where X is an advisee of Y, Y teaches course Z, and X takes course Z”
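The idea of matching a graph pattern can be sketched with a naive matcher over in-memory triples. This is a teaching sketch only: the function `match`, the tiny dataset, and the individual names (alice, bob, prof1, c101, c202) are all made up for illustration, and nothing here reflects how the uRiKA SPARQL engine actually evaluates queries.

```python
def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# Hypothetical data shaped like the LUBM vocabulary used in the query.
DATA = [
    ("alice", "rdf:type", "ub:Student"),
    ("bob", "rdf:type", "ub:Student"),
    ("prof1", "rdf:type", "ub:Faculty"),
    ("c101", "rdf:type", "ub:Course"),
    ("c202", "rdf:type", "ub:Course"),
    ("alice", "ub:advisor", "prof1"),
    ("bob", "ub:advisor", "prof1"),
    ("prof1", "ub:teacherOf", "c101"),
    ("alice", "ub:takesCourse", "c101"),
    ("bob", "ub:takesCourse", "c202"),
]

# The six triple patterns of the query, in the same order.
QUERY9 = [
    ("?X", "rdf:type", "ub:Student"),
    ("?Y", "rdf:type", "ub:Faculty"),
    ("?Z", "rdf:type", "ub:Course"),
    ("?X", "ub:advisor", "?Y"),
    ("?Y", "ub:teacherOf", "?Z"),
    ("?X", "ub:takesCourse", "?Z"),
]

results = [(b["?X"], b["?Y"], b["?Z"]) for b in match(QUERY9, DATA)]

if __name__ == "__main__":
    print(results)
```

In this made-up dataset only alice qualifies: bob has the same advisor but takes a course that the advisor does not teach. The pattern forms a triangle between student, faculty member, and course, which is the kind of cyclic match that is awkward to answer when the graph is partitioned.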
Linux
Apache Tomcat/Jena
Data Center sees another Linux server
Applications see industry standard interface
RDF, SPARQL, Java, Gadgets…
Reuse Existing Skillsets
Java, OSGI, App Server, SOA, ESB, Web toolkit…
Standard languages
All applications and artifacts built on uRiKA can be run on other platforms
SPARQL, RDF, Java, Gadgets / Mashups
Graph Analytic Platform
Shared-memory, Multi-threaded, Scalable I/O graph-optimized hardware
Graph Analytic Database
RDF datastore and SPARQL engine
Graph Analytic Tools
[Figure: SPARQL query flow. Components: Application, Jena/Fuseki HTTP Interface (interface to Web), SPARQL Engine, database, SPARQL Ingest. Steps: send query; parse, interpret query; generate low-level query API; set up query engine; translate, send results; receive results. Query operators: BGP, ORDER, DISTINCT, OPTIONAL, UNION, FILTER, … Example query: SELECT ?p ?x WHERE { (?x type person) (?p sells “DVD”) (?x shops-at ?p) }]
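Because the front end is a standard HTTP SPARQL interface, a client can in principle talk to it with ordinary tools. The sketch below only builds a request URL in the "query via GET" style of the W3C SPARQL Protocol, where the query text travels in the `query` parameter. The endpoint address, dataset path, and helper name `sparql_get_url` are hypothetical; the actual uRiKA endpoint layout is not given in the slides.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL with the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint; substitute the address of a real SPARQL service.
ENDPOINT = "http://urika.example.org:3030/dataset/sparql"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

url = sparql_get_url(ENDPOINT, QUERY)

if __name__ == "__main__":
    print(url)
    # Round-trip check: decoding the URL recovers the original query text.
    print(parse_qs(urlsplit(url).query)["query"][0])
```

Sending the request (for example with `urllib.request.urlopen`) is left out so the sketch runs without a server.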
[Figure: uRiKA alongside Existing Analytic Environment(s) (Hadoop, Other Big Data Appliances, Data). Labels: Import Graph Datasets; Export Analytic/Relationship Results; User/App Visualization]
Automatic Parallelizing C / C++ Compiler
Compiler runs on Linux login node
The executable runs on the compute nodes
Provides automatic parallelism if proper options are included
Parallelization directives are used to provide “hints” to the compiler
Login Nodes run standard SUSE (SLES 11) Linux
All usual Linux languages and applications may be used
A “hybrid” programming model is possible, and encouraged
Lightweight user communication (LUC) interface
Use LUC to build a client/server interface between the front-end and back-end
Symmetric; RPC-style interface in either direction
Lustre global file system
High-performance parallel file system used across Cray line
• Primary responsibility for machine virtualization.
  – Maps simple programming model (flat memory, infinite processors) to actual hardware.
  – Performs general management of resources such as processors and hardware streams.
• Responsible for thread management
  – Thread creation/destruction/blocking/unblocking.
  – Work scheduling - automatic intra-process load balancing.
• Handles hardware traps and OS-induced swaps.
• Provides a highly-optimized parallel memory allocator.
• Provides support for auxiliary functions.
  – Tracing and profiling.
§ Apprentice2
• GUI application for debugging performance problems
• Consists of one or more reports based on how your program was compiled and executed
• Canal Report
  – Feedback from the compiler
  – Insight on how latent parallelism was exploited
  – Information on expected resource utilization and scheduling
• Tview Report
  – Hardware counter plots
  – Actual performance of your application
  – Runtime trap information for detecting hotspots
• Bprof Report
  – Profile tables in terms of instructions issued and memory references
The Challenge
Multiple massive datasets describing biological network graphs in cancer cells from published literature and experimental data, constantly updated
Un-partitionable, densely and irregularly connected graphs
Medical research churns out massive quantities of literature, but most researchers don’t have time to read even their own specialty publications
Over 10,000 genomics publications per month currently
uRiKA Solution
uRiKA holds un-partitioned genomic network graph in memory
Contrast experimental models and theories with published results to discover previously unknown relationships
Interactive, real-time access by multiple researchers
[Figure: Confirmation of elevated VEGF levels by tissue microarray]
The Cancer Genome Atlas (TCGA) data combined with literature relationships from Medline
Protein domain interactions are extracted from TCGA using Random Forest classification
A Normalized Medline Distance (NMD) between proteins is calculated from Medline titles and abstracts
Protein data from UniProt and Pfam are also included
1.2 billion entries that generate 6B triples
New data sets constantly being added: mutation data, patient data, methylation data, statistical analysis results and more…
Current database technology does not have the performance for useful full-scale queries across this data
[Figure: a query against a GRAPH combining Genes (TCGA), Literature (Medline), Protein Families (PFAM), Protein Data (UniProt), Mutation Correlations (Statistical), Patients (Health Records), and Methylation]
Quickly adapt surveillance to new rules and patterns
The Challenge
Continuous stream of possible compliance events
New compliance rules require constant updates of signals and patterns
Current solutions require offline preparation and are based on rigid rules
uRiKA Solution
Entire relationship graph in memory
New patterns/templates can be identified and added in real-time
Graphical interactive exploration of relationships between people, places, things, organizations, communications, etc.
Business Value
A responsive, flexible event detection platform that adapts to new knowledge
Deploy better trading strategies with enhanced Market Sensing
The Challenge
Trading strategies that can react quickly and continuously to market drivers
Market drivers are news and other events which may impact current trading outlook, positions
Enhanced market sensing requires semantic analysis of news and a scalable graph-based model to implement
uRiKA Solution
Ability to hold entire ‘market model’ in-memory as a dependency graph
Ability to filter news relevant to positions based on semantic analysis
Interactive, real-time access from trader workstations
Business Value
Better trading strategies that can be based on augmented
Discover “similar” Patients to optimize treatment
The Challenge
Longitudinal, historical data spanning all events, symptoms, diagnoses, diseases, treatments, prescriptions, etc. of 10M patients including genetics and family history
Ad-hoc, constantly changing definition of “similarity” based on thousands of parameters
Interactive, real-time response during consultation
uRiKA Solution
uRiKA holds entire relationship graph in memory
Identify “similar” patients based on ad-hoc physician specified patterns
Interactive, real-time access by entire physician community
Business Value
Consistent selection of the most effective treatment for each patient by each doctor every time
[Figure: patient-similarity checklist with parameters such as blood pressure, prior myocardial infarction, hypertension, body mass index, HDL, anti-hypertension meds, …]
SPARQL by Example tutorial, by Lee Feigenbaum,
http://www.cambridgesemantics.com/2008/09/sparql-by-example/
Search RDF data with SPARQL, by Philip McCarthy,
http://www.ibm.com/developerworks/xml/library/j-sparql/
Semantic Web for the Working Ontologist, by Dean Allemang and James Hendler,
ISBN 978-0123859655
Learning SPARQL, by Bob DuCharme, O’Reilly,
ISBN 978-1-449-30659-5
RDF: www.w3.org/RDF/
SPARQL: www.w3.org/TR/rdf-sparql-query/