Graph Databases What makes them Different?

(1)

www.objectivity.com

Graph Databases

What makes them Different?

Darren Wood

(2)

NoSQL – Data Specialists

•  Everyone specializes

– Doctors, Lawyers, Bankers, Developers 

•  Why was data so normalized for so long?

•  NoSQL is all about the data specialist

•  Specializing in…

– Distribution / deployment

– Physical data storage

– Logical data model

– Query mechanism

(3)

Polyglot NoSQL Architectures

[Diagram. Component labels: Users; Applications; Business; Partitioned Distributed DB (often Document / KV); Distributed Data Processing Platform; Document; Graph Database; RDBMS; External / Legacy Data; Transformation / MDM]

(4)

The Physical Data Model

•  Becoming a relationship specialist…

Meetings

P1      P2      Place    Time
Alice   Bob     Denver   5-27-10

Calls

From    To        Time    Duration
Bob     Carlos    13:20   25
Bob     Charlie   17:10   15

Payments

From     To        Date      Amount
Carlos   Charlie   5-12-10   100000

[Graph: Alice Met (5-27-10) Bob; Bob Called (13:20) Carlos; Bob Called (17:10) Charlie; Carlos Paid (100000) Charlie]
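As a rough illustration of the relationship-centric model, the Meetings, Calls and Payments rows can be loaded into a single adjacency structure. This is a minimal sketch in plain Java; the class `RelationshipGraph` and all of its members are hypothetical names chosen for this example and are not part of InfiniteGraph.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from InfiniteGraph.
class RelationshipGraph {
    // One labeled connection as seen from one endpoint, e.g. Bob: Called (13:20) Carlos.
    static final class Edge {
        final String label;
        final String detail;
        final String target;

        Edge(String label, String detail, String target) {
            this.label = label;
            this.detail = detail;
            this.target = target;
        }
    }

    private final Map<String, List<Edge>> adjacency = new LinkedHashMap<>();

    // Store the edge on both endpoints so either person can be a starting point.
    void connect(String from, String label, String detail, String to) {
        adjacency.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(label, detail, to));
        adjacency.computeIfAbsent(to, k -> new ArrayList<>()).add(new Edge(label, detail, from));
    }

    // All relationships of one person, whatever table they originally came from.
    List<Edge> edgesOf(String person) {
        return adjacency.getOrDefault(person, List.of());
    }

    // Loads the four rows of the Meetings, Calls and Payments tables.
    static RelationshipGraph sample() {
        RelationshipGraph g = new RelationshipGraph();
        g.connect("Alice", "Met", "5-27-10", "Bob");
        g.connect("Bob", "Called", "13:20", "Carlos");
        g.connect("Bob", "Called", "17:10", "Charlie");
        g.connect("Carlos", "Paid", "100000", "Charlie");
        return g;
    }
}
```

With this layout, "everything connected to Bob" is one lookup (`edgesOf("Bob")`) rather than a scan or join across three tables.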

(5)
(6)

Scaling Writes

•  Big/Fast data demands write performance

•  Most NoSQL solutions allow you to scale writes by…

– Partitioning the data

– Understanding your consistency requirements
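Partitioning for write scale can be sketched with simple hash partitioning. This is one common technique, not necessarily the scheme any particular NoSQL product uses; `PartitionedWriter` and its methods are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of hash partitioning; not any product's API.
class PartitionedWriter {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedWriter(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // The same key always maps to the same partition; floorMod keeps the index non-negative.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Each write touches exactly one partition, so writers that land on
    // different partitions never contend with each other.
    void write(String key) {
        partitions.get(partitionFor(key)).add(key);
    }

    int size(int partition) {
        return partitions.get(partition).size();
    }
}
```

For plain key-value data this is enough. A graph edge, however, references two vertices that may live in different partitions, which is the problem the following slides address.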

(7)

Scaling Graph Writes

ACID Transactions

[Diagram: on top of InfiniteGraph and the Objectivity/DB Persistence Layer, App-1, App-2 and App-3 ingest vertices V1, V2 and V3 respectively; App-1 then writes edge E12 {V1, V2} and App-2 writes edge E23 {V2, V3}]

(8)

High Performance Edge Ingest

[Diagram: IG Core/API writes incoming edges such as E(1->2), E(2->3), E(3->1), E(2->1) and E(3->2) into Pipeline Containers; a Pipeline Agent moves them into the Target Containers C1, C2 and C3, producing E12 and E23]

(9)

Trade-offs

•  Excellent for efficient use of page cache

•  Able to maintain full base data consistency

•  Achieves highest ingest rate in distributed environments

•  Almost always has highest “perceived” rate

•  Trading off:

– Eventual consistency in graph

– Updates are still atomic, isolated and durable, but phased

– External agent performs graph building
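The phased update model can be pictured as two steps: the client records an edge request cheaply, and a separate agent later builds the navigable graph. The sketch below is a single-threaded, in-memory illustration of that idea only. It does not model durability, isolation or distribution, and `PipelinedIngest` and its methods are hypothetical names, not the InfiniteGraph API.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch of phased edge ingest; names are illustrative only.
class PipelinedIngest {
    private final Map<String, Set<String>> graph = new HashMap<>();
    private final Queue<String[]> pipeline = new ArrayDeque<>();

    // Phase 1: the vertex (base data) is stored immediately.
    void addVertex(String id) {
        graph.putIfAbsent(id, new HashSet<>());
    }

    // Phase 1: the edge request is recorded, but it is not yet navigable.
    void addEdge(String from, String to) {
        pipeline.add(new String[] {from, to});
    }

    // Phase 2: an external agent drains the pipeline and builds the graph.
    // Returns the number of edges applied in this run.
    int runAgent() {
        int applied = 0;
        String[] e;
        while ((e = pipeline.poll()) != null) {
            graph.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            graph.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
            applied++;
        }
        return applied;
    }

    boolean connected(String a, String b) {
        return graph.getOrDefault(a, Set.of()).contains(b);
    }
}
```

Between `addEdge` and `runAgent` the edge exists only in the pipeline, which is the window of eventual consistency the slide refers to.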

(10)

Result…

[Chart: nodes and edges per second (axis from 0 to 500,000) for 1, 2, 4 and 8 clients, shown for a Single Host, 2 Hosts, 4 Hosts and 8 Hosts]

(11)

Scaling Reads and Query

[Diagram: Application(s) call the Distributed API, which sits over Partition 1, Partition 2, Partition 3 ... Partition n, each with its own Processor]

(12)

(13)

Optimizing Distributed Navigation

•  Pregel Clones

–  Message passing for each hop, messages processed at the data host for the target vertex

–  High concurrency, no data leaves its physical host

–  Message marshalling and transport is expensive

•  Distributed Caching Models

–  Essentially trying to cache graph in memory over multiple hosts

–  Requires too much memory for large graphs

–  Issues with cache consistency

(14)

Optimizing Distributed Navigation

•  InfiniteGraph… both

– Detect local hops and perform in memory traversal

– Intelligently cache remote data when accessed frequently

– Route tasks to other hosts when it is optimal

[Diagram: an Application submits a path query P(A,B,C,D) through the Distributed API to the Processors on Partition 1 and Partition 2]
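The first step in a hybrid approach is telling local hops from remote ones. The sketch below walks a graph breadth-first and counts how many hops stay on the current partition and how many cross to another one. It assumes every vertex has a known partition, and it only classifies hops; caching and task routing are not modeled. `HopClassifier` and `countHops` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; assumes every vertex appears in partitionOf.
class HopClassifier {
    // Returns {localHops, remoteHops} for a breadth-first walk from origin.
    static int[] countHops(Map<String, List<String>> adjacency,
                           Map<String, Integer> partitionOf,
                           String origin) {
        int local = 0;
        int remote = 0;
        Set<String> visited = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        visited.add(origin);
        frontier.add(origin);
        while (!frontier.isEmpty()) {
            String current = frontier.poll();
            for (String next : adjacency.getOrDefault(current, List.of())) {
                if (!visited.add(next)) {
                    continue; // already reached by an earlier hop
                }
                if (partitionOf.get(current).equals(partitionOf.get(next))) {
                    local++;  // candidate for in-memory traversal
                } else {
                    remote++; // candidate for caching or routing to the other host
                }
                frontier.add(next);
            }
        }
        return new int[] {local, remote};
    }
}
```

For a path A, B, C, D with A and B on one partition and C and D on another, only the B to C hop is remote, so a traversal engine would need a single cross-host step.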

(15)

Schema – It’s not your enemy!

(at least not all the time...)

•  Schema vs Schema-less

– Database religion

– No time for a full debate here

– InfiniteGraph supports schema, but does not restrict connection types between vertices

(16)

GraphViews

Leveraging Schema in the Graph

[Diagram: schema graph with vertex types Patient, Prescription, Drug, Ingredient, Outcome, Complaint, Visit, Allergy, Physician]

(17)

Schema Enables Views

•  GraphViews are extremely powerful

•  Allow Big Data to appear small!

•  Connection inference can lead to exponential gains in query performance

•  Views are reusable between queries

•  Built into the native kernel

(18)

Advanced Configured Placement

•  Physically co-locate “closely related” data

•  Driven through a declarative placement model

•  Dramatically speeds “local” reads

[Diagram: the patient Mr Citizen and his Visit vertices share the Patient Data Page(s); the San Jose facility with Dr Jones and Dr Smith, and the Sunnyvale facility with Dr Blake and Dr Quinn, sit on separate Facility Data Page(s). Edge labels: Has, With, At, Located, Primary Physician]

(19)

Why InfiniteGraph?

•  Objectivity/DB is a proven foundation

– Building distributed databases since 1993

– A complete database management system

•  Concurrency, transactions, cache, schema, query, indexing

•  It’s a Graph Specialist!

– Simple but powerful API tailored for navigation of data

– Easy to configure distribution model

(20)

Fully Distributed Data Model

[Diagram: IG Core/API on top of a Distributed Object and Relationship Persistence Layer with Customizable Placement, spread across HostA, HostB, HostC ... HostX in Zone 1 and Zone 2]

(21)

InfiniteGraph is a Complete Database

•  InfiniteGraph helps manage the things you don’t want to do, but want to have done:

– Concurrency

•  Transactions (commit/rollback)

•  Controlled multi-user reading during updates

– Schema Control

•  Build complex data structures, make changes easily and migrate existing data

– Distribution

•  Sharing large amounts of distributed data between distributed processes

– Indexes

•  Choose built-in key-value, b-tree or other indexes

– Cache

(22)

Super Simple API

// Create each person and add it to the graph as a vertex.
Person alice = new Person("Alice");
helloGraphDB.addVertex(alice);

Person bob = new Person("Bob");
helloGraphDB.addVertex(bob);

Person carlos = new Person("Carlos");
helloGraphDB.addVertex(carlos);

Person charlie = new Person("Charlie");
helloGraphDB.addVertex(charlie);

(23)

Adding Edges

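// General form: create an edge object, then connect two vertices with it.
MyEdgeType edge = new MyEdgeType();
vertexA.addEdge(edge, vertexB, EdgeKind.???);

// Alice met Bob in Denver.
Meeting denverMeeting = new Meeting("Denver", "5-27-10");
alice.addEdge(denverMeeting, bob, EdgeKind.BIDIRECTIONAL);

// Bob called Carlos.
Call bobCallToCarlos = new Call(getRandomJulyTime());
bob.addEdge(bobCallToCarlos, carlos, EdgeKind.BIDIRECTIONAL);

// Carlos paid Charlie.
Payment payment = new Payment(10000.00);
carlos.addEdge(payment, charlie, EdgeKind.BIDIRECTIONAL);

// Bob called Charlie. The addEdge line is missing from the extracted slide
// and is reconstructed here from the pattern of the preceding edges.
Call bobCallToCharlie = new Call(getRandomJulyTime());
bob.addEdge(bobCallToCharlie, charlie, EdgeKind.BIDIRECTIONAL);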

(24)
(25)

Graph Traversal (Navigation) Queries

•  Use an instance of the Navigator class to perform a navigation query.

•  A navigation instance is highly customizable, but comprises the following basic parts:

–  Origin : The vertex from which to begin

–  Guide strategy

•  Guide.Strategy.SIMPLE_BREADTH_FIRST

•  Guide.Strategy.SIMPLE_DEPTH_FIRST

–  Graph Views : Powerful filtering and

–  Qualifiers

•  Qualifying valid intermediate paths and results

–  Handlers
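To show how these parts fit together, here is a minimal plain-Java sketch of a navigation: an origin, a guide strategy, qualifiers and a result handler. It deliberately does not use the InfiniteGraph Navigator class, whose exact signatures are not shown in these slides; `NavigationSketch`, `Strategy` and `navigate` are hypothetical names. A graph view would act as a further filter on which vertex and edge types are visible and is left out here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical stand-in for a navigation query; not the InfiniteGraph API.
class NavigationSketch {
    enum Strategy { BREADTH_FIRST, DEPTH_FIRST }

    static void navigate(Map<String, List<String>> adjacency,
                         String origin,
                         Strategy strategy,
                         Predicate<List<String>> pathQualifier,
                         Predicate<List<String>> resultQualifier,
                         Consumer<List<String>> resultHandler) {
        Deque<List<String>> pending = new ArrayDeque<>();
        pending.add(List.of(origin));
        while (!pending.isEmpty()) {
            // The guide strategy decides which pending path is extended next.
            List<String> path = strategy == Strategy.BREADTH_FIRST
                    ? pending.pollFirst()
                    : pending.pollLast();
            // The result qualifier decides which paths are reported to the handler.
            if (resultQualifier.test(path)) {
                resultHandler.accept(path);
            }
            String last = path.get(path.size() - 1);
            for (String next : adjacency.getOrDefault(last, List.of())) {
                if (path.contains(next)) {
                    continue; // never revisit a vertex within one path
                }
                List<String> extended = new ArrayList<>(path);
                extended.add(next);
                // The path qualifier prunes intermediate paths early.
                if (pathQualifier.test(extended)) {
                    pending.addLast(extended);
                }
            }
        }
    }
}
```

A call such as `navigate(graph, "Alice", Strategy.BREADTH_FIRST, p -> p.size() <= 4, p -> p.get(p.size() - 1).equals("Charlie"), results::add)` would collect every path of at most four vertices from Alice that ends at Charlie.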

(26)
(27)
(28)

Pathfinding

– Intelligence, Police, Counter Terrorism

– Financial Transactions and Fraud

– Discover Paths Between Entities

•  One to One, Many to Many

– Constrain by

•  Edge and Vertex Type

•  Edge and Vertex Attributes (including temporal)

•  Required Sequence of Events
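A one-to-one path search constrained by edge type can be sketched as a breadth-first search over only the allowed edges. This is a generic textbook technique, not the InfiniteGraph implementation; `PathFinder`, `Edge` and `shortestPath` are hypothetical names, and edges are treated as undirected for simplicity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: BFS path search restricted to certain edge types.
class PathFinder {
    static final class Edge {
        final String from;
        final String to;
        final String type;

        Edge(String from, String to, String type) {
            this.from = from;
            this.to = to;
            this.type = type;
        }
    }

    // Returns a path with the fewest hops from source to target using only
    // edges whose type is allowed, or an empty list if no such path exists.
    static List<String> shortestPath(List<Edge> edges, Set<String> allowedTypes,
                                     String source, String target) {
        Map<String, List<String>> adjacency = new HashMap<>();
        for (Edge e : edges) {
            if (!allowedTypes.contains(e.type)) {
                continue; // the type constraint removes this edge from the search
            }
            adjacency.computeIfAbsent(e.from, k -> new ArrayList<>()).add(e.to);
            adjacency.computeIfAbsent(e.to, k -> new ArrayList<>()).add(e.from);
        }
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(source, null);
        queue.add(source);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(target)) {
                // Walk the parent links back to the source to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String v = target; v != null; v = parent.get(v)) {
                    path.addFirst(v);
                }
                return path;
            }
            for (String next : adjacency.getOrDefault(current, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }
}
```

Using the people from the earlier slides, a search from Alice to Charlie over Met, Called and Paid edges goes through Bob, while restricting the search to Met and Paid edges finds no path at all. Attribute and sequence constraints would be extra predicates applied at the same point as the type check.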

(29)

Graph Analysis (Algorithms)

[Diagram: example network with Bob, Sam and Fred illustrating Degree Centrality, and Closeness and Betweenness Centrality]
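Of these measures, degree centrality is the simplest to compute: the number of distinct neighbors of a vertex divided by the number of other vertices. The sketch below uses that standard normalized definition for an undirected graph without self-loops; `DegreeCentrality` and `compute` are hypothetical names, and the edges in the usage note are made up for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: normalized degree centrality for an undirected graph.
class DegreeCentrality {
    // Each edge is a two-element array {a, b}; self-loops are assumed absent.
    static Map<String, Double> compute(List<String[]> edges) {
        Map<String, Set<String>> neighbors = new HashMap<>();
        for (String[] e : edges) {
            neighbors.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            neighbors.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        int n = neighbors.size();
        Map<String, Double> centrality = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : neighbors.entrySet()) {
            // degree / (n - 1): 1.0 means connected to every other vertex.
            double score = n > 1 ? (double) entry.getValue().size() / (n - 1) : 0.0;
            centrality.put(entry.getKey(), score);
        }
        return centrality;
    }
}
```

If Bob is connected to both Sam and Fred, and Sam and Fred are not connected to each other, Bob scores 1.0 and the other two score 0.5. Closeness and betweenness centrality need shortest-path computations and are not covered by this sketch.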

(30)

Graph Analysis (Algorithms)

•  Social Networks

– Most connected participants

– Influencers

– Important Syndicates or Sub-networks

•  Central figures in crime organisations

•  Business Intelligence

– Discovering Knowledge Assets

– Complex Big Data Analytics

(31)

Graph Analysis (Patterns)

•  Crime (again)

– Recognize common patterns of activity

– Complex chains of interaction

•  Security

– Recognize attack/threat patterns

– Auditing / log analytics

•  Targeted Advertising

(32)

Many, Many More…

•  Spatial data

•  Financial Services

•  Defense / Situational Awareness

•  Sciences

•  Health Care

•  Genealogy

•  Logistics

•  Tracking

•  PLM
