In-Memory BigData. Summer 2012, Technology Overview

(1)

In-Memory BigData

(2)

Company Vision

> 5 years in production

> 100s of customers

> Starts every 10 secs worldwide

> Over 10,000,000 starts globally

> Unique in-memory compute + data grid technology

(3)

In-Memory Processing Facts

> 64-bit CPUs can address 16 exabytes

> Disk up to 10^7 times slower than RAM

> RAM prices drop 30% every 18 months

> 1GB costs < $1

> 1TB RAM & 48 cores cluster ~ $40K

> Multicore CPUs ideal for in-memory parallelization

> Speed matters

> Citi: 100ms == $1M

> Google: 500ms == 20% traffic drop

In-memory will have an industry impact comparable to web and cloud.
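The two numeric claims on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes "16 exabytes" means binary exabytes (2^60 bytes each) and that the 30% price drop compounds steadily every 18 months. The class and method names are made up for this example.

```java
import java.math.BigInteger;

final class MemoryFactsSketch {
    /** Number of bytes addressable with the given number of address bits. */
    static BigInteger addressableBytes(int bits) {
        return BigInteger.ONE.shiftLeft(bits); // 2^bits
    }

    /** Price after `months`, assuming a steady 30% drop every 18 months. */
    static double projectedPrice(double startPrice, int months) {
        return startPrice * Math.pow(0.70, months / 18.0);
    }

    public static void main(String[] args) {
        BigInteger exbibyte = BigInteger.ONE.shiftLeft(60); // 2^60 bytes
        // 2^64 / 2^60 = 2^4 = 16 binary exabytes
        System.out.println(addressableBytes(64).divide(exbibyte) + " binary exabytes");
        // Two 18-month periods: 0.70 * 0.70 = 0.49 of the starting price
        System.out.println("Fraction of price left after 36 months: " + projectedPrice(1.0, 36));
    }
}
```

Under these assumptions, RAM that costs $1 per GB today would cost roughly half as much three years later.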

(4)

GridGain 4: Three Editions

> Different markets, customers, messages, needs:

> “Compute Grid” Edition

> “Data Grid” Edition

(5)

GridGain 4: At a Glance

> Scalable In-Memory Data Platform

> Compute Grid + In-Memory Data Grid

Real Time & Streaming MapReduce, CEP

> TBs of data and 1000s of nodes

Typical: 10s of TBs and 100s of nodes

> In-Memory Speed, Database Reliability

> Native: Java, Scala and Groovy DSLs

> Clients: C++, .NET, iOS, Android, PHP, REST

> Distributed in-memory object store

(6)

GridGain 4: New Features

1. In-Memory Data Grid

2. In-Memory Compute Grid

3. Streaming MapReduce

4. Clustering

5. Messaging

6. Advanced Security

7. DevOps GUI Console

8. SPI Architecture

9. Zero Deployment

10. Native Client APIs

11. Java, Scala, Groovy

12. Advanced Load Balancing

13. Pluggable Fault Tolerance

(7)

Clustering

GridGain 4

> Pluggable cluster topology management & various consistency strategies

> Pluggable automatic discovery on LAN, WAN, and AWS

> Pluggable “split-brain” cluster segmentation resolution

> Unicast, broadcast, and Actor-based cluster-wide message exchange

> Pluggable event storage and propagation

> Versioning

> Support for complex leader election algorithms

> On-demand and direct deployment

> Support for virtual clusters and grouping

> Integration with Hadoop ZooKeeper

Sophisticated clustering capabilities for the JVM, with the ability to connect and manage a heterogeneous set of computing devices
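One common way to resolve a "split-brain" segmentation is a majority quorum: a segment keeps running only if it can still see more than half of the configured nodes. The sketch below illustrates that general idea. It is not GridGain's resolver, and `SplitBrainSketch` and `hasQuorum` are hypothetical names.

```java
final class SplitBrainSketch {
    /**
     * True if this segment sees a strict majority of the configured cluster.
     * With an even split (for example 2 of 4) neither side has a majority.
     */
    static boolean hasQuorum(int visibleNodes, int configuredNodes) {
        return visibleNodes > configuredNodes / 2;
    }

    public static void main(String[] args) {
        // A 5-node cluster splits into segments of 3 and 2 nodes.
        System.out.println("3-node segment keeps running: " + hasQuorum(3, 5));
        System.out.println("2-node segment keeps running: " + hasQuorum(2, 5));
    }
}
```

A strict majority guarantees that at most one segment stays active, at the cost of stopping both halves when the cluster splits evenly.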

(8)

Advanced Security

GridGain 4

> Cluster Security

> Client Security

> JAAS-based

> Authentication

> Secure Session

(9)

SPI Architecture

GridGain 4

1. Checkpoint SPI

2. Collision SPI

3. Authentication SPI

4. Secure Session SPI

5. Indexing SPI

6. Load Balancing SPI

7. Communication SPI

8. Deployment SPI

9. Swap Space SPI

10. Metrics SPI

11. Discovery SPI

12. Failover SPI

13. Topology SPI

14. Event Storage SPI

Fourteen SPIs provide plug-and-play capabilities to replace and customize every significant subsystem of the GridGain runtime.
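The plug-and-play idea behind an SPI can be shown with a plain Java interface and two interchangeable implementations. Everything in this sketch is hypothetical (`NodePickerSketch`, `RoundRobinPicker`, `FirstNodePicker`, `SpiSketch`); none of it is a GridGain interface. It only illustrates how a runtime can delegate one decision to a pluggable component.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical plug-in point: decides which node runs a given job. */
interface NodePickerSketch {
    String pick(List<String> nodes, int jobIndex);
}

/** Cycles through the nodes in order. */
final class RoundRobinPicker implements NodePickerSketch {
    public String pick(List<String> nodes, int jobIndex) {
        return nodes.get(Math.floorMod(jobIndex, nodes.size()));
    }
}

/** Always picks the first node. */
final class FirstNodePicker implements NodePickerSketch {
    public String pick(List<String> nodes, int jobIndex) {
        return nodes.get(0);
    }
}

final class SpiSketch {
    /** The "runtime" knows only the interface, never the implementation. */
    static List<String> assign(NodePickerSketch picker, List<String> nodes, int jobs) {
        List<String> plan = new ArrayList<>();
        for (int job = 0; job < jobs; job++) {
            plan.add(picker.pick(nodes, job));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("A", "B");
        System.out.println(assign(new RoundRobinPicker(), nodes, 3));
        System.out.println(assign(new FirstNodePicker(), nodes, 3));
    }
}
```

Swapping the implementation changes the behavior without touching the code that calls it.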

(10)

Native Clients

GridGain 4

> Java (EE & Android)

> C++

> .NET C#

> Objective C

> REST

(11)

Java, Scala, Groovy

GridGain 4

> Java 6

> Scala 2.9

> Groovy 1.8 and Groovy++

> Scalar - Scala DSL for GridGain

(12)

Hadoop Integration

GridGain 4

> HBase cache store

> ZooKeeper discovery integration

> Distributed bulk data loader

> Hadoop-compatible Distributed File System

(13)

(14)

Success Stories

> Trading Systems

Handle large volumes of transactions

> Real-time Risk Analysis

Analysis of trading positions & risk

> Online Gaming

Online real-time backbone for gaming

> Actuarial Analysis

Insurance Rating and Modeling

> Geo Mapping

Real-time geographical route and traffic information

> Bioinformatics

(15)

(16)

In-Memory Data Grid

Features 1

> Java-based distributed in-memory store

> Zero deployment for data

> Local, fully replicated and partitioned cache types

> Pluggable expiration policies (LRU, LFU, FIFO, time-based and random)

> Read-through and write-through

> Pluggable cache store (SQL, ERP, Hadoop)

> Synchronous & asynchronous cache operations

> MVCC-based concurrency

> Pluggable data overflow storage
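Of the policies listed above, LRU is the easiest to sketch on a single JVM. The example below uses the standard `java.util.LinkedHashMap` in access order. It is a local illustration of the policy, not GridGain's implementation, and `LruCacheSketch` is a made-up name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A bounded map that drops the least recently used entry when full. */
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCacheSketch(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, oldest first
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the eldest entry.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, Integer> cache = new LruCacheSketch<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" is now the most recently used entry
        cache.put("c", 3); // over capacity: "b" is evicted
        System.out.println(cache.keySet());
    }
}
```

The other policies differ only in which entry they choose as the victim: LFU evicts the least frequently used entry, FIFO the oldest inserted one.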

(17)

In-Memory Data Grid

Features 2

> JTA/JTS integration

> Master/master data replication

> Master/master data invalidation

> Replication/invalidation in async/sync modes

> Write-behind cache store support

> Concurrent/Delayed transactional preloading

> Affinity routing with compute grid

> Partitioned cache with active backups (replicas)

> Structured and unstructured data
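Affinity routing and partitioned caches with backups both rest on a deterministic mapping from a key to a partition and from a partition to nodes. The sketch below uses simple modulo hashing for that mapping. This is an assumption made for brevity, not the scheme GridGain uses, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class AffinitySketch {
    /** Maps a key to a partition number in [0, partitions). */
    static int partition(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Node that owns the primary copy of the key. */
    static String primary(Object key, int partitions, List<String> nodes) {
        return nodes.get(partition(key, partitions) % nodes.size());
    }

    /** Next node in the list holds the backup copy (replica). */
    static String backup(Object key, int partitions, List<String> nodes) {
        return nodes.get((partition(key, partitions) + 1) % nodes.size());
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("A", "B", "C");
        Integer key = 7; // Integer.hashCode() is the value itself
        System.out.println("partition = " + partition(key, 16));
        System.out.println("primary   = " + primary(key, 16, nodes));
        System.out.println("backup    = " + backup(key, 16, nodes));
    }
}
```

Because every node computes the same mapping, a job that needs key 7 can be sent straight to the node holding it, which is the point of routing computation by data affinity. Modulo mapping reshuffles most keys when the node list changes, which is why production systems usually prefer more stable schemes.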

(18)

In-Memory Data Grid

Features 3

> Customizable/pluggable data indexing

> JDBC driver for in-memory data

> Co-located cache mode

> BigMemory (off-heap allocation) support

> Tiered storage with on-heap, off-heap, swap, SQL and Hadoop

> Distributed in-memory query support

> SQL-based affinity co-located queries

> Lucene-based text affinity co-located queries

> H2-based text affinity co-located queries

> Predicate-based full scan queries

> Support for pagination
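A predicate-based full scan with pagination can be sketched over a plain map: visit every value, keep those that match, and return one page at a time. This is a single-JVM illustration with hypothetical names. It uses Java 8 streams for brevity, which are newer than the Java 6 baseline this deck lists.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class ScanQuerySketch {
    /** Returns page number `page` (zero-based) of the values matching `filter`. */
    static <K, V> List<V> scanPage(Map<K, V> cache, Predicate<V> filter, int page, int pageSize) {
        return cache.values().stream()
                .filter(filter)               // full scan: every value is tested
                .skip((long) page * pageSize) // drop the earlier pages
                .limit(pageSize)              // keep one page
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // TreeMap keeps keys sorted, so page contents are deterministic.
        Map<Integer, Integer> cache = new TreeMap<>();
        for (int i = 1; i <= 10; i++) {
            cache.put(i, i * 10);
        }
        // Matches are 40..100; page 1 with page size 3 is [70, 80, 90].
        System.out.println(scanPage(cache, v -> v > 30, 1, 3));
    }
}
```

In a distributed grid each node would scan its own partition and the pages would be assembled from the per-node results. That step is omitted here.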

(19)

In-Memory Compute Grid

Features 1

> Direct API for map/split and reduce/aggregate

> Pluggable failover management

> Pluggable topology resolution

> Pluggable collision resolution

> Distributed task session

> Distributed continuations and recursive split

> Streaming MapReduce

> Complex Event Processing (CEP)

> Node-local cache
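The split and reduce steps can be sketched on one machine with a thread pool standing in for grid nodes: a phrase is split into words, each word becomes a job, and the results are summed. This uses only `java.util.concurrent` and is not the GridGain task API. The class and method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SplitReduceSketch {
    /** Counts the non-space characters of a phrase, one job per word. */
    static int totalLength(String phrase, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Split: one job per word, run in parallel on the pool.
            List<Future<Integer>> results = new ArrayList<>();
            for (String word : phrase.split(" ")) {
                Callable<Integer> job = () -> word.length();
                results.add(pool.submit(job));
            }
            // Reduce: sum the per-word results.
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("job failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(totalLength("Hello World", 2)); // 5 + 5 = 10
    }
}
```

On a real grid the jobs would be serialized and sent to remote nodes, and the failover, topology and collision policies listed above would decide where and when each job runs.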

(20)

In-Memory Compute Grid

Features 2

> Direct closure distribution in Java, Scala and Groovy

> Cron-based task scheduling

> Direct redundant mapping support

> Zero deployment with P2P on-demand distributed class loading

> Partial asynchronous reduction

> Weighted and dynamic adaptive mapping

> State checkpoints for long-running tasks

> Early and late load balancing
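State checkpoints let a long-running task resume from its last saved position instead of starting over after a failure. The sketch below saves progress into an in-memory map every two items and simulates one failure. The store, the key, the `failAt` parameter and the checkpoint interval are all hypothetical choices made for this example.

```java
import java.util.HashMap;
import java.util.Map;

final class CheckpointSketch {
    /**
     * Sums `data`, saving {nextIndex, partialSum} under `key` every two items.
     * Throws at index `failAt` to simulate a crash (use -1 for no failure).
     */
    static long sumWithCheckpoints(int[] data, Map<String, long[]> store, String key, int failAt) {
        long[] saved = store.getOrDefault(key, new long[] {0, 0});
        int next = (int) saved[0]; // resume position
        long sum = saved[1];       // partial result so far
        for (int i = next; i < data.length; i++) {
            if (i == failAt) {
                throw new IllegalStateException("simulated failure at index " + i);
            }
            sum += data[i];
            if ((i + 1) % 2 == 0) {
                store.put(key, new long[] {i + 1, sum}); // save a checkpoint
            }
        }
        store.remove(key); // finished: the checkpoint is no longer needed
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        Map<String, long[]> store = new HashMap<>();
        try {
            sumWithCheckpoints(data, store, "job", 3); // fails before item 3
        } catch (IllegalStateException e) {
            System.out.println("first attempt: " + e.getMessage());
        }
        // The retry resumes at index 2 with partial sum 3 and finishes with 15.
        System.out.println("retry result: " + sumWithCheckpoints(data, store, "job", -1));
    }
}
```

Work done after the last checkpoint (item 2 in this run) is repeated on retry, so the checkpoint interval trades save overhead against recomputation.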

(21)

GridGain Systems

1065 East Hillsdale Blvd., Suite 230, Foster City, CA 94404

Web: www.gridgain.com

Email: [email protected]

Twitter: @gridgain