InterSystems Symposia 2014
Big Data, Cloud & Virtualization
Tokyo, 2014
Vik Nagjee – Product Manager, Database Platforms
Variety
Velocity
Volume
What’s Big about {Big} Data?
The 3 V’s…
The {Big} Data Challenge
What’s the Real {Big} Data Challenge?
Volume
Variety
Velocity
The 4th Dimension of Big Data
VALUE
The {Big} Data Journey: A Data Platform for Just-In-Time Action
Big Data Case Study
ESA: The Gaia Mission
Gaia: Complete, Faint, Accurate
Parameter | Hipparcos | Gaia
Magnitude limit | 12 mag | 20 mag
Completeness | 7.3 – 9.0 mag | 20 mag
Bright limit | 0 mag | 6 mag
Number of objects | 120,000 | 47 million to G = 15 mag; 360 million to G = 18 mag; 1,192 million to G = 20 mag
Effective distance limit | 1 kpc | 50 kpc
Quasars | 1 (3C 273) | 500,000
Galaxies | None | 1,000,000
Accuracy | 1 milliarcsec | 7 µarcsec at G = 10 mag; 26 µarcsec at G = 15 mag; 333 µarcsec at G = 20 mag
Photometry | 2-colour (B and V) | Low-res. spectra to G = 20 mag
Radial velocity | None | 15 km s⁻¹ to G_RVS = 16 mag
Observing | Pre-selected | Complete and unbiased
One Billion Stars in 3D will provide …
• in our Galaxy …
– the distance and velocity distributions of all stellar populations
– the spatial and dynamic structure of the disk and halo
– its formation history
– a detailed mapping of the Galactic dark-matter distribution
– a rigorous framework for stellar-structure and evolution theories
– a large-scale survey of extra-solar planets (~7,000)
– a large-scale survey of Solar-system bodies (~250,000)
• … and beyond
– definitive distance standards out to the LMC/SMC
– rapid reaction alerts for supernovae and burst sources (~6,000)
– quasar detection, redshifts, microlensing structure (~500,000)
– fundamental quantities to unprecedented accuracy: to 2×10⁻⁶ (2×10⁻⁵ at present)
Source: http://www.cosmos.esa.int/web/gaia/presentations
Source: http://www.cosmos.esa.int/web/gaia/data-processing
Core Processing – Powered by InterSystems Caché
• ~1,200,000,000 stars observed by Gaia
• In 5 years, Gaia will observe each star, on average, 80 times:
– (80 x 1,200,000,000) = 96,000,000,000 transits
– 96,000,000,000 / (5 x 365 days) = ~52,600,000 transits / day
• On a nominal day ~50,000,000 transits ~ 285,000 MB data
• On a “heavy” day ~350,000,000 transits ~1,995,000 MB data
Data volumes
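The transit and volume figures above can be re-derived with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check: it assumes 365-day years (the exact per-day quotient depends on the day count used) and assumes that a "heavy" day has the same data size per transit as a nominal day. All variable names are illustrative.

```python
# Back-of-the-envelope check of the Gaia transit and data-volume figures.
STARS = 1_200_000_000
OBSERVATIONS_PER_STAR = 80
MISSION_DAYS = 5 * 365  # assumption: 365-day years

total_transits = STARS * OBSERVATIONS_PER_STAR
transits_per_day = total_transits / MISSION_DAYS

# Nominal day, as quoted on the slide: ~50,000,000 transits ~ 285,000 MB
NOMINAL_TRANSITS = 50_000_000
NOMINAL_MB = 285_000
kb_per_transit = NOMINAL_MB * 1000 / NOMINAL_TRANSITS  # decimal KB per transit

# Heavy day: assume the same size per transit as on a nominal day
HEAVY_TRANSITS = 350_000_000
heavy_day_mb = HEAVY_TRANSITS * NOMINAL_MB / NOMINAL_TRANSITS

print(f"total transits:   {total_transits:,}")
print(f"transits per day: {transits_per_day:,.0f}")
print(f"KB per transit:   {kb_per_transit:.1f}")
print(f"heavy day (MB):   {heavy_day_mb:,.0f}")
```

Under these assumptions the heavy-day volume scales linearly with the transit count, which is how 350,000,000 transits map to the quoted ~1,995,000 MB.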
• Per day: ~ 285,000 MB = ~280 GB = ~0.28 TB
• First 4 months (COMMISSIONING Period)
– All daily data is kept
– Total growth = 0.28 TB/day x 120 days = ~34 TB
• In 5th month, cleanup occurs. Remaining data = ~3 TB
• 5th month onwards, steady state size = ~3 TB
Data growth patterns
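The growth pattern above (linear growth during commissioning, then a cleanup to a steady state) can be sketched as a simple piecewise function. This is a simplified model, not the actual Gaia retention logic: it assumes the cleanup takes effect immediately on day 121, and the function and constant names are made up for illustration.

```python
DAILY_GROWTH_TB = 0.28      # ~285,000 MB per nominal day
COMMISSIONING_DAYS = 120    # first 4 months: all daily data is kept
STEADY_STATE_TB = 3.0       # size after cleanup, from the 5th month onwards


def database_size_tb(day: int) -> float:
    """Approximate database size (TB) at the end of a given mission day (1-based)."""
    if day < 1:
        raise ValueError("day must be >= 1")
    if day <= COMMISSIONING_DAYS:
        # Commissioning: nothing is deleted, so size grows linearly.
        return DAILY_GROWTH_TB * day
    # Assumption: cleanup is modeled as instantaneous after commissioning.
    return STEADY_STATE_TB


peak_tb = database_size_tb(COMMISSIONING_DAYS)
print(f"peak at end of commissioning: ~{peak_tb:.1f} TB")
print(f"size on day 150:              ~{database_size_tb(150):.1f} TB")
```

The peak of about 33.6 TB is the figure the slide rounds to ~34 TB; storage has to be sized for that peak, not for the ~3 TB steady state.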
Core Processing – Powered by InterSystems Caché
• One 16-CPU, 1.2 TB RAM server for the IDT/FL DB
• Storage for IDT/FL DB:
– 1x NetApp FAS3160, 160 SATA disks, iSCSI
– 16x internal SSDs
• List price: ~$200,000 (storage + server)
• One 16-CPU, 1.2 TB RAM server for the asynchronous mirror
• Storage for async mirror:
– 1x NetApp FAS3250, 35 SATA disks, NFS interconnect
– 16x internal SSDs (internal to each server)
• List price: ~$90,000 (storage + server)
• Application Access:
– Java application(s) across ~20 application servers
– Connecting to Caché via JDBC
– List price: ~$10,000 / server = ~$200,000 for Java application servers
• HA / DR Configuration:
– No “hot” HA: 95% Uptime SLA guarantee – rebuild of server, or DR
– DR: Caché Database Mirroring
Delightfully Parsimonious Architecture
Mapping the Galaxy for less than $500,000 in hardware
[database-specific = $300,000]
$500,000 / 1 billion stars = $0.0005 per star
Answering the formation history of the galaxy = Priceless!
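The hardware totals can be checked by summing the list prices quoted for each tier. The sketch below uses only the approximate figures given in this deck; the dictionary keys are illustrative labels, and the per-star figure uses the same rounded values as the slide ($500,000 and 1 billion stars).

```python
# Approximate list prices (USD) as quoted in this deck.
hardware = {
    "primary_db_server_and_storage": 200_000,
    "async_mirror_server_and_storage": 90_000,
    "java_application_servers": 20 * 10_000,  # ~20 servers at ~$10,000 each
}

total = sum(hardware.values())
database_specific = (
    hardware["primary_db_server_and_storage"]
    + hardware["async_mirror_server_and_storage"]
)

# The slide rounds the total up to $500,000 and the catalogue to 1 billion stars.
cost_per_star = 500_000 / 1_000_000_000

print(f"total hardware:    ${total:,}")
print(f"database-specific: ${database_specific:,}")
print(f"cost per star:     ${cost_per_star:.4f}")
```

The itemized sum comes to $490,000 in total and $290,000 for the database tier, which the slide rounds to "less than $500,000" and "$300,000".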
Data Platform for Just-In-Time Action
One unexpected characteristic we have noticed during commissioning concerns stray light. In our test images, an excess of diffuse illumination is sometimes seen on some of the detectors, repeating in a cycle that relates to Gaia’s spin period of 6 hours.
Gaia
Unraveling the chemical and dynamical history of our Galaxy
Virtualization & Cloud – Intertwined!
The NIST Definition of Cloud Computing
“Cloud computing is a model for enabling ubiquitous,
convenient, on-demand network access to a shared
pool of configurable computing resources (e.g.,
networks, servers, storage, applications, and services)
that can be rapidly provisioned and released with
minimal management effort or service provider
interaction.”
Source: National Institute of Standards and Technology, Special Publication 800-145
My definition of Cloud Computing
Harnessing advances in information technology to accelerate value delivery to customers
Types of Cloud Computing
• Public Cloud
– an infrastructure as a service (IaaS) provider such as Amazon EC2, Rackspace, Azure, etc.
• Private Cloud
– either provided by an IaaS, or hosted internally at the customer site (using something like OpenStack or CloudStack)
• Virtualization-based Cloud
– this would be something like a VMware (vCloud) environment, or even a fully virtualized environment
• Customer SaaS offering
– where a partner has built a solution based on our products and delivers that solution on a SaaS basis
SaaS
Cloud. The Enabler.
• Deploy Breakthrough Applications in The Cloud
• How?
– Pay-as-you-go
– Virtually *infinite* computing resources
– Elastic
– Provision on-demand
– Stay lean and agile
CAUTION!
Amazon EC2 SLA
Amazon EC2 => 99.95% monthly uptime
• ~22 minutes downtime / month (min threshold)
• Finer print: 99.95% to 99% monthly uptime
• ~22 minutes to 7.2 HOURS downtime / month
• “Service Credit” as compensation
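The downtime figures above follow directly from the uptime percentages. A minimal sketch of the conversion, assuming a 30-day month (the function name and the default are illustrative, not part of any SLA text):

```python
def allowed_downtime_minutes(uptime_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime per month permitted by a given uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    minutes_in_month = days_in_month * 24 * 60
    return minutes_in_month * (100.0 - uptime_percent) / 100.0


for uptime in (99.95, 99.0):
    minutes = allowed_downtime_minutes(uptime)
    print(f"{uptime}% uptime -> {minutes:.1f} min/month ({minutes / 60:.2f} hours)")
```

With a 30-day month, 99.95% allows about 21.6 minutes and 99% allows 432 minutes, or 7.2 hours. The same function shows that the 95% uptime SLA mentioned for the core processing database would allow 36 hours of downtime per month.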
Other considerations?
• Regulatory Compliance
• Cost
• Where’s the Data?
• How Secure Is My Data?
Cloud Case Studies
Providing a Cloud-Enabled Data Platform
3M Health Information Systems
• Ensemble ESB in the Cloud
• Goals
– To simplify inter-application communication
– To reduce maintenance costs
– To increase scalability
– To improve governance of application access
– To increase the flexibility with which new software applications could be added to the overall system
– To automate system operations
• Scalable – auto-scale, based on demand
• Elastic – grow, shrink based on demand
• Stateless – no-persistence model
• Automated – single-click, automated deployment
Ontario Systems
• Receivables Management Software for Third-Party Collection Agencies
• New regulatory burden for Collection Agencies – monitor customer complaints, or else!
• Built a cloud-based Complaint Tracker application
• Built & Deployed the breakthrough application as SaaS offering in less than six weeks – using Caché, Ensemble, DeepSee
• Eased burden on existing customers; gained several new customers
“Building on InterSystems technologies, we went from initial concept to delivering a functional product in just 35 days.”
- Chris Cochran, Product Director, Ontario Systems
Breakthrough Software-as-a-Service (SaaS)
Eventsforce
• End-to-end event planning and management solution
• Modular, flexible SaaS offering
• Extremely scalable model – events from tens to thousands of users!
• Extremely elastic model –
– scale up or down during an event
– add or remove functional modules on a live system
• Breakthrough web-based SaaS offering, including mobile app
Breakthrough Software-as-a-Service (SaaS)
Press Computer Systems (PCS) – Social Knowledge
Breakthrough Real-time Perception Management SaaS
Listen | Understand | Engage
Wrap-Up
• Questions?
• You can reach me @ [email protected]