Cloud Computing
and Big Data
Karl Benedict
Earth Data Analysis Center, University Libraries, Department of Geography
University of New Mexico
[email protected]
An Architecture Designed for Scalability
Definitions of Cloud Computing and Big Data
Development of an extensible Services Oriented Architecture for Research Data management, discovery & access
Use of cloud-compatible software components and
Cloud Computing?
Mell P & Grance T (2011) The NIST Definition of Cloud Computing - Recommendations of the National Institute of Standards and Technology. (National Institute of Standards and Technology, Computer Science Division, Information Technology Laboratory, Gaithersburg, MD), p 7.
On-demand self-service
Broad network Access
Resource Pooling
Rapid Elasticity
Measured Service
Software as a Service (SaaS)
Platform as a Service (PaaS)
Infrastructure as a Service (IaaS)
Private Cloud
Community Cloud
Public Cloud
Hybrid Cloud
Cloud Computing?
Current Characteristics and Capabilities
On-demand self-service
Broad network Access
Resource Pooling
Software as a Service (SaaS)
Private Cloud
Future Characteristics and Capabilities
Rapid Elasticity
Measured Service
Hybrid Cloud
Big Data
Many Problems and Solutions
* http://www.opentracker.net/article/25-definitions-big-data
NASA - Goddard Space Flight Center
Big Data
Sample Solutions
Horizontal vs. vertical scaling
Unstructured and semi-structured data models
Parallelism
Linked analytics
Machine learning
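Two of the sample solutions above, horizontal scaling and parallelism, can be illustrated together. The sketch below is a minimal illustration, not EDAC's implementation: it hash-partitions semi-structured records across a fixed number of shards and fans the per-shard work out to a worker pool. The record layout, the `id` and `points` field names, and all function names are assumptions made for the example. A thread pool is used to keep the example self-contained; CPU-bound work would more likely use processes or separate machines.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard with a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def partition(records: list[dict], shard_count: int) -> list[list[dict]]:
    """Split records across shards by their (hypothetical) 'id' field."""
    shards: list[list[dict]] = [[] for _ in range(shard_count)]
    for record in records:
        shards[shard_for(record["id"], shard_count)].append(record)
    return shards


def count_points(shard: list[dict]) -> int:
    """Per-shard work: total the optional 'points' lists in one shard."""
    return sum(len(record.get("points", [])) for record in shard)


def parallel_point_count(records: list[dict], shard_count: int = 4) -> int:
    """Partition the records, process each shard in a worker, combine results."""
    shards = partition(records, shard_count)
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        return sum(pool.map(count_points, shards))


# Semi-structured input: records do not all carry the same fields.
records = [
    {"id": "a", "points": [1, 2, 3]},
    {"id": "b", "kind": "raster"},
    {"id": "c", "points": [4]},
]
total = parallel_point_count(records)
```

Adding capacity in this model means raising `shard_count` and adding workers (horizontal scaling) rather than moving to a larger single machine (vertical scaling).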
Big Data
An Aligned Definition
“An easily scalable system of unstructured data with accompanying tools that can efficiently pull structured datasets.” *
* Definition provided by John Denver as a comment on the FCW blog post of 4/15/2013 entitled “Sketching the big picture on big data”
EDAC’s Data Challenge
A Snapshot
RGIS
• 120,031 datasets
• 12,171 files
• 11,808 vectors
• 96,052 rasters
NM EPSCoR
• 281,315 datasets
• 9,054 files
• 169,981 vectors
• 102,280 rasters
10.5TB on disk
140GB MongoDB (22GB Postgres)
114 million documents
1.115 billion data points
1.5 million discrete, downloadable data objects
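As a quick consistency check on the snapshot, the per-type counts (files, vectors, rasters) add up to the dataset totals reported for each collection. The short sketch below redoes that arithmetic; the dictionary layout and names are assumptions for illustration, and the numbers are copied from the slide.

```python
# Counts copied from the snapshot; the structure is illustrative only.
collections = {
    "RGIS": {"datasets": 120_031, "files": 12_171, "vectors": 11_808, "rasters": 96_052},
    "NM EPSCoR": {"datasets": 281_315, "files": 9_054, "vectors": 169_981, "rasters": 102_280},
}


def breakdown_total(counts: dict) -> int:
    """Sum the per-type counts for one collection."""
    return counts["files"] + counts["vectors"] + counts["rasters"]


# True for a collection when its per-type counts sum to its dataset total.
consistent = {
    name: breakdown_total(counts) == counts["datasets"]
    for name, counts in collections.items()
}

# Datasets across both collections: 120,031 + 281,315 = 401,346.
total_datasets = sum(counts["datasets"] for counts in collections.values())
```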
Metadata: FGDC, FGDC-RSE, ISO 19115-2 / 19139, ISO 19119, ISO 19110
Vectors: SHP, KML, GML, GeoJSON, JSON, CSV, XLS
Rasters: GeoTIFF, IMG, SID, ECW, DEM, ASCII
Files: ZIP, HTML, DOC/DOCX, GZ, XLS/XLSX, PPT/PPTX
Services: WMS, WFS, WCS
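The Services entries (WMS, WFS, WCS) are OGC web service standards that clients call over HTTP with key-value parameters. As a hedged illustration of what such a request looks like, the sketch below builds a WMS 1.1.1 GetMap URL. The parameter names come from the WMS standard; the base URL, layer name, and bounding box are placeholders invented for the example and do not refer to an actual EDAC endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wms_getmap_url(base_url, layer, bbox, width=512, height=512,
                   image_format="image/png"):
    """Build a WMS 1.1.1 GetMap request URL.

    bbox is (min_x, min_y, max_x, max_y) in the given SRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty string requests the default style
        "SRS": "EPSG:4326",      # WGS 84 longitude/latitude
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and layer; the bounding box is only illustrative.
url = wms_getmap_url(
    "https://example.org/ows",
    layer="example_layer",
    bbox=(-109.05, 31.33, -103.0, 37.0),
)

# Parse the query string back to confirm what a server would receive.
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
```

WFS and WCS requests follow the same pattern with their own `SERVICE` and `REQUEST` values, which is part of what makes these services straightforward to place behind a shared REST layer.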
EDAC’s Data Challenge
Our Solution
• PostgreSQL/PostGIS
  • Dataset metadata
  • Spatial data
• MongoDB
  • Vector attribute data
• File system
  • Rasters
  • Compressed datasets
  • Dataset cache
[Architecture diagram, by tier]
Client Applications: HTTP/HTTPS
Services: REST Services (load-balanced application cluster) with OGC Services (WMS, WFS, WCS), Search, Download Data, Stream Data, Admin, DataONE API, CUAHSI API; Geoportal Server (OAI-PMH/OGC-CSW/z39.50/HTTP); WAF
Data Management: MongoDB (sharded replica set); PostgreSQL/PostGIS (load-balanced PG-Pool); File System; Metadata
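The storage split on this slide (PostgreSQL/PostGIS for dataset metadata and spatial data, MongoDB for vector attribute data, the file system for rasters, compressed datasets, and the dataset cache) can be read as a routing table. The sketch below is one way to express that table in code, under the assumption that each part of a dataset is tagged with a content kind. The kind names, the `Backend` enum, and both functions are hypothetical and are not taken from EDAC's code.

```python
from enum import Enum


class Backend(Enum):
    POSTGIS = "PostgreSQL/PostGIS"
    MONGODB = "MongoDB"
    FILE_SYSTEM = "File system"


# Routing table mirroring the bullets on the slide.
STORAGE_FOR = {
    "dataset_metadata": Backend.POSTGIS,
    "spatial_data": Backend.POSTGIS,
    "vector_attribute_data": Backend.MONGODB,
    "raster": Backend.FILE_SYSTEM,
    "compressed_dataset": Backend.FILE_SYSTEM,
    "dataset_cache": Backend.FILE_SYSTEM,
}


def backend_for(content_kind: str) -> Backend:
    """Return the backend that stores the given kind of content."""
    try:
        return STORAGE_FOR[content_kind]
    except KeyError:
        raise ValueError(f"unknown content kind: {content_kind!r}") from None


def plan_storage(content_kinds: list[str]) -> dict[Backend, list[str]]:
    """Group the parts of one dataset by the backend that will hold them."""
    plan: dict[Backend, list[str]] = {}
    for kind in content_kinds:
        plan.setdefault(backend_for(kind), []).append(kind)
    return plan


# A vector dataset typically touches all three backends in this model.
vector_plan = plan_storage(
    ["dataset_metadata", "spatial_data", "vector_attribute_data", "compressed_dataset"]
)
```

Keeping the routing in one table means each backend can be scaled on its own (a sharded MongoDB replica set, a pooled PostgreSQL cluster, shared file storage) without the service layer changing.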
Cloud Computing &
Big Data? (Now)
[Storage summary and architecture diagram repeated from the "Our Solution" slide]
Cloud Computing &
Big Data? (Future)
[Storage summary and architecture diagram repeated from the "Our Solution" slide]