Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Trends and Research Opportunities in
Spatial Big Data Analytics and Cloud Computing
NCSU GeoSpatial Forum
Siva Ravada
Senior Director of Development, Oracle Spatial and MapViewer
Evolving Technology Platforms
• Compass, telescope, sextant, paper maps
• Mainframe computers
• Workstations, GIS applications
• IT revolution, spatial databases
• GeoEnabled Infrastructure:
LiDAR, Mobile, Stream Processing,
Sensors, Cloud Computing
Geographic Information Systems rely on the technology of the era
Disappearing Line between Geospatial Technologies and Information Technologies
[Figure: Mapping, Digital data file, Spatial Information Technology, SOA, Geographic Information Systems]
Latest Technology Trends
• Big Data technology
– Hadoop, MapReduce, Hadoop Distributed File System (HDFS), Apache Spark
• Cloud computing
Big Data Technology Defined
Big Data: techniques and technologies that enable enterprises to effectively and economically analyze all of their data
Big Data Definition
The 3Vs
• Volume (amount of data)
• Velocity (speed of data in and out)
• Variety (range of data types and sources)
– 2001 Meta Group (now Gartner) definition of Big Data
• 4th V: Veracity (uncertainty of data)
– Added by IBM in 2012
Current Viewpoint: Big Data = Hadoop
Emerging Viewpoint: Big Data = Hadoop + Relational + NoSQL…
– 2013 Facebook, 2014 Gartner
Is Big Spatial Data different from Big Data?
How does Big Spatial data fit into GIS?
Big Data Architecture
Layers: Big Data Integration, Big Data Management, Big Data Analytics, Big Data Applications, Data Capital
• Connect and govern any data
• Simplify access to all data
• Discover and predict, fast
• Accelerate data-driven action
Key Factors
• Simplify access to all data
• Discover and predict, fast
• Govern and secure all data
Big Data + Advanced Analytics
• Profile: easily add data and see it automatically and continuously cataloged, enriched, and related
• Find: use familiar guided search across massive amounts of diverse data
• Understand: know what’s important from diagnostic analysis of millions of data characteristics
• Transform: powerful tools to quickly clean up and wrangle dirty data so it’s ready to go
• Discover: uncover valuable new insights
• Collaborate: publish, share, and evolve as you learn more
• Predict: use new insights to define and refine predictive models
Cloud Computing
• Cloud computing enables customers to consume compute resources as a utility
– Just like electricity
– No need to build and maintain computing infrastructure in-house
• Relies on large data centers operated by cloud providers
• Public cloud and private cloud
• Infrastructure as a Service (IaaS): Amazon AWS storage
• Platform as a Service (PaaS): IBM, Oracle, MS Azure
• Software as a Service (SaaS): AWS web services, Oracle, IBM
Elastically Scalable
• Consumers can scale up as needs increase and scale back down as demand decreases
• Elasticity is ideally dynamic and transparent, but scaling can also be an explicit action
– What matters most is that scaling is possible
• Applies to storage, infrastructure, and software
• Elasticity also implies fault tolerance built into the system
– Seamlessly transfer the application state to a backup if the primary fails
• Virtualization software is very important for achieving this goal
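A minimal sketch of an elastic scaling rule, assuming a hypothetical policy that keeps average utilization per node near a target. The `ScalingPolicy` class, its default values, and `desired_nodes` are illustrative names, not part of any cloud provider's API; production autoscalers also add cooldown periods and smoothing.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy: keep average utilization near a target."""
    target_utilization: float = 0.5  # desired average load per node (0..1)
    min_nodes: int = 1
    max_nodes: int = 20


def desired_nodes(current_nodes: int, avg_utilization: float,
                  policy: ScalingPolicy) -> int:
    """Return the node count that brings utilization back toward the target.

    Total demand is approximated as current_nodes * avg_utilization;
    dividing by the target utilization gives the capacity needed. The
    result is clamped so the cluster never scales below min_nodes or
    beyond what the consumer has agreed to pay for (max_nodes).
    """
    demand = current_nodes * avg_utilization
    needed = math.ceil(demand / policy.target_utilization)
    return max(policy.min_nodes, min(policy.max_nodes, needed))


policy = ScalingPolicy()
print(desired_nodes(4, 0.9, policy))   # busy cluster scales up: 8
print(desired_nodes(10, 0.1, policy))  # idle cluster scales down: 2
```

Because the rule only looks at current load, it scales in both directions, which is the "scale up, then scale back down" behavior described above.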
Self-Service Operations
• End users can spin up computing resources for almost any type of workload on demand
– This applies to both storage and compute resources
• Every application- and system-related operation a customer performs should be available through self-service, without filing a service request with the support or cloud operations teams
• This includes:
– managing space (e.g., block store or object store space)
– accessing and analyzing diagnostic logs
– migrating data and metadata from one environment to another
Pay Per Use
• Computing resources are metered at a granular level
– Users pay only for the resources and workloads they actually use
• This is one of the most important drivers of cloud growth
• Consumers can access a very large pool of computing resources when required, without worrying about the cost or management of the underlying hardware
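Pay-per-use billing reduces to summing metered quantities multiplied by unit prices. The sketch below assumes hypothetical resource names and prices (`UNIT_PRICES`); real providers publish their own rate cards and metering granularity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UsageRecord:
    resource: str    # e.g. "compute" or "storage" (illustrative names)
    quantity: float  # metered amount, e.g. node-hours or GB-months


# Hypothetical unit prices; not any provider's actual rates.
UNIT_PRICES = {"compute": 0.10, "storage": 0.02}


def bill(records: list[UsageRecord],
         prices: dict[str, float] = UNIT_PRICES) -> float:
    """Charge only for what was metered: sum of quantity * unit price."""
    total = 0.0
    for r in records:
        if r.resource not in prices:
            raise KeyError(f"no price for resource {r.resource!r}")
        total += r.quantity * prices[r.resource]
    return round(total, 2)


usage = [UsageRecord("compute", 8), UsageRecord("storage", 100)]
print(bill(usage))  # 8 * 0.10 + 100 * 0.02 = 2.8
```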
Big Data vs Cloud Computing
• Is Big Data the same as cloud computing?
• Not really, but the two are tightly related
• Big Data on its own is not affordable for all consumers
– Large infrastructure cost to build cluster computing resources
– Human cost to find trained IT staff to manage them
• Cloud service providers, however, can afford to manage these large computing resources and make slices of them available to consumers
• Cloud computing encompasses many technologies
– Hadoop, MapReduce, relational databases, middleware
Spatial Big Data and Cloud Computing Challenges
Spatial Big Data Challenges
• Geo-tagging data that contains only partial or indirect location references
• Minimize the time it takes to make data available for analysis
• Discover spatial and temporal correlations between different data points
• Keep data loading time minimal so the data is available for use quickly
• Load the data for immediate use, but build spatial indexes over time
• Leverage the code of spatial database applications developed over the years
• Predictive analytics for various applications
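The "load now, index later" idea can be sketched with a uniform grid index built in batches after loading. Data is queryable immediately: a query uses the grid for points already indexed and falls back to a linear scan for the rest. `DeferredGridIndex`, its cell size, and its batch size are hypothetical illustrations, not a specific product's indexing scheme.

```python
from __future__ import annotations

from collections import defaultdict


class DeferredGridIndex:
    """Load points at once; build a uniform-grid index incrementally."""

    def __init__(self, cell_size: float):
        self.cell_size = cell_size
        self.points: list[tuple[float, float]] = []  # raw data, usable at once
        self.cells = defaultdict(list)               # (col, row) -> point ids
        self.indexed_upto = 0                        # points[:indexed_upto] are gridded

    def load(self, pts):
        # Loading is just an append, so the data is queryable right away.
        self.points.extend(pts)

    def _cell(self, x, y):
        # Floor division maps negative coordinates to the correct cell too.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def build_step(self, batch: int = 1000):
        """Index the next batch of points; call repeatedly when the system is idle."""
        end = min(self.indexed_upto + batch, len(self.points))
        for i in range(self.indexed_upto, end):
            self.cells[self._cell(*self.points[i])].append(i)
        self.indexed_upto = end

    def query(self, xmin, ymin, xmax, ymax):
        """Return sorted ids of points inside the box (index + scan)."""
        hits = []
        c0, r0 = self._cell(xmin, ymin)
        c1, r1 = self._cell(xmax, ymax)
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                for i in self.cells.get((c, r), []):
                    x, y = self.points[i]
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(i)
        # Points not yet indexed fall back to a linear scan.
        for i in range(self.indexed_upto, len(self.points)):
            x, y = self.points[i]
            if xmin <= x <= xmax and ymin <= y <= ymax:
                hits.append(i)
        return sorted(hits)
```

Queries return the same answer before, during, and after indexing; only their cost changes as more of the data is covered by the grid.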
Location Infused Technology
• Java, Databases, Applications, Cloud
GeoSpatial Big Data Sources
• Traditional data sources
– Raster (satellite imagery, elevation models, images)
– Vector (road networks, administrative boundaries)
• Machine generated
– Internet of Things
– Social media
– Sensors
– In-vehicle navigation systems (trajectories, traffic information)
– Mobile phones
Extend Spatial Analytics with Cloud and Big Data
Predictive Analysis based on tweets
• How can potential trouble be inferred from tweets?
• Data may have more than one spatial location
– Tweets are posted from one location but may refer to events at a different location
• Find the trend
– “meet at NYC city center at 4PM”
– “protest against climate change”
• Take action, e.g. deploy law enforcement to prevent potential crowd trouble
• Requires new algorithms to find spatial-temporal correlations and predict future events
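A deliberately naive sketch of the trend-finding step: it resolves the place a tweet refers to (which may differ from where it was posted) using a hypothetical toy gazetteer, counts mentions per place and time window, and flags windows at or above a threshold. `GAZETTEER`, the one-hour window, and the threshold are assumptions; a real system would need proper place-name extraction and a statistical notion of "unusual" activity.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Tweet:
    origin: tuple[float, float]  # (lon, lat) where the tweet was posted
    timestamp: int               # seconds since epoch
    text: str


# Hypothetical toy gazetteer: place phrase -> canonical place name.
GAZETTEER = {
    "nyc city center": "NYC City Center",
    "times square": "Times Square",
}


def referenced_places(text: str) -> list[str]:
    """Places the tweet mentions, which may differ from its origin."""
    lowered = text.lower()
    return [name for phrase, name in GAZETTEER.items() if phrase in lowered]


def hotspots(tweets, window_s: int = 3600, threshold: int = 3):
    """Count mentions per (place, time window); return the busy ones."""
    counts = Counter()
    for t in tweets:
        bucket = t.timestamp // window_s
        for place in referenced_places(t.text):
            counts[(place, bucket)] += 1
    return sorted(key for key, n in counts.items() if n >= threshold)
```

The tweet's `origin` is kept but deliberately not used for counting: grouping by the referenced place rather than the posting location is what lets tweets sent from elsewhere contribute to a trend at the event location.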
Spatial Cloud Services
• System Developers (CS background)
– Focus on big data, databases, Spark, enabling data sets, etc.
• Application Developers (GIS background)
– Think about solving bigger problems
– More analysis frameworks and data sets are available now
– No barriers to entry
– Predictive analytics
Precision Farming Example
• Goal: build a predictive analytical model to increase crop yield
• Minimize water resources
• Minimize fertilizer
• Minimize the human capital cost
• Use all available sensor based data sources
– Satellite imagery, ground based sensors, etc.
How to build a precision farming application
• Acquire satellite data as required
• Acquire disk storage for storing the data
• Set up a cluster of machines to do the computations
• Find a scientist to build the models required to do the analytics using the raster data
• Expensive due to hardware, data acquisition and software costs
Do We Need New Spatial Algorithms?
• MapReduce uses data partitioning to achieve high performance
• Can we use divide-and-conquer algorithms without modification?
– Depends on how the data is stored
• New use cases need new algorithms
• The data scientist’s focus should be on analyzing the data
– Storage and data management should be handled by the system
– This should be done via a model-driven architecture
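Per-tile aggregation is the kind of computation that partitions cleanly: the map phase assigns each point to a grid tile, and the reduce phase aggregates each tile independently. The sketch below simulates the map, shuffle, and reduce phases in plain Python (no Hadoop API is used); the tile size and function names are illustrative. Operations such as spatial joins or nearest-neighbor search are harder, because objects near tile boundaries can affect neighboring tiles, which is one reason the answer depends on how the data is stored and partitioned.

```python
from collections import defaultdict
from itertools import chain


def map_to_tile(point, tile_size):
    """Map phase: emit (tile_key, 1) so each tile can be reduced independently."""
    x, y = point
    return (int(x // tile_size), int(y // tile_size)), 1


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_tile(key, values):
    """Reduce phase: aggregate per tile (here, a simple point count)."""
    return key, sum(values)


def density_by_tile(partitions, tile_size=1.0):
    # Each partition could be mapped on a different node; here they run sequentially.
    mapped = chain.from_iterable(
        (map_to_tile(p, tile_size) for p in part) for part in partitions
    )
    return dict(reduce_tile(k, vs) for k, vs in shuffle(mapped).items())
```

For example, `density_by_tile([[(0.5, 0.5), (1.5, 0.2)], [(0.1, 0.9), (1.2, 1.7)]])` counts two points in tile `(0, 0)` and one each in tiles `(1, 0)` and `(1, 1)`, even though the points of tile `(0, 0)` arrive from different input partitions.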
Spatial Cloud Services Development
Data storage and indexing
• Systems should support a few data storage and indexing models
– Vector data
– Raster data
– Sensor data
– Enable spatial and temporal search
– Applications can choose one of the provided storage models based on their data and query requirements
• Provide alternate ways to acquire data as required
– Web services, buy as needed, use from existing sources
• Provide reference data and models
• Free up data scientists to do actual data analysis instead of designing data storage and layout models
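One way to let applications choose a storage model while keeping spatial and temporal search uniform is a shared search interface. In this sketch `SpatioTemporalStore`, `SensorStore`, and `RasterStore` are hypothetical names; each store keeps its own layout, and `find` depends only on the interface. Both stores use a linear filter for brevity, where a real system would use spatial and temporal indexes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Protocol, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes overlap (touching counts)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class SpatioTemporalStore(Protocol):
    """Common search interface every storage model exposes (assumed)."""
    def search(self, bbox: BBox, t0: int, t1: int) -> list: ...


@dataclass
class SensorStore:
    """Point readings stored as (x, y, time, value)."""
    readings: List[Tuple[float, float, int, float]] = field(default_factory=list)

    def search(self, bbox: BBox, t0: int, t1: int) -> list:
        return [r for r in self.readings
                if intersects((r[0], r[1], r[0], r[1]), bbox) and t0 <= r[2] <= t1]


@dataclass
class RasterStore:
    """Image tiles stored as (tile_id, tile_bbox, acquisition_time)."""
    tiles: List[Tuple[str, BBox, int]] = field(default_factory=list)

    def search(self, bbox: BBox, t0: int, t1: int) -> list:
        return [tid for tid, tb, t in self.tiles
                if intersects(tb, bbox) and t0 <= t <= t1]


def find(store: SpatioTemporalStore, bbox: BBox, t0: int, t1: int) -> list:
    # The application depends only on the interface, not on the storage layout.
    return store.search(bbox, t0, t1)
```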
Geo-Spatial Big Data Management
• Use once
– Data is loaded into the data store and analyzed once
– Extract summary or intelligence once and use it in other places
• Use many times
– Query the data to answer different types of questions
– Produce new data products
Data Analytics Challenge
Separate silos of information to analyze
Data Analytics Challenge
Separate data access interfaces
What Does Simplification Mean for Spatial Big Data Analytics?
• Before: Data Science PhD
• After: Anyone, using web service APIs
Spatial Cloud Services Application Development
Advanced Analytics
Bring the Analytics to the Data
• Understand the data
• Decipher the data to uncover hidden patterns that can be used for better decisions
• Understand hidden correlations and use these relationships to solve business problems
• Predict future outcomes based on observed data before they happen
• Use predictive analytics, machine learning, and data mining techniques on big data
Two Types of Analytical Approaches
• Reactive
– Collect large volumes of data from event logs, web logs, etc.
– Process, analyze, and extract summaries from the data
– Feed the summary data into a traditional data warehouse (DW) system
• Proactive
– Process the data as it comes in to find correlations
– Determine whether the patterns in the new data mean something
– Initiate actions based on perceived patterns
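The difference between the two approaches can be sketched in a few lines: the reactive function summarizes a batch after the fact, while the proactive monitor evaluates each value as it arrives and raises an alert immediately. The moving-average rule, window size, and factor are hypothetical choices for illustration only.

```python
from collections import deque
from statistics import mean


def reactive_summary(events):
    """Reactive: after a batch has been collected, extract a summary
    (e.g. for loading into a data warehouse)."""
    return {"count": len(events), "mean": mean(events) if events else None}


class ProactiveMonitor:
    """Proactive: inspect each value as it arrives and react at once.

    Flags a value greater than `factor` times the moving average of the
    previous `window` values (a hypothetical anomaly rule).
    """

    def __init__(self, window: int = 5, factor: float = 2.0):
        self.recent = deque(maxlen=window)
        self.factor = factor

    def observe(self, value: float) -> bool:
        # No history yet means nothing to compare against, so no alert.
        alert = bool(self.recent) and value > self.factor * mean(self.recent)
        self.recent.append(value)
        return alert


monitor = ProactiveMonitor(window=3)
print([monitor.observe(v) for v in [10, 11, 9, 30, 10]])
# [False, False, False, True, False]: the spike is caught as it arrives
```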
Precision Farming Example
• Multi-band raster data
– RGB
– Thermal
– Vegetation
• Analyze the thermal band for vegetation properties
• Compute NDVI (Normalized Difference Vegetation Index) models
• Results can be used to model
– Water requirements for different parts of the farm
– Growth indicators
– Fertilizer schedules
– Identify areas of poor growth (caused by pests)
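NDVI is computed per pixel as NDVI = (NIR - Red) / (NIR + Red), which yields values in [-1, 1], with healthy vegetation typically showing higher values. NDVI uses the near-infrared and red bands rather than the thermal band. The sketch below assumes the two bands are already co-registered 2-D lists of reflectance values; the 0.3 threshold in `low_vigor_cells` is a hypothetical cutoff for flagging areas of poor growth, not an agronomic standard.

```python
from typing import List, Tuple

Band = List[List[float]]  # 2-D grid of reflectance values, row-major


def ndvi(nir: Band, red: Band) -> Band:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red).

    Pixels where NIR + Red == 0 (e.g. no-data) are set to 0.0 here;
    a real pipeline would more likely mark them explicitly as no-data.
    """
    out = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            row.append((n - r) / total if total else 0.0)
        out.append(row)
    return out


def low_vigor_cells(index: Band, threshold: float = 0.3) -> List[Tuple[int, int]]:
    """(row, col) of pixels whose NDVI falls below a hypothetical threshold,
    i.e. candidate areas to inspect for pests, water stress, etc."""
    return [(i, j) for i, row in enumerate(index)
            for j, v in enumerate(row) if v < threshold]


nir = [[0.5, 0.4], [0.3, 0.0]]
red = [[0.1, 0.3], [0.3, 0.0]]
print(low_vigor_cells(ndvi(nir, red)))  # [(0, 1), (1, 0), (1, 1)]
```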
Application Development in Spatial Cloud
• Before: application development directly against separate databases
• After: application development through Spatial Cloud APIs
Breaking Barriers with Cloud Computing
• Cloud computing is changing the way systems are built
• No more proprietary data silos
– A better result than what OGC/ISO standards have achieved in this respect
• No more closed systems
• Traditional software development paradigms are changing
• On-premise cloud will replace most on-premise proprietary systems