© Limit Point Systems, Inc. 2014
Big Data Complexities for Scientific Computing in the Oil and Gas Industry
noSQL, SQL, and mo’SQL
David M. Butler, President, Limit Point Systems, Inc.
Outline
Big Data in oil & gas exploration & production
Field theory for data scientists
The data model paradigm
The sheaf data model
The oil and gas business
“Upstream” is exploration and production (“E&P”) (upper left)
“Downstream” is transportation, refining, and marketing (lower right)
Major Acquired “Upstream” Data Types
Time lapse raw seismic
Time lapse prestack seismic image
Time lapse poststack seismic image
Well logs
Production monitoring
Time lapse raw seismic data
each sensor gives amplitude as a function of time
~10K sensors, moving towards ~1M
~10K shots
~5K samples/shot
~4-12 bytes/sample
time lapse: repeat ~2/year for ~10 years
~10TB/project * ~100 projects/year/major company = ~1PB/year/major
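The per-project figure can be sanity-checked with a few lines of arithmetic. The sketch below multiplies the round numbers from the slide. It assumes that every sensor records every shot and that "time lapse" means about 20 repeat surveys; neither assumption is stated explicitly in the slide.

```python
# Order-of-magnitude estimate of raw time-lapse seismic volume,
# using the round figures quoted on the slide.
sensors = 10_000           # ~10K sensors
shots = 10_000             # ~10K shots per survey
samples_per_shot = 5_000   # ~5K samples per sensor per shot
surveys = 2 * 10           # assumed: ~2 surveys/year for ~10 years

TB = 10**12                # decimal terabyte


def survey_bytes(bytes_per_sample):
    """Bytes in one survey, assuming every sensor records every shot."""
    return sensors * shots * samples_per_shot * bytes_per_sample


for bps in (4, 12):
    one = survey_bytes(bps)
    print(f"{bps} bytes/sample: {one / TB:.0f} TB/survey, "
          f"{one * surveys / TB:.0f} TB over {surveys} surveys")
```

Under these assumptions a single survey comes to a few terabytes, the same order of magnitude as the ~10TB/project quoted above.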
Time lapse prestack seismic image data
clean up seismic data
remove noise
remove artifacts
other signal processing operations
“migrate” data
focus signal energy
convert time to position
up to 5D array of data
reflectivity as a function of
3D position
source-sensor 2D offset
Poststack seismic image data
“stack” of prestack data
aggregate over 1 or more array indices
reduces size ~100x
2D or 3D image
reflectivity as function of position
similar to medical ultrasound image
interpret to produce model of subsurface
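Aggregating over an array index can be illustrated with a toy gather. The sketch below uses a plain mean over the offset index as the aggregate. That choice, the function name `stack`, and the numbers are assumptions for illustration only; production stacking involves more processing than this.

```python
from statistics import fmean


def stack(gather):
    """Average a prestack gather over its offset index.

    gather[i][k] is the reflectivity at position i for offset k;
    the result keeps one value per position.
    """
    return [fmean(traces) for traces in gather]


# Toy gather: 3 positions, 4 offsets each (made-up numbers).
gather = [
    [1.0, 1.5, 0.5, 1.0],
    [2.0, 2.0, 2.0, 2.0],
    [0.0, -1.0, 1.0, 0.0],
]
stacked = stack(gather)
print(stacked)
```

The 12 input values collapse to 3, one per position. The ~100x reduction quoted above follows the same logic applied across many more offsets.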
Well logs
lower sensor package into well
measure various properties as a function of depth
~10K samples
~1K components
simple numbers
bore hole images
others
typically done once, before production starts
~100MB/well * ~1K wells/year/major = ~100GB/year/major
Production monitoring
Classical methods at well head
flow volumes
gas/oil/water composition
temperature
pressure
Distributed sensing methods
fiber optic cables in well
acoustic sensing
temperature sensing
~1000 equivalent discrete sensors
~1k samples/sec
continuous monitoring
~10-100GB/day/well
function of time and position along well path
~1K wells (growing rapidly)
~1PB/year/major
[epmag 2]
Major interpreted/modeled data types
Geological structure model
Velocity model
Basin model
Reservoir models
geological
quantitative
engineering
Geomechanical model
Geological structure model
~1K structures/year/major
~1TB/year/major
geologist interprets seismic image
identifies surfaces defining rock strata and faults
very complex networks of intersecting surfaces
iterative process
seismic image depends on acoustic velocity
acoustic velocity depends on rock type
rock type interpreted from seismic image and well data
Velocity model
velocity of sound as a function of position in volume corresponding to geological structure
scalar, vector, or tensor models
used to produce seismic images
accurate velocity model key to good seismic image
~1-10GB/model
~1K models/year/major
~1TB/year/major
[geosoft]
Basin model
dynamic model of entire sedimentary basin
rock movement
fluid movement
study history of hydrocarbon deposits
generation
expulsion
migration to reservoir
entrapment
useful in predicting whether structure contains oil or gas
~100GB/model * ~100 models/year/major = ~10TB/year/major
Reservoir models
static models
prior to production
estimate volume and other properties
dynamic models
fluid flow
fluid composition
function of position and time
used to guide drilling & production
keep wells producing
~100GB/project
many fields, many versions/year/major
~100TB/year/major
Geomechanical model
simulation of mechanical stresses and strains
whole subsurface
specific reservoirs
stress, strain, deformation as function of position and time
used to anticipate mechanical changes around bore hole and in reservoir
~1-10GB/model
~100 models/year/major
~100GB/year/major
(Order of magnitude estimates)
Variety                  Volume (/object)   Velocity (/year/major)
Raw seismic              ~1TB               ~1PB
Prestack seismic         ~1TB               ~1PB
Poststack seismic        ~10GB              ~10TB
Well logs                ~100MB/well        ~100GB
Production monitoring    ~10GB              ~1PB
Geological structure     ~1GB               ~1TB
Velocity model           ~1GB               ~1TB
Basin model              ~100GB             ~10TB
Reservoir models         ~100GB             ~100TB
Geomechanical model      ~1GB               ~100GB
dozens of other data types, all important
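To see what the velocity column adds up to, the order-of-magnitude figures can simply be totalled. The sketch below copies the numbers from the table and uses decimal units (1PB = 1000TB = 1,000,000GB). The dictionary name and the "top three" summary are illustrative additions, not part of the original estimate.

```python
GB, TB, PB = 1, 1_000, 1_000_000  # sizes expressed in GB (decimal units)

# Velocity column of the table: estimated new data per year per major.
per_year_gb = {
    "Raw seismic": 1 * PB,
    "Prestack seismic": 1 * PB,
    "Poststack seismic": 10 * TB,
    "Well logs": 100 * GB,
    "Production monitoring": 1 * PB,
    "Geological structure": 1 * TB,
    "Velocity model": 1 * TB,
    "Basin model": 10 * TB,
    "Reservoir models": 100 * TB,
    "Geomechanical model": 100 * GB,
}

total_gb = sum(per_year_gb.values())
top3 = sorted(per_year_gb, key=per_year_gb.get, reverse=True)[:3]
share = sum(per_year_gb[k] for k in top3) / total_gb
print(f"total ~{total_gb / PB:.1f} PB/year; top three types: {share:.0%}")
```

On these estimates the total is about 3PB/year/major, almost all of it from the two seismic types and production monitoring.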
Upstream Data Flow (partial)
complex interoperation between data types
Shared Earth Model concept
integrated data base for evolving models of subsurface
all data types
multiple scales
structure, reservoir, basin
multiple interpretations and versions per object
uncertainty quantification for everything
provenance for everything
constantly evolving
holy grail of Exploration and Production (“E&P”) data integration
in practice: still mostly vendor proprietary islands of integration
conventional enterprise data warehouse
analysis and report oriented rather than transaction oriented
integrates data from many different applications
Extract-Transform-Load (“ETL”) processes are a critical component
conventional warehouse and ETL
relational data model provides conceptual framework
Shared Earth Model for E&P data
relational data model has not proven particularly useful
why not?
Outline
Big Data in oil & gas exploration & production
Field theory for data scientists
The data model paradigm
The sheaf data model
Field Theory for Data Scientists
physicist’s “field” not same as database admin’s “field”
field describes some physical property as function of position
and/or time in some physical object
position in a physical object
physical property
physical property as a function of position
A simple example
[Figure: branched well, with labeled parts: well, upper well, lower well, bore 1, bore 2, junction, derrick floor]
position in a physical object
position represented by coordinate vector $\vec{r} = \begin{pmatrix} x(p) \\ y(p) \end{pmatrix}$
[Figure: point p in the object mapped to coordinates (x(p), y(p)) in R2]
Physical property
physical property types specified by mathematical physics
family of types jointly referred to as multilinear algebra
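scalar types
single number $F$
vector types
column of numbers $\vec{F} = \begin{pmatrix} F_0 \\ F_1 \end{pmatrix}$
tensor types
matrix of numbers $\overleftrightarrow{F} = \begin{pmatrix} F_{00} & F_{01} \\ F_{10} & F_{11} \end{pmatrix}$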
each has important algebraic properties
Physical property as a function of position
function (map) from physical
space to property space
associates a value of F with
each p in the object
$\overleftrightarrow{F}(\vec{r}) = \begin{pmatrix} F_{00}(x, y) & F_{01}(x, y) \\ F_{10}(x, y) & F_{11}(x, y) \end{pmatrix}$
infinite number of points
infinite number of property
values
how do we represent this on the computer?
[Figure: each point p in the object, with coordinates (x(p), y(p)) in R2, is associated with a tensor value]
How do we represent a field on the computer?
numerous methods
small industry busy creating new methods
makes interoperation and integration difficult
some common features
decompose physical object into simple pieces
Decompose physical object into simple pieces
mathematicians call each piece a “cell”
decomposition is a “cell complex”
more commonly called a “mesh”
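[Figure: branched well decomposed into a mesh of segments s0 to s5 and vertices df, v1, j, v3, v4, v5, v6, where df is the derrick floor and j the junction]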
Approximate by simple function on each cell
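for each cell c:
store a data tuple
specify an evaluation method
evaluation method
$F(p) = \mathrm{eval}_{c(p)}(p, \text{data tuple})$, where c(p) is the cell containing p
data tuple may or may not correspond to value of field at some point
depends on evaluation method
example: linear interpolation
value(p) = u*F1 + (1-u)*F0, where u = u(p) is the local coordinate of p in its cell, 0 at vertex v0 and 1 at vertex v1
data for entire field is an array of tuples
[Figure: linear interpolation on one cell, showing F0 at v0, F1 at v1, and value(p) at local coordinate u(p)]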
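The linear interpolation example can be written out as a minimal sketch for a 1D mesh. The class name `LinearField1D`, its methods, and the convention that each cell's tuple holds the values at its two end vertices are assumptions made for illustration; they are not taken from any particular library.

```python
from bisect import bisect_right


class LinearField1D:
    """Scalar field on a 1D mesh, linearly interpolated on each cell.

    vertices: increasing vertex coordinates v0 < v1 < ... < vn
    data:     one (F0, F1) tuple per cell (assumed: values at the
              cell's left and right vertices)
    """

    def __init__(self, vertices, data):
        assert len(data) == len(vertices) - 1
        self.vertices = vertices
        self.data = data

    def cell_of(self, p):
        # c(p): index of the cell containing p; the last cell owns the right end
        if not self.vertices[0] <= p <= self.vertices[-1]:
            raise ValueError("p lies outside the mesh")
        return min(bisect_right(self.vertices, p) - 1, len(self.data) - 1)

    def value(self, p):
        c = self.cell_of(p)
        v0, v1 = self.vertices[c], self.vertices[c + 1]
        u = (p - v0) / (v1 - v0)        # local coordinate u(p) in [0, 1]
        f0, f1 = self.data[c]
        return u * f1 + (1 - u) * f0    # value(p) = u*F1 + (1-u)*F0


# Two cells, [0, 1] and [1, 3], with made-up data tuples.
field = LinearField1D([0.0, 1.0, 3.0], [(10.0, 20.0), (20.0, 0.0)])
print(field.value(0.5), field.value(2.0))
```

Only two tuples are stored, yet the field has a value at every point of the mesh. A different evaluation method would give the same tuples a different meaning.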
Data for entire field is an array of tuples
tuple components typically real (float or double) but may be of any type
          cell 0                     cell 1                     ...   cell n-1
scalar    F0                         F1                         ...   Fn-1
vector    F0,0 F1,0                  F0,1 F1,1                  ...   F0,n-1 F1,n-1
tensor    F00,0 F01,0 F10,0 F11,0    F00,1 F01,1 F10,1 F11,1    ...   F00,n-1 F01,n-1 F10,n-1 F11,n-1
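One way to hold such an array of tuples is a single flat buffer with a fixed tuple size per cell. The sketch below uses Python's standard `array` module. The helper names and the row-major ordering of tensor components within a cell are assumptions, not something the slide specifies.

```python
from array import array


def make_storage(n_cells, tuple_size):
    """Flat array of doubles holding one fixed-size tuple per cell."""
    return array("d", [0.0] * (n_cells * tuple_size))


def set_tuple(storage, tuple_size, cell, values):
    """Overwrite the tuple belonging to one cell."""
    assert len(values) == tuple_size
    start = cell * tuple_size
    storage[start:start + tuple_size] = array("d", values)


def get_tuple(storage, tuple_size, cell):
    """Read back the tuple belonging to one cell."""
    start = cell * tuple_size
    return tuple(storage[start:start + tuple_size])


# 2x2 tensor field on 3 cells: 4 components per cell,
# stored as F00, F01, F10, F11 (assumed ordering).
TENSOR = 4
store = make_storage(3, TENSOR)
set_tuple(store, TENSOR, 1, [1.0, 0.5, 0.5, 2.0])
print(get_tuple(store, TENSOR, 1))
```

The same three helpers serve scalar fields (tuple size 1) and vector fields (tuple size 2); only the tuple size changes.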
How do we want to use field data?
operations specified by mathematical physics
five main categories
topological operations
compose and decompose
geometric operations
change the shape
functional operations
set and get the value at a point
move field from one mesh to another
algebraic operations
add, subtract, multiply, divide, diagonalize, ...
calculus operations
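Two of these categories can be sketched on a 1D mesh: an algebraic operation (adding two fields) and two functional operations (getting a value at a point, and moving a field to another mesh). The function names and the choice of storing one value per vertex with linear interpolation in between are assumptions for illustration. The sketch also assumes every query point lies inside the mesh.

```python
from bisect import bisect_right


def evaluate(vertices, values, p):
    """Functional op: get the value at point p by linear interpolation."""
    c = min(bisect_right(vertices, p) - 1, len(vertices) - 2)
    u = (p - vertices[c]) / (vertices[c + 1] - vertices[c])
    return u * values[c + 1] + (1 - u) * values[c]


def add(values_a, values_b):
    """Algebraic op: pointwise sum of two fields stored on the same mesh."""
    return [a + b for a, b in zip(values_a, values_b)]


def push(vertices, values, new_vertices):
    """Functional op: move a field to another mesh by sampling at its vertices."""
    return [evaluate(vertices, values, p) for p in new_vertices]


coarse = [0.0, 2.0, 4.0]            # coarse mesh vertices
f = [0.0, 4.0, 0.0]                 # field f on the coarse mesh
g = [1.0, 1.0, 1.0]                 # field g on the coarse mesh
fine = [0.0, 1.0, 2.0, 3.0, 4.0]    # refined mesh vertices

h = add(f, g)
h_fine = push(coarse, h, fine)
print(h, h_fine)
```

Each function takes and returns whole fields, not individual tuples, which is the level at which the operations above are specified.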
Why hasn’t the relational model proven useful for field data?
doesn’t fit the way we want to store field data
relational schema can’t directly capture field entity
captures data tuple entity instead of entire field entity
field entity has to be reconstructed by queries
normalization forces introduction of surrogate keys
may require recursive queries
doesn’t fit the way we want to use field data
table operations are too low level
aren’t useful for high level field operations
no pay-off to using relational model
most field data is stored in app-specific, proprietary flat files
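The mismatch shows up as soon as a small field is put into tables. The sketch below uses Python's built-in `sqlite3` module. The table names, column names, and the "temperature" field are hypothetical, chosen only to show surrogate keys and the reconstruction query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE field (field_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cell  (cell_id  INTEGER PRIMARY KEY, v0 REAL, v1 REAL);
    -- one row per (field, cell, component); all ids are surrogate keys
    CREATE TABLE field_data (
        field_id  INTEGER REFERENCES field(field_id),
        cell_id   INTEGER REFERENCES cell(cell_id),
        component INTEGER,
        value     REAL,
        PRIMARY KEY (field_id, cell_id, component)
    );
""")
con.execute("INSERT INTO field VALUES (1, 'temperature')")
con.executemany("INSERT INTO cell VALUES (?, ?, ?)",
                [(0, 0.0, 1.0), (1, 1.0, 3.0)])
con.executemany(
    "INSERT INTO field_data VALUES (1, ?, ?, ?)",
    [(0, 0, 10.0), (0, 1, 20.0), (1, 0, 20.0), (1, 1, 0.0)],
)

# The field never appears as one entity; a query has to reassemble it.
rows = con.execute("""
    SELECT d.cell_id, d.component, d.value
    FROM field_data AS d JOIN field AS f USING (field_id)
    WHERE f.name = 'temperature'
    ORDER BY d.cell_id, d.component
""").fetchall()

# Regroup the flat rows into one data tuple per cell.
tuples = {}
for cell_id, component, value in rows:
    tuples.setdefault(cell_id, []).append(value)
print(tuples)
```

Even after the query, the result is only a bag of numbers. The evaluation method and every higher-level field operation still have to live in application code outside the database.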
Outline
Big Data in oil & gas exploration & production
Field theory for data scientists
The data model paradigm
The sheaf data model
The data model paradigm
Data model [Codd] specifies
class of mathematical objects
operations on those objects
constraints valid instances must satisfy
Languages, libraries, tools based on data model
Applications developed on top of tools
Benefits of data model paradigm
Increases level of abstraction for application development
Increases capability of applications
Facilitates interoperation and integration
Increases productivity of programmers
But …
Benefits only accrue if model captures application structure
The more structure captured the bigger the benefit
Structure captured by various data models
most noSQL models capture less structure than relational
the “no” in noSQL should perhaps be “less”
scientific apps have way more mathematical structure
relational model isn’t nearly structured enough
scientific apps don’t need no Structured Query Language
need a (much) more Structured Query Language: mo’SQL
Data model/mo’SQL requirements
must capture common math structure of scientific data
scalars, vectors, tensors
topology and geometry
fields
algebra and calculus operations
must describe how math entities are represented/stored
decomposition into primitive types and operations
decomposition for parallelism
must maintain rigorous connection between high level
semantics and low level implementation
Outline
Big Data in oil & gas exploration & production
Field theory for data scientists
The data model paradigm
The sheaf data model
Sheaf data model
objects are discrete sheaves over finite distributive lattices
math details:
http://www.limitpoint.com/images/Publications/The%20Sheaf%20Data%20Model.pdf
finite distributive lattice
“part space”
all distinct composite parts formed from set of basic parts
discrete sheaf
describes association of attributes with parts
algebraic description of decomposition of abstract data types into
Visualizing a finite distributive lattice
directed acyclic graph
“Hasse diagram”
two kinds of nodes
composite parts
basic parts
links represent “covers”
covers := immediately includes
A covers B if and only if
A includes B (and A is not B)
there is no C, distinct from A and B, such that A includes C includes B
draw graph so that if A covers B, B is lower on page
[Figure: example Hasse diagram in which composite part A covers basic part B and basic part C]
Example: branched well
[Figure: Hasse diagram of the branched well, with nodes well, upper well, lower well, junction, bore 1, bore 2, and df (derrick floor)]
basic parts are independent objects
composite parts are precisely the sum of their basic parts
[Figure: well parts: well, upper well, lower well, junction, bore 1, bore 2, derrick floor]
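The covering relation drawn in a Hasse diagram can be computed directly from part inclusion. The sketch below represents each part as the set of basic parts it sums. The part names come from the slide, but the grouping of basic parts into "upper well" and "lower well" is a hypothetical one, since the text does not spell it out. Only the named parts are included; the full lattice would contain every distinct union of basic parts.

```python
# Hypothetical decomposition: the slide names the parts but not this grouping.
BASIC = ["derrick floor", "junction", "bore 1", "bore 2"]

# Each part is the set of basic parts it is the sum of.
parts = {name: frozenset([name]) for name in BASIC}
parts["upper well"] = frozenset(["derrick floor", "junction"])
parts["lower well"] = frozenset(["junction", "bore 1", "bore 2"])
parts["well"] = frozenset(BASIC)


def includes(a, b):
    """A includes B: B's basic parts are a proper subset of A's."""
    return parts[b] < parts[a]


def covers(a, b):
    """A covers B: A includes B with no part C strictly in between."""
    return includes(a, b) and not any(
        includes(a, c) and includes(c, b) for c in parts
    )


# Links of the Hasse diagram: for each part, the parts it covers.
hasse = {a: sorted(b for b in parts if covers(a, b)) for a in parts}
for a, below in hasse.items():
    print(f"{a} covers {below}")
```

Under this grouping, "well" covers only "upper well" and "lower well", and the basic parts appear one level further down.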
Sheaf table metaphor
data base is a set of tables
each table represents a type
each row an instance
each column an attribute
rows carry client-defined
lattice order
col lattice is row lattice of
some other table
schemas are first-class objects
Unified framework for scientific data types
tabular types
contains relational model as limiting case
row lattice is a boolean lattice
physical property types
scalars, vectors, tensors
object-oriented types with multiple inheritance
col lattice is subobject inclusion hierarchy
spatial types (meshes)
any decomposition of space
row lattice represents spatial inclusion
field types
any property, any mesh, any evaluation method
col lattice = tensor(mesh row lattice, property col lattice)
rigorous connection between abstract math types and numeric reps
from high level specification to tuples of primitives
Open Source Implementation
SheafSystem™ Community Edition
C++ libraries with Java, Python, and C# bindings
www.sheafsystem.org or github
[Figure: library architecture, top layer to bottom layer]
Field API: field types, refiners, pushers
Geometry API: point locators, coordinate sections (invertible sections)
Fiber Bundle Data Model API: spatial types, section types, physical property types (tensors, Jacobians, groups)
Sheaf Data Model API: sheaf, storage agent, HDF5
Outline
Big Data in oil & gas exploration & production
Field theory for data scientists
The data model paradigm
The sheaf data model
Query language for sheaf data model
work in progress
with Prof Magne Haveraaen
Bergen Language Design Laboratory, University of Bergen
started with initial guess at operators
extension of relational operators
experience with implementation
formalizing and refining definitions
Acknowledgements
Mark Verschuren, Shell, provided many useful comments and
other input for this presentation
Original research and development funded by subcontracts
B347785, B515090, and B560973 of prime contract
W-7405-ENG-48 with the Department of Energy National Nuclear
Security Administration (DOE/NNSA)
Ongoing development has been funded by Shell
GameChanger and Shell TaCIT
References 1
[Krebbers] “Big Data & Analytics: Exploiting it”, Johan Krebbers, VP Architecture, Shell
http://cdn.osisoft.com/corp/en/media/presentations/2013/UsersConference2013/PDF/UC2013_Shell_Krebbers_GlobalITArchitecture_1.pdf
[KrisEnergy] http://www.krisenergy.com/company/about-oil-and-gas/exploration/
[epmag 1] http://www.epmag.com/Exploration-Geology-Geophysics/Three-D-Seismic-Advances-Improve-Exploration-Success_90469
[decogeo] http://www.decogeo.com/upload/Image/log1_bigl.jpg
References 2
[epmag 2] http://www.epmag.com/item/DAS-enables-simultaneous-multiwell-VSP_121593
[slb 1] http://www.slb.com/resources/case_studies/completions/~/media/Images/completions/intelligent/wellwatcher_neon_tp_01tn.jpg
[slb 2] System of subsurface faults and horizons in the Gullfaks oil field in the Norwegian sector of the North Sea. Data set courtesy of Schlumberger Limited.
[geosoft] http://blogs.geosoft.com/exploringwithdata/2012/08/3d-modelling-with-velocity-volumes-in-gm-sys.html
[pdgm 1] http://www.pdgm.com/getmedia/c72b49d9-571b-4fe8-ae3f-bfd00f862b0d/Skua-salt-2010.jpg.aspx?width=1024&height=650&ext=.jpg
References 3
[slb 3] http://www.software.slb.com/PublishingImages/total-stress.jpg
[dgi] http://www.dgi.com/images/cvslideshow/fullsize/CoViz4D_Slideshow_003.jpg
[outernode] http://outernode.pir.sa.gov.au/__data/assets/image/0020/119009/Curnamona_3D.jpg
[cda] http://www.oilandgasuk.co.uk/cmsfiles/custom/html/report-14.png
[Codd] E. F. Codd. 1970. A relational model of data for large shared