
Big Data Complexities for Scientific Computing in the Oil and Gas Industry


Academic year: 2021


© Limit Point Systems, Inc. 2014


noSQL, SQL, and mo’SQL

David M. Butler, President, Limit Point Systems, Inc.


Outline

Big Data in oil & gas exploration & production

Field theory for data scientists

The data model paradigm

The sheaf data model


The oil and gas business

“Upstream” is exploration and production (“E&P”) (upper left)

“Downstream” is transportation, refining, and marketing (lower right)


Major Acquired “Upstream” Data Types

Time lapse raw seismic

Time lapse prestack seismic image

Time lapse poststack seismic image

Well logs

Production monitoring


Time lapse raw seismic data

each sensor gives

amplitude as a function of time

~10K sensors, moving towards ~1M

~10K shots

~5K samples/shot

~4–12 bytes/sample

time lapse: repeat ~2/year for ~10 years

~10 TB/project × ~100 projects/year/major company ≈ ~1 PB/year/major
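The per-acquisition volume can be checked by multiplying the counts above. A minimal Python sketch, assuming (this is not stated on the slide) that every sensor records every shot:

```python
# Order-of-magnitude raw seismic volume for one acquisition, from the slide's counts.
# Assumption (not on the slide): every sensor records every shot.
SENSORS = 10_000          # ~10K sensors (moving towards ~1M)
SHOTS = 10_000            # ~10K shots
SAMPLES_PER_SHOT = 5_000  # ~5K time samples per sensor per shot


def survey_bytes(bytes_per_sample: int) -> int:
    """Bytes recorded in one acquisition of the survey."""
    return SENSORS * SHOTS * SAMPLES_PER_SHOT * bytes_per_sample


low_tb = survey_bytes(4) / 1e12    # 4 bytes/sample
high_tb = survey_bytes(12) / 1e12  # 12 bytes/sample
print(f"one acquisition: ~{low_tb:.0f}-{high_tb:.0f} TB")  # ~2-6 TB

# Company-wide rate from the slide's ~10 TB/project and ~100 projects/year.
per_year_pb = 10e12 * 100 / 1e15
print(f"~{per_year_pb:.0f} PB/year/major")  # ~1 PB/year/major
```

A single acquisition comes out at a few terabytes, the same order of magnitude as the ~10 TB/project figure.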


Time lapse prestack seismic image data

clean up seismic data

remove noise

remove artifacts

other signal processing operations

“migrate” data

focus signal energy

convert time to position

up to 5D array of data

reflectivity as a function of

3D position

source-sensor 2D offset


Poststack seismic image data

“stack” of prestack data

aggregate over 1 or more array indices

reduces size ~100x

2D or 3D image

reflectivity as function of position

similar to medical ultrasound image

interpret to produce model of subsurface
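Stacking can be pictured as aggregating the 5D prestack array over its two offset indices. A minimal pure-Python sketch, assuming the prestack image is a flat row-major list of shape (nx, ny, nz, nox, noy) and that the aggregation is a plain mean (production stacks may weight or mute traces; the function name is hypothetical):

```python
from itertools import product


def stack(prestack, shape):
    """Average a 5D prestack image over its two offset axes.

    prestack: flat row-major list with shape (nx, ny, nz, nox, noy)
    returns:  flat row-major list with shape (nx, ny, nz)
    """
    nx, ny, nz, nox, noy = shape
    n_offsets = nox * noy
    out = []
    for ix, iy, iz in product(range(nx), range(ny), range(nz)):
        # In row-major order the offsets for one (x, y, z) position are contiguous.
        start = ((ix * ny + iy) * nz + iz) * n_offsets
        out.append(sum(prestack[start:start + n_offsets]) / n_offsets)
    return out


image = stack(list(range(8)), (2, 1, 1, 2, 2))
print(image)  # [1.5, 5.5]
```

The output is smaller than the input by a factor of nox × noy, so roughly 10 × 10 offsets per position gives the ~100x reduction quoted above.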


Well logs

lower sensor package into well

measure various properties as a function of depth

~10K samples

~1K components

simple numbers

bore hole images

others

typically done once, before production starts

~100 MB/well × ~1K wells/year/major ≈ ~100 GB/year/major


Production monitoring

Classical methods at well head:

flow volumes

gas/oil/water composition

temperature

pressure

Distributed sensing methods:

fiber optic cables in well

acoustic sensing

temperature sensing

~1000 equivalent discrete sensors

~1K samples/sec

continuous monitoring

~10–100 GB/day/well

function of time and position along well path

~1K wells (growing rapidly)

~1 PB/year/major

[epmag 2]


Major interpreted/modeled data types

Geological structure model

Velocity model

Basin model

Reservoir models: geological, quantitative, engineering

Geomechanical model


Geological structure model

~1K structures/year/major, ~1 TB/year/major

geologist interprets seismic image and identifies surfaces defining rock strata and faults

very complex networks of intersecting surfaces

iterative process:

seismic image depends on acoustic velocity

acoustic velocity depends on rock type

rock type interpreted from seismic image and well data


Velocity model

velocity of sound as a function of position in volume corresponding to geological structure

scalar, vector, or tensor models

used to produce seismic images

accurate velocity model key to good seismic image

~1–10 GB/model

~1K models/year/major, ~1 TB/year/major

[geosoft]


Basin model

dynamic model of entire sedimentary basin: rock movement, fluid movement

study history of hydrocarbon deposits: generation, expulsion, migration to reservoir, entrapment

useful in predicting whether structure contains oil or gas

~100 GB/model × ~100 models/year/major ≈ ~10 TB/year/major


Reservoir models

static models

prior to production

estimate volume and other properties

dynamic models

fluid flow, fluid composition

function of position and time

used to guide drilling & production

keep wells producing

~100 GB/project

many fields, many versions/year/major, ~100 TB/year/major


Geomechanical model

simulation of mechanical stresses and strains

whole subsurface

specific reservoirs

stress, strain, deformation as function of position and time

used to anticipate mechanical changes around bore hole and in reservoir

~1–10 GB/model

~100 models/year/major, ~100 GB/year/major


(Order of magnitude estimates)

Variety                 Volume (/object)   Velocity (/year/major)
Raw seismic             ~1 TB              ~1 PB
Prestack seismic        ~1 TB              ~1 PB
Poststack seismic       ~10 GB             ~10 TB
Well logs               ~100 MB/well       ~100 GB
Production monitoring   ~10 GB             ~1 PB
Geological structure    ~1 GB              ~1 TB
Velocity model          ~1 GB              ~1 TB
Basin model             ~100 GB            ~10 TB
Reservoir models        ~100 GB            ~100 TB
Geomechanical model     ~1 GB              ~100 GB

dozens of other data types, all important
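Summing the velocity column gives a rough total for the listed types. A small Python sketch using the table's order-of-magnitude values; the total inherits their roughly tenfold uncertainty and leaves out the dozens of other data types:

```python
# Velocity column of the estimates table, in bytes per year per major company.
GB, TB, PB = 10**9, 10**12, 10**15
velocity = {
    "Raw seismic": 1 * PB,
    "Prestack seismic": 1 * PB,
    "Poststack seismic": 10 * TB,
    "Well logs": 100 * GB,
    "Production monitoring": 1 * PB,
    "Geological structure": 1 * TB,
    "Velocity model": 1 * TB,
    "Basin model": 10 * TB,
    "Reservoir models": 100 * TB,
    "Geomechanical model": 100 * GB,
}
total = sum(velocity.values())
print(f"total ~{total / PB:.1f} PB/year/major")  # total ~3.1 PB/year/major
```

The sum is dominated by the three ~1 PB streams: raw seismic, prestack seismic, and production monitoring.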


Upstream Data Flow (partial)

complex interoperation between data types


Shared Earth Model concept

integrated data base for evolving models of subsurface

all data types

multiple scales: structure, reservoir, basin

multiple interpretations and versions per object

uncertainty quantification for everything

provenance for everything

constantly evolving

holy grail of Exploration and Production (“E&P”) data integration

in practice: still mostly vendor proprietary islands of integration


conventional enterprise data warehouse

analysis and report oriented rather than transaction oriented

integrates data from many different applications

Extract-Transform-Load (“ETL”) processes are a critical component

conventional warehouse and ETL

relational data model provides conceptual framework

Shared Earth Model for E&P data

relational data model has not proven particularly useful

why not?


Outline

Big Data in oil & gas exploration & production

Field theory for data scientists

The data model paradigm

The sheaf data model


Field Theory for Data Scientists

physicist’s “field” not same as database admin’s “field”

field describes some physical property as function of position and/or time in some physical object

position in a physical object

physical property

physical property as a function of position


A simple example

[Figure: branched well, with parts labeled well, upper well, lower well, bore 1, bore 2, junction, and derrick floor]

position in a physical object

position represented by coordinate vector $\vec{r} = \begin{pmatrix} x(p) \\ y(p) \end{pmatrix}$ for a point $p$ in $\mathbb{R}^2$


Physical property

physical property types specified by mathematical physics

family of types jointly referred to as multilinear algebra

scalar types single number F vector types column of numbers 𝐹⃗ = 𝐹0 𝐹1tensor types matrix of numbers 𝐹⃡ = 𝐹00 𝐹01 𝐹10 𝐹11

each has important algebraic properties
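The algebraic properties can be made concrete with plain Python tuples. A minimal sketch assuming 2D components as on the slide; the helper names are hypothetical and chosen only for illustration:

```python
# Hypothetical helpers: vectors are (F0, F1), tensors are ((F00, F01), (F10, F11)).
def vec_add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def vec_scale(s, a):
    return (s * a[0], s * a[1])


def tensor_apply(t, v):
    """A rank-2 tensor acts linearly on a vector: w_i = sum_j T_ij v_j."""
    return (t[0][0] * v[0] + t[0][1] * v[1],
            t[1][0] * v[0] + t[1][1] * v[1])


u, v = (1.0, 2.0), (3.0, -1.0)
T = ((2.0, 0.0), (1.0, 1.0))
# Linearity: T(u + 2v) equals T(u) + 2 T(v).
lhs = tensor_apply(T, vec_add(u, vec_scale(2.0, v)))
rhs = vec_add(tensor_apply(T, u), vec_scale(2.0, tensor_apply(T, v)))
print(lhs, rhs)  # (14.0, 7.0) (14.0, 7.0)
```

Properties such as this linearity are what distinguish a vector or tensor from an arbitrary tuple of numbers.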


Physical property as a function of position

function (map) from physical space to property space

associates a value of F with each p in the object

$$\overleftrightarrow{F}(\vec{r}) = \begin{pmatrix} F_{00}(x, y) & F_{01}(x, y) \\ F_{10}(x, y) & F_{11}(x, y) \end{pmatrix}$$

infinite number of points

infinite number of property values

how do we represent this on the computer?
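In code, a field in this sense is simply a function from coordinates to property values. A minimal Python sketch of a 2×2 tensor field on R2; the component formulas are arbitrary and chosen only for illustration:

```python
from typing import Callable, Tuple

Tensor2 = Tuple[Tuple[float, float], Tuple[float, float]]
Field = Callable[[float, float], Tensor2]  # position (x, y) -> property value


def example_field(x: float, y: float) -> Tensor2:
    """Illustrative 2x2 tensor field; the component formulas are arbitrary."""
    return ((x + y, x * y),
            (x * y, x - y))


F: Field = example_field
# Any finite set of samples captures only some of the field's infinitely many values.
samples = {(x, y): F(x, y) for x in (0.0, 1.0) for y in (0.0, 1.0)}
print(samples[(1.0, 1.0)])  # ((2.0, 1.0), (1.0, 0.0))
```

A closed-form callable like this is rarely available for measured or simulated data, which is why a finite representation is needed.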


How do we represent a field on the computer?

numerous methods

small industry busy creating new methods

makes interoperation and integration difficult

some common features

decompose physical object into simple pieces


Decompose physical object into simple pieces

mathematicians call each piece a “cell”

decomposition is a “cell complex”

more commonly called a “mesh”

[Figure: mesh of the branched well, with segments s0–s5, vertices v1 and v3–v6, junction j, and derrick floor df]
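A 1D cell complex can be stored as cells plus a boundary relation. A minimal Python sketch; the vertex and segment names and the connectivity below are hypothetical, loosely following the figure labels, since the figure's actual connectivity is not recoverable from the text:

```python
# Hypothetical 1D well mesh: each segment (1-cell) is bounded by two vertices (0-cells).
segments = {
    "s0": ("df", "v1"),  # derrick floor down to v1
    "s1": ("v1", "j"),   # v1 down to the junction
    "s2": ("j", "v3"),   # hypothetical first branch
    "s3": ("v3", "v4"),
    "s4": ("j", "v5"),   # hypothetical second branch
    "s5": ("v5", "v6"),
}


def vertices(segs):
    """All vertices appearing on the boundary of some segment."""
    return sorted({v for ends in segs.values() for v in ends})


def cells_incident_to(vertex, segs):
    """Segments whose boundary contains the given vertex."""
    return sorted(name for name, ends in segs.items() if vertex in ends)


print(vertices(segments))                # ['df', 'j', 'v1', 'v3', 'v4', 'v5', 'v6']
print(cells_incident_to("j", segments))  # ['s1', 's2', 's4']
```

In this sketch the junction is the only vertex shared by three segments, which is what makes the mesh branched.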


Approximate by simple function on each cell

for each cell c:

store a data tuple

specify an evaluation method

evaluation method

$F(p) = \mathrm{eval}_{c(p)}(p, \text{data tuple})$, where $c(p)$ is the cell containing $p$

data tuple may or may not correspond to value of field at some point

depends on evaluation method

example: linear interpolation

data for entire field is an array of tuples

$\mathrm{value}(p) = u(p)\,F_1 + (1 - u(p))\,F_0$

[Figure: segment from v0 to v1 with data F0 and F1, and a point p at local coordinate u(p)]
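The linear-interpolation evaluation method can be sketched as follows. This assumes a 1D mesh whose cells are the intervals between sorted vertex coordinates, with one data tuple (F0, F1) per cell; the function name and data layout are hypothetical:

```python
from bisect import bisect_right


def evaluate(p, vertex_x, cell_data):
    """Evaluate a piecewise-linear field at position p.

    vertex_x:  sorted vertex coordinates x_0 < ... < x_n (cell i is [x_i, x_{i+1}])
    cell_data: one data tuple (F0, F1) per cell, the values at its two ends
    """
    if not vertex_x[0] <= p <= vertex_x[-1]:
        raise ValueError("p lies outside the mesh")
    # Locate the cell c(p) containing p; the last vertex belongs to the last cell.
    c = min(bisect_right(vertex_x, p) - 1, len(cell_data) - 1)
    x0, x1 = vertex_x[c], vertex_x[c + 1]
    u = (p - x0) / (x1 - x0)       # local coordinate u(p) in [0, 1]
    f0, f1 = cell_data[c]
    return u * f1 + (1 - u) * f0   # value(p) = u*F1 + (1-u)*F0


xs = [0.0, 10.0, 30.0]
data = [(1.0, 3.0), (3.0, 2.0)]
print(evaluate(5.0, xs, data))   # 2.0
print(evaluate(20.0, xs, data))  # 2.5
```

Here the data tuple happens to hold the field's values at the cell's end points; other evaluation methods store coefficients that are not values at any point.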


Data for entire field is an array of tuples

tuple components typically real (float or double) but may be of any type

[Figure: field data layouts for cells 0 to n-1. Scalar: one value per cell, F0 F1 F2 ... Fn-1. Vector: one tuple (F0,i, F1,i) per cell i. Tensor: one tuple (F00,i, F01,i, F10,i, F11,i) per cell i.]
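Storing the whole field as one flat array of fixed-size tuples turns addressing into simple arithmetic. A minimal sketch using Python's standard `array` module, assuming double-precision components and row-major component order within each tensor tuple; the helper names are hypothetical:

```python
from array import array

# Number of components per cell tuple (2D property types assumed).
TUPLE_SIZE = {"scalar": 1, "vector": 2, "tensor": 4}


def flat_index(cell: int, component: int, kind: str) -> int:
    """Position of one component of one cell's tuple in the flat array."""
    return cell * TUPLE_SIZE[kind] + component


def tensor_component(i: int, j: int) -> int:
    """Position of F_ij within a 2x2 tensor tuple, assuming row-major order."""
    return 2 * i + j


n_cells = 3
# One contiguous block of doubles for the whole field: n_cells tuples of 4.
tensor_field = array("d", [0.0] * (n_cells * TUPLE_SIZE["tensor"]))
tensor_field[flat_index(2, tensor_component(1, 0), "tensor")] = 7.5  # F_10 on cell 2
print(flat_index(2, tensor_component(1, 0), "tensor"))  # 10
```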


How do we want to use field data?

operations specified by mathematical physics, in five main categories

topological operations: compose and decompose

geometric operations: change the shape

functional operations: set and get the value at a point; move field from one mesh to another

algebraic operations: add, subtract, multiply, divide, diagonalize, ...

calculus operations
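As one example of a functional operation, a piecewise-linear field can be moved from one 1D mesh to another by evaluating it at the target mesh's vertices. A minimal sketch of that simple point-sampling approach (not a conservative transfer); the nodal data layout and function names are assumptions for illustration:

```python
from bisect import bisect_right


def sample(xs, values, p):
    """Piecewise-linear value at p for nodal values on sorted vertices xs.

    Points outside [xs[0], xs[-1]] are extrapolated linearly from the end cells.
    """
    c = min(max(bisect_right(xs, p) - 1, 0), len(xs) - 2)
    u = (p - xs[c]) / (xs[c + 1] - xs[c])
    return u * values[c + 1] + (1 - u) * values[c]


def transfer(src_xs, src_values, dst_xs):
    """Move a nodal field to a new mesh by point sampling at the new vertices."""
    return [sample(src_xs, src_values, x) for x in dst_xs]


coarse_x = [0.0, 10.0, 20.0]
coarse_f = [0.0, 10.0, 0.0]
fine_x = [0.0, 5.0, 10.0, 15.0, 20.0]
print(transfer(coarse_x, coarse_f, fine_x))  # [0.0, 5.0, 10.0, 5.0, 0.0]
```

Transferring in the other direction, from fine to coarse, would discard information. This is one reason field transfer is an operation in its own right rather than a table copy.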


Why not use the relational model for field data?

doesn’t fit the way we want to store field data

relational schema can’t directly capture field entity

captures data tuple entity instead of entire field entity

field entity has to be reconstructed by queries

normalization forces introduction of surrogate keys

may require recursive queries

doesn’t fit the way we want to use field data

table operations are too low level

aren’t useful for high level field operations

no pay-off to using relational model

most field data is stored in app-specific, proprietary flat files
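To see the mismatch, here is a minimal sketch using Python's standard `sqlite3` module with one possible normalized layout for a scalar field; the table and column names are hypothetical. The field itself never appears as a row: it has to be reassembled by a join, and the ids exist only as surrogate keys relating the data tuples.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE field (field_id INTEGER PRIMARY KEY, name TEXT, kind TEXT);
    CREATE TABLE cell  (cell_id  INTEGER PRIMARY KEY, mesh TEXT, ordinal INTEGER);
    CREATE TABLE tuple_value (
        field_id  INTEGER REFERENCES field(field_id),
        cell_id   INTEGER REFERENCES cell(cell_id),
        component INTEGER,
        val       REAL,
        PRIMARY KEY (field_id, cell_id, component)
    );
""")
con.execute("INSERT INTO field VALUES (1, 'temperature', 'scalar')")
con.executemany("INSERT INTO cell VALUES (?, 'well', ?)", [(10, 0), (11, 1), (12, 2)])
con.executemany("INSERT INTO tuple_value VALUES (1, ?, 0, ?)",
                [(10, 80.0), (11, 85.5), (12, 91.0)])

# Reconstructing the field entity requires a join ordered by cell position.
rows = con.execute("""
    SELECT c.ordinal, t.val
    FROM tuple_value t
    JOIN cell c ON c.cell_id = t.cell_id
    JOIN field f ON f.field_id = t.field_id
    WHERE f.name = 'temperature'
    ORDER BY c.ordinal, t.component
""").fetchall()
print(rows)  # [(0, 80.0), (1, 85.5), (2, 91.0)]
```

Nothing in this schema records the evaluation method or the mesh topology, and the table operations offer no direct way to add two fields or evaluate one at a point.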


Outline

Big Data in oil & gas exploration & production

Field theory for data scientists

The data model paradigm

The sheaf data model


The data model paradigm

Data model [Codd] specifies

class of mathematical objects

operations on those objects

constraints valid instances must satisfy

Languages, libraries, tools based on data model

Applications developed on top of tools


Benefits of data model paradigm

Increases level of abstraction for application development

Increases capability of applications

Facilitates interoperation and integration

Increases productivity of programmers


But …

Benefits only accrue if model captures application structure

The more structure captured the bigger the benefit


Structure captured by various data models

most noSQL models capture less structure than relational

the “no” in noSQL should perhaps be “less”

scientific apps have way more mathematical structure

relational model isn’t nearly structured enough

scientific apps don’t need no Structured Query Language; they need a (much) more Structured Query Language: mo’SQL


Data model/mo’SQL requirements

must capture common math structure of scientific data

scalars, vectors, tensors

topology and geometry

fields

algebra and calculus operations

must describe how math entities are represented/stored

decomposition into primitive types and operations

decomposition for parallelism

must maintain rigorous connection between high level semantics and low level implementation


Outline

Big Data in oil & gas exploration & production

Field theory for data scientists

The data model paradigm

The sheaf data model


Sheaf data model

objects are discrete sheaves over finite distributive lattices

math details:

http://www.limitpoint.com/images/Publications/The%20Sheaf%20Data%20Model.pdf

finite distributive lattice

“part space”

all distinct composite parts formed from set of basic parts

discrete sheaf

describes association of attributes with parts

algebraic description of decomposition of abstract data types into


Visualizing a finite distributive lattice

directed acyclic graph, called a “Hasse diagram”

two kinds of nodes: composite parts and basic parts

links represent “covers”

covers := immediately includes

A covers B if and only if

A properly includes B, and

there is no C, distinct from A and B, such that A includes C and C includes B

draw graph so that if A covers B, B is lower on page

[Figure: example in which composite part A covers basic parts B and C]
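The cover relation can be computed directly from an inclusion relation. A minimal Python sketch, assuming parts are modeled as frozensets of basic-part names so that "includes" is set containment (a modeling choice for illustration, not necessarily how SheafSystem represents parts):

```python
def covers(parts):
    """Return Hasse diagram edges (a, b) such that a covers b.

    Parts are frozensets of basic-part names, so "includes" is set containment
    and "<" is proper inclusion. a covers b when b < a and no part c
    lies strictly between them.
    """
    return [(a, b) for a in parts for b in parts
            if b < a and not any(b < c < a for c in parts)]


a = frozenset({"B", "C"})  # composite part A
b = frozenset({"B"})       # basic part B
c = frozenset({"C"})       # basic part C
for top, bottom in covers([a, b, c]):
    print(sorted(top), "covers", sorted(bottom))
# ['B', 'C'] covers ['B']
# ['B', 'C'] covers ['C']
```

Only the cover edges are drawn. Inclusions implied by a chain of covers, such as {1, 2, 3} including {1} through {1, 2}, are omitted from the diagram.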


Example: branched well

[Figure: Hasse diagram of the branched well, with nodes well, upper well, lower well, bore 1, bore 2, junction, and derrick floor (df)]

basic parts are independent objects

composite parts are precisely the sum of their basic parts

[Figure: well parts: bore 1, bore 2, junction, lower well, upper well, derrick floor, and well]
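Because composite parts are exactly sums of basic parts, each part can be identified with the set of basic parts it contains. Join is then union and meet is intersection, and because these operations distribute over each other, such a lattice of parts is distributive. A minimal Python sketch; which basic parts make up "upper well" and "lower well" is an assumption for illustration, not read from the diagram:

```python
# Assumed decomposition, for illustration only: which basic parts make up
# "upper well" and "lower well" is a guess, not read from the diagram.
BASIC = {"derrick floor", "junction", "bore 1", "bore 2"}

# Each part is identified with the set of basic parts it is the sum of.
upper_well = frozenset({"derrick floor", "junction"})
lower_well = frozenset({"junction", "bore 1", "bore 2"})
well = upper_well | lower_well


def join(p, q):
    """Smallest part including both p and q."""
    return p | q


def meet(p, q):
    """Largest part included in both p and q."""
    return p & q


print(well == frozenset(BASIC))              # True
print(sorted(meet(upper_well, lower_well)))  # ['junction']
# Distributivity: meet(x, join(y, z)) == join(meet(x, y), meet(x, z)).
bore1, bore2 = frozenset({"bore 1"}), frozenset({"bore 2"})
print(meet(lower_well, join(bore1, bore2))
      == join(meet(lower_well, bore1), meet(lower_well, bore2)))  # True
```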


Sheaf table metaphor

data base is a set of tables

each table represents a type

each row an instance

each column an attribute

rows carry client-defined lattice order

col lattice is row lattice of some other table

schemas are first class objects


Unified framework for scientific data types

tabular types

contains relational model as limiting case

row lattice is a boolean lattice

physical property types

scalars, vectors, tensors

object-oriented types with multiple inheritance

col lattice is subobject inclusion hierarchy

spatial types (meshes)

any decomposition of space

row lattice represents spatial inclusion

field types

any property, any mesh, any evaluation method

col lattice = tensor(mesh row lattice, property col lattice)

rigorous connection between abstract math types and numeric representations, from high level specification to tuples of primitives


Open Source Implementation

SheafSystem™ Community Edition

C++ libraries with Java, Python, and C# bindings

www.sheafsystem.org or github

[Figure: SheafSystem component APIs]

Sheaf Data Model API: sheaf, storage agent, HDF5

Fiber Bundle Data Model API: spatial types, section types, physical property types (Jacobians, tensors, groups)

Geometry API: point locators, coordinate sections (invertible sections)

Field API: refiners, pushers, field types


Outline

Big Data in oil & gas exploration & production

Field theory for data scientists

The data model paradigm

The sheaf data model


Query language for sheaf data model

work in progress

with Prof. Magne Haveraaen, Bergen Language Design Laboratory, University of Bergen

started with initial guess at operators

extension of relational operators

experience with implementation

formalizing and refining definitions


Acknowledgements

Mark Verschuren, Shell, provided many useful comments and other input for this presentation

Original research and development funded by subcontracts B347785, B515090, and B560973 of prime contract W-7405-ENG-48 with the Department of Energy National Nuclear Security Administration (DOE/NNSA)

Ongoing development has been funded by Shell GameChanger and Shell TaCIT


References

[Krebbers] “Big Data & Analytics: Exploiting it”, Johan Krebbers, VP Architecture, Shell. http://cdn.osisoft.com/corp/en/media/presentations/2013/UsersConference2013/PDF/UC2013_Shell_Krebbers_GlobalITArchitecture_1.pdf

[KrisEnergy] http://www.krisenergy.com/company/about-oil-and-gas/exploration/

[epmag 1] http://www.epmag.com/Exploration-Geology-Geophysics/Three-D-Seismic-Advances-Improve-Exploration-Success_90469

[decogeo] http://www.decogeo.com/upload/Image/log1_bigl.jpg

[epmag 2] http://www.epmag.com/item/DAS-enables-simultaneous-multiwell-VSP_121593

[slb 1] http://www.slb.com/resources/case_studies/completions/~/media/Images/completions/intelligent/wellwatcher_neon_tp_01tn.jpg

[slb 2] System of subsurface faults and horizons in the Gullfaks oil field in the Norwegian sector of the North Sea. Data set courtesy of Schlumberger Limited.

[geosoft] http://blogs.geosoft.com/exploringwithdata/2012/08/3d-modelling-with-velocity-volumes-in-gm-sys.html

[pdgm 1] http://www.pdgm.com/getmedia/c72b49d9-571b-4fe8-ae3f-bfd00f862b0d/Skua-salt-2010.jpg.aspx?width=1024&height=650&ext=.jpg

[slb 3] http://www.software.slb.com/PublishingImages/total-stress.jpg

[dgi] http://www.dgi.com/images/cvslideshow/fullsize/CoViz4D_Slideshow_003.jpg

[outernode] http://outernode.pir.sa.gov.au/__data/assets/image/0020/119009/Curnamona_3D.jpg

[cda] http://www.oilandgasuk.co.uk/cmsfiles/custom/html/report-14.png

[Codd] E. F. Codd. 1970. A relational model of data for large shared data banks. Commun. ACM 13, 6 (June 1970), 377-387. DOI=10.1145/362384.362685
