Application of Grid-Enabled Technologies for Solving Optimization Problems in Data Driven Reservoir Systems

(1)


M. Parashar, H. Klie, U. Catalyurek, T. Kurc, V. Matossian, J. Saltz, M.F.

(2)

UT Austin Univ. Chicago OSU UMD Rutgers

ITR Collaborators

• University of Chicago (CS): Stevens, Papka
• University of Maryland (CS): Sussman
• Ohio State (CS): Saltz, Kurc, Catalyurek
• Rutgers (ECE): Parashar
• MIT (Engineering): Haines
• UT Austin: Wheeler, Dawson, Peszynska, Klie, Bangerth (Computational and Applied Math); Sen, Stoffa, Seifoullaev (UTIG); Torres-Verdin (CPGE)

(3)

The Instrumented Oil Field

Detect and track changes in data during production

Invert data for reservoir properties

Detect and track reservoir changes

Assimilate data & reservoir properties into the evolving reservoir model

Use simulation and optimization to guide future production, future data acquisition strategy

(4)

Assumptions:

Production of oil and gas will take advantage of permanently installed geophysical sensors and down hole instrumentation that will monitor the reservoir’s state as fluids are extracted.

Knowledge of the reservoir’s state during production will result in better engineering decisions to modify production techniques that optimize goals while maintaining safe operating conditions in environmentally complex and

(5)

[Figure: dynamic data-driven decision loop. Loop labels: START; management decision; optimize (economic revenue, environmental hazard, …) based on the present subsurface knowledge and numerical model; plan optimal data acquisition; acquire remote sensing data; improve knowledge of subsurface to reduce uncertainty; update knowledge of model; improve numerical model. Layer labels: Dynamic Decision System; Dynamic Data Driven Assimilation (data assimilation, subsurface characterization, experimental design); Autonomic Grid Middleware; Grid Data Management; Processing Middleware; Data Driven Model Optimization.]

(6)

DDDSF Requires Multi-petabyte Virtual Data

performance/capacity storage

– Active Disk Cache: compute jobs that require directly connected storage, parallel file systems, and scratch space.

– Large temporary holding area

• 128 TB tape library

– Backups and long-term "offline" storage

IBM’s Storage Tank technology combined with TFN connections will allow large data sets to be seamlessly moved throughout the state with increased redundancy and seamless delivery.

(7)

A new generation of IPARS

Optimizing oil production on the Grid

[Figure labels: objective function; steering; monitoring; data management/assimilation; dynamic data; static data; visualization; collaboration; clients.]

(8)

Optimization with a Known Oil Reservoir Model

• f: Objective function
• α: Control variables in feasibility set A
• c: Model data
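One plausible way to write the problem from the three symbols defined on this slide is shown below. It assumes f is a revenue-type objective to be maximized; a cost-type objective would be minimized instead. The notation f(α; c), with the model data c held fixed, is an assumption and is not taken from the slide.

```latex
\max_{\alpha \in A} \; f(\alpha;\, c)
```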

(9)

Interplay between Data Acquisition, Data Assimilation and Optimization

• Optimization seeks best production strategy
• Control variables α parameterize production and data acquisition strategy
• Good choice of α optimizes production and improves model certainty
• Model c as stochastic
• E: expectation for PDF of c
• A posteriori PDF computed to describe current subsurface knowledge
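When c is treated as stochastic, the quantity being optimized becomes the expectation of f over the PDF of c. A minimal sketch of that idea, assuming a sample-average estimate of the expectation and a brute-force search over a small candidate set, could look as follows. The quadratic `objective`, the Gaussian PDF for c, and the candidate grid are all hypothetical stand-ins and do not come from the slides.

```python
import random


def objective(alpha, c):
    # Hypothetical stand-in for f(alpha, c): largest when alpha matches c.
    return -(alpha - c) ** 2


def expected_objective(alpha, samples):
    # Sample-average estimate of E_c[f(alpha, c)] over draws of c.
    return sum(objective(alpha, c) for c in samples) / len(samples)


def best_control(candidates, samples):
    # Pick the candidate control with the largest estimated expectation.
    return max(candidates, key=lambda a: expected_objective(a, samples))


rng = random.Random(0)
# Hypothetical PDF of the model data c: Gaussian with mean 2.0, std 0.5.
c_samples = [rng.gauss(2.0, 0.5) for _ in range(2000)]
# Hypothetical feasibility set A: a coarse grid from 0.0 to 4.0.
candidates = [i * 0.25 for i in range(17)]

alpha_star = best_control(candidates, c_samples)
print(alpha_star)
```

The same set of samples is reused for every candidate so that differences between candidates are not masked by sampling noise.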

(10)

Parallel/Grid Computing Tools

• The Multiblock Adaptive Computational Engine (MACE) for solving heterogeneous domain applications
– Adaptive grid blocks
– Automatic and transparent scheduling, load balancing
– Distributed Shared Objects: distributed dynamic arrays
• DataCutter/STORM: Middleware for On-Demand Data Product Generation for Large Archival Scientific Datasets in a Grid Environment
– Exploration and analysis of scientific datasets in distributed and heterogeneous environments
– Represents components of a data-intensive application as a set of filters
– Data virtualization for heterogeneous collections of data formats, storage systems
• Discover: Grid Computational Collaboratory enabling seamless and secure access to and interactions between users, applications, services, data and resources
– P2P Grid Middleware: services, autonomic composition, secure access

(11)

Scalability of IPARS and geomechanical coupling

Domain: 76,800 by 76,800 by 1,059 feet

Mesh: 513 by 513 by 45 mesh points

Cluster: 282 dual-processor Dell PowerEdge 1750 nodes (3.06 GHz) interconnected by Myrinet 2000 with a point-to-point bandwidth of 2 Gb/s. Each node has 2 GB of memory.

[Plot: parallel efficiency (axis ticks 0.6 to 1.2) versus number of processors (64, 128, 192, 256).]
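The slide does not say how parallel efficiency was computed. Since the processor axis starts at 64, one common convention is efficiency relative to the smallest run, sketched below under that assumption. The timings are hypothetical placeholders, not measurements from the IPARS runs.

```python
def relative_efficiency(base_procs, base_time, procs, time):
    # Efficiency relative to a baseline run: 1.0 means perfect scaling
    # from the baseline processor count to `procs`.
    return (base_procs * base_time) / (procs * time)


# Hypothetical wall-clock times in seconds, keyed by processor count.
timings = {64: 100.0, 128: 52.0, 192: 36.0, 256: 28.0}
base = 64

efficiency = {
    p: relative_efficiency(base, timings[base], p, t)
    for p, t in timings.items()
}

for p in sorted(efficiency):
    print(p, round(efficiency[p], 3))
```

With this definition, values above 1 are possible when larger runs benefit from effects such as better cache use per processor.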

(12)

Data Middleware Services

• Filter-stream based distributed execution middleware (DataCutter, STORM)
• Grid based data virtualization, data management, query, on demand data product generation (STORM, Active ProxyG, Mako)
• Distributed metadata management (Mobius Global Model Exchange)
– Track metadata associated with workflows, input image datasets, checkpointed intermediate

(13)

Data Middleware Services and Very Large Scale Distributed Data Applications

Processing Remotely-Sensed Data

NOAA Tiros-N w/ AVHRR sensor

AVHRR Level 1 Data

• As the TIROS-N satellite orbits, the Advanced Very High Resolution Radiometer (AVHRR) sensor scans perpendicular to the satellite’s track.
• At regular intervals along a scan line, measurements are gathered to form an instantaneous field of view (IFOV).
• Scan lines are aggregated into Level 1 data sets.

A single file of Global Area Coverage (GAC) data represents:

• ~one full earth orbit
• ~110 minutes
• ~40 megabytes
• ~15,000 scan lines

One scan line is 409 IFOVs
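The approximate figures above imply a per-scan-line and per-IFOV data volume, which the short calculation below works out. It assumes "megabyte" means 10^6 bytes and treats all quoted values as exact, so the results are rough estimates only.

```python
FILE_BYTES = 40 * 10**6   # ~40 megabytes per GAC file (assumed 10^6 bytes/MB)
SCAN_LINES = 15_000       # ~15,000 scan lines per file
IFOVS_PER_LINE = 409      # IFOVs in one scan line
ORBIT_MINUTES = 110       # ~110 minutes per file

bytes_per_line = FILE_BYTES / SCAN_LINES
bytes_per_ifov = bytes_per_line / IFOVS_PER_LINE
lines_per_second = SCAN_LINES / (ORBIT_MINUTES * 60)

print(round(bytes_per_line), "bytes per scan line")
print(round(bytes_per_ifov, 2), "bytes per IFOV")
print(round(lines_per_second, 2), "scan lines per second")
```

Under these assumptions a scan line is roughly 2.7 KB, an IFOV is about 6.5 bytes, and a little over two scan lines are recorded per second.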

[Figure panel titles: Satellite Data Processing; Managing Oilfields, Contaminant Transport; Digital Pathology; Derivation of macroscopic materials properties from MD simulations; DCE-MRI Analysis.]

(14)

DataCutter


Combined Data/Task Parallelism

[Figure: copies of filters R, Ra, E, and M placed on hosts host1 to host5 across Cluster 1, Cluster 2, and Cluster 3.]

Flow control between components

Schedulers place filters on grid processors (scheduler API)

Parallel stream based communication

Data aggregation implemented as a component

Filters placed near data sources

NPACkage, NMI
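The filter-stream idea described on this slide can be illustrated with a small single-process sketch in which each filter consumes one stream and produces another. This is not the DataCutter API: the filter names, the chunk layout, and the use of Python generators as streams are all assumptions made for illustration, and there is no placement on hosts or parallelism here.

```python
from typing import Iterable, Iterator, List


def read_filter(chunks: Iterable[List[float]]) -> Iterator[List[float]]:
    # Source filter: emits stored data one chunk at a time.
    for chunk in chunks:
        yield chunk


def extract_filter(stream: Iterable[List[float]],
                   threshold: float) -> Iterator[List[float]]:
    # Processing filter: keeps only values above the threshold, so less
    # data flows downstream (the motivation for placing filters near data).
    for chunk in stream:
        yield [value for value in chunk if value > threshold]


def merge_filter(stream: Iterable[List[float]]) -> List[float]:
    # Aggregation filter: combines partial results into one sorted list.
    merged: List[float] = []
    for part in stream:
        merged.extend(part)
    return sorted(merged)


# Hypothetical data chunks standing in for pieces of a large dataset.
chunks = [[0.2, 0.9], [0.75, 0.1], [0.8]]
result = merge_filter(extract_filter(read_filter(chunks), threshold=0.7))
print(result)
```

Because each filter only sees a stream, a runtime is free to run several copies of a filter or move a filter to another host without changing the filter code, which is the property the slide's bullets rely on.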

(15)

Scientific and engineering applications require interactive exploration and analysis of datasets.

Application developers generally prefer storing data in files

Support high level queries on multi-dimensional distributed datasets

Many possible data abstractions, query interfaces

Grid virtualized object relational database or XML database

Grid virtualized objects with user defined methods invoked to access and process data

A virtual relational table view

Large distributed scientific datasets

[Figure labels: Data Service; Data Virtualization.]

(16)

Our Approach

• Automatic data virtualization
– Friendly front-end: support a basic SQL Select query with a virtual relational table view or a virtual XML database view
– A lightweight layer on top of datasets
• STORM runtime middleware: STORM carries out query execution, query planning
– Compiler front end customizes runtime support: automatic customization and configuration of runtime query support middleware

(17)

(18)

(19)

Compiler Customization – support for Select query

Query template:

SELECT < Data Elements >
FROM < Dataset Name >
WHERE < Expression > AND Filter( < Data Element > );

Example:

SELECT *
FROM IPARS
WHERE REL in (0,6,26,27) AND TIME>1000 AND TIME<1100 AND SOIL>0.7
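To show what the example query selects under a virtual relational table view, the sketch below runs the same predicate against a tiny in-memory SQLite table. SQLite is used only as a stand-in to demonstrate the query semantics; STORM operates on file-based datasets and is not implemented this way. The column types and the six rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the slide's example; the types are assumptions.
conn.execute("CREATE TABLE IPARS (REL INTEGER, TIME INTEGER, SOIL REAL)")

# Hypothetical rows: only the first two satisfy every condition.
rows = [
    (0, 1050, 0.82),
    (6, 1099, 0.71),
    (26, 900, 0.90),   # TIME too early
    (27, 1050, 0.65),  # SOIL too low
    (3, 1050, 0.95),   # REL not in the requested set
    (0, 1100, 0.80),   # TIME not strictly below 1100
]
conn.executemany("INSERT INTO IPARS VALUES (?, ?, ?)", rows)

query = """
    SELECT * FROM IPARS
    WHERE REL IN (0, 6, 26, 27)
      AND TIME > 1000 AND TIME < 1100
      AND SOIL > 0.7
    ORDER BY REL
"""
selected = conn.execute(query).fetchall()
print(selected)
conn.close()
```

The `ORDER BY` clause is added only to make the result order deterministic for this illustration.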

(20)

Analysis of Oil Reservoir Simulation Data

Prototype Implementation

• Evaluate geologic uncertainty and production strategies simultaneously
– Multiple realizations of multiple geostatistical models
– Multiple production strategies (number, location of wells)
• Dataset size = ~5 TB
– 500 simulations, selected from several geostatistical models and well patterns
– Each simulation is ~10 GB: 2,000 time steps, 65K grid elements, 8 scalars + 3 vectors = 17 variables
• Stored at
– SDSC: HPSS and 30 TB Storage Area Network system
– UMD: 9 TB of disk on 50 nodes: PIII-650, 128 MB, switched Ethernet
– OSU: 7.2 TB of disk on 24 nodes: PIII-900, 512 MB, switched Ethernet
• Data analysis
– Economic model assessment
– Bypassed oil regions
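The dataset sizes quoted above can be cross-checked with a short calculation. It assumes 4-byte values, reads "65K" as 65,000 grid elements, and counts each vector as three components; none of these details are stated on the slide.

```python
TIME_STEPS = 2_000
GRID_ELEMENTS = 65_000        # "65K" on the slide; exact count is assumed
SCALARS = 8
VECTORS = 3
COMPONENTS_PER_VECTOR = 3     # assumption: 3-D vectors
BYTES_PER_VALUE = 4           # assumption: single-precision values
SIMULATIONS = 500

variables = SCALARS + VECTORS * COMPONENTS_PER_VECTOR
bytes_per_simulation = (
    TIME_STEPS * GRID_ELEMENTS * variables * BYTES_PER_VALUE
)
total_bytes = SIMULATIONS * bytes_per_simulation

print(variables, "variables")
print(bytes_per_simulation / 1e9, "GB per simulation")
print(total_bytes / 1e12, "TB in total")
```

Under these assumptions one simulation is about 8.8 GB and the full set about 4.4 TB, which agrees with the slide's rounded figures of ~10 GB and ~5 TB.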

(21)

Seismic Data Analysis – STORM: On Demand Processing of 1.5 TB Seismic Dataset

[Figure: organization of the seismic dataset. Labels: Survey #; Line #; Array #; Sp (or CDP) # & source position; Receiver group # & receiver group position; Component #; Traces.]

(22)

[Figure: DISCOVER architecture. Labels: Data Archive & Sensors; Data Archives; Sensors, Non-Traditional Data Sources; Discovery Points; Laptop; PDA; Computer; User; Scientist; Resources (CPUs, Storage, Instruments, ...); Applications & Services; Application; Service; P2P Grid Middleware; DISCOVER Portals.]

DISCOVER: A Grid Computational Collaboratory enabling seamless and secure access to and interactions between users, applications, services, data and resources

P2P Grid Middleware (PAWN, DISCOVER-COG)

Peer services (discovery, routing, message publication, notification, event), context-aware access control, p2p deductive engines.

Autonomic and Interactive Components (DIOS, AUTOMATE)

Components encapsulate sensors, actuators, policies and rules. Distributed control network connects sensors, actuators and interaction agents.

P2P deductive shell, control network, rules and policies enable autonomic composition, configuration, interaction, protection, optimization and adaptation.

Collaborative Portals

Pervasive (secure) access, monitoring, interaction and control

(23)

Autonomic Oil Well Placement (UT-CSM, UT-IG)

• Optimization services:
– VFSA (Very Fast Simulated Annealing)
– SPSA (Simultaneous Perturbation Stochastic Approximation)
• IPARS delivers
– fast-forward model (guess -> objective function value)
– post-processing
• Formulate a parameter space
– well position and pressure (y,z,P)
• Formulate an objective function:
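For the VFSA optimization service listed above, a minimal sketch over a three-dimensional (y, z, P) parameter space is given below. It uses the generating distribution and exponential cooling schedule usually associated with very fast simulated annealing, and simplifies by sharing one temperature between step generation and acceptance. The `cost` function is a hypothetical stand-in for a call to the forward model (it would be the negated objective when revenue is maximized), and the bounds, starting guess, and tuning constants are likewise assumptions.

```python
import math
import random


def vfsa_minimize(f, lower, upper, x0, iters=2000, t0=1.0, decay=1.0, seed=0):
    rng = random.Random(seed)
    dim = len(x0)
    x, fx = list(x0), f(x0)
    best_x, best_f = list(x), fx
    for k in range(1, iters + 1):
        # Cooling schedule: T_k = T0 * exp(-decay * k^(1/dim)).
        temp = t0 * math.exp(-decay * k ** (1.0 / dim))
        candidate = []
        for i in range(dim):
            while True:
                u = rng.random()
                # Heavy-tailed step in [-1, 1], scaled by the parameter range.
                step = math.copysign(1.0, u - 0.5) * temp * (
                    (1.0 + 1.0 / temp) ** abs(2.0 * u - 1.0) - 1.0
                )
                xi = x[i] + step * (upper[i] - lower[i])
                if lower[i] <= xi <= upper[i]:  # resample until feasible
                    candidate.append(xi)
                    break
        fc = f(candidate)
        # Metropolis acceptance: always take improvements, sometimes worse.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = list(x), fx
    return best_x, best_f


def cost(p):
    # Hypothetical smooth surrogate; a real run would query the simulator.
    y, z, pressure = p
    return (y - 40.0) ** 2 + (z - 25.0) ** 2 + 100.0 * (pressure - 0.6) ** 2


lower = [0.0, 0.0, 0.0]
upper = [100.0, 50.0, 1.0]
start = [10.0, 10.0, 0.2]
best_x, best_f = vfsa_minimize(cost, lower, upper, start)
print(best_x, best_f)
```

Each evaluation of `f` corresponds to one forward simulation in the setting of the slides, so the iteration count would in practice be chosen with the cost of a simulation in mind.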

(24)

Autonomic Oil Reservoir Optimization using Decentralized Services

(25)

Components of the AORO Application

• IPARS: Integrated Parallel Accurate Reservoir Simulator
– Parallel reservoir simulation framework
• IPARS Factory
– Configures instances of IPARS simulations
– Deploys them on resources on the Grid
– Manages their execution
• VFSA/SPSA Optimization Services
– Optimize the placement of wells and the inputs (pressure, temperature) to IPARS simulations
• Economic Modeling Service
– Uses IPARS simulation outputs and current market parameters (oil prices, costs, etc.) to compute estimated revenues for a particular reservoir configuration
• DISCOVER Computational Collaboratory
– Interaction & Collaboration
– Distributed Interactive Object Substrate (DIOS)

(26)

Autonomic

(27)

Autonomic Oil Well Placement (SPSA)

Permeability field showing the positioning of current wells. The symbols “*” and “+” indicate injection and producer wells, respectively.

Search space response surface:

Expected revenue - f(p) for all possible well locations p. White marks indicate optimal well locations found by SPSA for 7 different starting points of the algorithm.
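A minimal sketch of SPSA run from several starting points, in the spirit of the experiment described above, is given below. The `revenue` surface is a hypothetical single-peaked stand-in for f(p), the starting points are invented, and the gain constants are assumptions; the exponents 0.602 and 0.101 are commonly recommended defaults for SPSA. Revenue is maximized by minimizing its negative.

```python
import random


def spsa_minimize(f, theta0, iters=1000, a=0.2, c=0.1, stability=50.0,
                  alpha=0.602, gamma=0.101, seed=0):
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iters):
        a_k = a / (k + 1 + stability) ** alpha   # step-size gain
        c_k = c / (k + 1) ** gamma               # perturbation size
        # Perturb all coordinates at once with random +/-1 signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two evaluations give a gradient estimate for every coordinate.
        diff = f(plus) - f(minus)
        theta = [t - a_k * diff / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
    return theta


PEAK = (30.0, 12.0)  # hypothetical best well location


def revenue(p):
    # Hypothetical response surface with a single maximum at PEAK.
    return 100.0 - (p[0] - PEAK[0]) ** 2 - (p[1] - PEAK[1]) ** 2


def negative_revenue(p):
    return -revenue(p)


starts = [(5.0, 5.0), (50.0, 40.0), (10.0, 35.0)]  # hypothetical guesses
found = [spsa_minimize(negative_revenue, s, seed=i)
         for i, s in enumerate(starts)]
for start, location in zip(starts, found):
    print(start, "->", [round(v, 3) for v in location])
```

SPSA needs only two objective evaluations per iteration regardless of the number of parameters, which matters when each evaluation is a full reservoir simulation. On a surface with several local maxima, different starting points can end at different locations.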

(28)

The Future

• Scaling up:
– High resolution IPARS simulations
– Multi-petabyte distributed archives of model data
– Exploitation of OSC and TeraGrid resources (large TeraGrid allocation approved)
– Large scale demonstration of Discover/STORM/DataCutter integration
• Experimental testbeds
– EPA/INEEL collaboration: live sensor data from Superfund site
– NSF Center for Subsurface Sensing and Imaging Systems
– Data from industrial affiliates
• New numerical methods
– Next generation accurate, multi-scale coupled chemical, fluid, geomechanical and geophysical simulator