Servicing Seismic and Oil Reservoir Simulation Data through Grid Data Services

(1)

Servicing Seismic and Oil Reservoir

Simulation Data

through Grid Data Services

Sivaramakrishnan Narayanan, Tahsin Kurc,

Umit Catalyurek and Joel Saltz

Multiscale Computing Lab

Biomedical Informatics Department The Ohio State University

(2)

Joel Saltz Gagan Agrawal Umit Catalyurek Shannon Hastings Vijay S Kumar Tahsin Kurc Steve Langella Scott Oster Tony Pan Benjamin Rutt Narayanan Sivaramakrishnan, Li Weng Michael Zhang

Multiscale Computing Lab

(3)

Analysis

Production rates, bypass oil, net present value

Workflow

Run new reservoir simulations

Data

Seismic, well pressures, reservoir

simulations

Generate requests for new simulations, new

seismic studies

Obtain initial, boundary conditions,

input parameters for simulations Store and index simulation results Summary data from datasets Spatio-temporal queries • Simulate multiple realizations of multiple geostatistical models and production strategies

• Evaluate geologic

uncertainty and production strategies simultaneously • Enable on-demand

exploration and comparison of multiple scenarios

– Integration of a robust, Grid-based computational and data handling

infrastructure

– Distributed databases of reservoir and

geophysical data

– Storage and computing resources at multiple institutions

(4)

Characteristics and Issues

• Spatio-temporal datasets

– Simulations carried out/data captured on 3D meshes over many time steps

– Multiple data attributes per data point (gas pressure, oil saturation, seismic traces, etc).

• Very large datasets

– Tens of gigabytes to 100+ TB data

• Lots of simulation runs

– Up to thousands of runs for a study are possible

• Data can be stored in distributed collection of files

• Distributed datasets

– Data may be captured at multiple locations by multiple groups – Simulations are carried out at multiple sites

• Common operations: subsetting, filtering, interpolations,

projections, comparisons, frequency counts

(5)

Data Management, Access and Integration

• Tracking of metadata associated with data

– Metadata defining simulation parameters, mesh description, files associated with simulations, etc.

– Metadata defining seismic measurements (location, year, files storing data, etc.)

• Support for data subsetting and filtering on

file-based, distributed datasets

• Support for on-demand data product generation

– Track metadata associated with data analysis workflows

• Grid data services and distributed querying

– Make data and data products available through Grid service interfaces

(6)

Applications developers generally prefer storing data in files

Support high level queries on multi-dimensional distributed

datasets

Many possible data abstractions, query interfaces

Grid virtualized object-relational database or XML database Grid virtualized objects with user defined methods invoked to

access and process data

Data Virtualization

Our Approach

• Support a basic SQL Select query with a virtual relational table view or a virtual XML database view

• A lightweight layer on top of datasets

(7)

Middleware Support

• Data Virtualization: STORM

– Large data querying capabilities, layered on DataCutter – Distributed data virtualization

– Indexing, Subsetting, Data Cluster/Decluster, Parallel Data Transfer

• Data Analysis/Processing Workflows: DataCutter

– Component Framework for Combined Task/Data Parallelism

– Filtering/Program coupling Service: Distributed C++ component framework

– On demand data product generation

• Distributed Metadata and Data Management: Mobius

– Create, manage, version data definitions – Management of metadata and data instances – Data integration

• Grid Data Services (OGSA-DAI)

– Defines services and interfaces that can be used by clients to specify operations on data resources and data

(8)

Data Management, Access, Integration

• Grid-level data services via OGSA-DAI • Management of data definitions and

metadata, XML virtualization via Mobius

• Object-relational virtualization and subsetting of file based datasets via STORM

• On-demand data product generation via DataCutter

• STORM, Mobius, DataCutter support data operations on heterogeneous collections of storage and compute clusters Schema Management Mobius Data Product Generation DataCutter SQL Virtualization of Files STORM XML Virtualization Metadata Management Mobius OGSA-DAI OGSA-DAI OGSA-DAI OGSA-DAI Grid Protocols

(9)

Data Management, Access, and Integration

Schema Management Mobius Data Product Generation DataCutter SQL Virtualization of Files STORM XML Virtualization Metadata Management Mobius

Grid-data Service (OGSA-DAI) Grid-data Service (OGSA-DAI)

Data Product Generation DataCutter SQL Virtualization of Files STORM XML Virtualization Data Product Generation DataCutter SQL Virtualization of Files STORM XML Virtualization Metadata Management

Grid-data Service (OGSA-DAI) Grid Service

Protocols Grid-data Service

(OGSA-DAI)

(10)

Array # Component # Component # Component # Sp (or CDP) # & position Receiver group # & position Receiver group # & position Receiver group # & position 50. 00 50. 00 50 .00 Component # Component # Component # Array # Receiver group # & position Receiver group # & position Receiver group # & position 50 .00 50 .00 5 0.00 Component # Component # Component # Array # Receiver group # & position Receiver group # & position Receiver group # & position 50.00 50.00 50.0 0

Data Querying and Processing

Seismic Data

Geostatistics Model 1 Model 2 Model n … … m realizations Well Pattern p Production Strategies Well Pattern 1 … Well Pattern 2

Reservoir Simulations

(11)

STORM

Support efficient selection of the data of interest from

distributed scientific datasets and transfer of data

from storage clusters to compute clusters

• Data Subsetting Model

– Virtual Tables – Select Queries – Distributed Arrays

SELECT <DataElements>

FROM Dataset-1, Dataset-2,…, Dataset-n

WHERE <Expression> AND <Filter(<DataElement>)>

(12)

STORM Services • Query • Meta-data • Indexing • Data Source • Filtering • Partition Generation • Data Mover

(13)

Grid Data Resource

• Grid has emerged as an integrated infrastructure for distributed computation

• OGSA-DAI initiative is to deliver high level data management functionality for the Grid.

– Defines services and interfaces that can be used by clients to specify operations on data resources and data

• OGSA-DAI services can be configured to expose a specific database management system.

• To be a GDS, a service must accept perform documents and return results

– Interpretation of perform documents is open to interpretation – Traditionally wrap SQL queries

(14)

STORM Data Resource

Extractor Filter Data Mover Storm Daemon JDBC Driver GDS STORM instance Data Resource

(15)

Experimental Setup

• All nodes running linux • Gigabit switch 7.3 TB FAStT600 disk array 4 GB memory 2 Xeon 2.4 GHz 16 Xio 1.5 TB local disk 8 GB memory Dual 1.4 GHz AMD Optron 8 nodes mob Mob,01 24 * X / 1M X 24 bytes 6 TXm Xio,16 1,056 247 4240 bytes 16 Seismic Mob,03 315 3,840 84 bytes 21 Oil Reservoir Cluster, Num nodes Dataset (GB) Records (millions) Record Size Attributes Dataset

(16)

STORM Results

STORM I/O Performance

0 500 1000 1500 2000 2500 3000 3500 4000 4500 1 2 4 8 16 # XIO nodes B and w idt h ( M B /s ) 2 Threads 4 Threads Max

Seismic Datasets

10-25GB per file.

About 30-35TB of Data.

(17)

Comparison with MySQL - 1

• Varying table size. • Per tuple cost is lesser

0 20 40 60 80 100 120 0 50 100 150 200

Table Size (million rows)

E xe cut ion Tim e ( se cs ) MySQL-cold MySQL-hot STORM-cold STORM-hot

(18)

Comparison with MySQL - 2

• Varying query size

• Also compare them as data resources

0 10 20 30 40 0 250000 500000 750000 1000000

Query Size (num of records)

Ex ec u ti o n T im e (s ec s) MySQL STORM MySQL-DAI STORM-DAI

(19)

Oil Reservoir Data Results

• Improvements due to: treating records as array of bytes, combining results at client 0 40 80 120 160 0 100000 200000 300000 400000

Query Size (number of records)

E xe cut ion Tim e ( se cs ) STORM STORM-DAI-o STORM-DAI-1 STORM-DAI-50 3 DAIs

(20)

Seismic Data Results

0 10 20 30 40 50 0 2000 4000 6000 8000 10000 12000 14000

Query Size (number of records)

E xecu ti o n T ime ( secs) STORM STORM-DAI-o STORM-DAI-1 2-DAIs • 96 x 11GB files on 16 nodes

(21)

Conclusions

• Overview of work related to Large Scale Scientific Data

Management at Multi-Scale Computing Lab

• Exposed STORM as a Grid Data Service

– Results on use case: Oil reservoir management

• For more info / to download STORM, DataCutter, Mobius

http://www.multiscalecomputing.org

or