Accelerating Oracle with IBM FlashSystem: The Need for Speed

(1)

Mike Ault - Oracle FlashSystem Consulting Manager, IBM

(2)

Smarter Computing Demands Flash

(3)

Why Flash Storage? The Timing Is Perfect!

In the last 10 years…

• CPU Speed: Performance increased roughly 8-10x
• DRAM Speed: Performance increased roughly 7-9x
• Network Speed: Performance increased 100x
• Bus Speed: Performance increased roughly 20x
• Disk Speed: Performance increased 1.2x

IBM FlashSystem™ 50+x 300K+

(4)

Datacenter’s Response to Bridge Disk Performance Gap

Typical Performance Mitigation Tactics

• Add More Memory: Most Costly & Volatile
• Add CPUs: Wasteful, Expensive & Ineffective with Storage Latency Issues
• Tune & Modify: Time Consuming, Very Expensive & Risky
• HDD Performance Enhancement: Expensive & Ineffective for Storage Performance Issues

(5)

What if we only reduced latency?

Consider Little’s Law of mathematical queue theory as it applies to Application Performance

Now let’s see how FlashSystem alters this equation

Q = Number of parallel threads running in the application

t = Time it takes for an IO request to be serviced (Latency)

R = Result, typically measured in IOPS or Bandwidth
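The equation itself did not survive in the slide text. A minimal sketch, assuming the usual rearrangement of Little's Law for throughput, R = Q / t, and using the two latencies the deck quotes for disk (~5,000 µs) and FlashSystem (~200 µs). The thread count of 32 is a hypothetical value chosen only for illustration.

```python
def throughput_iops(parallel_threads: int, latency_seconds: float) -> float:
    """Little's Law rearranged for throughput: R = Q / t.

    parallel_threads -- Q, number of I/O requests in flight
    latency_seconds  -- t, service time of one I/O request
    """
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return parallel_threads / latency_seconds


Q = 32  # hypothetical number of parallel application threads

disk_iops = throughput_iops(Q, 5_000e-6)   # ~5,000 µs per I/O on disk
flash_iops = throughput_iops(Q, 200e-6)    # ~200 µs per I/O on FlashSystem

print(f"disk:  {disk_iops:,.0f} IOPS")
print(f"flash: {flash_iops:,.0f} IOPS")
print(f"gain:  {flash_iops / disk_iops:.0f}x")
```

With Q held constant, the result scales inversely with latency, which is the point of the slide: the application does not need more threads to get more I/O done.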

(6)

• 38% Lower software license costs
• Due to fewer cores
• Lower software maintenance
• More Efficient Infrastructure
• 13% lower infrastructure software costs
• 35% lower operational support costs
• Server / Storage Admin
• Much better storage utilization
• As much as 50%
• Lower maintenance
• Ease management by 50%
• 17% Fewer Servers
• Fewer cores
• Lower Memory
• Fewer network connections
• Lower maintenance
• Environmentals 74% Lower Cost
• Lower power / cooling

All Flash is 31% Less Expensive Overall

(7)

Microsecond latency maximizes Application CPU utilization

I/O Serviced by Disk

1. Issue I/O request ~ 100 µs
2. Wait for I/O to be serviced ~ 5,000 µs
3. Process I/O ~ 100 µs

• Time to process 1 I/O request = 200 µs + 5,000 µs = 5,200 µs
• CPU Utilization = Processing time / Total time = 200 / 5,200 = ~4%

[Timeline: CPU state for 1 I/O request: processing ~100 µs, waiting ~5,000 µs, processing ~100 µs]

I/O Serviced by IBM FlashSystem

1. Issue I/O request ~ 100 µs
2. Wait for I/O to be serviced ~ 200 µs
3. Process I/O ~ 100 µs

• Time to process 1 I/O request = 200 µs + 200 µs = 400 µs
• CPU Utilization = Processing time / Total time = 200 / 400 = 50%

[Timeline: CPU state for 1 I/O request: processing ~100 µs, waiting ~200 µs, processing ~100 µs]

12X Application benefit by only changing storage latency!
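The arithmetic on this slide can be checked with a short sketch. It assumes CPU utilization means the share of each I/O cycle spent issuing and processing the request rather than waiting; the function name is illustrative, and the timings are the slide's own.

```python
def cpu_utilization(issue_us: float, wait_us: float, process_us: float) -> float:
    """Fraction of one I/O cycle in which the CPU does useful work."""
    busy = issue_us + process_us      # CPU is working
    total = busy + wait_us            # CPU is working or stalled on storage
    return busy / total


disk = cpu_utilization(issue_us=100, wait_us=5_000, process_us=100)
flash = cpu_utilization(issue_us=100, wait_us=200, process_us=100)

print(f"disk:  {disk:.1%} busy")     # 200 / 5,200
print(f"flash: {flash:.1%} busy")    # 200 / 400
print(f"gain:  {flash / disk:.1f}x")
```

The exact ratio comes out at 13x; rounding the disk figure to ~4% first gives 12.5x, which is roughly where the slide's "12X" lands.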

(8)

Introduction

Important applications require:

High Performance
– Queries, reports, and screens must return quickly
– Scale to high user loads

Reliability
– 100% uptime
– Single system fault cannot be fatal
– Loss of processing impacts bottom line

Cost Effectiveness
– Effective use of resources
– Leverage tech to achieve accelerated performance gains for the cost

(9)

(10)

Oracle and Queries: Where does latency matter?

Memory SGA & PGA

Oracle Processes

Tables & Indexes

Reads - Cache miss
Foreground Waits:
– db file sequential read
– db file scattered read
– 3-5 ms

User’s Query

Storage latency

(11)

Why Don’t Writes Matter?

For data and index block writes:
– Uses delayed block cleanout
– Writes when it can’t find clean blocks
– Writes every 3 seconds

(12)

Oracle and Insert/Update/Delete: Where does latency matter?

Memory SGA & PGA

Oracle Processes

Tables & Indexes

Logs

LGWR

DBWR

(background)

Users Insert Commit

(13)

Where Else?

Temporary Activity

–Sorts

–Hashes

–Bitmaps

–Global Temporary Tables

Non-memory Undo activity

Flash Cache

(14)

(15)

IBM FlashSystem 840: Hardware View

Flash Modules (12)

RAID Controllers (2)

Battery Modules (2)

Power Supplies (2)

Fan Modules (4)

Interface Modules (4)

Management Modules (2)

Canisters (2)

Improved RAS features

Front/Back accessible Hot-swap Flash Modules, Power Supplies, Batteries, Fans, Controllers w/ interface cards and Canisters

Non-disruptive maintenance and firmware updates (concurrent code load)

(16)

Superior Durability:

Using the Best Flash

10X

3X

Superior Protection: Beyond Disk RAID

Chip/Plane/Die level protection

Self-Recovering Flash Modules
Avoid system rebuilds
Protection Within And Across Flash Modules

Variable Stripe Sizes
Read Disturb Mitigation
Automatic Read Sweeper
High-Speed Clock Recovery

Advanced Engineering = Less Maintenance

IBM FlashSystem 840: Reliability Ingredients

SLC market demand is decreasing. eMLC data protection techniques deliver more wear life than the market demands.

(17)

Storage Performance Council

(SPC-1/e)

(18)

FlashSystem Result Details: IBM MicroLatency™

Leadership minimum reported latency (SPC-1 LRT™): 0.18 ms

Single-system latency leadership up to about 85K IOPS, scalable with multiple FlashSystem units

Nearest latency competitor (HDS) uses 2 racks of equipment, over 2x the flash for storage, plus a massive 1 TB DRAM cache and an additional 1 TB flash cache

Nearest standard SSDs are ~2x the latency!

(19)

FlashSystem Result Details: Extreme Performance

Our result shows a single 1U “building block”, not a highly scaled out design like other results

Maximum aggregate performance of 195,021.70 SPC-1 IOPS™ from a single 1U FlashSystem 820

Scale IOPS linearly by stacking FlashSystem units

Strong “performance efficiency”:
– ~200K IOPS per rack unit
– ~50K IOPS per 8 Gbit FC port

System                     8 Gbit FC Ext Ports   Max IOPS / Ext Port   $/ASU GB
Huawei Dorado5100          8                     75K                   $76
Huawei Dorado2100 G2       8                     50K                   $60
IBM FlashSystem 820        4                     49K                   $25
HDS HUS 150 (SSDs)         4                     31K                   $116
HDS VSP with HAF           32                    19K                   $148
HP StoreServ 7400 (SSDs)   20                    13K                   $130

SLC + servers = more speed, higher price! We can do similar with 7xx products—but do clients need it?
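The "performance efficiency" figures for the FlashSystem 820 follow from the reported SPC-1 result and the port count in the table. A small sketch of that division, using only numbers from the slide (the function name is illustrative):

```python
def iops_per_unit(total_iops: float, units: int) -> float:
    """Divide an aggregate IOPS figure evenly across ports or rack units."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_iops / units


SPC1_IOPS = 195_021.70   # reported result for a single 1U FlashSystem 820
FC_PORTS = 4             # 8 Gbit FC external ports, from the table
RACK_UNITS = 1

per_port = iops_per_unit(SPC1_IOPS, FC_PORTS)
per_rack_unit = iops_per_unit(SPC1_IOPS, RACK_UNITS)

print(f"{per_port / 1000:.0f}K IOPS per FC port")
print(f"{per_rack_unit / 1000:.0f}K IOPS per rack unit")
```

This gives about 49K IOPS per port, matching the table row, and about 195K per rack unit, which the slide rounds to ~200K.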

(20)

OPERA (Preferred read)

(21)

OPERA Example

Boost Performance, Boost Redundancy
- Without Disruption
- Without Risk
- Without Feature Loss

[Diagram labels: DB Servers; SAN; ASM mirror; ASM FG1: ACTIVE DATA 20 TB; ASM FG2: ACTIVE DATA 20 TB; ARCHIVE DATA 100 TB; IBM FlashSystem; WRITES; READS]

(22)

• Storwize V7000:
  – 36x 300GB 10k disks
• Brocade SAN switch:
  – SAN40B-4 8Gbit ports
• Power server (Lpar1&2):
  – Power 750 – 8233-E8B
  – Each has:
    8 CPU
    100 GB memory
    AIX 7.1 TL2 SP2
    2x 8Gbit FC ports
• FlashSystem 820 20TB:
  – 4x 8Gbit FC ports

Optimal Performance Enhancing Real FlashSystem

(23)

Preferred Read – Acceleration Example

System performance @ 10,000 IOPS for a given app
Read/Write Ratio @ 80% Reads / 20% Writes
Reads: 8,000 / Sec
Writes: 2,000 / Sec

Introduce IBM FlashSystem as Primary Copy of new mirror
8,000 Reads / Sec now at extremely low latency
System was 10,000 IOPS; now 10,000+ Writes / Sec
R/W ratio does not change; no change in the app

System does 10,000 Writes & IBM FlashSystem does 10,000 Writes & 40,000 Reads

= System Accelerated
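The numbers on this slide can be reproduced with a small model. It is a sketch under two assumptions taken from the slide: the original system stays capped at 10,000 operations per second, and the application's 80/20 read/write mix does not change. The function and key names are illustrative.

```python
def preferred_read_offload(disk_iops: int, read_percent: int) -> dict:
    """Estimate IOPS once all reads are served by a flash mirror copy.

    disk_iops    -- total operations per second the disk system can sustain
    read_percent -- share of the application's I/O that is reads (1..99)
    """
    if not 0 < read_percent < 100:
        raise ValueError("read_percent must be between 1 and 99")
    write_percent = 100 - read_percent

    # Before: the disk system serves both reads and writes.
    reads_before = disk_iops * read_percent // 100
    writes_before = disk_iops * write_percent // 100

    # After: the disk system only absorbs mirrored writes, so its whole
    # budget goes to writes; reads scale up to keep the same mix.
    writes_after = disk_iops
    total_after = writes_after * 100 // write_percent
    reads_after = total_after - writes_after

    return {
        "reads_before": reads_before,
        "writes_before": writes_before,
        "writes_after": writes_after,
        "reads_after": reads_after,
        "total_after": total_after,
        "speedup": total_after / disk_iops,
    }


result = preferred_read_offload(disk_iops=10_000, read_percent=80)
print(result)
```

Under these assumptions the application moves from 10,000 to 50,000 IOPS: both mirror copies take 10,000 writes per second and the FlashSystem copy additionally serves 40,000 reads per second.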

(24)

Swingbench OLTP Results

(25)

(26)

Acceleration with IBM FlashSystem 820

V7000    FlashSystem 820    X Increase
8.06     2.29               2.52
28.43    2.15               12.22
15.26    2.04               6.48

(27)

DWH Swingbench Results

(28)

(29)

FlashSystem 820 Acceleration

V7000      FlashSystem 820    X Increase
1453703    621184             2.34
2885604    718334             4.02
255656     64721              3.95
3705331    1344883            2.76
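The row labels and units of the two Swingbench tables were lost with the charts, but the "X Increase" columns can still be recomputed from the other two. The sketch below uses only the table values. It suggests that the OLTP column appears to be the ratio minus one (the relative improvement), while the DWH column appears to be the plain V7000 / FlashSystem ratio; that reading is an inference from the numbers, not something the slides state.

```python
# (V7000, FlashSystem 820, reported "X Increase") from the two slides
OLTP_ROWS = [(8.06, 2.29, 2.52), (28.43, 2.15, 12.22), (15.26, 2.04, 6.48)]
DWH_ROWS = [
    (1453703, 621184, 2.34),
    (2885604, 718334, 4.02),
    (255656, 64721, 3.95),
    (3705331, 1344883, 2.76),
]


def plain_ratio(v7000: float, flash: float) -> float:
    """How many times larger the V7000 value is than the FlashSystem value."""
    return v7000 / flash


def relative_improvement(v7000: float, flash: float) -> float:
    """Improvement expressed relative to the FlashSystem value (ratio - 1)."""
    return (v7000 - flash) / flash


oltp_computed = [round(relative_improvement(v, f), 2) for v, f, _ in OLTP_ROWS]
dwh_computed = [round(plain_ratio(v, f), 2) for v, f, _ in DWH_ROWS]

print("OLTP:", oltp_computed)
print("DWH: ", dwh_computed)
```

If the two tables are compared side by side, the OLTP figures would read as 3.52x, 13.22x, and 7.48x on the plain-ratio basis used for the DWH results.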

(30)

SLOB (Silly Little Oracle Benchmark) Testing Scenario

SLOB generates the IO requests via PL/SQL, thus exercising full Oracle IO machinery along with SGA etc.

SLOB is capable of testing random single block reads, writes and extreme REDO logging. It does all this with no application contention, thus allowing one to measure the true maximum IO that can be achieved on a system.

Workload consisted of 56 users from both RAC nodes generating db file sequential reads (random physical single block reads). The test started with preferred read set to the disk failure group and continued after preferred read was changed to FlashSystem.

(31)

Acceleration of Database Creation with IBM Flash System

After switching to FlashSystem (05:27 PM), disk IO wait disappears and waiting is now on host CPU. This graph shows the effect of the low

(32)

(33)

SLOB Test Results

Just by adding a single FlashSystem box to an Oracle environment you can get:

36x acceleration of IOPS!

30x acceleration of throughput!

40x acceleration of latency! (assuming 0.43 ms; Oracle reports <0.5 as 0)

(34)

Before – Read From Disk

After Acceleration – Read From FlashSystem

(35)

System Configuration

Linux x86 RHEL Server

Gen2 XIV disk array

IBM FlashSystem 820

ASM used to mirror between XIV and FlashSystem

Switched preferred read mirror (PRM) at instance level

Gen2/Flash means the Gen2 was PRM

Flash/Gen2 means FlashSystem was PRM
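The deck does not show how the preferred read mirror was switched. One plausible way, assuming it was done through the Oracle ASM initialization parameter ASM_PREFERRED_READ_FAILURE_GROUPS, is an ALTER SYSTEM statement issued on each ASM instance. The helper below only builds that statement as text; the disk group name DATA, the failure group names XIV_FG and FLASH_FG, and the instance name +ASM1 are hypothetical placeholders, not values from the test.

```python
import re

_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def preferred_read_statement(disk_group: str, failure_group: str, asm_sid: str) -> str:
    """Build the ALTER SYSTEM text that names a preferred read failure group.

    The parameter value has the form '<diskgroup>.<failuregroup>' and is set
    per ASM instance, which is why the SID clause is included.
    """
    for name in (disk_group, failure_group):
        if not _NAME.match(name):
            raise ValueError(f"unexpected characters in name: {name!r}")
    if not re.match(r"^\+?[A-Za-z0-9_]+$", asm_sid):
        raise ValueError(f"unexpected characters in SID: {asm_sid!r}")
    return (
        "ALTER SYSTEM SET ASM_PREFERRED_READ_FAILURE_GROUPS = "
        f"'{disk_group}.{failure_group}' SID='{asm_sid}';"
    )


# Hypothetical names: prefer the disk copy first, then switch to flash.
gen2_first = preferred_read_statement("DATA", "XIV_FG", "+ASM1")
flash_first = preferred_read_statement("DATA", "FLASH_FG", "+ASM1")
print(gen2_first)
print(flash_first)
```

Because the setting is per instance, each node can be pointed at the flash failure group without changing how writes are mirrored.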

(36)

(37)

(38)

(39)

Conclusions

Using ASM PRM achieved near Flash-only levels of performance.

Preferred read mirror using IBM FlashSystem provides a dramatic performance boost in read-heavy environments.

(40)

(41)

ABB

As Is Environment:

ABB US has ~ 15 major manufacturing plants & 7,500 users

All plants depend on SAP; the SAP Oracle DB = ~ 3.2 TB

The PR1 DB is hosted on HACMP-clustered AIX LPARs; the LPARs are clustered between 2x p570 P6+ frames

The DB resides on 2x DS8700 arrays front-ended by SVCs

DB LUNs are mirrored between the 2 arrays at the host level

(42)

Business Challenge

Users dissatisfied with SAP performance

Slow month-end batch reads and reporting

Dialog response times approaching business SLAs

Performance concerns causing hesitation to invest in growth

100K (USD) per month in SLA fines

(43)

Fixes:

Many alternate solutions tried / considered but with limited success:

Additional CPU => Limited improvements; Additional cost

SAP dedicated SAN => Too expensive

DS8700 SSD => Too expensive; Configuration limitations

SAP / Oracle tuning => Limited changes helped a little; Extensive changes too labor intensive

(44)

Proposed Fix

Install (2) IBM FlashSystem 810 units, one to accelerate each Pod

– Attach behind SVC; powered by IBM P Series and AIX

– Migrate the database and logs to the SSD and mirror across the Pods via AIX-level mirroring (standard mirroring used today)

(45)

Predicted Acceleration

[Chart: Predicted CPU Utilization; axes: Percent, % Increase; series: CPU%, Corrected CPU%]

Current wait time total: 70.84%
New project wait time: 2.78%
Total Improvement: 213.10%

(46)

Results

[Chart: Actual CPU Utilization; axes: Percent, Percent Increase; series: CPU%, Corrected CPU%]

(47)

Proof points: Stuttgarter Straßenbahnen AG

• SAP ERP Finance

• Oracle

• AIX

• SVC stretched cluster

Several other customer cases show batch run time reduced by a factor of 5 to 10!

(48)

(49)

Tipping Point Demonstration

IBM FlashSystem, IBM Power Systems and DB2

Highly Scalable & I/O Intensive OLTP Database Workload

• Compelling Economics: Significantly improved workload efficiency
• Extreme Capacity: Buy only what you need; add capacity as needed
• Application Transparency: Avoid risk and cost of change as you grow or subside
• Continuous Availability: Uninterrupted access to data with

• IBM Power 780 (4 nodes, 128 Cores, 2TB Memory)
• IBM FlashSystem 820 (4-1U units, 20TB Each)
• Fibre Channel Networking
• 10 GbE Networking
• IBM DB2 v10.5 (10-8 core cluster

(50)

Tipping Point Demonstration Results

IBM FlashSystem, IBM Power Systems and DB2

• 1.3 Million IOPS
• 43K+ Transactions per second
• 13K Updates per second

[Chart: Normalized $ / IOPS, Energy, and Space for IBM FlashSystem compared with 2,500 Spindles + 128 SSDs and with 5,000 Spindles; labels: 11x Less, 80x Less, 26x Less]

(51)

All flash Case Study: Life sciences Client

10 TB Flash System 820

SQL cluster

IBM 3650 IBM 3650

Problem

• Experiencing pain with JDE DB loads / backups / restores

• Needed better system performance for the end user

Solution

• Installed IBM FlashSystem 820 into a SQL DB, clustered, running Oracle JDE

• Included Oracle OLAP processes

Benefit

• Backup Time improved from 5 hours to 42 minutes

• Restore Time improved from 6.5 hours to 1.2 hours

• Batch times went from 7:30 hours to 2:37 and from 17:47 to 7:07
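The benefit figures above translate into speedup factors with simple division. A short sketch, assuming the batch times are written as hours:minutes; the helper names are illustrative and the inputs are the slide's own numbers.

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'H:MM' duration (assumed hours:minutes) to minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def speedup(before_minutes: float, after_minutes: float) -> float:
    """How many times faster the job runs after the change."""
    return before_minutes / after_minutes


backup = speedup(5 * 60, 42)                                  # 5 h -> 42 min
restore = speedup(390, 72)                                    # 6.5 h -> 1.2 h
batch_1 = speedup(to_minutes("7:30"), to_minutes("2:37"))
batch_2 = speedup(to_minutes("17:47"), to_minutes("7:07"))

for label, value in [("backup", backup), ("restore", restore),
                     ("batch 1", batch_1), ("batch 2", batch_2)]:
    print(f"{label}: {value:.1f}x faster")
```

On that reading, backups ran about 7x faster, restores about 5x, and the two batch jobs between 2.5x and 3x.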

(52)

Questions?

Mike Ault

Thanks to:

Ali Fığlalı

Orçun Budak