Tools for Managing Big Data
Analytics on z/OS
Mike Stebner, Joe Sturonas PKWARE, Inc.
Wednesday, March 12, 2014
Session ID 14948
Introduction
Heterogeneous Analysis
Addressing the process of packaging and transferring
z/OS-based information to an off-board analytic platform in an
effective, cost-efficient and secure manner.
What are some major hurdles that exploitation of advanced
System z facilities can overcome in this venue?
Introduction
Heterogeneous Analysis
• Data Transformation
  • Code page differences (EBCDIC/ASCII)
  • Data Structures (Binary, Endian mode numerics, Parsing)
• Portability between dissimilar file system formats
• Data Packaging (multiple discrete components)
• Data Protection
• Data Volume
  • Total raw size
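The code page and numeric format differences listed above can be illustrated with a minimal sketch. The 12-byte record layout, the field names, and the choice of code page cp037 are assumptions made for illustration; real z/OS records would follow their own copybook or layout definition.

```python
import struct

# Hypothetical fixed layout: 8-byte EBCDIC name followed by a
# 4-byte big-endian signed integer (System z stores binary numerics big-endian).
RECORD_LAYOUT = struct.Struct(">8si")


def decode_record(raw):
    """Translate one fixed-layout record into platform-neutral values."""
    name_ebcdic, amount = RECORD_LAYOUT.unpack(raw)
    # cp037 is one common EBCDIC code page; the actual source page is an assumption.
    name = name_ebcdic.decode("cp037").rstrip()
    return name, amount


def to_little_endian_ascii(name, amount):
    """Re-pack the same fields as ASCII text plus a little-endian integer."""
    return struct.pack("<8si", name.ljust(8).encode("ascii"), amount)


# Build a sample record the way it would look on the mainframe side.
sample = "PKWARE  ".encode("cp037") + (1234).to_bytes(4, "big", signed=True)
name, amount = decode_record(sample)
converted = to_little_endian_ascii(name, amount)
print(name, amount, converted)
```

The same bytes cannot simply be copied to the analytic platform: the text needs a code page translation and the binary field needs a byte-order change, which is why the record structure has to be known before transfer.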
What is the business impact of selected
designs and facilities?
Focus on experiences with System z
Facilities that help address two areas
• Data Transformation
  • Code page differences (EBCDIC/ASCII)
  • Data Structures (Binary, Endian numerics, Parsing)
• Portability between dissimilar file system formats
• Data Packaging (multiple discrete components)
• Data Protection - Encryption
• Data Volume - Hardware Assisted Compression
  • Total raw size
Data Protection
Data-Centric Encryption using ICSF
Machine        Algorithms Supported              Crypto Hardware
z10-EC 2097    DES, 3DES, AES 128, 192, 256      CPACF, CEX2C
z10-BC 2098    DES, 3DES, AES 128, 192, 256      CPACF, CEX2C
z196 2817      DES, 3DES, AES 128, 192, 256      CPACF, CEX3C
z114 2818      DES, 3DES, AES 128, 192, 256      CPACF, CEX3C
zEC12 2827     DES, 3DES, AES 128, 192, 256      CPACF, CEX3C
zBC12 2828     DES, 3DES, AES 128, 192, 256      CPACF, CEX3C
Application Design Cryptographic
Design Influences
• Data Exchange Format
  • Collection with associative constructs
• Data Transport (Container Format)
  • In-flight and ‘at rest’ security
  • Authentication and decryption service availability
• Cryptographic Identity and Associated Key Management
  • Dynamic vs. Static Keys
  • Inter-system Key Coordination
• Data Recovery (Contingency Keys)
• Resource Capacity
Crypto Facilities
[Diagram: OpenPGP Keyrings; Native X.509 Certificates; Proprietary Certificate Store; RACF/ACF2/Top Secret Certificate Administration; Cryptographic Application Services; X.509 Certificates Public LDAP; Certificate Authority; ICSF CKDS & PKDS; CEXnC / CPACF / Software Crypto]
Data-Centric Encryption
ICSF Data Encipherment Algorithms
• RSA PKI Encryption
  • Losing ground for longevity due to high cost of processing
    increased key lengths
• Symmetric Clear Key
  • DES class, AES (128 – 256 bit key strength)
  • May be employed with passphrase-generated key or CKDS
    stored key
• Symmetric Protected Key (SYMCPACFWRAP)
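A passphrase-generated clear key, as mentioned in the list above, can be sketched as follows. The slides do not say how the passphrase is turned into a key, so the use of PBKDF2-HMAC-SHA256, the salt size, and the iteration count are all assumptions for illustration; this is not an ICSF call and it produces a clear key value only.

```python
import hashlib
import os


def derive_aes256_key(passphrase, salt, iterations=200_000):
    """Derive a 32-byte (AES-256 sized) clear key from a passphrase.

    PBKDF2-HMAC-SHA256 is an assumed choice, not the method used by any
    particular product.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


salt = os.urandom(16)  # must be stored or sent with the data so the key can be re-derived
key = derive_aes256_key("example passphrase", salt)
print(len(key), "byte key derived")
```

Anyone who knows the passphrase and salt can re-derive the same key value, which is the portability benefit and also the exposure that the "Clear" column of the comparison that follows describes.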
Symmetric Key Operational Comparison
“Clear”: Fast, but Risky
  o ICSF Software -or- System z CPACF
  o Passphrase Value -or- ICSF CKDS Registered (clear)
“Protected”: Fast & Secure
  o System z CPACF
  o ICSF CKDS Registered (encrypted)
“Secure”: Slow
  o Cryptographic Card
  o ICSF CKDS Registered (encrypted)
Leverage ICSF CKDS to Protect
Passphrase Derived Keys
CKDS Policy Control – Duplicate Key
Value Protection
RACF key ring/certificate with PKDS
Label: MSTEBNERSHARETEST  <- RACF Label (r_datalib API access)
Certificate ID: 2QPVweLV4uPFwtXF2fLw8P1A
Status: TRUST
Start Date: 2013/12/17 19:00:25
End Date: 2014/01/18 19:00:24
Serial Number: 10F0F1FF3C718DEE4D24BBEDA47A49D0
Issuer's Name: CN=UTN-USERFirst-Client Authentication and Email.OU=http://www.usertrust.com.O=The USERTRUST Network.L=Salt Lake City.SP=UT.C=US
Subject's Name: [email protected]=Mike Stebner.OU=Corporate Secure Email.OU=Issued through PKWARE E-PKI Manager.O=PKWARE.648 N PLANKINTON AVE.L=MILWAUKEE.SP=WI.53203.C=US
Key Usage: HANDSHAKE
Key Type: RSA
Key Size: 2048
Private Key: YES
What is the business impact of selected
designs and facilities?
Inherited OpenPGP Data Flow
• Onion layer concept
  • Encryption Layer
  • Compression Layer
  • Literal Data layer
• Data stream packets on each layer
[Diagram: Literal Data Layer, Compression Layer, Encryption Layer]
Consider the Basic Data Flow
Simple copies
from phase to
phase
Understand OpenPGP Internal Stream
Formatting (RFC 2440 or 4880)
OpenPGP Data Flow Overhead
Additional data
manipulation
logic from phase
to phase
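The extra manipulation between phases comes from the packet framing that each OpenPGP layer adds. The sketch below builds the two inner layers following the RFC 4880 new-format packet rules: a Literal Data packet (tag 11) wrapped in a Compressed Data packet (tag 8, algorithm 1, raw Deflate). The function names are hypothetical, and the encryption layer is left out because it needs a cryptographic library that is not assumed here.

```python
import zlib


def new_format_length(n):
    """Encode a packet body length using RFC 4880 new-format rules."""
    if n < 192:
        return bytes([n])
    if n < 8384:
        n -= 192
        return bytes([(n >> 8) + 192, n & 0xFF])
    return b"\xff" + n.to_bytes(4, "big")


def packet(tag, body):
    """Prefix a body with a new-format header: 0xC0 | tag, then the length."""
    return bytes([0xC0 | tag]) + new_format_length(len(body)) + body


def literal_packet(data, name=b""):
    """Tag 11: format 'b' (binary), file name, 4-byte date (0), then the data."""
    return packet(11, b"b" + bytes([len(name)]) + name + (0).to_bytes(4, "big") + data)


def compressed_packet(inner):
    """Tag 8, algorithm 1 (ZIP): a raw RFC 1951 Deflate stream of the inner packet."""
    deflater = zlib.compressobj(6, zlib.DEFLATED, -15)
    return packet(8, b"\x01" + deflater.compress(inner) + deflater.flush())


literal = literal_packet(b"x" * 1000, name=b"data.bin")
wrapped = compressed_packet(literal)
print(len(literal), "bytes literal layer ->", len(wrapped), "bytes compressed layer")
```

Each layer has to know the length of, or stream through, the layer inside it, so the data is touched again at every phase rather than simply copied.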
Illustration of Container Format Influence
on Encipherment Facilities
[Diagram: Symmetric Keys; X.509 Certificates; OpenPGP; RACF/ACF/CA-TSS; ICSF PKDS; ICSF CKDS; FIPS 140-2]
Compression
Why is it important?
[Diagram: Data acquisition; APPLICATION SERVICES (GCP / zIIP / zEDC); Result: Compressed & Encrypted Data on Target Platform]
Data is offloaded, encrypted, and compressed.
What Compression Facilities are
Available on System z?
Software-based
• General CP (e.g. gzip, OpenPGP, PKZIP, zlib)
  • Any viable cross-platform compatible algorithm chosen for
    implementation
  • Deflate (RFC 1951) is a commonly used algorithm that combines
    LZ77 sliding dictionary compression with Huffman coding.
• Software using zIIP offload
  • Execute software routines on a System z9 or later
  • Requires APF authorization to run SRB enclave scheduling
  • Provides economic compression, but may not improve
What Compression Facilities are
Available on System z?
Hardware-based
• System z CMPSC Static Dictionary hardware compression
  • Available since the early 1990s
  • Static dictionary LZ77
  • Limited applicability outside of z/OS
• System z Enterprise Data Compression hardware
  • New with zEC12 and zBC12 systems
  • PCIe adapter card
Compression Facility
Functional Comparison
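Facilities compared: Software General CP | Software on zIIP | CMPSC Static Dictionary | zEDC
Comparison rows: Portable; Generalized Compression; Requirements
Requirements:
  Software General CP: General CP Capacity
  Software on zIIP: System z9, zIIP Capacity (APF)
  CMPSC Static Dictionary: Pre-defined data structures
  zEDC: zEC12/zBC12, z/OS 2.1, zEDC Card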
IBM zEnterprise Data Compression for z/OS
and the zEDC Express Feature (I)
IBM Announcement; Document Number: ZSB03059USEN
• Implements RFC 1951 Deflate compression
• “When zlib uses zEDC, there can be up to 118X reduction
  in CPU and up to 24X throughput improvement”
• One or more PCIe cards servicing multiple partitions (15)
  • Currently supported only under a native z/OS LPAR
  • Check IBM statements of direction
• Optimized for larger amounts of data
IBM zEnterprise Data Compression for z/OS
and the zEDC Express Feature (III)
• System Use Cases
  • SMF
• Phased Roll-out intentions
  • BSAM/QSAM (infrastructure layer)
  • DFSMSdss™/DFSMShsm™ backup/restore
  • z/OS Java™ Technology Edition, Version 7
• Detailed SHARE sessions
  • 15209: Experiences with IBM zAware and zEDC
  • 15099: zEnterprise Data Compression: What is it and How
    Do I Use it? (Wed. 4:30 PM)
  • 15080: z/OS zEnterprise Data Compression Usage and
IBM zEnterprise Data Compression for z/OS
and the zEDC Express Feature (IV)
• z/OS V2R1.0 MVS Callable Services for HLL (Ch. 13-15)
  • Deflate stream compatible with GZIP, PKZIP, OpenPGP
  • Checks to determine hardware availability
  • IBM-provided compatible C library functions
  • APF Authorized API for single-block compress/inflate
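The cross-format compatibility noted above follows from every container carrying the same RFC 1951 Deflate data with a different wrapper. The sketch below uses the portable software zlib through Python, not the z/OS callable services or zEDC; the payload and the helper name are assumptions for illustration.

```python
import gzip
import zlib


def deflate(data, wbits):
    """Compress with zlib; wbits selects the wrapper around the Deflate data."""
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()


payload = b"z/OS record data " * 4096

raw = deflate(payload, -15)  # raw RFC 1951 stream: no header, no checksum
gz = deflate(payload, 31)    # gzip wrapper: header plus CRC32 and length trailer

# Any standard gzip reader accepts the wrapped stream.
restored = gzip.decompress(gz)
# The gzip trailer holds the CRC32 of the uncompressed data, little-endian.
trailer_crc = int.from_bytes(gz[-8:-4], "little")
print(len(raw), len(gz), trailer_crc == zlib.crc32(payload))
```

A consumer on the target platform only needs a standard inflate routine plus knowledge of which wrapper was used.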
IBM zEnterprise Data Compression for z/OS
and the zEDC Express Feature (V)
• z/OS V2R1.0 MVS Callable Services for HLL (Ch. 13-15)
  • Unauthorized zlib interface (streaming data)
  • Uses zlib.net z_stream programming interface (subset)
  • Raw Deflate Stream or GZIP modes (CRC32 with GZIP)
  • libzz.a include wrapper
  • Controlled by SAF-protected FACILITY class resource
    FPZ.ACCELERATOR.COMPRESSION
  • z/OS UNIX _HZC_COMPRESSION_METHOD environment
    control variable
  • May fall back to zlib software routines depending on zEDC
    requirements, including size limitations
  • PARMLIB IQPPRMxx DEFMINREQSIZE (4K) and
IBM zEnterprise Data Compression
PKWARE Early Test Program Experience
• Objective
  • Assess compression using software GCP, zIIP and zEDC
• zEC12
  • 5 General CPs, 2 zIIPs, 1 zEDC
• Workloads – Single system (no LPAR sharing of zEDC)
  • “Large” (1 GB+) linear with multiple parallel (80 concurrent)
  • “Small” (256 KB) high volume
• Metrics
  • Elapsed Time
zEDC Operations
zEDC Processing Characteristics
• Multi-tasking with the zlib API is available
• zlib API may not run on the zEDC hardware (per design)
  • Different minimum buffer size thresholds for deflate & inflate
• Only one ‘level’ of zEDC Deflate compression
  • 9 levels available in zlib software
  • Internal implementations of RFC 1951 Deflate may differ
  • May experience varying compression ratios (based on level)
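The effect of the nine software levels on compression ratio can be observed with a short sketch using the portable zlib library. The sample data is a hypothetical stand-in for a real workload, and nothing here exercises zEDC, which offers a single level.

```python
import zlib


def sizes_by_level(data):
    """Return {level: compressed size in bytes} for zlib software levels 1-9."""
    return {level: len(zlib.compress(data, level)) for level in range(1, 10)}


# Hypothetical sample: repetitive, record-like text standing in for workload data.
sample = b"".join(b"ACCT%08d BALANCE %010d\n" % (i, i * 37) for i in range(20000))

sizes = sizes_by_level(sample)
for level, size in sizes.items():
    print(f"level {level}: {size} bytes, ratio {len(sample) / size:.2f}")
```

Every level produces a valid Deflate stream that the same inflate routine can read; only the size and the CPU cost differ, which is why ratios from a fixed-level hardware implementation may not match a given software level.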
IBM zEnterprise Data Compression
PKWARE Early Test Program Experience
Initial Results Overview (I)
• zEDC sustained 1 GB+ per second of raw compression
• zEDC capacity exceeded application resource constraints
• The effects of I/O and application processing prevented
saturation of zEDC
• Under appropriate conditions, zIIP met or exceeded
application performance when compared to zEDC.
• Optimized zlib C routines showed benefits over the libzz.a
wrapper code under some conditions.
• Small files under the minimum buffer size
IBM zEnterprise Data Compression
PKWARE Early Test Program Experience
Initial Results Overview (II)
• ETP limitations of first implementation identified
  • Buffer allocation issues
  • Buffer release
  • Rejected concurrent requests for the same size buffer
Effect of Resource Availability
zEDC vs. zIIP
Incorporate Design with Facility
Summary Slide
• The mainframe is typically the source of record for critical
  business data
• Data needs to move off the mainframe quickly, efficiently and
  securely.
• Numerous facilities on z/OS exist to make this quick, efficient
  and secure: zIIP, Crypto Express4S, CPACF, zEDC