VBLOCK™ SOLUTION FOR GREENPLUM
Table of Contents
Introduction
Goal
Audience
Scope
Objectives
Greenplum Architecture Overview
Setup
Installation
Vblock Series 700 model MX Building Block Configuration Specifications for Greenplum
Greenplum Design and Configuration Details
Compute – Unified Computing System (UCS)
UCS Server Blade Provisioning
Service Profile Template
UCS Firmware
UCS Network Configuration
Greenplum Segment Server/ESX Server Provisioning
Symmetrix VMAX Storage Array
The Symmetrix VMAX Architecture Overview
Hardware List
Disk Layout
Front-end Storage Port Layout
RecoverPoint
RecoverPoint VSAN Zoning
RecoverPoint Consistency Groups
TimeFinder/Snap
Test Results
Read and Load Performance Tests: Objectives and Results
RecoverPoint Disaster Recovery Tests: Objectives and Results
Conclusion
Introduction
Goal
The purpose of this document is to provide an architecture for hosting the Greenplum application on Vblock™ Infrastructure Platforms. Specifically, the Greenplum application is hosted on the Vblock Series 700 model MX shared infrastructure as part of a multi-application environment. The Vblock 700 delivers high-performance, large-scale virtualization across the data centers of large enterprise customers. It includes the Cisco Unified Computing System, EMC Symmetrix VMAX, and VMware vSphere 4, and can include flash technology to meet the high-performance demands of mission-critical applications.
The architecture provides a building-block approach for hosting Greenplum applications. This approach is scalable and supports a dynamic workload in a cost-effective model. The Vblock 700 enables enterprises to meet their mobility, disaster recovery, security, and optimized data lifecycle management requirements for hosting Greenplum alongside other applications.
Audience
The target audience for this document includes technical engineering staff, managers, IT planners, administrators, and others involved in evaluating, managing, operating, or designing Greenplum Vblock platform deployments.
Scope
The project demonstrates the ability to:
Run the Greenplum application on the Vblock platform
Prove that Greenplum Data Warehousing (DW) is a viable solution for use on a Vblock platform
Objectives
The business objectives of the new Vblock Solution for Greenplum include advantages in the following areas:
Provide a proven performance platform for a Greenplum and Vblock 700 architecture
Establish a building-block scalable model with predictable performance growth
Provide a showcase environment for a Greenplum and Vblock 700
Greenplum workload tests using massively parallel processing (MPP)
o MPP/Row and MPP/Columnar load queries
o MPP/Row sequential read and random read queries
o MPP/Columnar sequential read and random read queries
o Mixed workloads
Automated virtualization functionality
o Scale out to a new ESX server infrastructure with six VMs per ESX host, for a total of forty-eight Greenplum Segment Server VMs
Greenplum Architecture Overview
Greenplum Database is a massively parallel processing (MPP) database server based on PostgreSQL open-source technology. MPP (also known as a shared nothing architecture) refers to systems with two or more processors which cooperate to carry out an operation, each processor with its own memory, operating system, and disks. Greenplum leverages this high-performance system architecture to distribute the load of multi-terabyte data warehouses, and is able to use all of a system’s resources in parallel to process a query.
Greenplum Database is essentially several PostgreSQL database instances acting together as one cohesive database management system (DBMS). It is based on PostgreSQL 8.2.9, and in most cases is very similar to PostgreSQL with respect to SQL support, features, configuration options, and end-user functionality. Database users interact with Greenplum Database as they would a regular PostgreSQL DBMS.
The internals of PostgreSQL have been modified or supplemented to support the parallel structure of Greenplum Database. For example, the system catalog, query planner, optimizer, query executor, and transaction manager components have been modified and enhanced to be able to execute queries in parallel across all of the PostgreSQL database instances at once. The Greenplum interconnect (the networking layer) enables communication between the distinct PostgreSQL instances and allows the system to behave as one logical database.
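To make the shared-nothing model concrete, the following Python sketch simulates how an MPP system spreads rows across segment instances by hashing a distribution key, lets each segment compute a partial aggregate over only its own rows, and has the master combine the partial results. It is a simplified model under stated assumptions: the CRC32 hash, the segment count, and the `distribute`/`parallel_sum` helper names are illustrative and are not Greenplum's actual hashing or executor internals.

```python
import zlib
from collections import defaultdict


def distribute(rows, key, num_segments):
    """Assign each row to a segment by hashing its distribution key (illustrative CRC32 hash)."""
    segments = defaultdict(list)
    for row in rows:
        seg_id = zlib.crc32(str(row[key]).encode("utf-8")) % num_segments
        segments[seg_id].append(row)
    return segments


def parallel_sum(segments, column):
    """Each segment sums only its local rows; the master then adds up the partial sums."""
    partials = {seg_id: sum(r[column] for r in seg_rows) for seg_id, seg_rows in segments.items()}
    return sum(partials.values()), partials


# Usage: 1,000 rows spread over 24 hypothetical segments, then a distributed SUM.
rows = [{"customer_id": i, "amount": i % 7} for i in range(1000)]
segs = distribute(rows, "customer_id", num_segments=24)
total, partials = parallel_sum(segs, "amount")
print(total, len(partials))
```

Because rows with the same key always hash to the same segment, no segment needs data held by another to compute its partial result; only the small partial sums cross the interconnect.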
Greenplum Database also includes features designed to optimize PostgreSQL for business intelligence (BI) workloads. For example, Greenplum has added parallel data loading (external tables), resource management, query optimizations, and storage enhancements that are not found in regular PostgreSQL. Many features and optimizations developed by Greenplum have also made their way back into the PostgreSQL community and are now part of standard PostgreSQL.
Figure 1. MPP Shared-nothing Architecture
For further Greenplum information, see the References section.
Setup
This section addresses the Greenplum setup.
Installation
The following link points to the Greenplum installation documentation.
http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Technical_Documentation/300-011-541.pdf
Vblock Series 700 model MX Building Block Configuration Specifications for Greenplum
The 700MX building block configuration for Greenplum comprises forty-eight Greenplum Segment Server VMs running on eight blades and two storage engines. It supports a 4 GB/sec scan rate. More throughput can be achieved by adding another pair of VMAX engines and disks, or by adding more building blocks. Table 1 lists the configuration details.
Table 1 Building block configuration specifications for Greenplum
Component | Specification
700MX Compute | 8 x B200 M2 Blades with 96 GB memory per blade (8 blades for Greenplum Segment Servers), across two or more chassis
700MX Storage | Symmetrix VMAX storage: 2 engines; 16 x 8 Gb/sec FC ports; 192 x 300 GB FC drives (excluding hot spares); RAID5 (3+1)
700MX Virtualization | VMware vSphere ESX 4.0 U2 servers; vCenter connects to another Windows 2008 R2 Enterprise Edition server running SQL Server 2005 Enterprise Edition, set up per VMware’s installation guide; VMware Distributed Virtual Switch
Greenplum application on Vblock platform | Greenplum v4.006 software
Greenplum Utilities | GP PerfMon, psql, GP-Load, Greenplum Perl
Greenplum Connectivity | GP: port 5432; GP PerfMon: port 8888
Scan rate for the given building block | 4 GB/sec
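As a rough cross-check of Table 1, the arithmetic below divides the 4 GB/sec building-block scan rate across the components that deliver it and projects the aggregate rate for several building blocks. It is a back-of-the-envelope sketch that assumes decimal units (1 GB = 1000 MB), an even spread of load, and linear scaling per building block; none of these assumptions are measured results from this test, and the `BuildingBlock` type and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildingBlock:
    scan_rate_mb_s: float = 4000.0  # 4 GB/sec per building block (Table 1), decimal units assumed
    segment_vms: int = 48           # Greenplum Segment Server VMs
    blades: int = 8                 # B200 M2 blades
    fc_ports: int = 16              # 8 Gb/sec FC ports in use
    data_drives: int = 192          # 300 GB FC drives, excluding hot spares


def per_component_share(bb: BuildingBlock) -> dict:
    """Scan rate (MB/s) each component carries if the load is spread evenly (an assumption)."""
    return {
        "per_segment_vm": bb.scan_rate_mb_s / bb.segment_vms,
        "per_blade": bb.scan_rate_mb_s / bb.blades,
        "per_fc_port": bb.scan_rate_mb_s / bb.fc_ports,
        "per_drive": bb.scan_rate_mb_s / bb.data_drives,
    }


def aggregate_scan_rate_gb_s(blocks: int, bb: BuildingBlock = BuildingBlock()) -> float:
    """Projected scan rate for N building blocks, assuming linear scaling."""
    return blocks * bb.scan_rate_mb_s / 1000.0


shares = per_component_share(BuildingBlock())
for name, mb_s in shares.items():
    print(f"{name}: {mb_s:.1f} MB/s")
print(f"3 building blocks: {aggregate_scan_rate_gb_s(3):.0f} GB/sec")
```

Under these assumptions each FC port carries about 250 MB/s of scan traffic, well below the nominal ~800 MB/s per direction of an 8 Gb FC link, which is consistent with the statement above that throughput grows by adding VMAX engines and disks.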
Greenplum Design and Configuration Details
This section presents design and configuration details. Figures 2 and 3, below, illustrate the conceptual design through both the physical topology and the logical topology.
The physical topology (Figure 2) depicts the connectivity from the UCS to the SAN layer and the LAN layer. In the SAN layer, a total of 16 x 8 Gb Fibre Channel (FC) connections were used between the UCS Fabric Interconnects (A and B) and the MDS SAN directors. The SAN directors form fabric A and fabric B. VSANs 30 and 130 are in director A, and VSANs 31 and 131 are in director B.
VSANs 30 and 31 are backend VSANs consisting of the VMAX storage ports and the RecoverPoint Appliance HBA ports. VSANs 130 and 131 are the corresponding front-end VSANs consisting of the server HBA ports; in this case, the UCS blade servers are used as ESX servers. Cisco SANTap requires both the front-end and backend VSANs in order to act as the write splitter for the RecoverPoint Appliances.
In the LAN layer, a total of 16 x 10 Gb Ethernet port connections are used between the UCS Fabric Interconnects (A and B) and the Nexus 5020 access layer LAN switches, which in turn are connected to the Nexus 7000 switches as the aggregation layer.
The logical topology (Figure 3) depicts the configuration of the backend disks, ESX host LUNs, VMs, and Greenplum components.
A total of 26 x VMs were created on 10 x ESX servers (the UCS blade servers). 24 x VMs are used as data segment servers, and 2 x VMs are used as master servers (one active and one standby node). The 24 segment VMs are evenly distributed across 8 x ESX servers, 3 x VMs per ESX server. The active and standby master servers are on the ninth and tenth ESX blade servers. RecoverPoint was configured in the local replication mode called Continuous Data Protection (CDP).
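The placement rule described above (segment VMs spread evenly over the segment hosts, with the active master and standby master on dedicated ninth and tenth hosts) can be expressed as a small Python sketch. The host and VM names (`esx01`, `seg01`, `master`, `standby`) and the `plan_placement` helper are hypothetical placeholders, not the names used in the tested environment.

```python
def plan_placement(segment_vms=24, segment_hosts=8):
    """Spread segment VMs evenly over the segment hosts, then add master and standby hosts."""
    if segment_vms % segment_hosts:
        raise ValueError("segment VMs must divide evenly across hosts")
    per_host = segment_vms // segment_hosts
    placement = {}
    for h in range(segment_hosts):
        host = f"esx{h + 1:02d}"
        placement[host] = [f"seg{h * per_host + v + 1:02d}" for v in range(per_host)]
    # The active master and the standby master each get a dedicated host.
    placement[f"esx{segment_hosts + 1:02d}"] = ["master"]
    placement[f"esx{segment_hosts + 2:02d}"] = ["standby"]
    return placement


layout = plan_placement()
print(len(layout), sum(len(v) for v in layout.values()))  # hosts, VMs
print(layout["esx01"], layout["esx09"], layout["esx10"])
```

Calling `plan_placement(48, 8)` produces the six-VMs-per-host layout of the forty-eight-VM building block described in Table 1.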
Compute – Unified Computing System (UCS)
UCS Server Blade Provisioning
The Cisco Unified Computing System (UCS) allows physical servers to be provisioned from a template. In UCS terminology, a server's configuration is described by a service profile, and a service profile template allows users to deploy one or more service profiles from a single definition.
In this Greenplum on Vblock 700 configuration, a total of eight Service Profiles were used. The following Service Profile Template details are used for deploying Greenplum on Vblock 700.
Service Profile Template
A Service Profile Template has the following components:
1. vNIC Template: This template is used for vNIC configuration for all Service Profiles deployed using a Service Profile Template. For Greenplum, disable the failover between vNICs.
2. Each vNIC has the following VLAN configuration:
Setting | vNIC 0 (vmnic0) | vNIC 1 (vmnic1)
VLAN ID | Service Console/Management; vMotion Network; Greenplum Segment VLAN-1*; Public Network VLAN** | Service Console/Management; vMotion Network; Greenplum Segment VLAN-2*; Public Network VLAN**
Fabric ID | A | B
3. vHBA Template: This template is used for the vHBA configuration of all service profiles deployed from a service profile template. For this setup, a total of eight Fibre Channel connections from each fabric interconnect to the MDS switches were used. Eight SAN pin groups were also created, so that each service profile has a dedicated SAN connection. The VMware FC adapter policy was used for all vHBAs.
Setting | vHBA-1 | vHBA-2
VSAN ID | 201 | 202
Fabric ID | A | B
4. Boot Policy: Four boot policies, named Boot From SAN (BFS), were created. Each boot policy points to the front-end directors of the EMC Symmetrix VMAX storage array.
[Screen captures: Boot From SAN (BFS) Policies 1, 2, and 4]
5. MAC Pools: In this setup, two MAC pools were used, one for each fabric.
6. World Wide Node and Port Name Pools: In this setup, two WWPN pools were used, one for each fabric. The UCS blade allows users to modify the last three octets of the worldwide port and node names.
7. Universally Unique Identifier (UUID) pool for each service profile.
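Because the pools vary only the last three octets of each address, a pool can be modeled as a fixed prefix plus a sequential three-octet suffix. The Python sketch below generates such addresses. The prefixes are assumptions for illustration (00:25:B5 is the OUI commonly used for UCS pools, and the per-fabric start values 0x0A0000 and 0x0B0000 are hypothetical); they are not the values configured in this deployment, and `pool_addresses` is not a UCS API.

```python
def pool_addresses(prefix, start, count):
    """Return `count` colon-separated addresses: fixed prefix octets plus a 3-octet suffix."""
    if start < 0 or start + count > 0x1000000:
        raise ValueError("suffix must fit in three octets")
    addresses = []
    for n in range(start, start + count):
        suffix = [(n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF]
        addresses.append(":".join(f"{octet:02X}" for octet in list(prefix) + suffix))
    return addresses


# Assumed prefixes: 00:25:B5 for MACs, 20:00:00:25:B5 for WWPNs (hypothetical choices).
MAC_PREFIX = (0x00, 0x25, 0xB5)
WWPN_PREFIX = (0x20, 0x00, 0x00, 0x25, 0xB5)

# One address per service profile (eight), with a distinct range per fabric.
mac_fabric_a = pool_addresses(MAC_PREFIX, 0x0A0000, 8)
wwpn_fabric_b = pool_addresses(WWPN_PREFIX, 0x0B0000, 8)
print(mac_fabric_a[0], wwpn_fabric_b[-1])
```

Giving each fabric its own suffix range keeps fabric A and fabric B identities distinguishable at a glance when reading zoning or switch logs.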
UCS Firmware
The latest firmware available at the time, 1.3 (1c), was used for this deployment. To enable the Palo interface cards, the firmware package must also be attached to each service profile. This can be done in the service profile template, which then propagates the firmware to each service profile bound to that template. The BIOS must also be updated to the latest version using a firmware package. The following screen capture shows a firmware package containing the Palo card (M81KR) firmware and BIOS updates.
To apply the update to all service profiles:
1. In the service profile template, go to Policies.
2. Select the firmware package.
UCS Network Configuration
1. Create uplink ports for Ethernet traffic.
2. Under Networking, configure the LAN and SAN.
LAN Configuration
Ports 1/37-40 were selected as uplink ports for 10 Gb Ethernet traffic. The following screen captures show the port channel (LAN) configurations on the fabric interconnects.
Port-channel configuration details for Fabric Interconnect A
Port-channel configuration details for Fabric Interconnect B
SAN Configuration on Fabric Interconnects
Greenplum Segment Server/ESX Server Provisioning
VM servers are provisioned as follows: 3 x VMs are created per ESX server on a total of 8 x ESX servers. These are used as Greenplum segment servers.
1 x VM is created on a 9th ESX server. It is used as the Greenplum Master/metadata Server. This server also handles client requests.
1 x VM is created on a 10th ESX server. It is used as the Greenplum Standby Server.
Note: XFS-formatted devices on VMware ESX guest RDMs are recommended for GP data segments for the highest performance.
The table below provides the VM/ESX storage configuration details.
The following diagram depicts the layout of a single blade, with an ESX instance and its VMs (Greenplum data segment servers), each VM having two FC LUNs. The ESX environment is located on the 500 GB LUN, which holds the three VM instances.
Figure 4. VM/ESX/LUN Layout
The following screen capture shows the ESX servers in vCenter Server.
The following screen capture shows the distribution of the Greenplum Segment Server (gpssx) virtual machines on a single ESX server. The VM distribution on the remaining seven ESX servers is identical to that of gpesx101.gp.vce.
Symmetrix VMAX Storage Array
The Symmetrix VMAX Architecture Overview
At the heart of the Symmetrix VMAX series storage array architecture is the scalable Virtual Matrix interconnect design. The Virtual Matrix is redundant and dual active, and supports all Global Memory references, messaging, and management operations including internal discovery and initialization, path management, load balancing, failover, and fault isolation within the array. The Symmetrix VMAX array comprises from one to eight VMAX Engines. Each VMAX Engine contains two integrated directors. Each director has two connections to the VMAX Matrix Interface Board Enclosure (MIBE) via the System Interface Board (SIB) ports. Since every director has two separate physical paths to every other director via the Virtual Matrix, this is a highly available interconnect with no single point of failure.
This design eliminates the need for separate interconnects for data, control, messaging, environmental, and system test. A single highly-available interconnect suffices for all communications between the directors, which reduces complexity.
Figure 5. Symmetrix VMAX Virtual Matrix Interconnect
Figure 6. Symmetrix VMAX Engine Building Blocks
Hardware List
Number of VMAX Engines: 2
Global Memory (GB): 256
Number of Front End Ports (8 Gbps): 32
Number of 300 GB (15K RPM) FC disks (excluding hot spares): 192
Disk Layout
48 x RAID5 (3+1) RAID groups are created from the total of 192 x Fibre Channel (FC) disks:
o 24 x RAID groups are used as the RecoverPoint CDP source
o 24 x RAID groups are used as the RecoverPoint CDP target
Two 190 GB hypervolumes are created from each RAID group.
One concatenated metavolume is created from the two hypervolumes of each RAID group. This achieves I/O isolation at the disk level.
Each of the 24 Greenplum segment VMs is allocated a metavolume as an RDM disk.
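The disk layout arithmetic can be checked with a short Python sketch. It uses only figures from this section (192 drives of 300 GB, RAID5 3+1, two 190 GB hypervolumes per RAID group, a 24/24 source/target split); the usable-capacity line applies the standard RAID5 rule of one parity drive per group and ignores formatting and metadata overhead, so all results are nominal.

```python
DRIVES = 192
DRIVE_GB = 300
RAID_DATA, RAID_PARITY = 3, 1  # RAID5 (3+1)
HYPERS_PER_GROUP = 2
HYPER_GB = 190

raid_groups = DRIVES // (RAID_DATA + RAID_PARITY)     # 4 drives per RAID group
source_groups = target_groups = raid_groups // 2      # CDP source / CDP target split
usable_per_group_gb = RAID_DATA * DRIVE_GB            # nominal RAID5 capacity per group
meta_gb = HYPERS_PER_GROUP * HYPER_GB                 # one concatenated metavolume per group
source_capacity_gb = source_groups * meta_gb          # capacity presented to the 24 segment VMs

print(raid_groups, source_groups, target_groups)
print(usable_per_group_gb, meta_gb, source_capacity_gb)
print(f"metavolume uses {meta_gb / usable_per_group_gb:.0%} of nominal group capacity")
```

The nominal 9,120 GB of source capacity exceeds the 8 TB database size listed in Table 9, and leaving more than half of each RAID group unallocated is one way the one-metavolume-per-group design keeps each segment's I/O on its own spindles.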
Figure 7. Greenplum Backend Disk Layout on VMAX
Front-end Storage Port Layout
2 x Engines, 4 x Directors (Directors 7, 8, 9, and 10)
16 x front-end processors with 2 x FC ports on each, for a total of 32 x 8 Gb/s FC ports
In this configuration, only 16 x FC ports are used: port 0 is taken from each processor, as shown in Figure 8.
Figure 8. Greenplum Front-end Storage Port layout
VMAX Masking Views (LUN masking):
A total of 10 x masking views are configured: 8 x for the ESX hosts running Greenplum segment VMs, 1 x for the ESX host running the Greenplum master VM, and 1 x for the ESX host running the standby master VM.
Each ESX server HBA initiator accesses storage through 2 x VMAX storage ports, for a total of 4 x VMAX storage ports per ESX server (with dual HBAs).
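The front-end port selection and the masking rules above can be enumerated in Python. The sketch assumes the director numbers (7 to 10) and processor letters (e to h) that appear in the zoning tables, and uses a hypothetical `<director><processor>:<port>` label format; it counts the available and used ports and the host-to-storage paths implied by two HBAs per ESX server with two storage ports per HBA.

```python
from itertools import product

DIRECTORS = [7, 8, 9, 10]          # 2 engines x 2 directors
PROCESSORS = ["E", "F", "G", "H"]  # front-end processors per director (letters as in the zoning tables)
PORTS_PER_PROCESSOR = 2            # ports 0 and 1

all_ports = [f"{d}{p}:{n}" for d, p, n in product(DIRECTORS, PROCESSORS, range(PORTS_PER_PROCESSOR))]
used_ports = [port for port in all_ports if port.endswith(":0")]  # only port 0 of each processor

ESX_HOSTS = 10      # one masking view per ESX host
HBAS_PER_HOST = 2
PORTS_PER_HBA = 2

ports_per_host = HBAS_PER_HOST * PORTS_PER_HBA
total_paths = ESX_HOSTS * ports_per_host

print(len(all_ports), len(used_ports))
print(used_ports[:4])
print(ports_per_host, total_paths)
```

Using port 0 of every processor spreads the 16 active ports across all four directors and all sixteen processors, rather than doubling up on half of them.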
For more information about Symmetrix VMAX, see the product documentation:
http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Technical_Documentation/300-008-603.pdf?mtcs=ZXZlbnRUeXBlPUttQ2xpY2tDb250ZW50RXZlbnQsZG9jdW1lbnRJZD0wOTAxNDA2NjgwNTIyMzFkLG5hdmVOb2RlPVNvZndhcmVEb3dubG9hZHMtMg__
RecoverPoint
RecoverPoint is EMC’s leading out-of-band, block-level replication product for a heterogeneous server and storage environment. RecoverPoint continuous data protection (CDP) provides local synchronous replication between LUNs that reside in one or more arrays at the same site. RecoverPoint continuous remote replication (CRR) provides remote asynchronous replication between two sites for LUNs that reside in one or more arrays. Both RecoverPoint CDP and RecoverPoint CRR feature bi-directional replication and an any-point-in-time recovery capability, which allows the target LUNs to be rolled back to a previous point in time and used for read/write operations without affecting the ongoing replication or data protection. The bi-directional replication and any-point-in-time recovery capability can be enabled simultaneously with RecoverPoint concurrent local and remote (CLR) data protection.
RecoverPoint supports three types of write-splitting technologies for maximal flexibility.
Table 3 Splitter details
Splitter Type | How Deployed | Overhead
Host-based | In the I/O stack just above the multipath software | Adds write traffic at the HBA; no other impact
Fabric-based | In intelligent storage services hardware on a Brocade- or Cisco-based switch | Operates at wire speed; no impact
CLARiiON-based | In the FLARE operating system; active in both storage processors | No impact
In the Greenplum configuration, the Cisco SANTap service is used in a RecoverPoint CDP deployment. Cisco 18/5 MSMs (Multiservice Modules) are installed in the MDS 9513 directors. Both the Greenplum segment server data and the Greenplum master metadata are replicated locally on the VMAX for continuous data protection.
Figure 9. RecoverPoint Sequence
1. Data is split and sent to the RecoverPoint appliance in one of three ways.
3. Writes are acknowledged back from the RecoverPoint appliance.
4. The appliance writes data to the journal volume, along with a time stamp and application-specific bookmarks.
5. Write-order-consistent data is distributed to the replica volumes.
RecoverPoint VSAN Zoning
A RecoverPoint deployment with Cisco SANTap requires placing different components into two VSANs:
Front-end VSAN
Backend VSAN
All I/O activity between the host and the storage is relayed by SANTap from the actual host port, via the DVT (Data Virtual Target) in the front-end VSAN, to the VI (Virtual Initiator) in the backend VSAN, and then to the actual storage port. This relay mechanism is completely transparent to the hosts.
The following types of zones are required for each VSAN.
Zones in the backend VSAN
The backend VSAN contains the physical storage ports, the RecoverPoint Appliance (RPA) HBA ports, the CVTs (Control Virtual Targets, created by the SANTap service), and the AVTs (Appliance Virtual Targets, created by the RecoverPoint Appliance):
Zone Type 1 - a zone that contains an ESX server HBA virtual initiator and the corresponding physical storage ports. These zones are almost identical to the zones in the front-end VSAN that contain the host HBA ports and DVTs, described below.
Zone Type 2 - a zone that contains the RPA HBA ports and the physical storage ports. This allows the RPAs to access the storage on the VMAX.
Zone Type 3 - a zone that contains the RPA HBA ports and the CVTs. This allows the RPA to request that the CVT open a splitting session. The I/O is then copied to the RPA, allowing the RPA to replicate data to the target.
Zone Type 4 - a zone that contains the RPA HBA ports and the AVTs. AVTs are used to mask the identity of the RPA, allowing it to appear as the host. This masking is necessary to allow the RPA to overcome SCSI reservations of storage ports by the hosts and to get the same view of the SAN that the hosts have.
Zones in the front-end VSAN
A zone between the host HBA ports (in this case, the UCS blade server HBA ports) and the SANTap Data Virtual Targets (DVTs).
Note: DVTs are created as virtual storage port entities during SANTap configuration. Each physical storage port used in the backend VSAN needs a corresponding DVT.
For more information, see the following zoning tables.
Table 4 RecoverPoint VSAN zoning table: Fabric A
Zone Member Member Member
gpesx101_HBA1_Vmax gpesx101_HBA1 Vmax_8eA Vmax_10eA
gpesx102_HBA1_Vmax gpesx102_HBA1 Vmax_8fA Vmax_10fA
gpesx103_HBA1_Vmax gpesx103_HBA1 Vmax_8gA Vmax_10gA
gpesx104_HBA1_Vmax gpesx104_HBA1 Vmax_8hA Vmax_10hA
gpesx107_HBA1_Vmax gpesx107_HBA1 Vmax_8gA Vmax_10gA
gpesx201_HBA1_Vmax gpesx201_HBA1 Vmax_8eA Vmax_10eA
gpesx202_HBA1_Vmax gpesx202_HBA1 Vmax_8fA Vmax_10fA
gpesx203_HBA1_Vmax gpesx203_HBA1 Vmax_8gA Vmax_10gA
gpesx204_HBA1_Vmax gpesx204_HBA1 Vmax_8hA Vmax_10hA
gpesx207_HBA1_Vmax gpesx207_HBA1 Vmax_8gA Vmax_10gA
RPA1_HBA1_2_Vmax RPA1_HBA1 Vmax_8eA Vmax_10eA
RPA1_HBA2 Vmax_8fA Vmax_10fA
Vmax_8gA Vmax_10gA
Vmax_8hA Vmax_10hA
Vmax_8gA Vmax_10gA
…
…
…
…
RPA8_HBA1_2_Vmax RPA8_HBA1 Vmax_8eA Vmax_10eA
RPA8_HBA2 Vmax_8fA Vmax_10fA
Vmax_8gA Vmax_10gA
Vmax_8hA Vmax_10hA
Vmax_8gA Vmax_10gA
RPA_CVT_A All above RPA HBA ports All SANTap CVTs in Fabric A
RPA_AVT_A All above RPA HBA ports All RPA AVTs
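Because every physical storage port used in the backend VSAN needs a corresponding DVT, the number of DVTs per fabric can be derived from the host zones. The Python sketch below transcribes the ESX host rows of Table 4 (Fabric A) and collects the distinct storage ports; the dictionary layout and the `storage_ports` helper are illustrative, and the RPA, CVT, and AVT zones are left out.

```python
from collections import Counter

# ESX host zones from Table 4 (Fabric A); RPA, CVT, and AVT zones omitted.
FABRIC_A_HOST_ZONES = {
    "gpesx101_HBA1_Vmax": ("gpesx101_HBA1", "Vmax_8eA", "Vmax_10eA"),
    "gpesx102_HBA1_Vmax": ("gpesx102_HBA1", "Vmax_8fA", "Vmax_10fA"),
    "gpesx103_HBA1_Vmax": ("gpesx103_HBA1", "Vmax_8gA", "Vmax_10gA"),
    "gpesx104_HBA1_Vmax": ("gpesx104_HBA1", "Vmax_8hA", "Vmax_10hA"),
    "gpesx107_HBA1_Vmax": ("gpesx107_HBA1", "Vmax_8gA", "Vmax_10gA"),
    "gpesx201_HBA1_Vmax": ("gpesx201_HBA1", "Vmax_8eA", "Vmax_10eA"),
    "gpesx202_HBA1_Vmax": ("gpesx202_HBA1", "Vmax_8fA", "Vmax_10fA"),
    "gpesx203_HBA1_Vmax": ("gpesx203_HBA1", "Vmax_8gA", "Vmax_10gA"),
    "gpesx204_HBA1_Vmax": ("gpesx204_HBA1", "Vmax_8hA", "Vmax_10hA"),
    "gpesx207_HBA1_Vmax": ("gpesx207_HBA1", "Vmax_8gA", "Vmax_10gA"),
}


def storage_ports(zones):
    """Yield every physical storage-port member (names starting with 'Vmax_')."""
    for members in zones.values():
        for member in members:
            if member.startswith("Vmax_"):
                yield member


dvts_needed = sorted(set(storage_ports(FABRIC_A_HOST_ZONES)))  # one DVT per distinct port
fan_in = Counter(storage_ports(FABRIC_A_HOST_ZONES))           # host HBAs zoned to each port

print(len(dvts_needed), dvts_needed)
print(fan_in["Vmax_8gA"], fan_in["Vmax_8eA"])
```

The fan-in count also shows that, in Table 4, each "g" port is zoned to four host HBAs while the other ports are zoned to two.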
Table 5 RecoverPoint VSAN zoning table: Fabric B
Fabric B, VSAN 31 (BE VSAN)
Zone Member Member Member
gpesx101_HBA2_Vmax gpesx101_HBA2 Vmax_9eA Vmax_7eA
gpesx102_HBA2_Vmax gpesx102_HBA2 Vmax_9fA Vmax_7fA
gpesx103_HBA2_Vmax gpesx103_HBA2 Vmax_9gA Vmax_7gA
gpesx104_HBA2_Vmax gpesx104_HBA2 Vmax_9hA Vmax_7hA
gpesx107_HBA2_Vmax gpesx107_HBA2 Vmax_9gA Vmax_7gA
gpesx201_HBA2_Vmax gpesx201_HBA2 Vmax_9eA Vmax_7eA
gpesx202_HBA2_Vmax gpesx202_HBA2 Vmax_9fA Vmax_7fA
gpesx203_HBA2_Vmax gpesx203_HBA2 Vmax_9gA Vmax_7gA
gpesx204_HBA2_Vmax gpesx204_HBA2 Vmax_9hA Vmax_7hA
gpesx207_HBA2_Vmax gpesx207_HBA2 Vmax_9gA Vmax_7gA
RPA1_HBA3_4_Vmax RPA1_HBA3 Vmax_9eA Vmax_7eA
RPA1_HBA4 Vmax_9fA Vmax_7fA
Vmax_9gA Vmax_7gA
Vmax_9hA Vmax_7hA
Vmax_9gA Vmax_7gA
…
…
…
…
RPA8_HBA3_4_Vmax RPA8_HBA3 Vmax_9eA Vmax_7eA
RPA8_HBA4 Vmax_9fA Vmax_7fA
Vmax_9gA Vmax_7gA
Vmax_9hA Vmax_7hA
Vmax_9gA Vmax_7gA
RPA_CVT_B All above RPA HBA ports All SANTap CVTs in Fabric B
RPA_AVT_B All above RPA HBA ports All RPA AVTs
Table 6 RecoverPoint VSAN zoning table: Fabric A’
Zone Member Member (DVT) Member (DVT)
gpesx101_HBA1_Vmax gpesx101_HBA1 Vmax_8eA Vmax_10eA
gpesx102_HBA1_Vmax gpesx102_HBA1 Vmax_8fA Vmax_10fA
gpesx103_HBA1_Vmax gpesx103_HBA1 Vmax_8gA Vmax_10gA
gpesx104_HBA1_Vmax gpesx104_HBA1 Vmax_8hA Vmax_10hA
gpesx107_HBA1_Vmax gpesx107_HBA1 Vmax_8gA Vmax_10gA
gpesx201_HBA1_Vmax gpesx201_HBA1 Vmax_8eA Vmax_10eA
gpesx202_HBA1_Vmax gpesx202_HBA1 Vmax_8fA Vmax_10fA
gpesx203_HBA1_Vmax gpesx203_HBA1 Vmax_8gA Vmax_10gA
gpesx204_HBA1_Vmax gpesx204_HBA1 Vmax_8hA Vmax_10hA
gpesx207_HBA1_Vmax gpesx207_HBA1 Vmax_8gA Vmax_10gA
Table 7 RecoverPoint VSAN zoning table: Fabric B’
Zone Member Member (DVT) Member (DVT)
gpesx101_HBA2_Vmax gpesx101_HBA2 Vmax_9eA Vmax_7eA
gpesx102_HBA2_Vmax gpesx102_HBA2 Vmax_9fA Vmax_7fA
gpesx103_HBA2_Vmax gpesx103_HBA2 Vmax_9gA Vmax_7gA
gpesx104_HBA2_Vmax gpesx104_HBA2 Vmax_9hA Vmax_7hA
gpesx107_HBA2_Vmax gpesx107_HBA2 Vmax_9gA Vmax_7gA
gpesx201_HBA2_Vmax gpesx201_HBA2 Vmax_9eA Vmax_7eA
gpesx202_HBA2_Vmax gpesx202_HBA2 Vmax_9fA Vmax_7fA
gpesx203_HBA2_Vmax gpesx203_HBA2 Vmax_9gA Vmax_7gA
gpesx204_HBA2_Vmax gpesx204_HBA2 Vmax_9hA Vmax_7hA
gpesx207_HBA2_Vmax gpesx207_HBA2 Vmax_9gA Vmax_7gA
RecoverPoint Consistency Groups
RecoverPoint replicates data by using logical groups called Consistency Groups (CGs). Each consistency group contains one or more replication sets, and each replication set is a pairing between a replication source LUN and a target LUN. Because each consistency group is active on one particular RPA, a total of 8 x CGs, each containing 3 x replication sets, were created so that all 8 x RPAs are used for optimal performance. A group set was created to contain all 8 x CGs and provide replication consistency for the entire Greenplum environment. Data consistency is maintained at the group set level, which allows rapid point-in-time recovery of the Greenplum environment.
Below is the Consistency Group configuration table.
Table 8 RecoverPoint consistency group configuration table
For more information about RecoverPoint, see the RecoverPoint documentation set listed in the References section.
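The consistency group layout described in the RecoverPoint Consistency Groups section (24 replication sets grouped into 8 consistency groups of 3 sets each, one group per RPA, all inside one group set) can be sketched in Python. The names (`CG1`, `RPA1`, `src01`/`tgt01`, `GP_GroupSet`) and the contiguous assignment of replication sets to groups are hypothetical illustrations, not the configuration recorded in Table 8.

```python
def build_consistency_groups(replication_sets, rpas):
    """Split replication sets into equal consistency groups, one preferred RPA per group."""
    if len(replication_sets) % len(rpas):
        raise ValueError("replication sets must split evenly across RPAs")
    per_group = len(replication_sets) // len(rpas)
    groups = {}
    for i, rpa in enumerate(rpas):
        groups[f"CG{i + 1}"] = {
            "preferred_rpa": rpa,
            "replication_sets": replication_sets[i * per_group:(i + 1) * per_group],
        }
    return groups


# Each replication set pairs a CDP source metavolume with a CDP target metavolume.
rsets = [(f"src{n:02d}", f"tgt{n:02d}") for n in range(1, 25)]
rpas = [f"RPA{n}" for n in range(1, 9)]

# The group set wraps all CGs so that consistency is maintained across the whole environment.
group_set = {"name": "GP_GroupSet", "members": build_consistency_groups(rsets, rpas)}

print(len(group_set["members"]), group_set["members"]["CG1"])
```

Pinning one group per RPA spreads replication work across all eight appliances, while the enclosing group set is what keeps a recovered image consistent across every segment.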
TimeFinder/Snap
TimeFinder provides local storage replication for increased application availability and faster data recovery. Leveraging the high-end EMC Symmetrix system, TimeFinder offers flexible deployment and scalability to meet a wide range of service level requirements. TimeFinder helps companies perform backups, load data warehouses, and provide data for application test and development, all without downtime. TimeFinder/Snap provides the following:
Storage-based information replication; no host cycles
Snapshots create logical point-in-time images of a source volume
Snapshots require only a fraction of the source volume’s capacity (~20–30%)
Multiple snapshots can be created from a source volume and are available immediately
Snapshots support both read and write processing
In the Greenplum Vblock platform, SATA disks are configured into a Snap pool for the snaps.
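The ~20 to 30% guideline above gives a quick way to size the SATA snap pool. The Python sketch applies it to a source capacity; the 9,120 GB input is an illustrative figure (24 source metavolumes of 380 GB, from the disk layout section), the `snap_pool_range_gb` helper is hypothetical, and real snap pool requirements depend on how much data changes during the life of the snapshots.

```python
def snap_pool_range_gb(source_gb, low=0.20, high=0.30):
    """Return the (min, max) snap pool size implied by the ~20-30% of source capacity rule."""
    if not 0 < low <= high:
        raise ValueError("expected 0 < low <= high")
    return source_gb * low, source_gb * high


low_gb, high_gb = snap_pool_range_gb(9120)
print(f"snap pool: {low_gb:.0f}-{high_gb:.0f} GB")
```

Sizing toward the upper bound leaves room for heavier change rates, such as a large data load that lands while a snapshot is still in use.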
Figure 10. TimeFinder/Snap
Replication Manager
Replication Manager (RM) is EMC’s software that improves access to information by automating and managing disk-based replicas.
Replication Manager is used to manage the TimeFinder Snap operations. Key benefits are that it:
Automates the creation, management, and use of EMC disk-based, point-in-time replicas
Auto-discovers the environment
Has intelligence to orchestrate replicas with deep application awareness
Is easy to use with point-and-click controls, wizards, and user access
Supports VMware ESX Server Windows and Linux guest operating system environments and Virtual Machine File System (VMFS) containing virtual machines, and:
o Reduces backup windows
o Minimizes/eliminates impact on the application
o Enhances productivity
o Offers data-warehouse refreshes
o Provides decision support
o Provides database-recovery "checkpoints"
o Enables application development and testing
o Enables fast restore
Test Results
This section presents the objectives and results for two complementary test sets:
Tests of read and load performance with the Greenplum database on the Vblock 700
Tests of disaster recovery success with the RecoverPoint appliance
Read and Load Performance Tests: Objectives and Results
The following tests were performed with the Greenplum DB on a Vblock 700.
Table 9 Read and load performance test objectives
Test Objective | Description
1. Test Greenplum read performance on a Vblock 700 with the specified building block. | Test read scan rate and query response time on a Vblock 700 with the building block described previously. Database size: 8 TB
2. Test Greenplum load performance on a Vblock 700. | Test data load rate on a Vblock 700 with the building block described previously. Database size: 8 TB
Table 10 Test 1: Read test summary and results
Test Scenario Description
Run a stored procedure with two variations:
1. One sequential run with no other workload on the system.
2. One sequential run while additional workload is being run on the system.
Read scan query and response times.
Test Result Description
1. One sequential run with no other workload on the system.
Run #1 result: 9.6 minutes for 15 million records
2. One sequential run while additional workload was being run on the system.
Run #2 result: 11.1 minutes for 15 million records
Online = 1100 queries completed
Production = 50 jobs completed
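Converting the Test 1 timings into rates makes the two runs easier to compare. The Python sketch below derives records per second and the relative slowdown under concurrent workload from the two reported run times; it assumes all 15 million records were read in each run, and it is arithmetic on the published figures, not an additional measurement.

```python
RECORDS = 15_000_000


def records_per_second(minutes, records=RECORDS):
    """Average read rate for a run that processed `records` rows in `minutes`."""
    return records / (minutes * 60)


standalone = records_per_second(9.6)       # Run #1: no other workload
with_workload = records_per_second(11.1)   # Run #2: concurrent online and production workload
slowdown = 11.1 / 9.6 - 1                  # fractional increase in run time

print(f"{standalone:,.0f} rec/s vs {with_workload:,.0f} rec/s, {slowdown:.1%} longer")
```

The second run took roughly 16% longer while also completing 1,100 online queries and 50 production jobs.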
Table 11 Test 2: Load test summary and results
Test Scenario Description
1. Load an empty monthly partition with one day of data.
2. Load a half full monthly partition with one day of data.
3. Load a full monthly partition with one day of data.
One day of data equated to roughly 15 million records.
Test Result Description
1. Empty Partition Load = 11.33 minutes
2. Half Full Partition Load = 11.32 minutes
3. Full Partition Load = 11.17 minutes
Met performance metrics.
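The same conversion applies to the Test 2 load timings. The Python sketch below turns each partition load time into an approximate records-per-second rate and computes the spread between the slowest and fastest load; because one day of data is described as "roughly" 15 million records, the rates are approximate.

```python
RECORDS_PER_DAY = 15_000_000  # "roughly 15 million records" per day of data

load_minutes = {"empty": 11.33, "half_full": 11.32, "full": 11.17}

rates = {name: RECORDS_PER_DAY / (m * 60) for name, m in load_minutes.items()}
spread = (max(load_minutes.values()) - min(load_minutes.values())) / min(load_minutes.values())

for name, rate in rates.items():
    print(f"{name}: {rate:,.0f} records/s")
print(f"spread between slowest and fastest load: {spread:.1%}")
```

A spread of under 2% across empty, half-full, and full partitions indicates that partition fill level had little effect on load time in this test.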
RecoverPoint Disaster Recovery Tests: Objectives and Results
The RecoverPoint (RP) appliance replication capability is leveraged to perform site disaster recovery (DR) testing. The following test scenarios validate the DR solutions within the BI/DW solution stack proposed by EMC. The objectives and results of the four tests are summarized in the following tables.
Table 12 Objectives for DR Site recovery tests using RecoverPoint replication
Test Objective Description
1. Verify the Point in Time (PIT) bookmark for entire dataset.
Perform a local bookmark function test to verify that users have access to the database.
2. Verify that the Point in Time (PIT) image was the correct image.
Perform a PIT copy on production.
3. Verify that Snapshot consolidation works correctly.
Enable Snapshot consolidation.
4. Switch over the production DB to the secondary side.
Table 13 Test 1: Bookmark test summary and results
Test Scenario Description
Follow a sequence of steps to place the database in and out of suspend mode, while enabling the image on secondary hosts.
Verify that the database is open and accessible.
Test Result Description
1. Place the database in gp suspend mode.
2. Create an RP bookmark (BM).
3. Take the database out of gp suspend mode.
4. Enable the image on the secondary hosts.
5. Mount the BM image.
6. Start the database.
7. Verify the PIT.
The database was open and DBAs were able to access it.
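The bookmark procedure above depends on step order: the bookmark is created between suspending and resuming the database, and the image is mounted and started only after it is enabled on the secondary hosts. The Python sketch below encodes that order as a hypothetical runbook checker; the step names paraphrase the table, and no Greenplum or RecoverPoint commands are invoked.

```python
BOOKMARK_RUNBOOK = [
    "suspend_database",        # 1. Place database in gp suspend mode
    "create_rp_bookmark",      # 2. Create RP bookmark while the database is suspended
    "resume_database",         # 3. Take the database out of gp suspend mode
    "enable_image_secondary",  # 4. Enable the image on secondary hosts
    "mount_bookmark_image",    # 5. Mount BM image
    "start_database",          # 6. Start database
    "verify_pit",              # 7. Verify PIT
]


def first_deviation(executed, runbook=BOOKMARK_RUNBOOK):
    """Return the index of the first step that deviates from the runbook, or None if it matches."""
    for i, expected in enumerate(runbook):
        if i >= len(executed) or executed[i] != expected:
            return i
    if len(executed) > len(runbook):
        return len(runbook)  # extra steps after the runbook ends
    return None


print(first_deviation(BOOKMARK_RUNBOOK))                             # None
print(first_deviation(["create_rp_bookmark", "suspend_database"]))   # 0
```

A check like this is most useful when the procedure is scripted, since taking the bookmark before the database is suspended would not yield a consistent image.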
Table 14 Test 2: PIT image test summary and results
Test Scenario Description
Copy a PIT image before inserting records to production.
Verify that the correct PIT image is copied before inserting 35 million records into production.
Test Result Description
1. DBAs insert 35 million records on production
2. A PIT image is copied.
3. DBAs are able to get the correct image from before the insert.
DBAs successfully copied the correct image on production.
Table 15 Test 3: Snapshot consolidation summary and results
Test Scenario Description
Bookmark an image during Snapshot consolidation.
DBA must verify the image on Snapshot consolidation.
Test Result Description
1. DBA restores the Snapshot consolidation bookmark.
Table 16 Test 4: Switch production DB summary and results
Test Scenario Description
Determine the latest point in time image and switch over from the production database to the target DB.
Target DB has to be primary.
Test Result Description
1. Enable the latest point in time image and switch over the production DB to the target DB.
Conclusion
Our testing supports the benefits of the building block system approach used for hosting Greenplum applications. Key results from the read and load performance tests illustrate the scalability of Greenplum on a Vblock 700 solution:
The read test results show similar performance on the Vblock 700 whether the stored procedure was run alone (9.6 minutes to read 15 million records) or with additional workload on the system (11.1 minutes to read 15 million records).
The load test results show similar load times for an empty partition (11.33 minutes), a half-full partition (11.32 minutes), and a full partition (11.17 minutes).
References
For further Greenplum information, see the following:
http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Technical_Documentation/300-011-538.pdf
Greenplum installation documentation
http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Technical_Documentation/300-011-541.pdf
Introduction to EMC RecoverPoint 3.3 New Features and Functions, Applied Technology white paper
http://www.emc.com/collateral/software/white-papers/h2781-emc-recoverpoint-3-new-features.pdf
RecoverPoint Guides – full set
http://powerlink.emc.com/km/appmanager/km/secureDesktop?_nfpb=true&_pageLabel=freeformlinks2&internalId=0b014066800f517e&_irrt=true&rnavid=PT-2%3A0b0140668037fed7
Symmetrix VMAX product guide
http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Technical_Documentation/300-008-603.pdf?mtcs=ZXZlbnRUeXBlPUttQ2xpY2tDb250ZW50RXZlbnQsZG9jdW1lbnRJZD0wOTAxNDA2NjgwNTIyMzFkLG5hdmVOb2RlPVNvZndhcmVEb3dubG9hZHMtMg
ABOUT VCE
VCE, the Virtual Computing Environment Company formed by Cisco and EMC with investments from VMware and Intel, accelerates the adoption of converged infrastructure and cloud-based computing models that dramatically reduce the cost of IT while improving time to market for our customers. VCE, through the Vblock platform, delivers the industry's first completely integrated IT offering with end-to-end vendor accountability. VCE prepackaged solutions are available through an extensive partner network, and cover horizontal applications, vertical industry offerings, and application development environments, allowing customers to focus on business innovation instead of integrating, validating and managing IT infrastructure.
For more information, go to www.vce.com.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." VCE MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.