The FlashArray User's Guide

Purity Version v3.2

Table of Contents

About the Guide ... x

Organization of the Guide ... x

A Note on Format and Content ... x

Please Help Us Improve Our Documentation ... x

Introduction to the FlashArray Architecture ... xi

Modular Hardware Architecture ... xi

High Availability ... xii

Software for Solid State Storage ... xii

FlashArray Advantages ... xiii

I. An Overview of FlashArray Hardware and Software ... 1

1. FlashArray Hardware ... 3

Scalable, Highly-Available Configurations ... 3

High Availability ... 3

Scaling ... 4

FlashArray Hardware Module Details ... 5

Controllers ... 5

Storage Shelves ... 5

Intra-Array Connectivity ... 6

Controller to Drive Connectivity ... 6

Inter-Controller Connectivity ... 8

External Connectivity ... 8

Host Connectivity ... 8

Administrative Network Connectivity ... 8

FlashArray End-to-End Resiliency ... 9

Device and Data Resiliency ... 9

Array-Level Resiliency ... 11

2. The Purity Operating Environment ... 13

Flash Memory: Like Disk, But Different ... 13

Solid State Drives versus Disks ... 13

Disk and SSD Failure Modes ... 14

Purity Operating Environment Goals ... 15

Designing for Flash ... 15

Purity Architectural Highlights ... 16

Data Virtualization ... 16

The Virtualization Map ... 16

Using Virtualization ... 17

Storage Layout ... 17

Storage Layout Structures ... 18

Segment Allocation ... 18

Populating Segments and Scheduling Writes ... 19

RAID-3D ... 20

RAID-3D Flexibility ... 20

Storage Reclamation and Data Reduction ... 21

Storage Reclamation ... 21

Data Reduction ... 22

Data Consolidation ... 22

Data Movement ... 23

Storage Reclamation and Capacity Optimization Priorities and Tradeoffs ... 23

Read and Write Processing ... 24

Write Processing ... 24

Read Processing ... 25

Proactive Troubleshooting ... 26

Information Sources ... 26

Reporting Mechanisms ... 27

Alerts ... 27

The remoteassist Facility ... 28

Summarizing the Purity Operating Environment ... 28

II. FlashArray Concepts and Managed Objects ... 29

3. FlashArray Storage Capacity and Utilization ... 31

Array Capacity and Storage Consumption ... 31

FlashArray Storage States ... 31

Reporting Array Capacity and Storage Consumption ... 32

Volume and Snapshot Storage Consumption ... 32

Provisioning ... 32

Data Reduction ... 33

Snapshots and Physical Storage ... 34

Measuring Volume Storage Usage ... 35

Reporting Volume Size and Storage Consumption ... 35

The FlashArray Data Lifecycle ... 36

4. FlashArray Managed Objects ... 38

Physical and Virtual Objects ... 38

Object Naming ... 39

The Principal Virtual Objects ... 40

Common Operations on Virtual Objects ... 40

Volumes ... 41

Managing Volume Size ... 41

Immediate Volume Eradication ... 42

Changes in Volumes' Storage Consumption ... 42

Associated Objects and Attribute Values ... 43

Snapshots ... 43

Hosts and Host Groups ... 44

Host and Host Group Attributes ... 45

Host-Volume Connections ... 45

Private and Shared Connections ... 45

Properties of Shared Connections ... 46

LUN Management ... 46

Managed Hardware Objects ... 47

Hardware Component Naming ... 47

FlashArray Hardware Components ... 47

The Array ... 48

Solid State Drives ... 49

FlashArray Ports ... 49

Exporting Volumes to Hosts ... 50

III. Using the Purity GUI to Administer a FlashArray ... 51

5. The Purity GUI ... 53

Accessing the GUI ... 53

6. The Dashboard Tab ... 54

The Capacity Pane ... 56

The Performance Panes ... 57

7. The Storage Tab ... 59

Volume Administrative Tasks ... 60

Creating New Volumes ... 60

Managing Existing Volumes ... 61

Deleting Unneeded Volumes ... 61

Volume Detail Views ... 63

Host Group and Host Administrative Tasks ... 64

Creating New Host Groups and Hosts ... 65

Renaming and Deleting Host Groups and Hosts ... 66

The Host Group Details View ... 66

Adding Hosts and Connecting Volumes to a Host Group ... 67

Other Host Group Administrative Tasks ... 69

Host Administrative Tasks ... 71

Creating and Deleting Host Objects ... 71

The Individual Host Detail View ... 73

Host Object Management ... 75

Host-Volume Connection Tasks ... 77

Making Private Host-Volume Connections ... 78

Shared Connections ... 79

Breaking Private Connections ... 81

Breaking Shared Connections ... 82

8. The Analysis Tab ... 84

Analysis Tab Display Control ... 84

Analysis Tab Time Scales ... 85

Analysis Tab Displayed Data ... 88

Capacity View ... 88

Performance View ... 88

9. The System Tab ... 92

The Array ("SYSTEM") Health View ... 92

The Connections View ... 94

The Configuration View ... 95

The Array Information Sub-View ... 95

The Networking Sub-View ... 96

The Support Connectivity Sub-View ... 97

The Alerts Sub-View ... 97

The SNMP Sub-View ... 98

The Array Time Sub-View ... 99

The Users View ... 100

10. The Messages Tab ... 102

Viewing Alert Messages ... 103

Managing Alert Messages ... 104

IV. Using the Purity CLI to Administer a FlashArray ... 106

I. Purity CLI Man Pages ... 108

pureadmin ... 109

purealert ... 111

purearray ... 115

purecli ... 121

pureconfig ... 128

puredns ... 129

puredrive ... 131

pureds ... 134

purehgroup ... 140

purehgroup-connect ... 145

purehgroup-list ... 148

purehost ... 152

purehost-connect ... 156

purehost-list ... 161

purehw ... 165

purelicense ... 171

puremonitor ... 181

purenetwork ... 183

pureport ... 187

puresnap ... 189

puresnmp ... 198

purevol ... 203

purevol-list ... 208

purevol-rename ... 213

purevol-setattr ... 215

11. Common CLI Administrative Tasks ... 217

CLI Help ... 217

Top-Level Help ... 217

Command Help ... 218

Subcommand-Level Help ... 218

Man Page Help ... 219

Getting Started ... 219

Creating Volumes ... 219

Creating Hosts ... 220

Connecting Hosts and Volumes ... 221

Resizing Volumes ... 222

Destroying Volumes ... 223

Recovering and Eradicating Destroyed Volumes ... 224

Renaming Volumes ... 225

Ongoing Administration of Hosts ... 225

Monitoring I/O Performance ... 226

Using listobj Output ... 227

V. Supplementary Information ... 229

A. Supported Remote Access Packages ... 231

B. The Pure Storage Glossary ... 232

C. References ... 238

D. License and Product Information ... 239

License ... 239

About Panel ... 239

Pure Storage FlashArray Systems and Components ... 239

List of Figures

1.1. Entry-Level and Highly Available FlashArray Hardware Configurations ... 4

1.2. Drive and Interposer in Carrier ... 6

1.3. Example of Redundant FlashArray Controller to Storage Shelf Connectivity ... 7

1.4. Redundant Host Connections via Separate Fabrics ... 8

2.1. The Purity Virtualization Map ... 17

2.2. The Purity On-Media Data Layout ... 18

2.3. RAID-3D Protection Spheres ... 20

2.4. The Purity Write Processing Path ... 24

2.5. The Purity Read Processing Path ... 25

3.1. FlashArray Physical Storage States ... 31

3.2. Data Reduction Example ... 33

3.3. Snapshot Space Consumption ... 34

3.4. FlashArray Physical Storage Life Cycle ... 36

4.1. Changes In Volume Size ... 42

4.2. Snapshot Management ... 44

5.1. The Purity GUI Login Pane ... 53

6.1. Dashboard Display Areas ... 55

6.2. Expanded Graph (Latency History Example) ... 56

6.3. Capacity Bar Graph ... 56

6.4. Point-in-Time I/O Performance ... 58

7.1. The Storage Tab (Host Group Detail View) ... 59

7.2. Create Volume Buttons ... 60

7.3. The Edit Volume Dialog ... 61

7.4. Deleting a Volume ... 62

7.5. Recovering a Deleted Volume ... 63

7.6. Downloading Volume Information ... 63

7.7. Volume Detail View ... 64

7.8. Sample Host Group and Host View ... 65

7.9. Renaming a Host Group and Deleting a Host Object ... 66

7.10. Adding Hosts and Volumes to a Host Group ... 67

7.11. Adding Hosts and Volumes to a Host Group ... 68

7.12. Other Host Group Tasks Performed from the Detail View ... 69

7.13. Host-Volume Connection Map for a Host Group ... 70

7.14. Host and Volume Tasks Performed from the Host Group Detail View ... 70

7.15. Creating a Host Object from the Host Group Add Hosts Dialog ... 72

7.16. Deleting a Host from the Host Group and Hosts View ... 72

7.17. Deleting a Host from its Host Detail View ... 73

7.18. Example Individual Host Detail View ... 74

7.19. Renaming a Host Object ... 76

7.20. Associating Worldwide Name with a Host Object ... 76

7.21. Breaking the Association between a Worldwide Name and a Host Object ... 77

7.22. Connecting Multiple Volumes to a Single Host ... 78

7.23. Connecting Multiple Hosts to a Volume ... 79

7.24. Connecting One or More Volumes to a Host Group ... 80

7.25. Breaking a Private Connection from the Individual Volume Detail View ... 81

7.26. Breaking a Private Connection from the Individual Host Detail View ... 82

7.27. Breaking a Shared Connection from the Individual Host Group Detail View ... 83

7.28. Breaking a Shared Connection from the Individual Volume Detail View ... 83

8.1. The GUI Analysis Tab ... 84

8.2. Analysis Tab Expanded Graph, Pop-up Numeric Display, and Time Scales ... 86

8.4. Moving the Display within the Zoom Interval ... 87

8.5. Data Selection—Capacity View ... 88

8.6. Data Selection—Performance View ... 89

8.7. Analysis Tab Stacked I/O Performance Display ... 91

9.1. The System Tab (User View Selected) ... 92

9.2. The Array Health View ... 93

9.3. The Connections View ... 94

9.4. The Array Information Sub-View ... 95

9.5. The Networking Sub-View ... 96

9.6. The Support Connectivity Sub-View ... 97

9.7. The Alerts Sub-View ... 98

9.8. The SNMP Sub-View ... 99

9.9. The Array Time Sub-View ... 100

9.10. The Users View ... 100

10.1. The Messages Tab ... 102

10.2. Viewing an Informational Alert Message ... 103

10.3. Viewing an Alert Error Message ... 103

10.4. Managing Alert Messages ... 104

List of Examples

3.1. purearray list --space Example ... 32

3.2. Sample purevol list --space CLI Command Output ... 35

4.1. Case Insensitivity in Object Names ... 40

11.1. Top-Level Help ... 217

11.2. Command-Level Help ... 218

11.3. Subcommand Help ... 218

11.4. man Page Help ... 219

11.5. Creating Volumes ... 220

11.6. Creating Hosts ... 220

11.7. Establishing Host-Volume Connections ... 221

11.8. Resizing Volumes ... 222

11.9. Destroying Volumes ... 223

11.10. Recovering and Eradicating Volumes ... 224

11.11. Renaming Volumes ... 225

11.12. Administrative Operations on Hosts ... 225

11.13. Monitoring I/O Performance ... 226

11.14. Using the Output of the listobj Subcommand ... 227

11.15. Shell Scripting with the listobj Subcommand ... 227

About the Guide

Organization of the Guide

The target audience for this Guide is administrators of Pure Storage Inc. FlashArray™ storage systems.

Part I, “An Overview of FlashArray Hardware and Software”:

introduces the FlashArray hardware architecture and gives an overview of Purity software concepts.

Part II, “FlashArray Concepts and Managed Objects”:

defines FlashArray storage capacity measurement and describes the physical and virtual objects managed by a FlashArray administrator.

Part III, “Using the Purity GUI to Administer a FlashArray”:

describes the use of the browser-based Purity™ graphical user interface (GUI) to administer a FlashArray.

Part IV, “Using the Purity CLI to Administer a FlashArray”:

describes the Purity command line interface (CLI) and its use in FlashArray administration.

Part V, “Supplementary Information”:

includes appendixes that contain supplementary information about Pure Storage products and links to reference material that storage administrators may find useful.

A Note on Format and Content

This Guide describes concepts and usage of the Pure Storage FlashArray from the array administrator's point of view. FlashArray technology is evolving rapidly. As with all advanced information technologies, once basic architecture is in place, implementation develops at different rates in its different facets, each preceded by feature-by-feature detailed design.

This edition of the Guide describes the properties and behavior of arrays that run the v3.2 Release of Purity Software. It may include information about planned capabilities whose external form has been specified at the time of the release. Such material is included to provide users of the v3.2 Release with information for design planning purposes, and is subject to change as new functionality is implemented. Material relating to not-yet-implemented functionality is identified as such in the text.

Please Help Us Improve Our Documentation

Pure Storage Inc.'s goal is to deliver the best-performing, most reliable, most cost-effective, easiest-to-use enterprise storage arrays on the market. Performance, reliability, and cost are objective measures, but ease of use is to a large extent in the eye of the administrator. We are eager to hear and respond to your feedback on how we could make our arrays and our documentation easier to use. To that end, we have established an electronic mailbox at <[email protected]>, to which you can send comments on this Guide, as well as on any other aspect of FlashArray ease of use. We have also enabled the PDF form of the Guide for commenting using the free Adobe Acrobat Reader, and encourage you to insert your comments in the PDF and return it to us via electronic mail at the above address.

Introduction to the FlashArray Architecture

The FlashArray architecture defines a design for scalable enterprise-class storage systems based entirely on flash solid-state drives. FlashArrays that run the v3.2 release of the Purity Operating Environment consist of between one and four interconnected storage shelves [1] that house solid state drives, connected to either one or two controllers that contain the array’s processors, host interfaces, and other components. Compared to conventional disk-based array designs, FlashArray all-solid-state arrays typically deliver:

High performance:

An order of magnitude greater I/O performance density (measured in I/O operations per second per gigabyte of physical storage)

Performance consistency:

Consistently low latency, free of "spikes" that can afflict disk-based arrays, even those configured with solid state drives

Low operating cost:

An 80% reduction in operating expense (i.e., 20% of the rack space, power, and cooling required for equivalent disk-based capacity, as well as elimination of most of the routine operations required to administer conventional disk-based arrays).

The FlashArray internal design is a radical departure from conventional disk-based array design, however. FlashArrays exploit the unique properties of solid state storage to deliver both higher performance and greater resiliency than is possible to achieve with disks, and to do so at a comparable or lower effective cost per byte.

Modular Hardware Architecture

Each FlashArray is an integrated collection of modular components that includes:

Controllers:

One or two controllers [2], connected to each other via a private Infiniband network, and to hosts via one or more Fibre Channel or 10GbE (10-gigabit Ethernet) interconnects

Storage shelves:

Between one and four drive chassis, each containing redundant SAS connections to controllers

Solid state drives:

SSDs mounted in each storage shelf. Each of an array's first two shelves contains 22 SSDs, with two bays reserved for NVRAMs; each additional shelf contains 24 SSDs

NVRAM:

Two NVRAM modules mounted in each of the first two storage shelves.

FlashArrays scale, both in terms of capacity and I/O performance, by component aggregation; the smallest and largest FlashArrays are constructed from the same building blocks.
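The drive-bay counts listed above can be tallied with a short calculation. The following Python sketch is illustrative only and is not part of Purity; the function name `populated_bays` is hypothetical, and it assumes fully populated shelves laid out as described in this section.

```python
def populated_bays(shelf_count):
    """Return (ssd_count, nvram_count) for an array of fully populated shelves.

    Assumes the layout described above: each of the first two shelves holds
    22 SSDs plus 2 NVRAM modules; every additional shelf holds 24 SSDs.
    """
    if shelf_count < 1:
        raise ValueError("an array has at least one storage shelf")
    nvram_shelves = min(shelf_count, 2)          # only the first two shelves carry NVRAMs
    ssd_only_shelves = shelf_count - nvram_shelves
    ssds = 22 * nvram_shelves + 24 * ssd_only_shelves
    nvrams = 2 * nvram_shelves
    return ssds, nvrams
```

For example, `populated_bays(1)` returns `(22, 2)` and `populated_bays(3)` returns `(68, 4)`.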

[1] Definitions for hyperlinked terms are found in the glossary. Only the first mention of such terms is highlighted.

[2] While Pure Storage Inc. expects that highly available configurations with pairs of controllers will ultimately be the norm, FlashArrays consisting of a single controller and a single storage shelf are completely functional.

High Availability

Dual-controller FlashArrays are highly available, with completely redundant components and interconnects:

Power and cooling:

Each controller and storage shelf is equipped with redundant hot-swappable power and cooling modules.

Component interconnects:

All paths between controllers and storage shelves are redundant.

Drive connections:

Each solid state drive is connected to two or more controller SAS ports via separate Serial Attached SCSI (SAS) buses.

Host interconnects:

A controller's four Fibre Channel or 10GbE ports can be connected to separate storage network fabrics for redundant host connections. (Whether or not a storage network actually provides redundant connections to hosts depends upon its end-to-end topology.)

The Purity Operating Environment software balances I/O across all connections, and automatically utilizes alternate paths in event of component failure. In addition, the software generates immediate email alert messages to designated addresses when exceptional conditions occur.
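The path behavior described above (spread I/O across all healthy connections, and fall back to the remaining ones when a component fails) can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not Purity's implementation: the class `PathBalancer`, its round-robin policy, and the path names are all hypothetical.

```python
class PathBalancer:
    """Hypothetical round-robin selector over redundant I/O paths."""

    def __init__(self, paths):
        self._paths = list(paths)
        self._failed = set()
        self._next = 0

    def mark_failed(self, path):
        # A failed path is skipped until it is restored.
        self._failed.add(path)

    def mark_restored(self, path):
        self._failed.discard(path)

    def pick(self):
        """Return the next healthy path, rotating among all healthy paths."""
        healthy = [p for p in self._paths if p not in self._failed]
        if not healthy:
            raise RuntimeError("no healthy path available")
        path = healthy[self._next % len(healthy)]
        self._next += 1
        return path
```

With two paths, successive calls to `pick()` alternate between them; after `mark_failed()` is called for one path, every call returns the survivor until the failed path is restored.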

Software for Solid State Storage

FlashArrays' uniqueness derives partly from the modular hardware platform, but primarily from the Purity Operating Environment, the operating software that drives them. Purity is specifically designed to utilize solid state storage effectively. Rather than being adapted from disk-oriented designs embodying assumptions such as multi-millisecond access times, fixed RAID geometry, and data recovery by bulk rebuilding, Purity is optimized for solid state storage in four principal ways:

Data layout:

Purity’s flexible data layout takes advantage of the sub-millisecond access times of solid state storage to avoid the rigid layouts that are a necessary consequence of disks' much larger seek time and rotational latency

Capacity utilization:

Purity micro-provisions physical storage, reduces data to minimize the physical space it occupies, and continuously consolidates both free and occupied space to maximize utilization efficiency as data is stored in and deleted from an array

I/O performance:

Purity schedules I/O operations globally across an array to use both internal and external bandwidth efficiently. Moreover, the design completely eliminates the time-consuming log-read-modify-write operations that constrain small write performance in disk-based RAID arrays

Data protection:

Purity dynamically adjusts the parameters of its RAID-3D multi-dimensional data protection in response to changes in drive error rates and failures, rather than configuring static device groups with the lengthy and disruptive rebuilds they imply.

Because accessing data on solid state storage incurs no seeking or rotational delays, any block of data in a FlashArray can be retrieved as quickly as any other. Purity therefore dispenses with disk-oriented constructs such as data layouts designed to minimize head motion. Instead, it exploits the “all data is equidistant” property of solid state storage to simultaneously optimize utilization, performance, and data reliability.

FlashArray Advantages

For data center managers, the most important FlashArray properties are significantly higher I/O performance density (IOPS per gigabyte) and more efficient utilization of physical storage capacity than can be achieved with disk-based arrays. With FlashArrays, there is no need (and indeed, no mechanism) for "short stroking" (leaving part of a disk’s capacity unused) or for segregating data sets on separate devices to avoid I/O interference. All of a FlashArray's capacity is a single homogeneous pool of storage. Moreover, Purity minimizes intra-device write amplification (the "extra" I/O operations that solid state drives perform to update flash memory) and essentially eliminates the inter-device write amplification that is due largely to the small write penalty incurred by conventional arrays as they maintain RAID protection in random access I/O environments.

Part I. An Overview of FlashArray Hardware and Software

Chapter 1. FlashArray Hardware

The FlashArray architecture is designed to simultaneously deliver:

High I/O performance:

High I/O performance at consistently low latency compared to disk-based arrays, particularly with transactional and multi-application workloads that generate random I/O patterns

High data reliability:

Best-in-class data integrity and availability

Low cost:

Cost per gigabyte of user data stored lower than that of disk-based arrays.

These properties stem from a combination of modular hardware that is configurable for redundancy and a software architecture that specifically takes full advantage of flash-based solid state storage. This chapter presents an overview of the Pure Storage modular hardware platform. Chapter 2, The Purity Operating Environment, describes the Purity Operating Environment software.

Scalable, Highly-Available Configurations

FlashArrays can be as simple as a single controller and storage shelf, or can be configured for high availability and/or extended capacity. From smallest to largest, all FlashArrays are constructed from the same controller and storage shelf building blocks.

A FlashArray controller contains the processor and memory complex that runs the Purity software, buffers incoming data, and interfaces to storage shelves, other controllers, and hosts. FlashArray controllers are stateless—all metadata related to the data stored in an array is contained in storage shelf storage. Thus it is possible to replace an array’s controller at any time with no data loss.

A storage shelf contains either 22 SSDs and two NVRAM modules (first two shelves) or 24 SSDs (third shelf). The NVRAMs provide non-volatile temporary buffering of incoming data. Each shelf includes two I/O modules (IOMs) that enable two controllers to access SSDs and NVRAMs.

High Availability

A highly available FlashArray consists of:

Controllers:

Two controllers interconnected by dual point-to-point Infiniband links

Storage shelves:

Between one and three storage shelves. As delivered by Pure Storage, the first two storage shelves in an array contain 22 SSDs and two NVRAM modules. The third contains 24 SSDs. All storage shelves are internally redundant and connect to both controllers.

During normal operation, both controllers are active in the sense that they present disk-like volumes to hosts via all of their Fibre Channel or iSCSI host ports. [1]

Both controllers have dual connections to all storage shelves, to protect against loss of access to data in the event of a SAS link failure. Host connections can also be made redundant, provided that each controller is connected to hosts via two independent storage network fabrics.

The dual Infiniband controller interconnects protect against failure of an Infiniband path, and enable seamless failover in the event of a controller failure. Fast failover with no perceptible service interruption makes it possible to perform most maintenance while an array is online.

FlashArrays buffer all incoming data in both NVRAM modules until it has been written to solid state media. NVRAMs are accessible by both controllers. Thus, if a controller fails, the surviving controller can complete or restart in-progress I/O, while continuing to present volumes to hosts without interruption.

Finally, Purity’s RAID-3D technology, discussed in Chapter 2, The Purity Operating Environment, protects against data loss due to at least two simultaneous read errors and/or SSD failures, and in many cases, more.
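The NVRAM buffering described above can be pictured with a small model. The Python sketch below is illustrative only and makes several assumptions that do not come from this Guide: the class names `Nvram` and `ArrayModel`, the use of sequence numbers, and the dictionary standing in for solid state media are all hypothetical. It shows only the idea that a write held in both NVRAM modules can be completed by whichever controller survives.

```python
class Nvram:
    """Hypothetical stand-in for one NVRAM module."""

    def __init__(self):
        self.pending = {}              # sequence number -> (block, data)


class ArrayModel:
    """Toy model: writes are mirrored to two NVRAMs, then destaged to SSD."""

    def __init__(self):
        self.nvrams = (Nvram(), Nvram())
        self.ssd = {}                  # block -> data (stands in for solid state media)
        self._seq = 0

    def write(self, block, data):
        # Buffer the incoming write in both NVRAM modules.
        self._seq += 1
        for nv in self.nvrams:
            nv.pending[self._seq] = (block, data)
        return self._seq

    def destage(self, seq):
        # Write buffered data to solid state media, then release both copies.
        block, data = self.nvrams[0].pending[seq]
        self.ssd[block] = data
        for nv in self.nvrams:
            nv.pending.pop(seq, None)

    def recover(self):
        # Run by the surviving controller: complete every write still buffered.
        for seq in sorted(self.nvrams[0].pending):
            self.destage(seq)
```

In this model, a write that has been buffered but not yet destaged when a controller fails is still present in both `Nvram` objects, so a call to `recover()` completes it.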

Scaling

Figure 1.1 illustrates how the two basic FlashArray building blocks (controllers and storage shelves) can be configured in both basic and highly available, scaled-up arrays. The left diagram illustrates an entry-level array consisting of one controller and one shelf. Adding storage shelves increases capacity; adding a second controller makes the array highly available, able to survive failure of any single component, up to and including an entire controller.

Figure 1.1. Entry-Level and Highly Available FlashArray Hardware Configurations

In addition to making an array highly available, adding a second controller increases:

External I/O performance:

The additional controller adds four host connection ports to the array

Internal performance:

The additional controller adds a processor complex, cache, and internal I/O bandwidth for accessing solid-state drives.

Whatever its size, a FlashArray presents a single system image that exports all volumes to all connected hosts via all of its Fibre Channel or 10GbE iSCSI ports. With the exception of a few infrequent operations such as firmware upgrades, a FlashArray can be administered from either of its controllers’ administrative network ports.

FlashArray Hardware Module Details

Although array-level high availability stems primarily from redundant components and interconnects, the principal FlashArray hardware modules are also internally resilient to most component failures, as the paragraphs that follow describe.

Controllers

Each FlashArray controller is a 2U rack-mounted chassis that houses:

Main board:

Contains the 12-core processor complex that runs the Purity software, DRAM used to hold Purity code and for data buffering and staging, 7 PCI Express slots containing SAS, Infiniband, and host interface cards, and other internal components and interfaces

Boot drive:

An SSD that holds two local copies of Purity for booting convenience, as well as log records containing diagnostic and service information

Host interfaces:

Either (a) two dual-port 8 Gb/s PCI Express Fibre Channel interface cards, or (b) two dual-port 10GbE (iSCSI) cards, that provide the controller’s four host ports

Storage shelf interfaces:

Two dual-port PCI Express SAS interface cards whose 4-lane, 6 Gb/s ports provide a total maximum of 96 Gb/s data transfer capability to storage shelf I/O modules

Inter-controller interfaces:

A dual-port PCI Express Infiniband interface card, used to interconnect the controllers in a highly available FlashArray

Administrative interfaces:

Gigabit Ethernet (GbE) ports, one of which connects to a network through which an array is administered, and video, serial, and USB ports used for initial configuration. Once configured, FlashArrays are administered from network workstations using either the browser-based GUI or the CLI.

Power and cooling:

Redundant cooling fans and hot-swappable power supplies.
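The 96 Gb/s storage shelf interface figure and the four host ports per controller both follow directly from the card, port, and lane counts listed above. The short Python sketch below reproduces the arithmetic; the constant and function names are hypothetical, and the result is the raw aggregate link rate rather than a measure of sustained throughput.

```python
# Counts taken from the controller description above.
SAS_CARDS_PER_CONTROLLER = 2
PORTS_PER_SAS_CARD = 2
LANES_PER_SAS_PORT = 4
GBPS_PER_SAS_LANE = 6

HOST_CARDS_PER_CONTROLLER = 2
PORTS_PER_HOST_CARD = 2


def sas_bandwidth_gbps():
    """Aggregate raw SAS link rate per controller, in Gb/s."""
    return (SAS_CARDS_PER_CONTROLLER * PORTS_PER_SAS_CARD
            * LANES_PER_SAS_PORT * GBPS_PER_SAS_LANE)


def host_ports():
    """Number of host ports (Fibre Channel or 10GbE) per controller."""
    return HOST_CARDS_PER_CONTROLLER * PORTS_PER_HOST_CARD
```

Here `sas_bandwidth_gbps()` evaluates to 2 × 2 × 4 × 6 = 96, and `host_ports()` evaluates to 2 × 2 = 4.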

Storage Shelves

Pure Storage ships storage shelves fully populated with either 22 SSDs and two NVRAMs or 24 SSDs, but FlashArrays operate correctly and at full data redundancy (albeit at lower physical capacity) when drives are removed. RAID-3D technology protects segments of data rather than entire SSDs, so it is not necessary for all SSDs in an array to have equal capacities. [2]

Because Purity can support SSDs of different capacities simultaneously, storage capacity can be upgraded non-disruptively, drive-by-drive, while an array is online. When a drive is inserted in a storage shelf bay, Purity determines its operating parameters based on the capacity and model type it reports.

Pure Storage ships drives pre-installed in carriers that, when inserted in a storage shelf bay, connect to both of the shelf's SAS I/O modules, providing redundant access to both controllers in highly available arrays.

Administrators can install and remove carriers while an array is online. No tools are required for SSD removal or installation (a Torx driver, supplied by Pure Storage, is required to unlock and lock NVRAMs), nor are any array administrative operations required to accommodate drive removals and additions.

Intra-Array Connectivity

All interconnections between FlashArray components are redundant. These include:

Controller-to-controller:

The two controllers in a highly available array are interconnected by two Infiniband links

Controller-to-drive:

Each SSD and NVRAM is connected to an array's controller, or to both controllers in a highly available array, by two SAS links.

Controller to Drive Connectivity

Within each FlashArray storage shelf are two I/O modules (IOMs), containing SAS expanders that provide redundant paths between all drives (SSDs and NVRAMs) in the shelf and one or two controllers. Each drive is housed in a carrier, illustrated in Figure 1.2, containing an interposer that connects the drive to both I/O modules, allowing a controller to address any drive via either SAS path.

Figure 1.2. Drive and Interposer in Carrier

SAS links connect the I/O modules in a shelf to PCI Express SAS interface cards mounted in both controller chassis. Figure 1.3, reproduced from a FlashArray model FA-420 Installation Guide, is a schematic example of drive connection redundancy in a highly available array with two storage shelves. Each port on a controller's SAS interface card is part of a loop that connects one of the I/O modules in each storage shelf to it and to a port on a SAS interface card in its partner controller.

Figure 1.3. Example of Redundant FlashArray Controller to Storage Shelf Connectivity

Each drive connects to both of its storage shelf’s I/O modules, providing continued connectivity in the event of failure of any component in the drive-to-controller I/O path:

Controller failure:

If a controller fails entirely, its partner controller remains fully connected to all storage shelves and the drives within them on both paths

SAS interface card failure:

If one of a controller’s SAS interface cards fails, the controller remains connected to all storage shelves and drives via its other SAS card (dark blue and light green paths or light blue and dark green paths), and the partner controller remains fully connected to both paths

Drive connection failure:

If any one segment of a SAS path is broken, one controller remains fully-connected to all drives, and the other remains connected via its alternate path

I/O module failure:

If a storage shelf I/O module fails, all drives remain connected to both controllers through the shelf's other I/O module.

Purity detects these failure cases and adjusts its I/O path management automatically. The software logs events that affect availability and generates electronic mail alerts to inform administrators of component failures, so they can be remediated before service is disrupted.
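
The failure cases above can be checked mechanically. The following is a minimal Python sketch, assuming a simplified topology with one shelf, one drive, two I/O modules, and two SAS interface cards per controller. The component names (iom_a, ct0_sas0, and so on) are hypothetical labels, not Purity identifiers, and the wiring approximates Figure 1.3 rather than reproducing it.

```python
from collections import deque

# Assumed, simplified wiring for one drive in one shelf (hypothetical names).
LINKS = [
    ("drive", "iom_a"), ("drive", "iom_b"),        # interposer to both IOMs
    ("iom_a", "ct0_sas0"), ("iom_a", "ct1_sas0"),  # IOM A to one card per controller
    ("iom_b", "ct0_sas1"), ("iom_b", "ct1_sas1"),  # IOM B to the other card
    ("ct0_sas0", "ct0"), ("ct0_sas1", "ct0"),
    ("ct1_sas0", "ct1"), ("ct1_sas1", "ct1"),
]
CONTROLLERS = ("ct0", "ct1")


def controllers_reaching_drive(failed):
    """Return the controllers that can still reach the drive when `failed` is down."""
    neighbors = {}
    for a, b in LINKS:
        if failed in (a, b):
            continue  # a failed component takes all of its links with it
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, queue = {"drive"}, deque(["drive"])
    while queue:  # breadth-first search outward from the drive
        for nxt in neighbors.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return [c for c in CONTROLLERS if c in seen]


for component in ("ct0", "ct0_sas0", "iom_a"):
    print(component, "failed ->", controllers_reaching_drive(component))
```

In this model, a failed SAS card or I/O module leaves both controllers connected, and a failed controller leaves its partner connected, which matches the cases listed above.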

Inter-Controller Connectivity

The dual-path Infiniband network that interconnects an array's controllers is redundant. Highly available arrays running Purity v3.2 include one controller pair; each controller connects directly to its partner’s two Infiniband ports. In the future, scalable arrays that include two or more controller pairs will utilize Infiniband switches to provide redundant symmetric communication among all controllers. Should one Infiniband path (link, switch, or interface port) fail, exchange of data and state information continues on the other path.

In addition, each controller's four host (Fibre Channel or iSCSI) ports enable it to connect redundantly to two or more storage network fabrics if they are available. Actual redundancy of host connections is determined by the design of the storage network.

External Connectivity

Similarly, connections between a highly available FlashArray and its client hosts can be configured redundantly to protect against storage network failures.

Host Connectivity

Each FlashArray controller includes two PCI-to-Fibre Channel interface cards, each with two ports that can be connected to a Fibre Channel storage network fabric (or two separate fabrics for completely redundant host-to-array connections), as Figure 1.4 illustrates.

Figure 1.4. Redundant Host Connections via Separate Fabrics

Administrative Network Connectivity

Each FlashArray controller includes two gigabit Ethernet network ports and a serial port, any or all of which can be connected to administrative workstations running virtual terminal emulation packages to monitor and control the array using the Purity CLI. The Ethernet ports also enable browser-based administration using the Purity GUI. An array with multiple controllers can be completely administered from either of these ports on any of its controllers, with the exception of a few component-specific operations such as controller reboots and firmware upgrades.

FlashArray End-to-End Resiliency

While SSDs are more reliable than rotating magnetic disk drives, when they do fail, they fail in ways different from the typical failure modes of rotating magnetic disks. The FlashArray architecture is specifically designed to provide array-wide resiliency while delivering consistent performance from flash-based storage. The FlashArray architecture provides resiliency on three levels:

Array availability:

All FlashArray hardware components are redundant, enabling arrays to sustain all single component failures, as well as many multiple-component ones without impacting availability. In addition, most failed components can be replaced while the array is operating

Data Resiliency:

Multiple checksums (parity and other) ensure that data written by hosts is returned as written every time. Detection and correction of data errors is particularly important with flash, which has a higher raw bit-error rate than magnetic disk storage. Read and Write Processing describes data error detection and correction in the I/O path.

Performance Consistency:

The architecture exploits the fast access time of flash storage to maintain consistent high performance as arrays ride through failure and maintenance events.

The sections that follow describe the architecture that enables FlashArrays to continue to deliver high-performing, reliable data access services even when component failures and data errors occur.

Device and Data Resiliency

Each FlashArray storage shelf contains twenty-two SSDs and two NVRAM modules. The SSDs use the SATA protocol to communicate with an interposer that converts the protocol to dual-channel SAS (see Figure 1.2), so that each drive is connected through SAS expanders to both controllers in an HA pair, as Figure 1.3 illustrates.

The NVRAMs use the SAS protocol to communicate with controllers. Using the SAS protocol provides both controllers in an HA pair with direct access to all storage, and moreover, implements the key SCSI persistent group reservation feature, used by Purity to determine which controller “owns” a device at any given instant.

The storage shelves that house SSDs and NVRAMs include redundant fans, power supplies, power distribution modules, and SAS I/O modules (IOMs). A passive mid-plane in the shelf chassis provides redundant connections between all active components.

Drive Identification, Authentication, and Quorum

Unlike disk-based arrays with their fixed RAID group geometries, FlashArray data protection is done at the data object level. As the Purity software continuously accommodates new data and reduces and reoptimizes existing data, it augments each protected data object with metadata that makes every drive self-describing and allows array controllers to be stateless.

In addition, each drive contains a unique signature, identifying it both as a FlashArray drive and as part of a particular array’s drive set (called an apartment by Purity). When an array boots, Purity reads this configuration information from each drive. The software requires a quorum, or minimum number of drives with correct signatures before beginning operation. Incorrectly formatted drives are considered “foreign,” and are rejected.

As a consequence of these features, one can literally power an array off, rearrange the drives, and power it on again without affecting the correctness of the data they contain.
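
The boot-time check described above can be pictured with a short Python sketch. It is illustrative only: the DriveSignature fields, the admit_drives function, and the quorum value passed in are hypothetical, since this guide does not state how Purity encodes signatures or derives the minimum drive count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriveSignature:
    is_flasharray_drive: bool   # formatted as a FlashArray drive
    apartment_id: str           # drive set (apartment) the drive belongs to


def admit_drives(signatures, apartment_id, quorum):
    """Split drives into members and foreign drives; refuse to start below quorum.

    `quorum` is a caller-supplied assumption, not a documented Purity value.
    """
    def belongs(sig):
        return sig.is_flasharray_drive and sig.apartment_id == apartment_id

    members = [s for s in signatures if belongs(s)]
    foreign = [s for s in signatures if not belongs(s)]
    if len(members) < quorum:
        raise RuntimeError(f"only {len(members)} valid drives, need {quorum}")
    return members, foreign
```

Because membership depends only on what each drive says about itself, and not on the slot it occupies, rearranging the input list does not change the result.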

RAID-3D™

Unlike conventional RAID’s drive-level protection, FlashArrays’ RAID-3D technology protects individual data objects a few megabytes in size, called segments. Each segment is distributed across a subset of an array’s drives.

As data enters a FlashArray, it is checksummed, reduced, copied to NVRAM, and stored in a DRAM segment buffer, along with descriptive metadata. Segment buffers may contain a mixture of newly entering data and data that Purity is reoptimizing (e.g., reducing or consolidating). RAID check data is computed and stored both within each segment buffer and across segment buffers.

When a set of segment buffers fills, Purity computes overarching RAID check data, and writes each buffer to a different SSD. The SSDs for each segment are chosen to equalize both loading and wear. Writes are partly serialized so as to minimize the potential impact of (lengthy) write operations on (much shorter) read operations.

Purity determines the properties of each segment (number of drives it occupies and its RAID-3D protection parameters) dynamically, based on conditions within the array. As a consequence, each segment may have different placement and protection parameters. At a minimum, however, each segment is protected against loss in the event of concurrent failure of two SSDs that contain data from it.

Recovering from Drive Failures

An important consequence of RAID-3D segment-level data protection is that failure of an SSD is effectively a loss of physical capacity rather than of protection. When an SSD fails, or is removed from an array, Purity determines which segments are affected, and for each one, either reconstructs the missing piece from the surviving pieces, or marks the segment as a high-priority candidate for background storage reclamation and data reduction. In both cases, reconstructed data is distributed among the array’s remaining SSDs. Because the RAID-3D minimum protection threshold is two concurrent failures, the array responds identically if a second SSD failure occurs while data from the first failure is still being reconstructed. Once recovery is complete, an additional double failure is sustainable, and so forth, until there is insufficient physical space to reconstruct data.

Reconstruction of a failed SSD’s contents is both immediate upon discovery of the failure and automatic. No administrative intervention or dedicated spares are required. Reconstruction time varies, from a few tens of minutes to hours, depending on the amount of data to be reconstructed and the host I/O load on the array. The end result is fully-protected data in an array whose physical storage capacity is reduced by that of the failed drive. When a failed SSD is replaced, the replacement drive becomes part of the pool of available storage. Initially, a replacement SSD has a high probability of being selected for space allocation due to its low occupancy and short time in service.

In larger arrays, SSDs are organized as multiple “zones,” each of which is a separate failure recovery domain as described in the preceding paragraphs. Thus it is possible for a large array to sustain multiple double failures of SSDs in separate zones.

NVRAMs and Resiliency

FlashArrays copy all data written by hosts to two high-performance NVRAM modules immediately upon its entry to the array. Data remains in NVRAM until it has been reduced (compressed and deduplicated) and written to an SSD as part of a segment. When the segment buffer containing a block of data has been written to an SSD, the data is persistent and protected by RAID-3D, so the NVRAM copies are obsoleted. As soon as it has made two NVRAM copies of an incoming data block, the array sends a write completion notification to the host. Thus, from the host’s perspective, write latency is very low.

NVRAMs preserve the integrity of writes that have been acknowledged but whose data has not yet been reduced and written to SSD media, should an array failure (such as a power outage) occur between those two events. When an array restarts after such a failure, it re-processes any host-written data that remains in NVRAM. Thus, data conveyed to a FlashArray by a host write that has been acknowledged is persistent, even if an array failure occurs before it has been stored on SSD media.
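
The ordering of these events (copy to two NVRAMs, acknowledge, write to SSD, release the NVRAM copies) can be sketched as a toy Python model. The WritePath class and its method names are hypothetical, data reduction is omitted, and plain dictionaries stand in for the hardware; the sketch is meant only to show why an acknowledged write survives a power failure.

```python
class WritePath:
    """Toy model: acknowledge after two NVRAM copies, release them after the SSD write."""

    def __init__(self):
        self.nvram = [{}, {}]      # two NVRAM modules, keyed by block id
        self.segment_buffer = {}   # DRAM buffer, lost on power failure
        self.ssd = {}              # blocks written to SSD as part of a segment

    def host_write(self, block_id, data):
        for module in self.nvram:          # make both NVRAM copies first
            module[block_id] = data
        self.segment_buffer[block_id] = data
        return "ack"                       # only then acknowledge the host

    def flush_segment(self):
        """Write buffered blocks to SSD, then obsolete their NVRAM copies."""
        for block_id, data in self.segment_buffer.items():
            self.ssd[block_id] = data
            for module in self.nvram:
                module.pop(block_id, None)
        self.segment_buffer.clear()

    def power_failure(self):
        self.segment_buffer.clear()        # DRAM contents do not survive

    def restart(self):
        """Re-process whatever is still held in a surviving NVRAM module."""
        survivor = self.nvram[0] or self.nvram[1]
        self.segment_buffer.update(survivor)
        self.flush_segment()
```

A write that is acknowledged, then interrupted by power_failure(), reappears on SSD after restart(), because the NVRAM copies are released only after the SSD write.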

FlashArray NVRAMs are capable of retaining data for a minimum of three months with no external power supplied. Each FlashArray storage shelf contains two NVRAMs (in slots 0 and 23). In single-shelf arrays, data from each host write is stored in both. In arrays with two or more shelves, each block of incoming data is written to two NVRAMs, in a rotating pattern designed to equalize load on the devices.

When an NVRAM fails, the array generates an alert, but continues to operate normally (from the host perspective). A single-shelf array with a failed NVRAM is able to recover from a power failure, but would be susceptible to a second NVRAM failure. In a multi-shelf array, Purity generates an alert and flushes the redundant copies of data held by the failed NVRAM. Host write performance may decrease slightly during flushing, but the array continues to function normally, making redundant NVRAM copies of incoming data (unless only one NVRAM is functioning). Should all of an array’s NVRAMs fail, the array stops accepting host writes, but continues to process the data it already holds. It writes all data accumulated in segment buffers to SSD storage, and continues to execute hosts’ reads.

Array-Level Resiliency

Controller Resiliency

Each FlashArray controller is internally resilient, with redundant fans, power supplies, power distribution, Fibre Channel (or 10GbE) host interfaces, SAS and Infiniband intra-array interfaces, and 1GbE administrative network ports.

In a highly available FlashArray, the two controllers intercommunicate via a redundant Infiniband network. Provided that hosts are connected to both controllers, failure of a controller (or taking a controller out of service for maintenance) need not disrupt host access to data. Current FlashArrays consist of one controller or one controller pair. The architecture, however, accommodates scale-out capacity and performance expansion by interconnection of multiple controller pairs and the storage shelves behind them into a “single system image” array.

Host Connectivity

Each FlashArray controller has 4 host connection ports (8Gb/s Fibre Channel or 10GbE iSCSI). In a highly available array, both controllers handle incoming host I/O requests, so all of the array’s volumes, seen by hosts as LUNs, are accessible via all array ports (unless access has been restricted by storage network zoning).

For fully redundant host access, each controller should have at least one port connected to each of two separate storage network fabrics, to which hosts should also be connected (see Figure 1.4). Where available on the host side, multipathing should be configured, with round-robin scheduling among the visible array ports. With such a configuration, the FlashArray remains accessible with the best possible performance in the presence of any failure in the host connection path (HBA, host port, switch, cabling, array port, array IO card).

Controller High Availability

The FlashArray employs redundant active/active storage controllers to ensure high availability of the overall array. The controllers are connected in two ways: by dual Infiniband links and by the dual access to the shelves of SSDs via SAS. The Infiniband links are used to transfer I/O requests and metadata updates between the two controllers. The SSDs are used to determine quorum between the controllers (i.e., health status and role).

Under normal circumstances both controllers in a highly available array receive host I/O requests, but overall workflow within the array is determined by a single primary controller. The two controllers also use the Infiniband network to synchronize state and exchange work items as necessary. When a controller detects a potential failure of its partner, the two use SAS reservation “signatures” written on the SSDs to determine which of them should assume the primary role.

If a controller fails, I/O request execution continues normally. Hosts detect failed I/O requests addressed to the failed controller, usually by timeout. If storage network topology and host multipathing configuration permit, hosts re-issue failed requests to the surviving controller. Because both controllers have access to all SSDs and NVRAMs, in-flight I/O requests can either be completed by the surviving controller or reissued by hosts, storage network topology and host multipathing configuration permitting.
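
The host-side behavior described here (round-robin scheduling among visible array ports, with failed requests re-issued on another path) can be illustrated with the following Python sketch. It is not a description of any particular host multipathing driver; the RoundRobinMultipath class, the port names, and the send callback are all hypothetical.

```python
class RoundRobinMultipath:
    """Toy host multipathing: rotate over array ports, re-issue on timeout."""

    def __init__(self, ports):
        self.ports = list(ports)
        self._next = 0

    def submit(self, request, send):
        """Call send(port, request); on TimeoutError, retry on the next port.

        Each port is tried at most once per request.
        """
        for _ in range(len(self.ports)):
            port = self.ports[self._next]
            self._next = (self._next + 1) % len(self.ports)
            try:
                return port, send(port, request)
            except TimeoutError:
                continue  # path or controller is down; re-issue elsewhere
        raise ConnectionError("no working path to the array")
```

With ports on both controllers in the list, a request that times out on a failed controller's port is retried on a port of the surviving controller, so the application sees added latency rather than an error.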

Non-Disruptive Software Upgrades

Purity software and firmware can be upgraded while an array is online executing host I/O requests. To apply an upgrade, one controller is powered off and rebooted so that upgrades can be applied. (Powering a controller down causes hosts to re-issue I/O requests to the array’s other controller.)

When upgrading is complete, Purity starts, and the upgraded controller joins the array (typically a 3-6 minute operation). The second controller is then powered off and upgraded in like manner. As long as hosts have full connectivity to both controllers, there is no service disruption.

Protecting the Integrity of Host-Written Data and Metadata

Because the FlashArray is a highly-virtualized storage system, extensive use of and reliance on metadata are part of the core design. As such, protection and assurance of the integrity of both user data and system-level metadata is of paramount concern. The FlashArray implements a complete end-to-end Data Integrity Fabric, described below, to ensure protection against and recovery from data consistency issues.

User Data Integrity

Purity protects all host-written data with multiple independent checksums to ensure both correct delivery of data and delivery of correct data. As each host-written sector enters an array, Purity computes a hash code that is stored with the (reduced) data and verified each time the data is read. This ensures the correctness of data along the entire software and storage path within the array. (The hash code is also used to identify potential duplicate sectors).

Purity packs both host-written data and metadata “logs” that describe current data location and content in segment buffers for writing to SSD media. The software computes a checksum over each segment buffer page. Checksums are “salted” with their page addresses, making it possible to verify both data and metadata and virtual page location upon reading. This scheme protects against SSD read errors, both in data and positioning. Moreover, segments are self-describing, so in extreme cases, an array’s entire body of metadata can be reconstructed from the contents of its SSD segments.
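
The idea of a checksum salted with its page address can be shown in a few lines of Python. This guide does not name the checksum algorithm Purity uses, so CRC32 from the standard zlib module is used here purely as a stand-in, and the function names are hypothetical.

```python
import zlib


def page_checksum(page_address, payload):
    """CRC32 over the page address (the "salt") followed by the payload."""
    salt = page_address.to_bytes(8, "little")
    return zlib.crc32(salt + payload)


def verify_page(expected_address, payload, stored_checksum):
    """True only if the payload is intact AND was stored for the expected address."""
    return page_checksum(expected_address, payload) == stored_checksum
```

Because the address is folded into the checksum, intact data returned from the wrong page fails verification just as corrupted data does, which is how a single check covers errors in both data and positioning.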

Chapter 2. The Purity Operating Environment

Externally, a FlashArray resembles a conventional disk array that exports block storage volumes to hosts via front-end host connection ports attached either to Fibre Channel storage network fabrics or directly to the hosts' Fibre Channel interfaces. As with conventional disk-based arrays, hosts communicate with storage network targets, and direct I/O requests to the targets' numbered logical units (LUNs¹), each corresponding to a storage volume.

Flash Memory: Like Disk, But Different

FlashArrays, especially the Purity Operating Environment software, are designed specifically to deliver maximum benefit from flash storage. As such, they abandon many of the common principles of disk-based array design, and instead adhere to alternate principles that derive maximum benefit from flash SSDs. The reason for the fundamentally different design is quite simply that flash SSDs behave differently than disks in some important ways.

Solid State Drives versus Disks

The atomic unit for reading and writing data on disks is the 512-byte sector. Disks read and write data, and protect it with ECC, in 512-byte units, so host reads and writes always devolve to multiples of that size. The analogous unit for flash memory is the flash page. Flash pages, typically 4 or 8 kilobytes in size, are similar to disk sectors in that they are the units in which drives read data, and over which ECC is calculated. Writing data to flash memory differs from writing to a disk, however.

A disk can directly overwrite the data in any individual sector with no effect on other sectors. Flash memory, on the other hand, can only be written if it is in a baseline state. Once data has been written to a flash page, it must be “erased” before being overwritten. Solid state drives erase flash memory in large (~256 to 1,024 kilobyte) units called erase blocks. After erasure, each individual page in an erase block can be written once, but cannot be overwritten until its entire erase block has been erased again.
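
The write-once-until-erased rule can be modeled in a few lines of Python. The EraseBlock class below is a hypothetical teaching model, not SSD firmware; the default of 64 pages per block corresponds to a 256-kilobyte erase block of 4-kilobyte pages, chosen from the ranges quoted in this section.

```python
class EraseBlock:
    """Toy erase block: pages are write-once until the whole block is erased."""

    def __init__(self, pages_per_block=64):
        self.pages = [None] * pages_per_block   # None means "erased"

    def write_page(self, index, data):
        if self.pages[index] is not None:
            raise ValueError("page already written; erase the block first")
        self.pages[index] = data

    def erase(self):
        self.pages = [None] * len(self.pages)

    def overwrite_page(self, index, data):
        """Read-erase-overwrite: save live pages, erase, rewrite everything."""
        live = list(self.pages)
        live[index] = data
        self.erase()
        rewritten = 0
        for i, page in enumerate(live):
            if page is not None:
                self.write_page(i, page)
                rewritten += 1
        return rewritten   # pages physically written for one logical overwrite
```

The value returned by overwrite_page shows the cost of overwriting in place: changing one page forces every live page in the block to be written again.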

Disks also virtualize the storage they present. Hosts address disk sectors by a dense set of numbers between zero and the disk’s capacity. Internally, however, a disk may remap sector numbers to alternate media areas over time. Disks do this mainly to avoid using sectors that they discover to be defective.

Solid state drives virtualize their flash pages as well. Hosts address virtual flash pages using numbers that are analogous to disk drive sector (often called logical block) numbers, but they have no awareness of the pages’ locations within the devices.

SSDs’ virtualization mechanisms attempt to minimize the effect of block erasure on hosts as far as possible, and in addition, attempt to equalize overwrites across all flash memory in order to maximize drive lifetime. A physical flash page may store data for thousands of different virtual page addresses during an SSD’s lifetime.

Block erasure is functionally transparent to hosts, which read and write numbered virtual flash pages just as they would disk sectors. Typically, SSD reads significantly outperform disk reads, because there is no seeking or rotational latency. Writes often encounter unexpectedly large (10-20 times normal) latencies, however, because they trigger large block read-erase-overwrite cycles. These variations are especially noticeable in random workloads because small writes to random page addresses have a high probability of causing large block erasures. One of Purity’s key advantages is that it overcomes SSD write latency variability, and delivers consistently low latency in response to any pattern of host requests.

Disk and SSD Failure Modes

In addition to read errors corrected by drive ECC mechanisms, both magnetic disks and SSDs have three common failure modes:

Complete drive failure:

Drive completely unresponsive

Read error:

Uncorrectable errors discovered by ECC mis-compares when data is read

Undetected read error:

Erroneous data delivered due, for example, to disk head mis-registration when reading or when data was originally written, or to pathological ECC failures that are not detected and reported by drive mechanisms.

With disks, these failures tend to be abrupt; they occur with no or relatively little warning, and once they occur, they are typically permanent. With SSDs, however, complete drive failures are relatively rare. Read errors tend to increase as a function of device and data age (measured in number of overwrites and in months or years since most recently read, respectively), and can often be remedied by refreshing the data, provided that it can be recovered by RAID rebuilding techniques.

Disk-based arrays protect data against loss due to these failures by storing RAID check data calculated over corresponding data blocks on groups of drives. While they are effective at recovering unreadable data, device-based RAID groups have some limiting side effects:

Hardware requirements:

They require dedicated “spare” drives to automate recovery from drive failures

Write amplification:

They inherently “amplify” small host writes, because each one requires a “log incoming data, read previous data, calculate check data, write” sequence to maintain synchronization of check data with host-written data

Degraded performance during rebuilds:

Application performance can degrade seriously for hours due to I/O interference while the contents of a failed disk are rebuilt on a spare

Configuration inflexibility:

Once a RAID group is created, reformatting it (for example to add or remove a drive, or to change stripe unit parameters) is very time and resource consuming.
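
The write amplification item in the list above can be made concrete with a small Python sketch. It assumes the simplest case, single-parity XOR check data, which this guide does not specifically attribute to any product; the function names are hypothetical. Updating one data block requires reading the previous data and previous parity and writing the new data and new parity, so one small host write becomes four device I/Os.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(blocks):
    """Check data computed from scratch over every block in the stripe."""
    return reduce(xor_bytes, blocks)


def small_write(stripe, parity, index, new_data):
    """Update one block in place; return (new_parity, device_reads, device_writes)."""
    old_data = stripe[index]                  # read 1: previous data
    delta = xor_bytes(old_data, new_data)
    new_parity = xor_bytes(parity, delta)     # read 2: previous parity
    stripe[index] = new_data                  # write 1: new data
    return new_parity, 2, 2                   # write 2: new parity
```

The parity produced by small_write equals the parity recomputed from the whole stripe, which is why disk arrays can use this shortcut, at the cost of the extra reads and writes counted in the return value.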

Despite these limitations, disk-oriented RAID techniques are appropriate for conventional arrays, because disk failures tend to occur with little notice and be permanent, and because disk seek and rotational latencies effectively require geometrically regular, unchanging on-media data layouts.

SSDs do not incur large up-front latencies when data is accessed, and moreover, their dominant failure mode is localized read errors for which several alternative low-impact recovery techniques can be employed. On the other hand, it is usually appropriate to augment individual SSDs’ error detection mechanisms so that RAID-like recovery mechanisms can be employed to recover data from read errors that elude the devices’ own detection mechanisms.

Purity Operating Environment Goals

The Purity software architecture addresses the three fundamental challenges in flash array design:

Cost:

The per-byte cost of flash memory is considerably greater than that of rotating magnetic disk. Purity keeps effective cost per byte low by reducing (deduplicating and compressing) data extensively, by ultra-efficient “micro-provisioning” of physical storage, and by “hardening” commodity class SSDs so that they are suitable for use in the data center environment

Data reliability:

Purity uses two basic techniques to maximize data reliability. First, it maximizes SSD life by (a) balancing long-term write load across all of an array’s SSDs, and (b) using both NVRAM and DRAM as buffers to minimize write amplification. Second, it dynamically adapts RAID-3D protection parameters each time it allocates storage for incoming data, in order to compensate for missing drives, fluctuating SSD error rates, and other changes in conditions within the array

I/O performance:

In addition to passing on the benefit of fast SSD access to hosts, Purity uses several mechanisms to keep I/O latency predictably low. For example, the software caches both data and metadata extensively so most small read requests are satisfied without SSD access. Moreover, SSD writes are scheduled to minimize the probability of a lengthy write blocking short reads. In some situations, the software rebuilds data to satisfy reads because it determines that to be faster than waiting for a blocking write to complete.

Designing for Flash

Designing an array solely for solid state storage means adhering to different principles than those that govern disk array designs. The most significant ways in which Purity differs from a disk based array software design are that it:

Reads and writes anywhere:

Because every block of data in an all-solid state array can be retrieved in roughly equal time, Purity prioritizes write efficiency and long-term I/O balance well above virtual volume address as it determines where to place incoming data

Maximizes the value of writes:

Writing flash memory is "expensive" both because it often implies time-consuming read-erase-overwrite cycles and because it increases flash cell wear, resulting in increased read error rates. Purity structures and schedules SSD writes so as to minimize drives’ internal write amplification, and to derive the maximum utility from each one

Reduces data aggressively:

Purity uses FlashArray controllers’ abundant processing power to reduce the size of data by deduplicating and compressing it as it enters the array, and to increase reduction efficiency throughout its life

Provisions storage precisely:

Freed from the need to manage physical storage by disk sector, Purity provisions exactly as much storage as each reduced data block requires. Purity allocates space as data arrives, so each block of SSD storage is typically occupied by data from multiple volumes. There is no space wasted by partially filled fixed-size "chunks"

Adapts to changing error rates:

Purity adjusts RAID-3D™ parameters dynamically as SSD error rates and other conditions within an array change over time, and also according to the nature of the data stored in each storage segment. The result is the appropriate level of protection for each segment of solid state storage at the time that data is written in it.

All of these principles would be either inappropriate for or impractical to apply in disk-based arrays. Together, they are the essence of what makes the Purity Operating Environment software “purpose-built for flash.”

Purity Architectural Highlights

Five aspects of the Purity architecture are especially instructive in illuminating how the software is specifically designed for the properties of flash solid state storage:

Data virtualization

Storage layout

RAID-3D data protection

Storage reclamation and capacity optimization

The I/O path traversed by host read and write requests.

The sections that follow describe these aspects of the architecture.

Data Virtualization

A FlashArray exports disk-like storage volumes to hosts. In a host’s view, a volume consists of a consecutively numbered set of 512-byte sectors used for storing and retrieving data. As with any disk, the number of sectors is fixed (unless changed by a relatively infrequent administrative operation).

Internally, Purity completely virtualizes its volumes. The software allocates only as much physical storage as is required to hold volume blocks (sequences of sectors) in which hosts have written data. Moreover, it moves data periodically to consolidate it and to reclaim unused storage. Volume blocks’ physical locations in an array have no fixed relationship to their virtual sector addresses.

The Virtualization Map

To manage constantly changing virtual-to-physical relationships, Purity maintains a persistent virtualization map, illustrated in Figure 2.1, that relates each virtual sector stored in an array to a physical location.

The structure of the map allows Purity to re-size volume blocks and to relocate them anywhere within an array, for example, when it re-compresses or consolidates them.

Figure 2.1. The Purity Virtualization Map

When a host reads data, Purity locates it via the virtualization map. When hosts write data, Purity reduces it, allocates storage and writes it, and updates the virtualization map accordingly. There is no “in-place” overwriting of data. Space that contains data superseded by overwriting is continuously reclaimed for other use by a background task.
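
The read and write behavior just described can be sketched with a dictionary-based model in Python. The VirtualizationMap class, its sequential allocator, and the use of integers as physical locations are all hypothetical simplifications; the actual structure of the Purity map is the one illustrated in Figure 2.1.

```python
class VirtualizationMap:
    """Toy map from (volume, sector) to a physical location; no in-place overwrite."""

    def __init__(self):
        self.map = {}          # (volume, sector) -> physical location
        self.garbage = set()   # superseded locations awaiting reclamation
        self.next_free = 0     # hypothetical append-only allocator

    def write(self, volume, sector):
        """Place the data at a fresh location and repoint the map entry."""
        location = self.next_free
        self.next_free += 1
        old = self.map.get((volume, sector))
        if old is not None:
            self.garbage.add(old)   # old copy is superseded, not overwritten
        self.map[(volume, sector)] = location
        return location

    def read(self, volume, sector):
        """Return the current physical location, or None if never written."""
        return self.map.get((volume, sector))

    def reclaim(self):
        """Background task: hand back every superseded location."""
        freed, self.garbage = self.garbage, set()
        return freed
```

Writing the same sector twice yields two different physical locations; the first becomes reclaimable, and reads follow the map to the second.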

Using Virtualization

Purity takes advantage of volume virtualization in several ways. For example:

Write balancing:

The software selects SSDs on which to write data so as to equalize long-term write activity across an array

Space reclamation:

When reclaiming space occupied by superseded data, the software moves adjacent data that is still “live” to whatever location in the array is most appropriate at the time

Data placement:

When opportunistically reducing already-stored data, the software moves the reduced data to whatever location is most appropriate at the time of optimization.

Purity does not directly overwrite superseded data. Instead, it reduces the superseding data, moves it to write buffers that are currently being populated, and updates the virtualization map with the physical locations to which the buffers are scheduled to be written when filled. The software reclaims the space occupied by overwritten data during an on-going storage reclamation and optimization process.

Storage Layout

The Purity storage layout is designed to meet two main goals:

Cost:

Performance:

Maximize performance and device lifetime by (a) balancing write load across all devices and (b) minimizing write amplification (implied “extra” writes that occur because of SSD block erasures and RAID parity updates).

Storage Layout Structures

Purity allocates and organizes data in solid state storage in conceptually rectangular structures suggested by Figure 2.2. The fundamental organizational unit is the write unit, an erase block-aligned group of consecutive logical pages² on a single SSD. Purity reads data in units of one or more logical pages, and writes data in write units that consist of a fixed number of logical pages.

Purity allocates storage for writing data in units called segments, spread across a subset of an array’s SSDs. Each SSD in a segment's subset contributes a column of adjacent write units to the segment. The software determines RAID-3D parameters individually for each segment it allocates. (See RAID-3D for a discussion of RAID-3D.)

Figure 2.2. The Purity On-Media Data Layout

Segment Allocation

Purity allocates solid state storage in segments consisting of a column of write units on each of several SSDs. To allocate a segment, the software does the following:

Location selection:

Selects a subset of the array’s SSDs based on criteria that include storage capacity balance, number of active drives in the array, drive read error rates and type of data expected to be stored

² The logical page is Purity's atomic unit for reading data. It consists of one or more flash pages (the unit of data to which an SSD applies its internal ECC protection).

Protection selection:

Determines appropriate RAID-3D protection parameters based on the same criteria

Space reservation:

Reserves enough free space for a column of write units on each SSD in the subset

Buffer allocation:

Allocates DRAM buffers corresponding to each write stripe in the segment.
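The four allocation steps can be sketched as follows. Everything here is hypothetical: the `Ssd` fields, the error-rate threshold, the "most free space first" ordering, and the parity rule are stand-ins for selection criteria that the guide names but does not specify.

```python
from dataclasses import dataclass


@dataclass
class Ssd:
    ssd_id: int
    free_write_units: int
    read_error_rate: float  # hypothetical health metric


def allocate_segment(ssds, width, units_per_column, max_error_rate=0.01):
    """Allocate a segment of `width` columns, each `units_per_column` deep."""
    # Location selection: healthy drives with enough room for a full column.
    candidates = [s for s in ssds
                  if s.read_error_rate <= max_error_rate
                  and s.free_write_units >= units_per_column]
    if len(candidates) < width:
        raise RuntimeError("not enough eligible SSDs for this segment")
    # Prefer drives with the most free space to balance capacity (assumed policy).
    chosen = sorted(candidates, key=lambda s: -s.free_write_units)[:width]

    # Protection selection: a made-up rule standing in for RAID-3D parameters.
    protection = {"parity_units": 2 if width >= 6 else 1}

    # Space reservation: set aside one column of write units on each drive.
    for s in chosen:
        s.free_write_units -= units_per_column

    # Buffer allocation: one DRAM buffer per write stripe (row) in the segment.
    buffers = [bytearray() for _ in range(units_per_column)]

    return {"ssd_ids": [s.ssd_id for s in chosen],
            "protection": protection,
            "buffers": buffers}
```

Note that the same inputs (array conditions and drive health) drive both location and protection selection, mirroring the "same criteria" wording above.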

Populating Segments and Scheduling Writes

Purity loads incoming data into segment buffers “horizontally,” one write stripe at a time. Typically, the software populates several write stripe buffers simultaneously, each optimized for a specific type of data, and each with its own RAID-3D parameters. As it moves blocks of reduced data into a buffer, the software mirrors them in NVRAM for persistence in case of power failure, calculates RAID-3D check data, and adds log entries to reflect updates to its virtualization map.

When a write stripe’s buffers are fully populated, Purity schedules a write for each of its write units. Purity always flushes entire write stripes for three reasons:

Efficiency:

It uses drive bandwidth efficiently, because writes are relatively large

Minimal SSD write amplification:

It minimizes SSDs’ internal write amplification (read-erase-overwrite cycles), because buffers are aligned with erase blocks and entirely filled with useful data

Minimal array write amplification:

It minimizes cross-SSD write amplification, because each write stripe contains one or more complete RAID rows.
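The "fill completely, then flush" policy can be sketched with a toy stripe buffer. The names `StripeBuffer` and `buffer_writes` and the byte-level sizes are illustrative assumptions; NVRAM mirroring and RAID-3D check data are omitted to keep the example short.

```python
class StripeBuffer:
    """Accumulates incoming data until one whole write stripe is full."""

    def __init__(self, n_units, unit_bytes):
        self.n_units = n_units
        self.unit_bytes = unit_bytes
        self.data = bytearray()

    @property
    def capacity(self):
        return self.n_units * self.unit_bytes

    def append(self, block):
        """Store as much of `block` as fits; return the part that did not fit."""
        room = self.capacity - len(self.data)
        self.data += block[:room]
        return block[room:]

    def is_full(self):
        return len(self.data) == self.capacity

    def write_units(self):
        """Split a full stripe into equal write units, one per SSD."""
        if not self.is_full():
            raise ValueError("only complete write stripes are flushed")
        u = self.unit_bytes
        return [bytes(self.data[i * u:(i + 1) * u]) for i in range(self.n_units)]


def buffer_writes(blocks, n_units, unit_bytes):
    """Return (list of flushed stripes, buffer still being populated)."""
    flushed = []
    buf = StripeBuffer(n_units, unit_bytes)
    for block in blocks:
        while block:
            block = buf.append(block)
            if buf.is_full():
                flushed.append(buf.write_units())
                buf = StripeBuffer(n_units, unit_bytes)
    return flushed, buf
```

Because a stripe is never flushed partially, every scheduled write is a full write unit, which is the property the three reasons above depend on.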

Until all write units in a stripe have been successfully written, Purity retains the data they contain in NVRAM. This has two important consequences:

Host responsiveness:

If an array controller restarts (e.g., due to external power failure) after part of a write stripe has been written, the software reconstructs the entire write stripe from NVRAM contents and rewrites it. Thus, once a host’s write has been acknowledged, data is persistent, even if a controller must restart before it has been completely written to solid state storage.

Write staggering:

Individual write units can be written in sequence rather than concurrently. Because SSD writes can occupy drives for 10-20 times as long as typical reads, Purity staggers writes to keep as many devices as possible available for reading. Staggering is feasible because data can be retrieved from NVRAM if necessitated by a controller restart.
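A minimal simulation of NVRAM retention with staggered writes follows. The function names, the dictionary-based "SSDs", and the `fail_after` parameter used to simulate a controller restart are all hypothetical; the sketch only demonstrates the ordering of events described above.

```python
def write_stripe_staggered(stripe_id, units, ssds, nvram, fail_after=None):
    """Write one unit per SSD in sequence, holding the stripe in NVRAM throughout.

    Returns True when the whole stripe is on the SSDs, False if a simulated
    controller restart interrupted it after `fail_after` units.
    """
    nvram[stripe_id] = list(units)      # retained before any SSD write starts
    for i, unit in enumerate(units):
        if fail_after is not None and i == fail_after:
            return False                # restart: NVRAM copy is still intact
        ssds[i][stripe_id] = unit       # one drive busy at a time (staggered)
    del nvram[stripe_id]                # release only after every unit landed
    return True


def recover(ssds, nvram):
    """After a restart, rewrite every stripe still held in NVRAM in full."""
    for stripe_id, units in list(nvram.items()):
        write_stripe_staggered(stripe_id, units, ssds, nvram)
```

In this model an interrupted stripe is always rewritten in full rather than resumed, matching the statement that the software reconstructs the entire write stripe from NVRAM contents.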

RAID-3D

Figure 2.3. RAID-3D Protection Spheres

RAID-3D, illustrated conceptually in Figure 2.3, is designed around protection against data loss due to the common failure modes of flash SSDs rather than those of rotating disks.

RAID-3D protects both against drive read errors [3] and against failures that affect entire write units and drives. It exploits SSDs' low access times to provide more comprehensive protection with lower impact on performance than is feasible with disk-based arrays. RAID-3D includes three orthogonal tiers of protection that Purity applies to each write stripe:

Tier 1:

Uses per-logical page checksums to detect read errors that are not corrected by drive mechanisms. Tier 1 detection is particularly helpful with latent read errors exposed during rebuilds of failed drive contents.

Tier 2:

Uses RAID-5 parity calculated over subsets of the write units in a stripe. Recovers from all single write unit read failures and many multiple failures.

Tier 3:

Uses RAID-6 type check data calculated over an entire write stripe. Recovers from all double failures and many situations involving failure to read more than two write units.

Each tier of RAID-3D protects against progressively more severe error scenarios; naturally, recovery at each successive tier takes longer and has a greater impact on processing and I/O resources. Purity therefore attempts to recover from read errors using Tier 2 mechanisms before resorting to those of Tier 3. At all tiers, however, RAID-3D protects the data in a segment, whereas conventional disk-based arrays protect an entire group of drives.
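The escalation from detection to parity-based recovery can be sketched in a few lines. This is an illustration under stated assumptions, not RAID-3D itself: CRC32 stands in for the unspecified per-page checksum, a single XOR parity over the whole group stands in for Tier 2, and Tier 3 is represented only by the error raised when XOR recovery is insufficient.

```python
import zlib
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(units):
    """RAID-5 style parity: XOR of all data units in the group."""
    return reduce(xor_bytes, units)


def read_with_recovery(index, stored, checksums, parity):
    """Return unit `index`, rebuilding it from parity if its checksum fails."""
    data = stored[index]
    # Tier 1 (assumed CRC32): detect a read error the drive did not correct.
    if zlib.crc32(data) == checksums[index]:
        return data
    # Tier 2: XOR the parity with every other unit to rebuild the bad one.
    others = [u for i, u in enumerate(stored) if i != index]
    rebuilt = reduce(xor_bytes, others, parity)
    if zlib.crc32(rebuilt) != checksums[index]:
        # More than one unit is bad; a stronger tier would be needed.
        raise IOError("single-parity recovery failed")
    return rebuilt
```

The checksum is verified again after the rebuild, so a second, undetected bad unit in the same group cannot silently produce wrong data.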

RAID-3D Flexibility

Purity determines RAID-3D parameters dynamically each time it allocates a segment of storage for writing, based on conditions within the array and the type of data for which the segment is to be optimized. For

[3] For example, Purity may rebuild data to satisfy a host read request rather than waiting for a lengthy write to complete so that a read can be scheduled. In disk-based arrays, rebuilding data is usually considered a last resort, to be employed only in cases of hard read failure.
