Storage Area Networks; Unclogging
LANs and Improving Data
“This important technology is moving into
the mainstream in distributed networking
and will be the normal, adopted way of
attaching and sharing storage in a few
Strategic Research Corporation
Kevin J. Smith
Table of Contents
Server-Dependent or Server-Independent Storage
SAN Taxonomy
Fibre Channel SAN’s
SAN’s and Clusters
SAN’s and NAS
SAN-Attached RAID Array Requirements
Mylex External Array SAN Controllers
Mylex Product Line SAN Features
Copies of this and other white papers may be obtained from the Mylex web site (www.Mylex.com).
RAID Controllers Are Not Created Equal; Many Will Not Survive on Wolfpack-II Clusters
DAC960PJ/PR Two Node Clustering
ServerNet is a registered trademark of Tandem Computer Incorporated. VAXClusters is a registered trademark of Digital Equipment Corporation. NetWare is a registered trademark of Novell.
Windows NT Server is a registered trademark of Microsoft. ESCON and SSA are registered trademarks of IBM.
Other product names mentioned herein may be trademarks and/or registered trademarks of their respective companies.
The key challenges facing IS executives are satisfying increasingly diverse networking requirements and reducing network complexity to lower total cost of ownership (TCO). Success depends on efficiently delivering:
• High bandwidth for warehousing and web-based applications, e.g. multimedia,
• Low, predictable latency for time-sensitive applications, e.g. video conferencing,
• Performance and resiliency for mission critical applications, e.g. OLTP.
When computing resources were used only for internal operations, the cost of information bottlenecks and network failures was limited to lost productivity. However, as computing resources are used to engage customers, as well as manage operations, bottlenecks and network failures translate into lost business and lost productivity.
A primary benefit of Storage Area Networks (SAN’s) is unclogging network arteries by moving bulk data transfers off client networks and onto specialized sub-networks, often referred to as the networks behind the servers.
Figure 1. LAN’s and SAN’s
With SAN’s, pools of storage (and related traffic) are removed from the LAN, externalized and shared by mainframes, UNIX and PC servers. In addition to de-congesting client networks, enabling cross-platform data sharing and amortizing storage costs across servers, a SAN topology adds value by providing:
• Flexible, modular expansion by de-coupling server and storage investments,
• Bandwidth and capacity scaling by eliminating SCSI and PCI bottlenecks,
• Increased fault tolerance and availability with redundant paths to data,
• Increased application performance with multiple gigabit links to data,
• Simplified systems integration and enriched storage management,
• Improved data protection and security through centralization, and
A paradigm shift from centralized to distributed storage began in the 1980’s, driven by peer-to-peer networks, inexpensive UNIX and PC servers and the notion that moving computing and storage resources closer to workgroups would increase productivity. The result was islands of computing and disparate networks tied together by gateways. IS managers were faced with multiple copies of inconsistent data, networks that were expensive to manage and corporate assets (data) that were difficult to access and vulnerable to intrusion. The AberdeenGroup, a respected market research firm, refers to this environment as server-dependent storage (Figure 2).
Figure 2. Server-Dependent Storage
Emerging SAN technologies mirror today’s LAN technologies with gigabaud shared and dedicated bandwidth. The AberdeenGroup advises: “Unless enterprises view and implement storage as if it were part of a giant network across the enterprise, they will pay too much for their storage and will face extreme, labor-intensive difficulties in performing vital storage-related functions, such as managing the storage and backing up and moving critical data.” While giant SAN’s may someday become a reality, local SAN’s with server-independent storage are being deployed today.
Figure 3. Server-Independent Storage
SAN: Storage Area Network or System Area Network?
SAN is one of the more overloaded acronyms in computer jargon; its meaning is context sensitive. To systems people, SAN means System Area Network, and to storage people, SAN means Storage Area Network. Some people consider both definitions synonymous. However, while System Area Network and Storage Area Network topologies can be similar or even identical, there is an important distinction between the two technologies.
System Area Network
A System Area Network is a specialized network used in cluster configurations for node-to-node, node-to-device (primarily disk), and device-to-device communications that provides both high bandwidth and low latency. Low latency is the distinguishing characteristic of a System Area Network. Short message latency across a System Area Network is generally less than 10 microseconds, an order of magnitude less than Fibre Channel or Gigabit Ethernet. Low latency is a prerequisite to high performance for applications distributed across cluster nodes, e.g. parallel DBMS’s. Instances of a distributed application in a cluster environment frequently exchange messages to synchronize program execution or access to shared resources. Most System Area Networks use proprietary protocols; however, this is expected to change when the VI Architecture is introduced in 1999. The VI Architecture is an interconnect-independent set of protocols and API’s that standardize the interface between OS’s and cluster interconnects. ServerNet, developed by Tandem Corporation, and SCI, an ANSI standard implemented by Dolphin, are examples of System Area Networks. System Area Network technologies can also be used to implement Storage Area Networks.
Storage Area Network
A Storage Area Network can be designed with a specialized or standard networking technology, e.g. Fibre Channel. Its purpose is to provide high bandwidth connections between servers and storage devices, and between storage devices, e.g. storage arrays and tape libraries. The primary objective of Storage Area Networks is high bandwidth for moving large amounts of data; latency is a secondary consideration. Storage Area Networks have been implemented with ESCON and HIPPI interfaces, and more recently with SSA and Fibre Channel. They can be deployed in homogeneous, e.g. all UNIX servers, or heterogeneous environments, e.g. a mix of UNIX and NT servers, and can be local to servers or remote from servers and connected to other (remote) Storage Area Networks. They use standard channel protocols, such as SCSI riding on top of Fibre Channel. In Storage Area Networks, storage is de-coupled from servers and managed as an independent resource.
Storage Area Networks can be configured in fabric topologies with switches to interconnect servers and devices or implemented in loop topologies with hubs to simplify cable management and increase loop resiliency.
SAN-Attached Storage (SAS)
SAN-Attached Storage refers to shared storage devices connected to servers and possibly each other via a Storage Area Network (typically Fibre Channel or SSA in open system environments).
Network-Attached Storage (NAS)
Network-Attached Storage devices attach directly to client networks, typically an Ethernet LAN, and provide optimized file services to clients and servers on the network using standard file I/O protocols, such as NFS and SMB, and networking protocols, such as IP. These devices are essentially specialized servers that function as servers in a client-server relationship with other network computers requesting file access. Mylex, Network Appliance and Auspex NAS products are examples of leading-edge Network-Attached Storage devices.
[Storage attached to application servers but not directly attached to a LAN is sometimes referred to as Network-Attached Storage. In a sense, this is true since clients can access server-dependent storage over the network. However, this is stretching the NAS definition since application servers are not optimized for file serving.]
SAN Interconnects and Topologies
SAN’s can be designed in switched fabrics or arbitrated loops. Fibre Channel is an ideal SAN interconnect because it provides scaleable performance with virtually unlimited addressing and can span campus-wide distances. Bus technologies, such as SCSI, are inappropriate due to bandwidth, distance and device attachment limitations.
SAN Interconnect Devices
Switches, hubs and routers are interconnect devices that can be employed to construct SAN networks. Switches are used in fabrics to provide scaleable performance. Hubs are deployed in loop configurations to simplify cable management and enhance fault tolerance. Routers are useful for interconnecting complex SAN’s, particularly over long distances for data vaulting or disaster protection applications.
ESCON, SSA and Fibre Channel are candidate SAN interfaces. ESCON is the dominant interconnect in mainframe environments. SSA is a newer IBM technology with performance characteristics similar to Fibre Channel Arbitrated Loops and has become a popular SAN interconnect. However, Fibre Channel appears to be emerging as the de facto industry standard SAN interface based on the breadth of vendor support and market acceptance.
Fibre Channel defines a set of high performance serial I/O protocols and interconnects for flexible data transfer. It was developed by an ANSI committee and is supported by over seventy system, networking and storage vendors. Fibre Channel was designed to:
• Provide a common interface for transferring large amounts of data at high transmission rates and low error rates,
• Enable the simultaneous use of different transport protocols, such as SCSI and IP, over a common interface, and
• Support multiple physical interfaces and transmission media with varying distance and cost characteristics.
Networks and Channels
Fibre Channel was designed to provide seamless integration between networks that connect users to servers and channels that connect storage to servers. Networks connect heterogeneous computers located anywhere in the world and enable them to communicate with one another at any point in time on a peer-to-peer basis. Consequently, networks use complex protocols to authenticate users, set up sessions, route data, correct errors and cope with unstructured environments. Complex protocols impose high overhead on network computers. Conversely, channels are employed in structured, predictable environments to connect storage and other devices to servers over distance-limited, low error rate transmission paths.
Fibre Channel has not been extensively deployed in networks; the cost of Fibre Channel hardware is high relative to 10/100 and Gigabit Ethernet, and network infrastructures are in place to support Ethernet. However, Fibre Channel is rapidly becoming a storage interconnect standard; it provides the bandwidth, distance and flexibility required for Storage Area Networks:
• Full duplex transmission at 25 MB/s and 100 MB/s at distances up to:
    • 25 meters between devices using video or mini-coax copper cable,
    • 500 meters between devices using multi-mode fibre cables, and
    • 10,000 meters using single mode fibre cables,
• Full duplex 200 MB/s and 400 MB/s data rates in the not too distant future,
• Extended addressability:
    • 126 devices on an arbitrated loop, and
    • 1,000’s of devices in switched fabrics,
• Low error rates and end-to-end error checking for high data integrity,
• Optional redundancy for high availability,
• Low cost in arbitrated loop topologies, and
Loops or Fabrics
Fibre Channel devices, called nodes, have one or more ports to communicate with other nodes. Today, most Fibre Channel Host Bus Adapters (HBA’s) provide a single port but HBA’s with dual ports are expected in the future. Each port has a pair of electrical (for copper cables) or optical (for fibre cables) transceivers, one for transmitting data and the other for receiving data. The pair of conductors is referred to as a link. Fibre Channel nodes can be connected with a single link or dual links for fault tolerance.
Storage Area Networks can be configured as arbitrated loops with bandwidth shared by nodes on the loops or as switched fabrics with dedicated bandwidth between communicating nodes.
Fibre Channel Arbitrated Loops (FC-AL)
In an FC-AL, nodes arbitrate to gain access to the loop and then pairs of nodes establish a logical point-to-point connection to exchange data; the other nodes on the loop act as repeaters. With FC-AL, bandwidth is constant and shared; only one pair of nodes on the loop can communicate at any point in time. FC-AL is similar in operation to other shared media networks, such as FDDI or Token Ring.
Figure 4. Fibre Channel Arbitrated Loop and Fabric Topologies
Fibre Channel Switched Fabrics
In a switched topology, the full bandwidth of a link is available to pairs of communicating nodes and multiple pairs of nodes can simultaneously transmit and receive data. As nodes are added to a switched configuration, the aggregate bandwidth of the network increases. The switch is an intelligent device that provides crossbar switching functions enabling multiple pairs of nodes to simultaneously communicate.
Hubs or Switches
Hubs and switches are interconnect devices used in Storage Area Networks. They are available with advanced management capabilities similar to LAN management techniques; SNMP-based management is generally provided and some devices conform to the new Web-Based Enterprise Management (WBEM) standard. Hubs and switches can be cascaded to increase node connectivity; hubs up to the FC-AL limit of 126 nodes, and switches to thousands of nodes.
Except for two node configurations connected in a point-to-point fashion, hubs are generally used in FC-AL topologies, and can be used in a complementary role to switches in fabric topologies. Compared to switches, hubs are lower cost. They are useful to simplify cable management and contain bypass control switches that enable failed nodes on a loop to be bypassed without compromising the integrity of the loop. The hub’s bypass control switches also enable nodes to be hot plugged into a Storage Area Network or removed without affecting loop operation.
Figure 5. SAN Interconnected with a FC-AL and Hub
Switches are used to create Fibre Channel fabrics. They provide the same resiliency features as hubs and enable the fabric’s aggregate bandwidth to scale as nodes are added. With hub-connected Storage Area Networks, aggregate bandwidth remains constant as nodes are added; hence, bandwidth per node decreases as more nodes share a fixed amount of bandwidth. With switched fabrics, bandwidth per node remains constant as nodes are added and hence, the aggregate bandwidth of the fabric increases as nodes are added; aggregate bandwidth is proportional to the number of nodes.
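The scaling difference between hub-connected loops and switched fabrics can be illustrated with simple arithmetic. The following Python sketch assumes a 100 MB/s link rate (the higher of the Fibre Channel rates quoted earlier) and idealizes both topologies by ignoring arbitration and protocol overhead; the function names are illustrative only.

```python
LINK_MB_S = 100  # assumed link rate in MB/s (illustrative)


def loop_bandwidth(nodes, link_mb_s=LINK_MB_S):
    """Hub-connected loop: aggregate bandwidth is fixed and shared."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return {"aggregate": link_mb_s, "per_node": link_mb_s / nodes}


def fabric_bandwidth(nodes, link_mb_s=LINK_MB_S):
    """Switched fabric: each node keeps a full link, so aggregate grows."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return {"aggregate": link_mb_s * nodes, "per_node": link_mb_s}


for n in (2, 8, 32):
    print(n, "loop:", loop_bandwidth(n), "fabric:", fabric_bandwidth(n))
```

In this idealized model, going from 2 to 32 nodes leaves the loop's aggregate unchanged while the per-node share shrinks; the fabric's per-node figure stays constant and its aggregate grows in proportion to the node count.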
Figure 6. SAN With Switched and Shared (Loop) Interconnects
A cluster is a group of autonomous servers that work together to enhance reliability and scaleability, and can be managed as a single entity to reduce management costs. Clusters always share storage devices and sometimes share data.
Clusters have been implemented using the Shared Disk and Shared Nothing models:
• Shared Disk Model -- Data is simultaneously shared by cluster nodes. An access control mechanism, generally referred to as a distributed lock manager, synchronizes access from multiple nodes to shared data. Example: Digital VAXClusters.
• Shared Nothing Model -- Access to data is shared but at any point in time, disk volumes are exclusively owned by one of the nodes. Example: Microsoft NT Clusters (NT/E 4.0). In advanced shared nothing clusters, nodes can access data they do not own through the node that owns the data. Example: Next generation NT Clusters.
Advocates of the Shared Nothing model claim that Shared Nothing clusters are more scaleable because the overhead of a distributed lock manager increases as nodes are added and eventually bottlenecks the cluster.
Figure 7. Four Node Cluster With Shared Access to RAID Arrays
Clusters Use SAN Technologies
• Data is removed from the network and stored on the network behind the servers,
• Storage model is server-independent, not server-dependent,
• Access to storage devices is shared, generally across high performance links,
• I/O bandwidth is scaleable (with switches); storage capacity is scaleable,
• Redundant links can be used for fault tolerance and higher data availability,
• Data is centralized; security and storage management can be enhanced,
• Storage can be added incrementally; server and storage investments are de-coupled,
• Storage nodes communicate and exchange data (in advanced cluster designs),
• Hubs or switches provide simplified cable management and increased resiliency, and
Network-Attached Storage is implemented with devices that attach directly to client networks, typically Ethernet LAN’s, and provide optimized file services to clients and servers on the network using standard file I/O protocols such as NFS and SMB. Network-Attached Storage is essentially a specialized server that functions as a server in a client-server relationship with other network computers requesting file access.
Figure 8. NAS on LAN Segments and Connected to SAN
SAN’s and Network-Attached Storage (NAS) are complementary technologies. A NAS device functions like a mini-SAN for LAN segments. With NAS devices, storage is moved from PC’s and workstations to file access optimized NAS devices where it can be protected, secured and managed. NAS design objectives are similar to SAN objectives but at lower price points.
• Workstation-dependent storage is replaced by workstation-independent storage,
• Data is centralized at the workgroup level where it can be more easily secured, shared, RAID-protected, backed-up and accessed in an optimal fashion by all clients,
• LAN’s can be more easily designed to avoid bottlenecks with centralized NAS storage on shared or dedicated LAN segments,
• Modular expansion is enabled; workstation and storage investments are de-coupled,
• Redundant links can be used to increase data accessibility, and
Fibre Channel Interface -- Fibre Channel is the optimal interface for storage arrays used in SAN applications. It provides the performance, distance and scaleability required for SAN environments. The Fibre Channel standard is widely supported and a broad range of Fibre Channel interconnect devices (hubs and switches) and storage devices (RAID arrays, tape and optical libraries, and disk, tape and optical drives) will be available with varying levels of features and performance in a competitive market environment.
Dual Channels – High performance SAN-attached storage devices require multiple front-end SAN channels for performance and fault tolerance. Dual channel arrays offer twice the potential performance of single channel arrays at marginally higher costs and provide continuous access to data if a SAN interconnect device or link fails.
Heterogeneous Host Support
Most enterprises have heterogeneous computing environments with mainframes in the data center, and UNIX, NT or NetWare servers distributed across the network. Investments in these systems (acquisition, application development and infrastructure costs) are substantial and migration to a homogeneous environment is a multi-year proposition. SAN-attached RAID arrays should support storage volumes formatted with different file systems and allow heterogeneous systems to access the volumes.
Data Availability Features
Duplex RAID Controllers – SAN’s should be designed without any single point of failure that can cause storage devices to become inaccessible. SAN-attached arrays should be configured with duplex controllers and with the disks connected to both controllers. Multiple SAN interfaces (on each controller) and duplex controllers with shared disks provide the level of fault tolerance required in SAN configurations. Redundant paths to data are a necessary but not a sufficient condition for high availability.
Transparent Host Independent Failover – In addition to redundancy, controllers should implement a transparent failover/failback scheme such that logical disks, i.e. logical arrays, are continuously accessible. With a Fibre Channel interconnect, each controller requires a port held in reserve to accommodate a controller failover, i.e. assume a failed controller’s port ID; although the Fibre Channel standard allows ports to have multiple addresses, this feature has not been implemented in Fibre Channel chips. Failover and failback should be implemented such that transitions occur without any required host intervention.
Alternate Path Support -- Following the no single point of failure principle, SAN-attached servers should have redundant Host Bus Adapters (HBA’s) with alternate path software support. Alternate path is generally implemented as a software driver and provides a level of indirection between the OS and HBA’s. If one HBA fails, the driver redirects I/O’s intended for the failed HBA to the alternate HBA so that I/O requests can be satisfied without the host OS knowing that an alternate path to storage was used.
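The indirection provided by an alternate path driver can be sketched in a few lines of Python. This is a simplified model, not an actual driver interface: the HBA and AlternatePathDriver classes, their method names and the string "requests" are all hypothetical and serve only to show the redirect behavior.

```python
class HBA:
    """Hypothetical host bus adapter that either works or has failed."""

    def __init__(self, name):
        self.name = name
        self.failed = False

    def submit(self, request):
        if self.failed:
            raise OSError(f"{self.name} is down")
        return f"{request} via {self.name}"


class AlternatePathDriver:
    """Sits between the OS and the HBA's; the OS only calls submit()."""

    def __init__(self, primary, alternate):
        self.paths = [primary, alternate]

    def submit(self, request):
        # Try each path in order; the caller never sees which one was used
        # unless every path has failed.
        for hba in self.paths:
            try:
                return hba.submit(request)
            except OSError:
                continue
        raise OSError("no path to storage")


driver = AlternatePathDriver(HBA("hba0"), HBA("hba1"))
print(driver.submit("read block 7"))
driver.paths[0].failed = True
print(driver.submit("read block 7"))
```

The caller issues the same submit() call before and after the HBA failure; only the path taken changes.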
Data Integrity Features
Mirrored Write Caching -- Most external RAID controllers implement write-back caching to enhance performance but few implement a mirrored write-back caching scheme to protect data. Data in a controller’s cache is vulnerable to power loss or controller failures until it is written to disk. To make matters far worse, if data in a write-back cache is lost, the application is oblivious to the loss since the controller has already acknowledged the write as complete. Controller battery back-up units can hold-up cache contents during power outages but cannot move cached data to an operational controller which can write it to disk. Mirroring write-back caches across controllers solves this problem and can be designed with minimal effect on cost or performance. With a mirrored cache architecture, I/O’s are written to cache memories in multiple controllers before the write is acknowledged as complete. If one controller subsequently fails, the surviving controller flushes the contents of the failed controller’s cache (which are stored in its cache) safely to disk. Mirrored caches protect data like mirrored disks.
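A minimal Python sketch of the mirrored write-back scheme follows. It is a toy model under stated assumptions: caches and the disk are plain dictionaries, the Controller class and function names are hypothetical, and the interconnect between controllers is assumed to be reliable.

```python
class Controller:
    """Hypothetical RAID controller with a write-back cache and a mirror area."""

    def __init__(self, name):
        self.name = name
        self.own_cache = {}     # dirty blocks this controller accepted
        self.mirror_cache = {}  # copy of the partner's dirty blocks
        self.partner = None


def pair(a, b):
    a.partner, b.partner = b, a


def write(controller, block, data):
    """Acknowledge a write only after both cache copies exist."""
    controller.own_cache[block] = data
    controller.partner.mirror_cache[block] = data
    return "ack"


def flush_after_partner_failure(survivor, disk):
    """Survivor writes the failed partner's cached data safely to disk."""
    disk.update(survivor.mirror_cache)
    survivor.mirror_cache.clear()


c0, c1 = Controller("c0"), Controller("c1")
pair(c0, c1)
disk = {}
write(c0, 42, b"payload")              # acknowledged, not yet on disk
flush_after_partner_failure(c1, disk)  # c0 fails; c1 still holds the data
print(disk)
```

The key design point is the ordering inside write(): the acknowledgment is returned only after the partner holds a copy, so an acknowledged write survives the loss of either controller.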
Data Management Features
SAN-attached RAID arrays should support disk mirroring, all the commonly used RAID levels, on-line expansion of logical drives, on-line addition of logical drives and other data management software. Network-wide array management should be available from any client machine on the network.
Bandwidth Scaling Capabilities
Scaleable I/O Performance – SAN’s require a balanced design. The number (and compute power) of servers attached to a SAN can vary by orders of magnitude, and SAN’s naturally tend to expand over time. However, adding servers to a SAN will result in marginal performance gains if the SAN-attached RAID arrays lack the horsepower to feed the additional nodes. Scaleable compute power requires scaleable I/O power. External storage arrays inherently provide scaleable I/O performance since arrays can be incrementally added to a SAN. However, a large number of under-powered or disk channel limited arrays are less cost-effective and more difficult to manage than a smaller number of arrays with performance better matched to the SAN’s I/O requirements.
Active-Active Operation -- Duplex controllers can operate in active-passive or active-active mode just like cluster nodes. In this context, active-active implies a duplex controller configuration with both controllers simultaneously servicing SAN I/O requests. This is analogous to the active-active operation of cluster nodes. To realize the full performance potential and total cost-of-ownership (TCO) effectiveness of a SAN, all storage resources must contribute to performance. Operating controllers in active-passive mode, with one controller idle until the other fails, wastes a valuable SAN resource.
Capacity Scaling Capabilities
Scaleable I/O Capacity -- System storage has been increasing 50% a year since the first disk drive was invented. SAN’s require controllers with surplus back-end channel capacity to accommodate expanding storage needs. Storage controllers that only support a few disk channels are marginal for SAN applications. Controllers with more back-end channels can not only accommodate more storage but their arrays can also be configured to provide performance and data availability benefits. Logical arrays on controllers with two or three disk channels are striped vertically down a channel. Since applications direct most I/O to a single logical array, vertically striping arrays can cause a single I/O processor (IOP) to become a bottleneck (the hot disk phenomenon moved into the controller). Horizontal striping balances the I/O load across IOP’s.
Figure 9. Vertically and Horizontally Striped Arrays (External RAID Controller)
If an IOP fails, vertically striped arrays on the failed channel become unavailable to the controller with the failed IOP. However, if a RAID 5 array is horizontally striped across channels, then an IOP failure causes the loss of a single disk which RAID 5 algorithms can repair on-the-fly without disrupting application access to data. This failure mode is identical to a single disk failure in a RAID 5 set.
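The effect of an IOP failure on the two layouts can be made concrete with a small Python sketch. The controller geometry (five channels with five disks each) and all function names are assumptions chosen for illustration; a disk is identified by a (channel, slot) pair.

```python
def vertical_layout(channels, disks_per_channel):
    """Each logical array uses all the disks on one channel (one IOP)."""
    return {f"array{c}": [(c, d) for d in range(disks_per_channel)]
            for c in range(channels)}


def horizontal_layout(channels, disks_per_channel):
    """Each logical array takes one disk from every channel."""
    return {f"array{d}": [(c, d) for c in range(channels)]
            for d in range(disks_per_channel)}


def disks_lost(layout, failed_channel):
    """Count, per array, the member disks behind the failed IOP."""
    return {name: sum(1 for ch, _ in disks if ch == failed_channel)
            for name, disks in layout.items()}


print(disks_lost(vertical_layout(5, 5), failed_channel=2))
print(disks_lost(horizontal_layout(5, 5), failed_channel=2))
```

In the vertical layout one array loses every member disk, while in the horizontal layout each array loses exactly one disk, which is the case RAID 5 is designed to tolerate.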
Mylex Fibre Channel Product Line
Mylex offers a seamless product line of external RAID array controllers designed to meet the performance, connectivity, cost, interface, topology, data integrity, data availability and data management requirements of SAN and cluster environments.
Mylex external controllers are ready to be packaged in stand-alone or rack-mountable JBOD (disk) enclosures. External controllers are shared storage resources packaged apart from systems (internal controllers plug into a system bus, typically PCI).
The RAID firmware and management software implemented across the product line deliver a uniform set of data protection and performance optimization features:
• At varying levels of performance and storage connectivity,
• With dual fibre host interfaces, and Ultra or Ultra-2 LVD disk interfaces, and
• At price points for entry level SAN’s and with performance for enterprise SAN’s.
Mylex array controllers are available in simplex configurations for network servers and duplex (dual) configurations for SAN’s and clusters. In duplex mode, advanced features are implemented to accelerate performance, protect data and guarantee data accessibility:
• Active-active controller operation for increased performance,
• Transparent failover / failback for high data availability, and
• Cache mirroring for guaranteed data integrity.
Figure 10. Relative Performance
Figure 11. Product Line Features
Duplex controllers (gray boxes) deliver up to twice the performance of simplex controllers (white boxes). Fibre controllers provide over twice the performance of SCSI controllers. Data protection, accessibility and management features are important in any SAN or cluster environment and are uniformly implemented across the product line:
• DAC SX -- Dual Ultra SCSI host ports and four Ultra SCSI disk channels
• DAC SF -- Dual Fibre Channel host ports and six Ultra SCSI disk channels
Heterogeneous Server Support
Mylex external array controllers can accommodate SAN’s configured with heterogeneous operating systems, such as UNIX and NT. Disk volumes are configured into logical arrays and then formatted with NTFS or UNIX file systems. Up to eight logical arrays accessed as LUN’s (Logical Unit Numbers) can be configured on each controller. With Mylex controllers, a logical array is the unit of space allocation and RAID level protection. Each logical array can be configured with the RAID level that provides the optimal level of performance and fault tolerance for applications accessing the arrays.
NT and UNIX servers can simultaneously access Mylex array controllers formatted with NT and UNIX volumes. This level of flexibility is required by enterprise SAN’s and facilitates migration from heterogeneous to homogeneous computing environments.
Figure 12. DAC FL Attached to a Fibre Channel SAN
In Figure 12, two pairs of duplex DAC FL controllers are SAN-attached. Each controller has redundant paths to host systems and pairs of controllers provide redundant paths to disks. The dark shaded disks are formatted with UNIX file systems and the lightly shaded disks with NTFS file systems. The SAN-attached servers can have multiple paths through the SAN interconnect to both pairs of redundant controllers, and redundant paths from the controllers to the disks are provided for fault tolerance and high data availability.
Mylex SAN-attached controllers provide reliable and simultaneous access from heterogeneous servers to provide a higher level of resource sharing and an integrated storage environment.
Active-Active with Transparent Failover / Failback
A key SAN benefit is high data availability; SAN devices should be able to fail without negatively impacting access to data (aside from momentary transition delays). Mylex external RAID array controllers incorporate features that increase data accessibility.
• Active-Active Controller Operation
• Controller Heartbeat Monitoring
• Transparent Failover and Failback
• Dual Host Ports
Mylex controllers are designed to provide analogous and complementary behavior to high availability cluster nodes making them well-suited to SAN applications.
For cost-performance effectiveness, Mylex controllers support active-active operation; both controllers simultaneously satisfy I/O requests from SAN nodes. Some vendors offer active-passive controllers, which are similar in concept to a hot spare disk. The passive controller waits for the active controller to fail and then assumes the I/O load of the failed controller. With active-active, both controllers service I/O requests and hence, deliver up to twice the performance of active-passive controllers.
Mylex controllers provide high availability by heartbeat monitoring and transparent controller failover / failback. Like clustered servers, they are linked by a private network used to transmit “I’m alive” heartbeat messages (Figure 13). The absence of heartbeat messages signals that one of the controllers is off-line and the remaining controller immediately initiates a failover operation and then begins servicing I/O requests directed to itself and its off-line partner to provide non-stop access to data. Controller failover / failback operations are host independent and transparent to SAN nodes. During the failover / failback process, SAN nodes simply continue sending I/O requests to the same ID’s across the SAN interconnect. As far as the nodes are concerned, these commands are processed identically whether both controllers are functional or one has failed.
Figure 13. Controllers Monitor Each Other Via Heartbeat Messages
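Heartbeat-based failure detection of this kind can be sketched as a simple timeout check. The Python below is an illustrative model only: the HeartbeatMonitor class, its method names and the one-second timeout are assumptions, and timestamps are passed in explicitly so the logic can be followed without a real clock.

```python
class HeartbeatMonitor:
    """Tracks the partner controller's most recent heartbeat."""

    def __init__(self, start, timeout_s=1.0):
        self.timeout_s = timeout_s  # assumed value, not a product parameter
        self.last_seen = start

    def record_heartbeat(self, now):
        self.last_seen = now

    def partner_offline(self, now):
        # The partner is presumed off-line once heartbeats have been
        # absent for longer than the timeout.
        return (now - self.last_seen) > self.timeout_s


monitor = HeartbeatMonitor(start=0.0)
monitor.record_heartbeat(0.5)
print(monitor.partner_offline(1.0))  # heartbeat seen 0.5 s ago
print(monitor.partner_offline(2.0))  # silent for 1.5 s: initiate failover
```

A surviving controller would call partner_offline() periodically and begin failover the first time it returns true.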
Mylex controllers have dual SAN ports, which double the bandwidth to controllers and allow redundant paths from other SAN devices to the controllers to increase the resiliency of the SAN topology. As described later in this paper, dual host ports are particularly critical for controller failover in Fibre Channel topologies. The SAN ports can be connected directly to UNIX and NT servers or indirectly through hubs and switches.
Failover and Failback With SCSI Controllers
During normal operations, each SCSI port has a unique ID. The controllers' disk channels are connected to the disks and to each other; each controller has equal access to all drives.
Figure 14. Normal Operating Mode of Dual DAC SX SCSI Array Controllers [diagram labels: HBA; Controller 0 (ID 0, 1; ID 1, 1); HBA; Controller 1 (ID 0, 2; ID 1, 2); Heartbeats]
If a controller fails, the surviving controller senses the absence of heartbeats, assumes the ID’s of the failed controller in addition to its own, and updates its data structures with configuration information stored on disk (COD). The failover process is transparent since nodes still see the same two SCSI ID’s on the interconnect.
Figure 15. DAC SX SCSI Array Controller Failover [diagram labels: HBA, Controller 0, Controller 1, Heartbeats; ID 0, 1; ID 1, 1; ID 0, 2; ID 1, 2]
When the failed controller is replaced, the surviving controller detects the replacement, allows it to restart, and returns the failed controller's SCSI ID's to it. The replacement controller then starts processing I/O.
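The SCSI failover and failback sequence can be illustrated with a small model in which each host port holds a set of SCSI ID's. This is a sketch under stated assumptions: the class and method names are hypothetical, and the ID values (1 for Controller 0, 2 for Controller 1 on each of two ports) are an illustrative reading of the figure labels rather than a documented configuration.

```python
class ScsiController:
    """Hypothetical model of a dual-port SCSI array controller."""

    def __init__(self, name, port_ids):
        # port_ids maps port number -> the SCSI ID used in normal operation.
        self.name = name
        self.home_ids = dict(port_ids)
        self.ports = {port: {scsi_id} for port, scsi_id in port_ids.items()}

    def fail(self):
        # A failed controller answers on no IDs.
        self.ports = {port: set() for port in self.ports}

    def take_over(self, partner):
        # Failover: assume the partner's IDs in addition to our own.
        for port, scsi_id in partner.home_ids.items():
            self.ports[port].add(scsi_id)
        partner.fail()

    def give_back(self, partner):
        # Failback: return the partner's IDs once it has restarted.
        for port, scsi_id in partner.home_ids.items():
            self.ports[port].discard(scsi_id)
            partner.ports[port] = {scsi_id}


def ids_on_bus(port, *controllers):
    """IDs a node would see on the bus attached to the given port number."""
    seen = set()
    for controller in controllers:
        seen |= controller.ports[port]
    return seen
```

In this model `ids_on_bus` returns the same set before failover, after failover, and after failback, which is the sense in which the process is transparent to the nodes.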
Failover and Failback With Fibre Controllers
Fibre arrays operate similarly to SCSI arrays in failover / failback but are configured differently. Unlike SCSI chips, fibre chips do not allow a port to have multiple addresses. Hence, fibre arrays are configured with one port active and the other reserved.
Figure 17. Normal Operating Mode of Dual DAC SF and FL Fibre Controllers [diagram labels: HBA; Controller 0 (Port 1, Reserved); HBA; Controller 1 (Port 2, Reserved); Heartbeats]
If a controller fails, the surviving controller senses the absence of heartbeats, fails over the ID of the active port on the failed controller to its reserved port, and updates its data structures with configuration information stored on disk. The failover process is transparent since the nodes still see the same fibre port ID’s on the SAN interconnect.
Figure 18. DAC SF and FL Fibre Array Controller Failover [diagram labels: Port 2, Port 1; Port 1, Reserved]
When the failed controller is replaced, the surviving controller detects the replacement, allows it to restart, and returns the failed controller's port ID's to it. The replacement controller then starts processing I/O.
Figure 19. DAC SF and FL Fibre Array Controller Failback [diagram labels: Controller 0; Port 1, Reserved]
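Because a fibre port carries only one address, the takeover uses the reserved port rather than adding a second ID to an active port. The following sketch shows that difference; the class, its methods, and the "Port 1" / "Port 2" identifiers (taken from the figure labels) are illustrative assumptions, not a description of the controller firmware.

```python
class FibreController:
    """Hypothetical model: one active host port and one reserved host port."""

    def __init__(self, name, active_id):
        self.name = name
        self.active_port = active_id  # the single ID held by the active port
        self.reserved_port = None     # idle during normal operation

    def take_over(self, partner):
        # Failover: move the partner's port ID onto our reserved port.
        if self.reserved_port is not None:
            raise RuntimeError("reserved port already in use")
        self.reserved_port = partner.active_port
        partner.active_port = None

    def give_back(self, partner):
        # Failback: return the ID and free the reserved port again.
        partner.active_port = self.reserved_port
        self.reserved_port = None

    def visible_ids(self):
        ports = (self.active_port, self.reserved_port)
        return {port_id for port_id in ports if port_id is not None}


c0 = FibreController("Controller 0", "Port 1")
c1 = FibreController("Controller 1", "Port 2")
c1.take_over(c0)   # Controller 1 now answers on both port IDs
c1.give_back(c0)   # each controller is back to its own port ID
```

Each port in the model holds at most one ID at any time, which is why the second host port has to be kept in reserve.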
Fibre Channel Loops and Fabrics
Because Mylex controllers have dual SAN ports for increased performance and resiliency in the face of interconnect failures, they are well suited for SAN applications. Ports can be attached directly to SAN servers or indirectly through hubs and switches.
Fibre Channel Arbitrated Loop Topology
Loops provide shared bandwidth to SAN devices; only one pair of devices can communicate at any point in time. Hence, as devices are added, aggregate loop bandwidth remains constant and bandwidth per device decreases. In Figure 20, one of the servers is shown communicating with FL Controller 3 (bold line).
Figure 20. Mylex External Array Controllers in a Loop SAN Topology [diagram labels: FL Controller 0 through FL Controller 3, each with Port 0 and Port 1; FC HBA (x3)]
Fibre Channel Switched Topology
Switched fibre provides dedicated bandwidth to SAN devices. Multiple pairs of devices can communicate simultaneously. As devices are added, aggregate bandwidth of the fabric increases and bandwidth per device remains constant. As illustrated in Figure 21, all four servers can simultaneously communicate with the four FL controllers.
Figure 21. Mylex External Array Controllers in a Switched SAN Topology [diagram labels: FL Controller 0 through FL Controller 3, each with Port 0 and Port 1]
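The scaling difference between the two topologies reduces to simple arithmetic. The sketch below is an idealized model: it assumes a non-blocking switch and ignores arbitration and protocol overhead, and the 100 MB/s link rate in the usage lines is an illustrative figure, not a number taken from this section.

```python
def loop_bandwidth(link_mb_s, devices):
    """Arbitrated loop: every device shares one link.

    Returns (aggregate, per_device) in MB/s. Aggregate stays fixed as
    devices are added, so the share per device shrinks.
    """
    aggregate = link_mb_s
    return aggregate, aggregate / devices


def switched_bandwidth(link_mb_s, devices):
    """Switched fabric: every device has a dedicated link (idealized).

    Returns (aggregate, per_device) in MB/s. Per-device bandwidth stays
    fixed as devices are added, so the aggregate grows.
    """
    return link_mb_s * devices, link_mb_s


for n in (2, 4, 8):
    print(n, loop_bandwidth(100, n), switched_bandwidth(100, n))
```

Doubling the device count halves the per-device share on the loop, while on the fabric it doubles the aggregate and leaves each device's bandwidth unchanged.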
Mirrored Write Caching
To accelerate application performance, Mylex external RAID array controllers use sophisticated caching algorithms for both read and write operations. The write cache can be optionally set to Write-Through or Write-Back mode on a LUN-by-LUN basis.
In Write-Through mode, the Command Complete message is not returned to the SAN node that issued the Write until the data has been written to disk. Because a copy of the data is retained in the cache, Write-Through caching also improves performance in applications that frequently read data that has recently been written.
In Write-Back mode, higher performance is achieved by returning the Command Complete message when data has been written to cache but prior to writing it to disk. This permits the node to clear its write command queue an order of magnitude faster than if it waited for the data to be written to disk; hence, more commands can be processed in a given amount of time. It also allows the controller to coalesce multiple write operations into larger contiguous blocks and write them to disk in bursts, avoiding missed disk revolutions and the long latencies they cause.
Figure 22. Conventional Controller Cache [diagram labels: Cache Structure; Controller 0]
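The difference between the two modes comes down to when the disk write happens relative to the Command Complete message. The following is a minimal sketch, assuming a dictionary stands in for the disk; the `CachedLun` class and its methods are hypothetical names, and the real caching algorithms are not described in enough detail here to reproduce.

```python
class CachedLun:
    """Hypothetical model of one LUN with a selectable write cache mode."""

    def __init__(self, write_back=False):
        self.write_back = write_back
        self.cache = {}     # block number -> most recent data
        self.dirty = set()  # blocks in cache that are not yet on disk
        self.disk = {}      # stand-in for the physical disks

    def write(self, block, data):
        self.cache[block] = data
        if self.write_back:
            # Write-Back: acknowledge now, write to disk later.
            self.dirty.add(block)
        else:
            # Write-Through: data reaches disk before the acknowledgement.
            self.disk[block] = data
        return "Command Complete"

    def read(self, block):
        # Recently written data is served from cache in either mode.
        if block in self.cache:
            return self.cache[block]
        return self.disk.get(block)

    def flush(self):
        # Write dirty blocks in ascending order so neighbours go out together.
        flushed = sorted(self.dirty)
        for block in flushed:
            self.disk[block] = self.cache[block]
        self.dirty.clear()
        return flushed
```

In the Write-Back case, repeated writes to the same block collapse into a single disk write at flush time, a simple instance of the coalescing described above.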
Without the protection provided by Mylex controllers, Write-Back caching may subject data to certain risks. After a node has been told that a Write command was completed, it is unable to re-issue the command if a failure prevents the cached data from actually being written to disk. To ensure data integrity in the event of power failures, Mylex controllers are protected by battery back-up and / or active UPS monitoring. If AC power is lost, the write caches are held up by the battery backup unit; when AC power is restored, they are flushed to disk before new commands are accepted from a node. If AC power is lost to the UPS, Write-Back caches are automatically flushed to disk and switched to Write-Through mode. When AC power is restored to the UPS and it has recharged, Write-Back caching is turned back on.
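The UPS-monitoring behavior amounts to a small state machine. The sketch below models only the UPS path (not the battery backup unit), and the class, its event-handler names, and the list-based cache are assumptions made for illustration.

```python
class UpsAwareCache:
    """Hypothetical model of a write cache that reacts to UPS events."""

    def __init__(self):
        self.mode = "write-back"
        self.dirty = []  # acknowledged writes not yet on disk
        self.disk = []   # stand-in for the physical disks

    def write(self, data):
        if self.mode == "write-back":
            self.dirty.append(data)
        else:
            self.disk.append(data)

    def flush(self):
        self.disk.extend(self.dirty)
        self.dirty.clear()

    def on_ups_ac_lost(self):
        # UPS is running on battery: flush, then stop accumulating dirty data.
        self.flush()
        self.mode = "write-through"

    def on_ups_recharged(self):
        # AC is back and the UPS has recharged: Write-Back is safe again.
        self.mode = "write-back"
```

While the UPS is on battery, no acknowledged write in this model exists only in cache, so a later loss of UPS power cannot strand data.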
To protect the write cache data in the event of controller failure, Mylex array controllers implement a cache mirroring architecture. In duplex configurations, data written to the Write-Back cache in one controller is immediately copied to the write buffer in the second controller (Figure 23).
Figure 23. Mirrored Cache Writes [diagram labels: Controller 0 (Read Ahead, Write Back, Write Buffer); Disk Bus; Controller 1 (Read Ahead, Write Back, Write Buffer)]
If the controller servicing the I/O Write commands fails, the surviving controller has a copy of the writes and will immediately complete the Write operations for its failed partner (Figure 24). The cache mirroring process is accomplished by a Write operation from one controller to the other using one of the disk channels. To minimize bus contention with disk I/O operations, the least busy channel is selected.
Figure 24. Surviving Controller Flushes Failed Controller's Writes to Disk [diagram labels: Controller 0 (Read Ahead, Write Back, Write Buffer); Disk Bus; Controller 1 (Read Ahead, Write Back, Write Buffer)]
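Cache mirroring and the subsequent flush by the surviving controller can be sketched as follows. This is an illustrative model under stated assumptions: the class and method names are hypothetical, the channel load figures are arbitrary numbers supplied by the caller, and the copy over the disk channel is represented by a plain dictionary assignment.

```python
class MirroredController:
    """Hypothetical model of one controller in a duplex, cache-mirrored pair."""

    def __init__(self, name):
        self.name = name
        self.write_back = {}    # this controller's own unflushed writes
        self.write_buffer = {}  # mirror of the partner's unflushed writes
        self.partner = None

    def write(self, block, data, disk_channel_load):
        # Pick the least busy disk channel to carry the mirror copy.
        channel = min(disk_channel_load, key=disk_channel_load.get)
        self.write_back[block] = data
        self.partner.write_buffer[block] = data  # copy sent over `channel`
        return channel

    def flush_for_failed_partner(self, disk):
        # Complete the partner's acknowledged writes from the mirror copy.
        disk.update(self.write_buffer)
        self.write_buffer.clear()


def make_pair():
    c0, c1 = MirroredController("Controller 0"), MirroredController("Controller 1")
    c0.partner, c1.partner = c1, c0
    return c0, c1


c0, c1 = make_pair()
disk = {}
c0.write(3, "payload", {0: 12, 1: 4, 2: 9})  # mirror travels on channel 1
c1.flush_for_failed_partner(disk)            # Controller 0 has failed
```

After the flush, the disk in this model holds every write that Controller 0 acknowledged, even though Controller 0 never wrote it.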