Storage Virtualization in Cloud
Cloud Strategy Partners, LLC
Sponsored by: IEEE Educational Activities and IEEE Cloud Computing
Course Presenter’s Biography
This IEEE Cloud Computing tutorial has been developed by Cloud Strategy Partners, LLC.
Cloud Strategy Partners, LLC is an expert consultancy firm that specializes in Technology and Strategy relating to Cloud Computing.
Course Summary
In this tutorial, we will review the various ways that Storage is virtualized and implemented for large scale, distributed systems, including Cloud. We will also discuss the storage primitive which was virtualized, and see that some systems concentrate on virtualizing files, and some systems concentrate on virtualizing blocks. Next we will review how the virtualization function can run in a variety of places in the architecture. It can run in the host, in the network, or all the way back where the drives are. We also will discuss how virtualization can be placed “in band” of the storage operations, and for scale, is usually placed “out of band”. Finally, we will look at several of the New file systems optimized for managing heterogeneous cloud storage farms.
Transcript
Outline
As you can see from the slide, we will be covering several areas relating to Storage virtualization in Cloud IaaS; Storage virtualization techniques Storage virtualization layers;
Approaches to storage virtualization; Storage types in cloud; Virtualized file systems for cloud.
Storage Virtualization
This slide illustrates Virtualization techniques in Cloud IaaS. At the top the General Virtualization Technique is illustrated. Underneath, one can visualize how the general technique is applied to several virtualization problems. For Server virtualization, hypervisors are extensively used. For storage virtualization, many software techniques including volume management, file systems, and replication are applied.
For network virtualization, there are many different features including link aggregation, VPN, and also firewall, switching, routing, and application filtering and load balancing have virtual capability in many cloud implementations today.
Outline
As we have seen from previous lessons, virtualization of any kind of resource follows a common blueprint. The physical hardware is put under a special kind of software control which abstracts the physical layer, and presents to the user what “looks” like the actual resource but is actually a “virtual” instance of that resource.
Storage follows this blueprint. The hardware and the hardware interfaces are virtualized by a software layer as shown in the illustration, thus presenting virtual storage primitives (disks, file systems, etc.) to the consumer.
Common Storage Architectures
In clouds, there are many options for implementing physical storage, because, the virtual storage interfaces can be kept to a small set, with the software providing common interfaces.
The illustration shows three kinds of physical storage: DAS -Direct Attached Storage NAS - Network Attached Storage SAN -Storage Area Network.
Transcript
Once can see by the diagram, that the underlying architectures of these different schemes are quite different. The applications interface (legacy, non-virtual) all look the same – basically an application accesses a file system. Under the hood, the connectivity of the components –and where the file system code actually runs –can be at any number of locations (as shown) The software layer which is implementing the virtualized storage, can also enhance the storage model offered to beyond that which physical storage can
accomplish.
Manageability Virtualized storage resource are easier to configure and manage Scalability virtualization simplifies storage resources scalability. Availability virtualization simplifies protecting against storage hardware failures and overloading Storage redundancy, backup and load balancing are part of the distributed cloud storage.
Security Virtualized storage instances provide additional security by storage segments isolation The illustration on this slide goes into more depth as to how the virtualization layers for storage work, and what kind of storage model they present.
Properties of Storage Virtualization
The software layer for storage is in kernel space in the operating system, where it can intercept the disk and file system primitives and insert the capability of utilizing external, networked storage, and storage built from replicating, distributed drives.
The most common storage models are “file system” and “block device”. While the names imply the capability of the models, the Cloud OS may not provide the exact same capabilities as a standard let’s say Linux file system or block device. We will discuss more on this later.
File System Level Virtualization
What is file system
• A file system is a software layer responsible for organizing and policing the creation, modification, and deletion of files
• File systems provide a hierarchical organization of files into directories and subdirectories
• The B-tree algorithm facilitates more rapid search and retrieval of files by name
• File system integrity is maintained through duplication of master tables, change logs, and immediate writes off file changes
Transcript
Different file systems
• In Unix, the super block contains information on the current state of the file system and its resources.
• In Windows NTFS, the master file table contains information on all file entries and status.
File Metadata
• The control information for file management is known as metadata.
• File metadata includes file attributes and pointers to the location of file data content.
• File metadata may be segregated from a file's data content.
• Metadata on file ownership and permissions is used in file access.
• File timestamp metadata facilitates automated processes such as backup and life cycle management.
Different file systems
• In Unix systems, file metadata is contained in the i-node structure.
• In Windows systems, file metadata is contained in records of file attributes.
Block Device Level Virtualization
Block Device Level virtualization is a low level technique which creates a “volume pool” from a collection of drives. It presents virtualized storage primitives called LUN for Logical Unit Identifier, and an offset within that LUN, which known as a Logical Block Address (LBA) This is illustrated in the slide
Block Device Level: Logical Unit and Logical Volume
Block level data The file system block
• The atomic unit of file system management is the file system block.
• A file's data may span multiple file system blocks.
Transcript
• A file system block is composed of a consecutive range of disk block addresses. Data in disk
• Disk drives read and write data to media through cylinder, head, and sector geometry.
• Microcode on a disk translates between disk block numbers and cylinder/head/sector locations.
• This translation is an elementary form of virtualization.
Block device level interface: SCSI (Small Computer System Interface)
• The exchange of data blocks between the host system and storage is governed by the SCSI protocol.
Storage Interconnection
Drives are not always local to the server, and therefore aStorage Interconnection is utilized.
This illustration shows that the path to storage includes multiple layers of physical and logical data transformation
The storage interconnection provides the data path between servers and storage The storage interconnection is composed of both hardware and software components
Approaches to Storage Virtualization
Abstracting physical storage
Physical to virtual The cylinder, head and sector geometry of individual disks is virtualized into logical block addresses (LBAs).For storage networks, the physical storage system is identified by a network address / LUN pair.Combining RAID and JBOD assets to create a virtualized mirror must accommodate performance differences.
Metadata integrity Storage metadata integrity requires redundancy for failover or load balancing. Virtualization intelligence may need to interface with upper layer applications to ensure data consistency.
Host-based Virtualization
Important issues
Transcript
Storage metadata servers Storage metadata may be shared by multiple servers. Shared metadata enables a SAN file system view for multiple servers. Provides virtual to real logical block address mapping for client. A distributed SAN file system requires file locking
mechanisms to preserve data integrity.
Host-based storage APIs May be implemented by the operating system to provide a common interface to disparate virtualized resources. Microsoft's virtual disk service (VDS) provides a management interface for dynamic generation of virtualized storage.
Host-based Virtualization: Example
An additional layer of abstraction and control can be run on each host, and is called Logical Volume Manager (LVM). This code runs on the host and front-ends all kinds of back end storage resources. The use cases and the architecture are shown on the slide.
Host-based Virtualization: Pros and Cons
Host based storage virtualization has gotten very popular in server machines because no additional hardware or infrastructure requirements Simple to design and implement Improve storage utilization However Storage utilization optimized only on a per host base Software implementation is depending on each operating system Consume CPU cycles for
virtualization As we all know, NFS is very popular, and it is a form of Host based storage virtualization
Network-based Virtualization
Animation illustrates… Fabric switch should provide Connectivity for all storage transactions Interoperability between disparate servers, operating systems, and target devices
FAIS ( Fabric Application Interface Standard ) Define a set of standard APIs to integrate applications and switches. FAIS separates control information and data paths. The control path processor (CPP) supports the FAIS APIs and upper layer storage virtualization application. The data path controller (DPC) executes the virtualized SCSI I/Os under the management of one or more CPPs
Network-based Virtualization: Pros and Cons
Network-based Virtualization come with plus and minus factors as well True heterogeneous storage virtualization No need for modification of host or storage system Multi-path technique improve the access performance However Complex interoperability matrices -limited by vendors support Difficult to implement fast metadata updates in switch device Usually require
Transcript
to build specific network equipment (e.g., Fibre Channel) IBM SVC ( SAN Volume Controller )is an example
Storage-based Virtualization
This animation illustrates how the different layers in a storage system perform their function.
In the first part of the animation one can see a mode where the underlying virtualized storage provides a virtual filesystem interface; the connected Operating Systems send the files there to get saved. The virtualized storage takes care of replicating the file across actual drives in the cloud for high durability.
In the second part of the animation, on can see the mode where the underlying virtualized storage presents and block level interface. Here, the application is running on an OS, which presents a local file system interface. The operating system deconstructs, through the file system code, the save into a series of blocks which need to be written. The blocks go to the virtualization layer in this case, which stores and replicates at the block level.
Storage-based Virtualization: Pros and Cons
Storage virtualization is extremely useful
On the one hand Provide most of the benefits of storage virtualization Reduce additional latency to individual IO
However Storage utilization optimized only across the connected controllers Replication and data migration only possible across the connected controllers and the same vendors devices
In-band Virtualization
Storage virtualization can be implemented in a number of ways.
The animation in this slide shows what is called In-Band virtualization, Also known as symmetric, virtualization devices actually sit in the data path between the host and storage.
Hosts perform IO to the virtualized device and never interact with the actual storage device.
While Easy to implement it has Bad scalability & Bottle neck characteristics
Transcript
Out-of-band Virtualization
The animation in this illustration shows out of band virtualization Also known as asymmetric, virtualization devices are sometimes called metadata servers. It Requires additional software in the host which knows the first request location of the actual data. While a good architecture for Scalability & Performance It is Hard to implement
Storage Types in Cloud (1)
Block storage
Block storage is a type of data storage where data is stored in blocks, also referred to as volumes. Each block is treated as individual disk drive and can contain multiple files. In this way, block storage provides a good abstraction for physical storage devices and well suited for most of file systems. In cloud, VM instance is often provisioned with the attached block storage of the configured or requested size.
Object storage
Storage architecture that manages data as objects. Each object contains data, metadata and accessed via a globally unique identifier, typically in a form of URI or URL. Object storage systems use namespace that is consistent across multiple physical devices. Object storage systems usually include such additional services as data replication and distribution, and may also support application specific access protocols and data management. As an example, object storage infrastructure is used by Dropbox for storing files and Facebook for storing photos.
Storage Types in Cloud (2)
Bucket storage
Bucket storage is a storage organisation where data objects are stored in the basic
containers and using single global namespace, where data can be accessed with their own methods. Bucket storage type is used by Amazon S3 and Google.
Blob storage
Blob storage represents a generic key-value data store, often designed for storing large data objects. A blob (short for binary large object) is a collection of binary data stored as a single entity in a database management system. Blob storage is used in Microsoft Azure cloud.
Virtualized File Systems for Cloud
We will look closer at some popular file systems for cloud.
Transcript
Each of them has a number of benefits when implemented in specific cloud environment.
Most applications need filesystems because servers have filesystems. Most filesystems need block storage upon which to mount. Therefore it is no surprise that filesystems with underlying block storage are popular cloud storage options.
A simple approach is to use server or NAS based technologies extended directly to the cloud for the block filesystems. That is tie a number of drives together, and maybe access across the network. As th slide lists, LVM, RAID, and NFS are all examples of virtualized filesystems on block storage commonly found in smaller clouds.
Larger clouds utilize distributed file systems, which have a high degree of redundancy across the cloud they are serving. As the slide shows, they can be file or object based and there are many examples of both popular in cloud implementations.
HDFS (Hadoop Distributed File System) is specifically designed for large scale data processing on massively parallel clusters. Can be used in cloud for high performance data input/output, in particular for CDN (Content Distribution Network)
Logical Volume Management (LVM)
Logical Volume Manager is very popular because it is commonly found in Linux. It implements block level host-based virtualization approach
Allows disks to be added or replaced without downtime and service disruption. Supports file systems extension and dynamic re-sizing, data backup, creation and dynamic resizing of logical volumes
Suitable for managing large disk farms
Logical Volume Management Architecture
This slide illustrates the Logical Volume Management Architecture Tools and utilities are in user space Device mapper framework implements a Linux kernel driver for different mappings
Logical Volume Management Implementation
Logical Volume Management Implementation LVM project is implemented in two components:
In user space Based on FUSE (Filesystem in Userspace) In kernel space which Implements device mapper framework
Transcript
LVM implementation using FUSE
This slide diagrams the Logical Volume Manager using Filesystem in User Space
As can be seen by the diagram, the key is a that the Loadable kernel module FUSE provides a bridge to actual kernel interfaces
LVM Implementation in Kernel Space
For performance reasons, there is an implementation available of LVM in kernel space This slide speaks to the system calls when implemented this way
Redundant Array of Independent Disks (RAID)
Another common scheme for virtualizing storage is RAID (Redundant Array of Independent Disks)
RAID is a software layer which groups together disks and implements various levels of replication and distribution of data across the drives.
RAID schemes provide different balance between the key goals: Reliability, Availability, Performance, Capacity
The slide speaks to the Difference in common RAID schemes RAID0, RAID1, RAID1+0, RAID5, and RAID5+0
Network File System (NFS)
Filesystems have characteristic which application developers have come to depend on. For example, when the “write” call returns from the kernel to the user application, the user application assumes that the data is actually written, or at least that a subsequent read of the same data will return what was just written. These behavioral assumptions are part of the POSIX specification.
When clusters of disks are used, and when filesystems ae exported across the network, it becomes challenging to live up to all of the POSIX filesystem requirements. Network
filesystems which had correct behavior became very popular. The SUN Network File System (NFS) was one of the first and most reliable POSIX-compliant distributed file systems.
As the slide lists, NFS is specified by a number of RFCs and the protocols use to implement NFS are well known and understood. Ove the years NFS has become a “go to” network filesystem.
Transcript
Notably, NFS has been extended to support clustered server deployments including scalable parallel access to files distributed among multiple servers (pNFS extension). This technology is a key part of for example the IBM private cloud implementation.
Lustre
Lustre is a type of parallel distributed file system used for large amount of data and can work with large computer clusters.
Lustre name is derived from two words "Linux" and "cluster".
Lustre is often used as a file system for supercomputers and multi-site computer clusters.
Lustre storage cluster may contain thousands of nodes and Petabytes of storage volume.
Lustre architecture includes three main components: metadata servers that stores filesystem information (files and directories) as well as access rights, object storage servers, and clients that access and use data. Lustre uses unified namespace compatible with POSIX semantics.
Lustre File System Architecture Components
The main file system components inside of Lustre are described in this slide.
Note one of the most significant elements of the design is the notion of many Object Storage Servers. This lends to the scalability of the design.
Lustre Cluster at Scale
This slide illustrates the scalability design introduced in the previous slide.
In general, highly available and high scalability concepts are both used in large deployments.
Here a Lustre deployment at scale is illustrated showing multiple networks between clients and the Lustre cluster, and also the number of I/O Servers (paired as fail over groups).
Lustre File System in HPC: Examples
The high performance computing community has been working on pushing the limits of performance and scalability and Lustre has achieved popularity within that community. Lustre has significant momentum in the HPC community and is actually the leading distributed filesystem for those systems, as the slide details.
You can see impressive scale-out and high performance numbers achieved.
Transcript
Ceph
Ceph is an example of fully distributed storage architecture (not having central management) and the file system designed to integrate object, block and file storage servers from a single distributed computer cluster.
Ceph distributed object storage is built around the Reliable Autonomic Distributed Object Store (RADOS) that support data replication. Ceph block storage can be directly mounted to a VM and provides automatic data replication across the storage cluster.
Ceph file system runs on top of the object or block storage and maps file names and directories across RADOS cluster.
Ceph Architecture and Design
Ceph has three components
Clients: Near-POSIX file system interface Cluster of OSDs: Store all data and metadata
Metadata server (MDS) cluster : Manage namespace (file names)
It is designed for high availability and scalability using key design patterns:
Separating data and metadata
Dynamic distributed metadata management Reliable Autonomic Distributed Object Storage
Ceph Architecture
The illustration in this slide shows the Ceph architecture. Ceph separates data and metadata operations Data/file request includes request to MDS to obtain file components/inodes location and metadata
Ceph Operation on Request
Ceph uses an effective client synchronization model.
The client makes a request to the Metadate Server which translates the file name into inode (inode number, file owner, mode, size, …)
Then the CRUSH (Controlled Replication Under Scalable Hashing) module goes to work.
CRUSH is A scalable pseudo-random data distribution function designed for distributed object-based storage systems Maps objects to Placement groups (PGs) using a simple hash function
It returns inode number, map file data into objects.
The client then accesses the Object Storage Device, as can be seen by the illustration.
Transcript
Gluster
Gluster storage and files system is an Open Source platform for scale-out public and private cloud storage. Similar to Lustre, the Gluster name is derived from two words “GNU” and
“cluster”. Gluster aggregates heterogeneous storage server connected over Ethernet or Infiniband network.
The Gluster file system provides simple functionality and leave all file management functionality to clients.
Gluster Architecture
This slide illustrates the Gluster architecture.
Gluster accesses a variety of physical storage devices, from Direct Attached, to JBOD, to SAN, and creates a global namespace across them. It also puts a virtualization layer across them, providing a variety of filesystem models, such as NFS or CIFS or WebDAV up to the clients.
Gluster has been widely used in many cloud distributions. The slide lists many of them.
Gluster is the standard storage system used in Red Hat’s OpenStack distribution One can also use Gluster easily in Amazon with the available AMI
Hadoop Distributed File System (HDFS)
The Hadoop Distributed File System (HDFS) is a very different sort of “filesystem” optimized for a specific class of applications, those are Map Reduce and similar “Big Data” systems like
“no-SQL” databases.
It is a scalable distributed file system for large scale data analysis
A part of the Open Source Apache Hadoop suite The primary storage used by Hadoop MapReduce applications
Can run on commodity hardware assuring high fault-tolerant
HDFS Architecture
HDFS cluster consists of a single master node/server that runs NameNode and multiple DataNodes, usually one per physical node in the Hadoop cluster. User data are stored in the files, externally they are exposed through namespace managed by the NameNode. To access a file, a user client needs to request a file location or metadata from the NameNode, and after that it can send read or write request to the DataNode directly. DataNodes create data blocks and do replication based on instructions from NameNode.
Transcript
Summary and Take Away
This tutorial has explored the various ways that Storage is virtualized and implemented for large scale, distributed systems, including Cloud.
We explored the storage primitive which was virtualized, and saw that some systems concentrate on virtualizing files, and some systems concentrate on virtualizing blocks.
We saw that the virtualization function can run in a variety of places in the architecture. It can run in the host, in the network, or all the way back where the drives are.
We saw that virtualization can be placed “in band” of the storage operations, and for scale, is usually placed “out of band”.
There are many types of storage primitives which, after all the virtualization has occurred, end up getting exposed to applications. Objects (buckets, blobs), blocks, or file systems There are many ways to layer the filesystem in leveraging the virtualization and replication in a cluster. The capability can be close to the operating system such as LVM or across the network like NFS.
Finally, we took a hard look at several of the New file systems optimized for managing heterogeneous cloud storage farms