
Storing Big Data—The Rise of the Storage Cloud

Young-Sae Song, December 2012 www.seamicro.com

Whereas even a few years ago a terabyte was seen as a large amount of data, today individual applications can generate petabytes of data per day. Some examples include¹:

• Over 100,000 hours of video are uploaded to YouTube every day, amounting to 360 terabytes per day

• 500 terabytes of new data per day are ingested into Facebook databases

• The CERN Large Hadron Collider (LHC) generates more than 86 petabytes of data per day

• The proposed Square Kilometre Array telescope is expected to generate an exabyte (1,000,000 terabytes) of data per day

• Sensors from a Boeing jet engine create 20 terabytes of data every hour
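The volumes above are quoted per day (or, for the jet engine, per hour). A minimal sketch that converts each figure into the sustained rate it implies, assuming decimal units (1 TB = 10^12 bytes) and a perfectly even flow over each period; the dictionary and function names are illustrative, not from the paper:

```python
# Convert the quoted data volumes into sustained byte rates.
# Assumptions: decimal units and a perfectly even flow over each period.
TB = 10**12
PB = 10**15
EB = 10**18
HOUR = 3_600   # seconds
DAY = 86_400   # seconds

# (volume in bytes, period in seconds), as quoted in the list above
QUOTED_VOLUMES = {
    "YouTube uploads": (360 * TB, DAY),
    "Facebook ingest": (500 * TB, DAY),
    "CERN LHC": (86 * PB, DAY),
    "Square Kilometre Array (proposed)": (1 * EB, DAY),
    "Boeing jet engine sensors": (20 * TB, HOUR),
}


def sustained_rate(volume_bytes: int, period_s: int) -> float:
    """Average bytes per second needed to move volume_bytes in period_s seconds."""
    return volume_bytes / period_s


if __name__ == "__main__":
    for source, (volume, period) in QUOTED_VOLUMES.items():
        print(f"{source}: {sustained_rate(volume, period) / TB:.3f} TB/s")
```

Under these assumptions the LHC figure works out to roughly 1 TB/s and the proposed Square Kilometre Array to roughly 11.6 TB/s.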

These are just a few examples; there are thousands more. To make matters worse, the rate of data growth is itself increasing and will keep accelerating as hundreds of millions of new Internet users in developing countries join the networked world. China is forecast to add more than 28 million Internet users by the end of 2012², India is adding more than 18 million Internet users per year, and Russia continues to grow its Internet user base by more than 20 percent per year.

With the exponential growth of data as a backdrop, this paper addresses a simple question: how can we cost-effectively store all of this new data so that it can be quickly and easily accessed for analysis? This question underpins an industry-wide imperative to provide storage solutions that hold unprecedented amounts of data at little cost while keeping that data quickly and easily accessible.

Challenges of Big Data

In the face of this unprecedented growth of data, existing solutions have been stretched to the breaking point. Neither the performance nor the economics of traditional storage were designed for data at this scale.

The effort to solve this challenge, and to develop storage and analysis tools that cost-effectively derive insights from enormous quantities of data, is collectively known as the Big Data Movement. This movement seeks to bring together massive amounts of cost-effective, high-capacity and high-performance storage, compute, software and bandwidth.

The movement has all the moving parts of a new technology frontier. Hardware companies are trying to build solutions that are cost-effective, offer more capacity and lower operational expense. Software companies are developing solutions that make it easier to manage and analyze the data without relying on the expensive hardware and specialized devices that permeate the enterprise storage environment. The Storage Area Networking (SAN) and Network Attached Storage (NAS) solutions adopted in the enterprise do not make sense for big data environments because of their high cost and complexity. The Apache™ Hadoop® project is the best-known industry initiative to develop open-source software for reliable, scalable, distributed storage and computing. Its fundamental assumption is that key features such as high availability, failure detection, and scaling capacity up or down are provided by the software layer. As a result, expensive enterprise- or carrier-grade server and storage infrastructure is not necessary to deploy reliable and scalable big data infrastructure.
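The division of labor described above can be illustrated with a toy model: commodity nodes send heartbeats, a node that goes silent is presumed failed, and software re-creates its block replicas elsewhere. This is a hypothetical sketch (the `ToyCluster` class, its methods and the 30-second timeout are invented for illustration), not Hadoop's API; only the replication factor of 3 mirrors the HDFS default.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

REPLICATION_FACTOR = 3     # copies kept of every block (HDFS also defaults to 3)
HEARTBEAT_TIMEOUT = 30.0   # hypothetical: seconds of silence before a node counts as failed


@dataclass
class ToyCluster:
    """Hypothetical model: reliability comes from software, not from the hardware."""
    last_seen: Dict[str, float] = field(default_factory=dict)    # node -> time of last heartbeat
    replicas: Dict[str, Set[str]] = field(default_factory=dict)  # block -> nodes holding a copy

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def live_nodes(self, now: float) -> Set[str]:
        # Failure detection: any node that has not reported recently is presumed dead.
        return {n for n, t in self.last_seen.items() if now - t <= HEARTBEAT_TIMEOUT}

    def _load(self, node: str) -> int:
        # Number of blocks a node currently stores, used to spread new replicas.
        return sum(node in holders for holders in self.replicas.values())

    def repair(self, now: float) -> None:
        """Forget replicas on dead nodes, then re-replicate onto the least-loaded live nodes."""
        live = self.live_nodes(now)
        for holders in self.replicas.values():
            holders &= live  # in-place update: drop copies held by failed nodes
            candidates = sorted(live - holders, key=lambda n: (self._load(n), n))
            while len(holders) < REPLICATION_FACTOR and candidates:
                holders.add(candidates.pop(0))

    def write(self, block: str, now: float) -> None:
        self.replicas[block] = set()
        self.repair(now)


if __name__ == "__main__":
    cluster = ToyCluster()
    for node in ["n1", "n2", "n3", "n4"]:
        cluster.heartbeat(node, now=0.0)
    cluster.write("blk-1", now=0.0)    # placed on three of the four nodes
    for node in ["n2", "n3", "n4"]:    # n1 stops sending heartbeats
        cluster.heartbeat(node, now=60.0)
    cluster.repair(now=60.0)           # the lost copy is re-created on n4
    print(sorted(cluster.replicas["blk-1"]))
```

The point of the design is that no single node needs to be reliable: losing `n1` costs one replica, which the software layer restores from the surviving copies.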

Failure of Network Attached Storage and Storage Area Networking

Traditional storage solutions emerged to solve the problem of how to share storage. This problem can be reformulated as how to cost-effectively amortize storage capacity and performance over CPUs. With storage directly attached to servers, the hard linking of disks to CPUs is exceptionally cost-effective, but it precludes sharing and is generally available only at fixed storage-to-compute ratios. As storage is added, compute and networking must be added as well. NAS, which used Ethernet to connect arrays of storage to compute, and SAN, which typically used Fibre Channel, were two solutions initially developed to address this problem, but they have largely failed to live up to their promise.

First and foremost, both solutions are punishingly expensive. The same disk, used as direct attached storage, costs a fraction of what the same capacity costs in a NAS or SAN solution.

1 Stacey Higginbotham, "As Data Gets Bigger, What Comes After Yottabyte?", Gigaom, October 30, 2012; and AMD analysis.

2 Jon Russell, "China’s Internet Population Reaches 537 Million, as Smartphones Drive 11% Annual Growth", The Next Web, July 19, 2012.

© 2012 Advanced Micro Devices, Inc. AMD, the AMD arrow logo, Freedom, TIO, and combinations thereof are trademarks of Advanced Micro Devices, Inc.

Delivering on the Promise of Cloud Storage

The cloud infrastructure of tomorrow needs to let data center operators deploy flexible ratios of compute, storage and networking resources. Flexibility at initial deployment is vital, but it is also mandatory that these ratios can be increased quickly and dynamically in any dimension with minimal impact on existing operations and TCO. The requirements of cloud and big data applications create a tremendous challenge for the industry that cannot be solved with incremental innovations. AMD Data Center Server Solutions has developed its most innovative solution, the SeaMicro SM15000™ Fabric Compute System, to address these challenges and provide benefits that existing technology cannot deliver cost-effectively.
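To make the fixed-ratio problem concrete, here is a minimal sketch comparing how many servers a workload needs when every server carries its own disks versus when capacity is added independently on a shared fabric. The twelve disks per server echo the standard servers in the five-petabyte comparison in this paper; the 4 TB disk size, the `Requirement` type and both functions are hypothetical, and the model ignores replication and filesystem overhead.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    capacity_tb: float   # raw storage the workload needs
    compute_nodes: int   # servers the workload needs for processing


def das_servers(req: Requirement, disks_per_server: int = 12, disk_tb: float = 4.0) -> int:
    """Direct-attached storage: each server brings a fixed number of disks, so the
    server count is set by whichever requirement (capacity or compute) is larger."""
    servers_for_capacity = math.ceil(req.capacity_tb / (disks_per_server * disk_tb))
    return max(req.compute_nodes, servers_for_capacity)


def fabric_servers(req: Requirement) -> int:
    """Disaggregated storage: capacity is added as enclosures on the fabric,
    so the server count tracks compute demand alone."""
    return req.compute_nodes


if __name__ == "__main__":
    for capacity in (2_000, 5_000, 10_000):  # terabytes
        req = Requirement(capacity_tb=capacity, compute_nodes=64)
        print(capacity, "TB:", das_servers(req), "DAS servers vs", fabric_servers(req), "fabric servers")
```

Under these assumptions, growing capacity from 5,000 TB to 10,000 TB takes the DAS server count from 105 to 209, while the fabric-attached count stays at the 64 servers the compute workload actually needs.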

AMD’s SeaMicro SM15000 is a revolutionary server that brings together compute, storage, and networking in a single 10 RU system. The SM15000 delivers 64 sockets of AMD’s octal-core Opteron™ or Intel® quad-core Xeon® (“Ivy Bridge” or “Sandy Bridge”) processors, or 256 sockets of Intel® dual-core Atom processors, all interconnected via the 1.28 Tbps SeaMicro Freedom™ Fabric. The SM15000 delivers best-in-class energy efficiency, density and bandwidth while dramatically reducing CAPEX and the OPEX associated with server deployment and management.
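For a rough sense of scale, if the 1.28 Tbps of fabric bandwidth were divided evenly among sockets (a simplifying assumption of ours, not a figure from the paper; real traffic is rarely uniform), each socket's share would be:

```latex
\frac{1.28\ \text{Tbps}}{64\ \text{sockets}} = 20\ \text{Gbps per socket},
\qquad
\frac{1.28\ \text{Tbps}}{256\ \text{sockets}} = 5\ \text{Gbps per socket}
```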

The SM15000 incorporates the Freedom Fabric, which removes the constraints of traditional servers while delivering up to 75 percent in power savings. The Freedom Fabric lets customers tune both performance and the ratio of compute to storage. The SM15000 can extend fabric-based computing across the racks and aisles of a data center and can connect massive disk arrays, supporting over five petabytes of storage capacity, to a single 10 RU system. The result is SAN- or NAS-like storage capacity at direct attached storage (DAS) economics and simplicity. Furthermore, the SM15000 integrates switching functionality, reducing the need for top of rack switches, terminal servers and hundreds of networking devices and cables, which simplifies the installation, management and maintenance of a warehouse-sized computer or storage system.

The SeaMicro Freedom Fabric is the key technology: it interconnects hundreds of compute server nodes with significant reductions in power, cost, and latency. It borrows techniques used to interconnect the CPUs of the largest and most complicated supercomputers and scales them down for data center applications. The result is a fabric designed as a three-dimensional torus, with both path redundancy and path diversity. The fabric is FLIT-based and wormhole-routed, with integrated virtual-channel technology to manage congestion. Together, these technologies produce a low-latency, high-bandwidth, redundant fabric with unmatched performance.
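In a torus every dimension is a ring, so traffic can travel either way around it, and the wraparound links shorten worst-case paths. A minimal sketch of dimension-order routing on a 3D torus, a standard textbook scheme for such networks, is shown here. The paper does not say which routing algorithm the Freedom Fabric uses, the 8×8×8 dimensions are hypothetical, and the sketch models only the hop-by-hop path, not FLIT-level wormhole flow control or virtual channels.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def ring_step(src: int, dst: int, size: int) -> int:
    """Direction (+1, -1 or 0) of the shorter way around one ring of the torus."""
    if src == dst:
        return 0
    forward = (dst - src) % size  # hops if we travel in the + direction, with wraparound
    return 1 if forward <= size - forward else -1


def dimension_order_route(src: Coord, dst: Coord, dims: Coord) -> List[Coord]:
    """Route fully along X, then Y, then Z, taking the shorter direction on each ring."""
    cur = list(src)
    path: List[Coord] = [src]
    for axis in range(3):
        while cur[axis] != dst[axis]:
            step = ring_step(cur[axis], dst[axis], dims[axis])
            cur[axis] = (cur[axis] + step) % dims[axis]
            path.append((cur[0], cur[1], cur[2]))
    return path


if __name__ == "__main__":
    # Node (0,0,0) to (7,1,0) on a hypothetical 8x8x8 torus: the wraparound link
    # makes the X leg a single hop (0 -> 7) instead of seven.
    print(dimension_order_route((0, 0, 0), (7, 1, 0), (8, 8, 8)))
```

Other minimal orderings (Y before X, and so on) lead to the same destination over different links, which is one source of the path diversity a torus offers.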

This leading density and power efficiency also brings significant operational savings. The comparison below sets a five-petabyte server and storage cluster built from traditional two-socket servers against one built from AMD SeaMicro servers and Fabric Storage products.

NAS/SAN Promise vs. Reality

Promise: Reduced hardware acquisition costs with increased utilization
Reality: Low utilization (20–40%) results in high infrastructure costs

Promise: Less complexity with consistent management
Reality: Vendor heterogeneity increases the complexity of the overall storage infrastructure

Reality: Limited workload sharing, less powerful processors and application islands create performance and scalability issues


In summary, the AMD SeaMicro solution provides the following benefits:

• 50 percent less acquisition cost

• 70 percent less software licensing cost

• 67 percent less rack space

• 50 percent less power
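Most of these percentages can be checked against the five-petabyte cluster comparison in this paper (2 versus 6 racks, 64 versus 224 software licenses, 20,000 versus 40,000 watts at an assumed 220 volts). A minimal sketch of that arithmetic follows; the acquisition-cost figure cannot be derived from the published numbers, and the helper names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterFootprint:
    racks: int
    software_licenses: int
    power_watts: int


# Figures from the five-petabyte server and storage cluster comparison.
SEAMICRO = ClusterFootprint(racks=2, software_licenses=64, power_watts=20_000)
STANDARD = ClusterFootprint(racks=6, software_licenses=224, power_watts=40_000)
VOLTS = 220  # the comparison assumes 220 volts


def percent_less(new: float, old: float) -> float:
    """Reduction of new relative to old, in percent."""
    return 100 * (old - new) / old


def amps(watts: float, volts: float = VOLTS) -> float:
    """Current drawn, I = P / V (ignores power factor)."""
    return watts / volts


if __name__ == "__main__":
    print(f"rack space: {percent_less(SEAMICRO.racks, STANDARD.racks):.0f}% less")
    print(f"licenses:   {percent_less(SEAMICRO.software_licenses, STANDARD.software_licenses):.0f}% less")
    print(f"power:      {percent_less(SEAMICRO.power_watts, STANDARD.power_watts):.0f}% less")
    print(f"current:    {amps(SEAMICRO.power_watts):.0f} A vs {amps(STANDARD.power_watts):.0f} A")
```

The rack and power reductions match the summary, the license reduction comes out at about 71 percent (rounded to 70 above), and the current draw reproduces the 91 A and 182 A in the comparison.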

Simplifying Enterprise Cloud Storage

Case Study

Despite the unprecedented amount of data that is created and stored, only a small percentage is actually used at any given time. A general rule of thumb is that only 5–15 percent of data is being actively processed, and the data as a whole is processed by multiple applications. The concept of multi-temperature storage (hot, warm, cold) was developed to improve the economics of storing massive amounts of data. Frequently accessed (hot) data is usually kept on fast, high-performance storage, while dormant (cold) data is archived onto lower-cost technologies such as tape backup. Warm storage is the tier between the two. With all the data being captured, it is of no use if data cannot be accessed when it is needed. Warm storage provides the cost benefits of cold storage while making the data available in the time it takes to read it from a disk drive. Reducing the amount of data that goes into cold storage enables faster response times for applications and improved SLAs for cloud storage providers.
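The hot/warm/cold split can be expressed as a simple placement rule. A minimal sketch, assuming tiers are chosen only by time since last access; the 7-day and 365-day thresholds, the `Tier` enum and the function names are hypothetical, and real systems often weigh access frequency and cost as well.

```python
from collections import Counter
from enum import Enum
from typing import Dict

HOT_WINDOW_DAYS = 7      # hypothetical: touched within a week -> fast storage
WARM_WINDOW_DAYS = 365   # hypothetical: touched within a year -> online disk


class Tier(Enum):
    HOT = "hot"    # fast, high-performance storage
    WARM = "warm"  # online disk: readable in the time it takes to read the drive
    COLD = "cold"  # archive such as tape


def choose_tier(days_since_last_access: float) -> Tier:
    if days_since_last_access <= HOT_WINDOW_DAYS:
        return Tier.HOT
    if days_since_last_access <= WARM_WINDOW_DAYS:
        return Tier.WARM
    return Tier.COLD


def tier_shares(access_ages: Dict[str, float]) -> Dict[Tier, float]:
    """Fraction of objects that land in each tier, given days since each was last read."""
    counts = Counter(choose_tier(age) for age in access_ages.values())
    return {tier: counts[tier] / len(access_ages) for tier in Tier}


if __name__ == "__main__":
    ages = {"log-today": 1, "report-q2": 90, "invoice-2010": 800, "photo-march": 200}
    for tier, share in tier_shares(ages).items():
        print(tier.value, f"{share:.0%}")
```

Widening the warm window keeps more objects on disk instead of archiving them, which trades some storage cost for the faster access described above.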

One of the largest service providers in the world has validated this approach by choosing AMD’s SeaMicro SM15000 as the platform for its next-generation cloud computing and storage services. The SM15000 was selected because it can give customers ready access to huge amounts of data whenever they need it, in the time it takes to read the data from a hard drive.

The customer is leveraging the unique compute, storage, and power efficiency capabilities of the SM15000 to build out a warm cloud storage infrastructure offering that provides unprecedented performance and scalability. The business case showed a 50 percent savings in power consumption as well as drastic simplification of operations and ongoing management.

• 4,000,000+ network endpoints

• 1,000,000 customer virtual machine instances

• 256,000 virtual isolated networks

• 12,000+ high-performance cores

• 50,000,000+ Input/Output Operations per Second (IOPS)

• 100+ petabytes of storage

• 1.28 Tbps east-west bandwidth

• 800+ 10 GbE network uplinks

Five Petabyte Server and Storage Cluster Comparison

AMD SeaMicro SM15000 and Fabric Storage:
• 2 racks
• 1 SM15000
• 16 Freedom Fabric Storage Enclosures
• 64 OS/Big Data software licenses
• No top of rack switches
• No terminal servers
• 38 power cords
• 32 fabric extender cables
• 20,000 watts, 91 amps*

Standard Servers:
• 6 racks
• 112 2RU dual-socket octal-core “Sandy Bridge” servers, each with twelve 3.5-inch SATA/SAS disks
• 224 OS/Big Data software licenses
• 12 10 GbE top of rack switches
• 6 terminal servers
• 224 power cords
• 248 network cables
• 40,000 watts, 182 amps*

* Assumes 220 volts.


One of the key differentiators designed into the service is the ability to provide large amounts of warm storage, resulting in high performance and scalability. The customer achieved this with a unique approach that was not previously possible within their business case. The ability of the SeaMicro Freedom™ Fabric to disaggregate the functions of a server allowed the customer to achieve a service architecture that met the differentiated product definition they were building to:

• Every object to be stored has a key facet and a value facet. The keys tend to be small and are stored in the shared SSD store, which provides the high IOPS required of a cache, while the values tend to be larger files that are stored in the high-capacity shared fabric storage.

• Because the fabric separates the data store from compute, the system is inherently resilient: the failure of any compute unit does not affect the system as a whole.

• The high-performance compute enables the customer to create erasure codes and spread both the keys and the values across multiple physical storage devices, so that the failure of any storage device has zero impact on system availability and operations.

• The SSD cache and the object store can be hot-expanded by 4x and 3x, respectively, with no modifications to the operating environment.
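The key/value split in the first point can be sketched as a store that keeps a small index on one tier and the bulky values on another. In this minimal sketch two in-memory dictionaries stand in for the shared SSD store and the fabric HDD storage; the class, its methods and the `blob-N` location scheme are hypothetical, not the customer's design.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SplitObjectStore:
    """Keys and value locations live on the fast tier; value bytes live on the capacity tier."""
    ssd_index: Dict[str, str] = field(default_factory=dict)    # key -> location of its value
    hdd_blobs: Dict[str, bytes] = field(default_factory=dict)  # location -> value bytes
    next_id: int = 0

    def put(self, key: str, value: bytes) -> None:
        location = self.ssd_index.get(key)
        if location is None:  # new key: allocate a location on the capacity tier
            location = f"blob-{self.next_id}"
            self.next_id += 1
            self.ssd_index[key] = location
        self.hdd_blobs[location] = value  # existing key: overwrite its value in place

    def get(self, key: str) -> Optional[bytes]:
        location = self.ssd_index.get(key)  # small, IOPS-bound lookup on the fast tier
        if location is None:
            return None
        return self.hdd_blobs[location]     # one larger, throughput-bound read


if __name__ == "__main__":
    store = SplitObjectStore()
    store.put("user/42/avatar.png", b"large image bytes")
    print(store.get("user/42/avatar.png"), store.get("missing"))
```

Because lookups only touch the small index, the fast tier can stay modest in size while the capacity tier grows independently.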

The diagram at left shows a logical view of how the SM15000 connects compute to high-IOPS SSD storage and to high-capacity fabric HDD storage. The Freedom Fabric is inherently resilient, and a single node failure of compute or storage does not affect the performance of the overall cloud. All of the different resources can be upgraded and replaced without any modifications or downtime.
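Erasure coding is what lets stored data survive the loss of a device. The paper does not say which code the customer uses; production systems typically use Reed-Solomon codes that tolerate several simultaneous failures. As a minimal stand-in, this sketch uses single XOR parity, the simplest erasure code, which survives the loss of any one shard; the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal, zero-padded shards and append one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing shard (None): it equals the XOR of all survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity tolerates only one lost shard")
    rebuilt = list(shards)
    if missing:
        rebuilt[missing[0]] = reduce(xor_bytes, [s for s in shards if s is not None])
    return rebuilt


def decode(shards: List[bytes], k: int, original_len: int) -> bytes:
    return b"".join(shards[:k])[:original_len]


if __name__ == "__main__":
    data = b"keys and values spread across devices"
    stored: List[Optional[bytes]] = list(encode(data, k=4))  # one shard per device
    stored[2] = None                                         # one device fails
    print(decode(recover(stored), k=4, original_len=len(data)))
```

With k = 4 data shards plus one parity shard, the storage overhead is 25 percent, compared with 200 percent for keeping three full copies; stronger codes raise the number of tolerated failures at the cost of more parity.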

Operated in this mode, the SM15000 provides unprecedented space and operational savings. As the front and rear rack views demonstrate, the deployment of 64 servers, 1.6 PB of persistent object cache and two petabytes of object storage consists of a single SM15000 connected to six FS-5084-L Fabric Storage enclosures. The six enclosures are connected using 12 fabric extender cables (6+6 for redundancy). The two petabytes of storage are shared seamlessly across the 64 large-core servers, resulting in a highly flexible data center computing platform. What is noticeably missing, because they are unneeded, are the top of rack switches, terminal servers and hundreds of networking and power cables.

Front and Rear Views of the SM15000 with Freedom Fabric Storage (SeaMicro SM15000 compute unit, Storage Enclosures 1–6 and fabric extender connections)

Logical View of Cloud Computing and Storage Services


Conclusion

The industry is entering what many refer to as the zettabyte era of storage. One leading analyst firm forecasts that 34 zettabytes of data will be stored by 2016³. The International Bureau of Weights and Measures has defined metric prefixes up to yotta; a yottabyte is equal to one trillion terabytes. Beyond that, the industry is entering uncharted territory and does not even have a prefix or word to describe the amount of data. The era of big data is defining not only the next generation of computing and storage needs, but also a new vocabulary that does not yet exist.
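The prefix arithmetic is easy to get wrong, so here is a minimal sketch of decimal (SI) byte prefixes, each a factor of 1,000 larger than the last. The list stops at yotta, the largest prefix defined when this paper was written (ronna and quetta were added in 2022); the function names are illustrative.

```python
SI_PREFIXES = ["", "kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def exponent(prefix: str) -> int:
    """Power of ten for an SI prefix: tera -> 12, zetta -> 21, yotta -> 24."""
    return 3 * SI_PREFIXES.index(prefix)


def convert(value: float, from_prefix: str, to_prefix: str) -> float:
    """Re-express a byte quantity with another prefix, e.g. zettabytes as terabytes."""
    return value * 10 ** (exponent(from_prefix) - exponent(to_prefix))


def humanize(num_bytes: float) -> str:
    """Format with the largest prefix that keeps the number at least 1 (capped at yotta)."""
    power = 0
    while num_bytes >= 1000 and power < len(SI_PREFIXES) - 1:
        num_bytes /= 1000
        power += 1
    return f"{num_bytes:.3g} {SI_PREFIXES[power]}bytes"


if __name__ == "__main__":
    print(convert(1, "yotta", "tera"))   # terabytes in one yottabyte
    print(convert(34, "zetta", "tera"))  # the 2016 forecast, in terabytes
    print(humanize(34e21))
```

One yottabyte converts to 10^12 terabytes, that is, one trillion; a zettabyte is one billion terabytes.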

Given the capacities and capabilities required, one of the keys to big data success is an efficient computing and storage data center platform. Attempting to solve this problem with a traditional server approach is doomed to fail from the beginning. The data center is an end-to-end system composed primarily of compute, storage, and networking, and incremental gains in each of these areas will only result in small steps forward for the industry. AMD SeaMicro has taken a revolutionary approach to building data center servers for the new realities of cloud computing and storage. For many years, software requirements have not pushed the performance limits of the underlying hardware, and as a result hardware innovation has been minimal. We are entering a new age where real innovation is required, and the AMD SeaMicro SM15000 is not just an incremental step forward, but rather a giant leap forward in data center performance and efficiency.

3 National Association of Software and Services Companies (NASSCOM) and CRISIL Global Research & Analytics (GR&A), "Big Data: The Next Big Thing".
