Revision 7.12-4
Discussion of the Cloud and Tiered Architecture | Hie Electronics, Inc.
Copyright 2012
Introduction
This paper compares some of the most popular cloud storage options and introduces a new option in tiered data storage for the cloud. SSD has changed the cloud storage environment by enabling a high-speed access tier, but not every customer needs high-speed access. Only certain customers will choose cloud storage over onsite storage; a provider that owns a disaster recovery and backup tier would increase the number of these customers. Today, the bottom line is return on investment and, for those who offer cloud services, economies of scale. There are many ways for a provider to decide on the best internal design of the service, and just as many options for a customer deciding on the right service. Two bottlenecks appear consistently: first, access to data over the network and, second, the efficiency of the tiered storage architecture. Hie Electronics can expand cloud storage environments to a new tier.
A tiered storage data architecture is required to exploit economies of scale. A study by Carnegie Mellon documents that data does not need to be online at all times; in fact, data follows the Pareto principle, and about 80% of all data is considered sleeping data. Sleeping data, or cold data, is rarely accessed or changed (Gibson, 2007). The other 20% is real-time, or hot, data that must be constantly and instantaneously accessible. To illustrate how that affects the cloud service industry, an EMC expert defines the timeline for sleeping data: “80% of the world’s data does not change after it is 90 days old” (Herzog, 2010).
What happens to data that is 90 days old when it resides in a SAS environment? Should that data stay online in an expensive Storage Area Network (SAN) or Network Attached Storage (NAS) appliance? The logical answer is no. Best practice for cloud computing services, and for the majority of the IT world, is a tiered and actively managed storage architecture. The best of these Active Archive™ architectures use hybrid media models to combine and exploit the best storage technologies available.
Tiered Storage in the Cloud
In this example, the tiers are referred to as follows:
Tier 1: SSD (Instant Access)
Tier 2: SAS HDD 15K (Hot/High Priority)
Tier 3: SATA HDD 7.2K (Warm Data)
Tier 4: Blu-ray Disc (BD) (Cold/Sleeping)
this energy passive disc. The TeraStack® Solution will become Tier 3 and 4 in this example. With the TeraStack® Solution, IT personnel set rules for when data is considered cold or sleeping, and that data is then pulled into the system automatically. Because the process is automated, manual placement errors are avoided and all sleeping data resides on the correct storage device. Data sits on the cheapest storage medium while remaining accessible if need be. The TeraStack® Solution also allows sleeping data to become hot or active again: when this type of data is accessed, it is marked hot and made available for instant access (and can later become cold again). Because of the very affordable cost-to-capacity ratio of the TeraStack® Solution, this can be accomplished with substantial cost savings immediately and even more so over the long term.
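The rule-based movement described above can be illustrated with a minimal Python sketch. It is not the TeraStack® software: the class name, the function names, and the two tier labels are hypothetical, and the 90-day threshold is an assumption borrowed from the figure quoted earlier in this paper. In practice the IT personnel would set their own rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed rule: data untouched for 90 days is treated as cold (sleeping).
COLD_AFTER = timedelta(days=90)


@dataclass
class StoredObject:
    name: str
    last_access: datetime
    tier: str = "hot"  # "hot" = online disk, "cold" = nearline optical media


def apply_tiering_rule(objects, now, cold_after=COLD_AFTER):
    """Demote hot objects not accessed within cold_after; return their names."""
    demoted = []
    for obj in objects:
        if obj.tier == "hot" and now - obj.last_access >= cold_after:
            obj.tier = "cold"
            demoted.append(obj.name)
    return demoted


def access(obj, now):
    """Accessing an object marks it hot again and refreshes its access time."""
    obj.tier = "hot"
    obj.last_access = now
    return obj
```

Running `apply_tiering_rule` on a schedule demotes sleeping data without manual intervention, while `access` models the return path from cold to hot; an object promoted this way becomes eligible for demotion again once it has gone unused for the configured period.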
Utilizing the Cloud
The SAS cloud service provider must build a storage architecture that uses all the tiers to give customers the best-cost option for their needs. Many industry reports, such as those by ESG, conclude that instant access to 100% of a customer’s data is no longer a common need or desire. The balance of cost versus performance is the customer’s calculation. How do we approach such a complicated and customer-specific issue? The answer is simple. Because cloud providers must offer Service Level Agreements (SLAs), they must offer guidelines that meet all major requirements. These include the following five basic must-have requirements:
Instant access to data accessed and added within the past 90 days
Guidelines to access data that is older than 90 days
Speed of access and access privileges
Disaster Recovery Data Model that matches their needs, minimum 2 copies available
Disaster Location Management Plan, minimum of 2 different unique and independent sites
Providing this information to the customer allows the SAS cloud provider to deliver service that is sufficient for the customer and meets the provider’s internal requirements. Economies of scale are the purpose of the cloud. A cloud provider can offer SAS economically only by using a tiered architecture. A massive Tier 1 front end requires a substantial, and potentially even exponentially larger, back end. This back end is typically built on Tier 2 technology such as SATA HDD or slower HDD devices. Typically, Tier 2 can take many forms: virtual libraries, slower disk arrays, optical technology, or a wall of tapes. Why not use an automated system to manage the movement of data from active to sleeping and back, i.e., from Tier 1 to Tier 2 and back to Tier 1?
Nearline Data in the Cloud
Cloud SAS storage offerings are missing one possible option for their customers. The majority of SAS customers do not need fast access to their data. These customers need an Active Archive™ for data stored in the cloud. What if this data could be stored on energy passive media and still be available for customer retrieval and use within a minute or two? This option offers data availability but with slightly
recovery, or do not need instant access to their data. Because hard drive storage is expensive to maintain, it should not be used to store this type of data. The TeraStack® Solution would tier this infrequently accessed data while still allowing access in a relatively short time.
The Hybrid Solution
Hie Electronics is a leader in the Active Archive™ data storage system technology industry and the manufacturer of the TeraStack® Solution, an Active Archive™ processing, data backup, and archiving system. It supports application hosting, 50-100 terabytes of nearline-accessible data on Blu-ray optical media, and up to 42 terabytes of data on online hard drives. The company has been recognized by Frost and Sullivan with the “American Video Surveillance Product Innovation of the Year Award” and the “Data Storage Technologies Green Excellence Award in Technology Innovation” for its ability to reduce energy costs with its storage technologies. A leader in Sustainable IT technology, the Hie Electronics TeraStack® Solution product line delivers 90 percent energy cost savings compared with current technology. Hie Electronics is an Energy Star Small Business Network participant and a stakeholder in the Energy Star Enterprise Storage Initiative.
Comparison of Cost and ROI
The following example (Figure 1.1) shows different providers that meet all the basic requirements. It assumes stable internet access with practical WAN access speeds (approximately 2TB/day), multiple data storage facilities, two thirds of the stored data leaving via network egress, ten percent arriving via network ingress, normal volumes of GET and HEAD requests, and normal service requests. The pricing is a generalization for customers and is based on customer experience or published pricing. It may not include all associated network costs.
Service Provider | Back-Up Required (TB) | $ / Month | $ / GB | Total Cost for 4 Years | Total Cost for 8 Years | Total Cost for 10 Years
Amazon S3 | 50 | $5,515.00 | $0.11 (Amazon, 2012) | $264K | $528K | $662K
Atmos (Value) | 50 | $7,500.00 | $0.15 (Peer1 hosting, 2011) | $360K | $720K | $900K
Barracuda | 50 | $12,500.00 | $0.25 | $600K | $1,200K | $1,500K
Google (World Wide) | 75 | $13,968.12 (Google, 2012) | $0.19 | $670K | $1,341K | $1,676K
Figure 1.1: Example of Cloud Pricing Options
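The columns of Figure 1.1 are related by simple arithmetic: the price per GB is the monthly price divided by the capacity, and each multi-year total is the monthly price times the number of months. The Python sketch below shows that relationship under two assumptions that the figure does not state explicitly: 1 TB is counted as 1,000 GB, and the monthly price stays flat over the whole period. The function names are illustrative only.

```python
def cost_per_gb(monthly_cost, capacity_tb, gb_per_tb=1000):
    """Monthly price per GB, assuming 1 TB = 1,000 GB."""
    return monthly_cost / (capacity_tb * gb_per_tb)


def total_cost(monthly_cost, years):
    """Total spend over a number of years at a flat monthly price."""
    return monthly_cost * 12 * years


# (capacity in TB, monthly price in dollars) as listed in Figure 1.1
providers = {
    "Amazon S3": (50, 5515.00),
    "Atmos (Value)": (50, 7500.00),
    "Barracuda": (50, 12500.00),
    "Google (World Wide)": (75, 13968.12),
}

for name, (capacity_tb, monthly) in providers.items():
    per_gb = cost_per_gb(monthly, capacity_tb)
    ten_year = total_cost(monthly, 10)
    print(f"{name}: ${per_gb:.2f}/GB/month, ${ten_year:,.0f} over 10 years")
```

Because the totals grow linearly with time, a provider that is cheapest per month in this model stays cheapest at every horizon; the comparison only changes once up-front hardware costs enter, as in the on-premises figures that follow.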
software, and management are nulled. Rather, it takes into account assumptions such as 24/7 operation with 99% uptime, average national electricity costs with no increase, and an upfront purchase of the product. Notice that the hard drive setup must be repurchased every four years because of the industry-standard failure rate for HDDs. The comparative return on investment becomes apparent immediately after the first data migration that the cloud provider must perform in year 4.
Cost Item | Year 0 | Year 4 | Year 8 | Year 10
Generic Cluster HDD SAN Turnkey (150TB Cluster Online System)
Hardware Cost | $313,158 | $313,158 | $313,158 | -
Estimated Power Usage @ 675 W/hr | - | $710 | $710 | $710
Estimated Cooling Costs | - | $759 | $759 | $759
Total Cost | $313,158 | $314,627 | $314,627 | $1,469
Total Accrued Costs (150TB On-line System) | $313,158 | $632,191 | $951,224 | $954,162
Cost per GB/month (150TB On-line System) | - | 8.78¢ | 6.61¢ | 5.30¢
Generic HDD SAN Competitor (150TB Online System)
Hardware Costs | $142,083 | $142,083 | $142,083 | -
Estimated Power Usage @ 7,870 W/hr | - | $8,273 | $8,273 | $8,273
Estimated Cooling Costs | - | $8,852 | $8,852 | $8,852
Total | $142,083 | $159,208 | $159,208 | $17,125
Total Accrued Costs (150TB On-line System) | $142,083 | $352,666 | $563,249 | $597,499
Cost per GB/month (150TB On-line System) | - | 4.90¢ | 3.91¢ | 3.32¢
8.14.42 TeraStack® Solution (142TB Online & Near-line Solution)
Hardware Cost | $241,571 | $5,600 | $5,600 | -
Estimated Power Usage @ 425 W/hr | - | $477 | $477 | $477
Maximum Cooling Costs | - | $478 | $478 | $478
Total Costs (142TB Near-line System) | $241,571 | $6,555 | $6,555 | $955
Total Accrued Cost (142TB Near-line System) | $241,571 | $250,961 | $260,381 | $262,291
Cost per GB/month (142TB Near-line System) | - | 3.68¢ | 1.91¢ | 1.54¢
Figure 1.2: Cloud Comparison Cost per GB over 10 years
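The cost per GB/month values in Figure 1.2 appear to be the accrued cost divided by capacity and by the number of months elapsed. The short Python sketch below shows that calculation under the assumption that 1 TB is counted as 1,000 GB; the function name is illustrative, and the accrued totals are taken directly from the figure rather than recomputed.

```python
def cost_per_gb_month_cents(accrued_cost, capacity_tb, years, gb_per_tb=1000):
    """Accrued dollars spread over every GB for every month, in cents."""
    gb_months = capacity_tb * gb_per_tb * years * 12
    return 100 * accrued_cost / gb_months


# (capacity in TB, {year: total accrued cost in dollars}) from Figure 1.2
systems = {
    "Generic Cluster HDD SAN Turnkey": (150, {4: 632191, 8: 951224, 10: 954162}),
    "Generic HDD SAN Competitor": (150, {4: 352666, 8: 563249, 10: 597499}),
    "TeraStack Solution": (142, {4: 250961, 8: 260381, 10: 262291}),
}

for name, (capacity_tb, accrued_by_year) in systems.items():
    for year, accrued in accrued_by_year.items():
        cents = cost_per_gb_month_cents(accrued, capacity_tb, year)
        print(f"{name}, year {year}: {cents:.2f} cents per GB/month")
```

The calculation makes the trend in the figure easy to see: a system whose accrued cost barely grows after year 0 has its up-front cost spread over more months each year, so its cost per GB/month falls faster than that of a system whose hardware must be repurchased every four years.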
As time progresses, the hybrid architecture found in the TeraStack® Solution allows a substantial decrease in hardware costs and running costs. A cloud provider that invests in this solution will see the investment as a win.
Speed of a Download
Today’s CIOs and Enterprise Storage Engineers, the end users of cloud-based storage, must weigh access speed over a slower Wide Area Network (WAN) against data access speed when the storage appliance is attached to a Local Area Network (LAN), that is, internet access vs. intranet access. What
downloads of the same 15MB file copied to each cloud storage service provider. After each download was complete, the customer cleared the Gladinet cache and then initiated the next download from the Command Prompt for the specific cloud provider. (Huang, 2010)
Service Provider | Upload (sec) | Download #1 (sec) | Download #2 (sec) | Download #3 (sec) | Average Download (sec)
Amazon S3 | 83 | 18 | 20 | 17 | 18
AT&T Synaptic Storage | 84 | 37 | 49 | 45 | 43
Google Storage for Developer | 94 | 15 | 17 | 17 | 16
Peer1 CloudOne | 85 | 20 | 16 | 19 | 18
Windows Azure Blob Storage | 86 | 48 | 72 | 52 | 57
Figure 1.3: Example of Customer Experience to Speed of Access to Downloaded Data
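The averages in Figure 1.3 can be checked, and converted into an effective download rate, with a few lines of Python. This is a sketch only: the throughput figure is derived here from the 15MB file size and is not reported in the original study, and the function name is illustrative. The averages in the figure match the arithmetic mean truncated to whole seconds.

```python
from statistics import mean

FILE_MB = 15  # size of the test file used in the study

# Download times in seconds, from Figure 1.3
downloads = {
    "Amazon S3": [18, 20, 17],
    "AT&T Synaptic Storage": [37, 49, 45],
    "Google Storage for Developer": [15, 17, 17],
    "Peer1 CloudOne": [20, 16, 19],
    "Windows Azure Blob Storage": [48, 72, 52],
}


def summarize(times, file_mb=FILE_MB):
    """Return (mean download time in seconds, effective MB per second)."""
    avg = mean(times)
    return avg, file_mb / avg


for provider, times in downloads.items():
    avg, mb_per_sec = summarize(times)
    print(f"{provider}: {avg:.1f} s average, {mb_per_sec:.2f} MB/s")
```

Expressing the result as a rate makes it easier to compare against the LAN speeds a storage engineer already knows, which is the internet vs. intranet trade-off this section raises.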
Given the time differences and the way the study was done, the gap between providers can be drastic. But who won? Was it Google or Amazon? The answer depends on how the customer set up each service and whether the setups were identical. If all else is equal, the real winner is the provider that met the basic requirements and earned the greatest profit for the performance delivered. More than likely Azure won the battle by using its tiered architecture and automated data retrieval system. An automated solution such as the TeraStack® Solution creates a win-win: it provides the data to the customer well within the specs and at the lowest cost to the provider.
Other Cloud Provider Concerns
Application agnosticism is another concern for some cloud providers: can the cloud provider use any application desired for internal results and fluidity? The answer should be yes, which explains why larger cloud solutions typically have sections of their infrastructure designed and separated by purpose and by the applications installed. The data environment is always changing. It is time to take an approach that adapts to how data behaves during the Data Life Cycle: a hybrid solution that recognizes the Pareto principle, with 80% of the data sleeping and 20% active. It is time to address the problem that all companies face: how to keep 100% of a customer’s data for as long as the customer needs to keep it.
Summary
period. The most proven method to support the lowest total cost of ownership is to use a hybrid architecture that exploits online, nearline, and offline capabilities. The solution for cloud providers that look for these economies of scale and economic efficiency can be found in the superior hybrid
Works Cited
Amazon. (2012). Amazon S3 Pricing. Retrieved June 17, 2012, from Amazon Web Services: http://aws.amazon.com/s3/pricing/
EMC. (2010, April). Retrieved June 17, 2012, from EMC Centera Content-Addressed Storage System: http://www.emc.com/collateral/hardware/data-sheet/c931-emc-centera-cas-ds.pdf
Google. (2012, June 16). Google. Retrieved June 19, 2012, from Pricingandterms: https://developers.google.com/storage/docs/pricingandterms
Herzog, E. (2010). Cloud Tiering Appliance. (EMC, Interviewer)
Peer1 hosting. (2011). CloudOne Storage powered by EMC Atmos. Retrieved June 18, 2012, from Ping & People: http://www.peer1.com/sites/default/files/pdf/datasheets/CloudOne_Storage_consumer.pdf
Robinson, K. (2012, May 31). Tiering in RAID Storage Environments. Retrieved May 31, 2012, from Storage Newsletter: http://www.storagenewsletter.com/news/systems/lsi-tiering-in-raid
TechTarget. (2012). Find a Tech Definition. Retrieved June 18, 2012, from WhatIs.com: http://whatis.techtarget.com/
For more information about Hie Electronics and the innovative TeraStack® Solution, visit the company’s website at www.hie-electronics.com.