Lab Validation Report
Symform Cloud Storage
Decentralized Data Protection
By Ginny Roth
June 2011
Contents
Introduction
Background
Centralized Data Center Security
Decentralized Data Center Security
Symform Cloud Storage
ESG Lab Validation
Getting Started
Securing Data in the Cloud
Cloud Control Security
Durability and Availability
ESG Lab Validation Highlights
Issues to Consider
The Bigger Truth
Appendix
All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of the Enterprise Strategy Group, Inc., is in violation of U.S. Copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at (508) 482.0188.
ESG Lab Reports
The goal of ESG Lab reports is to educate IT professionals about emerging technologies and products in the storage, data management and information security industries. ESG Lab reports are not meant to replace the evaluation process that should be conducted before making purchasing decisions, but rather to provide insight into these emerging technologies. Our objective is to go over some of the more valuable feature/functions of products, show how they can be used to solve real customer problems and identify any areas needing improvement. ESG Lab's expert third-party perspective is based on our own hands-on testing as well as on interviews with customers who use these products in production environments. This ESG Lab report was sponsored by Symform.
Introduction
Many solutions are available for data protection these days, and most address specific business needs, whether that means services requiring continuous uptime or offline storage for less sensitive data. Cloud services are the next step in the natural evolution of data protection, giving customers another option for their data protection strategies by letting them transfer the costs and burden of data protection to third-party providers.
Symform’s data protection in the cloud employs a decentralized model that distributes data to multiple nodes in the cloud rather than centralizing storage in one data center, differentiating itself from most cloud storage solutions. ESG Lab examined Symform’s Cloud Storage with a focus on the security and cost benefits derived from a decentralized data protection model.
Background
There’s no doubt that companies still struggle with developing effective data protection strategies, mostly due to the challenge of keeping up with rampant data growth. Not only do they need high availability for critical services, they also need effective storage solutions for disaster recovery. As shown in Figure 1, backup and recovery solutions still remain the most significant investments in the area of data storage, especially among midmarket companies.1 Those same companies are looking for new solutions for offsite data replication to provide effective disaster recovery.
Figure 1. Data Storage Investments
[Bar chart: In which of the following data storage areas will your organization make the most significant investments over the next 12-18 months, by midmarket vs. enterprise? (Percent of respondents, five responses accepted.) Series: Enterprise (1,000 employees or more, N=157) and Midmarket (100 to 999 employees, N=132). Categories: tape replacement; storage virtualization; improved storage management software tools; purchase new SAN storage systems; purchase more power-efficient storage hardware; data replication solution for off-site disaster recovery; backup and recovery solutions.]
Source: Enterprise Strategy Group, 2011.
ESG research has also discovered the majority of customers considering replacing their current backup with a third-party online backup service cite some sort of cost concern as the main driver.2 Providing a solution that is more cost effective than current in-house solutions and processes is cited by 52% of those surveyed. In addition, 42% of respondents stated that predictable costs (e.g., simpler budgeting) would be another compelling driver for moving to third-party solutions.
1 Source: ESG Research Report, 2011 IT Spending Intentions Survey, January 2011.
Predicting storage needs, however, can still be a challenge, particularly with offsite storage: growth in the data volumes that require protection can lead to escalating costs not only for storage, but also for the data protection software and its management. Customers that aren’t interested in incurring the costs of building an off-site data center for their increasing data protection needs may look to a third-party service.
Centralized Data Center Security
The majority of cloud storage providers have standardized on a data center architecture in which all servers typically reside in a single location, known as a centralized data center. These servers are assumed to be trusted and reliable.
Centralized cloud storage has become an attractive option for many organizations as these operations enable customers to realize economies of scale by consolidating on standardized hardware and software platforms. Virtualization has only accelerated adoption of this approach as many servers can be hosted on a single hardware platform.
In the context of data protection, centralizing storage in one data center allows organizations to easily reach needed resources for high availability and disaster recovery; data is not scattered throughout the network, where it can be difficult to retrieve and protect from outside attacks. Centralization relies on traditional physical and logical security controls and allows for complete management and control of infrastructure. At the same time, though, consolidation creates its own challenges.
Centralized data storage creates a single point of failure and single point of attack. While replication to another site can reduce these risks, it can make data protection exercises even more costly (sometimes doubling or tripling costs) in terms of the power, cooling, hardware, and ongoing operational expenses associated with maintaining another site.
In addition, consolidating data to one or two sites creates a single target for security threats from both inside and outside the organization. Once a data center is breached, backed up data is vulnerable; since backup data is centralized in one location, a breach of this sort means that potentially ALL data is vulnerable.
Decentralized Data Center Security
A decentralized architecture has emerged as cloud technology and broadband penetration have become ubiquitous. Under this model, protected data is distributed on servers in many locations across the participating network. These decentralized servers are assumed to be untrusted and unreliable, which requires the software service to do more for security and reliability.
Decentralized data for backup storage can provide cost savings for many organizations as cheap, unused commodity storage is often already available and scattered throughout a company’s network. In addition, since the potential storage is dispersed to many locations, access to data doesn’t come under the same bandwidth constraints since each location has its own pipe to the internet. Decentralization also shrinks the footprint for data attacks, since a breach of one data center doesn’t expose all backed up data to the attacker.
The challenge with decentralization is providing effective protection for data that can be exposed at a number of locations. Since these locations can be remote offices, branch offices, and even home office locations, providing and maintaining security for so many disparate sites can be daunting.
There is also the matter of data integrity: protected data would potentially be distributed to multiple locations, but the risk still remains that loss of any of those data stores would result in a total loss of protected files. Because of storage constraints, it’s likely that the data would exist in only a few locations throughout the network, increasing the likelihood of data loss should those resources go down.
Symform Cloud Storage
Symform’s solution for data protection provides a decentralized approach to cloud storage that utilizes customers’ available local storage to provide backup and restore services. As shown in Figure 2, customers participate in a storage exchange where their nodes at company premises are used to store data from other customers’ participating source servers. Each customer’s data is spread as thinly and redundantly as possible to provide an added level of security and durability against localized disasters like rogue employees, fires, natural disasters, etc. This distinguishes Symform’s cloud solution from other cloud offerings that provide data storage on centralized data centers.
Symform’s Cloud Control provides the service that maintains the locations of the data distributed throughout the Symform user network using a set of highly available replicated databases. It also continuously monitors the uptime of participating nodes to identify those that are no longer available. When a node fails, the Cloud Control service will rebuild the lost data fragments from that node and redistribute them throughout the rest of the Symform storage cloud.
Figure 2. Symform Cloud Storage
Using a decentralized approach to data protection, Symform offers the following features to provide security, reliability, and speed for data backups to the cloud:
Sound Data Protection. Symform divides local data into 64 MB blocks and encrypts each block using strong AES-256 encryption. The blocks are then broken into 1 MB fragments. Thirty-two 1 MB parity fragments are generated and moved to the cloud along with the data fragments to provide data integrity and reliability.
Cloud Speed. The 96 fragments from a block of data are sent in parallel to 96 locations on the Internet, providing increased speed over file copies between two centralized sites.
Durability. Constant monitoring of contributing resources detects when a contributing node is down, and parity fragments are used to regenerate the lost fragments.
Cost. Because Symform’s Resilient Storage Architecture does not have the costs associated with centralized data centers (power, cooling, bandwidth, etc.), it can potentially offer a lower cost of ownership than other cloud providers that rely solely on centralized datastores.
ESG Lab Validation
ESG Lab evaluated and tested Symform’s cloud storage using servers located at Symform’s facilities in Seattle, Washington, connected to its live cloud service. The focus was on examining the security advantages of a decentralized approach for backup/restore and disaster recovery.
Getting Started
ESG Lab used Symform local servers connected to the Internet to test the features of its cloud storage service. As shown in Figure 3, one node was used to synchronize data to the cloud. That same server was configured as a contribution node used to store data from other nodes in the Symform storage cloud. In addition, another server was used as a contribution node for data availability tests.
ESG Lab used a single workstation to access the web-based administrative console via a standard web browser to a hosted site in Symform’s Cloud Control.
Figure 3. ESG Lab Test Bed
With data replicated solely to the cloud, the onboarding process for Symform’s cloud storage involves registering servers as sync nodes, contribution nodes, or both.
ESG Lab Testing
ESG began testing by launching the symformsync.exe program located on the server used for synchronizing to the cloud. This program launches a wizard to set up a new node or to reinstall the software on a computer that may have had a Symform node running on it previously.
ESG Lab configured the new node with a name and location to be used later in the administrator dashboard. As shown in Figure 4, ESG Lab chose to make the new node synchronize files and contribute storage to the cloud. This allowed ESG Lab to examine the backup and storage parameters that can be defined by the configuration wizard.
Figure 4. Node Type
Once the node type was defined, ESG Lab configured its synchronization mappings. Each mapping pairs a local storage location with a cloud folder that can be accessed later for restores. ESG Lab chose the local data path D:\Nodes\Build and mapped it to a location in the cloud called “Test Folder.”
Next, ESG Lab configured bandwidth allocation from the server to the cloud. Bandwidth allocation can be defined for both business and non-business hours to minimize disruption to the network and server load. In addition, business hours can be specified for companies adhering to an alternative schedule. For testing purposes, ESG Lab defined normal business hours of 8AM-6PM, Monday-Friday. For bandwidth, ESG Lab configured 2 Mbps upload and 2 Mbps download for business hours, and 4 Mbps upload and 8 Mbps download for non-business hours.
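The schedule above can be sketched in a few lines of Python. This is only an illustration of how a time-of-day bandwidth policy like the one ESG Lab configured might be expressed; the names (bandwidth_limits, BUSINESS_LIMITS, and so on) are hypothetical and do not come from Symform’s software, and treating 6PM itself as non-business time is an assumption.

```python
from datetime import datetime, time

# Values taken from the test configuration described in this report.
BUSINESS_DAYS = range(0, 5)        # Monday=0 .. Friday=4
BUSINESS_START = time(8, 0)        # 8AM
BUSINESS_END = time(18, 0)         # 6PM (treated as exclusive; an assumption)
BUSINESS_LIMITS = {"upload_mbps": 2, "download_mbps": 2}
OFF_HOURS_LIMITS = {"upload_mbps": 4, "download_mbps": 8}


def bandwidth_limits(now: datetime) -> dict:
    """Return the bandwidth caps that apply at the given local time."""
    in_business_hours = (
        now.weekday() in BUSINESS_DAYS
        and BUSINESS_START <= now.time() < BUSINESS_END
    )
    return BUSINESS_LIMITS if in_business_hours else OFF_HOURS_LIMITS


if __name__ == "__main__":
    # Monday morning falls inside business hours, Saturday does not.
    print(bandwidth_limits(datetime(2011, 6, 6, 10, 0)))
    print(bandwidth_limits(datetime(2011, 6, 4, 10, 0)))
```

A company on an alternative schedule would only need to change the three BUSINESS_* constants.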
ESG Lab then defined the parameters for the contribution server. As Figure 5 illustrates, ESG Lab used the local path, \\vine\development\jared\contrib, as the contribution folder. This folder is used as the repository for data from other customers using the Symform service. Since the Cloud Control service needs access to the folder for depositing data, a set of credentials needs to be defined for use by the Cloud Control service. These credentials can be a special service account with access to only the folder defined as the contribution folder. ESG Lab used a set of test credentials with access to the specified folder.
Figure 5. Contribution Folder
Next, ESG Lab defined the port used by the contribution server for inbound connectivity from the Internet. This port can be manually configured with special considerations for firewall rules regarding port usage. Additionally, servers can be placed in the DMZ and used for contribution nodes to avoid exposing internal network resources to access from the outside.
Figure 6 summarizes the configuration parameters ESG Lab completed in the previous steps.
When ESG Lab completed configuration of the node for synchronization, a unique ID and key for the server were assigned by the Symform service. Each synchronization node requires a license to participate in the cloud storage network.
ESG Lab observed that the setup of a new customer environment to synchronize data to the Symform storage cloud was easy to configure using a simple wizard. In ten steps, ESG Lab was able to install, configure, and begin backing up data to the cloud.
Why This Matters
Data protection in the cloud is attracting attention as it promises savings in OPEX and CAPEX to customers looking for an off-site solution for backup and disaster recovery. Those savings can vary significantly depending on how storage space is licensed, the time to backup and recover files from the cloud, and how easy the solution is to maintain.
Typically, providers of cloud storage price products based on the amount of storage capacity required by the customer. As data needs grow, so does the cost for storage in the cloud. But since customers are the suppliers of space for cloud storage, Symform provides pricing per server with no limits on the actual storage space required. The costs to the customer are usually already sunk in storage capacity that is already paid for and not used. Even if additional local storage must be purchased, the price for local storage is significantly less than the cost of storage space in the cloud.
ESG Lab was able to set up a Symform data protection environment with just a few simple steps and found backups and restores easy to manage. The “set it and forget it” functionality of backups makes the cost to maintain the solution negligible.
Securing Data in the Cloud
One of the main concerns about storing data in the cloud is security. This is especially the case when data is stored on unsecured nodes throughout a decentralized cloud. Data can reside anywhere in the world, making it potentially accessible by the hosting party.
Symform utilizes multiple security measures to ensure that data is safe and unreadable in the storage cloud. Figure 7 illustrates the steps taken as data is prepared and synchronized to the cloud. To begin, data is broken into 64 MB blocks at the source. A unique 256-bit key is generated using a hash of the block itself, eliminating the need to manage any keys for encryption and decryption of data stored in the cloud. That key, in turn, is used to encrypt the block using AES-256 encryption and is stored and secured in Symform’s Cloud Control service.
After the block is encrypted, it is broken into 64 1 MB fragments. Using Reed-Solomon encoding, 32 parity fragments are created. All encryption and encoding happens behind the firewall. The 96 combined fragments are then distributed to 96 random computers on the network determined by Symform’s Cloud Control.
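The preparation steps described above can be sketched as follows. This is a minimal illustration, not Symform’s implementation: the report says only that the key is a hash of the block, so the use of SHA-256 here is an assumption (it happens to yield 256 bits), all function names are hypothetical, and the AES-256 encryption and Reed-Solomon parity steps are omitted because the Python standard library provides neither.

```python
import hashlib

BLOCK_SIZE = 64 * 1024 * 1024   # 64 MB blocks, per the report
DATA_FRAGMENTS = 64             # fragments per block, per the report
PARITY_FRAGMENTS = 32           # parity fragments per block, per the report


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut the source data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def derive_block_key(block: bytes) -> bytes:
    """Derive a 256-bit key from the block contents (SHA-256 is an assumption)."""
    return hashlib.sha256(block).digest()


def split_fragments(block: bytes, count: int = DATA_FRAGMENTS) -> list:
    """Split one (already encrypted) block into `count` equal-sized fragments.

    A block shorter than the full block size yields proportionally smaller
    fragments; zero bytes pad the tail so all fragments have the same length.
    """
    size = -(-len(block) // count)                 # ceiling division
    padded = block.ljust(size * count, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(count)]


if __name__ == "__main__":
    # Small sizes keep the demonstration fast; real blocks would be 64 MB.
    sample = bytes(range(256)) * 10
    for block in split_blocks(sample, block_size=1024):
        key = derive_block_key(block)
        fragments = split_fragments(block)
        print(len(block), key.hex()[:16], len(fragments), len(fragments[0]))
```

Because the key depends only on the block contents, two identical blocks produce the same key, which is why no separate key needs to be generated or tracked on the source node in this sketch.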
Figure 7. Encryption, Fragmentation, and Distribution
In order to compromise data, one would first need to know where the 96 fragments were dispersed, extract all 96 fragments from the customer premises that host them, and then reassemble them. Additionally, the key generated for encrypting the block would have to be obtained from the Symform Cloud Control. All of this would accomplish the theft of only one block of data, not even necessarily an entire file.
ESG Lab Testing
ESG Lab examined the data on the synchronization node to verify the encryption and fragmentation. Browsing to the file path of the synchronized data, ESG Lab noticed a folder created called .sync where all the operations are performed on the data. ESG Lab opened an encrypted block and observed the contents. As shown in Figure 8, the block was encrypted and unreadable.
Figure 8. Encrypted Block Contents
Once the block is encrypted, it is broken into 64 fragments. ESG Lab opened a folder containing the fragments and observed that the data was broken into 684 KB fragments. The smaller size is due to the fact that the original file was smaller than 64 MB; consequently, the 64 fragments were smaller than the expected 1 MB. Figure 9 shows the directory containing fragments for synchronization to the cloud. ESG Lab also observed that the names associated with the fragments at this point have no correlation to the actual files identified for backup.
Once the fragments are sent to contribution nodes, they are accessible by the customer that hosts the node. However, since the fragment is only one piece of a data block and it’s encrypted, it shouldn’t be readable when accessed with a normal file editor. ESG Lab confirmed this by browsing to a data fragment on the test contribution server. As shown in Figure 10, the contents of the fragment were also completely encrypted and unreadable.
Figure 10. Encrypted Fragment
In addition to the encrypted contents of the fragment, ESG Lab also observed that the files themselves contain no names or data that would identify the name of the customer or type of data associated with that data fragment.
Why This Matters
Using a decentralized approach for data protection in the cloud provides some inherent security advantages over more centralized solutions. With data encrypted, fragmented, and geographically distributed to disparate nodes throughout the cloud, it’s clear that Symform’s approach to data protection provides maximum assurances that the data cannot be compromised. This addresses a key concern customers cite when looking at cloud solutions, which is perceived security shortcomings with their data in the cloud.
ESG Lab was able to confirm successful encryption of data both on the synchronization and contribution nodes. In addition, the file names themselves had no correlation with the customers they belonged to. Without access to the administration tool, it would be impossible to reassemble the fragments in any meaningful way.
Cloud Control Security
Symform’s Cloud Control provides the main service that manages the location of fragments, manages keys for encryption, monitors contribution servers for failed nodes, and provides a web management service for customers. This service is maintained by Symform redundantly in multiple SAS70 Type II certified hosting facilities, including Amazon Web Services and SoftLayer. Access to the database is provided by a web interface that requires credentials assigned to the customer when they register.
As shown in Figure 11, customers access the Symform web service through a firewall in a SAS70 hosted network. All resources are contained behind the firewall, including the database and its replicas.
Figure 11. Symform Cloud Control
ESG Lab Testing
ESG Lab examined the multi-tenancy feature of the database service. Since one database is used to store information about every Symform customer, rights must be managed and maintained to ensure customers are not able to view one another’s data. ESG Lab accessed the service with two separate web browsers and used logins from two different customers to access the same database hosted within the third-party service. Figure 12 illustrates two views showing nodes configured for synchronization and contribution. ESG Lab observed that each customer was able to see only the nodes in their own network.
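The behavior ESG Lab observed, one shared database with each login seeing only its own nodes, can be illustrated with a small sketch. The data model below (Node, customer_id, nodes_visible_to) is entirely hypothetical and says nothing about how Symform’s Cloud Control database is actually structured; it only shows the general idea of scoping every query to the authenticated customer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    customer_id: str   # owner of the node; hypothetical field
    role: str          # "sync", "contribution", or "both"


# One shared store holding records for every customer (illustrative data).
ALL_NODES = [
    Node("node-a1", "customer-a", "both"),
    Node("node-a2", "customer-a", "contribution"),
    Node("node-b1", "customer-b", "sync"),
]


def nodes_visible_to(customer_id: str, nodes: list) -> list:
    """Return only the nodes owned by the authenticated customer."""
    return [node for node in nodes if node.customer_id == customer_id]


if __name__ == "__main__":
    for customer in ("customer-a", "customer-b"):
        visible = nodes_visible_to(customer, ALL_NODES)
        print(customer, [node.node_id for node in visible])
```

Applying the filter on the server side, keyed to the login rather than to anything the browser sends, is what keeps one tenant from requesting another tenant’s records.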
Figure 12. Multi-tenancy
ESG Lab also audited network security for the database within Symform’s Cloud Control. The database itself exposed only the minimum number of ports needed to communicate with the Symform web service. In addition, one secure port is available for Symform engineers to maintain and update the database.
Why This Matters
With its unique design, Symform is able to store customer data scattered throughout the storage cloud. The nature of this decentralized model makes data extremely secure from outside attack. In order to manage the fragments and track their locations, a central store is maintained in a SAS70 hosted network. This central store holds only a fraction of the data that Symform would have to store if it offered a traditional centralized model for storing data in the cloud. With one small metadata store, Symform shrinks the work by multiple orders of magnitude, which keeps its costs very low.
ESG Lab was able to observe effective separation of data for multi-tenancy accounts. Access to the database was limited to the Symform Cloud Control web service, providing a secure layer between customers and the database itself.
Durability and Availability
Providing a cloud storage solution requires a robust strategy for high availability. Data must be available and resilient. Because of the nature of Symform’s decentralized model, a customer’s data sits at many locations throughout its storage cloud.
As shown in Figure 13, a data block is shredded into 64 1MB fragments with 32 parity fragments added using the Reed-Solomon algorithm for error correcting. These fragments are dispersed to 96 different nodes throughout the Symform storage cloud. Symform’s Cloud Control service constantly monitors all nodes in the storage cloud in order to detect ones that fail. After a node failure is detected, the Cloud Control service, using the 32 parity fragments, regenerates and redistributes the lost fragments to other nodes in the storage cloud.
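The regeneration idea can be demonstrated with a toy erasure code. The sketch below is a simplified Reed-Solomon-style scheme built on polynomial interpolation over the prime field GF(257), using 4 data symbols and 2 parity symbols instead of the 64 and 32 described in this report. It is an assumption-laden teaching example, not Symform’s encoder: production Reed-Solomon codes typically work over GF(2^8), and every name here is hypothetical.

```python
P = 257  # prime modulus of the toy field; chosen so byte values 0..255 fit


def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (xi, yi) points, with arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n_parity):
    """Append n_parity parity symbols to the data symbols (systematic code)."""
    k = len(data)
    points = list(enumerate(data))
    parity = [_interpolate_at(points, x) for x in range(k, k + n_parity)]
    return list(data) + parity


def recover(surviving, k):
    """Rebuild the k data symbols from any k surviving {index: symbol} pairs."""
    if len(surviving) < k:
        raise ValueError("too few fragments survive to rebuild the data")
    points = list(surviving.items())[:k]
    return [_interpolate_at(points, x) for x in range(k)]


if __name__ == "__main__":
    data = [10, 20, 30, 40]
    coded = encode(data, n_parity=2)
    # Simulate two failed nodes by dropping fragments 1 and 4.
    surviving = {i: s for i, s in enumerate(coded) if i not in (1, 4)}
    print(recover(surviving, k=4) == data)
```

The same property scales to the ratio in the report: with 64 data and 32 parity fragments, any 64 of the 96 fragments are enough to rebuild the block.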
In order for a data block to be lost, 33 of the 96 nodes holding its fragments would have to fail before the lost fragments could be regenerated. The odds of this occurring are extremely low.
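A rough sense of those odds can be obtained with a binomial calculation. The sketch below assumes that node failures are independent and that each node fails with the same probability during the window before regeneration completes; both are simplifying assumptions, and the 10% failure probability used in the example is a hypothetical figure, not a measurement from this testing.

```python
from math import comb


def block_loss_probability(p_node_fail: float,
                           total_fragments: int = 96,
                           tolerated_failures: int = 32) -> float:
    """Probability that more than `tolerated_failures` of the nodes holding
    a block's fragments fail, assuming independent, identical failures."""
    return sum(
        comb(total_fragments, k)
        * p_node_fail ** k
        * (1 - p_node_fail) ** (total_fragments - k)
        for k in range(tolerated_failures + 1, total_fragments + 1)
    )


if __name__ == "__main__":
    # Hypothetical: each node has a 10% chance of failing before repair.
    print(f"{block_loss_probability(0.10):.3e}")
```

Correlated failures, such as a regional outage affecting many nodes at once, would make the real figure higher than this independent-failure estimate.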
ESG Lab Testing
Next, ESG Lab tested the effect of a file restore when a contribution node is taken down. Both the contribution node and the synchronization node reside in the test network; however, the files that were backed up to the storage cloud were dispersed over multiple servers. ESG Lab’s test was conducted to confirm that there isn’t a one-to-one relationship between a synchronization and a contribution node on the same customer network.
Before the restore began, ESG Lab took down the contribution server so it was no longer visible to the storage cloud, as shown in Figure 14, effectively cutting off the data residing on the contribution node from the rest of the Symform storage cloud.
Figure 14. Downed Contribution Server
ESG Lab then used the Symform web administration tool to select the file “tail.exe” to restore back to the synchronization server. When the restore completed, ESG Lab examined the original location of the file and confirmed that “tail.exe” was correctly restored. Figure 15 shows the results of the directory search, listing tail.exe back in its original location.
Figure 15. Successful File Restore
Why This Matters
Customers have long dealt with high availability within their own data centers and know full well the challenges associated with providing continuous services and recovering from outages. When looking at the cloud for outsourcing data protection solutions, customers expect the same level of high availability. Cost savings achieved with the cloud mean nothing if data is not available when needed because of disruptions in service.
Symform’s decentralized approach distributes data fragments throughout its storage cloud using a highly effective redundancy solution to ensure data is always available. With constant monitoring of contributing nodes, Symform can recover quickly when servers become unavailable and reassemble lost fragments across other nodes.
ESG Lab tested the file restore capabilities of the Symform storage cloud by removing a node from the network. ESG Lab was still able to easily retrieve the requested data.
ESG Lab Validation Highlights
Setting up a data protection environment was easy with Symform’s software. With a simple wizard, servers were defined for protection, and backups and restores were easy to manage.
Data was encrypted and fragmented at the source, providing a high level of security as data was distributed to the storage cloud.
Symform’s Cloud Control showed effective multi-tenancy support for customers accessing information about their data protection environments.
Symform’s data in the cloud showed effective durability as disruption to contribution nodes had no effect on restoration of backed up files.
Issues to Consider
When planning for data protection with Symform, it is important to take into account the total capacity required for the solution. Since, for redundancy, 32 parity fragments are added to a set of 64 fragments, the total free storage space needed to contribute to the cloud comes to 1.5 times the actual storage earmarked for backup.
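The 1.5x figure follows directly from the fragment counts, since 96 fragments are stored for every 64 fragments of source data. A small planning helper makes the arithmetic explicit; the function name and the use of gigabytes are illustrative choices, not part of Symform’s tooling.

```python
DATA_FRAGMENTS = 64     # data fragments per block, per the report
PARITY_FRAGMENTS = 32   # parity fragments per block, per the report


def required_contribution_gb(backup_gb: float) -> float:
    """Free space to contribute for a given amount of data earmarked for backup."""
    overhead = (DATA_FRAGMENTS + PARITY_FRAGMENTS) / DATA_FRAGMENTS  # 96 / 64 = 1.5
    return backup_gb * overhead


if __name__ == "__main__":
    for backup_gb in (100, 500, 2000):
        print(backup_gb, "GB to back up ->", required_contribution_gb(backup_gb), "GB to contribute")
```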
The current release of Symform’s Cloud Storage is available for Windows platforms only. Support for Linux and Mac operating systems is planned for future releases.
Symform’s Cloud Control database is hosted by a third-party SAS70 service. While the database is replicated for high availability and snapshots are kept for disaster recovery, disruptions that can impact the availability of the database can occur. Customers should evaluate the hosting party’s SLA for security and availability when designing a data protection strategy.
The Bigger Truth
Recent ESG research reveals that improving data backup and recovery continues to be a top five spending priority.3 Data protection strategies are constantly evolving as new technologies that promise to change the landscape of how data is stored, secured, and recovered are introduced. This comes at an opportune time as companies large and small are struggling to keep up with the protection demands created by rapid data growth. As these organizations start to re-evaluate their options for disaster recovery and business continuance, they increasingly look to emerging solutions to provide answers to evolving problems.
Symform provides a cost effective solution for those looking for a new approach to data protection. Small to medium-sized companies can benefit from a solution that is easy to install and configure and requires no expertise to maintain. Even companies with a solid strategy for onsite storage for high availability for critical data can benefit from a solution that augments this strategy with an offsite storage offering that can provide a rapid return to business in the event of loss of onsite data.
Using a storage exchange with its customers, Symform has changed the paradigm in how companies look at the cloud for their data storage needs. Its unique design allows it to maintain a highly scalable data cloud storage network without the costs associated with building data centers to accommodate massive data growth. These savings allow Symform to deliver data protection at a fraction of what customers would pay for other cloud storage offerings.
Cloud offerings that maintain a central data center must deal with the familiar challenges enterprises face today with their own data centers: how to replicate data to avoid single points of failure and how to effectively protect backed up data that is stored and accessed from one location. Customers concerned about securing their backup data in the cloud will find Symform’s decentralized design to be an effective solution to address the integrity, availability, and confidentiality of protected data. Dispersing strongly encrypted fragments of data with parity throughout the Symform storage cloud provides strong protection against theft, damage, or loss of data assets. Those looking for a unique and cost effective methodology for data protection will find a fresh approach with Symform’s Cloud Storage. ESG Lab was able to confirm a strong level of security built in to the solution, from the moment data is identified locally for protection to the end nodes where the backed up data resides. With security, durability, and high availability built in to a low cost cloud storage framework, Symform delivers a compelling solution for businesses of all sizes.
Appendix
Table 1. ESG Lab Test Bed
Symform Cloud Storage
Servers
Synchronization node: Intel x64-based with 8 GB RAM, Windows 2008 R2
Contribution node (2 servers): Intel x64-based with 8 GB RAM, Windows 2008 R2