PerfAccel (TM) Performance Benchmark on Amazon: DataStax Enterprise, powered by Apache Cassandra (TM)

Disclaimer:

All of the documentation provided in this document is copyright Datagres Technologies Inc. Datagres PerfAccel is a patent-pending technology from Datagres Technologies Inc. Information in this document is provided in connection with Datagres products. No license, express or implied, by estoppel or otherwise, to any Datagres intellectual property rights is granted by this document. Except as provided in Datagres's Terms and Conditions of Sale for such products.

EXECUTIVE SUMMARY

NoSQL databases, cloud deployments and SSDs are some of the buzzwords that dominate current IT infrastructure conversations, and for good reasons. More and more deployments are moving to the cloud, and a vast portion of these deployments include the current breed of non-relational databases, or NoSQL as they are mostly referred to. Solid State Devices are gaining popularity too, given the low-latency, high-throughput option they present. All cloud providers now have SSDs as part of their offerings, as direct attached devices and also as network attached devices.

The direct attached devices are fast but available only in limited capacity on the instances. Larger direct attached SSDs are only available with larger and more expensive instance types. This paper presents how PerfAccel can be used with a cloud deployment of Cassandra to provide improved performance at lower cost, and as a way to make better use of cloud instance storage.

NOSQL DATABASES AND I/O PERFORMANCE ISSUES

NoSQL, or Not Only SQL as it is referred to in some contexts, is a method of data management that differs from the more traditional relational model of data storage. NoSQL has evolved out of the need to provide simplicity in design, on-demand horizontal scaling and finer control over consistency, availability and partition tolerance, which are some of the key requirements of modern applications. Adoption of NoSQL databases has increased with the growth in dataset sizes, also referred to as Big Data, and with applications trying to store data in a way that makes application design and development simpler. It is a good alternative to a normalized relational data model, which causes impedance mismatch, slows application design and development, and makes feature expansion difficult. Traditional databases typically scale vertically. NoSQL databases, however, scale out and on demand. This is critical for most cloud-based deployments, which require rapid scaling at times of high load.

The horizontal, or scale-out, model used by NoSQL databases presents a new dimension to I/O performance handling. Database I/O optimization techniques that worked well with the vertical scaling model are not completely applicable to the horizontal scaling approach. Hence, there is a need to apply different techniques to achieve better throughput and latencies in cloud deployments.

NoSQL databases are particularly prone to the performance cliff, which happens when the working set of the application exceeds the system RAM. Most of the data access patterns of NoSQL databases and the applications that use them are random, so even with considerate design and a good choice of primary key to drive a working data set, I/O performance issues crop up all the time.
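The performance cliff can be illustrated with a simple weighted-average model. The sketch below assumes uniformly random reads, so that the fraction served from memory is roughly RAM size divided by working set size; the function names and all sizes and latencies are hypothetical and are not measurements from this paper.

```python
def hit_ratio(ram_gb: float, working_set_gb: float) -> float:
    """Fraction of uniformly random reads that can be served from RAM."""
    if working_set_gb <= 0:
        raise ValueError("working set size must be positive")
    return min(1.0, ram_gb / working_set_gb)


def mean_read_latency_us(ram_gb: float, working_set_gb: float,
                         ram_us: float, disk_us: float) -> float:
    """Average read latency as a hit-ratio-weighted mix of RAM and disk latency."""
    h = hit_ratio(ram_gb, working_set_gb)
    return h * ram_us + (1.0 - h) * disk_us


if __name__ == "__main__":
    # Hypothetical node: 64 GB RAM, 1 us memory reads, 5000 us disk reads.
    for working_set_gb in (32, 64, 72, 128, 256):
        latency = mean_read_latency_us(64, working_set_gb, ram_us=1.0, disk_us=5000.0)
        print(f"working set {working_set_gb:>3} GB -> mean read latency {latency:8.1f} us")
```

Under these assumptions the average latency stays flat while the working set fits in RAM and rises sharply as soon as it does not, which is the cliff described above.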

Cassandra is one of the leading NoSQL databases. It is preferred for its high availability and high scalability without any compromise on performance. While the architecture of Cassandra provides many great features, they come at a cost in I/O performance unless high-speed disks (SSDs) are used. This is due to the way Cassandra manages updates to existing keys. Cassandra follows a log-structured update model: updates are written sequentially to new immutable files, older versions of the data are left in place as stale entries, and deletions are marked with tombstones. The compaction process takes care of removing the stale entries. This additional task of removing stale entries and compacting the tables costs additional reads and writes on an IOPS-constrained data disk. This, coupled with the limited network bandwidth on cloud instances, is one reason why best practice suggests that local SSD-based instance storage be used for storing the Cassandra dataset. However, running an all-SSD system substantially increases the cost of deployment.
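The log-structured update model can be sketched in a few lines. The toy store below is an illustration only, not Cassandra's implementation: the class name, the in-memory dictionaries standing in for on-disk files, and the flush threshold are all assumptions made for the example.

```python
from typing import Dict, List, Optional

TOMBSTONE = object()  # marker written in place of a value when a key is deleted


class LogStructuredStore:
    """Toy log-structured store: writes never modify an already flushed segment."""

    def __init__(self, flush_threshold: int = 2) -> None:
        self.memtable: Dict[str, object] = {}
        self.segments: List[Dict[str, object]] = []  # oldest first, immutable once flushed
        self.flush_threshold = flush_threshold

    def put(self, key: str, value: object) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def delete(self, key: str) -> None:
        self.put(key, TOMBSTONE)

    def flush(self) -> None:
        """Write the in-memory table out as a new immutable segment."""
        if self.memtable:
            self.segments.append(dict(self.memtable))
            self.memtable = {}

    def get(self, key: str) -> Optional[object]:
        # Search the newest data first so that later writes shadow older ones.
        for table in [self.memtable] + self.segments[::-1]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self) -> None:
        """Merge all segments, keeping only the newest live value per key."""
        merged: Dict[str, object] = {}
        for segment in self.segments:  # oldest to newest, newer entries overwrite
            merged.update(segment)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.segments = [live] if live else []

    def stored_entries(self) -> int:
        return sum(len(segment) for segment in self.segments)
```

Compaction has to read every segment and write a new one, which is where the extra read and write load on the data disk comes from.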

NoSQL databases run better with low latency SSD systems

PerfAccel provides the best combination of price performance

NoSQL databases scale horizontally presenting different IO challenges

Cassandra compaction process creates high IOPS requirement

CLOUD INSTANCES AND STORAGE TYPES

Cloud instances come in different forms and compositions. They vary in RAM size, number of CPUs and the type of storage devices attached. Instances are typically configured to be optimized for memory, CPU or storage. The choice depends on the use case and application behaviour.

Figure 1 - Instance type categories for Amazon EC2

Two basic types of storage are available for use within cloud instances: storage that is directly attached to the hosted virtual machine instance, and remote storage served over the network. Remote storage can be served from magnetic media or from a bank of SSDs. Similarly, directly attached storage can be SSD or magnetic disk. SSD storage is faster than magnetic storage, especially when the SSD is locally attached. We describe the storage choices as Amazon EC2 provides them; other cloud providers have similar offerings.

Figure 2 - Storage types available on Amazon EC2

USAGE OF INSTANCE STORE DISKS / LOCALLY ATTACHED DISKS

The instance store disks, or locally attached disks, which are typically SSD devices in a storage-optimized instance, are very different from the rest of the storage devices. These disks are ephemeral in nature and can lose data on a hard reset or if the base machine restarts. This makes using these disks challenging. Since the data cannot survive hard resets and restarts, the data on these devices needs to be backed up at all times on a more reliable device type.

Because of this difficulty, instance store disks are either not used at all or used only as temporary storage, which limits their usefulness. The only other alternative is for the application using the disk to handle the ensuing loss of data seamlessly.

PERFACCEL SOLUTION

PerfAccel presents a unique solution that provides deep analytics to observe I/O behavior, helping determine better data placement and improve the performance of NoSQL database deployments. In addition, using its intelligent caching capabilities, PerfAccel can deliver much higher performance. The result is a significant reduction in infrastructure costs while providing rich analytics and much higher performance.

PerfAccel supports acceleration of all I/O across multiple platforms. It supports NAS, SAN and DAS to provide a seamless performance benefit to all types of applications. Configurable caching policies ensure that the right working set resides in the cache for maximum performance benefit. It is extremely easy to deploy and manage, and the in-depth analytics help users understand the application's I/O pattern and I/O footprint in order to optimize workloads. A GUI-based management console helps in managing large grid deployments, with a centralized data repository for analytics.

PerfAccel can be used to take advantage of instance store SSDs in a manner that is beneficial in two ways. Firstly, neither the application nor any tool needs to ensure that data lost at reboot is placed back on the faster device before the application can start using it. Secondly, PerfAccel can use the device much more efficiently by ensuring that the hot data resides on the faster storage, providing much better performance from the same instance type.

PerfAccel uses the faster device as a cache and ensures optimal placement of frequently used hot data. The application benefits directly, since all reads served from this device are much faster, increasing performance and reducing latency. In addition, since these read operations are offloaded by the cache, the backend storage device that holds the entire dataset is more responsive, as it has to serve fewer IOPS. Thus the PerfAccel cache not only improves read performance, it also implicitly improves the write performance of the application.
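The effect of a read cache placed in front of a durable backing store can be sketched generically. The code below is not PerfAccel's algorithm: the class names, the LRU eviction policy and the invalidate-on-write behaviour are assumptions chosen to keep the example small. It shows the two properties discussed above: cache hits never reach the backing store, and losing the cache contents costs performance but not data.

```python
from collections import OrderedDict
from typing import Dict


class BackingStore:
    """Stands in for the durable volume that always holds the full dataset."""

    def __init__(self, data: Dict[str, bytes]) -> None:
        self.data = dict(data)
        self.reads = 0
        self.writes = 0

    def read(self, key: str) -> bytes:
        self.reads += 1
        return self.data[key]

    def write(self, key: str, value: bytes) -> None:
        self.writes += 1
        self.data[key] = value


class ReadThroughCache:
    """LRU read cache on a fast but ephemeral device (hypothetical policy)."""

    def __init__(self, backend: BackingStore, capacity: int) -> None:
        self.backend = backend
        self.capacity = capacity
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.backend.read(key)  # only misses cost a backend read
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value

    def write(self, key: str, value: bytes) -> None:
        # Writes always go to the durable store; any cached copy is dropped.
        self.backend.write(key, value)
        self.entries.pop(key, None)

    def lose_cache(self) -> None:
        """Simulate the ephemeral device being wiped by a restart."""
        self.entries.clear()
```

Because every write lands on the backing store, wiping the cache only means that the next reads are misses until the hot data has been fetched again.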

Instance Store SSDs are ephemeral and can lose data

Limitations create problems for getting most out of the system

PerfAccel provides:

• Storage visibility through deep file-level analytics
• Intelligent caching & deterministic placement of hot files
• Higher performance using fewer SSDs used optimally
• Increased scale by leveraging spinning disks

TEST AND BENCHMARK CONFIGURATION

The following tests were performed on Amazon EC2:

• Run a workload with the entire dataset residing on a provisioned IOPS, SSD-backed EBS (optimized) volume.
• Run the workload with the entire dataset residing on a locally attached SSD.
• Run the workload with the entire dataset on a provisioned IOPS, SSD-backed EBS (optimized) volume, which is cached on a locally attached SSD using PerfAccel.

Benchmark

• DataStax’s Cassandra as the NoSQL database.
• Cassandra-stress as the tool to generate load on the database.

Figure 3 - Test/Benchmark Configuration

Chosen Instance Types

• The chosen instance types are the same in all respects, except for the size of the instance store SSD attached to them.
• The r3.2xlarge instance has a large cost advantage, as it is more than 50% cheaper than i2.2xlarge.
• The r3.2xlarge instances were used to run the workload with PerfAccel and with EBS.
• For running the workload with all the data on SSD, i2.2xlarge instances were used.
• While running the workload with EBS and with PerfAccel, the dataset was stored on an optimized EBS volume (general purpose SSD backed).

TEST RESULTS

Throughput Results

• With PerfAccel, throughput improves significantly compared to EBS alone.
• PerfAccel throughput is within 10-20% of direct SSD throughput.
• Write heavy throughput is almost the same in all cases, as the commit log resides on SSD in all the test cases.

Figure 5 - Throughput (Read Heavy)

Figure 6 - Throughput (Write Heavy)

Results - Latency for 95% of ops

• Latency numbers follow the same pattern as throughput.
• With the PerfAccel cache, the majority of the data is served from the SSD, which keeps latency low and performance good.
• Once again, for the write heavy workload, the difference is small.

Figure 7 - Latency for 95% of ops (Read Heavy)

Figure 8 - Latency for 95% of ops (Write Heavy)

Results - Latency for 99% of ops

• These are important latency numbers, showing that the majority of the operations benefit from the low-latency I/O of the SSD cache.
• The I/O performance follows the same pattern as throughput.
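For readers unfamiliar with percentile latencies, the sketch below shows one common way to derive a "latency for 95% of ops" figure from raw per-operation timings, the nearest-rank method. How cassandra-stress computes its reported percentiles is not stated in this paper, so this is an illustration of the concept only; the function name and sample data are hypothetical.

```python
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical latencies in ms: 97 fast operations and 3 slow ones.
    latencies_ms = [1.0] * 97 + [50.0] * 3
    print("p95:", percentile(latencies_ms, 95))
    print("p99:", percentile(latencies_ms, 99))
```

A small number of slow operations can leave the 95% figure untouched while dominating the 99% figure, which is why both are reported.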

PERFACCEL STATISTICS

The following PerfAccel statistics were collected at the end of the test run with PerfAccel.

Summary

Figure 11 - PerfAccel Summary Stats

Top Files in the Cache by Read Hits

Figure 12 - Top files in the cache by read hits

Top Files in the Cache by size of cache used

PERFACCEL ANALYTICS

Summary Graph

Figure 14 - PerfAccel advanced analytics summary graph

• The graph shows high read misses and low read hits in the earlier part of the run.
• At about the 20 minute mark, the read misses go down and the read hits start going up.
• A few cache cleanups are seen in the later part of the run.
• There are many write misses but no write hits, which means no data in the cache is being updated.

Inode Read Hits Graph

Figure 15 - PerfAccel advanced analytics Inode Read hits graph

• For the file with the most read hits, we look at the pattern of read hits over 60-second intervals.
• In the initial part of the run there are very few hits, as most of the file is not yet in the cache.
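Counting hits per 60-second interval is a simple bucketing operation. The sketch below shows the idea on a list of hit timestamps; the function name and the input format are hypothetical and do not describe how PerfAccel stores or exports its statistics.

```python
from collections import Counter
from typing import Iterable, List


def hits_per_interval(hit_times_s: Iterable[float], interval_s: int = 60) -> List[int]:
    """Count read hits in consecutive fixed-width intervals, starting at t = 0."""
    counts = Counter(int(t // interval_s) for t in hit_times_s)
    if not counts:
        return []
    last_bucket = max(counts)
    # Include empty intervals so that gaps in activity remain visible.
    return [counts.get(i, 0) for i in range(last_bucket + 1)]
```

For example, hits at 5 s, 70 s, 75 s and 190 s fall into the first, second, second and fourth one-minute intervals respectively, leaving the third interval empty.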

Inode Read Distribution Graph

• The inode read distribution graph shows the read pattern from the file.
• The pattern is clearly completely random.
• The overall pattern matches the previous graph, with more hits in the middle part of the run than at the start or end.

Figure 16 - PerfAccel advanced analytics File read activity graph

RETURN ON INVESTMENT

As seen above, an r3.2xlarge instance costs less than half as much as a full-SSD i2.2xlarge instance, so using it can reduce infrastructure costs by more than 50%. For a single i2.2xlarge node replaced with an r3.2xlarge instance, the cost savings are close to $9,000 on an annualized basis.

PerfAccel can enable significant cost savings by reducing the instance size and the number of instances: its storage intelligence makes use of fast SSD devices and can enable systems to handle much more load than they would otherwise be able to.
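The annualized saving per node is simple arithmetic on the hourly price difference. The sketch below shows the calculation; the hourly rates in it are hypothetical placeholders, not actual Amazon EC2 prices, and should be replaced with current pricing for the region in use.

```python
HOURS_PER_YEAR = 24 * 365


def annual_cost(hourly_rate_usd: float, nodes: int = 1) -> float:
    """Cost of running `nodes` instances continuously for one year."""
    return hourly_rate_usd * HOURS_PER_YEAR * nodes


def annual_saving(larger_rate_usd: float, smaller_rate_usd: float, nodes: int = 1) -> float:
    """Yearly saving from replacing the larger instance type with the smaller one."""
    return annual_cost(larger_rate_usd, nodes) - annual_cost(smaller_rate_usd, nodes)


def saving_fraction(larger_rate_usd: float, smaller_rate_usd: float) -> float:
    """Saving as a fraction of the larger instance's cost."""
    return 1.0 - smaller_rate_usd / larger_rate_usd


if __name__ == "__main__":
    # Hypothetical on-demand rates in USD per hour (placeholders only).
    full_ssd_rate, smaller_rate = 1.70, 0.70
    print(f"saving per node per year: ${annual_saving(full_ssd_rate, smaller_rate):,.0f}")
    print(f"saving fraction: {saving_fraction(full_ssd_rate, smaller_rate):.0%}")
```

The same functions scale to a cluster by passing the node count, which is where the savings become significant.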

SUMMARY

PerfAccel cache with instance store SSDs on smaller EC2 instances is a winning combination of performance and cost. For use cases where the entire dataset cannot reside on instance store, PerfAccel presents an excellent solution for making effective use of faster I/O media. PerfAccel is very easy to deploy and provides valuable, insightful analytics. Detailed analytics and configurable caching policies can further improve performance through optimal use of cache space.

About DataStax:

DataStax delivers Apache Cassandra™, the leading distributed database technology, to the enterprise. Apache Cassandra™ is built to be agile, always-on, and predictably scalable to any size.

With more than 400 customers in over 50 countries, DataStax is the database technology and transactional backbone of choice for the world’s most innovative companies such as Netflix, Adobe, Intuit, and eBay. Based in Santa Clara, Calif., DataStax is backed by industry-leading investors including Comcast Ventures, Crosslink Capital, Lightspeed Venture Partners, Kleiner Perkins Caufield & Byers, Meritech Capital, Premji Invest and Scale Venture Partners. For more information, visit DataStax.com or follow us @DataStax.

For more information, visit www.datastax.com.

About Datagres:

Datagres provides software that helps companies visualize, control and accelerate their application performance using deep storage intelligence. Datagres’ flagship product PerfAccel is a powerful analytics-driven software solution that operates at the file level and can show the exact I/O pattern of an application's data access, especially in a scale-out grid environment. As a result, it provides an effective way of controlling I/O and of accelerating applications for higher throughput and lower latencies using high-performance SSD devices.

The company is headquartered in Palo Alto, California and is venture-backed by Nexus Venture Partners. For more information, visit www.datagres.com.

Datagres Technologies Inc

2600 EL CAMINO REAL, Palo Alto, CA 94306 Phone: 510-402-4365
