
EMC SOLUTION FOR SPLUNK

Splunk validation using all-flash EMC XtremIO and EMC Isilon scale-out NAS

ABSTRACT

This white paper provides details on the validation of functionality and performance of Splunk technologies using EMC XtremIO and EMC Isilon.


To learn more about how EMC products, services, and solutions can help solve your business and IT challenges, contact your local representative or authorized reseller, visit www.emc.com, or explore and compare products in the EMC Store.

Copyright © 2015 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.


TABLE OF CONTENTS

EXECUTIVE SUMMARY

OBJECTIVES

AUDIENCE

INTRODUCTION

EMC XTREMIO FOR SPLUNK HOT AND WARM BUCKETS

EMC ISILON SCALE-OUT NAS FOR SPLUNK COLD AND FROZEN BUCKETS

SPLUNK

SPLUNK OVERVIEW

SPLUNK ARCHITECTURE

SPLUNK VALIDATION OVERVIEW

XTREMIO BONNIE++ PERFORMANCE TESTING

CLARIFICATION ON BONNIE++ AND INLINE DATA REDUCTION

RESULTS

THE EMC XTREMIO & ISILON SPLUNK SOLUTION


EXECUTIVE SUMMARY

The Big Data market continues to grow, with a greater than 40% year-over-year increase in 2014. The main driving force behind this growth is the use of analytics to gain valuable insight from new and existing types of data sources, resulting in increased productivity, profitability, customer satisfaction, and competitive advantage.

Splunk has become a leader in this space, with over 9,000 customers in 100 countries. Splunk provides the capability to mine machine-generated data and turn it into valuable insights. Splunk can take any data, any log, from anywhere in your infrastructure and add it to a searchable, intelligent index from which you can extract meaningful information about what is happening. Splunk calls this Operational Intelligence, which is aimed at three main use cases:

• IT Operations: Utilization, Capacity Growth

• Security: Fraud Detection, Real-time Detection of Threats, Forensics

• Internet of Things (IoT): Sensor Data, Machine-to-Machine, Machine-Human Interactions

Machine data is generated by applications, networking devices, host and server logs, mobile devices, and more. Splunk not only captures this information but also searches and analyzes it, including real-time feeds: Splunk captures and indexes the data and lets you run searches on the live data as it streams. Splunk can analyze and provide insight into issues and problems in minutes instead of hours. This analysis can give you a better understanding of your operational environment, reveal patterns, correlate events from multiple sources, and reduce the time needed to detect important events.

As customers adopt Splunk and take advantage of these compelling features, managing the underlying direct-attached storage (DAS) infrastructure becomes challenging. Maintaining consistent performance, and using flash so that end users get fast query and search response from the Splunk Dashboard, starts to demand significant design and troubleshooting time. In addition, longer retention periods consume more data center floor space and add management overhead.

This EMC Splunk reference architecture offers a solution that provides the capabilities and features to easily and economically scale Splunk within an IT infrastructure. By combining the high performance and linear scalability of XtremIO® with the multi-protocol linear scalability of Isilon®, customers can feel confident supporting large-scale Splunk deployments that grow to terabytes of ingest per day. This white paper provides insights into a cost-effective, scalable, and flexible infrastructure that combines the value of EMC’s Splunk Reference Architecture with the operational intelligence of the Splunk ecosystem.

OBJECTIVES

The key objectives of this white paper are:

• Validate Splunk high-scale throughput and IOPS with an architecture that includes XtremIO and Isilon

• Prove the Splunk scale-out capabilities of this architecture by starting at 500GB ingest/day and scaling to 1TB ingest/day

AUDIENCE


INTRODUCTION

This white paper focuses on supporting a large-scale Splunk ecosystem (an ingest rate of 500GB to 1TB+ per day). It demonstrates consistent, linear performance for a large Splunk deployment and shows that the deployment can grow and scale. The intent is to show that customers can confidently analyze more and more of their IT environment without worrying whether the underlying infrastructure can keep up. The advantages of this reference architecture over traditional large-scale DAS deployments are consistent performance at scale, lower management overhead, data efficiencies, and data center environmental benefits (density, power, cooling, etc.). This reference architecture is built around two EMC solutions: All-Flash EMC XtremIO and EMC Isilon Scale-out NAS. Their advantages and strengths in relation to Splunk are outlined below.

EMC XTREMIO FOR SPLUNK HOT AND WARM BUCKETS

EMC XtremIO is a scale-out all-flash array that provides predictable, consistently low-latency performance. XtremIO provides always-on inline data reduction services such as data deduplication and data compression. The simple all-flash design of XtremIO requires significantly less administrative overhead than local server storage for hot and warm buckets. XtremIO inline data reduction makes it possible to use Splunk clustered indexers without the usual additional disk overhead, because XtremIO reduces the capacity consumed by the clustered copies. By combining Splunk clustered indexers with XtremIO, administrators get application-level protection as well as XtremIO Data Protection (XDP). This can avoid lengthy, performance-impacting data or index rebuilds in the event of disk failures.

Key benefits of using the XtremIO scale-out all-flash array for hot/warm storage include:

• Linear, simple scalability up to 90TB of all-flash capacity in a highly available architecture.

• Enterprise-rich features such as double-parity data protection, inline data reduction, inline data compression, and no-impact snapshots.

• Access via Fibre Channel or iSCSI, with boot-from-SAN support.

• Data-at-rest encryption with self-encrypting drives.

EMC ISILON SCALE-OUT NAS FOR SPLUNK COLD AND FROZEN BUCKETS

Acting as the Data Lake Foundation, the center of an analytics ecosystem, EMC Isilon provides a highly scalable, flexible, and secure storage system that protects data and optimizes the flow of information within an organization without sacrificing application performance. The Isilon OneFS operating environment provides the specialized data protection, data security, compliant retention, and simple, massive scalability required for long-term retention.

Key benefits of using Isilon scale-out NAS for cold storage include:

• Linear, simple scalability up to 50PB in a highly available architecture. Your Splunk cold bucket can start with a smaller footprint and easily scale as your Splunk environment grows.

• Significantly lower administrative overhead than local server storage, because administrators can grow capacity without configuring more physical servers and storage.

• Unmatched efficiency, with over 80% storage utilization, to reduce IT capital investment requirements.

• Enterprise-rich features such as snapshots, WORM retention, encryption, multi-tenancy, and deduplication.

• Multi-protocol access, including but not limited to SMB, NFS, Object, and HDFS, which allows Hunk functionality to be leveraged.

• The option to use Isilon automated tiering to further lower the TCO of cold data retention, and to use the Splunk frozen process to automate deletions and control data lifecycle management.


SPLUNK

SPLUNK OVERVIEW

The Splunk application provides the ability to search, analyze, and visualize data gathered from different sources in your IT infrastructure including applications, networking devices, host and server logs, mobile devices, and more. For each incoming data source, Splunk indexes the data into a series of events that you can view and search.

Splunk Overview

In summary, the power of Splunk is to:

• Collect data from anywhere

• Search and analyze everything

• Gain real-time operational intelligence
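The overview above says that Splunk indexes each data source into a series of searchable events. As a rough illustration only, the following Python sketch models that idea with a toy indexer: it assumes one event per line and builds a simple inverted index of lowercase terms. The names `index_events` and `search` are hypothetical, and Splunk's real indexing pipeline (event breaking, timestamp extraction, on-disk index files) is considerably more involved.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def index_events(raw: str) -> Tuple[List[str], Dict[str, Set[int]]]:
    """Toy indexer (hypothetical): one event per non-blank line plus an inverted index."""
    events = [line for line in raw.splitlines() if line.strip()]
    index: Dict[str, Set[int]] = defaultdict(set)
    for event_id, event in enumerate(events):
        # Split each event into lowercase word terms and record which events contain them.
        for term in re.findall(r"\w+", event.lower()):
            index[term].add(event_id)
    return events, dict(index)


def search(events: List[str], index: Dict[str, Set[int]], *terms: str) -> List[str]:
    """Return the events that contain every search term (AND semantics)."""
    if not terms:
        return list(events)
    matching_ids = set.intersection(*(index.get(t.lower(), set()) for t in terms))
    return [events[i] for i in sorted(matching_ids)]


raw_log = (
    "2015-06-01 12:00:01 host=web01 ERROR disk full\n"
    "2015-06-01 12:00:05 host=web02 INFO request ok\n"
    "2015-06-01 12:01:10 host=web01 ERROR timeout\n"
)
events, index = index_events(raw_log)
print(search(events, index, "error", "web01"))
```

Looking up each term in the index, instead of scanning every event, is what keeps searches fast as the volume of indexed data grows.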

SPLUNK ARCHITECTURE

The main architectural features for Splunk are its Web Interface, Apps, Forwarders, Indexers, and Search Heads.

The web interface, called Splunk Web, provides the ability to administer and manage the Splunk deployment, create searches, and create reports. Splunk Web is the primary interface for any Splunk user.

Splunk provides extensions through the use of Apps. For instance, an organization may need more specific networking or administration views. As another example, the EMC Isilon App provides a detailed view for your EMC Isilon cluster. The Forwarder forwards the data to either another forwarder or to an indexer.


tier requires high-performance, low-latency storage, which can be provided either by local disks in the index servers or by externally attached SAN storage; the latter is the focus of EMC’s XtremIO solution in this paper.

As data ages in the Splunk environment, Splunk can continue to tier it down into a Cold Bucket. The Cold Bucket is still searchable and is often used for long-tail searches, forensic analysis, or as a retention tier where less frequently accessed data can be kept at a lower cost while remaining searchable. The Cold Bucket is often served by externally attached storage accessed over NFS. NAS technologies offer an acceptable blend of performance and lower cost per TB, which is the focus of Isilon’s use in this reference architecture.

Data can also tier into a Splunk Frozen Bucket, but this data is no longer searchable and requires manual user action to bring the data back into Splunk Enterprise Buckets in order to be searchable. While customers sometimes choose to leverage Frozen Buckets to meet compliance retention requirements, the purpose of this paper is to show how Isilon’s massive scalability and competitive cost of ownership can empower customers to retain more data in their Cold Bucket so data is searchable and retained to meet any compliance or regulatory retention requirements. The graphic below describes Splunk Bucket concepts in more detail.

Splunk Index Buckets
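The bucket lifecycle described above can be sketched as a simplified roll policy in Python. This is a hypothetical model, not Splunk's implementation: the `max_warm` and `frozen_after_days` parameters loosely mirror the roles of the `maxWarmDBCount` and `frozenTimePeriodInSecs` settings in Splunk's `indexes.conf`, size-based rolling is ignored, and the comments map each tier to the storage used in this reference architecture.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Tier(Enum):
    HOT = "hot"        # open for writes; XtremIO in this architecture
    WARM = "warm"      # closed but recent; XtremIO
    COLD = "cold"      # older, still searchable; Isilon over NFS
    FROZEN = "frozen"  # no longer searchable; archived or deleted


@dataclass
class Bucket:
    name: str
    tier: Tier
    newest_event_age_days: float
    is_open: bool = False  # only meaningful for hot buckets


def apply_lifecycle(buckets: List[Bucket], max_warm: int, frozen_after_days: float) -> None:
    """Hypothetical, simplified roll policy applied in place."""
    # 1. Hot buckets that have been closed roll to warm.
    for b in buckets:
        if b.tier is Tier.HOT and not b.is_open:
            b.tier = Tier.WARM
    # 2. If there are more warm buckets than allowed, the oldest ones roll to cold.
    warm = sorted((b for b in buckets if b.tier is Tier.WARM),
                  key=lambda b: b.newest_event_age_days, reverse=True)
    for b in warm[:max(0, len(warm) - max_warm)]:
        b.tier = Tier.COLD
    # 3. Cold buckets whose newest event is older than the retention period roll to frozen.
    for b in buckets:
        if b.tier is Tier.COLD and b.newest_event_age_days > frozen_after_days:
            b.tier = Tier.FROZEN


buckets = [
    Bucket("b1", Tier.HOT, 0, is_open=True),
    Bucket("b2", Tier.HOT, 1),
    Bucket("b3", Tier.WARM, 10),
    Bucket("b4", Tier.WARM, 20),
    Bucket("b5", Tier.COLD, 400),
]
apply_lifecycle(buckets, max_warm=2, frozen_after_days=365)
print([b.tier.value for b in buckets])
```

In this model, a longer retention period keeps more buckets in the searchable cold tier, which is the capacity that Isilon is meant to absorb.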

The Search Head manages and directs search functions, such as distributing search requests to the indexer peers. After receiving results from the different peers, it merges them and returns the combined results to the user.
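The Search Head's role can be pictured as a scatter-gather pattern. The Python sketch below is a hypothetical illustration, not Splunk's distributed search protocol: `make_peer` stands in for an indexer peer that returns matching events newest first, and `search_head` queries all peers in parallel and merges their sorted results with `heapq.merge`.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Event = Tuple[int, str]              # (timestamp, raw event text)
Peer = Callable[[str], List[Event]]  # hypothetical indexer peer interface


def make_peer(events: List[Event]) -> Peer:
    """Hypothetical stand-in for an indexer peer: returns matching events, newest first."""
    def peer(query: str) -> List[Event]:
        hits = [e for e in events if query in e[1]]
        return sorted(hits, key=lambda e: e[0], reverse=True)
    return peer


def search_head(query: str, peers: List[Peer]) -> List[Event]:
    """Scatter the query to all peers in parallel, then merge their newest-first results."""
    with ThreadPoolExecutor(max_workers=max(1, len(peers))) as pool:
        partial_results = list(pool.map(lambda peer: peer(query), peers))
    # Each partial list is already sorted newest first, so a k-way merge preserves that order.
    return list(heapq.merge(*partial_results, key=lambda e: e[0], reverse=True))


peers = [
    make_peer([(1, "error a"), (5, "ok"), (3, "error b")]),
    make_peer([(4, "error c"), (2, "error d")]),
]
print(search_head("error", peers))
```

Because each peer returns results that are already sorted, the Search Head only needs a k-way merge rather than a full re-sort of everything the peers return.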

For the purpose of this white paper, we will focus on the Indexers and the Search Heads. The following is an example of the Splunk architecture.


SPLUNK VALIDATION OVERVIEW

For the validation of the Splunk ecosystem, a virtual environment was set up on a Cisco UCS blade infrastructure. Using a shared storage model demonstrates the ability to use a denser compute environment such as blade servers, which allows customers to reduce data center footprint costs, including power and cooling.

The Splunk environment was sized according to Splunk's recommended reference hardware guidelines, found at:

http://docs.splunk.com/Documentation/Splunk/latest/Capacity/Referencehardware

Each Indexer VM was configured with 12 vCPUs and 12GB of RAM.

Splunk recommends Bonnie++ for simulating Splunk indexing and querying, so it was used for this validation. Bonnie++ measures sequential disk throughput, which simulates Splunk indexing, and random read (seek) performance, which simulates Splunk searches. These values are reported below to show the capabilities of this architecture in a Splunk environment.

On each VM, Bonnie++ version 1.03e was installed from http://www.coker.com.au/bonnie++/bonnie++-1.03e.tgz. The following Bonnie++ command was used for testing:

bonnie++ -u root:root -d <destination_mount> -fb

where <destination_mount> is the mount point for the XtremIO storage.
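For reference, the following Python sketch builds the same Bonnie++ invocation as an argument list and annotates each flag. The helper name `bonnie_command` and the mount point `/mnt/xtremio` are hypothetical; the paper's combined `-fb` should be equivalent to passing `-f` and `-b` separately.

```python
import shlex
from typing import List


def bonnie_command(mount_point: str, user: str = "root", group: str = "root") -> List[str]:
    """Build the Bonnie++ invocation used in this validation (hypothetical helper)."""
    return [
        "bonnie++",
        "-u", f"{user}:{group}",  # run the benchmark as this user:group
        "-d", mount_point,        # directory to test: the XtremIO-backed mount point
        "-f",                     # fast mode: skip the per-character I/O tests
        "-b",                     # no write buffering: fsync() after every write
    ]


cmd = bonnie_command("/mnt/xtremio")  # hypothetical mount point
print(shlex.join(cmd))
```

To run it on an indexer, the list could be passed to `subprocess.run(cmd, check=True)`. Starting it simultaneously on every indexer, as in this validation, would also need a remote launcher such as ssh, which this sketch does not cover.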

For the overall testing strategy, the Splunk guideline that each indexer handles about 125 GB per day was followed. The first series of tests was performed with 4 indexers to simulate an ingest rate of 500 GB per day. The next series was performed with 8 indexers to simulate an ingest rate of 1 TB per day. The Bonnie++ commands were run simultaneously on all 4 indexers and then on all 8 indexers.
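The sizing rule above is simple arithmetic: divide the daily ingest by roughly 125 GB per indexer and round up. A minimal Python sketch, assuming 1 TB is treated as 1,000 GB (which matches 8 × 125 GB); the helper name `indexers_needed` is hypothetical.

```python
import math

GB_PER_INDEXER_PER_DAY = 125  # per-indexer guideline used in this validation


def indexers_needed(daily_ingest_gb: float, per_indexer_gb: float = GB_PER_INDEXER_PER_DAY) -> int:
    """Smallest number of indexers so that each handles at most per_indexer_gb per day."""
    return math.ceil(daily_ingest_gb / per_indexer_gb)


for ingest_gb in (500, 1000, 1200):
    print(f"{ingest_gb} GB/day -> {indexers_needed(ingest_gb)} indexers")
```

Rounding up matters for ingest rates that are not an exact multiple of the per-indexer guideline, for example 1,200 GB per day.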

XTREMIO BONNIE++ PERFORMANCE TESTING

A test environment focused on hot/warm bucket disk I/O performance on the XtremIO X-Brick was set up with the following components:

For 500GB ingest:

• (1) XtremIO 3.0.2 10TB X-Brick

• (4) vSphere 5.5 hosts configured with CentOS Linux release 7.1.1503 (Core)

• Isilon X410 cluster

For 1TB+ ingest:

• A second XtremIO X-Brick, added online with automatic expansion

• (8) vSphere 5.5 hosts configured with CentOS Linux release 7.1.1503 (Core)

• Isilon X410 cluster

Test topology: for 500 GB ingest, the ESX server virtual machines splunk-a-indx01 to splunk-a-indx04 use an XtremIO X-Brick for the hot/warm bucket and the Isilon X410 cluster for the cold bucket; for 1 TB ingest, splunk-b-indx05 to splunk-b-indx08 are added, with a second XtremIO X-Brick for the hot/warm bucket and Isilon for the cold bucket.

CLARIFICATION ON BONNIE++ AND INLINE DATA REDUCTION

Bonnie++ is the most widely used benchmarking tool for Splunk. For its put_block (write) and rewrite tests, it generates data that contains many duplicate blocks within each file. XtremIO’s inline data reduction engine eliminates these duplicates (and compresses what remains), which enables the array to process more data than it could with a real Splunk data set. Hence, the bandwidth seen with Bonnie++ may be artificially higher than in a production Splunk environment. This effect can be multiplied when multiple instances of Bonnie++ run against a single XtremIO array, as was the case in this testing.
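The effect described above can be illustrated with a minimal Python sketch of block-level deduplication: split data into fixed-size blocks, fingerprint each block, and count unique fingerprints. The 8 KB block size and SHA-256 fingerprint are assumptions for illustration and are not a description of XtremIO's internal implementation; compression is not modeled.

```python
import hashlib
import os

BLOCK_SIZE = 8 * 1024  # assumed fixed block size, for illustration only


def dedup_ratio(data: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Logical blocks divided by unique blocks, using SHA-256 content fingerprints."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return 1.0
    unique_fingerprints = {hashlib.sha256(block).digest() for block in blocks}
    return len(blocks) / len(unique_fingerprints)


# 100 identical blocks, like a benchmark file built from repeated content.
repetitive = b"A" * (BLOCK_SIZE * 100)
# 100 random blocks, a stand-in for data with no repeated blocks.
distinct = os.urandom(BLOCK_SIZE * 100)

print(dedup_ratio(repetitive))
print(dedup_ratio(distinct))
```

A 100:1 ratio means the array stores one physical block for every hundred logical ones, which is why benchmark bandwidth on highly repetitive data can overstate what a real Splunk data set would achieve.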


RESULTS

The XtremIO system easily handled the load: going from 4 to 8 indexers, write (put_block) throughput scaled from about 2.2GB/s to 4.3GB/s, read (get_block) throughput scaled from about 2.4GB/s to 5.1GB/s, and random seeks scaled from about 25K to 45K IOPS. This is well above the Splunk minimum requirements for disk I/O.

Bonnie++ Results for 4 Indexers and 8 Indexers

Splunk VMs    put_block (MB/s)   rewrite (MB/s)   get_block (MB/s)   seeks (IOPS)
4 indexers    2227               1190             2360               24708
8 indexers    4298               2648             5144               45151
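The scaling in the results table can be quantified by dividing each 8-indexer value by the corresponding 4-indexer value; a ratio of 2.0 would mean throughput doubled along with the indexer count. The following Python sketch uses only the numbers from the table; the `scaling` helper is hypothetical. Note that the 8-indexer configuration also added a second X-Brick, so the ratios reflect both changes.

```python
from typing import Dict, Tuple

# 4-indexer and 8-indexer Bonnie++ results, copied from the results table.
RESULTS: Dict[str, Tuple[float, float]] = {
    "put_block (MB/s)": (2227, 4298),
    "rewrite (MB/s)": (1190, 2648),
    "get_block (MB/s)": (2360, 5144),
    "seeks (IOPS)": (24708, 45151),
}


def scaling(four: float, eight: float) -> Tuple[float, float]:
    """Return (8-vs-4 ratio, ratio as a fraction of ideal 2x scaling)."""
    ratio = eight / four
    return ratio, ratio / 2


for metric, (four, eight) in RESULTS.items():
    ratio, fraction_of_linear = scaling(four, eight)
    print(f"{metric:18} {ratio:.2f}x ({fraction_of_linear:.0%} of linear)")
```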

Bonnie++ Scaling from 4 Indexers to 8 Indexers

Bonnie++ IOPS scale from 4 Indexers to 8 Indexers

Customers can be confident that the XtremIO and Isilon platform easily handles ingesting TBs of data. In addition, customers could reduce the number of Splunk indexers required to support a given ingest rate, saving compute resources as well as data center space, power, and cooling.


THE EMC XTREMIO & ISILON SPLUNK SOLUTION

The scale-out architectures of EMC XtremIO and EMC Isilon make them an ideal fit for Splunk's demanding requirements: intensive workloads for hot and warm data, along with the ever-expanding capacity requirements of cold and frozen data.

By addressing these key Splunk priorities separately, the solution allows customers to implement what best fits their needs, without the contention across the hot/warm, cold, and frozen tiers that would be found in DAS or single-platform appliance solutions.

Importantly, it also gives customers the flexibility to expand hot/warm or cold/frozen capacity independently, and it protects against the limitations and bottlenecks found in traditional architectures at scale.

CONCLUSION

Deep insight into new or previously ignored data sources has given corporations a competitive advantage, as they are able to improve productivity, profitability, customer experience, and retention. Splunk is a leading platform in this space that enables collection, analysis, and real-time insights into data sources. As customers take advantage of these capabilities and increase the volume of data they analyze, supporting the performance, reliability, and security of the underlying infrastructure becomes critical. The EMC Splunk reference architecture composed of XtremIO and Isilon not only meets these requirements but does so with the right economic model, leveraging key features such as data efficiencies and data-at-rest encryption.
