Example Use Cases. Solving the Need for Speed in Data Ops. Doc Version 2.1

1 Introduction to VelociData

VelociData is leading today’s hardware-accelerated Big Data revolution, answering challenging business questions in a fraction of the time required by standard compute systems and meeting time-bound service level requirements. Fast business decision-making requires real-time, actionable intelligence from all relevant data sources. Completing time-bound operational processes in the face of increasing volumes, varieties, and velocities of data requires step-change improvements in transformation performance. VelociData provides data transformation, data quality, data platform offload, and data sort solutions on an ultra-high-performance appliance platform that enables real-time operational decisions and dramatic improvements in ETL processing performance. These cost/performance solutions are compatible with leading data integration tools and respond to global organizations’ growing demands while reducing total cost of ownership.

1.1 Solution Templates for Accelerating Data Ops

This document offers a quick introduction to VelociData’s technology by reviewing the best practices and solution templates that have proven successful in customer deployments. The use cases below have been selected to illustrate the various kinds of processing that VelociData can greatly accelerate. Although the selection is by no means exhaustive, it illustrates how dramatically applications can be accelerated by offloading the toughest performance challenges to VelociData. Customers often see speed-ups of 10, 100, or even 1000 times over traditional software-only approaches. These gains are quick and easy to demonstrate, and they can deliver substantial cost savings and improved data operations.

1.2 Extending the Life of Corporate Infrastructure

VelociData’s massively parallel but small footprint hardware appliances snap into existing data flows to perform line-rate processing. This processing can offload the most critical pain points in an infrastructure to ease the burden on existing servers. When the appliance shares the workload for an environment, the extra processing headroom returned to the offloaded platform helps to reduce the total cost of ownership against ever-increasing data loads.

Today, every Fortune 1000 organization contends with the challenge of taming a flood of data to gather valuable business information. The VelociData ETL / Data Integration Appliances dissolve the most critical performance pain points to deliver actionable business intelligence in real time. VelociData is working with leading ETL / Data Integration vendors to integrate these new solutions into existing engines so that users can deploy accelerated solutions without changing their existing data integration architecture (e.g., Press Release: VelociData and Informatica Partner to Fulfill Customers' Requirements to Affordable Hyper-Scale/Hyper-Speed Big Data Analytics).

2 Identifying the Right Use Cases to Prove the Value

Taking advantage of VelociData’s technology is easy and fast. VelociData was designed to integrate seamlessly within existing reference architectures. The time required for initial installation, implementation, and configuration is measured in hours, not weeks. Technical staff do not need retraining in new skill sets, and no new programming languages need to be learned.

VelociData contributes its expertise in the form of collaborations, webinars, and other events that educate the customer team on the technology and on best practices for taking advantage of the appliance. This process can be very lean, lightweight, and quick. As an example, a customer reflecting on the decision process said, "What's unique about VelociData is that you can prove the business and technical claims very quickly." VelociData usually recommends an educational session, such as a “Lunch & Learn” or “Coffee & Donuts”, to introduce its solutions to a wider audience. This session is not a sales pitch. Rather, it covers the nature of heterogeneous computing and why and how it has become a relevant innovation in enterprise computing. These sessions tend to prompt questions that help people connect the dots to possible internal use cases and help demystify the effort involved. Once the broader technical team is energized, identifying and testing a proof of concept becomes a natural and easy process for all involved.

VelociData helps guide teams through the structured process of:

1. Identifying the bottlenecks that are causing the worst problems in the data flow. Our team’s depth of experience in performance analysis across other well-known data transformation platforms allows us to quickly isolate areas in your workflow that can benefit from acceleration. In addition, VelociData has tools that can quickly identify existing bottlenecks by analyzing performance metrics in the metadata repository of existing ETL solutions.

2. Adapting workflows to utilize VelociData’s acceleration at the most critical performance points. Typically this resides in-line with existing systems, causing only minimal changes to the process.

3. Planning a Proof of Concept that will demonstrate how well the new solution can accelerate data operations. Our team can help select key success criteria that demonstrate value to the business and quickly attain “apples to apples” benchmarks to showcase performance and ROI.

At each step of the way, VelociData works with the team to ensure that plan execution goes smoothly and quickly. VelociData can help with testing, training, performance analysis, and other troubleshooting, and can walk you through the process of using VelociData to scale your existing architecture efficiently and effectively for high-volume data growth. This breathes new life into conventional architectures, protecting and extending existing investments as data volumes increase.

3 Proven Use Case Examples

3.1 Example 1: Complementary Data Integration Offload

The most common use case is to offload specific data integration transformations from existing data platforms. VelociData engineered solutions can extend the life of corporate infrastructure by offloading the most performance-hungry aspects of ETL; existing ETL infrastructures can remain intact. Our engineered solutions are capable of processing input streams ranging from unstructured text to structured record-oriented data at sufficient rates to completely saturate multiple 10 Gb/s lines. These accelerations can be applied to ETL, ELT, distributed, and MPP platforms.

3.1.1 Data Transformations

The VelociData appliance is capable of performing line-rate data transformations and data enrichments. Because these tasks run at 10 Gb/s line rates, improved data is delivered to the target with no measurable degradation in throughput.

The following are some examples of transformation tasks that can be chained together with each other and any other cleansing and validation steps without slowing the data flow.

• Lookup & Replace: Enrich data by populating fields from a master file, or convert values from an old dictionary to a new one (for example, Product ID lookups)

• Type Conversions: Inter-convert data elements from binary to character, and convert character encodings between platforms

• Format Conversions: Convert data between XML, delimited (CSV), or fixed-width, and rearrange, add, or drop fields to change layouts

• Key Generation: Hash multiple field values into a unique pseudo-key using MD5 or SHA

• Data Masking: Obfuscate data for delivery to non-production environments using persistent or dynamic masking; format-preserving encryption or format-preserving masking that leverages AES or SHA accordingly
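As a rough software illustration of how two of these transformations can be chained, the sketch below applies a Lookup & Replace step followed by a Key Generation step to each record in a stream. It is not VelociData's implementation or configuration syntax. The field names, the `PRODUCT_ID_MAP` table, and the function names are assumptions made for the example, and SHA-256 stands in for the "SHA" family mentioned above.

```python
import hashlib

# Hypothetical master file mapping old product IDs to new ones.
PRODUCT_ID_MAP = {"OLD-001": "P-1001", "OLD-002": "P-1002"}


def lookup_and_replace(record, field, table):
    """Return a copy of record with record[field] replaced via table.

    Values with no entry in the table are passed through unchanged.
    """
    updated = dict(record)
    updated[field] = table.get(record[field], record[field])
    return updated


def generate_key(record, fields, sep="\x1f"):
    """Hash several field values into one pseudo-key using SHA-256.

    The separator keeps ("ab", "c") and ("a", "bc") from hashing alike.
    """
    joined = sep.join(str(record[f]) for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def transform(records):
    """Chain both steps over a stream of records, one record at a time."""
    for record in records:
        record = lookup_and_replace(record, "product_id", PRODUCT_ID_MAP)
        record["row_key"] = generate_key(record, ["customer", "product_id"])
        yield record
```

Because `transform` is a generator, records flow through both steps without the full dataset being held in memory, which loosely mirrors the streaming model described in this section. A hash-based key is unique only in the sense that collisions are extremely unlikely, not impossible.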

3.1.2 Typical process for adopting VelociData’s solutions

To identify the best candidates for VelociData acceleration in ETL, the run times of individual transformation steps (stages) are reviewed. VelociData can help with this performance analysis by reporting on the operational statistics in the ETL server’s metadata repository. After looking at the performance numbers, it is usually evident which transformations would benefit most from VelociData acceleration. In one customer example, “Lookup and replace”, “Field validation”, “Bounds Checking”, and “USPS Address Standardization” appeared to be good candidates for acceleration. They were the bottlenecks contributing most to the length of the batch job. In other words, if these transformations could be sped up significantly, the overall batch job would run much faster.
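The reasoning in this step can be sketched in a few lines: rank the stages by their share of total run time, then estimate how much the whole job speeds up if only the slowest stages are accelerated. The function names and any stage timings fed to them are illustrative assumptions. They are not output from VelociData's analysis tools or from a real ETL metadata repository.

```python
def rank_stages(stage_seconds):
    """Return (stage, seconds, share_of_total) tuples, slowest stage first."""
    total = sum(stage_seconds.values())
    ranked = sorted(stage_seconds.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, secs, secs / total) for name, secs in ranked]


def overall_speedup(stage_seconds, accelerated, factor):
    """Estimate whole-job speedup when the named stages run `factor` times faster.

    Stages that are not accelerated keep their original run time, so they
    bound the overall gain (Amdahl's law).
    """
    total = sum(stage_seconds.values())
    new_total = sum(
        secs / factor if name in accelerated else secs
        for name, secs in stage_seconds.items()
    )
    return total / new_total
```

With made-up timings of 1800, 1200, and 600 seconds for three accelerated stages plus 400 seconds for an untouched load stage, a 100-fold speedup of the three stages shortens the whole job by roughly a factor of nine. This is why the stages that dominate the run time are the ones worth offloading first.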

Based on these recommendations, VelociData can be installed in front of, behind, or to the side of the ETL server in the workflow. In the scenario where VelociData sits in front of the ETL server, speedups happen before the ETL server even begins its processing. This configuration is simple and practical because the data passes through VelociData once and then continues to the ETL server. No existing interfaces need to be changed other than commenting out the bottlenecked transformations in the ETL server.

After these changes were made (taking only two hours in this instance), performance of the ETL transformations improved by more than 1000 times. For the end-user it was very simple and quick to put into place.

3.2 Example 2: Improving Data Quality

Another area where VelociData accelerations have shown great value is data quality operations. VelociData typically looks for the operations that take the longest time or the most resources and checks whether there is a faster, cleaner approach. Because VelociData uses a streaming data architecture, many quality checks can be done on the fly without slowing the workflow. The data does not even need to be stored on the appliance.

3.3 Example 3: USPS Address Standardization

The VelociData engineered solution is capable of validating and standardizing USPS addresses at wire-speed while correcting bad or incomplete data. This solution can standardize over 10 billion addresses per hour, which is 200 times faster than a competitive system (as deployed on a 64-bit RedHat server with 8 processing cores and 16 GB of memory). This high speed solution can be integrated into structured data flows and can be coupled with other data validations and corrections for greater offload capabilities.

3.4 Example 4: Data Cleansing and Validation

The VelociData solution offers a suite of data validation and correction engines that can be run on structured and semi-structured data. These validations offer such extreme performance they can be run on data in flight without slowing it down. The quality checks typically run at “line rates” even for 10 Gb network lines. Below are some examples:

• Standardization, verification and cleansing of USPS addresses

• Domain Set Validation, Null Checks

• Regular Expression Field Validation (validate format of email addresses, SSNs, dates, etc.)
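For readers unfamiliar with these check types, the following sketch shows null checks, regular expression field validation, and a domain set check applied to a single record in software. The field names, the patterns, and the `VALID_STATES` set are assumptions chosen for illustration. They are not VelociData's rules, and the patterns are deliberately simple rather than complete validators.

```python
import re

# Deliberately simple, illustrative patterns (not full validators).
FORMAT_RULES = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "signup_date": re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"),
}

# Hypothetical domain set for a "state" field.
VALID_STATES = {"MO", "IL", "KS"}


def validate(record):
    """Return a list of validation errors for one record (empty if clean)."""
    errors = []
    for field, pattern in FORMAT_RULES.items():
        value = record.get(field)
        if value is None or value == "":
            errors.append(f"{field}: null")            # null check
        elif not pattern.fullmatch(value):
            errors.append(f"{field}: bad format")      # regex field validation
    if record.get("state") not in VALID_STATES:
        errors.append("state: not in domain set")      # domain set validation
    return errors


def filter_valid(records):
    """Yield only the records that pass every check."""
    return (record for record in records if not validate(record))
```

Each record is checked independently of every other record, which is what makes this kind of validation a natural fit for a streaming architecture.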

An ETL workflow handling company-wide information for a Fortune 50 company initially ran overnight. This pushed the limits of the current solution and affected the SLAs of other workloads sharing the same environment. The use case involves validation and filtering of over 3 billion records. Through targeted offload of the slowest cleansing and validation tasks within this workflow, VelociData was able to reduce the overall runtime by an order of magnitude to just over 1 hour, creating new headroom for other applications in the shared environment and acting to future-proof the particular workflow against SLA infringement.

3.5 Example 5: Data Platform Offload

Another useful solution template applies VelociData to offload other data platforms that have been tasked with transformations or processing for which they were not designed. Mainframes are tremendous at transactional processing but less well suited to some other tasks and algorithms. They were not built for these types of transformations, whereas VelociData’s engineered system is purpose-built for them. Similarly, an MPP platform (e.g., Teradata, Netezza) is quite good at certain analytic workloads, but it is not as economical for more mundane ELT. Using a purpose-built solution with better price-performance for transformations can deliver substantial savings every year. In most deployments VelociData complements the existing platform rather than replacing it.

3.6 Example 6: Mainframe Offload & Acceleration

An IBM mainframe offload began with an EBCDIC to ASCII conversion where the data was destined for Hadoop as part of a large-scale analytics project. The customer gave VelociData their most complicated COBOL copybook. In the POC, VelociData took an EBCDIC data set with 1300 variable-length fields in 30 different record types with >10,000 COBOL redefines and fully unpacked/converted a 16 GB densely populated file (i.e., 9 million records) into Big Data-ready ASCII in less than a minute. When asked to demonstrate "the cost of change" by altering the input/output data formats, VelociData demonstrated how its product could be configured to handle the changes without any coding, in a matter of a few minutes. Offloading this effort from the customer's IBM mainframe saved nearly $200,000 per month, but the real strategic value will be gained by providing wire-speed, high volume analytics-ready data into Hadoop in support of a new money making service being introduced. Solving one begets the other.
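To make the conversion concrete, the sketch below shows the two basic operations involved in unpacking a mainframe record in software: decoding EBCDIC text and unpacking a COBOL COMP-3 (packed decimal) field. It assumes code page 037 (`cp037`, available in Python's standard codecs) and a tiny hypothetical two-field layout. It does not handle copybook parsing, variable-length fields, or REDEFINES, and it is not a description of how the appliance performs the work.

```python
from decimal import Decimal


def ebcdic_to_text(raw, codepage="cp037"):
    """Decode EBCDIC bytes to text (code page 037 is an assumption)."""
    return raw.decode(codepage)


def unpack_comp3(raw, scale=0):
    """Decode a COBOL COMP-3 field: two digits per byte, sign in the last nibble."""
    nibbles = []
    for byte in raw:
        nibbles.extend((byte >> 4, byte & 0x0F))
    *digits, sign = nibbles
    if any(d > 9 for d in digits) or sign < 0x0A:
        raise ValueError("not a valid packed-decimal field")
    value = int("".join(str(d) for d in digits))
    if sign in (0x0B, 0x0D):          # 0xB and 0xD mark negative values
        value = -value
    return Decimal(value).scaleb(-scale)


def convert_record(raw):
    """Convert one record of a hypothetical layout to delimited text.

    Assumed layout: PIC X(10) name, then PIC S9(3)V99 COMP-3 amount (3 bytes).
    """
    name = ebcdic_to_text(raw[:10]).rstrip()
    amount = unpack_comp3(raw[10:13], scale=2)
    return f"{name}|{amount}"
```

For example, the bytes `0x12 0x34 0x5C` unpack to 123.45 under this layout, because the trailing nibble `C` marks a positive value and the picture clause implies two decimal places.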

3.7 Example 7: Combining Mainframe Offload with Data Quality

In response to a customer request, VelociData has also offloaded and accelerated the processing of a custom mainframe record format. Each record of this format included a header section followed by thousands of key / value encoded data elements. For each of these records, VelociData parsed the key-value encoded section converting each element from various mainframe formats into printable ASCII that could be easily parsed by a downstream Hadoop system. This solution not only converts these elements to ASCII, but also filters out any key and value containing invalid or unset characters, and it trims extra whitespace from each element to compact the output file for faster downstream parsing. This solution ran at over 600 MB/sec, processing a 1.2 GB daily file in just 2 seconds.
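The filter-and-trim behavior described above can be sketched as follows. The customer's record format is not documented here, so the example assumes an already-decoded payload with `key=value` elements separated by semicolons. That encoding, the separators, and the function name are purely hypothetical.

```python
def clean_elements(payload, pair_sep=";", kv_sep="="):
    """Yield (key, value) pairs from a decoded payload.

    Whitespace around each key and value is trimmed, and elements that are
    unset (empty key or value) or contain non-printable characters are dropped.
    """
    for element in payload.split(pair_sep):
        key, found, value = element.partition(kv_sep)
        key, value = key.strip(), value.strip()
        if not found or not key or not value:
            continue                      # malformed or unset element
        if not (key.isprintable() and value.isprintable()):
            continue                      # invalid characters
        yield key, value
```

Dropping unusable elements and trimming padding before the data reaches Hadoop is what compacts the output file for faster downstream parsing.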

3.8 Example 8: Streaming Sort

Perform streaming data sort with VelociData at a million records per second on large datasets – even billions of records – without slowing down the rest of your processing. Accelerated sorting can be used to accelerate a variety of other applications, such as deduplication, mainframe sorts, JOINs, merges, indexing, aggregations, and Map/Reduce (Big Data / Hadoop).
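One common software technique for sorting a stream that is too large to sort in one pass is to sort fixed-size runs and then merge them. The sketch below uses that approach to show how a sorted stream makes deduplication trivial. It illustrates the general technique only and is not VelociData SORT. The function names are assumptions, and the runs are kept in memory for brevity where a real external sort would spill them to disk.

```python
import heapq
from itertools import islice


def sorted_runs(records, run_size, key=None):
    """Cut the input stream into runs of run_size records and sort each run."""
    iterator = iter(records)
    while True:
        run = sorted(islice(iterator, run_size), key=key)
        if not run:
            return
        yield run


def streaming_sort(records, run_size=100_000, key=None):
    """Return an iterator over all records in sorted order (k-way merge)."""
    runs = list(sorted_runs(records, run_size, key))
    return heapq.merge(*runs, key=key)


def deduplicate(sorted_records, key=lambda record: record):
    """Drop consecutive records with equal keys; input must be sorted by key."""
    previous = object()                   # sentinel that equals no real key
    for record in sorted_records:
        current = key(record)
        if current != previous:
            yield record
        previous = current
```

Once the stream is sorted, duplicates sit next to each other, so deduplication needs only a single pass and one remembered key. The same property underlies the merge, JOIN, and aggregation uses listed above.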

3.8.1 Comparisons with Existing Solutions

VelociData SORT was applied to a customer dataset of 3 million rows, each populated with 100 fields (800 bytes in total), sorted on keys drawn from three fields. The sort ran ten times faster than a leading industry application on an 8-core, 64 GB system. A large insurance company found that performance was at least 20 times faster than its existing solution on files with over 500 million rows.

3.9 Example 9: Encryption and Data Masking

As one final pattern of where an engineered solution can improve data operations, consider today’s security and compliance challenges. Regulatory compliance requirements and increased security needs have opened up new avenues for corporations to find innovative, economical ways to meet their goals. VelociData’s approach to encryption and data masking is through a streaming data architecture. No data is stored on hard drives and no plaintext data resides in regular memory. What if you could mask or encrypt your data anywhere at wire speeds?

3.9.1 Encryption

The VelociData appliance delivers the ability to encrypt and decrypt data faster than it can be transferred through a network. VelociData has demonstrated the ability to secure data with strong encryption (up to 256 bit AES) at over 2 GB per second, which can more than saturate two 10 Gb/s channels. This security can be offered for a full data stream or for selected fields within structured data. VelociData can also deliver key rotations and key changes without slowing the data processing flow.

3.9.2 Format-Preserving Encryption & Format-Preserving Masking

VelociData offers this high-speed capability in an extremely valuable format-preserving mode. This mode, which conforms to the NIST SP 800-38G specification, allows users to encrypt (reversibly) or mask (irreversibly) data without changing its field specification. It is applicable to local targets and to private or public clouds. A data set containing 10 million records with ten sensitive fields in each record can be masked or secured in seconds, compared with a day using conventional approaches. One of the most desirable features of format preservation is that databases do not require schema changes to store a protected column, and analytics tools continue to function normally.

Masking differs from encryption in that masked values can never be reversed to the original plaintext. Masking is used to generate non-production datasets, for example for QA/testing or for moving a dataset to the cloud. VelociData was used in a format-preserving data masking use case to improve QA testing data samples. In a recent proof of concept, VelociData was given a densely populated 100-field record layout and directed to mask 25 specific fields in every input record. VelociData demonstrated these transformations on 100,000 records in one second, which is equivalent to masking 2.5 million fields per second.
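The idea of format-preserving masking can be illustrated with a small keyed-hash sketch: each digit is replaced by another digit and each letter by another letter of the same case, while separators stay in place. The output therefore still fits the original field specification. This is a simplified stand-in built on HMAC-SHA-256. It is not the NIST SP 800-38G algorithm and not VelociData's implementation, and the function names and the `secret` parameter are assumptions.

```python
import hashlib
import hmac
import string


def mask_value(value, secret, field_name=""):
    """Mask value while keeping its length and character classes.

    The same input, field name, and secret always give the same output.
    The original cannot be recovered from the output without the secret
    and a guess-and-check search of the input space.
    """
    message = (field_name + "\x1f" + value).encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    masked = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]       # digest bytes repeat past 32 chars
        if ch in string.digits:
            masked.append(string.digits[b % 10])
        elif ch in string.ascii_uppercase:
            masked.append(string.ascii_uppercase[b % 26])
        elif ch in string.ascii_lowercase:
            masked.append(string.ascii_lowercase[b % 26])
        else:
            masked.append(ch)             # keep separators such as "-"
    return "".join(masked)


def mask_record(record, fields_to_mask, secret):
    """Mask only the selected fields of a record, leaving the rest unchanged."""
    return {
        name: mask_value(value, secret, name) if name in fields_to_mask else value
        for name, value in record.items()
    }
```

A value such as `123-45-6789` comes out as another string of the form `ddd-dd-dddd`, so a column typed for that format can store the masked value without a schema change.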

4 Summary

VelociData can be used in a variety of ways to save corporations time and money and to reduce complexity. VelociData solutions can snap into a reference architecture in a variety of ways to offload and accelerate existing processes. When hardware acceleration is applied to data processing challenges, the results can be dramatic. Processes that run in hours or days can be completed in minutes or seconds. Costly hardware additions or upgrades can be significantly reduced, delayed, or in some cases avoided entirely.

VelociData provides not only the expertise to identify use cases but also the implementation and integration expertise to demonstrate value quickly. VelociData proof of concept engagements typically demonstrate multiple use cases within days of entering the data center. Put VelociData to the challenge: we’ll deliver an appliance, and once it is installed, PoC use cases are typically running and showing results in hours, not days.

VelociData is offered under a fixed-price model. A monthly subscription fee covers as much use as you can throw at the appliance. The fee includes all maintenance, support, and enhancements.

VelociData’s claims are easy to prove out. Typically, once we are installed, we can show results in a couple of hours.
