





Where does big data come from?

Big Data is often boiled down to three main varieties:

• Transactional data: this includes data from invoices, payment orders, storage records, and delivery records.

• Machine data: this can be data gathered from industrial equipment (for example, the latest generation of aircraft produces several terabytes of data on a single transatlantic flight), real-time data from sensors (including sensors on your smartphone or your heart rate monitor, not to mention the 4 million CCTV cameras around the UK), and web logs that track user behavior online.

• Social data: this could be data coming from social media services, such as Facebook Likes, Tweets, and YouTube views.

In many cases, this data on its own is meaningless. Real business value often comes from combining these Big Data ‘feeds’ with ‘traditional’ (relational) data such as customer records, sales location data, and revenue figures to generate new insights, decisions and actions.


What makes it big data?

Big data refers to extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.

Evolution of Big Data


Big Data Analytics

Big data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information.

Various Kinds of Analytics

Predictive Analytics

Predictive analytics is a branch of advanced analytics used to make predictions about unknown future events. It draws on techniques from data mining, statistics, modeling, machine learning, and artificial intelligence to analyze current data and make predictions about the future.
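As a rough illustration of the idea, the sketch below fits a straight-line trend to past observations with ordinary least squares and extrapolates one step ahead. It is only one of the simplest possible predictive techniques, not a method taken from this document; the function names `fit_line` and `predict` and the sales figures are made up for the example.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical monthly sales figures (invented numbers, for illustration only).
months = [1, 2, 3, 4, 5]
sales = [100.0, 110.0, 120.0, 130.0, 140.0]

slope, intercept = fit_line(months, sales)
forecast = predict(slope, intercept, 6)
print(f"forecast for month 6: {forecast:.1f}")
```

Real predictive models typically use many input variables and are validated against held-out data; the point here is only the pattern of learning from current data and applying the result to a future point.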

Real Time Analytics

A real-time system is one that processes information and produces a response within a specified time, or else risks severe consequences, sometimes including failure.

Real-time big data analytics, or real-time business intelligence (RTBI), is the process of delivering information about business operations as they occur. Real time means near-zero latency and access to information whenever it is required.

Real-time Processing Systems

In this context, real time means anywhere from a few milliseconds to a few seconds after the business event has occurred. While traditional business intelligence presents historical data for manual analysis, real-time business intelligence compares current business events with historical patterns to detect problems or opportunities automatically. This automated analysis allows corrective actions to be initiated, business rules to be adjusted, or both, in order to optimize business processes.
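A minimal sketch of comparing current events with historical patterns follows. It assumes a single numeric metric per event and flags values that sit far from a sliding window of recent history. The class name `DeviationDetector`, the window size, the threshold, and the sample stream are all hypothetical choices for illustration, not part of any product described here.

```python
from collections import deque
from statistics import mean, pstdev


class DeviationDetector:
    """Flags events that deviate strongly from a sliding window of recent history."""

    def __init__(self, window: int = 20, threshold: float = 3.0, min_history: int = 5):
        self.history = deque(maxlen=window)  # oldest values fall out automatically
        self.threshold = threshold           # allowed distance, in standard deviations
        self.min_history = min_history       # do not judge until enough data is seen

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous relative to history, then record it."""
        anomalous = False
        if len(self.history) >= self.min_history:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.threshold
            else:
                # History is perfectly flat, so any change counts as a deviation.
                anomalous = value != mu
        self.history.append(value)
        return anomalous


# Hypothetical stream of a business metric, e.g. orders per minute.
detector = DeviationDetector(window=10, threshold=3.0)
stream = [100, 102, 98, 101, 99, 100, 103, 250, 101]
alerts = [(i, v) for i, v in enumerate(stream) if detector.observe(v)]
print(alerts)  # [(7, 250)]
```

In this sketch the flagged value is still added to the history, which temporarily widens the baseline. A production system might exclude flagged events or use a more robust statistic, and would trigger a corrective action instead of collecting alerts in a list.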


Tools For Real Time Analytics

1. Apache Spark

2. Apache Storm

3. Apache Kafka

Apache Spark

Apache® Spark™ is a powerful open source processing engine built around speed, ease of use, and sophisticated analytics. It was originally developed at UC Berkeley in 2009.


• Speed

Engineered from the bottom up for performance, Spark can be up to 100x faster than Hadoop MapReduce for large-scale data processing by exploiting in-memory computing and other optimizations. Spark is also fast when data is stored on disk, and set a world record for large-scale on-disk sorting in the 2014 Daytona GraySort benchmark.

• Ease of Use

Spark has easy-to-use APIs for operating on large datasets. This includes a collection of over 100 operators for transforming data and familiar data frame APIs for manipulating semi-structured data.

• A Unified Engine

Spark comes packaged with higher-level libraries, including support for SQL queries, streaming data, machine learning, and graph processing. These standard libraries increase developer productivity and can be seamlessly combined to create complex workflows.

StreamAnalytix Solution with Apache


Impetus Technologies Announces StreamAnalytix 2.0 Featuring Support for Apache Spark

StreamAnalytix™ 2.0 features support for Apache Spark Streaming, in addition to the current support for Apache Storm. The platform will provide enterprises with the advantages of the industry's first open-source-based, enterprise-grade, multi-engine platform for rapid and easy development of real-time streaming analytics applications.

Among stream processing engines, Spark Streaming is gaining popularity, while Apache Storm has been in production deployments for many years and is a robust, proven, widely used option. StreamAnalytix 2.0 builds on its existing visual integrated development and application-monitoring environment to provide abstraction over multiple streaming engines. It can also accommodate newer engines as they gain market acceptance. This approach allows developers and data analysts to use drag-and-drop operators to create real-time analytics applications, choosing the most suitable engine for each use case.

StreamAnalytix 2.0 builds upon the successful adoption of version 1.0, which is used by leading Fortune 1000 companies that are taking advantage of streaming data for improved business outcomes. In addition to support for Spark Streaming, there are a number of important functional enhancements in this release, including:

• Spark Streaming

  • Rich array of drag-and-drop Spark data transformations.

  • Support for Spark SQL and MLlib operations.

• Platform Enhancements

  • Ability to interconnect subsystems, which individually use different streaming engines.

  • Embedded complex event processing engine enhanced for high-availability support.

  • Built-in operators for predictive models including inline model-test feature.

  • Additional support for industry-standard message queue systems, including Amazon Kinesis and Simple Storage Service (S3), Apache ActiveMQ, IBM MQ and TIBCO.

  • Enhanced self-service, real-time dashboarding with editable widgets for various chart types.

  • Multi-tenancy controls with the ability to restrict resources for specific tenants and pipelines.

  • Ability to create multiple versions of real-time pipelines and choose the active version.

  • Rich array of real-time data processing functions for string, time, date, numeric and other data types.

  • Code-free enrichment and blending of streaming data with static data with lookups and MVEL expressions.

  • Extensibility of stream-processing operators and libraries with user-defined functions.


Apache Storm

Apache Storm is a free and open source distributed realtime computation system.

Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

StreamAnalytix Solution with Apache


Ease of Development

A powerful visual designer interface makes it extremely easy to build applications quickly using built-in operators.

Abstraction over Complex Technologies

Lets you focus on your business logic rather than worrying about the underlying infrastructure.

Apache Kafka

Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of coordinated consumers. Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
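The partitioning idea can be pictured with a toy in-memory model. The sketch below is not the Kafka client API: `ToyTopic` and `assign_partitions` are hypothetical names, the CRC32 hash is an assumption chosen for determinism (Kafka's own clients use a different hash function), and the round-robin assignment only approximates how a group of coordinated consumers might share partitions.

```python
import zlib
from typing import Dict, List, Tuple


class ToyTopic:
    """An in-memory stand-in for a partitioned, append-only log (not the Kafka API)."""

    def __init__(self, num_partitions: int):
        # Each partition is its own ordered list of (key, value) records.
        self.partitions: List[List[Tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Deterministic hash so the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


def assign_partitions(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Round-robin assignment: each partition is read by exactly one consumer in the group."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


# Hypothetical usage: eight events from three users spread over four partitions.
topic = ToyTopic(num_partitions=4)
for i in range(8):
    topic.append(key=f"user-{i % 3}", value=f"event-{i}")

assignment = assign_partitions(4, ["consumer-a", "consumer-b"])
print(assignment)  # {'consumer-a': [0, 2], 'consumer-b': [1, 3]}
```

Because records with the same key land in the same partition, their relative order is preserved for whichever consumer owns that partition, while different partitions can be stored and read on different machines.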


Contact: 720 University Avenue, Suite 130, Los Gatos, CA 95032. Phone: (408) 213-3310



