Where does big data come from?

Big Data is often boiled down to three main varieties:

• Transactional data: data from invoices, payment orders, storage records, and delivery records.

• Machine data: data gathered from industrial equipment (for example, the latest generation of aircraft produces several terabytes of data on a single transatlantic flight), real-time data from sensors (including the sensors in your smartphone or heart rate monitor, not to mention the 4 million CCTV cameras around the UK), and web logs that track user behavior online.

• Social data: data coming from social media services, such as Facebook Likes, Tweets and YouTube views.

In many cases, this data on its own is meaningless. Real business value often comes from combining these Big Data ‘feeds’ with ‘traditional’ (relational) data such as customer records, sales location data, and revenue figures to generate new insights, decisions and actions.
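As a minimal sketch of this kind of combination, the snippet below joins a small web-log feed with customer records to count clicks per sales region. All names and values here (`customers`, `click_events`, `clicks_per_region`) are hypothetical and chosen only for illustration; a production system would perform the same join in a database or a distributed engine.

```python
from collections import defaultdict

# Hypothetical 'traditional' relational data: customer records keyed by ID.
customers = {
    "c1": {"name": "Alice", "region": "North"},
    "c2": {"name": "Bob", "region": "South"},
}

# Hypothetical Big Data feed: raw web-log click events.
click_events = [
    {"customer_id": "c1", "page": "/pricing"},
    {"customer_id": "c2", "page": "/home"},
    {"customer_id": "c1", "page": "/checkout"},
    {"customer_id": "c9", "page": "/home"},  # no matching customer record
]


def clicks_per_region(events, customer_table):
    """Join each event to its customer record and count clicks by region."""
    counts = defaultdict(int)
    for event in events:
        record = customer_table.get(event["customer_id"])
        if record is None:
            continue  # skip events that cannot be enriched
        counts[record["region"]] += 1
    return dict(counts)


region_counts = clicks_per_region(click_events, customers)
print(region_counts)
```

On its own, the click feed only says which pages were visited; the region breakdown appears only after the join with the customer table.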


What makes it big data?

Big data refers to extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.

Evolution of Big Data


Big Data Analytics

Big data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information.

Various Kinds of Analytics

Predictive Analytics

Predictive analytics is a branch of advanced analytics used to make predictions about unknown future events. It uses techniques from data mining, statistics, modeling, machine learning, and artificial intelligence to analyze current data and make predictions about the future.
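One of the simplest predictive techniques is a least-squares trend line. The sketch below fits such a line to a few months of sales and extrapolates one month ahead. The figures and the names `fit_line`, `months`, and `sales` are made up for illustration, and real predictive models are usually far richer than a straight line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical current data: sales observed over five months.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)

# Prediction about the future: extrapolate the trend to month 6.
forecast_month_6 = slope * 6 + intercept
print(forecast_month_6)
```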

Real Time Analytics

A real-time system is one that processes information and produces a response within a specified time; missing that deadline risks severe consequences, sometimes including failure.

Real-time big data analytics, or real-time business intelligence (RTBI), is the process of delivering information about business operations as they occur. Real time here means near-zero latency and access to information whenever it is required.

Real-time Processing Systems

Real time means a range from a few milliseconds to a few seconds after the business event has occurred. While traditional business intelligence presents historical data for manual analysis, real-time business intelligence compares current business events with historical patterns to detect problems or opportunities automatically. This automated analysis capability enables corrective actions to be initiated and/or business rules to be adjusted to optimize business processes.
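The comparison of current events with historical patterns can be sketched as a simple deviation check. The example below flags incoming values that lie more than three standard deviations from a historical baseline. The threshold, the data, and the name `detect_anomalies` are assumptions made for illustration; this is not how any particular RTBI product works.

```python
from statistics import mean, stdev


def detect_anomalies(history, live_values, threshold=3.0):
    """Return live values that deviate strongly from the historical pattern."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any different value is unusual.
        return [v for v in live_values if v != baseline]
    flagged = []
    for value in live_values:
        z_score = (value - baseline) / spread
        if abs(z_score) > threshold:
            flagged.append(value)
    return flagged


# Hypothetical historical pattern: orders per minute on a normal day.
historical_orders_per_minute = [100, 102, 98, 101, 99, 100, 103, 97]

# Hypothetical current business events arriving in real time.
incoming = [101, 99, 150, 100, 40]

alerts = detect_anomalies(historical_orders_per_minute, incoming)
print(alerts)
```

A spike may signal an opportunity and a drop may signal a problem; either one could trigger a corrective action or a rule adjustment automatically.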


Tools For Real Time Analytics

1. Apache Spark

2. Apache Storm

3. Apache Kafka

Apache Spark

Apache® Spark™ is a powerful open source processing engine built around speed, ease of use, and sophisticated analytics. It was originally developed at UC Berkeley in 2009.


• Speed

Engineered from the bottom up for performance, Spark can be up to 100x faster than Hadoop MapReduce for large-scale data processing by exploiting in-memory computing and other optimizations. Spark is also fast when data is stored on disk, and at the time of writing held the world record for large-scale on-disk sorting.

• Ease of Use

Spark has easy-to-use APIs for operating on large datasets. This includes a collection of over 100 operators for transforming data and familiar data frame APIs for manipulating semi-structured data.

• A Unified Engine

Spark comes packaged with higher-level libraries, including support for SQL queries, streaming data, machine learning and graph processing. These standard libraries increase developer productivity and can be seamlessly combined to create complex workflows.
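To give a feel for what chaining transformation operators looks like, the sketch below counts engine names in a few lines of text using flatMap-, filter-, map-, and reduceByKey-style steps. It deliberately uses plain Python built-ins rather than Spark's API, so it runs without a cluster; the data and names are hypothetical, and Spark would apply the same style of pipeline to distributed datasets.

```python
from functools import reduce

# Hypothetical input: a few lines of text.
lines = [
    "spark makes streaming simple",
    "storm processes streams",
    "kafka feeds spark and storm",
]

# flatMap-style step: split every line into individual words.
words = [word for line in lines for word in line.split()]

# filter-style step: keep only the engine names of interest.
engines = filter(lambda w: w in {"spark", "storm", "kafka"}, words)

# map-style step: pair each word with a count of one.
pairs = map(lambda w: (w, 1), engines)


def add_pair(totals, pair):
    """reduceByKey-style step: sum the counts for each distinct word."""
    word, count = pair
    totals[word] = totals.get(word, 0) + count
    return totals


engine_counts = reduce(add_pair, pairs, {})
print(engine_counts)
```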

StreamAnalytix Solution with Apache


Impetus Technologies Announces StreamAnalytix 2.0 Featuring Support for Apache Spark

StreamAnalytix™ 2.0 features support for Apache Spark Streaming in addition to the existing support for Apache Storm. The platform will provide enterprises with the advantages of the industry's first open-source based, enterprise-grade, multi-engine platform for rapid and easy development of real-time streaming analytics applications.

Among stream processing engines, Spark Streaming is gaining popularity, while Apache Storm has been in production deployments for many years and is a robust, proven, widely used option. StreamAnalytix 2.0 builds on its existing visual integrated development and application-monitoring environment to provide abstraction over multiple streaming engines. It can also accommodate newer engines as they gain market acceptance. This approach allows developers and data analysts to use drag-and-drop operators to create real-time analytics applications, choosing the most suitable engine for each use case.

StreamAnalytix 2.0 builds upon the successful adoption of version 1.0, which is used by leading Fortune 1000 companies that are taking advantage of streaming data for improved business outcomes. In addition to support for Spark Streaming, there are a number of important functional enhancements in this release, including:

Spark Streaming:

• Rich array of drag-and-drop Spark data transformations.

• Support for Spark SQL and MLlib operations.

Platform enhancements:

• Ability to interconnect subsystems, which individually use different streaming engines.

• Embedded complex event processing engine enhanced for high-availability support.

• Built-in operators for predictive models including inline model-test feature.

• Additional support for industry-standard message queue and storage systems, including Amazon Kinesis and Simple Storage Service (S3), Apache ActiveMQ, IBM MQ and TIBCO.

• Enhanced self-service, real-time dashboarding with editable widgets for various chart types.

• Multi-tenancy controls with the ability to restrict resources for specific tenants and pipelines.

• Ability to create multiple versions of real-time pipelines and choose the active version.

• Rich array of real-time data processing functions for string, time, date, numeric and other data types.

• Code-free enrichment and blending of streaming data with static data, using lookups and MVEL expressions.

• Extensibility of stream-processing operators and libraries with user-defined functions.


Apache Storm

Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
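Storm structures a computation as sources of tuples (spouts) feeding processing steps (bolts). The sketch below imitates that shape with plain Python generators so it can run anywhere. It does not use Storm's API, and the names `sensor_spout`, `filter_bolt`, and `label_bolt` are hypothetical. The point is that each stage handles one item at a time, so the stream never needs to end or fit in memory.

```python
import itertools


def sensor_spout():
    """Emit an unbounded stream of readings (hypothetical data source)."""
    for i in itertools.count():
        yield {"sensor": f"s{i % 3}", "value": i}


def filter_bolt(stream, minimum):
    """Pass on only the readings at or above a minimum value."""
    for reading in stream:
        if reading["value"] >= minimum:
            yield reading


def label_bolt(stream):
    """Turn each reading into a short text label."""
    for reading in stream:
        yield f'{reading["sensor"]}:{reading["value"]}'


# Wire the stages together; nothing runs until items are requested.
topology = label_bolt(filter_bolt(sensor_spout(), minimum=2))

# Take just the first three results from the never-ending stream.
first_three = list(itertools.islice(topology, 3))
print(first_three)
```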

StreamAnalytix Solution with Apache


Ease of Development

A powerful visual designer interface makes it extremely easy to build applications quickly using built-in operators.

Abstraction over Complex Technologies

Lets you focus on your business logic rather than worrying about the underlying infrastructure.

Apache Kafka

Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of coordinated consumers. Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
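The partitioning idea can be sketched in a few lines: records with the same key are routed to the same partition, and the partitions are divided among the consumers in a group. The code below is an in-memory illustration only, not Kafka's client API. It assumes a CRC32 hash and a round-robin assignment for simplicity, which differ from Kafka's own defaults, and every name in it is hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a record key to a partition with a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions, consumers):
    """Spread partitions round-robin over the consumers in one group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


NUM_PARTITIONS = 4

# Hypothetical keyed records arriving on one data stream.
records = [("user-1", "login"), ("user-2", "click"), ("user-1", "logout")]

# Append each record to the log of the partition its key hashes to.
log = {partition: [] for partition in range(NUM_PARTITIONS)}
for key, value in records:
    log[partition_for(key, NUM_PARTITIONS)].append((key, value))

# Two coordinated consumers share the four partitions between them.
group = assign_partitions(NUM_PARTITIONS, ["consumer-a", "consumer-b"])
print(group)
```

Because a key always lands in the same partition, the events for one user stay in order even though the stream as a whole is spread across machines.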



http://streamanalytix.com 720 University Avenue Suite 130 Los Gatos, CA 95032 4082133310 info@streamanalytix.com

