• No results found

BIG DATA TOOLS. Top 10 open source technologies for Big Data

N/A
N/A
Protected

Academic year: 2021

Share "BIG DATA TOOLS. Top 10 open source technologies for Big Data"

Copied!
13
0
0

Loading.... (view fulltext now)

Full text

(1)

BIG DATA TOOLS

(2)

SO WHAT DO YOU DO WITH YOUR GOLD MINE OF INSIGHTS?

Happiest Minds presents TOP 10 open source technologies that are the best in the market to harness, analyze and make the most sense out of Big Data.

We are in an ever expanding marketplace!!!

With shorter product lifecycles, evolving customer behavior

and an economy that travels at the speed of light AND

Information (which we now have more than enough access

to) has gone on to be more about analytics and business

(3)

You simply can't talk about big data without mentioning

Hadoop

The Apache distributed data processing software is so pervasive that sometimes the terms "Hadoop" and "big data" get used

synonymously

Hadoop is known for the ability to process extremely large data in both structured and unstructured formats reliably replicating chunks of data to nodes in the cluster and making it available locally on the processing machine

(4)

If Hadoop is the big data mahout, MapReduce happens to be it’s lifeline

A programming model and software framework for writing applications, MapReduce works to rapidly process vast amounts of data in parallel on large clusters of compute nodes

Widely used by Hadoop, as well as many other data processing applications

(5)

GridGain is a Java based middleware for faster in-memory processing of Big Data in real time

GridGain is compatible with the Hadoop Distributed File System

Requires Windows, Linux or Mac OS X operating system GridGain offers an alternative

(6)

Developed by LexisNexis Risk Solutions, HPCC is short for "high performance computing cluster"

HPCC Systems delivers on a single platform, a single

architecture and a single programming language for data processing

Both free community versions and paid enterprise versions are available

(7)

Storm differs from other tools with it’s distributed, real-time, fault-tolerant processing system, unlike batch processing

systems of Hadoop

Real-time computation capabilities, it is fast and highly

scalable, often being described as the "Hadoop of real-time" Fault-tolerant and works with nearly all programming

languages, though typically Java is used

(8)

Cassandra is a highly scalable NoSQL database for massive data across multiple data centers and the cloud

Used by many organizations with large, active datasets, including Netflix, Twitter, Urban Airship, Constant Contact, Reddit, Cisco and Digg

Its commercial support and services are available through third-party vendors

Originally developed by Facebook, it is now managed by

(9)

HBase is the non-relational data store for Hadoop

Being a column-oriented database management system,

HBase is well suited for sparse data sets and is written in Java Supports writing applications such as Avro, REST and ThriftFeatures include:

 linear and modular scalability  strictly consistent reads and writes

automatic failover support and much more

Developed as part of the Apache Hadoop project, HBase runs on top of

(10)

MongoDB was originally developed by 10gen designed to support humongous databases

It's a NoSQL database written in C++ with document-oriented storage, full index support, replication and high availability and scales horizontally without compromising functionality Commercial support is available through 10gen

mongoDB literally comes from the term ‘humongous’ and is the most

(11)

Neo4j boasts performance improvements of up to 1000x or more versus relational databases

Stores data structured in graphs instead of tables and is a disk-based, fully transactional Java engine

Organizations can purchase advanced and enterprise versions from Neo Technology

Developed by Neo Technologies, this is the world’s leading graph

(12)

CouchDB stores data in JSON documents that can be accessed via the web or query using JavaScript

Offers distributed scaling with fault-tolerant storageKey featured include:

 On-the-fly document transformation  Real-time change notifications

 Easy-to-use web administration console

(13)

About Happiest Minds Technologies

Happiest Minds enables Digital Transformation for enterprises and technology providers by delivering seamless customer experience, business efficiency and actionable insights through an integrated set of disruptive technologies: big data analytics, internet of things,

mobility, cloud, security, unified communications, etc. Happiest Minds offers domain centric solutions applying skills, IPs and

functional expertise in IT Services, Product Engineering, Infrastructure Management and Security. These services have applicability across industry sectors such as retail, consumer packaged goods, e-commerce, banking, insurance, hi-tech, engineering R&D,

manufacturing, automotive and travel/transportation/hospitality.

Headquartered in Bangalore, India, Happiest Minds has operations in the US, UK, Singapore, Australia and has secured $ 52.5 million Series-A funding. Its investors are JPMorgan Private Equity Group, Intel Capital and Ashok Soota.

For more information, visit http://www.happiestminds.com

References

Related documents

The results suggest that financial development measured by broad money and domestic credit to private sector has a highly statistically significant negative effect on the

For both capital services and the capital stock, results are provided based on two different breakdowns of investment data: the 2-asset case drawing upon data for structures

On the single objective problem, the sequential metamodeling method with domain reduction of LS-OPT showed better performance than any other method evaluated. The development of

baccalaureate programs at both institutions simultaneously. The student must satisfy baccalaureate degree requirements at the University of Maryland Eastern Shore for a program of

This is achieved in four stages: (1) divide the study area into grid cells, within which it is assumed the spatial distribution of pollution is relatively homoge- neous, (2) define

We derived a general, analytical theory for any type of interaction that takes into account Beating & Mixing, and shown that our ‘Beating & Mixing’ theory correctly produces

The theoretical concerns that should be addressed so that the proposed inter-mated breeding program can be effectively used are as follows: (1) the minimum sam- ple size that

Therefore, various laboratory equipment used in learning media with the help of ICT can be developed simulation application.. Particularly in the field of