TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP

Pythian White Paper

ABSTRACT

As companies increasingly rely on big data to steer decisions, they also find themselves looking for ways to simplify its storage, management, and analysis. The need to quickly access large amounts of data and use them competitively poses a technological challenge to organizations of all sizes.

By engaging technical experts to help them streamline complex data sources, companies can apply flexible and sophisticated technologies to extract deeper business insights, all while cutting operational costs.

This paper explores the impact of big data on today’s organizations and the challenges it presents, and outlines the technical and business advantages of managing big data effectively, regardless of quantity, scope, or speed.

INTRODUCTION

Every minute, about 2 terabytes of data are being generated globally. That’s twice the amount from three years ago and half the amount predicted for three years from now.

Beyond sheer volume, the sources of data and the forms it takes vary broadly. From government records, business transactions, and social media to scientific research and weather tracking, today’s data come as text, graphics, audio, video, and maps.


Source: Data Management for BI Research Brief. Aberdeen Group, January 2013.

Add big data’s growth rate to the mix, and data management becomes even more complex and challenging for the many mid-size and large companies that harness data to stay competitive.

Unlike traditional data, much of today’s big data is large, unstructured, and dynamic. Companies that don’t have the in-house expertise to select and deploy an effective solution risk allocating resources that aren’t up to the task, driving up operational costs and introducing inefficiencies that could have been avoided.

By contrast, companies that invest in agile, scalable, and robust data management solutions can successfully navigate the world of big data and consistently reap its benefits. Making decisions supported by hard data improves their ability both to respond to current customer demands and to take advantage of evolving market dynamics. But to get the most return on their big data investment, organizations must be able to access and analyze the data quickly, while ensuring its accuracy and security.

A GROWING BUSINESS ADVANTAGE

An Aberdeen Group survey measured the performance of 125 companies across four key data management metrics. The top 20% (the best-in-class performers) posted significantly larger year-over-year improvements than the remaining companies in all four categories. Their success could be attributed to several factors, including staff recruitment and training, leadership support, and proactively researching and adopting the latest technologies.

Investing in technology helps organizations spend less time searching for data and more time on value-boosting business activities.


THE BIG DATA CHALLENGE

Traditional database management tools were not designed to handle the elements that make big data so much more complex, namely its key differentiators: volume, variety, and velocity [1].

Volume is the quantity of data, variety refers to the types of data collected (image, audio, video, etc.), and velocity is the rate at which data is generated and must be processed. Many people assume that big data always involves high volume, and they intuitively understand the challenges that volume presents. In reality, however, variety and velocity are much more likely to prevent traditional management tools from efficiently capturing, storing, reporting on, analyzing, and archiving the data, regardless of volume.

Multiple technologies now exist for managing big data, including Apache® Hadoop, NoSQL, and Massively Parallel Processing (MPP). Choosing the best one for a particular organization depends on the type of big data it works with and its specific business requirements.

The next section of this paper focuses on Hadoop MapReduce, an open source, Java-based framework developed by Apache “for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.” [3]

As an open source solution that doesn’t require expensive hardware, Hadoop offers a practical, scalable, and cost-effective way to process data.
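To make the MapReduce model quoted above more concrete, the following is a minimal sketch of its three phases (map, shuffle, reduce) applied to a word count. It is plain Python running in a single process, not Hadoop code: the function names are illustrative assumptions, and on a real cluster the framework would distribute the map and reduce calls across nodes and handle failures.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # On a cluster, different nodes would map different blocks of input;
    # here the phases simply run one after another in one process.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big insight", "data beats opinion"]
    print(word_count(sample))
```

Because each map call looks at only one line and each reduce call at only one key, the work can be split across many machines without the pieces needing to coordinate, which is what lets the approach scale on commodity hardware.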

MICROSOFT AND HADOOP

Hortonworks Data Platform (HDP) for Windows was released in May 2013, making Microsoft one of the leading players in the big data/Hadoop marketplace.

The Microsoft Hadoop solution gives organizations the ability to store and analyze both structured and unstructured data with greater usability, performance, and accuracy. The solution includes:

• An Apache-compatible distribution of enterprise-ready Hadoop for Windows Server

• HDInsight, a cloud-based Hadoop offering for Windows Azure

• Microsoft Analytics Platform System (APS)

• Integrated business intelligence tools

1. Source: 3D Data Management: Controlling Data Volume, Velocity, and Variety. Application Delivery Strategies. Meta Group, 02/06/2001.
http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf

2. Source: $16.1 Billion Big Data Market: 2014 Predictions From IDC And IIA, Forbes Magazine, 12/12/2013.

HOW BIG IS THE BIG DATA MARKET?

“IDC predicts that the market for big data will reach $16.1 billion in 2014, growing 6 times faster than the overall IT market.”

Source: Forbes Magazine [2]


HDInsight, the cloud-based service, is built on HDP and provides real-time insight into any volume and variety of data, structured or unstructured.

Analytics Platform System (APS), available in SQL Server 2012 and later versions, uses PolyBase to combine relational and non-relational data. Data can also be migrated from SQL Server into APS.

SQL SERVER/HADOOP CONNECTORS

Apache Sqoop, originally developed at Cloudera, is an open source tool for transferring data between relational databases and the Hadoop Distributed File System (HDFS). Microsoft has developed two versions of its Hadoop connector, one for SQL Server and the other for APS.

Both are command line tools that transfer data between SQL Server and Hadoop cluster nodes. The connectors support multiple data types, import and export data between SQL Server and HDFS, execute queries, and much more. They are now integrated directly into Sqoop, and HDInsight includes them as well, eliminating the burden of installing and configuring multiple connectors.
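As a rough illustration of what such a command line transfer might look like, the sketch below assembles a Sqoop import command that copies one SQL Server table into HDFS. It only builds and prints the command; it does not run Sqoop. The host, database, table, user, and paths are hypothetical placeholders, and the sketch assumes a Sqoop installation with the SQL Server JDBC driver available on its classpath.

```python
import shlex


def build_sqoop_import(host, database, table, target_dir, username,
                       password_file, port=1433, num_mappers=4):
    """Return the argument list for a Sqoop import from SQL Server into HDFS."""
    # JDBC connection string in the SQL Server driver's URL format.
    jdbc_url = f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # file holding the password
        "--table", table,                  # source table in SQL Server
        "--target-dir", target_dir,        # destination directory in HDFS
        "--num-mappers", str(num_mappers), # parallel map tasks for the copy
    ]


if __name__ == "__main__":
    # All values below are hypothetical and for illustration only.
    argv = build_sqoop_import(
        host="sqlhost.example.com",
        database="SalesDB",
        table="Orders",
        target_dir="/data/raw/orders",
        username="etl_user",
        password_file="/user/etl_user/.sqlserver.password",
    )
    print(shlex.join(argv))
```

Building the command as a list rather than a single string keeps each argument separate, so values containing spaces or semicolons (such as the JDBC URL) are quoted correctly when the command is printed or passed to a process launcher.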

BUSINESS BENEFITS OF MICROSOFT’S HADOOP SOLUTION

Using the Microsoft Hadoop solution, organizations can:

IMPROVE AGILITY.

Because companies now have the ability to collect and analyze data practically in real time, they can more quickly discover which business strategies are working and which are not, and make adjustments as necessary.

INCREASE INNOVATION.

By integrating structured and unstructured data sources, the solution provides decision makers with greater insight into all the factors affecting the business, encouraging new ways of thinking about opportunities and challenges.

REDUCE INEFFICIENCIES.

Data that currently resides in conventional data management systems can be migrated into APS for faster information delivery.

4. Source: Business Intelligence Overview. PowerPoint Presentation, Microsoft Download Center.
http://download.microsoft.com/download/B/3/2/B3299FFA-620F-4658-B5BF-0F05B8A23585/SQL%202014%20BI_MarcSch.pptx

5. Source: Microsoft Big Data Solution SQL Server, Apache Hadoop and Windows Azure. Microsoft Developer Network Blog, 11/11/11
http://blogs.msdn.com/b/uk_faculty_connection/archive/2011/11/20/microsoft-big-data-solution-sql-server-apache-hadoop-and-windows-azure.asp

Source: Microsoft [4]

Source: Microsoft Developer Network Blog [5]


BETTER ALLOCATE IT RESOURCES.

The Microsoft Hadoop solution includes a powerful, intuitive interface for installing, configuring, and managing the technology, freeing up IT staff to work on projects that provide higher value to the organization.

DECREASE COSTS.

Previously, because big data could not be analyzed effectively, much of it was simply dumped into data warehouses. With Hadoop, that data can instead be stored and processed on clusters of commodity hardware.

ACHIEVING MAXIMUM ROI FROM THE MICROSOFT HADOOP SOLUTION

With big data playing an increasingly important role in business performance, companies need to think strategically about their data management needs, both in terms of the technology and the skillsets required to work with it. The Microsoft Hadoop solution can deliver many benefits to organizations, especially those already using Microsoft’s conventional data management tools.

However, very few companies currently have the in-house expertise needed to build a Hadoop production cluster and integrate it with their existing data infrastructure.

Because of Pythian’s unique expertise with both Hadoop and traditional relational databases, our specialists can help large and mid-sized businesses implement an effective Hadoop solution right from the start. We help organizations by:

• Setting up a proof of concept to demonstrate to key stakeholders the business benefits they can achieve with Hadoop.

• Providing technical guidance ranging from analyzing data that was previously impractical to analyze, to speeding up existing Extract-Transform-Load (ETL) processes.

• Supporting the production deployment, including cluster planning, capacity sizing, integration into existing systems and workflows, and defining and implementing new operational processes.

CONCLUSION

Organizations have more access to data than ever before. Because big data is a driving force for decision-making, simplifying its management is pivotal to their success.

Microsoft Hadoop is a viable option for organizations wishing to unify their data management and harness both structured and unstructured data effectively. With external experts to help them build and tailor this solution to their specific needs, organizations can drive innovation with in-depth data, increase operational efficiencies, and reduce uncertainty.

WHO’S BENEFITING FROM HADOOP TODAY?

These are some of the companies currently employing a Hadoop solution for their big data management.

Microsoft Platform:

Daiichi Sankyo (DSI), Halo 4, Klout, Yahoo

Other Platforms:

Adobe, AOL, Facebook, LinkedIn, Yahoo


ABOUT PYTHIAN

Founded in 1997, Pythian is a global leader in data consulting and managed services, specializing in optimizing and managing mission-critical data systems. Pythian blends the world’s leading data experts with advanced, secure service delivery processes to create the industry’s best standard of care for its clients. Since its inception, Pythian has managed some of the world’s largest, most business-critical data infrastructures for companies such as Toyota, Urban Outfitters, Huffington Post, and American Apparel. Pythian currently handles over 10,000 systems and employs more than 325 people worldwide. For the third consecutive year, Pythian ranks among Canada’s fastest growing companies, securing a place on PROFIT Magazine’s PROFIT 200 list as a Canadian innovator, trailblazer, and job creator. Learn more about Pythian and its elite data experts at www.pythian.com, follow @Pythian, and find Pythian on LinkedIn at http://linkd.in/pythian.

Produced in Canada by The Pythian Group Inc.

Pythian, The Pythian Group, “love your data”, and pythian.com are trademarks of The Pythian Group Inc.
