E-Guide BRINGING BIG DATA INTO A DATA WAREHOUSE ENVIRONMENT

Data warehousing architecture takes logical turn in big data era

Hadoop helps bring big data into a data warehouse environment

In many organizations, the growing volume and increasing complexity of data are straining performance and highlighting the limits of the traditional data warehouse. Hadoop systems and related big data technologies are popping up alongside data warehouses to manage the flow of unstructured and semi-structured data. This E-guide uncovers how Hadoop and cohorts such as MapReduce and NoSQL databases can complement existing data warehouse systems in the form of logical or hybrid data warehouse architectures.

DATA WAREHOUSING ARCHITECTURE TAKES LOGICAL TURN IN BIG DATA ERA

Beth Stackpole

Hybrids are all the rage in automotive circles, but the term is also gaining currency in data warehousing. A new style of hybrid or logical infrastructure, combining traditional enterprise data warehouses with emerging big data technologies, is being eyed to optimize how organizations process, manage and gain insights from their burgeoning stockpiles of both structured and unstructured data.

The advent of the big data phenomenon initially prompted visions of a separate, and maybe even dominant, data management environment for unstructured information. Some analysts and big data proponents went so far as to predict the demise of the enterprise data warehouse (EDW) as a central stockpile of business intelligence (BI) and analytics data. Now, though, the expectations have shifted: Instead of one technology replacing another, coexistence between the EDW, standalone analytical databases and big data systems such as Hadoop clusters and NoSQL databases is likely to be the name of the data warehousing architecture game going forward.

"Within the hybrid data ecosystem that we're dealing with today, the data warehouse is no longer the center of our data needs," said Shawn Rogers, vice president of research for BI and data warehousing at consultancy Enterprise Management Associates Inc. in Boulder, Colo. But, he added, the EDW "will continue to play a pivotal role in terms of storing and supplying information to make business decisions." There are various data resources that companies can make use of for BI and analytics, "and big data will just be one of them," Rogers said.

The way Rogers and other data warehousing analysts see it, big data environments typically will become an extension of the EDW, with processing workloads and data storage matched to the appropriate resource pool depending on business requirements and the type of data involved. For example, in many companies the EDW will continue to be the place where well-integrated and high-quality structured transactional data is accessed to enable business reporting and ad hoc querying along well-defined dimensions. Big data systems will be used to store, process and analyze rawer data -- primarily unstructured or semi-structured information, such as social media posts and activity reports, Internet clickstream data and machine-generated data captured from application and Web server logs, network monitoring devices and sensors.

BUY OR BUILD YOUR OWN ON BI AND ANALYTICS

"Think of the EDW as the retail store where people pick up data that's organized and packaged and ready for them to accept," explained Ron Bodkin, president of Think Big Analytics, a consulting and professional services firm in Mountain View, Calif., that focuses on big data analytics and other forms of advanced analytics. "The big data environment becomes the factory where people go in and work with raw materials to create new things and experiment to find out what's valuable."

In a research report released last year, Gartner Inc. analysts Mark Beyer and Donald Feinberg disputed the notion that the advent of big data marked the end of the road for the enterprise data warehouse, predicting instead that the EDW would morph into what Gartner is calling the logical data warehouse (LDW). In the report, they said that instead of focusing on the physical data warehousing infrastructure, the LDW concept is centered on data processing and management logic.

In an interview, Beyer characterized the LDW as an information management and access engine more so than a data repository -- a scenario that he said requires a complete rethinking of how data is managed and where in a company's technology architecture different types of data should be processed to best support transformation, integration and analysis processes.

"The shift moves the focus from being a repository first and a data services engine second to being an information services platform first and a repository that's just one way to [manage and store data]," Beyer said. In an LDW setup, he explained, processing would take place in a separate data management layer as opposed to the traditional manner of doing that within individual systems.

EDW-BIG DATA MIX MEANS MAJOR CHANGES

That kind of extended data warehousing architecture can provide organizations with much greater flexibility for orchestrating the storage and use of their data assets, according to Gartner -- but companies likely will have to make big changes to implement the new approach.

"The basic premise behind the new data warehouse is that it will combine the strengths of every engineering approach previously used to create a variety of architectural styles into a new model that supports easy switching between styles or a hybrid of diverse delivery approaches," Beyer and Feinberg wrote in their report. "Existing architectures must be altered radically to meet these new demands."

Most companies, even the early adopters of Hadoop and other big data technologies, are still in the formative stages of their big data management and analytics initiatives. At this point, many are feeling their way around an integration strategy in an effort to keep their EDW, analytical database and big data environments from becoming separate data silos that don't adequately serve the information needs of the business.

The real challenge, Rogers said, lies with masking all of the integration complexity from business users, who ultimately just want to be able to access data that can help them make better and more informed decisions. "The issue is how to maintain a view for end users that makes it as transparent as possible, because they don't want to know whether they're talking to big data systems, Hadoop clusters or the EDW," he said. "They simply don't care."

BETH STACKPOLE is a freelance writer who has been covering the intersection of technology and business

HADOOP HELPS BRING BIG DATA INTO A DATA WAREHOUSE ENVIRONMENT

Jack Vaughan

In many organizations, the growing volume and increasing complexity of data are straining performance and highlighting the limits of the traditional data warehouse. IT and data management professionals can respond by tweaking and tuning existing system implementations, but the rush to incorporate a variety of unstructured information into the data warehouse environment may call for new technologies that help power big data analytics applications.

In particular, Hadoop systems and related big data technologies are popping up alongside data warehouses to manage the flow of unstructured and semi-structured data, including Web server and other system and network log files, text data, sensor readings and social network activity logs. Hadoop and cohorts such as MapReduce and NoSQL databases can complement data warehouse systems in such cases, creating what analysts describe as a logical or hybrid data warehouse architecture that puts processing workloads on the platforms best able to handle them.

The available building blocks for high-performance business intelligence (BI) and data warehouse environments also include a selection of other technologies that can fill specific roles -- for example, data warehouse appliances, columnar databases and in-memory databases. Used together, the various tools can boost warehousing speed, but they also challenge an organization's data architecture and integration skills.

"Architecture is becoming increasingly important. You thread everything together with architecture," said William McKnight, president of McKnight Consulting Group in Plano, Texas. Companies need to think about pushing some data warehouse workloads out to technologies that can better handle them, especially when unstructured or semi-structured data is involved, he said.

Most companies do see a need for warehousing speed, according to a report published by The Data Warehousing Institute (TDWI) in October 2012. Sixty-six percent of 278 IT professionals, business users and consultants surveyed by TDWI said getting high levels of performance from data warehouses and related platforms was "extremely important." Only 6% said performance wasn't a pressing issue for them.

most from high-performance data warehousing were advanced analytics, cited by 62% of the people surveyed, and the use of big data for analytics, chosen by 40%.

GOING TO EXTREMES WITH HADOOP, BIG DATA

Report author Philip Russom, research director for data management at TDWI in Renton, Wash., wrote that Hadoop has come into prominence in no small part due to its ability to manage and process the extremes of big data. Massively parallel Hadoop clusters can scale out to meet the demands of ever-larger workloads, Russom said, adding that what he described as Hadoop's "data-type-agnostic file system" makes it a better fit for unstructured and semi-structured data than relational databases are.

Yet Hadoop big data systems should be viewed as part of a larger picture, he asserted. "Hadoop is a wonderful complement to the data warehouse, but no one that has worked their way through it would see it as a replacement for the data warehouse," Russom wrote in the report.

"Until Hadoop came along there really wasn't a good way to handle unstructured data," said Wayne Eckerson, director of the BI Leadership Research unit at TechTarget Inc., the Newton, Mass., parent company of SearchDataManagement.com. Organizations had to use text mining tools to parse the data into rows and columns and then load that into fields in a data warehouse, Eckerson said. But, he added, "it was a two-step process, and a lot of people just didn't use it."

Hadoop, MapReduce and related tools enable developers to automate the data parsing process, according to Eckerson and other consultants. In addition, a variety of Hadoop, data warehouse and data integration vendors have released software connectors that make it easier to transfer data between Hadoop and data warehouse systems.
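As a rough illustration of what parsing raw data into rows and columns can look like, the sketch below turns one Web server log line into a structured row. It is plain Python rather than an actual Hadoop job, and the regular expression, the `parse_log_line` helper and the sample line are assumptions made for this example (the pattern targets the widely used Common Log Format; real logs may differ).

```python
import re
from typing import Optional

# Hypothetical pattern for Common Log Format lines; real logs may differ.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Turn one raw log line into a row (dict of columns), or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    # A "-" in the size field means no bytes were sent.
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row


# Made-up sample line using a documentation-reserved IP address.
sample = '203.0.113.7 - - [10/Oct/2012:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326'
print(parse_log_line(sample))
```

In a MapReduce setting, a function like this would typically sit in the map step, so that every log line is parsed in parallel across the cluster instead of in a separate text mining pass.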

Ben Harden, a managing director at consultancy CapTech Ventures Inc. in Richmond, Va., sees Web server logs as a good example of data that's best channeled to Hadoop to offload processing from conventional systems and improve the overall performance of a data warehouse environment.

SIDE-BY-SIDE ON BIG DATA

Instead of loading Web logs directly into a data warehouse, they can be stored on a Hadoop system and crunched there, Harden said. Aggregated results then can be fed into a relational model in the data warehouse for analysis by business users, he said. Again, that scenario places upstart Hadoop alongside the venerable data warehouse. "The relational database doesn't go away," Harden said, adding that the "hardcore processing" of BI and analytics data still has to be done there.
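The pattern Harden describes (crunch the raw logs outside the warehouse, then load only the aggregates into a relational model) can be sketched in a few lines. This is a minimal stand-in, not a Hadoop implementation: the aggregation runs in ordinary Python in the style of a map and reduce step, an in-memory SQLite database plays the role of the data warehouse, and the sample rows, the `daily_page_hits` table and both function names are hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical parsed log rows; in practice these would be produced on the Hadoop side.
rows = [
    {"day": "2012-10-10", "path": "/index.html"},
    {"day": "2012-10-10", "path": "/index.html"},
    {"day": "2012-10-10", "path": "/pricing.html"},
    {"day": "2012-10-11", "path": "/index.html"},
]


def aggregate_hits(parsed_rows):
    """Key each row by (day, path) and count occurrences, as a map and reduce step would."""
    return Counter((r["day"], r["path"]) for r in parsed_rows)


def load_into_warehouse(conn, hit_counts):
    """Load only the aggregated results into a relational table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_page_hits ("
        "day TEXT, path TEXT, hits INTEGER, PRIMARY KEY (day, path))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO daily_page_hits (day, path, hits) VALUES (?, ?, ?)",
        [(day, path, hits) for (day, path), hits in hit_counts.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load_into_warehouse(conn, aggregate_hits(rows))

# Business users then query the small aggregate table, not the raw logs.
top = conn.execute(
    "SELECT day, path, hits FROM daily_page_hits ORDER BY hits DESC, day, path"
).fetchall()
print(top)
```

The design point is that the warehouse only ever sees one row per day and page, so its size and query cost stay tied to the reporting dimensions rather than to raw log volume.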

"Everyone is suddenly very log-happy. That's where Hadoop comes in: We need a place to put this stuff, then we have to make sense of it," said Joe Caserta, president of Caserta Concepts LLC, a New York-based data warehouse consulting and training company. He is also co-author -- with BI and data warehousing consultant Ralph Kimball -- of The Data Warehouse ETL Toolkit.

Caserta and other consultants caution that there are still barriers to wider Hadoop use in data warehouse architectures. The open source technology requires advanced programming skills and can benefit from the addition of custom-built tools and functionality, they said. Moreover, Hadoop is a batch-oriented technology that doesn't intrinsically lend itself to real-time processing of big data. That has led to the use of a variety of advanced messaging and event-oriented technologies to help Hadoop systems keep up with the rapid velocity of data updates, Caserta said.

Overall, though, the pieces are available to extend a data warehouse environment to deal with big data, said Colin White, president and founder of consultancy BI Research in Ashland, Ore. Nowadays, "I don't think it's practical to put everything in the data warehouse," he said. "The key will be to make all the different pieces work together."

FREE RESOURCES FOR TECHNOLOGY PROFESSIONALS

TechTarget publishes targeted technology media that address your need for information and resources for researching products, developing strategy and making cost-effective purchase decisions. Our network of technology-specific Web sites gives you access to industry experts, independent content and analysis and the Web's largest library of vendor-provided white papers, webcasts, podcasts, videos, virtual trade shows, research reports and more -- drawing on the rich R&D resources of technology providers to address market trends, challenges and solutions. Our live events and virtual seminars give you access to vendor neutral, expert commentary and advice on the issues and challenges you face daily. Our social community IT Knowledge Exchange allows you to share real world information in real time with peers and experts.

WHAT MAKES TECHTARGET UNIQUE?

TechTarget is squarely focused on the enterprise IT space. Our team of editors and network of industry experts provide the richest, most relevant content to IT professionals and management. We leverage the immediacy of the Web, the networking and face-to-face opportunities of events and virtual events, and the ability to interact with peers -- all to create compelling and actionable information for enterprise IT professionals across all industries and markets.
