
Data Solutions with Hadoop

Reducing Costs using Open Source Software

Aaryan Gupta Darshil Shah Mark Williams

Contact:

Aaryan Gupta [email protected]
Darshil Shah [email protected]
Mark Williams [email protected]

November 16, 2014


Deutsche Bank | 60 Wall Street NEW YORK, NY 10005 | www.db.com

CONTENTS

EXECUTIVE SUMMARY

INTRODUCTION

PROBLEMS FACING CURRENT SYSTEM

ADVANTAGES OF HADOOP

COST BENEFIT ANALYSIS

CONCLUSION


EXECUTIVE SUMMARY

With the advent of the internet and the subsequent rise of online banking, IT infrastructure has become the backbone of modern finance and banking systems. Deutsche Bank’s current data warehouses are fragmented across many different legacy systems that have been patched together over the past twenty years (du Preez, 2013). The amount of trading, operations, and finance data being created is expected to keep growing, and the legacy systems are struggling to handle the increased volume. It is also increasingly important to be able to generate reports for risk assessments and audits using the most up-to-date information possible (Dasteel, 2012).

Updating these systems in the near future will be crucial if Deutsche Bank is to remain competitive in the era of big data. Hadoop is open-source data management software that increases data processing capacity without requiring data to be converted from legacy systems (Kurth & Wendt, 2013). Switching to Hadoop will reduce future database infrastructure costs and create a system that remains scalable in the long term (Kurth & Wendt, 2013).


INTRODUCTION

Deutsche Bank currently holds the largest share of the foreign exchange market, at 20.96% of all transactions (FX Poll Results, 2009). The enormous amount of data generated by these transactions needs to be stored efficiently while remaining readily accessible for analysis. Our current legacy system runs on large mainframe servers that are very costly to expand and are barely able to keep up with the current level of data being generated (du Preez, 2013). We can continue to add servers to our legacy system, but doing so does not address its underlying lack of scalability.

Many Fortune 500 companies are beginning to adopt the open-source software Hadoop as their data-warehousing platform (Dasteel, 2012). Hadoop provides a platform that is simpler and cheaper to expand and much more cost-effective for daily operations. Breaking the data up into smaller blocks enables the system to distribute it across cheaper commodity servers. Hadoop is also more flexible: it can handle different types of data while storing them efficiently, and its built-in redundancy makes the system more fault tolerant (Hadoop for Enterprise with IBM, n.d.). Moving from the current legacy system to Hadoop would be a major step forward for Deutsche Bank in data warehousing and processing while reducing long-term costs.
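The idea of breaking data into smaller pieces and spreading the work across several servers can be illustrated with a minimal Python sketch. It is not Hadoop code: the trade records, node names, block size, and round-robin placement are all hypothetical, and the functions only mimic the split, map, and reduce steps in a single process.

```python
from collections import Counter
from itertools import islice


def split_into_blocks(records, block_size):
    """Yield consecutive lists of at most block_size records."""
    iterator = iter(records)
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            return
        yield block


def assign_blocks(blocks, node_names):
    """Place blocks on nodes round-robin (a simplification, not Hadoop's policy)."""
    placement = {name: [] for name in node_names}
    for index, block in enumerate(blocks):
        placement[node_names[index % len(node_names)]].append(block)
    return placement


def map_partial_totals(local_blocks):
    """'Map' step: each node totals the amounts per currency pair in its own blocks."""
    totals = Counter()
    for block in local_blocks:
        for pair, amount in block:
            totals[pair] += amount
    return totals


def reduce_totals(partials):
    """'Reduce' step: merge the per-node partial totals into one result."""
    combined = Counter()
    for partial in partials:
        combined.update(partial)
    return combined


# Hypothetical trade records: (currency pair, amount).
trades = [
    ("EUR/USD", 100),
    ("USD/JPY", 50),
    ("EUR/USD", 25),
    ("GBP/USD", 10),
    ("USD/JPY", 5),
]
nodes = ["node-a", "node-b", "node-c"]  # hypothetical commodity servers

placement = assign_blocks(split_into_blocks(trades, block_size=2), nodes)
totals = reduce_totals(map_partial_totals(blocks) for blocks in placement.values())
print(dict(totals))
```

Because each node only touches its own blocks, adding a node reduces the share of data every other node has to process, which is the property the cost argument relies on.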


ADVANTAGES OF HADOOP

There are many advantages of using Hadoop, but the four most relevant to Deutsche Bank’s needs are its cost effectiveness, scalability, flexibility, and fault tolerance.

Cost Effective - With the current system, any upgrade requires a large investment and a lot of time to implement. Hadoop clusters are inexpensive because they run on open-source software that can be downloaded for free from the Apache Hadoop distribution. A Hadoop cluster can be built using commodity servers, which removes the dependency on large server hardware and further reduces costs. It also enables parallel computing, which decreases the cost per terabyte of data processed (Hadoop for Enterprise with IBM, n.d.).

Scalable - The size of the server cluster is not an issue today, but Hadoop lets us add any number of nodes later, independent of the type of data we have. The added processing power helps retrieve data more efficiently while reducing the cost of further expansion, as shown in Figure 1 (Hadoop for Enterprise with IBM, n.d.).

Flexible - Hadoop is able to work with any schema, meaning it can handle any kind of data, structured or unstructured, from any source (Norris, 2013). The data can be joined and aggregated in many ways, making financial analysis and audits easier. This means Deutsche Bank’s servers will be able to deal with

Figure 1


Fault Tolerant - Hadoop works on parallel processing and replicates data to other nodes in the cluster. When a node in the cluster fails, the system automatically redirects its work to another node that holds a copy of the data and continues processing, so the failure of a single node does not cause data loss (Nemschoff, 2013).
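The replication behaviour described under Fault Tolerant can be sketched in a few lines of Python. This is a simplified model rather than Hadoop's actual placement logic: the node names, block identifiers, and round-robin placement are hypothetical, and the replication factor of 3 is used only because it is the HDFS default.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Real HDFS placement is rack-aware; round-robin is an assumption made
    here to keep the sketch short.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(index + offset) % len(nodes)] for offset in range(replication)]
        for index, block in enumerate(block_ids)
    }


def pick_live_replica(placement, block, failed_nodes):
    """Return the first live node holding `block`, or None if every copy is lost."""
    failed = set(failed_nodes)
    for node in placement[block]:
        if node not in failed:
            return node
    return None


def readable_blocks(placement, failed_nodes):
    """Return the set of blocks that still have at least one live replica."""
    return {
        block
        for block in placement
        if pick_live_replica(placement, block, failed_nodes) is not None
    }


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical servers
blocks = ["blk-0", "blk-1", "blk-2", "blk-3"]     # hypothetical data blocks
placement = place_replicas(blocks, nodes, replication=3)

# One node fails: reads for its blocks are redirected to another replica.
fallback = pick_live_replica(placement, "blk-1", failed_nodes=["node-b"])
survivors = readable_blocks(placement, failed_nodes=["node-b"])
print(fallback, sorted(survivors))
```

In this model a block becomes unreadable only when every node holding one of its copies has failed, which is why a single node failure leaves all data available.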


CONCLUSION

With the current problems facing Deutsche Bank’s data infrastructure, it is critical to upgrade to a better and more efficient approach. Hadoop integrates data scattered over multiple servers into a single cluster and organizes it effectively by providing a consistent structure. Data has been lost from servers on multiple occasions; Hadoop addresses this issue by providing automatic redundancy. Data is saved on several different nodes, which protects it against hardware failures. The investment needed is minimal since Hadoop is open source. Compared to the high data warehousing costs the company is facing, Hadoop is capable of reducing server operation costs by 70%. The adoption of Hadoop as Deutsche Bank’s data warehouse software will reduce costs in the short term as well as reducing the costs of further server expansions due to its scalability. Switching from our current legacy systems will also reduce the amount of time spent auditing and generating risk reports. These factors enable our employees to be more productive while reducing the long-term costs of infrastructure expansion and operation.


Dasteel, J. (2012, June 1). Information for Success. Retrieved November 16, 2014, from http://www.oracle.com/us/solutions/datawarehousing/dw-reference-booklet-1705275.pdf

du Preez, D. (2013, February 12). Deutsche Bank: Big data plans held back by legacy systems. Retrieved November 16, 2014, from http://www.computerworlduk.com/news/applications/3425725/deutsche-bank-big-data-plans-held-back-by-legacy-systems/

FX poll 2009: Euromoney’s 31st annual FX survey. (2009, May 6). Retrieved November 16, 2014, from http://www.euromoney.com/Article/2191629/Whats-included-in-the-full-2009-FX-poll-results-Press-release.html

Hadoop for Enterprise with IBM. (n.d.). Retrieved November 16, 2014, from http://www-01.ibm.com/software/data/infosphere/hadoop/enterprise.html

Integrating Hadoop into your Enterprise IT Environment. (2014, July 11). Retrieved November 16, 2014, from http://www.slideshare.net/MapRTechnologies/integrating-hadoop-into-your-enterprise-it-environment

Kurth, S., & Wendt, M. (2013). Hadoop Deployment Comparison Study. Retrieved November 16, 2014, from http://www.accenture.com/sitecollectiondocuments/pdf/accenture-hadoop-deployment-comparison-study.pdf

Nemschoff, M. (2013, December 20). Big Data: 5 Major Advantages of Hadoop. Retrieved November 16, 2014, from http://www.itproportal.com/2013/12/20/big-data-5-major-advantages-of-hadoop/

Norris, J. (2013). Saving Millions through Data Warehouse Offloading to Hadoop. Retrieved November 16, 2014, from http://www.snia.org/sites/default/files2/ABDS2013/presentations/MainStage/JackNorris_Saving_Missions_Hadoop.pdf
