How to Do a Big Data Project

Big Data is sweeping the business world, and while it can mean different things to different people, one thing always rings true: data-driven decisions and applications create immense value by utilizing data sources to discover, present, and operationalize important business insights.

While there is broad industry consensus on the value of Big Data, there is no standardized approach for how to begin and complete a project. The many tools, vendors, and trends in the marketplace, multiplied by different use cases and potential projects, can lead to decision paralysis. In addition, some companies mistakenly focus on technology first, instead of business objectives.

At Infochimps, we want to share our experiences after guiding many enterprises through successful Big Data projects. This project guideline should empower you to tackle the discussion and decide on build versus buy when it comes to achieving your defined business objectives across various technical environments.

All of these factors put every Big Data project at risk. In a recent survey of IT professionals, it was found that nearly […]% of Big Data projects don’t get completed. The same metric for IT projects in general is only […]%.

While how you manage your Big Data project will vary depending on your specific use case and company profile, there are four key steps to successfully implement a Big Data project:

1. Defining your business use case - with clearly defined objectives driving business value.
2. Planning your project - a well managed plan and scope will lead to success.
3. Defining your technical requirements - detailed requirements will ensure you build what you need to reach your objectives.
4. Creating a “Total Business Value Assessment” - a holistic solution comparison will take the politics (and emotion) out of the choices.

Define Your Business Use Case

As enterprises explore Big Data, the business drivers vary widely, from revenue growth to market differentiation. We’ve seen companies realize the most significant benefits from Big Data projects when they start with an inventory of business challenges and goals and quickly narrow them down to those expected to provide the highest return.

At Infochimps, we have seen Big Data business objectives ranging from “creating and deploying a SaaS application product” to “increasing revenue by providing the sales team prioritized leads leveraging customer service data.”

[Figure: example use cases plotted by time to implement against business value gain, in high, moderate, and low impact zones. Business drivers: Cost & Risk Mitigation, Market Differentiation, Product/Service Quality, Revenue Growth. Use cases: New Customer Facing Software Product, Customer Insights Engine, Customer Risk Algorithm, Enhanced Analytics for Supply Chain, New Product Delivery Channel, Executive Financial Dashboard.]
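
The narrowing step described in this section, an inventory of candidate use cases reduced to the few with the highest expected return, can be sketched in a few lines of Python. The scores and the simple value-per-month ratio are illustrative assumptions, not Infochimps figures; substitute estimates agreed with your own stakeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    business_value: int       # assumed 1 (low) to 10 (high) stakeholder score
    months_to_implement: int  # assumed rough estimate


def shortlist(use_cases, top_n=3):
    """Rank use cases by business value per month of effort, best first."""
    ranked = sorted(
        use_cases,
        key=lambda u: u.business_value / u.months_to_implement,
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical scores for some of the use cases named above.
candidates = [
    UseCase("Customer Insights Engine", business_value=8, months_to_implement=4),
    UseCase("Executive Financial Dashboard", business_value=5, months_to_implement=2),
    UseCase("New Customer Facing Software Product", business_value=9, months_to_implement=12),
    UseCase("Customer Risk Algorithm", business_value=7, months_to_implement=6),
]

for use_case in shortlist(candidates, top_n=2):
    print(use_case.name)
```

A ratio is only one possible ranking rule; a team that cares more about absolute value than speed could sort on `business_value` alone.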

In order to explore your organization’s expectations of Big Data, we recommend answering these questions first:

• What is the project goal?
• What direction is the business headed?
• What are the obstacles to getting there?
• Who are the key stakeholders and what are their roles?
• What is the first Big Data use case determined by key stakeholders?

These questions build a good foundation for your project. The more specific and connected to your business goals your answers are, the more likely your project will succeed. These are the types of discussion points we ask our clients when first learning about their Big Data projects. Here are more specific questions that elaborate on each foundational point.

What Direction is the Business Headed?

• Determine the company’s high level objectives and how Big Data can support these objectives.
• Understand the company’s perception of Big Data and any relevant historical context.
• Identify the problem area, such as marketing, customer care, or business development, and the motivations behind the project.
• Describe the problem and obstacles in non-technical terms.
• Inventory any solutions and tools currently used to address the business problem.
• Weigh the advantages and disadvantages of the current solutions.
• Navigate the process for initiating new projects and implementing solutions.

Identify Stakeholders and Business Use Case

• Identify the stakeholders that will benefit from the Big Data project.
• Interview individual stakeholders to determine their project goals and concerns.
• Document specific business objectives decided upon by key decision makers.
• Assign priorities to the business objectives.
• Architect a business use case.

At Infochimps, we create these types of business use cases:

Our company intends to generate insights on how customers interact with their customer service portal and improve their services for happier customers and more streamlined business.

• Faster path to ROI with both tech and services
• Ability to prove the value of Big Data internally
• Scalability to more data sources and use cases

[Figure: Infochimps Cloud for Big Data. Labels: Support Clickstreams & Downloads, User Generated Content, BI Queries & Dashboards, Customer Behavior Data Science.]

Determine the Project Team

• Identify the project “sponsor” to remove obstacles, find the budget, provide organizational support, and champion the cause.
• Establish the project manager and the team. Define the roles and responsibilities of each team member.
• Understand the team’s availability and resource constraints for the project.

At Infochimps, we typically see project teams that include these roles: Executive Sponsor, Technical Sponsor, Project Manager, Architect, Data Engineer, Data Scientist/Analyst, Lead Test, Lead QA, Lead Developer, IT Lead.

Example Business Use Case Checklist

Item | XYZ Bank
Company Objective | Enhance internal audit & reporting capabilities
Use Case | Information security, account security, ID theft monitoring and prevention
Success Criteria | Faster reporting, more robust reporting, improved operation efficiency
Executive Sponsor | SVP IT
Project Lead | Senior Architect
Pilot (Yes/No - how long) | Yes (3 months)
Budget (Pilot/Production) | $100,000 (Pilot) - $500,000 (Production)

Plan Your Project

This is where things get specific. As a result of your research and meetings, you most likely have a nebulous objective like “reducing customer churn.” This section intends to construct a concrete and specific objective agreed upon by the project sponsors and stakeholders.

• Specify expected goals in measurable business terms.
• Identify all business questions as precisely as possible.
• Determine any other quantifiable business requirements.
• Define what a successful Big Data implementation would look like.

The goal may now be clear, but how will you know once you’ve achieved it? It’s important to define what business success for your Big Data project looks like before proceeding further.

Set Specific Objective Success Criteria

When determining success criteria, it’s important to pick criteria that are measurable, such as a specific key performance metric.

The following are tasks and considerations you can use to ensure you have properly captured the success criteria:

• As precisely as possible, document the success criteria for this project.
• Make sure each identified business objective has a measurable criterion that will determine if that objective has been met successfully.
• Share and gain approval of your business success criteria among key stakeholders.
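
One way to keep these checks honest is to record each objective in a small structure and flag any that still lack a metric, a target, or stakeholder approval. The sketch below is a minimal illustration; the `Objective` fields and the sample objectives are assumptions made for the example, not part of any Infochimps template.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Objective:
    description: str
    metric: Optional[str] = None     # the key performance metric to track
    target: Optional[float] = None   # the value that counts as success
    approved_by_stakeholders: bool = False


def not_yet_measurable(objectives):
    """Return objectives lacking a metric, a target, or stakeholder approval."""
    return [
        o for o in objectives
        if o.metric is None or o.target is None or not o.approved_by_stakeholders
    ]


# Hypothetical objectives, loosely based on the churn example in this guide.
objectives = [
    Objective("Reduce customer churn", metric="monthly churn rate (%)",
              target=2.0, approved_by_stakeholders=True),
    Objective("Improve customer satisfaction"),  # still nebulous
]

for o in not_yet_measurable(objectives):
    print(f"Needs a measurable criterion: {o.description}")
```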

Tie the Project Plan to Success

• Determine proper scope, specifically what is included and what is not included.
• Develop a rough budget.
• Set a timeline and successful milestones at […] months, […] months, and a year.

At Infochimps, we like to see an objective like “Leveraging data from our CRM, Customer Support and Finance applications and using cloud architecture, create an application to score customers on a ranking scale, based on the likelihood we’ll lose their business. The app will be used by the Account Services team, who will employ a special customer service strategy to increase retention, thereby reducing churn.”

At Infochimps, we use a Gantt chart template as a starting point for Big Data implementations.

[Figure: Gantt chart template]

Define Your Technical Requirements

The technical requirements phase involves taking a closer look at the data available for your Big Data project. This step will enable you to determine the quality of your data and describe the results of these steps in the project documentation.

Current Technical Environment

It’s important to understand what tools are used and the architecture they are used in as it sits today.

• Inventory all tools used today.
• Sketch the current architecture.

Identify Data Sources

Consider what data sources you’ll need to take advantage of. Most data that’s relevant to a production application’s use is going to come from a live stream or feed.

Existing data sources. This includes a wide variety of data, such as transactional data, survey data, Web logs, etc. Consider whether your existing data sources are enough to meet your needs.

Purchased data sources. Does your organization use supplemental data, such as demographics? If not, consider whether something like a Gnip or Datasift social media and news stream would complement your current data to create additional project value.

Additional data sources. If the above sources don’t meet your needs, you may need to conduct surveys or begin additional tracking to supplement your existing data stores.

When examining your data sources, ask:

• Which attributes from the database(s) seem most promising?
• Which attributes seem irrelevant and can be excluded?
• Is there enough data to draw generalizable conclusions or make accurate predictions?
• Are there too many attributes for your analytics method of choice?
• Are you merging various data sources? If so, are there areas that might pose a problem when merging?
• Have you considered how missing values are handled in each of your data sources?
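
The last two questions, about merge problems and missing values, can be answered with a quick profile of the records before any analytics work begins. The sketch below uses plain Python on two tiny hypothetical sources; the field names, the set of "missing" markers, and the numeric gender codes are all assumptions made for illustration.

```python
# Markers treated as "missing". This tuple is an assumption; each source
# may use its own convention, which is exactly what needs documenting.
MISSING = (None, "", "NA", "N/A", "null")


def is_missing(value):
    return value in MISSING


def missing_rates(records, attributes):
    """Share of records in which each attribute is absent or blank."""
    return {
        attr: sum(is_missing(r.get(attr)) for r in records) / len(records)
        for attr in attributes
    }


def value_types(records, attribute):
    """Names of the Python types used for one attribute across all records."""
    return sorted({
        type(r[attribute]).__name__
        for r in records
        if not is_missing(r.get(attribute))
    })


# Two hypothetical sources that encode the same attribute differently.
crm = [
    {"customer_id": 1, "gender": "M", "region": "West"},
    {"customer_id": 2, "gender": "F", "region": ""},
]
support = [
    {"customer_id": 1, "gender": 1, "region": "West"},
    {"customer_id": 3, "gender": 2, "region": None},
]
merged = crm + support

print(missing_rates(merged, ["gender", "region"]))
print(value_types(merged, "gender"))  # more than one type signals a conflicting scheme
```

An attribute with a high missing rate is a candidate for exclusion, and an attribute that shows more than one value type across sources needs a mapping rule before the sources are merged.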

It is essential to avoid common pitfalls like scope creep, unclear goals, passive or nonexistent project management, poor communication, starting too small, etc.

Why do Big Data projects fail 30% more often than other IT projects?

By far the biggest reason why these projects failed was inaccurate scope. Requirements expanded out of proportion or there wasn’t a firm set of project objectives. Without firm success criteria, projects continued to expand without realigned timelines, failed to demonstrate positive return on investment, or, in the worst case, failed to meet objectives and provide business value.

Another key point that came up was lack of cooperation between departments. By nature, Big Data is not just going to help one stakeholder. It’s often something that improves performance for a lot of different parties, such as your IT team, your application developers, your data scientists and analysts, and people across the organization, from line of business managers to executives.

For more on this report, see http://www.infochimps.com/resources/white-papers/cios-big-data

There are many ways to describe data, but most descriptions focus on the quantity and quality of the data. Listed below are some key characteristics to address when describing data:

Volume of data. For most analytical techniques, there are tradeoffs associated with data size. Large data sets can produce more accurate models, but they can also increase processing time.

Velocity of data. There are also tradeoffs associated with whether the data is at rest or in motion (static or real-time). Velocity translates into how fast the data is created within any given period of time.

Variety of data. Data can take a variety of formats, such as numeric, categorical (string), or Boolean (true/false). Paying attention to value type can prevent problems during later analytics. Frequently, values in the database are representations of characteristics such as gender or product type. For example, one data set may use M and F to represent male and female, while another may use the numeric values […] and […]. Note any conflicting schemes in the data.

Time to action. Data can be used to take immediate action as well as be stored for future non-time critical analysis. It’s important to identify which data will most likely be used for real-time actions (<150ms), near real-time actions (seconds), or non-time critical actions (minutes to hours).

At Infochimps, we utilize the following template to gather and compile data sources, documenting each of the dimensions of a data source.

Name | Description | Example
Data Source | Description of the data source | POS System
Listener Type | HTTP Req./Stream, Syslog, Batch Upload, etc. | HTTP Request
Data Fields (Attributes) | | timestamp, item, purchase price, item ID, inventory ID, purchase ID, customer ID, cashier ID
Data Type | JSON, XML, CSV/TSV/Delimited, Fixed Width, SQL, Binary | JSON
# of Channels | Either 1 or many; Gnip is 1 source with possibly 100s of diff. URLs to listen to | 1
Data Velocity: Avg. Events / Sec* | | 125
Data Velocity: Max Sustained Events / Sec (10 min) | | 500
12-Month Growth Factor* | How much is volume expected to grow in next 12 months? | 5x
Avg Event Size | | 0.4KB
Time to Action | Real-time actions (<150ms), near real-time actions (seconds), or non-time critical actions (minutes to hours) |
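
The template's velocity, growth, and event-size fields are enough for a first capacity estimate. The following sketch turns the POS System example row into rough daily volume and peak throughput figures. The class and function names are hypothetical, decimal units are assumed (1 GB = 1,000,000 KB), and the one-minute boundary between "seconds" and "minutes" in the time-to-action tiers is an assumption; only the 150 ms threshold comes from the template.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class DataSource:
    name: str
    listener_type: str
    data_type: str
    channels: int
    avg_events_per_sec: float
    max_sustained_events_per_sec: float
    growth_factor_12mo: float
    avg_event_size_kb: float

    def daily_volume_gb(self):
        """Average daily volume in GB, assuming 1 GB = 1,000,000 KB."""
        kb_per_day = self.avg_events_per_sec * self.avg_event_size_kb * SECONDS_PER_DAY
        return kb_per_day / 1_000_000

    def daily_volume_gb_in_12_months(self):
        """Daily volume after applying the 12-month growth factor."""
        return self.daily_volume_gb() * self.growth_factor_12mo

    def peak_throughput_kb_per_sec(self):
        """Throughput needed at the maximum sustained event rate."""
        return self.max_sustained_events_per_sec * self.avg_event_size_kb


def time_to_action_tier(latency_ms):
    """Map a required response latency to the tiers named in the template."""
    if latency_ms < 150:
        return "real-time"
    if latency_ms < 60_000:  # assumed cutoff: under one minute counts as "seconds"
        return "near real-time"
    return "non-time critical"


# Values from the POS System example row of the template.
pos = DataSource("POS System", "HTTP Request", "JSON", 1, 125, 500, 5, 0.4)

print(f"{pos.daily_volume_gb():.2f} GB/day today")
print(f"{pos.daily_volume_gb_in_12_months():.2f} GB/day in 12 months")
print(f"{pos.peak_throughput_kb_per_sec():.0f} KB/s at peak")
print(time_to_action_tier(100))
```

For the POS example this works out to about 4.3 GB per day today and about 21.6 GB per day after the stated 5x growth, which is the kind of number the downstream data stores have to be sized for.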

At Infochimps, we recommend tracking how you plan to use and consume the data generated by your data sources. Your data sources will eventually “feed” other processes in your Big Data environment (e.g. directly to a customer application, to an RDBMS powering a BI tool, a NoSQL data store, Hadoop, data archive, etc.).

Identify How You’ll Work with the Data

Consider what interfaces and tools are necessary for your company to work with your data sources. Infochimps provides the ability to create custom applications and analytics using native APIs and tools as part of Hadoop, databases, and stream processing, as well as abstracted and unified interfaces for improved user experience. Infochimps also provides users with the ability to produce tables, charts, and other visualization elements using BI tools such as Business Objects, Microstrategy, Cognos, Tableau, Datameer, or other similar tools. Such visual analyses can help to address the Big Data project goals defined during the business understanding phase. Other times, it is more appropriate to utilize statistical tools (R, SAS, SPSS, Matlab, etc.) and packaged applications (CRM, POS, ERP, etc.).

• Who needs to work with the data?
• What are their skills and techniques?
• Will training be required?
• What tools do you currently have in your enterprise that you’d like to take advantage of?
• Do those tools have Big Data connectors or proven interface methods?
• What new tools could help with your data mining, analysis, visualization, reporting, etc.?
• How and where will the data be stored?
• What are the reporting and visualization tools necessary to achieve success in your end users’ eyes?

Name | Description | Example
End User | Data Scientist, Data Engineer, Data Analyst, Business Analyst, Statistician, BI Specialist, LOB User, Field Manager, Executive, etc. | Data Analyst
End User Tool | Hive/Hue, SQL Server / MySQL / Oracle, Tableau, Cognos, Microstrategy, SAP Business Objects, SAS, SPSS, R, Excel, custom application, custom dashboard, etc. | Tableau
Analysis Activities | What is the end user doing as part of their analysis? | Exploratory queries, simple data mashups and calculations, creating visual reports with Tableau’s report builder

Create a Total Business Value Assessment

Evaluate your options with a “Total Business Value Assessment.” This means that you perform at least a 3-year total cost of ownership analysis, but you also include things like time to business value, ease of use, scalability, standards-based, and enterprise readiness. However, before you get started on evaluating your solution options, it is important to know your “buying team.” Buying teams generally consist of stakeholders from multiple organizational levels and sometimes multiple divisions outside of IT. At a minimum, there should be an executive sponsor, project champion or project team lead, technical decision maker, and an economic decision maker.

Your entire buying team needs to be involved in evaluating the options. Options start with who you’re relying on to implement your project, such as: doing it yourself with internal resources and/or leveraging software vendors, working with system integrators, deploying with cloud services providers, or using emerging boutique Big Data consulting firms. We recommend looking at each implementation option and weighing them against your specific business priorities. But don’t forget to include Time to Business Value, Ease of Use, Scalability, Standards-based, and Enterprise Readiness. As you evaluate solutions, document how each solution performs on these and other important dimensions.
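
Documenting how each option performs on these dimensions lends itself to a simple weighted score. The sketch below is one possible way to do it, not a prescribed Infochimps method; the weights, the 1-5 scale, and the scores for the two options are hypothetical and should come from your buying team.

```python
# Dimensions named in this guide. The weights are hypothetical and should
# reflect your own business priorities (they sum to 1.0 here).
WEIGHTS = {
    "time_to_business_value": 0.30,
    "ease_of_use": 0.20,
    "scalability": 0.20,
    "standards_based": 0.10,
    "enterprise_readiness": 0.20,
}


def weighted_score(scores, weights=WEIGHTS):
    """Combine per-dimension scores (assumed 1-5 scale) into one number."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[d] * scores[d] for d in weights)


# Hypothetical scores agreed by the buying team for two implementation options.
options = {
    "Internal resources": {
        "time_to_business_value": 2, "ease_of_use": 2, "scalability": 3,
        "standards_based": 4, "enterprise_readiness": 3,
    },
    "System integrator": {
        "time_to_business_value": 5, "ease_of_use": 4, "scalability": 4,
        "standards_based": 3, "enterprise_readiness": 4,
    },
}

ranking = sorted(options, key=lambda name: weighted_score(options[name]), reverse=True)
for name in ranking:
    print(f"{name}: {weighted_score(options[name]):.2f}")
```

Agreeing on the weights before anyone scores an option is what keeps the comparison from being tuned toward a favorite.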

Time to Business Value. As projects require significant upfront investment, it is important to understand how long before the solution will start generating value. As many of these projects can overextend vendor-agreed timelines, it is recommended to request information on similar scope projects that have already been completed for other clients to better determine whether the vendor delivers on set milestones. If your new Big Data application generates $[…]M per month and you launch […] months ahead of schedule, that’s worth $[…]M. What’s your time to business value?
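
The arithmetic behind that question is simple enough to write down. In the sketch below, the function names and all dollar figures are placeholders chosen for illustration, and the break-even calculation assumes value accrues at a constant monthly rate from launch onward.

```python
import math


def value_of_early_launch(monthly_value, months_ahead):
    """Business value gained by launching ahead of schedule."""
    return monthly_value * months_ahead


def months_to_break_even(upfront_investment, monthly_value, months_until_launch):
    """Months from project start until cumulative value covers the investment."""
    return months_until_launch + math.ceil(upfront_investment / monthly_value)


# Placeholder figures: $1M of value per month, launched 3 months early.
print(value_of_early_launch(1_000_000, 3))

# Placeholder figures: $2.5M invested, launch at month 6, $1M of value per month.
print(months_to_break_even(2_500_000, 1_000_000, 6))
```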

Ease of Use. While considering new Big Data solutions, it is imperative to consider how the solution will affect your need to augment internal resources for implementation and ongoing maintenance. Specifically, new resources such as data scientists may be necessary to handle a project internally. In addition, corporate IT resources will be needed to maintain the project once implemented. Ease of use also translates into ease of integration within your internal technical infrastructure. This can have a big impact on time-to-business value.

Scalability. Ideal solutions will expand with evolving business needs. While one use case may be the initial driver, the ideal solution will support future use cases. (Also consider any costs associated with scaling. Some solutions offer initial capacity at a low cost, but the cost quickly increases as expansion occurs.)

Standards-Based. In today’s world, popular technologies come and go. Consider whether your proposed solutions use open, standards-based, best-in-class technologies to drive innovation and revenue for your business needs. Additionally, explore any risks associated with being able to tailor your solution and your ability to act quickly when you need to do so.

Enterprise Readiness. An important aspect of selection is whether the solution can operate within an enterprise setting. Ideal solutions have high availability and disaster recovery in place. Additionally, they will support your business compliance and security needs and meet your current company-defined SLAs. To better understand how solutions operate in an enterprise environment, contact the solution provider’s enterprise customer references.

3-Year Total Cost of Ownership. You need to add up your costs over at least three years to appreciate some of the recurring costs which may not be completely obvious the first year. Costs include allocated data center infrastructure (floor, racks, power, connectivity), hardware (compute, storage, and networking), software (Big Data stack, OS, Admin SW, security SW, analytics SW), and personnel costs (systems management, NOC, development, consulting).

Research shows that nearly […]% of Big Data projects fail. At Infochimps, we are experts in Big Data projects, implementing numerous enterprise projects that deliver insights within […] days. This Big Data guideline is based on a culmination of our successes; we hope you find value in it. If you would like to hear more about the Infochimps Cloud, please request a demo. If you would like to discuss your project, please request a complimentary consultation.
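
As a rough illustration of the three-year cost roll-up described under Total Cost of Ownership, the sketch below separates one-time costs from recurring ones. The split between the two and every dollar amount are hypothetical; the cost categories follow the ones listed in this section.

```python
def total_cost_of_ownership(one_time_costs, annual_costs, years=3):
    """Sum one-time costs plus recurring annual costs over the analysis period."""
    return sum(one_time_costs.values()) + years * sum(annual_costs.values())


# Hypothetical figures, in dollars.
one_time = {"hardware": 200_000, "consulting": 50_000}
annual = {
    "data_center_infrastructure": 40_000,
    "software": 60_000,
    "personnel": 300_000,
}

first_year = total_cost_of_ownership(one_time, annual, years=1)
three_year = total_cost_of_ownership(one_time, annual, years=3)

print(f"Year 1: ${first_year:,}")
print(f"3-year TCO: ${three_year:,}")
print(f"Recurring share of 3-year TCO: {3 * sum(annual.values()) / three_year:.0%}")
```

Comparing the first-year figure with the three-year total makes the recurring costs visible, which is the point of extending the analysis beyond year one.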

At Infochimps, we’ve seen customers endure time to business value of 18-24 months with legacy solutions, whereas time-to-value can be as short as 30 days with cloud-based deployments.

Project Overview

• Accountable Executive
• Department
• Impact
• Objective
• Approach
• Expected Outcome

Phase 1: Pilot

Activity & Timing
Phases: Initiation & Planning, Execution, Closure
Fields: Timing, Description, Major Activities

Estimated Cost
Categories: Personnel, Software, Hardware, Training, Consulting
Fields: Initial Cost, 3yr TCO

Pilot Success
Fields: Success Measures, Time to Value

Dependencies, Assumptions & Constraints

Deliverables
Names: Project Plan, Infrastructure, Functional Spec, Implementation Spec
Fields: Timing, Description

Technical Requirements

Solution Description

Data Inputs

Field | Description | Example
Name | | Feed 1
Listener Type | HTTP Req./Stream, Syslog, Batch upload | HTTP Request
Data Type | JSON, XML, CSV/TSV/Delimited, Fixed Width, SQL, Binary | JSON
# of Channels | 1 or many; Gnip is 1 source with possibly 100s of diff. URLs to listen to | 1
12-Month Growth Factor of Data | How much volume is expected to grow in next year? | 5x
Avg. Events / Sec | | 125
Avg. Event Size | | 0.4KB
Streaming Aggregation? | | By Minute
Non-Trivial Decorators | Calls to external APIs/data stores, complex algorithms/logic, &c. | Sentiment Analyzer
Max Sustained Events / Sec (10 min) | | 500

Template rows shown: Feed 1, Feed 3.

Batch Jobs

Field | Description | Example
Name | Job 1 ... Job N |
Type (Hadoop Development Method) | What method should be used to perform Hadoop jobs? Wukong, Pig, Hive, Java M/R, ... |
Description | What does this job do? | Take last week’s raw data and bin by hour
Frequency | How often does it run? | Nightly
Input(s) | Where is data from? | S3
Requires Persistent Data? | Do we have to store any data on the Hadoop cluster itself to perform the job? | No
Output(s) | Where does data write to? | MySQL
# of Records | Avg. # of records in each run | 1 B
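
The example job in the template (take last week's raw data and bin by hour) is a plain count-per-bucket aggregation. The sketch below shows only that logic in ordinary Python on a few made-up events. In the setting the template describes, the job would run on Hadoop through one of the listed methods, reading from S3 and writing to MySQL; the `timestamp` field name and ISO 8601 format are assumptions.

```python
from collections import Counter
from datetime import datetime


def bin_by_hour(events):
    """Count events per hour. Each event is a dict with an ISO 8601 'timestamp'."""
    counts = Counter()
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        # Truncate to the start of the hour so all events in that hour share a key.
        counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return counts


# Hypothetical raw events.
events = [
    {"timestamp": "2013-05-01T09:15:00", "item": "A"},
    {"timestamp": "2013-05-01T09:47:30", "item": "B"},
    {"timestamp": "2013-05-01T10:02:10", "item": "C"},
]

for hour, count in sorted(bin_by_hour(events).items()):
    print(hour.isoformat(), count)
```

Counting per key is a map-and-reduce shaped problem: the truncation is the map step and the per-hour sum is the reduce step, so the same logic carries over to a distributed job.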

Big Data Systems & Deployment Environment

Environment: Virtual Private Cloud, Hybrid Cloud, Public Cloud, or Private Cloud

Type | Production Nodes | Staging Nodes
ETL + Stream Processing | 4 | 2
Databases | 4 (ElasticSearch) | 2 (ElasticSearch)
Hadoop | 1 Elastic Cluster | n/a

Additional Tools

Data Stores Examples

Other template columns: Type (for example HBase, HDFS), Any Additional Tools?

Purpose | Retention Policy | Query Profile (what kinds of queries/usage/volume should be supported?) | Avg. Events / Sec | Max. Sustained Events / Sec (10 min) | 12-month Growth Rate
Support a customer-facing web app | Keep all events from last 30 days | ~10 simple searches / sec. Fewer complex/faceted queries | 50 | 200 | 3x
Support a Tableau installation | Keep all events from last 7 days | ~5 joins and time series queries / sec to support Tableau | 10 | 50 | 3x
Support an internal BI tool | Keep all events from last 90 days | ~20 key/value lookups / sec. Many short table scans to create time series | 200 | 500 | 3x
Allow ad hoc queries in Hive | Keep all events from last 6 months | Ad hoc queries running over large amounts of historical data | 200 | 500 | 3x
Archival/Disaster Recovery | Keep all events from last 12 months. Last 3 months in Glacier | Occasional batch jobs from Hadoop | 200 | 500 | 3x

SLA Requirements

Requirement | Production (example) | Staging
System Uptime | 99.99% |
Query Latency Response Time | <200ms |
Data Retention | 99.99% |
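
An uptime percentage is easier to discuss once it is converted into a downtime budget. The sketch below does that conversion and adds a simple latency check. The function names are hypothetical, and requiring every query to beat the limit is a simplifying assumption; real SLAs are often stated against a percentile instead.

```python
def allowed_downtime_minutes(uptime_percent, period_days=365):
    """Downtime budget implied by an uptime SLA over a period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - uptime_percent / 100)


def meets_latency_sla(latencies_ms, limit_ms=200):
    """True if every observed query latency is under the limit."""
    return all(latency < limit_ms for latency in latencies_ms)


print(f"{allowed_downtime_minutes(99.99):.1f} minutes of downtime per year")
print(meets_latency_sla([120, 180, 95]))
```

A 99.99% uptime target leaves roughly 52.6 minutes of downtime over a 365-day year, which is a useful number to put in front of whoever will operate the system.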

Solution Diagram


Contact Us

Infochimps, Inc.
[…] W. […]th St., Suite […]
Austin, TX […]
www.infochimps.com
info@infochimps.com
Twitter: @infochimps
© Infochimps™, Inc.

Request a Free Demo

See Infochimps Cloud for Big Data. Infochimps Cloud is a suite of enterprise-ready cloud services that make it simpler, faster, and far less complicated to develop and deploy Big Data applications in public, virtual private and private clouds. Our cloud services provide a comprehensive analytics platform, including data streaming, storage, queries and administration. With Infochimps, you focus on the analytics that drive your business insights, not building and managing a …
