Agile Requirement Gathering Process to Avoiding Common Data Problems


Brought to you by:


©2013 Lavastorm Analytics. All rights reserved. 2

Agenda

New challenges in data quality

Using agile requirements gathering to improve data quality

Demonstration

The importance of business/IT collaboration

Q&A


Data Quality Errors Are Pervasive and Substantial

“poor data quality costs US businesses at least 30% of revenues—a staggering $700 billion per year” [1]

“poor quality customer data costs U.S. businesses a staggering $611 billion a year in postage, printing, and staff overhead” [2]

“Poor data quality is a primary reason for 40% of all business initiatives failing to achieve their targeted benefits (growth, agility, competitiveness).” [3]

Data quality affects overall labor productivity by as much as 20%. [4]

1. Ovum press release, “Bad data costing US businesses $700 billion a year”, Madan Sheina, 2010.

2. TDWI report, “Data Quality and the Bottom Line”, Wayne Eckerson, 2002.

3. Gartner report, “Measuring the Business Value of Data Quality”, Ted Friedman and Michael Smith, October 10, 2011.

4. Gartner report, “Measuring the Business Value of Data Quality”, Ted Friedman and Michael Smith, October 10, 2011.


Data Quality Issues - In the News

$1.2 trillion lost as the Dow plunged nearly 1,000 points in 15 minutes – triggered by a trader who entered a “b” for billion instead of an “m” for million; a disconnect between two systems, May 2010 [1]

The US Federal Reserve – $4 billion spreadsheet error in the calculation of Consumer Revolving Credit, August 2010 [2]

JP Morgan Chase reveals it miscalculated Value at Risk in its financial/compliance reports because of a spreadsheet error, 2013 [3][4]

Harvard University – a spreadsheet error by leading economists miscalculated the relationship between debt and economic growth and wrongly influenced austerity policies in Europe, 2013 [5]

1. http://money.cnn.com/2010/05/11/pf/expert/market_crash.moneymag/

2. http://www.zerohedge.com/article/blatant-data-error-federal-reserve

3. http://www.zerohedge.com/news/2013-02-12/how-rookie-excel-error-led-jpmorgan-misreport-its-var-years

4. http://files.shareholder.com/downloads/ONE/2261602328x0x628656/4cb574a0-0bf5-4728-9582-

5. http://www.iqtrainwrecks.com/


Criteria for Evaluating Data Quality

Accuracy: Do the data represent reality or a verifiable source?

Integrity: Is the structure of data and relationships among entities and attributes maintained consistently?

Consistency: Are data elements consistently defined and understood?

Completeness: Is all necessary data present?

Validity: Do data values fall within acceptable ranges?

Timeliness: Is data available when needed?

Accessibility: Is the data accessible, understandable, and usable?

Source: TDWI report, “Data Quality and the Bottom Line”, Wayne Eckerson, 2002.
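Several of these criteria can be measured directly once a business user states what "present" and "acceptable range" mean for a field. The sketch below is a minimal illustration of that idea. It assumes in-memory records with hypothetical field names (customer_id, birth_date, credit_limit) and a hypothetical 0 to 100,000 range rule. It scores completeness and validity as simple ratios. It is not presented as Lavastorm's method.

```python
from datetime import date
from typing import Any, Callable

# Hypothetical customer records; field names and values are illustrative only.
records: list[dict[str, Any]] = [
    {"customer_id": "C-0001", "birth_date": date(1980, 10, 1), "credit_limit": 5000},
    {"customer_id": None, "birth_date": date(1981, 10, 1), "credit_limit": 7500},
    {"customer_id": "C-0003", "birth_date": None, "credit_limit": -200},
]


def completeness(rows: list[dict[str, Any]], field: str) -> float:
    """Completeness: share of rows in which `field` is present (not None)."""
    if not rows:
        return 1.0
    return sum(row.get(field) is not None for row in rows) / len(rows)


def validity(rows: list[dict[str, Any]], field: str, is_valid: Callable[[Any], bool]) -> float:
    """Validity: share of non-missing values of `field` that pass the check `is_valid`."""
    present = [row[field] for row in rows if row.get(field) is not None]
    if not present:
        return 1.0
    return sum(is_valid(value) for value in present) / len(present)


# Hypothetical business rule: credit limits must fall between 0 and 100,000.
print(f"customer_id completeness: {completeness(records, 'customer_id'):.2f}")
print(f"credit_limit validity: {validity(records, 'credit_limit', lambda v: 0 <= v <= 100_000):.2f}")
```

Returning 1.0 for empty input is a design choice: nothing is reported as missing. A team might instead prefer to flag empty inputs as a problem in their own right.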


The Challenges You Find

Transformations

Incorrect Values

Duplicate Data

System Logs

Inaccurate Data

Unknown Data


Data quality isn’t a new issue. Why is this a problem now?

Relational Database Silos, Structured Data, Data Warehouses

New Databases & Sources

Enterprise Data Outside The DW

Unstructured, Semi-structured Documents

Big Data Analytics

Unify Silos, More Data

Traditional Analytics

Third-Party Data


Understanding The Data = Data Integrity

How do I interpret field C_NUM = 6173545422? Is it:

– 6173545422, a customer identification number?

– 617-354-5422, a cellular phone number?

– 6,173,545,422, a count of numbered contracts?

– $61,735,454.22, a capital expense amount?

What if it’s 6173545422? What about 6173545422?

Data meaning is in the eye of the beholder (i.e., the business user)

Data quality comes down to getting the right data requirements
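To make the C_NUM ambiguity concrete, the sketch below renders the same raw digits under each candidate meaning listed above. The function names are hypothetical. The amount rendering assumes the value is stored in cents. Nothing in the raw field says which rendering is right, which is why the business user has to supply that requirement.

```python
# Hypothetical renderings of the raw C_NUM value; only a business user (or agreed
# metadata) can say which one is correct.
raw = "6173545422"


def as_phone(digits: str) -> str:
    """Render a 10-digit string as a North American phone number (NNN-NNN-NNNN)."""
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def as_count(digits: str) -> str:
    """Render the digits as an integer count with thousands separators."""
    return f"{int(digits):,}"


def as_amount_from_cents(digits: str) -> str:
    """Render the digits as a dollar amount, assuming the value is stored in cents."""
    cents = int(digits)
    return f"${cents // 100:,}.{cents % 100:02d}"


interpretations = {
    "customer identification number": lambda digits: digits,
    "cellular phone number": as_phone,
    "count of numbered contracts": as_count,
    "capital expense amount": as_amount_from_cents,
}

for meaning, render in interpretations.items():
    print(f"{meaning}: {render(raw)}")
```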


Why It’s Hard To Fix – The Operational Reality

Ownership - Traditionally, the onus for data quality falls on centralized IT data brokers, who may miss some data quality issues because they lack the subject-matter expertise (SME) to recognize them. Decision makers should have some ownership of the data that informs their actions.

Understanding - The disconnect between business owner and data owner can be massive and difficult to close, as they speak different languages, use different tools, and have different priorities. Both need the right tools and methodology.

Control - Optimally, all relevant systems across silos should have built-in controls that preempt errors at data entry. However, this takes time and money and lags behind the business, which itself constantly wants to implement changes that obviate those controls. Rules will be broken, so new rules need to be implemented quickly.
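One way to act on "new rules need to be implemented quickly" is to keep validation rules as data. Analysts can then add rules without waiting for a source-system change. The sketch below assumes a hypothetical RuleSet registry and hypothetical rule and field names. It illustrates the idea only and does not describe any particular product.

```python
from typing import Callable

# A rule takes one record and returns True when the record passes.
Rule = Callable[[dict], bool]


class RuleSet:
    """Hypothetical registry of named validation rules, applied downstream of source systems."""

    def __init__(self) -> None:
        self._rules: dict[str, Rule] = {}

    def add(self, name: str, rule: Rule) -> None:
        """Register (or replace) a rule; it takes effect on the next validation call."""
        self._rules[name] = rule

    def violations(self, record: dict) -> list[str]:
        """Return the names of all rules the record fails, in registration order."""
        return [name for name, rule in self._rules.items() if not rule(record)]


rules = RuleSet()
rules.add("amount_non_negative", lambda r: r.get("amount", 0) >= 0)

record = {"amount": -5, "currency": "usd"}
print(rules.violations(record))  # ['amount_non_negative']

# Later, the business adds a rule without waiting for a source-system change.
rules.add("currency_code_uppercase", lambda r: str(r.get("currency", "")).isupper())
print(rules.violations(record))  # ['amount_non_negative', 'currency_code_uppercase']
```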


Why It’s Hard To Fix – The Operational Reality

Two choices:

Wait for the curated data set to adequately reflect business requirements, producing dirty data (and making bad decisions) in the interim while working around the infirmities in the data through best efforts

Adopt a new, agile approach to implement the validations and data hygiene necessary to produce accurate analytical results, creating a solid data foundation in advance of the necessary centralized system changes (and helping to define them)


The Agile Requirement Gathering Approach

Ownership - Decentralization fosters Agility

IT department cedes some control over the data by providing business users with tools and access, then letting them work with greater self-sufficiency

Decentralization is a fundamental requirement for an agile approach to data analysis, but first requires trust and experimentation

Understanding - Collaboration improves Agility

IT data brokers and business analysts abandon the protracted traditional “waterfall” approach (meetings, documents, meetings, coding, etc.) for an ongoing, data-centric conversation that matches business rules with system data

Control – Tools enable Agility

A unified toolset that matches complementary skill sets, incorporates all sources of knowledge, provides traceability, and can be modified as business rules and data change – maintaining all of the necessary data quality controls


The Agile Requirement Gathering Approach

Accuracy

“Which is really the system of record here?”

“Trust me, it’s the billing system – CRM data needs a bath. Don’t trust those SSNs in the CRM system, we made some bad assumptions during a migration.”
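The accuracy exchange above amounts to a per-field precedence rule: for SSNs, trust billing and do not fall back to CRM. A minimal sketch follows. It assumes hypothetical source names ("billing", "crm") and a hypothetical mailing_address field. It reuses the sample SSNs from this slide, but which system holds which value is purely illustrative.

```python
from typing import Optional

# Hypothetical precedence: billing is the system of record for SSNs, so CRM is
# deliberately not consulted for that field; for mailing_address CRM wins and
# billing is the fallback.
SOURCE_PRIORITY: dict[str, list[str]] = {
    "ssn": ["billing"],
    "mailing_address": ["crm", "billing"],
}


def resolve(field: str, values_by_source: dict[str, Optional[str]]) -> tuple[Optional[str], Optional[str]]:
    """Return (value, source) from the highest-priority source that has a value."""
    for source in SOURCE_PRIORITY.get(field, []):
        value = values_by_source.get(source)
        if value:
            return value, source
    return None, None  # No trusted source has a value: flag for review.


print(resolve("ssn", {"crm": "123-45-6789", "billing": "916-34-8239"}))
print(resolve("ssn", {"crm": "123-45-6789"}))
```

Returning (None, None) instead of the CRM value is deliberate in this sketch. An untrusted value becomes a visible exception rather than silently entering the analysis.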

Integrity

“I’m thinking these two records should match, but they don’t.”

“Yes, can we try some fuzzy matching to find the exceptions and work them back into the analysis?”
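The fuzzy-matching suggestion can be sketched with Python's standard difflib.SequenceMatcher. Its ratio() stands in here for whatever similarity measure a team actually chooses. The name lists and the 0.8 threshold are hypothetical assumptions. The code proposes candidate pairs for an analyst to review rather than merging records automatically.

```python
from difflib import SequenceMatcher

# Hypothetical name lists from two systems that an exact join fails to line up.
crm_names = ["Joe Smith", "Ann Lee", "Robert Jones"]
billing_names = ["Joseph Smith", "Ann Lee", "Bob Jones"]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] using difflib's ratio()."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_exceptions(left: list[str], right: list[str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """For each left value with no exact match on the right, propose the closest
    right value if its similarity clears `threshold` (an assumed cut-off)."""
    exact = set(right)
    proposals = []
    for name in left:
        if name in exact:
            continue  # Exact matches need no review.
        best = max(right, key=lambda candidate: similarity(name, candidate))
        score = similarity(name, best)
        if score >= threshold:
            proposals.append((name, best, round(score, 2)))
    return proposals


for crm, billing, score in fuzzy_exceptions(crm_names, billing_names):
    print(f"{crm!r} may match {billing!r} (score {score})")
```

With these inputs and the assumed threshold, "Robert Jones" vs "Bob Jones" falls below the cut-off. Nicknames often need a lookup table rather than string similarity alone.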

Consistency

“It’s going to take some time to get all of those different customer IDs in the same format across databases”

“Fine, but let’s fix them locally and get something working, then worry about the warehouse later.”

Completeness, Validity, Timeliness, Accessibility…

Joe Smith 123-45-6789 vs. Joe Smith 916-34-8239 ???

Joe Smith 10/1/1980 vs. Joe Smith 10/1/1981 ???

Joe Smith C-916348239 vs. Joe Smith 00916348239-1 ???
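The consistency exchange ("fix them locally and get something working") can be sketched as a local normalization of the two customer ID formats in the sample records. This is a minimal sketch under a loudly stated assumption: the "C-" prefix, leading zeros, and a trailing "-n" suffix are formatting only. That must be confirmed with the business before relying on it. Values that fit neither format come back as None, so they surface as exceptions instead of being guessed at.

```python
import re
from typing import Optional

# ASSUMPTION: "C-" prefix, leading zeros and a trailing "-<digits>" suffix carry
# no meaning, so both sample formats reduce to the same core number.
_ID_PATTERN = re.compile(r"(?:C-)?0*(\d+)(?:-\d+)?")


def canonical_customer_id(raw: str) -> Optional[str]:
    """Return the core customer number, or None if the value fits no known format."""
    match = _ID_PATTERN.fullmatch(raw.strip())
    return match.group(1) if match else None


for value in ["C-916348239", "00916348239-1", "916-34-8239"]:
    print(value, "->", canonical_customer_id(value))
```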


Demonstration


The Importance of Business/IT Collaboration

Organizations that have achieved lasting benefits from formal data quality improvement programs tend to take a holistic approach involving people, repeatable processes, and appropriate technology.

An agile approach is predicated on decentralization: moving ownership of the data closest to those who understand it and are affected by quality control over it.

All of this requires trust, which is fueled by increased agility of analyses and accuracy of results.


A Virtuous Cycle Of Agility, Accuracy, And Trust

Agility

Accuracy

Trust

Agile collaboration between self-sufficient business SMEs and data brokers yields better, faster results

Accurate results increase trust, lowering objections to further decentralization

Trust fosters collaboration between IT departments and business users, starting with the data-driven requirements gathering process which is essential to trustworthy analyses


Lavastorm’s software makes business analysts heroes by giving them a new, agile approach to analyze, optimize, and control data and processes

Contact Us

Mark Marinelli +1 617-948-6244


Follow Us

www.lavastorm.com

Lavastorm_News

Lavastorm Analytics Group Lavastorm Analytics


Get Lavastorm Analytics Engine Public Edition (FREE)

http://www.lavastorm.com/resources/software-downloads-trials/
