
The Data Quality Planning Guide


Academic year: 2021


A practical guide to selecting Data Quality Software


You Are Here

If you’re here, you are somewhere on the road to dealing with your organization’s data quality challenges and are trying to determine the best strategy to help you get there. If you’re feeling lost, then you’re in luck - The Data Quality Planning Guide is designed to help you easily understand your current challenges, establish a plan and carry out an effective evaluation process so you can ultimately find the data quality strategy or tool that best meets your needs.

Complete with worksheets and checklists, your personalized Planning Guide includes sections for:

Section 1: Assessing Your Data Quality Needs

Section 2: Defining Your Project Scope

Section 3: Conducting an Effective Evaluation

Like a guidebook, feel free to print it, write on it, dog-ear it, fold it, scan it, copy it, put it in a binder, add to it, share it and use it as your guide to get you from point A to data quality success.


Table of Contents

SECTION 1: ASSESSING YOUR DATA QUALITY NEEDS

a. Profiling your current data

b. Identifying basic system requirements

c. Understanding your data quality needs

SECTION 2: DEFINING YOUR PROJECT SCOPE

a. Evaluating product functions

b. Understanding processing modes

c. Selecting desired product features

d. Establishing project parameters

SECTION 3: CONDUCTING AN EFFECTIVE EVALUATION

a. Creating a vendor shortlist

b. Developing sample data

c. Evaluating specific vendors and tools

d. Interpreting the results


Section 1: Assessing Your Data Quality Needs

a.  Profiling Your Current Data

Having a clear view of your current data quality challenges, and of the processes and system structure you will have to work within, is a critical first step to developing a data quality strategy that will work for your organization. A wide range of issues can reside within the data, many of which may not be immediately apparent but could be the root cause of other problems. Use the worksheet below to ask important questions and gather the right data to inform the next step of the process - Defining Your Project Scope.

Current Data Sources (CRM, Accounts, Legacy Systems, Lists, etc.)
________________________________________
________________________________________

Current Points of Entry (CRM, Website, POS, Call Center, Batch Feeds, etc.)
________________________________________
________________________________________

Average number of records processed ____________________

Frequency of processing ____________________

Data Profile

Standard Data Elements

☐ Name
☐ Address
☐ Phone Number
☐ Email Address
☐ Date of Birth
☐ Social Security Number
☐ Customer ID Number
☐ Login / Password
☐ Product or Port Numbers
☐ Price
☐ Transaction Data
☐ Order Reference Number
☐ Shipping / Billing Addresses
☐ _______________________
☐ _______________________
☐ _______________________

Common Data Errors

☐ Name Misspellings
☐ Incorrect Addresses
☐ Duplicate Records
☐ Missing Data
☐ Incorrect Data
☐ Inconsistent Data
☐ Unlimited Transactions
☐ Incomplete Transactions
☐ Garbage Data
☐ Incorrect Formatting
☐ Nicknames / Aliases
☐ _______________________
☐ _______________________
☐ _______________________


b.  Basic System Requirements

Some critical technical and practical information may seem tedious to gather but will be worth your time to collect and organize. Some components will only be relevant to the technical integrator working to get the tools installed, but others may be deal breakers for certain applications. Recruit your technical department to provide the following:

System Profile

CRM/ERP Systems ________________________________________
Data Warehouse Platform (e.g. SQL Server, Oracle, etc.) ________________________________________
Data Feed Types (e.g. Excel, CSV, XML, etc.) ________________________________________
Extract File Types (e.g. Excel, CSV, XML, etc.) ________________________________________

User Profile

The user base for the data quality tools you select will shape the features you need from the application. It will also influence the decision to purchase either desktop software (suitable for non-technical users) or an integrated version (operating at the database level and more appropriate for the administrator or other technical representative). Consider the following when identifying which users will be responsible for day-to-day interaction with the data quality solution:

☐ Marketing Department end-user
☐ Mail-house staff
☐ Admin level staff with limited technology training
☐ Database Administrator
☐ Other: _____________________________________________


c.  Understanding Your Data Quality Needs

Once you have established a clear view of the tangible quality issues within your current database(s), it will be important to spend time considering the business needs of the organization and how cleaner data will enable you to make better business decisions. Is the impetus behind the project to decrease the marketing spend or improve targeting? Are there service issues related to poor data quality? Is the organization undertaking data migration or warehousing initiatives that require cleansing and integration of disparate data sources?

As you seek to document the goals for your evaluation, consider these suggestions for developing an accurate picture of what your organization needs from a data quality solution:

Look beyond the pain. In most cases, a specific concern will be driving the urgency of the initiative, but it will be well worth the effort to explore beyond the immediate pain points to other areas where data is essential. Plan to involve a cross-section of departments including IT, marketing, finance, customer service and operations to understand the global impact that poor data quality could be having on your organization.

Look back, down and forward. Consider the data quality challenges you’ve had in the past, the ones you face today and the ones that have yet to come. Is a merger on the horizon? Is the company migrating to a new platform? Do you anticipate significant staffing changes? Looking ahead in this way will ensure that the investment you make will have a reasonable shelf-life.

Look at the data you don’t have. As you review the quality of the data you have, also consider what’s missing and what information would be valuable to customer service reps or the marketing department. It may exist in another data silo somewhere that just needs to be made accessible, or it could require new data to be collected.

Be the customer. Call the Customer Service Department and put them through their paces. Sign up for marketing materials online. Place an order on the website. Take good notes on the places where poor data impacts your experience and then look at the data workflow through fresh eyes.

Draw out the workflow. Even in small organizations, there is tremendous value in mapping out the path your data takes through your business: where it is entered, used, changed, stored and lost. Doing this will uncover business rules that are likely impacting the data, departments with complementary needs and/or places in the workflow where improvements can be made (and problems avoided).

Think big and small. Management and C-Level executives tend to think big. Data analysts and technical staff tend to think granularly, and departmental users usually fall somewhere in the middle. Ultimately, the best solution can only be identified if you consider the global, technical and strategic business needs.

The challenges with identifying, evaluating and implementing an effective data quality solution are fairly predictable, but problems almost always begin with incorrect assumptions about, and an incomplete understanding of, the overall needs of the organization. In some cases, the right data quality vendor can help you move through this process, but ultimately, failure to broaden the scope in this way can result in the purchase of a solution that does not meet all the requirements of the business.


Business Needs Worksheet

Technical Data Objectives

☐ Cleanse and standardize data as part of an existing data warehousing initiative
☐ Support enterprise data governance, MDM or other global BI initiatives
☐ Data enrichment & profiling
☐ Data integration and migration
☐ Eliminate unnecessary IT resource strain
☐ ________________________________________
☐ ________________________________________

Strategic Data Objectives

☐ Send more targeted communications based on customer mail preferences
☐ Reduce wasted advertising spend with accurate mailing lists
☐ Improve sales and checkout process (Web, Store, Call Center)
☐ Improve customer service with better access to global customer data
☐ Generate a more accurate view of campaign ROI
☐ Develop a global demographic picture
☐ Automation and enforcement of approved business rules
☐ Remain in compliance with industry data requirements
☐ Reduce delivery complications and associated overhead
☐ Make informed operational and merchandising decisions
☐ Maintain a positive brand perception
☐ ________________________________________
☐ ________________________________________

Data Quality Objectives

☐ Basic single or two-file deduplication of files
☐ Matching of multiple records
☐ Address Validation
☐ Front-end data capture
☐ Batch cleansing of records
☐ Automation of Data Quality Processes
☐ Establish a single customer view
☐ ________________________________________
☐ ________________________________________


Section 2: Defining Your Project Scope

a.  Evaluating Product Functions

Data Quality product suites tend to span a broad range of functions in varying combinations. While one company may do everything on a modular scale, some may only provide one or two functions. Yet others will work with partners that can carry out complementary tasks. Without a complete understanding of these “big buckets” of features and how they apply to your business needs, it’s easy to get confused or be subject to a biased opinion on what will work for you. Below is a brief description of the main functions offered by standard data quality packages, in order of where they typically occur in a process flow:

☐ Standardization
Many general ‘cleansing’ functions actually fall under the category of data standardization, including fixing misspellings, inconsistencies, transpositions and the like. Standardization also applies when moving data across columns, or adding state names, zip codes or titles in places where they are missing.

☐ Address Validation (Verification)
Matching contact data to standard Postal Address Files (PAF) or USPS and NCOA data to validate and update addresses is known as Address Validation (or verification). Here again, the datasets will vary by country but the same process is employed and driven by the organization’s address matching engine.

☐ Data Enrichment
Another broad function includes expanding and enhancing your existing contact data with additional datasets. The variety of datasets is extensive and varies by region but could include names data, date of birth, length of residency, phone and fax numbers, SIC Codes, geocoding data and more.

☐ Matching/Deduplication
One of the most basic functions of data cleansing software, standard deduplication involves matching records within a file or between multiple files for merging and purging duplicate records, identifying your best customers or a multiplicity of other reasons. There is a wide range of match strategies employed in deduplication, with as wide a variety of results. The critical thing to remember is that a simple count of duplicates, suppressions or records matched is essentially meaningless – it is the number of true and false matches that is significant.

☐ Record-Linking (Single Customer View)
Beyond basic data cleansing is a sophisticated matching process that allows you to ‘link’ specific records to one another, specifically for the purpose of creating a single master record (or golden record). This master record would include all the relevant data for a specific contact including mail preferences, transactions and customer service history. This process is sometimes considered the holy grail of data cleansing because it generates the elusive Single Customer View (or 360 Degree View).

The functional categories above represent all of the main data quality tasks an organization would need to perform. There are varying methods and environments in which these tasks can be carried out and a wide range of features that any vendor would provide to handle each of these tasks. If you look back at the business objectives developed in Section 1-c, you will find that they align themselves with one or more of these tasks.
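As a rough illustration of the Standardization function described above, the sketch below tidies a single contact record. It is a minimal sketch under stated assumptions: the field names (`name`, `state`, `email`), the `STATE_ABBREVIATIONS` lookup and the `standardize_record` helper are hypothetical and are not taken from any particular data quality product.

```python
import re

# Hypothetical lookup; a real tool would ship a complete reference table.
STATE_ABBREVIATIONS = {"new york": "NY", "california": "CA", "texas": "TX"}


def standardize_record(record):
    """Return a copy of `record` with common formatting inconsistencies fixed."""
    cleaned = {}
    for field, value in record.items():
        if isinstance(value, str):
            # Collapse runs of whitespace and trim the ends.
            value = re.sub(r"\s+", " ", value).strip()
        cleaned[field] = value

    if cleaned.get("name"):
        # Naive capitalisation; names such as "McDonald" need special rules.
        cleaned["name"] = cleaned["name"].title()
    if cleaned.get("state"):
        state = cleaned["state"]
        # Map full state names to abbreviations, otherwise just upper-case.
        cleaned["state"] = STATE_ABBREVIATIONS.get(state.lower(), state.upper())
    if cleaned.get("email"):
        cleaned["email"] = cleaned["email"].lower()
    return cleaned


example = standardize_record(
    {"name": "  john   SMITH ", "state": "new york", "email": "John.Smith@Example.COM"}
)
```

Commercial packages go well beyond this (misspellings, transpositions, moving data across columns), so treat the sketch only as a way to picture what "standardization" means when comparing vendors.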


b.  Understanding Processing Modes

Another consideration beyond the main functions of data cleansing software is how those functions are carried out, as not every vendor will be able to handle all the applications. The main processing modes that you should consider are:

☐ Batch (Existing Data)
Often this will be referred to as “batch data cleansing”, although this term can also be used for some of the other scenarios listed below. Here we’re talking about batch cleansing of data already in your database, to identify duplicates and incorrect or insufficient data and make appropriate corrections. This is a curative measure.

☐ Batch (Data Load)
Batch processing is also used to match across files, e.g. to match a new data feed against your existing database or data warehouse so that you can add the new records without creating new duplicates. Another example is to remove existing customers from a marketing list so that you can contact the non-customers on the list. Often, this process will be automated. Whether automated or not, this is a preventative measure.

☐ Real Time (Interactive)
Once you’ve got a clean database, it is far more effective to keep your Data Quality standards up by utilizing appropriate tools at point of capture, rather than let new bad data enter the database. Here, we mean tools that work interactively, warning the person entering the data if the address is invalid or if the record they are trying to add is already on the database. Examples of real time data cleansing are address verification for a web inquiry form and duplicate prevention in a CRM system. This is a preventative measure.

☐ Real Time (Firewall)
In this mode, new records are captured but the person entering the data is not prompted to correct any problems. Instead, the record is validated in real time and any errors are either corrected in the background or logged for manual attention off-line by someone else. An example of this is a web inquiry from a visitor to your web site which is checked against your existing database in the background, so that it can be flagged as a new or existing customer. This is a preventative measure.

With this background, the objective now is to identify what your ideal solution looks like based on the business objectives and the data quality functions you will need to achieve them. Remember to think ahead to your anticipated needs, both granularly and globally. Consider larger data projects, such as a planned data integration, that may impact what you need from the tools you invest in.

Processing needs:
________________________________________
________________________________________
________________________________________
________________________________________
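To make the Batch (Data Load) mode more concrete, here is a minimal sketch of checking a new feed against existing records before loading it. It assumes records are plain dictionaries and uses an exact match on a normalised email address as a stand-in for a vendor's matching engine; `normalise_key` and `split_feed` are hypothetical helper names.

```python
def normalise_key(value):
    """Lower-case and trim a key value so trivial differences do not hide a match."""
    return value.strip().lower()


def split_feed(existing, feed, key="email"):
    """Split `feed` into records safe to add and records suppressed as duplicates.

    A record is suppressed when its key already appears in `existing`
    or earlier in the same feed. Records without a key value are passed through.
    """
    known = {normalise_key(r[key]) for r in existing if r.get(key)}
    to_add, suppressed = [], []
    for record in feed:
        value = record.get(key)
        if value and normalise_key(value) in known:
            suppressed.append(record)
        else:
            to_add.append(record)
            if value:
                # Remember the key so a repeat later in the feed is caught too.
                known.add(normalise_key(value))
    return to_add, suppressed
```

The same split supports both examples in the text: `to_add` holds the records to load without creating duplicates (or the non-customers on a marketing list), while `suppressed` holds the records that matched the existing database.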


c.  Selecting Desired Product Features

Once you have made some of the broader decisions about your immediate business needs, the key functions you require and the methods by which you anticipate managing your data cleansing processes, your evaluation will turn to the granular features of the data quality tools you choose to evaluate.

When it comes to features, we suggest putting them into two categories (or columns) - ‘Needs’ and ‘Wants’. This is a critical step because ‘Needs’ are not negotiable and will be a great way to quickly identify which applications you should put on your short list for evaluation, while ‘Wants’ are valuable for tipping the scale when two applications come close in value. In addition, ‘Wants’ also give you bargaining power in cases where features are modular.

Because there is often so much overlap in the broader data quality conversation and variation in terminology, we find it useful to discuss software features within the main functional headings previously established:

•  Standardization
•  Address Validation
•  Data Enrichment
•  Matching/Deduplication
•  Record-Linking

Then the four processing modes:

•  Batch (Existing Data)
•  Batch (Data Load)
•  Real time (Interactive)
•  Real time (Firewall)

Before diving into the actual features list broken up accordingly, here are some other items to consider when developing your list of Required Features:

•  Some companies use different terminology for the same feature. Make sure you fully understand those ‘proprietary’ phrases or processes so that when it comes time to evaluate features, you can do so fairly.
•  Some data quality tools are modular and will offer features or sets of features in individual components with different price points and installations. Take note of which features are/are not included in the modules you are considering.
•  Consider the applications or processes you use internally that may replicate part or all of a specific feature and how you will integrate the two, or where a new and improved application or process would be the best direction to go in.


Features Worksheet


Processing Modes Worksheet


d.  Establishing Project Parameters

While you are knee deep in functions, features, vendor searches and the like, don’t ignore the need for some practical planning so that when you are ready to start your evaluation, there are some strategies and guidelines in place to keep both your vendors and your organization on track. Of course, it will be important to be flexible as you go through the evaluation process, especially when it comes to moving parts like budget and timeframe, but having a plan and some goal parameters in place will be priceless and may mean the difference between getting the project off the ground or letting inertia win out.

Anticipated Budget

So how do you even begin to guesstimate what it should cost you to get the right solution in place? Two things: potential savings and average range.

First, do the best you can to ballpark the potential cost savings of improving your data. In some cases, the vendor can help you with this process based on a data analysis. Typically there are as many as 10% duplicates within a database. Assume you have a relatively modest amount of duplicates at 5% and start there. Without getting scientific, try calculating wasted advertising spend, the resources needed to handle customer shipping complaints or how much MORE money you’d make if you had more control over your marketing.

Second, just take a look at the high and the low end of vendors on the shortlist you will develop in Section 3. Rather than randomly call a data quality organization and ask for a price, continue through with your project, develop that shortlist and then create your price range based on the functions and features you need.

Timeframe

At the early stages, this will be more of an awareness than an actual goal, and it will be one of the areas, along with budget, that will evolve over the course of your evaluation. Be realistic about what you can expect here and seek input from vendors and your internal team to make sure you are not cutting yourself short. If you have internal business initiatives that will drive your goal date, such as an anticipated data migration project or large marketing initiative, you can work backwards from that date, but do make sure to budget time for all the key steps including:

•  Internal planning
•  Searching for vendors
•  Initial review
•  Demoing the shortlist
•  Internal decision-making
•  Negotiation
•  Implementation and Training

Review and Approval Team

This is a broader discussion in some cases as it overlaps with the development of a Data Governance team, but the main objective is to make sure you are aware of the necessary influencers, decision-makers and budget approvers that will need to be part of this process. Knowing this early on is important, and it is sometimes helpful to communicate this to your vendors so that they can work with you through the approvals process. This may mean requesting presentations to all influencers on the team, making demo software available to all the potential users, and asking the vendor to help you with documentation to make the case to a C-Level executive.
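The "potential savings" ballpark suggested under Anticipated Budget can be worked through with simple arithmetic. The sketch below is illustrative only: the function name `estimate_duplicate_waste` and every input figure (database size, cost per contact, contacts per year) are hypothetical placeholders to be replaced with your own numbers. Only the 5% and 10% duplicate rates come from the text above.

```python
def estimate_duplicate_waste(total_records, duplicate_rate, cost_per_contact, contacts_per_year):
    """Estimate how many records are duplicates and what contacting them costs per year."""
    duplicate_records = round(total_records * duplicate_rate)
    annual_waste = duplicate_records * cost_per_contact * contacts_per_year
    return duplicate_records, annual_waste


# Hypothetical inputs: 200,000 records, 0.75 per mail piece, 4 mailings a year.
modest = estimate_duplicate_waste(200_000, 0.05, 0.75, 4)   # 5% duplicates
typical = estimate_duplicate_waste(200_000, 0.10, 0.75, 4)  # 10% duplicates
print(f"At 5%:  {modest[0]:,} duplicates, {modest[1]:,.2f} wasted per year")
print(f"At 10%: {typical[0]:,} duplicates, {typical[1]:,.2f} wasted per year")
```

This covers wasted mailing spend only. Other costs mentioned above, such as handling shipping complaints, would need their own line items in the budget spreadsheet.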


Evaluation Strategy

With this phrase, we do not mean the evaluation itself, but instead the process you will use to evaluate the applications selected. There are several options that you can take within this process, and knowing your strategy in advance will help you communicate expectations and guidelines to your vendors and, yet again, inform your internal staff and approvals team so that the process is orderly, streamlined and stays on track. Some considerations for this strategy include:

•  To RFP or Not to RFP: One option, preferably decided at the outset, would be to distribute a Request for Proposals (RFP/RFQ) to a shortlist of vendors to help with your evaluation. This is common for state or government bids but can also be used as a valuable tool in the commercial sector. Aside from taking up a significant chunk of time, submitting a formal bid obligates you to perform a completely fair, balanced and unbiased evaluation that follows a set of rules and guidelines set out in the bid. This may mean that referrals, the unexpected and sheer gut instinct cannot play a part, which ultimately may mean you do not get to choose your preferred vendor.
•  Demo Data or Real Data: Knowing this ahead of time as part of your strategy is critical because this will likely be the first question asked of you when making contact with a vendor. While we will always suggest that you evaluate a solution on your own data, in some cases this may not be 100% necessary or possible right away. You may be in the midst of a data migration project or could have such basic needs, such as strict address validation, that preparing your own data is not necessary. In either event, you should plan for this step in advance and prepare your sample data accordingly to do a thorough and efficient test of the software.
•  Who is Driving the Ship? Business or Technology? This is the big question the Data Quality industry as a whole has been asking lately, and it is relevant here because it will determine the shape of your evaluation. If you are from a business department but, after identifying your requirements, decide that the organization is likely to take an integrated approach, it may be best to hand off the lead role to a technology representative (or vice versa). Here again, the key is to ask the questions before starting the evaluation because knowing your strategy at the outset is half the battle.

Appropriate Documentation & Files

Lastly, there are some critical documents that you should plan to gather before and during this process, some of which this Guide will help you to plan for. A brief list includes:

•  Request for Proposal (if appropriate) using the functional and feature requirements outlined here
•  Required Features List (with columns outlined for your individual shortlist vendors)
•  Demo Data
•  Review/approval forms for the members of your team
•  Budget Spreadsheet


Section 3: Conducting an Effective Evaluation

a.  Creating Your Shortlist

This sounds like an easy task, but in reality the current information quality industry is saturated with White Papers, Webinars, YouTube Channels and the like - all with different messages, focus areas, product features and terminology. Making sense of it can be a challenge to even the most DQ-savvy buyer, but if you’ve been following the steps up until this point, you should be able to easily employ some of the following best practices to narrow down a reasonable short list that is optimal for evaluation.

•  Finding the Vendors
Some of this may be obvious but there are a few tricks to digging up the key vendors within the industry. Google is certainly your first good bet, but remember to use varying search terms because different vendors use different terminology interchangeably. While you’re surfing, don’t just look for vendor sites but user groups, blogs and analyst pages as well, because these may reveal vendors that are not coming up in the searches.

•  Function First
Once you have a name in hand, start your initial review by going back to your Functional Requirements and choosing vendors that can fill those needs. Don’t worry at this point about finding a vendor that does everything under one roof - that can be a deciding factor later on. For now, concentrate on choosing those that provide the majority of the Functional Requirements you are looking for.

•  Features Second
Once you have your big list of vendors that are in your functional ballpark, start narrowing down your list based on the specific features within each category. Now is the time to remember your Needs vs. Wants and abandon anyone who truly cannot service the basic necessities.

•  Cross-Reference the Buzz
Industry hype is not the best way to choose the perfect vendor; it is best used to eliminate companies from the competition based on awful press or truly negative customer reviews. Keep in mind that sometimes the very best product for the job may not be the one with the brightest lights. This is the place where you simply want to rule out companies based on clear signs that they cannot provide service.

•  Add Yourself to the Shortlist
We recommend this step not because it’s a good option, but because you are likely to consider it anyway. At some point in the process, someone will suggest internally that you already have the resources, or an initial price point will scare you into asking - ‘do we really need this anyway?’ We suggest looking at this step proactively, as though you are one of the vendors on your short list. In this way, you can truly evaluate your potential to carry out data quality initiatives internally.


b.  Developing Your Sample Data

The first word of advice – use Real Data

Many software trials will come preinstalled with sample or demo data designed primarily to showcase the features of the software. While this sample data can give you examples of generic match results, it will not be a clear reflection of your match results. This is why it is best to run an evaluation of the software on your own data whenever possible. Using the guidelines below, we suggest ‘identifying’ a real dataset that is representative of the challenges you will typically see within your actual database. That dataset will tell you if the software can find your more challenging matches, and how well it can do that.

For fuzzy matching features, you may like to consider whether the data that you test with includes these situations:

•  Phonetic matches (e.g. Naughton and Norton)
•  Reading errors (e.g. Horton and Norton)
•  Typing errors (e.g. Notron, Noron, Nortopn and Norton)
•  One record has title and initial and the other has first name with no title (e.g. Mr J Smith and John Smith)
•  One record has missing name elements (e.g. John Smith and Mr J R Smith)
•  Names are reversed (e.g. John Smith and Smith, John)
•  One record has missing address elements (e.g. one record has the village or house name and the other address just has the street number or town)
•  One record has the full postal code and the other a partial postal code or no postal code

When matching company names data, consider including the following challenges:

•  Acronyms, e.g. IBM, I B M, I.B.M., International Business Machines
•  One record has missing name elements, e.g.
   1.  The Crescent Hotel, Crescent Hotel
   2.  Breeze Ltd, Breeze
   3.  Deloitte & Touche, Deloitte, Deloittes
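When assembling test cases like the ones listed above, it can help to check how "far apart" each pair really is. The sketch below is one rough way to do that using plain edit distance plus a simple name normaliser. It is not how any particular vendor matches records; `levenshtein`, `name_tokens` and the `TITLES` set are illustrative helpers written for this example. Phonetic pairs such as Naughton and Norton need a phonetic algorithm and are not covered here.

```python
TITLES = {"mr", "mrs", "ms", "miss", "dr"}


def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def name_tokens(name):
    """Lower-case a name, drop punctuation and titles, and sort the remaining words."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return sorted(token for token in cleaned.split() if token not in TITLES)


# Typing and reading errors from the list above, measured against "norton".
for variant in ("notron", "noron", "nortopn", "horton"):
    print(variant, levenshtein("norton", variant))

# Reversed names normalise to the same tokens.
print(name_tokens("Smith, John") == name_tokens("John Smith"))
```

Pairs that sit one or two edits apart, or that only match after normalisation, are the cases worth keeping in your sample because they show whether a tool finds more than exact duplicates.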

Don’t!

…create a “fake” dataset from scratch. This is not advisable because it could include unnatural scenarios that may present unreal challenges to the


You should also ensure that you have groups of records where the data that matches exactly varies for pairs within the group. For example, consider two clusters, one containing three records with the same email address and another containing three records with the same phone number. In both of these examples, the clusters based on email address and the clusters based on phone number should all be grouped into one set by the matching software.

If you don’t have these scenarios all represented, you can doctor your real data to create them, as long as you start with real records that are as close as possible to the test cases and make one or at the most two changes to each record. In the real world, matching records will have something in common – not every field will be slightly different.

With regard to size, it’s better to work with a reasonable sample of your data than a whole database or file; otherwise the mass of information runs the risk of obscuring important details and test runs take longer than they need to. We recommend that you take two selections from your data – one for a specific postal code or geographic area, and one (if possible) an alphabetical range by last name. Join these selections together and then eliminate all the exact matches – if you can’t do this easily, one of the solutions that you’re evaluating can probably do it for you. Ultimately, you should have a reasonably sized sample without so many obvious matches, which should contain a reasonable number of fuzzier matches (e.g. matches where the first character of the postal code or last name is different between two records that otherwise match, matches with phonetic variations of last name, etc.).
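The grouping behaviour described above, where records linked by a shared email address and records linked by a shared phone number end up in one set, can be sketched with a small union-find routine. This is an illustrative sketch, not a description of how any vendor's software works; the record layout, the sample values and the `cluster_records` helper are all hypothetical.

```python
from collections import defaultdict


def cluster_records(records, keys=("email", "phone")):
    """Group record indexes that are linked, directly or indirectly, by a shared key value."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster representative, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[max(root_i, root_j)] = min(root_i, root_j)

    first_seen = {}
    for index, record in enumerate(records):
        for key in keys:
            value = record.get(key)
            if not value:
                continue
            marker = (key, value.strip().lower())
            if marker in first_seen:
                union(index, first_seen[marker])
            else:
                first_seen[marker] = index

    clusters = defaultdict(list)
    for index in range(len(records)):
        clusters[find(index)].append(index)
    return sorted(clusters.values())


# Hypothetical sample: records 0-2 share an email, records 2-4 share a phone number.
sample = [
    {"email": "j.smith@example.com", "phone": "555-0100"},
    {"email": "j.smith@example.com", "phone": "555-0101"},
    {"email": "j.smith@example.com", "phone": "555-0199"},
    {"email": "john@example.org", "phone": "555-0199"},
    {"email": None, "phone": "555-0199"},
    {"email": "other@example.com", "phone": "555-0142"},
]
print(cluster_records(sample))
```

Because record 2 belongs to both the email cluster and the phone cluster, all five linked records come back as one group and the unrelated record stays on its own. A sample dataset containing this kind of overlap lets you check whether a tool under evaluation does the same.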


c.  Evaluating Specific Vendors and Tools

If you made it past all the due diligence it takes to get to this point, you are in a great position to conduct an effective evaluation of your data quality vendor shortlist. It means you understand your current data challenges, you have documented your basic system, made decisions on the functions and features you require, identified a relevant shortlist of vendors and have established all the project parameters and strategy you need to guide you through the process. It has all been preparation for this stage. So you are probably asking yourself: now what?

When it comes to actually performing the evaluation, you can either download a free trial and evaluate the software yourself or engage the vendor to walk you through the process. While it may seem tempting to conduct an initial review yourself, it is not advisable because the best data quality software has a plethora of features and options designed to help you deliver the best possible matches. The only way to truly identify these options and learn how to fine tune them to meet your individual data quality objectives is to engage a knowledgeable salesperson and have them walk you through the software. During this process, you will also likely be introduced to members of the technical support or integration teams which will provide you further exposure to the way the company works and the level of support they can provide you with the matching process.

So the bottom line is to engage a company representative early and often during your evaluation to properly determine the software’s true matching capabilities.

Notes:


d.  Interpreting the Results

When it comes to evaluating the results, remember that a simple count of duplicates, suppressions or records matched to your Postal Address File (PAF) or USPS Data is meaningless: it is the number of true and false matches that is significant, so it is important to be able to view all the matches found. When deduping, suppressing or matching across files, a good way of comparing results from two systems is as follows:

1.  Remove all the matches from the file to be cleaned using system A.
2.  Perform the same level of matching using system B and see what matches system B finds in the supposedly “clean” file.
3.  Review each match (or a reasonable proportion) found by system B but not found by system A and count the number of true matches, the number of false matches and the number that cannot be classed objectively as definitely true or definitely false.
4.  Repeat this process the other way round, i.e. clean the raw file using system B first and then see what matches system A finds in the “clean” file.
5.  Count the number of true, false and debatable matches in this file.
6.  Compare the counts in the two “clean” files.

It may be that your business requirement places more emphasis on a high match rate and that a certain level of false matches is acceptable. Alternatively, keeping the false match count to a minimum or even eliminating false matches entirely may be the overriding objective. Of course, if one system wins whichever criterion you use, the choice is easy. If not, and one system finds more true matches but also more false matches than the other, you should be able to experiment with the matching options to try to reduce the number of false matches, and then repeat the process outlined above. It is likely that you will need to involve the vendor’s support team to time the matching, which also gives you the opportunity to see just how effective the support is.
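The tallying in the six comparison steps in this section can be kept honest with a small script. The sketch below is an illustration under stated assumptions: each system's output has been exported as a set of matched record-id pairs, the manual review verdicts are stored in a dictionary, and a plain set difference stands in for re-running one system on the file cleaned by the other. All names and the sample pairs are hypothetical.

```python
from collections import Counter

def pair(a, b):
    """Order-independent key for a matched pair of record ids."""
    return frozenset((a, b))

def extra_matches(found, baseline, review):
    """Tally review verdicts for pairs in `found` that `baseline` did not find.

    Pairs with no verdict yet are counted as "unreviewed".
    """
    only_found = found - baseline
    return Counter(review.get(p, "unreviewed") for p in only_found)

# Hypothetical match exports from the two systems.
matches_a = {pair(1, 2), pair(3, 4), pair(5, 6)}
matches_b = {pair(1, 2), pair(3, 4), pair(7, 8), pair(9, 10)}

# Hypothetical verdicts from manual review: "true", "false" or "debatable".
review = {
    pair(5, 6): "true",
    pair(7, 8): "true",
    pair(9, 10): "false",
}

b_not_a = extra_matches(matches_b, matches_a, review)  # steps 1 to 3
a_not_b = extra_matches(matches_a, matches_b, review)  # steps 4 and 5
print("Found by B only:", dict(b_not_a))
print("Found by A only:", dict(a_not_b))
```

Comparing the two tallies side by side corresponds to step 6. Whether an extra true match outweighs an extra false match remains a business decision, as discussed above.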
When matching to a PAF file for address verification, you can adopt a similar approach, but checking the results is more time consuming, as you need some independent way of looking up the addresses that have been matched by one system but not the other. The postal authority usually provides an online lookup facility, but sometimes the number of daily lookups is limited.

One final trick concerns evaluation using the demo data supplied with each system: you would expect the system to work well on its own demo data files, but you could also try matching the demo data file from system A in system B and vice versa.

These tests are much easier to conduct when you have reduced your shortlist to two solutions.

Additional Notes:


Today, more than ever, good business decisions depend on accurate data. Bad data means customer service suffers, opportunities are missed and marketing spend is wasted. Clean and accurate data gives you the advantage of knowing your customers so you can service them well, market to them appropriately and drive greater sales. Unfortunately, most data quality initiatives are limited to simply checking the boxes. That is, they make shallow improvements to the data but never actually offer any genuine business value.

Welcome to helpIT systems. Armed with unparalleled intelligent match technology, a deeply sophisticated knowledgebase and streamlined address validation driving both front-end and batch cleansing solutions, helpIT systems goes beyond just checking the boxes. For more than 20 years we’ve been helping customers trust their data so they can use it to strengthen their business. Isn’t that what data quality is all about?

Don’t just tick boxes. Demand more. Expect more.

Now make an informed decision…

UK Headquarters
helpIT systems ltd.
15-17 The Crescent
Leatherhead KT22 8DY
Tel: +44 (0) 1372 360070

US Headquarters
helpIT systems inc.
600 W 28th Street, Suite 201
Austin, TX 78705
Tel: (866) 332.7132
