BODS30 - SAP Data Services - Data Quality Management
Full text
(2) Copyright Copyright © 2011 SAP AG. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice. Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.. Trademarks •. Microsoft®, WINDOWS®, NT®, EXCEL®, Word®, PowerPoint® and SQL Server® are registered trademarks of Microsoft Corporation.. •. IBM®, DB2®, OS/2®, DB2/6000®, Parallel Sysplex®, MVS/ESA®, RS/6000®, AIX®, S/390®, AS/400®, OS/390®, and OS/400® are registered trademarks of IBM Corporation.. •. ORACLE® is a registered trademark of ORACLE Corporation.. •. INFORMIX®-OnLine for SAP and INFORMIX® Dynamic ServerTM are registered trademarks of Informix Software Incorporated.. •. UNIX®, X/Open®, OSF/1®, and Motif® are registered trademarks of the Open Group.. •. Citrix®, the Citrix logo, ICA®, Program Neighborhood®, MetaFrame®, WinFrame®, VideoFrame®, MultiWin® and other Citrix product names referenced herein are trademarks of Citrix Systems, Inc.. •. HTML, DHTML, XML, XHTML are trademarks or registered trademarks of W3C®, World Wide Web Consortium, Massachusetts Institute of Technology.. •. JAVA® is a registered trademark of Sun Microsystems, Inc.. •. JAVASCRIPT® is a registered trademark of Sun Microsystems, Inc., used under license for technology invented and implemented by Netscape.. •. SAP, SAP Logo, R/2, RIVA, R/3, SAP ArchiveLink, SAP Business Workflow, WebFlow, SAP EarlyWatch, BAPI, SAPPHIRE, Management Cockpit, mySAP.com Logo and mySAP.com are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. All other products mentioned are trademarks or registered trademarks of their respective companies.. Disclaimer THESE MATERIALS ARE PROVIDED BY SAP ON AN "AS IS" BASIS, AND SAP EXPRESSLY DISCLAIMS ANY AND ALL WARRANTIES, EXPRESS OR APPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, WITH RESPECT TO THESE MATERIALS AND THE SERVICE, INFORMATION, TEXT, GRAPHICS, LINKS, OR ANY OTHER MATERIALS AND PRODUCTS CONTAINED HEREIN. IN NO EVENT SHALL SAP BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, CONSEQUENTIAL, OR PUNITIVE DAMAGES OF ANY KIND WHATSOEVER, INCLUDING WITHOUT LIMITATION LOST REVENUES OR LOST PROFITS, WHICH MAY RESULT FROM THE USE OF THESE MATERIALS OR INCLUDED SOFTWARE COMPONENTS.. g201192861428.
(3) About This Handbook This handbook is intended to complement the instructor-led presentation of this course, and serve as a source of reference. It is not suitable for self-study.. Typographic Conventions American English is the standard used in this handbook. The following typographic conventions are also used. Type Style. Description. Example text. Words or characters that appear on the screen. These include field names, screen titles, pushbuttons as well as menu names, paths, and options. Also used for cross-references to other documentation both internal and external.. 2011. Example text. Emphasized words or phrases in body text, titles of graphics, and tables. EXAMPLE TEXT. Names of elements in the system. These include report names, program names, transaction codes, table names, and individual key words of a programming language, when surrounded by body text, for example SELECT and INCLUDE.. Example text. Screen output. This includes file and directory names and their paths, messages, names of variables and parameters, and passages of the source text of a program.. Example text. Exact user entry. These are words and characters that you enter in the system exactly as they appear in the documentation.. <Example text>. Variable user entry. Pointed brackets indicate that you replace these words and characters with appropriate entries.. © 2011 SAP AG. All rights reserved.. iii.
(4) About This Handbook. BODS30. Icons in Body Text The following icons are used in this handbook. Icon. Meaning For more information, tips, or background. Note or further explanation of previous point Exception or caution Procedures. Indicates that the item is displayed in the instructor's presentation.. iv. © 2011 SAP AG. All rights reserved.. 2011.
(5) Contents Course Overview ............................................................................. vii Course Goals.................................................................................vii Course Objectives ...........................................................................vii. Unit 1: Describing Data Services ...........................................................1 Defining Data Services....................................................................... 2. Unit 2: Using Data Quality Transforms .................................................. 21 Data Quality Transforms ................................................................... 22. Unit 3: Using Address Cleanse Transforms............................................ 51 Using the Address Cleanse Transforms ................................................ 52. Unit 4: Using Data Cleanse Transforms ................................................. 93 Using the Data Cleanse Transform....................................................... 94. Unit 5: Matching and Consolidating Data..............................................129 Using the Match Wizard .................................................................. 131 Configuring the Match Transform........................................................ 156 Performing Post-Match Processing ..................................................... 204 Consolidating Matching Records ........................................................ 230 Using Advanced Match Strategies ...................................................... 261. 2011. © 2011 SAP AG. All rights reserved.. v.
(6) Contents. vi. BODS30. © 2011 SAP AG. All rights reserved.. 2011.
(7) Course Overview SAP BusinessObjects Data Quality enables you to parse, cleanse, standardize, consolidate, and enhance records. In this two-day course, you will learn about cleansing address and firm data, and match and consolidate records. As a business benefit, by being able to create efficient data quality projects, you can use the transformed data to help improve operational and supply chain efficiencies, enhance customer relationships, create new revenue opportunities, and optimize return on investment from enterprise applications.. Target Audience This course is intended for the following audiences: • •. Consultants and project team members responsible for implementing data quality projects Customer users responsible for administering and managing data quality projects. Course Prerequisites Required Knowledge • • • • •. Thorough knowledge of ETL (Extraction, Transformation and Loading) data processes Thorough knowledge of defining source and target metadata in the SAP BusinessObjects Designer Thorough knowledge of creating batch jobs Thorough knowledge of using functions, scripts, and variables Thorough knowledge of using Data Integrator Platform transforms. Course Goals This course will prepare you to: • •. 2011. Use Data Quality transforms to parse, standardize, cleanse, enhance and match records Consolidate matching records to form a best record. © 2011 SAP AG. All rights reserved.. vii.
(8) Course Overview. BODS30. Course Objectives After completing this course, you will be able to: • • • • •. viii. Complete an Address Cleanse transform Work with global address data Complete a Data Cleanse transform Set up matching using the Match Wizard Consolidate matching records to form a best record. © 2011 SAP AG. All rights reserved.. 2011.
(9) Unit 1 Describing Data Services Unit Overview Lesson introduction Data Services is a graphical interface for creating and staging jobs for data integration and data quality purposes After completing this lesson, you will be able to: • • • •. Describe the purpose of Data Services Describe Data Services architecture Define Data Services objects Use the Data Services Designer interface. Unit Objectives After completing this unit, you will be able to: • •. Define Data Services objects Use the Data Services Designer interface. Unit Contents Lesson: Defining Data Services ..................................................... 2. 2011. © 2011 SAP AG. All rights reserved.. 1.
(10) Unit 1: Describing Data Services. BODS30. Lesson: Defining Data Services Lesson Overview Data Services is a graphical interface for creating and staging jobs for data integration and data quality purposes.. Lesson Objectives After completing this lesson, you will be able to: • •. Define Data Services objects Use the Data Services Designer interface. Business Example For reporting in SAP NetWeaver Business Warehouse, your company needs data from diverse data sources, such as SAP systems, non-SAP systems, the Internet, and other business applications. You should therefore examine the technologies that SAP NetWeaver BW offers for data acquisition.. Describing Data Services BusinessObjects Data Services provides a graphical interface that allows you to easily create jobs that extract data from heterogeneous sources, transform that data to meet the business requirements of your organization, and load the data into a single location. Note: Although Data Services can be used for both real-time and batch jobs, this course covers batch jobs only. Data Services combines both batch and real-time data movement and management with intelligent caching to provide a single data integration platform for information management from any information source and for any information use.. 2. © 2011 SAP AG. All rights reserved.. 2011.
(11) BODS30. Lesson: Defining Data Services. Figure 1: Data Services Architecture. This unique combination allows you to: • • • •. Stage data in an operational data store, data warehouse, or data mart. Update staged data in batch or real-time modes. Create a single environment for developing, testing, and deploying the entire data integration platform. Manage a single metadata repository to capture the relationships between different extraction and access methods and provide integrated lineage and impact analysis.. Data Services performs three key functions that can be combined to create a scalable, high-performance data platform. It: •. • •. Loads Enterprise Resource Planning (ERP) or enterprise application data into an operational datastore (ODS) or analytical data warehouse, and updates in batch or real-time modes. Creates routing requests to a data warehouse or ERP system using complex rules. Applies transactions against ERP systems.. Data mapping and transformation can be defined using the Data Services Designer graphical user interface. Data Services automatically generates the appropriate interface calls to access the data in the source system. For most ERP applications, Data Services generates SQL optimized for the specific target database (Oracle, DB2, SQL Server, Informix, and so on). Automatically-generated, optimized code reduces the cost of maintaining data. 2011. © 2011 SAP AG. All rights reserved.. 3.
(12) Unit 1: Describing Data Services. BODS30. warehouses and enables you to build data solutions quickly, meeting user requirements faster than other methods (for example, custom-coding, direct-connect calls, or PL/SQL). Data Services can apply data changes in a variety of data formats, including any custom format using a Data Services adapter. Enterprise users can apply data changes against multiple back-office systems singularly or sequentially. By generating calls native to the system in question, Data Services makes it unnecessary to develop and maintain customized code to manage the process. You can also design access intelligence into each transaction by adding flow logic that checks values in a data warehouse or in the transaction itself before posting it to the target ERP system.. The Data Services Architecture Data Services relies on several unique components to accomplish the data integration and data quality activities required to manage your corporate data. Data Services includes the standard components: • • • • • • • • • •. Designer Repository Job Server Engines Access Server Adapters Real-time Services Address Server Cleansing Packages, Dictionaries, and Directories Management Console. This diagram illustrates the relationships between these components: The Data Services Designer Data Services Designer is a Windows client application used to create, test, and manually execute jobs that transform data and populate a data warehouse. Using Designer, you create data management applications that consist of data mappings, transformations, and control logic.. 4. © 2011 SAP AG. All rights reserved.. 2011.
(13) BODS30. Lesson: Defining Data Services. Figure 2: Data Services Designer Interface. You can create objects that represent data sources, and then drag, drop, and configure them in flow diagrams. Designer allows you to manage metadata stored in a local repository. From the Designer, you can also trigger the Job Server to run your jobs for initial application testing. The Data Services Repository The Data Services repository is a set of tables that holds user-created and predefined system objects, source and target metadata, and transformation rules. It is set up on an open client/server platform to facilitate sharing metadata with other enterprise tools. Each repository is stored on an existing Relational Database Management System (RDBMS).. 2011. © 2011 SAP AG. All rights reserved.. 5.
(14) Unit 1: Describing Data Services. BODS30. Figure 3: Data Services Repository. There are three types of repositories: •. •. •. A local repository (known in Designer as the Local Object Library) is used by an application designer to store definitions of source and target metadata and Data Services objects. A central repository (known in Designer as the Central Object Library) is an optional component that can be used to support multiuser development. The Central Object Library provides a shared library that allows developers to check objects in and out for development. A profiler repository is used to store information that is used to determine the quality of data.. The Data Services Job Server Each repository is associated with at least one Data Services Job Server, which retrieves the job from its associated repository and starts the data movement engine. The data movement engine integrates data from multiple heterogeneous sources, performs complex data transformations, and manages extractions and transactions from ERP systems and other sources. The Job Server can move data in batch or real-time mode and uses distributed query optimization, multithreading, in-memory caching, in-memory data transformations, and parallel processing to deliver high data throughput and scalability.. 6. © 2011 SAP AG. All rights reserved.. 2011.
(15) BODS30. Lesson: Defining Data Services. Figure 4: Data Services Architecture – Job Server. While designing a job, you can run it from the Designer. In your production environment, the Job Server runs jobs triggered by a scheduler or by a real-time service managed by the Data Services Access Server. In production environments, you can balance job loads by creating a Job Server Group (multiple Job Servers), which executes jobs according to overall system load. Data Services provides distributed processing capabilities through the Server Groups. A Server Group is a collection of Job Servers that each reside on different Data Services server computers. Each Data Services server can contribute one, and only one, Job Server to a specific Server Group. Each Job Server collects resource utilization information for its computer. This information is utilized by Data Services to determine where a job, data flow, or subdata flow (depending on the distribution level specified) should be executed. The Data Services Engines When Data Services jobs are executed, the Job Server starts Data Services engine processes to perform data extraction, transformation, and movement. Data Services engine processes use parallel processing and in-memory data transformations to deliver high data throughput and scalability. The Data Services Cleansing Packages. 2011. © 2011 SAP AG. All rights reserved.. 7.
(16) Unit 1: Describing Data Services. BODS30. Figure 5: Data Services Architecture–Cleansing Packages-. The Data Quality Cleansing Packages files provide referential data for the Data Cleanse and Address Cleanse transforms to use when parsing, standardizing, and cleansing name and address data. Cleansing Packages enhance the ability of Data Cleanse to accurately process various forms of global data by including language-specific reference data and parsing rules. Directories provide information on addresses from postal authorities; cleansing package files are used to identify, parse, and standardize data such as names, titles, and firm data. Dictionaries also contain acronym, match standard, gender, capitalization, and address information. The Data Services Management Console. 8. © 2011 SAP AG. All rights reserved.. 2011.
(17) BODS30. Lesson: Defining Data Services. The Data Services Management Console provides access to these features: •. Administrator Administer Data Services resources, including:. •. – Scheduling, monitoring, and executing batch jobs – Configuring, starting, and stopping real-time services – Configuring Job Server, Access Server, and repository usage – Configuring and managing adapters – Managing users – Publishing batch jobs and real-time services via web services – Reporting on metadata Auto Documentation View, analyze, and print graphical representations of all objects as depicted in Data Services Designer, including their relationships, properties, and more.. •. Data Validation Evaluate the reliability of your target data based on the validation rules you create in your Data Services batch jobs to quickly review, assess, and identify potential inconsistencies or errors in source data.. •. Impact and Lineage Analysis Analyze end-to-end impact and lineage for Data Services tables and columns, and SAP BusinessObjects Business Intelligence platform objects such as universes, business views, and reports.. •. Operational Dashboard View dashboards of status and performance execution statistics of Data Services jobs for one or more repositories over a given time period.. •. Data Quality Reports Use data quality reports to view and export SAP Crystal Reports for batch and real-time jobs that include statistics-generating transforms. Report types include job summaries, transform-specific reports, and transform group reports. To generate reports for Match, US Regulatory Address Cleanse, and Global Address Cleanse transforms, you must enable the Generate report data option in the Transform Editor.. Other Data Services Tools There are also several tools to assist you in managing your Data Services installation.. 2011. © 2011 SAP AG. All rights reserved.. 9.
(18) Unit 1: Describing Data Services. BODS30. The Data Services Repository Manager allows you to create, upgrade, and check the versions of local, central, and profiler repositories. The Data Services Server Manager allows you to add, delete, or edit the properties of Job Servers. It is automatically installed on each computer on which you install a Job Server. Use the Server Manager to define links between Job Servers and repositories. You can link multiple Job Servers on different machines to a single repository (for load balancing) or each Job Server to multiple repositories (with one default) to support individual repositories (for example, separating test and production environments). The License Manager displays the Data Services components for which you currently have a license. The Metadata Integrator allows Data Services to seamlessly share metadata with SAP BusinessObjects Intelligence products. Run the Metadata Integrator to collect metadata into the Data Services repository for Business Views and Universes used by SAP Crystal Reports, Desktop Intelligence documents, and Web Intelligence documents. Note: These other tools are not discussed in this course.. Defining Data Services Objects Data Services provides you with a variety of objects to use when you are building your data integration and data quality applications.. Figure 6: Data Services Object Types. Data Services Objects. 10. © 2011 SAP AG. All rights reserved.. 2011.
(19) BODS30. Lesson: Defining Data Services. In Data Services, all entities you add, define, modify, or work with are objects. Some of the most frequently used objects are: • • • • • •. Projects Jobs Work flows Data flows Transforms Scripts. This diagram shows some common objects.. Figure 7: Data Services Objects. All objects have options, properties, and classes. Each can be modified to change the behavior of the object. Options control the object. For example, to set up a connection to a database, the database name is an option for the connection. Properties describe the object. For example, the name and creation date describe what the object is used for and when it became active. Attributes are properties used to locate and organize objects. Classes define how an object can be used. Every object is either reusable or single-use. Single-use objects appear only as components of other objects. They operate only in the context in which they were created. You cannot copy single-use objects.. 2011. © 2011 SAP AG. All rights reserved.. 11.
(20) Unit 1: Describing Data Services. BODS30. A reusable object has a single definition and all calls to the object refer to that definition. If you change the definition of the object in one place, and then save the object, the change is reflected to all other calls to the object. Most objects created in Data Services are available for reuse. After you define and save a reusable object, Data Services stores the definition in the repository. You can then reuse the definition as necessary by creating calls to it. For example, a data flow within a project is a reusable object. Multiple jobs, such as a weekly load job and a daily load job, can call the same data flow. If this data flow is changed, both jobs call the new version of the data flow. You can edit reusable objects at any time independent of the current open project. For example, if you open a new project, you can open a data flow and edit it. However, the changes you make to the data flow are not stored until you save them. Defining Relationship between Objects Jobs are composed of work flows and/or data flows: • •. A work flow is the incorporation of several data flows into a sequence. A data flow process transforms source data into target data.. Figure 8: Data Services Object Relationships. 12. © 2011 SAP AG. All rights reserved.. 2011.
(21) BODS30. Lesson: Defining Data Services. A work flow orders data flows and the operations that support them. It also defines the interdependencies between data flows. For example, if one target table depends on values from other tables, you can use the work flow to specify the order in which you want Data Services to populate the tables. You can also use work flows to define strategies for handling errors that occur during project execution, or to define conditions for running sections of a project. A data flow defines the basic task that Data Services accomplishes, which involves moving data from one or more sources to one or more target tables or files. You define data flows by identifying the sources from which to extract data, the transformations the data should undergo, and targets. Defining projects and jobs A project is the highest-level object in Designer. Projects provide a way to organize the other objects you create in Designer. A job is the smallest unit of work that you can schedule independently for execution. A project is a single-use object that allows you to group jobs. For example, you can use a project to group jobs that have schedules that depend on one another or that you want to monitor together. Projects have these characteristics: • • •. Projects are listed in the Local Object Library. Only one project can be open at a time. Projects cannot be shared among multiple users.. The objects in a project appear hierarchically in the project area. If a plus sign (+) appears next to an object, you can expand it to view the lower-level objects contained in the object. Data Services displays the contents as both names and icons in the project area hierarchy and in the workspace. Jobs must be associated with a project before they can be executed in the project area of Designer. Using Work Flows Jobs with data flows can be developed without using work flows. However, one should consider nesting data flows inside of work flows by default. This practice can provide various benefits. Always using work flows makes jobs more adaptable to additional development and/or specification changes. For instance, if a job initially consists of four data flows that are to run sequentially, they could be set up without work flows. But what if specification changes require that they be merged into another job instead? The developer would have to replicate their sequence correctly in the other job. If these had been initially added to a work flow, the developer could then have simply. 2011. © 2011 SAP AG. All rights reserved.. 13.
(22) Unit 1: Describing Data Services. BODS30. copied that work flow into the correct position within the new job. There would be no need to learn, copy, and verify the previous sequence. The change can be made more quickly with greater accuracy. Even if there is one data flow per work flow, there are benefits to adaptability. Initially, it may have been decided that recovery units are not important; the expectation being that if the job fails, the whole process could simply be rerun. However, as data volumes tend to increase, it may be determined that a full reprocessing is too time-consuming. The job may then be changed to incorporate work flows to benefit from recovery units to bypass reprocessing of successful steps. However, these changes can be complex and can consume more time than allotted for in a project plan. It also opens up the possibility that units of recovery are not properly defined. Setting these up during initial development when the full analysis of the processing nature is preferred. Note: This course focuses on creating batch jobs using database Datastores and file formats.. Using the Data Services Designer The Data Services Designer interface allows you to plan and organize your data integration and data quality jobs in a visual way. Most of the components of Data Services can be programmed with this interface. Describing the Designer Window The Data Services Designer interface consists of a single application window and several embedded supporting windows. The application window contains the menu bar, toolbar, Local Object Library, project area, tool palette, and workspace. Using the Local Object Library The Local Object Library gives you access to the object types listed in the table below. The table shows the tab on which the object type appears in the Local Object Library and describes the Data Services context in which you can use each type of object. You can import objects to and export objects from your Local Object Library as a file. Importing objects from a file overwrites existing objects with the same names in the destination Local Object Library. Whole repositories can be exported in either .atl or .xml format. Using the .xml file format can make repository content easier for you to read. It also allows you to export Data Services to other products. Using the Tool Palette. 14. © 2011 SAP AG. All rights reserved.. 2011.
(23) BODS30. Lesson: Defining Data Services. The tool palette is a separate window that appears by default on the right edge of the Designer workspace. You can move the tool palette anywhere on your screen or dock it on any edge of the Designer window. The icons in the tool palette allow you to create new objects in the workspace. Disabled icons occur when there are invalid entries to the diagram open in the workspace. To show the name of each icon, hold the cursor over the icon until the tool tip for the icon appears. When you create an object from the tool palette, you are creating a new definition of an object. If a new object is reusable, it is automatically available in the Local Object Library after you create it. If you select the data flow icon from the tool palette and define a new data flow called DF1, you can later drag that existing data flow from the Local Object Library and add it to another data flow called DF2. Using the Workspace When you open a job or any object within a job hierarchy, the workspace becomes active with your selection. The workspace provides a place to manipulate objects and graphically assemble data movement processes. These processes are represented by icons that you drag and drop into a workspace to create a diagram. This diagram is a visual representation of an entire data movement application or some part of a data movement application. You specify the flow of data by connecting objects in the workspace from left to right in the order you want the data to be moved.. 2011. © 2011 SAP AG. All rights reserved.. 15.
(24) Unit 1: Describing Data Services. BODS30. Lesson Summary You should now be able to: • Define Data Services objects • Use the Data Services Designer interface. 16. © 2011 SAP AG. All rights reserved.. 2011.
(25) BODS30. Unit Summary. Unit Summary You should now be able to: • Define Data Services objects • Use the Data Services Designer interface. 2011. © 2011 SAP AG. All rights reserved.. 17.
(26) Unit Summary. 18. BODS30. © 2011 SAP AG. All rights reserved.. 2011.
(27)
(28)
(29) Unit Summary. 19. BODS30. © 2011 SAP AG. All rights reserved.. 2011.
(30) Unit Summary. 20. BODS30. © 2011 SAP AG. All rights reserved.. 2011.
(31) Unit 2 Using Data Quality Transforms Unit Overview Data Quality transforms are used to parse, standardize, cleanse, enhance, match, and consolidate your records. After completing this lesson, you will be able to: •. Describe Data Quality transforms. Unit Objectives After completing this unit, you will be able to: • • •. Define Data Quality processes Use Data Quality transforms Configure Data Quality transforms. Unit Contents Lesson: Data Quality Transforms.................................................. 22 Exercise 1: Create Source and Target Datastores.......................... 41. 2011. © 2011 SAP AG. All rights reserved.. 21.
(32) Unit 2: Using Data Quality Transforms. BODS30. Lesson: Data Quality Transforms Lesson Overview SAP BusinessObjects Data Services offers several Data Quality and Address transforms, which allows you to save time and resources in the production of mailings.. Lesson Objectives After completing this lesson, you will be able to: • • •. Define Data Quality processes Use Data Quality transforms Configure Data Quality transforms. Business Example Mail that is undeliverable generate a considerable cost to the mailer: wasted effort, wasted materials, and wasted time. Two of the major reasons for undeliverable mail are incorrect addresses and mail recipients moving to a new address. Incorrect addresses are the result of missing apartment or suite numbers, incorrect or missing street numbers, missing box numbers, incorrect street names. and incorrect recipients. On an average, in one year in the United States, there are 46 million people move and over one million new addresses are created. According to the United States Postal Service (USPS) National Postal Forum dated, May 17 – 20, 2009, the reason “Customer has Moved” is responsible for over 75% of undeliverable mail. Consider one mailing of 100,000 addresses sending at full rate. The postage for a 1 ounce piece of mail is 44¢. The average automation discount is 5¢. A customer would save $5,000 in postal service discounts with address cleansing. If a customer also deduplicates the list and the list has 2% duplicates, they could save an additional $780 on postage. If the piece costs 50¢ to create, the customer could save an additional $1000 in creation/printing costs. Imagine if this were a monthly or weekly billing!. Describing the data quality framework The term “Data Quality” is used often, but what does it mean?. Figure 9: What Is Data Quality?. 22. © 2011 SAP AG. All rights reserved.. 2011.
(33) BODS30. Lesson: Data Quality Transforms. The most common name and address data quality errors are seen below.. Figure 10: What Are Common Data Quality Errors?. Before you can implement an effective data quality project, you must first understand the data quality framework, which is a continuous cycle of activities that starts with measuring: 1. Measuring. 2011. © 2011 SAP AG. All rights reserved.. 23.
(34) Unit 2: Using Data Quality Transforms. BODS30. Figure 11: Data Quality – Assessment. Start the process by using a data profiling application to quantify the number and types of defects in your data. 2. Analyzing. Figure 12: Data Quality – Measure and Analyze. Once the data has been measured, assess the nature and cause of data defects. This information enables you to inform stakeholders of your findings and develop a data quality strategy. 3. Parsing. 24. © 2011 SAP AG. All rights reserved.. 2011.
(35) BODS30. Lesson: Data Quality Transforms. Figure 13: Data Quality – Cleansing. After the data has been assessed, identify and isolate the individual elements that make up the data. 4. Standardizing. 2011. © 2011 SAP AG. All rights reserved.. 25.
(36) Unit 2: Using Data Quality Transforms. BODS30. Figure 14: Data Cleansing – Parse, Standardize, and Correct. Next, normalize and format the parsed data according to business rules and third-party referential data. 5. Cleansing After standardization, verify and cleanse the data according to business rules and third-party referential data. 6. Enhancing. 26. © 2011 SAP AG. All rights reserved.. 2011.
(37) BODS30. Lesson: Data Quality Transforms. Figure 15: Data Quality – Enhancement. Append additional information that may increase the value of the data based on referential data.. Figure 16: Data Enhancement 1. An example of data enhancement can be seen in the below graphic.. 2011. © 2011 SAP AG. All rights reserved.. 27.
(38) Unit 2: Using Data Quality Transforms. BODS30. Figure 17: Data Enhancement 2. 7. Matching Once the data has been fully cleansed, perform the matching process to identify duplicate records within multiple tables or databases. Matching after cleansing increases the likelihood of discovering overlaps. 8. Consolidating Once matches have been identified, consolidate the data to populate a best record in a match set or update all records in a match set. 9. Continuous Monitoring Use data profiling software to measure ongoing data quality scores and provide alerts when data scores fall outside of defined thresholds. Where Is Data Quality Used? Data quality is needed throughout an organization and should be used everywhere that data is managed. This includes not only batch processing data extracted from a current system, but also transactional processing on point of entry to the system.. 28. © 2011 SAP AG. All rights reserved.. 2011.
(39) BODS30. Lesson: Data Quality Transforms. Figure 18: Where Does Data Quality Fit?. The following application areas can benefit from the implementation of data quality: Application Area. Use Examples. Data Warehousing/Business Intelligence. Enterprise Resource Planning/Supply Chain Customer Relationship Management (Call Centers, SFA). • • • •. Supplier management. • • • • • •. 2011. Reporting Analysis Metrics Business intelligence. Direct mail/customer communication Campaign management Loyalty program management Cross and up sales Lifetime value 360 degree view. © 2011 SAP AG. All rights reserved.. 29.
(40) Unit 2: Using Data Quality Transforms. Consumer and Corporate Householding. BODS30. • • •. Fraud Detection/Compliance. Distribution Projects. Business Process Automation. Mergers/Questions/Divestitures. Multi-level customer, supplier, and patient databases Cross selling for expanded offerings Identification of multiple organizations as a single company. • •. BASEL II Sarbanes-Oxley Act. • • • •. Distribution of goods Publishers Printers Service bureaus. • • • •. Real-time decisions Order entry Web forms Self service. Market bounce/EU. Defining Data Quality Processes You can use Data Services to improve the quality of your data by using Data Quality transforms to parse, standardize, cleanse, enhance, and match records. Parsing Data Parsing identifies data and breaks it down into its component parts. It rearranges data elements in a single field or moves multiple data elements from a single data field to multiple discrete fields. For example, a name can be parsed into new fields such as prename, first name, last name, and postname.. 30. © 2011 SAP AG. All rights reserved.. 2011.
(41) BODS30. Lesson: Data Quality Transforms. Input Record. Output fields Prename:. Mr.. First Name:. Dan. Middle Name:. R.. Mr. Dan R. Smith, Jr., CPA Account Mgr.. Last Name:. Smith. Maturity Postname:. Jr.. Jones Inc.. Other Postname:. CPA. Dept. of Accounting. Title:. Account Mgr.. PO Box 567. Firm:. Jones Inc.. Biron, WI. Department:. Dept. of Accounting. 54494. Locality:. Biron. Region:. WI. Primary Number:. 567. Postal Code:. 54494. Standardizing Data Even the most stringent data entry controls can permit data with an unacceptable level of variability to enter your data store. To make your records more consistent, you can use data quality to standardize customer and operational data. Formats, abbreviations, acronyms, punctuation, greetings, casing, order, and pattern matching are examples of elements you can control. Input Record. Output Fields Purchase order:. 34-5654. Purchase date:. 03-01-2002. Purchase date: 3–01–02. Part number:. R2-0113-245500. Part number: R20113245500. Diameter:. 3". Length:. 52'. Purchase order: PO345654. Part desc: 3” diameter, 1 @ 52 ft Cleansing Data. 2011. © 2011 SAP AG. All rights reserved.. 31.
(42) Unit 2: Using Data Quality Transforms. BODS30. Data with incorrect elements is known as “dirty data”. Cleansing dirty data involves correcting it and adding missing elements like adding directions and primary types to addresses. Cleansing can occur on a wide variety of data. Depending on the type of data, you can remove or correct incorrect data using sophisticated algorithms and rules in conjunction with referential data. For example, you can use address information obtained from a national postal service to correct address data. Input Record. Mark Kessler 117–138 St W Harlem NY 10030. Output Records Name:. Mark Kessler. Address:. 117 Odell Clark Pl. Locality:. New York. Region:. NY. Postal Code:. 10030-2342. Enhancing Data Data can increase in value when additional related data is attached to it. Data Quality can: •. •. •. •. 32. Provide full international postal codes. For example, ZIP+4 can be appended to the primary five-digit postal code for US addresses. When these additional values are used on mailing labels, they assist postal services to route the mail more quickly and efficiently. Append other valuable information obtained from the directory services. For example, you can attach assignment, delivery, quality, geocensus, and geospatial codes and information. In addition, you can append match name standards, gender codes, and customized greetings to also increase the value of that record. Assign longitude and latitude information to records. By assigning latitude and longitude information to records with a physical mailing address, your data immediately becomes more valuable. Address interpolation allows you to look at trends in your data based on these geographic demographics. Marketing initiatives by geography (for example, targeted marketing, predictive modeling) become immediately more feasible. Append other geospatial information used for census, marketing, and insurance. Information codes can be appended for the following: US Census Bureau Minor Civil Division (MCD) or Census County Division (CCD) codes; Federal Information Processing Standard (FIPS) place, state, and country codes; US Census Tract and Block Numbering Area (BNA) codes; Core–Based Statistical. © 2011 SAP AG. All rights reserved.. 2011.
(43) BODS30. Lesson: Data Quality Transforms. Area (CBSA) codes; Metropolitan Statistics Area codes (MSA); and Section codes for US Census information. Some of these codes can be used to match to demographic coding databases for census purposes, insurance rating territories, and so on for US address data. The record below is an example of geocoding information that has been appended to a record containing an address: Input Record. Appended Information Address Latitude:. 40.723175. Address Longitude:. -74.004970. Centroid Latitude:. 40.723195. Centroid Longitude:. -74.004977. Match quality:. Highest quality address. 101 Avenue of the Americas. FIPS State Code:. 36 New York. FIPS County Code:. 061 New York. New York, NY 10013–1933 (222) 922–9922. FIPS Place Code:. 51000 New York. MCD Code:. 44919. BSA:. 35620. Metro code:. 5600. Section Code:. 0051001012. Margaret Smith–Kline, Ph.D. Future Electronics. A number of enhancements are available for your data: Status and Error Coding Status and error codes indicate what has changed or is incorrect with an address. They enable you to identify issues with the data and determine the best way to resolve them. Enhanced Line of Travel (eLOT) Enhanced Line of Travel (eLOT) is available for the USA Regulatory Address Cleanse transform only. It is used for sorting to achieve mailing discounts. The original line of travel (LOT) narrowed the mail carrier's delivery route to the block face level (ZIP+4 level) by discerning whether an address resided on the odd or even side of a street or thoroughfare. eLOT narrows the mail carrier's delivery route walk sequence to the house (delivery point) level. This allows mailings to be sorted on a more precise level. Geocoding. 2011. © 2011 SAP AG. All rights reserved.. 33.
(44) Unit 2: Using Data Quality Transforms. BODS30. Geocodes provide latitude and longitude processing to a roof or block level. Geocoding assigns this information to records with a physical mailing address. It enables you to look at trends in your data based on these geographic demographics. Delivery Point Validation (DPV) Delivery Point Validation (DPV) determines if the address is deliverable. With DPV, you can identify addresses that are undeliverable as addressed and whether an address is a Commercial Mail Receiving Agency (CMRA). DPV can be useful in the following areas: • • • •. Mailing: Screen out undeliverable-as-addressed (UAA) mail and reduce mailing costs. Information quality: Verify an address down to the individual house, suite, or apartment rather than block face to increase the data's level of accuracy. Increased assignment rate: Use DPV tiebreak mode to resolve a tie when other tie-breaking methods are not conclusive may increase assignment rates. Preventing mail-order-fraud: Verify valid delivery addresses and Commercial Mail Receiving Agencies (CMRA). Eliminate shipping of merchandise to individuals who place fraudulent orders.. LACS Link (Rural Route Conversion) LACS Link is available for USA records with the USA Regulatory Address Cleanse transform only and is required for CASS certification. It converts emergency addresses. LACS Link updates addresses when the physical address does not move but the address has changed, such as when the municipality changes rural route addresses to street–name addresses. Rural route conversions make it easier for police, fire, ambulance, and postal personnel to locate a rural address. LACS Link also converts addresses when streets are renamed or post office boxes renumbered. Residential Delivery Indicator (RDI) The Residential Delivery Indicator (RDI) feature is available in the USA Regulatory Address Cleanse transform. RDI determines whether a given address is for a residence. This is because, unlike the United States Postal Service (USPS), some delivery services charge more for residential deliveries. You can use RDI if you are processing your data for CASS certification or if you are processing in a noncertified mode. In addition, RDI does not require that you use DPV processing. Matching and Consolidating Data. 34. © 2011 SAP AG. All rights reserved.. 2011.
(45) BODS30. Lesson: Data Quality Transforms. Matching is used once the data has been parsed, standardized, cleansed, and enhanced to determine whether records refer to the same entity, such as a customer. Once matches are identified, consolidation salvages data from records in a match set to create a “best” record or post data to all the records in a match set based on the priority of the records. Input Record Ms Margaret Smith–Kline Ph.D. Future Electronics. Consolidated Record Name:. Ms. Margaret Smith-Kline Ph.D.. Company name:. Future Electronics Co. LLC. SSN:. 001-12-4367. Hire date:. 5/23/2003. Address:. 101 Avenue of the Americas, New York NY 10013-1933. Phone:. (222) 922-9922. Email:. maggie.kline@future_electronics.com. 101 Avenue of the Americas New York NY 10013–1933 maggie.kline@future_electronics.com May 23, 2003 Maggie Smith Future Electronics Co. LLC 101 6th Ave. Manhattan, NY 10012 maggie.kline@future_electronics.com 001–12–4367 Ms. Peg Kline Future Elect. Co. 101 6th Ave.. 2011. © 2011 SAP AG. All rights reserved.. 35.
(46) Unit 2: Using Data Quality Transforms. BODS30. New York NY 10013 01–12–4367 (222) 922–9922 5/23/03. Describing the Components that Support Data Quality Data Quality transforms are supported by several unique components in the architecture of Data Services. Country–specific engines process Australian, Canadian, Japanese, and USA address data. An EMEA engine processes European data, and the Global Address engine processes the additional ISO countries. The Address Server is used to process European addresses using the Global Address Cleanse transform. It provides access to detailed address line information for most European countries. Note that the Address Server is resource-intensive and should be disabled until required. It takes time to initialize, which may impact performance when it is first started. Cleansing Packages are one of the four types of repositories. They enhance the ability of Data Services to accurately process various forms of global data by including language-specific reference data and parsing rules. Cleansing packages are used in conjunction with rule files to identify and parse name, title, and firm data. They contain entries for words and phrases. Each entry specifies how the word or phrase might be used. You can tailor a cleansing package to better suit your needs. Directories are used to define how to cleanse address data. Optional USA directories are available for specialized address data. The US Postal Service requires the software disable itself when a user attempts to use expired US directories.. Describing Data Quality Transforms Data Quality transforms are used to improve the quality of your data. They are used to parse, standardize, cleanse, enhance, match, and consolidate data. The following Data Quality transforms are available on the Transforms tab of the Local Object Library:. 36. © 2011 SAP AG. All rights reserved.. 2011.
(47) BODS30. Lesson: Data Quality Transforms. Transform. Description. Associate. Works downstream from Match transforms to provide a way to combine, or associate, the match results by using the Match transform-generated Group Number fields.. Country ID. Parses input data and then identifies the country of destination for each record.. Data Cleanse. Identifies, parses, standardizes, and enhances name, title, and firm data, phone numbers, Social Security numbers, dates, and e-mail addresses. Assigns gender, adds prenames, generates Match standards, and creates personlized greetings. Can also be used for international, operational, and product data.. Global Address Cleanse. Identifies, parses, validates, and corrects global address data, such as primary number, primary name, primary type, directional, secondary identifier, and secondary number.. Global Suggestion List. Offers a way to complete and populate addresses with minimal data, and can make suggestions for possible matches. Is ideal for use in call center and other transactional environments to cleanse and standardize data at the point of entry.. Match. Matches data based on user-defined business rules.. USA Regulatory Address Cleanse. Identifies, parses, validates, and corrects USA address data according to the USA Coding Accuracy Support System (CASS). Outputs many useful codes and creates USPS Form 3553. Can also run in non-certified mode.. User Defined. 2011. Provides custom processing in a data flow using full Python scripting. © 2011 SAP AG. All rights reserved.. 37.
(48) Unit 2: Using Data Quality Transforms. BODS30. language.User–Defined transforms should only be used for those tasks that cannot be accomplished with other transforms or with the Data Services scripting language in the Query transform. Note: The User–Defined transform lacks scalability and performance compared to other transforms and Data Services scripting language in the Query transform. Use the User–Defined transform for functions that extend beyond these tasks.. Using Transform Configurations Transform configurations are customized versions of base Data Quality transforms. They use preconfigured input fields, output fields, and options that include best practices and can be used in multiple data flows. Once you place an instance of the transform configuration in a data flow, you can override the preset defaults. You can also create your own transform configurations, either by replicating an existing transform configuration or creating a new one from scratch. Note: Transform configurations are only available for Data Quality transforms as part of the Data Services platform. This feature does not apply to Data Integrator or Platform transforms.. Using the Transform Editor When you are working with Data Quality transforms, the parameters area of the Transform Editor provides access to some unique functionality. For each tab, you can choose to filter to show all available options, only those that are currently in use, or those that are designated as "best practice" for that particular transform configuration. Note: When you create your own transform configurations, you can specify which fields and options are best practice to take advantage of these filters.. 38. © 2011 SAP AG. All rights reserved.. 2011.
(49) BODS30. Lesson: Data Quality Transforms. Tab. Description. Input tab. The Input tab displays the available field names that are recognized by the transform. Mapping input fields to field names that the transform recognizes tells the transform how to process that field.. Options tab. The Options tab contains business rules that determine how the transform processes your data. Each transform has a different set of available options. If you change an option value from its default value, a green triangle appears next to the option name to indicate that you made an override.. Output tab. The Output tab displays the field names that can be output by the transform. Data cleansing transforms can generate fields in addition to the input fields that transform processes, so that you can output many fields.. Configuring Data Quality Transforms The primary difference configuring Data Quality transforms versus configuring platform transforms is that Data Quality transforms have passthrough columns versus columns that are used by the transform as is the case with Platform transforms. You can map passthrough, input, and output columns in the transform editor by dragging the columns from one pane to another. Passthrough columns are not affected by the transform, but are available to other transforms downstream. To map a passthrough column, click and drag it from the input schema area to the output schema area. Input columns are used by the transform to generate parsed, standardized, and cleansed output. To map an input column, click and drag it from the input schema area to the appropriate field in the Input tab. Columns for which a mapping type have been identified are mapped automatically. Output columns contain parsed, standardized, and cleansed data from the transform. To map a column to the output scheme, select the check box on the Output tab. To replace an existing field with the new data, drag the column from the Output tab to the column in the output schema area and select Remap Column from the popup menu.. 2011. © 2011 SAP AG. All rights reserved.. 39.
(50) Unit 2: Using Data Quality Transforms. 40. © 2011 SAP AG. All rights reserved.. BODS30. 2011.
(51) BODS30. Lesson: Data Quality Transforms. Exercise 1: Create Source and Target Datastores Exercise Objectives After completing this exercise, you will be able to: • Create Datastores and import metadata for the Alpha Acquisitions, Beta Businesses, Delta, HR Data Mart and Omega databases. Business Example You have been hired as a Data Services designer for Alpha Acquisitions. Alpha has recently acquired Beta Businesses, an organization that develops and sells software products and related services. In an effort to consolidate and organize the data, and simplify the reporting process for the growing company, the Omega data warehouse is being constructed to merge the data for both organizations, and a separate data mart is being developed for reporting on Human Resources data. You also have access to a database for staging purposes called Delta. To start the development process, you must create datastores and import the metadata for all of these data sources.. Task: Create Datastores and import metadata for the Beta Businesses and Delta databases.. 2011. 1.. Log in to the Data Services Designer.. 2.. In your Local Object Library, create a new source Datastore for the BETA Businesses database.. 3.. In your Local Object Library, create a new Datastore for the DQSTANDALONE database.. 4.. In your Local Object Library, create a new Datastore for the DELTA staging database.. © 2011 SAP AG. All rights reserved.. 41.
(52) Unit 2: Using Data Quality Transforms. BODS30. Solution 1: Create Source and Target Datastores Task: Create Datastores and import metadata for the Beta Businesses and Delta databases. 1.. 2.. Log in to the Data Services Designer. a). From the Windows Terminal Server (WTS) training environment desktop, choose Start → Programs → SAP Business Objects Data Services 4.0 SP1 → Data Services Designer.. b). In the dialog box, enter your assigned User ID.. c). Enter your password, which is the same as your User ID.. d). Choose the Log on button.. e). In the list of repositories, select your repository DSREPO##, where ## is the number portion of your User ID.. f). Choose OK.. In your Local Object Library, create a new source Datastore for the BETA Businesses database. a). In the Local Object Library, select the Datastores tab and right-click in the Datastores workspace to select the New option from the context menu.. b). In the resulting dialog box, use the options: Field. Value. Datastore name. BETA. Datastore type. Database. Database type. Microsoft SQL Server. Database version. Microsoft SQL Server 2005. Database server name. Supplied by the Instructor. Database name. BETA. User name. sourceuser (lower case). Password. sourcepass (lower case). Continued on next page. 42. © 2011 SAP AG. All rights reserved.. 2011.
(53) BODS30. Lesson: Data Quality Transforms. Choose OK to finish. c). In the Local Object Library, right-click the BETA datastore to select the Open option from the context menu. The tables from the BETA database are displayed on the Metadata window on the right.. d). Import the metadata for the BETA database source tables by selecting all the tables, right-click on them and from the menu, choose the option Import. • • • • • • • • • • •. source.addrcodes source.categories source.country source.customer source.employees source.orderdetails source.orders source.products source.region source.suppliers source.usa_customers Note: This database is a read-only database.. e). 3.. View the data for the USA_Customers table and confirm that Jane Hartley from Planview Inc. is the first customer record by right-clicking on the table in the Local Object Library and choose the option View data.. In your Local Object Library, create a new Datastore for the DQSTANDALONE database. a). In the Local Object Library, select the Datastores tab and right-click in the Datastores workspace to select the New option from the context menu.. b). In the resulting dialog box, use the options: Field. Value. Datastore name. DQSTANDALONE. Datastore type. Database. Continued on next page. 2011. © 2011 SAP AG. All rights reserved.. 43.
(54) Unit 2: Using Data Quality Transforms. BODS30. Database type. Microsoft SQL Server. Database version. Microsoft SQL Server 2005. Database server name. Supplied by the Instructor. Database name. DQSTANDALONE. User name. sourceuser (lower case). Password. sourcepass (lower case). Choose OK to finish. c). In the Local Object Library, right-click the DQSTANDALONE datastore to select the Open option from the context menu. The tables from the DQSTANDALONE database are displayed on the Metadata window on the right.. d). Import the metadata for the DQSTANDALONE database source tables by selecting all the tables, right-click on them and from the menu, choose the option Import. • • • • • • •. AddressCleansed_USA DataCleansed_USA DataCleansed2_USA Match_Target Match_Target2 Match_Target3 Match_Target4 Note: This database is a read-only database.. 4.. In your Local Object Library, create a new Datastore for the DELTA staging database. a). In the Local Object Library, select the Datastores tab and right-click in the Datastores workspace to select the New option from the context menu.. b). In the resulting dialog box, use the options: Field. Value. Datastore name. DELTA. Continued on next page. 44. © 2011 SAP AG. All rights reserved.. 2011.
(55) BODS30. Lesson: Data Quality Transforms. Datastore type. Database. Database type. Microsoft SQL Server. Database version. Microsoft SQL Server 2005. Database server name. Supplied by the Instructor. Database name. DELTA## (where ## is the group number assigned by your instructor). User name. student## or consult your instructor. Password. student## or consult your instructor. Choose OK to finish. c). You do not have to import any metadata. Note: This database is a read/write database and will be used as the target for your template tables.. 2011. © 2011 SAP AG. All rights reserved.. 45.
(56) Unit 2: Using Data Quality Transforms. BODS30. Lesson Summary You should now be able to: • Define Data Quality processes • Use Data Quality transforms • Configure Data Quality transforms. 46. © 2011 SAP AG. All rights reserved.. 2011.
(57) BODS30. Unit Summary. Unit Summary You should now be able to: • Define Data Quality processes • Use Data Quality transforms • Configure Data Quality transforms. 2011. © 2011 SAP AG. All rights reserved.. 47.
(58) Unit Summary. 48. BODS30. © 2011 SAP AG. All rights reserved.. 2011.
(59)
(60)
(61) Unit Summary. 49. BODS30. © 2011 SAP AG. All rights reserved.. 2011.
(62) Unit Summary. 50. BODS30. © 2011 SAP AG. All rights reserved.. 2011.
(63) Unit 3 Using Address Cleanse Transforms Unit Overview This lesson describes the address cleansing capabilities of Data Services. After completing this lesson, you will be able to: •. Describe the Address Cleanse transforms. Unit Objectives After completing this unit, you will be able to: • • • •. Prepare your input data for Address Cleanse transforms Use the Address Cleanse transforms Work with global address data Work with transactional address data. Unit Contents Lesson: Using the Address Cleanse Transforms ............................... 52 Exercise 2: Use the Address Cleanse Transform ........................... 73. 2011. © 2011 SAP AG. All rights reserved.. 51.
(64) Unit 3: Using Address Cleanse Transforms. BODS30. Lesson: Using the Address Cleanse Transforms Lesson Overview The Address Cleanse transforms parse, correct, complete, and standardize address components. Address Cleanse contains engines to process Australian, Canadian, Japanese, and USA address data. An EMEA engine processes European data, and the Global Address engine processes the additional International Organization for Standardization (ISO) countries. For example, postal codes can be corrected and added for some countries with the Global Address Cleanse and USA Regulatory Cleanse transforms.. Lesson Objectives After completing this lesson, you will be able to: • • • •. Prepare your input data for Address Cleanse transforms Use the Address Cleanse transforms Work with global address data Work with transactional address data. Business Example Mail, which is undeliverable generates a considerable cost to the mailer: wasted effort, wasted materials, and wasted time. Two of the major reasons for undeliverable mail are incorrect addresses and mail recipients moving to a new address. Incorrect addresses are the result of missing apartment or suite numbers, incorrect or missing street numbers, missing box numbers, incorrect street names and incorrect recipients. On an average, in one year in the United States, there are 46 million people move and over 1,000,000 new addresses are created. According to the United States Postal Service (USPS) National Postal Forum dated, May 17 – 20, 2009, the reason “Customer has Moved” is responsible for over 75% of undeliverable mail. Consider one mailing of 100,000 addresses sending at full rate. The postage for a 1 ounce piece of mail is 44¢. The average automation discount is 5¢. A customer would save $5,000 in postal service discounts with address cleansing. If a customer also deduplicates the list and the list has 2% duplicates, they could save an additional $780 on postage. If the piece costs 50¢ to create, the customer could save an additional $1000 in creation/printing costs. Imagine if this were a monthly or weekly billing!. 52. © 2011 SAP AG. All rights reserved.. 2011.
(65) BODS30. Lesson: Using the Address Cleanse Transforms. Understanding the Business Need for Address Cleanse Transforms. Figure 19: Why Cleanse Addresses?. There are two primary reasons for cleansing your address data: 1.. 2.. You can reduce costs from misdirected mail. Compliance with a defined set of standards can enable your organization to attain certification that allows you to take advantage of discounts offered by many national postal organizations, such as the United States Postal Service. You can properly prepare your data for matching. When comparing two addresses to see if they match, the smallest data entry error could keep you from finding matches. You must have standardized data before you look for matches. Address cleansing can parse an address down into its smallest component parts (for example, separating the house number from the street name) so that your matching process can compare individual items and improve match results.. Figure 20: Standardization: USPS Address Preferences. The various address components need to be parsed before address cleansing can take place. For example:. 2011. © 2011 SAP AG. All rights reserved.. 53.
(66) Unit 3: Using Address Cleanse Transforms. BODS30. Figure 21: Parsing Address Components. Address cleansing tasks include: •. •. •. •. •. Parsing address components (such as locality, region, and postal code) and generating discrete fields for improved match results. The level of parsing and correction varies by country, and the correction depends on the amount of input address information available in the input record. Standardizing address fields based on business rule options set in the transform, correcting the format of address components, and changing the way the address line looks. For example, Data Services can add or strip punctuation and abbreviate or spell out address components. Verifying that locality, region, and postal codes agree with one another. If a record contains only a locality and region, Data Services can add the postal code and vice versa, depending on the country. Appending other information, such as address quality measurement in regard to whether region and postal code agree with the corrected address (for USA, Canada, and Australia) and demographic information (if the GeoCensus Option is purchased). Identifying any undeliverable addresses, such as vacant lots and condemned buildings (USA records only).. • •. 54. Assigning diagnostic codes to help determine why addresses are not assigned or how they are corrected. Generating address-related reports.. © 2011 SAP AG. All rights reserved.. 2011.
(67) BODS30. Lesson: Using the Address Cleanse Transforms. Preparing your Input Data for Address Cleanse Transforms. Figure 22: Preparing Input Data. You can cleanse address data that is stored in discrete, multiline, and multiline hybrid address line formats. Before setting up the cleansing process, you must identify the format of your data. The Address Cleanse transform accepts discrete, multiline, and multiline hybrid address line formats. The discrete format is a database record format in which pieces of data appear in exactly the same field arrangement in every record. For this reason, this option is the best performing format.. Figure 23: Discrete Line Format. Note: The discrete line formats are not recommended for processing multi–country databases. They are more appropriate for use with USA, Canadian, and Australian addresses.. 2011. © 2011 SAP AG. All rights reserved.. 55.
(68) Unit 3: Using Address Cleanse Transforms. BODS30. Fields. Address1. Address2. Address3. Firm. Micro Elektronic Ges.m.b.H. Address1. Baumschulengasse 250. Nybrovej 120. 225 rue de Bragance. Locality1. Wein. Lyngby. Luxembourg. Postcode. 1010. 2800. 1449. Country. Germany. Denmark. Luxembourg. Dupong S.A.. Region1. The multiline format is a database record format in which address data is not consistently located in the same arrangement in all records. That is, data items float among fields. For example, field one can contain the name for one record, the company for another, and the address for another.. Figure 24: Multiline Format. 56. © 2011 SAP AG. All rights reserved.. 2011.
(69) BODS30. Lesson: Using the Address Cleanse Transforms. Fields. Address1. Multiline1. Lawrence Robinson. Multiline2. Information Services Plc. Gran Via de Carles S.A. Multiline3. Cornwall House. Ibarra Campillo. Multiline4. 56 Henley Road. Multiline5. Address2. Peter Grant. United Car Rental, Ltd 48010 Bilbao. Multiline6. London. Multiline7. NW2Y 5DM. 3150 S. Larkin Rd., Suite 10. Multiline8 Country. Address3. Etobicoke ON M9C4T9 United Kingdom. Spain. Canada. The multiline hybrid format is a record format that contains multiline and discrete fields.. Figure 25: Multiline Hybrid Format. 2011. © 2011 SAP AG. All rights reserved.. 57.
(70) Unit 3: Using Address Cleanse Transforms. Fields. Address1. Multiline1. Jonas Lancaster. Multiline2. Woodworks Ltd. Multiline3. Wetherby House. Multiline4. 20 Oakley Road. Multiline5. BODS30. Address2. Address3 Akzo Chemicals. Gran Via de Carles S.A. Moreelsepk 24 Ibarra Cambillo 16-3-A. Multiline6 Multiline7 Locality1. Bilbao. Locality2. Harlow. Region1. Essex. Postcode. CM19 5AE. 48010. Lastline Country. 3511 EP Utrecht United Kingdom. Spain. Netherlands. Defining the Address Cleanse Transforms The Address Cleanse transforms identify, parse, validate, standardize, and correct address data. An extensive number of new fields are generated by the Address Cleanse transform. Address type, primary type, primary number, unit number, locality, region, and postal code are just a few of the output fields that are commonly used. If you perform both address cleansing and data cleansing, the Address Cleanse transform should be placed before the Data Cleanse transform in the data flow. The are four available Address Cleanse transforms:. 58. © 2011 SAP AG. All rights reserved.. 2011.
(71) BODS30. Lesson: Using the Address Cleanse Transforms. Transform. Description. Global Address Cleanse. Cleanses your address data from any of the supported countries (not for US certification). You must use one or more of the Global Address Cleanse transform configurations Australia (AMAS), Canada (SERP), Japan, multicountry, or USA. With this transform, you can create a Statement of Address Accuracy Report (SERP) as well as an Australian Address Matching Processing report (AMAS) report.. USA Regulatory Address Cleanse. Cleanses your US address data for certification. You can also use any of the add-on options: Delivery Point Validation (DPV), Enhanced Line of Travel (eLOT), Early Warning System (EWS), Locatable Address Conversion system (LACSLink), Residential Delivery Indicator (RDI), suggestion lists (not for certification), or Z4Change. With this transform, you can create a USPS Form 3553.. Global Suggestion List. Offers suggestions for possible address matches for your global address data. This transform is used for transactional processing and does not standardize addresses. In a project, use a Country ID transform before this transform.. Country ID. Identifies the country of destination for the record and outputs an ISO code. Use this transform before the Global Suggestion Lists transform (it is not necessary before the Global Address Cleanse transform or the USA Regulatory Address Cleanse transform).. Note: This course focuses only on the Global Address Cleanse and USA Regulatory Address Cleanse transforms.. 2011. © 2011 SAP AG. All rights reserved.. 59.
(72) Unit 3: Using Address Cleanse Transforms. BODS30. Describing the Global Address Transform The Global Address Cleanse transform cleanses your address data from any supported countries. You must use one or more of the Global Address Cleanse transform configurations to accomplish certification for Australia (AMAS), Canada (SERP), Japan, or multi-country records. The Global Address Cleanse transform does not support CASS certification or produce a USPS Form 3553. If you want to certify your USA address data, you must use the USA Regulatory Address Cleanse transform, which supports CASS. Input fields There are recognized Data Services input fields that you can use in the Global Address Cleanse transform. Input fields may vary based on the engine(s) that you enable: • • • • • •. Australia (A) Canada ( C) Japan (J) EMEAU (E) Global Address (G) USA (U). See the fields listed on the Input tab in the transform editor to view each field's properties. Input fields are used by the engine to understand the data elements anticipated. For example, locality1 informs the engine what column contains the city or locality information. Options The Global Address Cleanse transform offers several option groups. Report and Analysis options The Generate Report Data option specifies whether to generate report data for this transform: • •. Yes: Generates report data for this transform. No: Turns off report data generation.. If you do not need to generate reports (during testing, for example), you should set this option to No to improve performance. Reference files Reference files are directories required by the Global Address Cleanse transform to process your data. Set the location for the individual countries based on whether the engine is activated in your job.. 60. © 2011 SAP AG. All rights reserved.. 2011.
(73) BODS30. Lesson: Using the Address Cleanse Transforms. Use this substitution variable if you have installed the directories in the default reference data directory of the install. The substitution variable for reference files is $$RefFilesAddressCleanse. Use this substitution variable if you have accepted the default installation directory when you installed Data Services. If you installed to a location that was not the default location, you can change the substitution variable dynamically. Country ID Options Specifies whether or not to use Country ID processing. This option group is required. Option. Description Specifies whether to always use the specified Country Name or to run Country ID processing. The options are available: •. • Country ID Mode. Constant: Assumes all of your input data is for the specified Country Name and does not run Country ID processing. Choose this option only if all of your data is from one country. This option may save processing time Assigned: Runs Country ID processing. Choose this option if the input data is from more than one country. For example, choose Country ID Mode if one of these scenarios applies: –. –. 2011. You use two or more of the engines and your input addresses contain country data. You use only one engine and your input source contains many addresses that cannot be processed by that engine. Any addresses that cannot be processed are not sent to. © 2011 SAP AG. All rights reserved.. 61.
Related documents
Working with the Mayor and Denver City Council to support several ordinances that better connect the homeless population with housing and services. What do you believe is the
technology becoming embedded in all facets of a business, perhaps the students at JCC should be given choices as to which path they take, application programming, mac
Organizing a tour by medium or style doesn't do the trick. There's too much and too li)le of everything. So how about a theme? Light. Basically, light is what art is made
- Initial planning for Multi -beam and Drop coring program is underway.. - Scoping underway for a 2D seismic program
This work shows again that high temperature hot gas emission spectroscopy is a very sensitive experimental method for the investigation of transitions between highly ex- cited
KALLIWODA | RESEARCH GmbH employees or other persons that were involved in the preparation of this document or related parties are possibly holders of instruments that are
We use Global Competitiveness data to measure degree of customer orientation of firms, level of buyer sophistication, market size, knowledge resources, ineffectiveness of
Although Gérard Genette mentioned the possibility of a collective narrator as a ver- sion of the narrator as witness only in a footnote of his Narrative discourse, narra- tive