
The Istat Methodology

Methodologies for Data Quality Measurement and Improvement

7.3 Comparative Analysis of General-purpose Methodologies

7.3.4 The Istat Methodology

The Istat methodology (see [74] and [73]) has been designed for the Italian public administration; specifically, it concerns the address data of citizens and businesses. Despite this narrow scope, it offers a rich spectrum of strategies and techniques that allow it to be adapted to many other domains. The principal reason for this is the complexity of the structure of the Italian public administration which, like many others, is typically characterized by at least three tiers of agencies:

1. central agencies, located close to each other, usually in the capital city of a country;

2. peripheral agencies, corresponding to organizational structures distributed throughout the territory and hierarchically dependent on central agencies;

3. local agencies, which are usually autonomous from central agencies and correspond to districts, regions, provinces, municipalities, and other smaller administrative units. Sometimes they are functionally specialized, e.g., hospitals.

The above is an example of the organizational structure of a public administration; it has many variants in different countries. Aspects common to many administrative, organizational, and technological models concern

• their complexity, in terms of the interrelations, processes, and services in which they are involved, due to the fragmentation of competencies among agencies. This frequently involves information flows exchanged between several agencies at the central and local levels;

• their autonomy, which makes it difficult to enforce common rules; and

• the high heterogeneity of the meanings and representations that characterize databases and data flows, and the considerable overlap among records and objects, which are usually heterogeneous.

Improving DQ in such a complex structure is usually a very large and costly project, whose activities may last several years. In order to solve the most relevant data quality issues, the Istat methodology focuses primarily on the type of data most commonly exchanged between agencies, namely, address data. Compared to the previously examined methodologies, this methodology is innovative in that it addresses all the coordinates introduced in Section 7.1.2, specifically, data-driven vs process-driven and intraorganizational vs interorganizational. A synthetic description of the Istat methodology is shown in Figure 7.12, which represents the three main phases together with the information flows between them.

[Figure: Phase 1, DQ assessment; Phase 2, internal DQ improvement (improvement activities on local databases); Phase 3, inter-administrative improvement (improvement activities on global databases, processes, and data flows)]

Fig. 7.12. General view of the Istat methodology

The assessment made in Phase 1 identifies the most relevant activities to be performed in the improvement process. These activities are:

1. Phase 2, activities on the databases that agencies own locally and are responsible for. Tools are distributed so that agencies can perform these activities autonomously, and courses are offered on DQ issues.

2. Phase 3, activities that concern the overall cooperative information systems of the administrations, in terms of exchanged data flows and of central databases set up for coordination purposes. These activities are centrally planned and coordinated.

1. Global assessment and improvement
1.1 Global assessment
• DQ requirements analysis: from a general process analysis, isolate the qualities relevant for address data (accuracy, completeness).
• Find critical areas, using statistical techniques: choose a national database; choose a representative sample; find critical areas; find potential causes of errors.
• Communicate the results of the assessment to single agencies.
1.2 Global improvement
• Design improvement solutions on data: perform record linkage between relevant national databases; establish a national data owner for specific fields.
• Design improvement solutions on processes: use the results of the global assessment to decide specific interventions on processes.
• Choose tools and techniques: make or buy, and adapt, tools for the most relevant DQ activities, to be delivered to agencies.

2. Internal DQ improvement (for each agency, autonomous initiative)
• Design improvement solutions on processes: standardize the acquisition format; standardize the internal exchange format using XML.
• Perform specific local assessments.
• Design improvement solutions on data and processes in critical areas: use the results of the global and local assessments to decide specific interventions on internal processes; use the results of the global assessment and the acquired tools to decide specific interventions on data, e.g., perform record linkage between internal databases.

3. DQ improvement of inter-administrative flows
• Standardize the format of inter-administrative flows using XML.
• Redesign exchange flows, using a publish and subscribe event-driven architecture.

Fig. 7.13. Detailed description of the Istat methodology
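Several steps in Figure 7.13 call for record linkage, both between relevant national databases and between an agency's internal databases. The following minimal Python sketch illustrates the idea on address strings, assuming a simple deterministic approach: each address is normalized (case, accents, punctuation, and a small alias table for bilingual street names, the kind of problem that makes areas such as Alto Adige critical), and each record of one database is paired with its most similar record in the other using token-set Jaccard similarity and a fixed threshold. The function names, the alias entry, the sample records, and the 0.8 threshold are illustrative assumptions; this is not the tooling distributed by Istat, and production record linkage usually adds blocking and probabilistic decision models.

```python
import re
import unicodedata
from typing import Dict, List, Tuple

# Hypothetical alias table mapping a German street name to its Italian form.
STREET_ALIASES: Dict[str, str] = {"museumstrasse": "via museo"}

def normalize(address: str) -> str:
    """Lowercase, drop accents and punctuation, and apply street-name aliases."""
    text = unicodedata.normalize("NFKD", address)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().replace("ß", "ss")
    text = re.sub(r"[^a-z0-9 ]", " ", text)   # punctuation becomes whitespace
    text = re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace
    for alias, canonical in STREET_ALIASES.items():
        text = text.replace(alias, canonical)
    return text

def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two normalized strings."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def link(db_a: Dict[str, str], db_b: Dict[str, str],
         threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Pair each db_a record with its most similar db_b record if it meets threshold.

    Exhaustive comparison (no blocking), so only suitable for small examples.
    """
    matches = []
    norm_b = {rid: normalize(addr) for rid, addr in db_b.items()}
    for id_a, addr_a in db_a.items():
        norm_a = normalize(addr_a)
        best_id, best_score = None, 0.0
        for id_b, addr_b in norm_b.items():
            score = jaccard(norm_a, addr_b)
            if score > best_score:
                best_id, best_score = id_b, score
        if best_id is not None and best_score >= threshold:
            matches.append((id_a, best_id, best_score))
    return matches

# Illustrative records from two hypothetical national databases.
registry = {"R1": "Via Museo 12, Bolzano", "R2": "Via Roma 3, Merano"}
tax_db = {"T7": "Museumstraße 12 Bolzano", "T9": "Via Roma 30, Merano"}

matches = link(registry, tax_db)
```

With these values, R1 and T7 are linked only because the alias table maps the German street name onto its Italian form. R2 and T9 share three of five tokens (similarity 0.6) and stay unlinked, which is the desired outcome here because the house numbers differ.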

A more detailed description of the methodology is shown in Figure 7.13; the innovative aspects concern

• the assessment phase, initially performed on central databases with the goal of detecting critical areas a priori. For example, in some regions, such as New Mexico in the US or Alto Adige in Italy, street names are bilingual or have different spellings in their original and official languages, which leads to errors. In these examples, the original languages are Spanish and German, respectively, and the official languages are English and Italian. New Mexico and Alto Adige are therefore potentially critical areas for the assessment phase;

• the application of a variety of simple but effective statistical techniques in quality measurement steps;

• the definition of data owners at a very detailed granularity level, corresponding to single attributes, such as MunicipalityCode and SocialSecurityNumber;

• the provision of tools and techniques for the most relevant cleaning activities, which are produced and distributed to single agencies to help them tailor these activities to specific territorial or functional issues;

• the standardization of address data formats and their expression in a common XML schema, implemented to minimize internal changes to agencies and to allow interoperability in flows between agencies;

• the redesign of exchanged data flows, using a publish and subscribe event-driven technological architecture, an example of which we will see in the case study at the end of the chapter.
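The statistical techniques used in the assessment phase can be as simple as sampling. As a sketch under stated assumptions, and not Istat's actual procedure, the Python code below computes the sample size needed to estimate an error proportion within a given margin, draws a simple random sample from a database, and reports the observed error rate with a normal-approximation (Wald) 95% confidence interval. The `zip_is_valid` check and the synthetic population are hypothetical stand-ins; in a real assessment, accuracy would be established against a reference source or by manual verification.

```python
import math
import random
from typing import Callable, Sequence, Tuple

def required_sample_size(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Records needed to estimate a proportion within +/- margin (normal approximation).

    p = 0.5 is the conservative (worst-case) guess for the unknown error rate.
    """
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

def estimate_error_rate(
    records: Sequence[dict],
    is_accurate: Callable[[dict], bool],
    n: int,
    seed: int = 0,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Sample n records without replacement; return (error rate, CI low, CI high)."""
    rng = random.Random(seed)              # fixed seed for a reproducible sample
    sample = rng.sample(list(records), n)  # simple random sample without replacement
    errors = sum(1 for r in sample if not is_accurate(r))
    p_hat = errors / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Hypothetical accuracy check: a ZIP code must be exactly five digits.
def zip_is_valid(record: dict) -> bool:
    return len(record["zip"]) == 5 and record["zip"].isdigit()

# Illustrative population: 1,000 records, one in ten with a truncated ZIP code.
population = [{"zip": "39100"} if i % 10 else {"zip": "391"} for i in range(1000)]

n = required_sample_size(0.05)  # 385 records for a +/-5% margin
rate, low, high = estimate_error_rate(population, zip_is_valid, n)
```

The Wald interval is used here only for brevity; it behaves poorly when the error rate is close to 0 or 1, where a Wilson interval is usually preferred.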
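The common XML schema for address data is not reproduced in this section, so the element names below (`Address`, `Street`, `Number`, `MunicipalityCode`, `ZipCode`) and the local field names are hypothetical. The Python sketch illustrates the stated design intent of minimizing internal changes: each agency keeps its internal record layout and writes only a small adapter into the shared format, which any other agency can parse back. It uses only the standard library `xml.etree.ElementTree` module and performs no schema validation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Address:
    """Hypothetical common representation of an address."""
    street: str
    number: str
    municipality_code: str
    zip_code: str

def from_local_record(rec: dict) -> Address:
    """Adapter for one agency's (hypothetical) internal layout with Italian field names."""
    return Address(street=rec["via"], number=rec["civico"],
                   municipality_code=rec["cod_comune"], zip_code=rec["cap"])

def to_xml(addr: Address) -> str:
    """Serialize an address into the shared (hypothetical) XML format."""
    root = ET.Element("Address")
    ET.SubElement(root, "Street").text = addr.street
    ET.SubElement(root, "Number").text = addr.number
    ET.SubElement(root, "MunicipalityCode").text = addr.municipality_code
    ET.SubElement(root, "ZipCode").text = addr.zip_code
    return ET.tostring(root, encoding="unicode")

def from_xml(xml_text: str) -> Address:
    """Parse the shared XML format back into an Address."""
    root = ET.fromstring(xml_text)
    return Address(street=root.findtext("Street", default=""),
                   number=root.findtext("Number", default=""),
                   municipality_code=root.findtext("MunicipalityCode", default=""),
                   zip_code=root.findtext("ZipCode", default=""))

# Illustrative local record; the municipality code is a placeholder value.
local = {"via": "Via Museo", "civico": "12", "cod_comune": "999999", "cap": "39100"}
xml_text = to_xml(from_local_record(local))
```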
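Attribute-level data ownership and the publish and subscribe redesign of exchange flows fit together naturally: only the agency that owns an attribute such as MunicipalityCode publishes changes to it, and every interested agency subscribes and updates its local copy. The in-process Python sketch below assumes that pairing; the agency names, the event fields, and the class names are hypothetical, and a real deployment would rely on messaging middleware rather than direct function calls.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[dict], None]

class AddressEventBus:
    """Minimal publish/subscribe bus; topics are attribute names."""

    def __init__(self, owners: Dict[str, str]) -> None:
        self.owners = owners  # attribute name -> owning agency (data owner)
        self.subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, attribute: str, handler: Handler) -> None:
        self.subscribers[attribute].append(handler)

    def publish(self, agency: str, record_id: str, attribute: str, value: str) -> None:
        # Only the data owner of an attribute may publish changes to it.
        if self.owners.get(attribute) != agency:
            raise PermissionError(f"{agency} is not the data owner of {attribute}")
        event = {"record_id": record_id, "attribute": attribute,
                 "value": value, "publisher": agency}
        for handler in self.subscribers[attribute]:
            handler(event)

class LocalDatabase:
    """A subscribing agency's local copy of the attributes it cares about."""

    def __init__(self) -> None:
        self.records: Dict[str, Dict[str, str]] = {}

    def on_update(self, event: dict) -> None:
        self.records.setdefault(event["record_id"], {})[event["attribute"]] = event["value"]

# Hypothetical ownership: AgencyA is the data owner of MunicipalityCode.
bus = AddressEventBus({"MunicipalityCode": "AgencyA"})
agency_c_db = LocalDatabase()
bus.subscribe("MunicipalityCode", agency_c_db.on_update)

bus.publish("AgencyA", "citizen-42", "MunicipalityCode", "999999")

# A non-owner's attempt to publish the same attribute is rejected.
try:
    bus.publish("AgencyB", "citizen-42", "MunicipalityCode", "000000")
    rejected = False
except PermissionError:
    rejected = True
```

Routing events by attribute name mirrors the fine granularity of data ownership: an agency subscribes only to the attributes it needs, and the ownership check keeps a single authoritative source per attribute.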