
Integrated Data Collection System on business surveys in Statistics Portugal


Paulo SARAIVA DOS SANTOS

Director, Data Collection Department, Statistics Portugal

Carlos VALENTE

IS advisor, Data Collection Department, Statistics Portugal

Statistics Portugal started in 2004 an integrated and process driven approach aiming at re-engineering the production architecture, improving its efficiency and flexibility. A data collection department was created, and the need of increased integration, standardization and homogeneous processes and tools was considered essential for a new way of statistical production. A multidisciplinary working group was established, resulting in an Integrated Survey Management System, which covered firstly the business, and later the social surveys. Intrastat was the pioneer in 2008, offering relevance and critical mass for the project. In 2013 all of our surveys will be supported by this system. The benefits and lessons learned are substantial, and there are opportunities for further developments.

Key words: Business surveys; data collection; official statistics production; process integration; respondent management; statistical production processes; survey management.

1. Introduction

Integrating information systems is one major concern among companies and institutions, being widely accepted as a critical success factor of any organization.

After an extensive internal discussion prior to 2005, Statistics Portugal has decided to re-engineer its production architecture from a traditional stovepipe approach to an integrated one. Business statistics were the first area to benefit from this new system, followed by social statistics.

This paper offers an historical view of this journey and lists the results achieved. It starts with a brief presentation of the context, explaining the drivers of these enormous changes. Afterwards, part 3 presents the Integrated Survey Management System, especially its components and features. Part 4 presents the benefits achieved, sharing some lessons learned. Finally, part 5 presents a view of the next envisaged developments.

2. Background and context

Statistics Portugal is the Portuguese central authority for the production of statistics. Its main task is to develop and supervise the national statistical system.

Survey data collection is a core function of Statistics Portugal, consuming around 40% of its annual budget and 30% of its human resources. A Data Collection department mainly assures the operation of the statistical production phases of collection, processing and analysis of collected microdata, covering all business and social surveys. Data collection staff are spread all over the country (mainland and islands), especially in Lisbon, Oporto, Coimbra, Évora, and Faro, but work under a centralized system. The Autonomous Regions of Madeira and Azores have their own authorities for the production of region-specific statistics, which also act as the data collection centres for those areas for Statistics Portugal, under common technical requirements and infrastructure.

Business surveys involve around 125,000 different companies, 99% of which are considered small and medium enterprises. In 2011, 66 business surveys were carried out, and 640,000 self-completed questionnaires were collected.

Like many other National Statistical Institutes, Statistics Portugal produced statistics through a non-integrated organizational architecture until 2005, based on numerous parallel processes, domain by domain, according to a traditional stovepipe approach. This way of producing statistics was then considered inefficient and not flexible.

After a reflection and a reorganization process in 2004, a project to re-engineer the production architecture was undertaken, based on an integrated and process driven approach aiming at improving its efficiency and flexibility. Consequently, a central data collection department was created, regional directorates were abolished, and domain production departments were merged into three units: economics, social and national accounts. Methods and information systems were merged into one department. The current production architecture (simplified) is shown in Figure 1.

It was a remarkable challenge, considering the new distribution of resources, roles and responsibilities. The transition was carried out in a way that the statistical operations were not substantially affected, in spite of some resistances and other constraints.

With the support of the board of directors, a multidisciplinary working group was established, composed of representatives from information systems, methodology, statistical domain specialists and data collection. The first objective was a very pragmatic one: to improve the electronic data collection for business surveys [1].

In July 2005 the Internet service WebInq was launched, offering businesses an easy and secure alternative for providing survey data through electronic self-completed questionnaires. After six years of WebInq, more than 85% of the business questionnaires are collected electronically.

A new approach to statistical production was considered essential, focusing on broad integration and on the standardization of processes and tools. Thus, the mandate and composition of the multidisciplinary working group were broadened accordingly, with the aim of designing an integrated architecture to support the production.

An internal reference model, the “Statistical Production Process Manual” (SPPM) [2], was used to describe the statistical business processes. It gave the group a basis to agree on standard terminology, aiding the discussions on developing the new integrated system. SPPM is still an internal tool to describe and define the set of statistics production processes.

Recently, the authors compared SPPM with the Generic Statistical Business Process Model (GSBPM) [3] and mapped both phases and sub-processes, which was possible at these levels of description. Consequently, GSBPM will be the reference model to describe phases and sub-processes in this paper, requiring basic knowledge of its terms and definitions.

This effort resulted in an Integrated Survey Management System (SIGINQ), which covered firstly the business surveys, and later on the social surveys.

Choosing an appropriate survey candidate to be the pioneer was a much discussed topic. Some defended the benefits of choosing a low impact survey, adopting a step-by-step approach. Others argued that a complex survey should be chosen, in order to cover an extensive group of common features to be used by other surveys, which was the option adopted. Intrastat was the pioneer in 2008, offering relevance and critical mass for the project. SIGINQ and its components were designed and built to support this survey. Meanwhile, as the backbone of SIGINQ was ready and implemented for Intrastat, other surveys were progressively integrated.

Figure 1: Production architecture of Statistics Portugal

From 2008 until now, the number of features supported by SIGINQ has grown continuously. Four years after its first version, SIGINQ covers 40 surveys, representing more than 74% of the collected business questionnaires. In 2013 all of our surveys will be supported by this system.

Concerning international recommendations, one reference deserves particular mention: the Communication from the Commission to the European Parliament and the Council on the production method of EU statistics: a vision for the next decade [4], which offers a perspective for reforming the production method of European statistics. The authors recall the following statement from the document: “At the level of the National Statistical Institutes, statistics for specific domains are then no longer produced independently from each other; instead they are produced as integrated parts of comprehensive production systems (the so-called data warehouse approach) for clusters of statistics. These systems would be based on a common (technical) infrastructure, (…)”

The authors consider that the efforts of Statistics Portugal regarding its production architecture are compliant with this vision statement. As a clear commitment to the integration approach, Statistics Portugal has the following explicit objective: “Modernize the integrated process of the production with innovative approaches, simplifications and good practices, strengthening the quality of official statistics and optimizing the available and limited resources”.

3. Integrated Survey Management System

The SIGINQ aims at offering an integrated infrastructure to better support the statistical production and development in an efficient way, covering all the statistical operations (business and social) [5]. It unifies the main components into a comprehensive and interdependent system based on the architecture illustrated in Figure 2.

Figure 2: Integrated Survey Management System architecture (Level 1)

The system follows the basic production sub-processes collect, process, analyse and disseminate. Statistical units registers and metainformation support the flow of the processes. A contact centre system offers the infrastructure for telephone interviews (for social surveys) and the support to data providers. The next points describe these subsystems, especially the processes and systems that support the production of business statistics.

3.1 Statistical Units System

This system manages the registers and the surveys’ populations and samples, according to the type of unit. It updates the business statistical units register (FUE), using information from external sources and from the surveys. Furthermore, it offers the reference population to all surveys, keeping the history of the changes in order to provide comparability between series.

3.1.1 Statistical Unit Registers System

The registers aggregate all statistical units and provide the basis for the selection of the populations and samples for all surveys. Statistics Portugal has the following main registers:

• Business register (FUE), covering companies, institutions, local units, vehicles and periodicals;

• Household base: a register composed of a subset of the national household addresses, built from Census information. It is the base from which samples for social surveys are extracted;

• Agriculture base: a register composed of all farms. It is the base from which sampling frames and samples for agriculture surveys are extracted.

These registers include variables of identification, localization, and characterization of their statistical units. The registers are updated to assure the quality of the samples, and the surveys themselves are one of the most important sources for this purpose.

3.1.2 Population and Samples System (SIGUA)

This component aims at creating and maintaining a repository to store all of the reference populations and the samples selected for the surveys. This is a key element of the Collection System.

Online processes to update the repository, managed centrally, are available. An internal collection agent can submit proposals to update some information about the statistical units. These proposals are analysed centrally by the methods team, and the information is updated if considered reliable. Finally, the proponent receives feedback on the proposal, i.e., whether it was accepted or rejected, and why.
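The update cycle described above (proposal, central review, feedback to the proponent) can be illustrated with a minimal Python sketch. It is not the SIGUA implementation: the class names, the fields and the boolean `reliable` decision are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Proposal:
    """A change suggested by a collection agent (hypothetical structure)."""
    unit_id: str
    field_name: str
    new_value: str
    proposer: str
    status: Status = Status.PENDING
    reason: str = ""


@dataclass
class UnitRepository:
    """Central repository: unit_id -> {field name: value}."""
    units: dict = field(default_factory=dict)

    def review(self, proposal: Proposal, reliable: bool, reason: str) -> Proposal:
        """Central review: apply the change only if it is judged reliable.

        The returned proposal carries the decision and its justification,
        which is the feedback sent back to the proponent.
        """
        if reliable:
            unit = self.units.setdefault(proposal.unit_id, {})
            unit[proposal.field_name] = proposal.new_value
            proposal.status = Status.ACCEPTED
        else:
            proposal.status = Status.REJECTED
        proposal.reason = reason
        return proposal


# Usage with made-up values
repo = UnitRepository({"U001": {"address": "Rua A, Lisboa"}})
accepted = repo.review(
    Proposal("U001", "address", "Rua B, Porto", "agent-7"),
    reliable=True, reason="confirmed by an external source")
rejected = repo.review(
    Proposal("U001", "activity_code", "99999", "agent-7"),
    reliable=False, reason="code not found in the classification")
```

Keeping the decision and its reason on the proposal itself means the feedback step needs no separate record.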

3.2 Collection System

The Collection System aims at feeding the Statistical Production Chain with microdata, and it is composed of three components: (1) Process Management; (2) Questionnaires and Capture; and (3) Respondent Management.

The diagram in Figure 3 shows these three components and their relationships.

3.2.1 Process Management

This component is responsible for the management and control of all data collection processes, including information about respondents and paradata. These processes are fully supported by the Metadata System. Process Management is subdivided into three other components: (1) GPap (self-completed surveys), for business surveys; (2) GPie (interview surveys), for social surveys; and (3) SAGR, for agriculture surveys. These three components have similar features and functions, each adapted to its kind of statistical unit.

For business surveys, Process Management is a core component: it is linked with Questionnaires and Capture (WebInq and WebReg), Respondent Management (GRESP) and the Business Register (FUE), and it transfers validated microdata to the Data Warehouse.

Process Management supports the following GSBPM phases and sub-processes: (4.) Collect: (4.2) Set up collection; (4.3) Run collection; (4.4) Finalise collection; (5.) Process: (5.1) Integrate data; (5.2) Classify & Code; (5.3) Review, validate and edit; (5.8) Finalize data files.

For instance, in the “Run collection” sub-process, “the collection itself is implemented, with the different collection instruments being used to collect the data. It includes the initial contact with providers and any subsequent follow-up or reminder actions.”[3]

Another important sub-process is “Review, validate and edit”, which “applies to collected microdata, and looks at each record to try to identify (and where necessary correct) potential problems, errors and discrepancies such as outliers, item non-response and miscoding. It is also referred to as input data validation. It is run iteratively, validating data against predefined edit rules, usually in a set order. It raises alerts for manual inspection and correction of the data” [3].

There is also a module in this sub-process that identifies suspicious records based on a score method, a selective approach that considers the potential impact on the results.

GPap provides common processes and functions to all surveys, as well as survey specific processes. Figure 4 shows one example of the set of GPap modules, illustrating the integration of common modules with survey specific ones. In fact, the diagram represents the list of options for one business survey.

Figure 3: Components of the Statistical Units System

Figure 4: GPap modules (example)
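The paper does not give the formula behind the score method used in the “Review, validate and edit” sub-process. As an illustration only, the Python sketch below uses a common selective-editing score: the weighted absolute difference between the reported and an expected value, relative to the domain total. The score function, the field names and the threshold are assumptions, not the GPap implementation.

```python
from dataclasses import dataclass


@dataclass
class Record:
    unit_id: str
    reported: float       # value received from the respondent
    expected: float       # anticipated value, e.g. previous period (assumption)
    weight: float = 1.0   # sampling weight (assumption)


def score(record: Record, domain_total: float) -> float:
    """Potential impact of a record on the estimated domain total."""
    if domain_total <= 0:
        raise ValueError("domain_total must be positive")
    return record.weight * abs(record.reported - record.expected) / domain_total


def flag_suspicious(records, domain_total: float, threshold: float):
    """Return the ids of records scoring above the threshold, largest impact first."""
    scored = [(score(r, domain_total), r.unit_id) for r in records]
    return [unit_id for s, unit_id in sorted(scored, reverse=True) if s > threshold]


# Usage with made-up values: only B and C are sent for manual review
records = [
    Record("A", reported=1000.0, expected=980.0),
    Record("B", reported=50000.0, expected=5000.0, weight=2.0),
    Record("C", reported=300.0, expected=100.0, weight=10.0),
]
to_review = flag_suspicious(records, domain_total=100000.0, threshold=0.01)
```

Ranking by potential impact lets editing staff concentrate on the few records that could noticeably change the published results, leaving low-impact records to automatic treatment.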

3.2.2 Questionnaires and Data Capture

This system offers the data collection supports and IT solutions, each of them dedicated to a particular mode of collection: electronic, paper, telephone, face-to-face interview, etc. For business surveys, questionnaires and data capture have two components: WebInq and WebReg.

WebInq: Surveys in the Web

WebInq is the area of the Statistics Portugal Portal where data providers can respond to survey questionnaires via Internet webforms. Respondents can find information about the surveys and further information about other surveys where the statistical unit is included, the status of response, and also extensive information about the respondent and its response behaviour.

WebInq provides a very flexible solution for multiple relationships between the statistical units, the surveys and the respondents. One respondent can be involved in one or more surveys and can represent several statistical units. The same principle of multiple relationships applies to the statistical units and, of course, to the surveys.
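One simple way to picture these multiple relationships is a set of (respondent, statistical unit, survey) links that can be queried from any side. The Python sketch below is a minimal illustration under that assumption; the class, the method names and all identifiers except Intrastat are hypothetical and do not describe the WebInq data model.

```python
from typing import NamedTuple


class Link(NamedTuple):
    respondent: str
    unit: str
    survey: str


class ResponseLinks:
    """Many-to-many links between respondents, statistical units and surveys."""

    def __init__(self):
        self._links = set()

    def link(self, respondent: str, unit: str, survey: str) -> None:
        self._links.add(Link(respondent, unit, survey))

    def surveys_for_respondent(self, respondent: str):
        return sorted({l.survey for l in self._links if l.respondent == respondent})

    def units_for_respondent(self, respondent: str):
        return sorted({l.unit for l in self._links if l.respondent == respondent})

    def respondents_for_unit(self, unit: str):
        return sorted({l.respondent for l in self._links if l.unit == unit})


# Usage with made-up identifiers
links = ResponseLinks()
links.link("accountant-1", "company-A", "Intrastat")
links.link("accountant-1", "company-B", "Intrastat")
links.link("accountant-1", "company-A", "SURVEY_X")
links.link("manager-2", "company-A", "SURVEY_Y")
```

Storing each relationship as one triple avoids fixing a single "owner" for a questionnaire: the same respondent can answer for several units, and the same unit can have several respondents.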

WebReg is a clone of WebInq that supports the internal data entry of the remaining paper questionnaires and also serves as a data editing tool.

WebInq and WebReg provide another key feature of SIGINQ: the Data Version Stack. The system stores and documents each transformation of the data. It allows a complete audit of the records of these transformations, from the beginning of the capture (first version), through each edit, to the final version, when the collection is closed and the data is transferred to the Data Warehouse. This concept is illustrated in Figure 5.
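The idea of a version stack can be sketched as an append-only list of immutable snapshots, one per transformation. The Python example below is an illustration under assumed names: the stage labels, the fields and the closing behaviour are not taken from the SIGINQ implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Version:
    number: int
    stage: str        # "capture", "edit" or "final" (labels are assumptions)
    data: dict
    author: str
    timestamp: datetime


@dataclass
class VersionStack:
    versions: list = field(default_factory=list)
    closed: bool = False

    def push(self, stage: str, data: dict, author: str) -> Version:
        """Store a new snapshot; earlier versions are never modified."""
        if self.closed:
            raise RuntimeError("collection is closed; no further versions allowed")
        version = Version(len(self.versions) + 1, stage, dict(data), author,
                          datetime.now(timezone.utc))
        self.versions.append(version)
        return version

    def current(self) -> Version:
        return self.versions[-1]

    def close(self, author: str) -> Version:
        """Freeze the latest data as the final version and close the stack."""
        final = self.push("final", self.current().data, author)
        self.closed = True
        return final

    def audit_trail(self):
        return [(v.number, v.stage, v.author) for v in self.versions]


# Usage with made-up values
stack = VersionStack()
stack.push("capture", {"turnover": 1200}, "respondent")
stack.push("edit", {"turnover": 12000}, "analyst")
stack.close("system")
```

Because every edit adds a version instead of overwriting the record, the first captured values remain available for audit after the final version is transferred.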

3.2.3 Respondent Management

This component aims to maximize the relationship with the data provider and the respondent. This is achieved through a repository of all respondents, including information about their identification, localization, contacts, relationships and collection behaviour (history of the collection activity, quality of the data provided, response timing, etc.). This tool is very important when the processes are repeated regularly. Information is shared, namely with the provider support and mainly during the “Set up collection” sub-process (preparing the collection strategy and instruments).

Figure 5: The Data Version Stack

3.3 Process and Analyse System

This system handles the cleaning of data records in preparation for data analysis, the stage where statistics are produced, examined in detail and made ready for dissemination. As the "Process" and "Analyse" phases are iterative and parallel, the system assures the principle of “interdependence but no interference” between both phases, which is represented in Figure 6. The Process and Analyse System supports the following GSBPM phases and sub-processes: (5.) Process: (5.4) Impute; (5.5) Derive new variables and statistical units; (5.6) Calculate weights; (5.7) Calculate aggregates; (5.8) Finalize data files; (6.) Analyse: (6.1) Prepare data outputs; (6.2) Validate outputs; (6.3) Scrutinize and explain; (6.4) Apply disclosure control; (6.5) Finalize outputs.

3.4 Metainformation System

This system is composed of several components, such as: Terminology and concepts; Statistical Classifications; Repository of questionnaires; Methods Documents; Variables and Questions.

Metainformation is closely interrelated with Collection, Process and Analyse, and Dissemination systems.

3.5 Contact Centre System (SICC)

As mentioned before, this system offers the infrastructure for telephone interviews (household surveys), telephone reminders and the support to data providers. It also facilitates the access to context information about data providers, respondents and surveys. Since March 2012, SICC has supported another component of the Questionnaires and Capture System: Telephone Data Entry (TDE), a solution by which respondents can return their data using the keypad on their telephone.

Mapping SIGINQ to GSBPM

Having described the systems and their components, we can summarize the mapping between SIGINQ and the GSBPM: the system mainly assures the “Collect”, “Process”, and “Analyse” phases. In spite of that, we also consider that SIGINQ has some influence on all phases, except “Specify needs” and “Evaluate”.
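The sub-process lists given in sections 3.2.1 and 3.3 can be written down as a small lookup table, which makes the coverage easy to query. The Python sketch below only restates those two lists; the dictionary layout and the function names are assumptions for illustration.

```python
# GSBPM sub-processes supported by each component, as listed in 3.2.1 and 3.3
COVERAGE = {
    "Process Management": ["4.2", "4.3", "4.4", "5.1", "5.2", "5.3", "5.8"],
    "Process and Analyse System": ["5.4", "5.5", "5.6", "5.7", "5.8",
                                   "6.1", "6.2", "6.3", "6.4", "6.5"],
}

# GSBPM phase numbers and names used in this paper
PHASES = {"4": "Collect", "5": "Process", "6": "Analyse"}


def phases_covered(coverage):
    """Names of the GSBPM phases touched by at least one component, in phase order."""
    numbers = {code.split(".")[0] for codes in coverage.values() for code in codes}
    return [PHASES[n] for n in sorted(numbers)]


def components_for(sub_process, coverage=COVERAGE):
    """Components that support a given sub-process code, e.g. "5.8"."""
    return sorted(name for name, codes in coverage.items() if sub_process in codes)


covered = phases_covered(COVERAGE)
shared = components_for("5.8")
```

In this table, (5.8) Finalize data files is the only sub-process listed for both components, so it is the point where the two systems meet.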

Figure 6: Parallel Processing

SIGINQ building blocks

As conclusion of this description, Figure 7 shows the building blocks of a complete representation of SIGINQ. We can note that the system is specialised in three domains: (1) Business Surveys; (2) Social Surveys; and (3) Agriculture Surveys.

4. Benefits of an Integrated Approach

From our experience we can highlight the following perceived benefits of this integrated approach:

• Creation of harmonized procedures and tools;

• Reduction of the application development cycle, using common components, avoiding duplication and simplifying the specifications;

• Assistance to develop a steady culture based on efficiency and innovation, considering the full in-house design and development approach;

• Reduction of the data collection cycle, especially the time to deliver statistical results;

• Optimization of the allocation of human resources, benefiting the value-added quality activities;

• Improvement of the quality of the information due to the availability of more tools concerning reviewing and validation;

• Reduction of respondent burden avoiding the duplication of the collected variables and offering easy and multiple ways to provide data;

• Reduction of production costs. We estimate that Statistics Portugal has reduced the total costs of business surveys by 23.3% (from 2006 to 2011; constant prices).

5. Lessons learned from using an integrated survey management system

As mentioned before, it has been a long journey to achieve the current stage of development. There were many successes, but also some drawbacks. Let’s list some of the lessons learned:

Figure 7: SIGINQ building blocks

• A full support from the top management is needed from the very beginning of the project;

• Establishing multi-domain working groups with a single leader is essential;

• Starting with mapping processes using a well-known model is better than using an internal one. It would have been better if the GSBPM had been available at the beginning of our project;

• It is difficult and takes time to reach consensus internally about the expected benefits;

• The main issues to deal with are not the technical ones. Organizational and change management are the areas where key barriers and bottlenecks are found;

• Benefits from the approach materialize better if it is adopted at the same time as the change of the organizational production architecture into a process-driven one.

6. The way forward

Statistics Portugal remains seriously engaged with the project and further developments are planned for the period 2013 – 2017:

• Complete the full integration of surveys within SIGINQ;

• Extend the coverage of WebInq to household statistics, providing CAWI surveys;

• Prepare the extension of the automatic data transfer to structural business surveys, as an alternative to self-completed questionnaires;

• Develop SIGINQ new features, such as the improvement of the analysis of paradata, and extension of score model to more surveys;

• Evaluate the adaptations needed to extend the use of the SIGINQ to other statistical authorities.

7. References

[1] Statistics Portugal (2004), Electronic Data Collection, Workgroup Report, internal document.

[2] Statistics Portugal (2010), Statistical Production Processes Manual, internal document.

[3] The Joint UNECE / Eurostat / OECD Work Session on Statistical Metadata (METIS) (2009), Generic Statistical Business Process Model (GSBPM), http://www.unece.org/stats/gsbpm.html .

[4] European Union (2009), Communication from the Commission to the European Parliament and the Council on the production method of EU statistics: a vision for the next decade,

[5] Statistics Portugal (2005), Statistical Production System: Architecture, Workgroup Report, internal document.
