Data management
2.1 Introduction
2.48 Statistical Data and Metadata eXchange (SDMX)
The core tasks of a National Statistical Organisation (NSO) are to collect, process and organise statistical data, and subsequently to put them at the disposal of various communities of users, a step often termed the dissemination of statistics. NSOs are also obliged to make the necessary strategic decisions on what should be measured and how, and to manage and document the statistical system.
country, even within the same national organisation. This is often related to the statistics production being organised in so-called stove-pipes, or independent production lines. This makes it difficult to use statistics for different subjects in a coherent way, thus impairing the quality of statistics as seen from the user perspective. It also reduces efficiency in the production process.
To overcome these problems, there has been a strong tendency in NSOs towards standardisation and integration, breaking down stove-pipes. This leads to the creation of corporate statistical data warehouses, bringing together statistics on different subjects under one system. In this endeavour, the creation of statistical metadata plays an important part. The changes required towards such integrated systems are not only technical, but also organisational.
The Statistical Data and Metadata eXchange (SDMX) initiative was launched in 2001 by seven organisations working on statistics at the international level: the Bank for International Settlements (BIS), the European Central Bank (ECB), Eurostat, the International Monetary Fund (IMF), the Organisation for Economic Co-operation and Development (OECD), the United Nations Statistics Division (UNSD) and the World Bank. These seven organisations act as the sponsors of SDMX. The stated aim of SDMX was to develop and use more efficient processes for the exchange and sharing of statistical data and metadata among international organisations and their member countries.
To achieve this goal, SDMX provides standard formats for data and metadata, together with content guidelines and an IT architecture for exchange of data and metadata. Organisations are free to make use of whichever elements of SDMX are most appropriate in a given case. With the Internet and the world-wide web, the electronic exchange and sharing of data has become easier and more common, but the exchange has often taken place in an ad hoc manner using all kinds of formats and non-standard concepts. This creates the need for common standards and guidelines to enable more efficient processes for exchange and sharing of statistical data and metadata. As statistical data exchange takes place continuously, the gains to be realised from adopting common approaches are considerable both for data providers and data users.
SDMX aims to ensure that metadata always come along with the data, making the information immediately understandable and useful. For this reason, the SDMX standards and guidelines deal with both data and metadata. Common standards and guidelines followed by all players not only help to give easy access to statistical data, wherever these data may be and without demanding prior agreement between two partners, but they also facilitate access to metadata that make the data more comparable, more meaningful and generally more usable.
The SDMX standards are designed for exchange or sharing of statistical information between two or more partners. Evidently, the SDMX standards have been developed by the sponsoring organisations in order to accommodate their constituencies, which include national statistical offices, central banks, ministries and other bodies. Within and across these constituencies, the standards are intended for reporting or sharing statistical data and metadata in the most efficient way.
SDMX standards can also be used within a national system for transmitting or sharing statistical data and metadata. This is particularly interesting in countries with a federal structure or a fairly decentralised statistical system. In such cases, a close link can be established between the national system for data sharing and the international ones, allowing for additional efficiency gains for the involved organisations. The use of SDMX for data exchange can easily evolve towards open SDMX-based dissemination; such dissemination may respond well to user demands for well structured data and metadata in reusable formats, and should be considered as an option for national authorities as well as international organisations. It is also an interesting option for private data providers, such as re-sellers of statistical databases.
SDMX can also be used for data and metadata management within statistical organisations, since its information model is applicable for much of the information stored and processed within statistical organisations, and such organisations can make use of the SDMX IT tools to reduce the costs of developing their data management systems.
The data (and related metadata) for a particular statistical domain are structured according to a "Data Structure Definition" (DSD, formerly known as a "key family"). The DSD describes the structure of a particular statistical data flow through a list of dimensions (for example: country, variable/topic, year), a list of "attributes" (for example, unit of measure) and their associated code lists. Attributes are metadata about an individual value, a time series or a group of time series.
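As an illustration of the DSD idea described above, the following minimal Python sketch models dimensions, attributes and code lists, and checks whether one observation fits the structure. It is not the SDMX-ML format and does not use any SDMX library; every class name, identifier and code value (`CodeList`, `Component`, `DataStructureDefinition`, `DSD_DEMO`, `CL_COUNTRY` and so on) is a hypothetical chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class CodeList:
    """A named set of permitted codes with human-readable labels."""
    id: str
    codes: Dict[str, str]  # code -> label


@dataclass
class Component:
    """A dimension or an attribute, optionally restricted to a code list."""
    id: str
    code_list: Optional[CodeList] = None  # None: free-form value, e.g. a year


@dataclass
class DataStructureDefinition:
    """Hypothetical, simplified stand-in for an SDMX DSD."""
    id: str
    dimensions: Tuple[Component, ...]
    attributes: Tuple[Component, ...] = ()

    def validate(self, key: Dict[str, str], attrs: Dict[str, str]) -> List[str]:
        """Return a list of problems; an empty list means the observation fits."""
        problems = []
        # Every dimension must be present to identify an observation.
        for dim in self.dimensions:
            if dim.id not in key:
                problems.append(f"missing dimension: {dim.id}")
        # Coded components may only take values from their code list.
        for components, values in ((self.dimensions, key), (self.attributes, attrs)):
            for comp in components:
                value = values.get(comp.id)
                if value is None or comp.code_list is None:
                    continue
                if value not in comp.code_list.codes:
                    problems.append(f"invalid code for {comp.id}: {value}")
        return problems


# Illustrative code lists and structure (all values are made up).
country = CodeList("CL_COUNTRY", {"IN": "India", "FR": "France"})
topic = CodeList("CL_TOPIC", {"POP": "Population", "GDP": "Gross domestic product"})
unit = CodeList("CL_UNIT", {"PERS": "Persons", "USD": "US dollars"})

dsd = DataStructureDefinition(
    id="DSD_DEMO",
    dimensions=(Component("COUNTRY", country), Component("TOPIC", topic), Component("YEAR")),
    attributes=(Component("UNIT_MEASURE", unit),),
)

ok = dsd.validate({"COUNTRY": "IN", "TOPIC": "POP", "YEAR": "2011"}, {"UNIT_MEASURE": "PERS"})
bad = dsd.validate({"COUNTRY": "XX", "TOPIC": "POP"}, {})
```

In this sketch `ok` is an empty list, while `bad` reports the missing `YEAR` dimension and the unknown country code. A real DSD also distinguishes the level at which each attribute attaches (value, time series or group), which is omitted here for brevity.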
SDMX also defines a model for additional explanatory metadata, which are often referred to in SDMX as reference metadata. Reference metadata are generally in a textual format, using concepts describing the content, methodology and quality of the data. The reference metadata for a particular statistical data flow, statistical domain or (if used homogeneously throughout a statistical institution) a particular statistical institution are structured according to a "Metadata Structure Definition" (MSD).
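The relationship between an MSD and a reference metadata report can be sketched in the same spirit. The Python fragment below is only a simplified illustration under stated assumptions: the class names, the three concept identifiers (`CONTENT`, `METHODOLOGY`, `QUALITY`) and the target types are hypothetical, and the actual SDMX metadata model is considerably richer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MetadataReport:
    """Textual reference metadata attached to one target."""
    target_type: str       # e.g. "dataflow", "domain" or "institution"
    target_id: str
    texts: Dict[str, str]  # concept id -> explanatory text


@dataclass
class MetadataStructureDefinition:
    """Hypothetical, simplified stand-in for an SDMX MSD."""
    id: str
    target_types: Tuple[str, ...]  # kinds of object a report may describe
    concepts: Tuple[str, ...]      # concepts each report is expected to cover

    def check(self, report: MetadataReport) -> List[str]:
        """Return a list of problems; an empty list means the report conforms."""
        problems = []
        if report.target_type not in self.target_types:
            problems.append(f"unsupported target type: {report.target_type}")
        for concept in self.concepts:
            # A concept counts as covered only if it has non-blank text.
            if not report.texts.get(concept, "").strip():
                problems.append(f"missing text for concept: {concept}")
        return problems


msd = MetadataStructureDefinition(
    id="MSD_DEMO",
    target_types=("dataflow", "domain", "institution"),
    concepts=("CONTENT", "METHODOLOGY", "QUALITY"),
)

report = MetadataReport(
    target_type="dataflow",
    target_id="DF_POPULATION",
    texts={
        "CONTENT": "Mid-year population estimates by country.",
        "METHODOLOGY": "Estimates based on census counts and vital registration.",
    },
)

issues = msd.check(report)
```

Here `issues` contains a single entry, because the report has no text for the `QUALITY` concept.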
The data exchange process is represented in SDMX via the definition of "data flows", "data providers", and, in particular "provision agreements". The latter describes the way in which data and metadata are provided by a data provider. Thus, a data provider can express the fact that it provides a particular data flow covering a specific set of countries and topics, with a particular publication schedule. India should be part of this programme.
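The data flow, data provider and provision agreement concepts described above can be sketched as a small data model. This is a minimal illustration, not the SDMX registry interface: the class names, the `schedule` field, the `covers` helper and all identifiers (`NSO_DEMO`, `DF_POPULATION`, `DSD_DEMO`) are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Dataflow:
    """A stream of data described by one data structure definition."""
    id: str
    dsd_id: str


@dataclass(frozen=True)
class DataProvider:
    """An organisation that supplies data."""
    id: str
    name: str


@dataclass(frozen=True)
class ProvisionAgreement:
    """Hypothetical record of what a provider supplies for a data flow."""
    provider: DataProvider
    dataflow: Dataflow
    countries: FrozenSet[str]
    topics: FrozenSet[str]
    schedule: str  # free-text publication schedule, e.g. "annual"

    def covers(self, country: str, topic: str) -> bool:
        """True if the agreement includes this country and topic combination."""
        return country in self.countries and topic in self.topics


provider = DataProvider("NSO_DEMO", "Example national statistical office")
flow = Dataflow("DF_POPULATION", dsd_id="DSD_DEMO")

agreement = ProvisionAgreement(
    provider=provider,
    dataflow=flow,
    countries=frozenset({"IN"}),
    topics=frozenset({"POP"}),
    schedule="annual",
)

in_scope = agreement.covers("IN", "POP")
out_of_scope = agreement.covers("FR", "POP")
```

In this sketch `in_scope` is `True` and `out_of_scope` is `False`, so a data collector could use such a record to decide which provider to query for a given country and topic.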