African Commission on Agricultural Statistics Twenty-second Session
Addis Ababa, Ethiopia 30 Nov – 3 Dec, 2011
Use of the IHSN Microdata Management Toolkit to
Document Agricultural Census Data
Alemayehu Gebretsadik CSA
Introduction
Data collection, Documentation, Capturing, Archiving and Dissemination in the Good Old Days
The Need for Improvement
Why IHSN Microdata Management Toolkit?
IT Data Management Framework
Achievements through the Improvement Process
Documentation
Dissemination
Infrastructure
Achievements Summary
• The value of statistical data for better decision making is now recognized more widely than ever.
• Statistical offices across the world produce statistical data to meet the ever-growing demand for timely, reliable and well-documented statistical information.
• The CSA conducts about 11 different types of surveys per annum and generates data through censuses, usually undertaken at ten-year intervals.
• Many advances have been made in the quality and usability of statistical data through improved data documentation, archiving and dissemination systems that use Information Communication Technology (ICT).
• The CSA has been using ICT to facilitate its data collection, processing, archiving and dissemination system so that the required statistical information can be generated and reach users for better decision making.
• Addressing the importance of documentation, archiving and dissemination in the Ethiopian National Strategy for the Development of Statistics (NSDS) can be considered a critical step in guaranteeing the sustainability of these activities.
Data collection, Documentation, Capturing, Archiving and Dissemination in the Good Old Days
• Data collection was entirely paper based
• The IBM series 12k CPU
• HP 3000/Series 44 system processor unit with 1 MB memory
• Stand-alone PC systems were used at the CSA until 2004, and resource sharing and efficient communication were a serious problem.
• 1.44 MB floppy diskettes were considered the efficient means of transferring files or documents among professionals.
• In addition, there was no centralized management of the system, which hindered data security and management.
• Little attention was given to metadata documentation
• There was no integration between microdata and metadata
• Access to data was difficult and the dissemination system was poor
• Utilization of data was difficult because of poor documentation
• No electronic method of dissemination
• Little consideration was given to international standards for metadata documentation
• Archiving was done in a decentralized manner
=> User dissatisfaction
=> Difficulties in maintaining institutional memory
• Central Statistical Agency priorities:
  • Improve the data collection, management and dissemination system
  • Effective use of ICT
  • Better profile for ICT and GIS activities in the Agency's organizational structure
• Initiated the Central Databank Project:
  • Establishment of a socio-economic database
  • Deployment of an archiving and dissemination system
• To ensure compliance with international standards:
  • IHSN Microdata Management Toolkit (DDI and Dublin Core)
  • UNICEF DevInfo (SDMX, DDI, Dublin Core, ISO)
  • FAO CountrySTAT
• Strengthening of the ICT framework
• Take full advantage of new technologies for data management:
  • GPS, satellite imagery, PDAs, scanners
  • Internet, RDBMS, GIS, and electronic methods of data archiving and dissemination
Provision of data also involves:
• Harmonizing and integrating statistical data,
• Filling the gap between data produced and data available,
• Laying down efficient ICT infrastructure,
• Improving the quality and comparability of data,
• Addressing the challenges of data and metadata exchange
• Adoption of standards available in data management
In general, improvement of the CSA's internal ICT capacity focused on the following:
• Looking for new tools to improve data capture;
• Development of an integrated Central Data Bank of survey and other data, as well as creation of an Ethiopian Socio-Economic Database for basic indicators;
• Development of database management systems;
• Strengthening the Local Area Network (LAN) and developing a Wide Area Network (WAN) to connect branch offices;
• Web site development;
• CD-ROM publishing;
• A comprehensive program of documentation of existing and new data, especially data related to socio-economic indicators;
• Utilization of GIS for georeferencing, spatial data analysis and other referencing of new and existing data;
• Appropriate training and capacity building.
Why IHSN Microdata Management Toolkit?
• Statistical data production is an expensive exercise that requires a great deal of investment by national statistical agencies in terms of experts' time and the organization's budget.
• The return on this investment comes when the data generated are well utilized by data users.
• However, when the utilization of existing data is compared with the investment in its production, the data are clearly underutilized, particularly in developing countries like Ethiopia.
• In most African countries, for example, providing accurate and timely statistical data for policy development is as much of a challenge as the underutilization of already available data, because very little emphasis is given to documentation, archiving and dissemination.
• The CSA was no exception to this problem: it was very difficult to obtain survey data and related metadata once the CSA had produced its basic statistical reports.
• Therefore, the CSA saw a great need to improve the documentation, archiving and dissemination system, both to address users' dissatisfaction with data availability and to preserve institutional memory through a centralized system of well-documented microdata and metadata.
• Accordingly, the CSA has worked very hard on this issue since 2004 and has tried to address most of the associated problems through close collaboration with the International Household Survey Network (IHSN) of the World Bank and the Accelerated Data Program of
• The assistance obtained from the IHSN and the ADP includes provision of software (Toolkit, NADA) and capacity building through on-the-job training.
• The Agency has benefited greatly from this IHSN/ADP assistance, coupled with the strong commitment of the CSA and the financial and technical assistance obtained from development partners.
• Ethiopia is considered one of the model countries, having documented 98 of its surveys conducted since 1995.
Framework Set to Improve Survey Documentation, Archiving and Dissemination System at the CSA
• In order to improve the survey documentation, archiving and dissemination system at the CSA, the IT-based data management framework has been designed in accordance with international metadata recommendations and best practices in data archiving, to facilitate data dissemination and metadata exchange at the global level.
IT Data Management Framework
[Diagram: IT Data Management Framework, built around USERS / PRODUCERS NEEDS]
Data functions:
• DATA PRODUCTION: Planning, collection, cleaning, processing, quality control
• DATA ARCHIVING: Conversion, packaging, confidentiality
• DATA DISSEMINATION: Publish through traditional media, multimedia, web
• ACCESS: Metadata, data and documentation, plus research tools
• ANALYSIS: Retrieve data for analysis, online analysis, technical support
• COLLABORATION: Disseminate and share knowledge, user-producer dialog, feedback
• HARMONIZATION: Standard formats, comparability, multilingual
Supporting components:
• HARDWARE: Server, workstations, laptops, CD/DVD, scanners, printers, backup, tools
• SOFTWARE: OS, DBMS, development, web, multimedia, office, statistics, security
• TELECOMS: Intranet, Internet, connectivity, LAN/WAN, security
• SPECIALIZED SOFTWARE AND GUIDELINES: Toolkit, DevInfo, WinISIS, DDP, Nesstar, CSPro, XML
• IT CORE ARCHITECTURE, PROJECT MANAGEMENT, ICT UNIT, TRAINING, DATA, TOOLS
• This IT-based archiving and dissemination system is made possible by the establishment of a central databank that archives all documentation and microdata obtained from surveys and censuses, together with a user-friendly system for its dissemination.
• The framework draws on specifications such as DDI, Dublin Core and SDMX, and on tools like the International Household Survey Network's Microdata Management Toolkit, the UNICEF DevInfo package, and the FAO CountrySTAT system for food and agricultural statistics.
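To make the role of Dublin Core in the document catalog more concrete, the following Python sketch builds a small catalog record using Dublin Core element names. It is an illustration only: the `build_dc_record` helper, the `record` wrapper element and all field values are hypothetical placeholders, not the CSA's actual catalog format or real catalog entries.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set (version 1.1).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build a <record> element with one dc:* child per (name, value) pair.

    The <record> wrapper is a hypothetical container chosen for this sketch.
    """
    record = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Hypothetical catalog entry; the values are placeholders, not real records.
example_fields = [
    ("title", "Example survey report"),
    ("creator", "Central Statistical Agency"),
    ("type", "Text"),
    ("format", "application/pdf"),
    ("language", "en"),
]

record = build_dc_record(example_fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the catalog in a standard element set such as this is what allows the same record to be searched on the website and exchanged with other archives.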
[Diagram: flow from data and document production to dissemination]
Production:
• Data processing / analysis: CSPro, STATA/SPSS, Excel, PCTrade, EpiInfo and other specialized software for data processing and analysis
• Production of documents (reports, manuals, questionnaires, etc.): MS-Word, Excel
Archiving and cataloging tools (each also published on CD-ROMs):
• WinISIS: Catalog of documents, Dublin Core compliant, XML format
• Data Dissemination Toolkit: Documented datasets, DDI compliant, XML format
• DevInfo: Indicators database (SDMX compliant), SQL/OLAP format
Web Development and Dissemination Tools (SQL server, web server, web development, DDP-Country)
Country Statistical Website:
• On-line access to the indicators database
• Microdata in SPSS, STATA, SAS and other formats (optional)
• On-line analysis of microdata (Nesstar server; optional)
• Searchable document catalog, with documents in PDF
• Other information (contacts, legislation, methods, etc.)
Achievements through the Improvement Process
Documentation:
• A Central Databank has been established for the microdata; it contains over 6000 data and documentation files covering 98 surveys.
• 98 surveys have been archived using the IHSN Microdata Management Toolkit, making the metadata compliant with the Data Documentation Initiative (DDI) XML specifications, as recommended by the International Household Survey Network.
• This process enabled the CSA to document its surveys thoroughly, down to the variable level, and to include all the related metadata.
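As an illustration of what documentation "down to the variable level" can look like in XML, the Python sketch below assembles a simplified, DDI-style codebook. The element names (`codeBook`, `stdyDscr`, `dataDscr`, `var`, `labl`, `catgry`, `catValu`) follow DDI Codebook conventions, but the structure is heavily reduced, omits namespaces, and is not validated against the DDI schema. The `build_codebook` helper, the survey title and the variables are hypothetical; this is not the Toolkit's actual output.

```python
import xml.etree.ElementTree as ET


def build_codebook(survey_title, variables):
    """Return a simplified DDI-style <codeBook> element.

    Study-level metadata goes under <stdyDscr>; each variable gets a <var>
    element with a label and, optionally, its coded categories.
    """
    codebook = ET.Element("codeBook")

    # Study description: only the title is recorded in this sketch.
    stdy = ET.SubElement(codebook, "stdyDscr")
    citation = ET.SubElement(stdy, "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = survey_title

    # Data description: one <var> per variable in the dataset.
    data_dscr = ET.SubElement(codebook, "dataDscr")
    for var in variables:
        var_el = ET.SubElement(data_dscr, "var", name=var["name"])
        ET.SubElement(var_el, "labl").text = var["label"]
        for value, label in var.get("categories", []):
            cat = ET.SubElement(var_el, "catgry")
            ET.SubElement(cat, "catValu").text = value
            ET.SubElement(cat, "labl").text = label
    return codebook


# Hypothetical variables, invented for this example.
example_variables = [
    {
        "name": "crop_type",
        "label": "Main crop grown on the holding",
        "categories": [("1", "Cereals"), ("2", "Pulses")],
    },
    {"name": "area_ha", "label": "Cultivated area in hectares"},
]

codebook = build_codebook("Example agricultural survey", example_variables)
var_names = [v.get("name") for v in codebook.iter("var")]
print(var_names)
print(ET.tostring(codebook, encoding="unicode"))
```

Because study-level and variable-level metadata live in one file, a catalog application can index both the survey description and the individual variables from the same source.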
Dissemination:
The CSA Website (WWW.CSA.GOV.ET) as Pivot of the CSA Dissemination Strategy
Most importantly, the CSA is using its website to:
• Disseminate national statistics and monthly CPI figures;
• Provide access to all documentation and related metadata for all of its surveys archived in the central databank; and
• Serve as a portal to other access points for the CSA's data, such as the ETHIOINFO database, the ENADA system, CountrySTAT, and the Price database.
• Given its openness to technology that facilitates and extends the culture of analysis, the CSA became very interested in the simple and powerful tools developed by the IHSN and ADP.
• One of the key elements of the IHSN/ADP platform for data dissemination is providing researchers and policy makers with innovative tools that meet their need for country information.
• The web-based tool called the National Data Archive (NADA) provides this facility.
• The CSA, recognizing the functionality of this system, uses the NADA platform as the Ethiopian National Data Archive (ENADA). The ENADA system will simplify access to the CSA's data through its cataloging
•This system will allow access to the DDI file which provides information on the CSA’s survey and census metadata.
CD-ROM products are available for these surveys, and metadata and documents are published on the web
The CSA also uses other dissemination systems like:
• The EthioInfo database, containing the main MDG indicators, is available on CD-ROMs and online;
Data Archiving and Dissemination
• The price database has been developed and
Achievements Summary
• The CSA has an official presence on the Internet
• Improved data capturing tools
• Compliance with international metadata standards
• The Central Databank has created an institutional memory
• The information is more secure
• Single point of access for data and documentation
• Micro and macro data and metadata on CD-ROMs and the Internet
• Better user/producer support through improved ICT infrastructure
• Potential new activities: metadata, data and documentation quality assurance; survey comparability and harmonization; statistical data disclosure control and confidentiality