Using emerging technologies

Getting the best results from paper based data capture

Andy Tye [1] & Mike Smethurst [2]

DRS Data Services Ltd. [3]

Introduction:

This paper reviews the tried and tested techniques for collecting and processing census data via booklets and paper forms, and also considers the latest emerging technology now available to census organisations for capturing data from paper based census forms. The authors are both senior managers with DRS Data Services Limited in the UK. DRS is a leading international supplier of scanning products and services. The company has particular expertise in census data capture and voter registration.

Although it is acknowledged that other, non-paper-based data collection methods are being used in censuses, such as Personal Digital Assistants (PDAs), the Internet and the telephone, this paper focuses specifically on the collection and processing of census data from the paper medium.

There are four traditional methods of data collection [4] from booklets and paper forms (summarised below). New emerging technology is able to combine key features of these traditional techniques. This paper discusses some projects where these combined techniques are being used.

This paper makes the following assumptions: the paper based forms are designed for their purpose; a control procedure [5] is in place for receiving reports from the field, and an initial quality procedure is undertaken; and all batches have a fully auditable batch control sheet associated with them.

The main methods of data capture from forms which can be considered:

1. Manual entry

This method requires an operator to key data directly into the computer from the physical census form. In more sophisticated versions of this approach the keying can be “computer assisted”, where the operator selects a response from various options displayed on screen. Average data input rates vary between 5,000 and 10,000 keystrokes per hour per operator [5]. Based on the two-page census form used in Papua New Guinea in 2000 [6], this equates to perhaps 10 to 20 forms an hour per operator.

Advantages: May employ local staff in large numbers. There will be a large number of PCs available for other uses after the census project has ended. Software costs are relatively low: CSPro [7] is licence free and allows direct data entry.

Disadvantages: Very large numbers of staff are required, both PC operators and IT managers. Quality control procedures such as double keying or sampling are required. Staff motivation is a challenge: operators must be kept entering data at a reasonable speed throughout the project. The logistics and management of the whole process are demanding: census volumes are very large, and in some censuses there can be several hundred tonnes of paper to process. Substantial physical space is required to install the computer systems.
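The keying rates quoted for manual entry can be turned into a rough staffing estimate. The sketch below is illustrative only: the figure of about 500 keystrokes per form is an assumption inferred from the quoted 5,000 to 10,000 keystrokes and 10 to 20 forms per hour, and the project size and working pattern are hypothetical.

```python
import math


def forms_per_hour(keystrokes_per_hour: float, keystrokes_per_form: float) -> float:
    """Forms one operator can key in an hour at a given keying rate."""
    return keystrokes_per_hour / keystrokes_per_form


def operators_needed(total_forms: int, forms_per_operator_hour: float,
                     hours_per_day: float, working_days: int) -> int:
    """Operators required to key every form once within the schedule."""
    operator_hours = total_forms / forms_per_operator_hour
    return math.ceil(operator_hours / (hours_per_day * working_days))


# Assumption: roughly 500 keystrokes per two-page form.
KEYSTROKES_PER_FORM = 500

slow = forms_per_hour(5_000, KEYSTROKES_PER_FORM)   # 10.0 forms an hour
fast = forms_per_hour(10_000, KEYSTROKES_PER_FORM)  # 20.0 forms an hour

# Hypothetical project: 1,000,000 forms, 7 productive hours a day, 100 days.
print(operators_needed(1_000_000, slow, 7, 100))  # 143 at the slower rate
print(operators_needed(1_000_000, fast, 7, 100))  # 72 at the faster rate
```

If double keying is used for quality control, every form is keyed twice, so the operator count roughly doubles.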

2. Key from image

This method involves first scanning the census forms on a standard document scanner; PC operators then key the data from the images produced by the scanner. The quality control systems used in manual entry can again be used, along with the concept known at DRS as seeding. In seeding, a sample image for which the results are already known is displayed to an operator, and the operator's keyed data is compared with the expected values and assessed accordingly (increasing the potential accuracy of the process).

Advantages: As for manual entry. In addition, this approach opens up the possibility of expanding the system to cope with peak volumes, perhaps by using specialist offshore agencies to do the keying on behalf of the census agency during peak periods. A digital archive of all completed census forms can be kept at the end of the exercise if this is thought useful.

Disadvantages: Keying cannot be undertaken until forms have been scanned. A relatively sophisticated computer network and workflow must be in place to manage the keying process.
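The seeding check described for key from image can be sketched as a simple comparison of keyed values against known answers. This is a minimal illustration, not DRS's implementation: the record layout, the field names and the case-insensitive comparison are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class SeedResult:
    checked: int  # seeded fields the operator keyed
    correct: int  # of those, how many matched the known answer

    @property
    def accuracy(self) -> float:
        return self.correct / self.checked if self.checked else 0.0


def score_seeds(keyed: dict, expected: dict) -> SeedResult:
    """Compare an operator's keyed values with the known values of seeded images.

    Both dicts map (image_id, field_name) -> value. Only entries present in
    `expected` (the seeded images) are scored; live images are ignored.
    """
    checked = correct = 0
    for key, truth in expected.items():
        if key in keyed:
            checked += 1
            # Assumption: case and surrounding spaces are not significant.
            if keyed[key].strip().upper() == truth.strip().upper():
                correct += 1
    return SeedResult(checked, correct)


# Hypothetical seeded answers and one operator's keying session.
expected = {("img-001", "age"): "34", ("img-001", "sex"): "F",
            ("img-007", "age"): "61"}
keyed = {("img-001", "age"): "34", ("img-001", "sex"): "f",
         ("img-007", "age"): "67", ("img-002", "age"): "12"}

result = score_seeds(keyed, expected)
print(result.checked, result.correct, round(result.accuracy, 2))  # 3 2 0.67
```

An operator whose seeded accuracy falls below an agreed level could then be retrained or have their recent batches re-keyed.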

3. Optical Mark Reading (OMR)

Specially designed and printed forms are used, each form having a tick box or bubble response for each census question. They are usually scanned on special OMR scanners which recognise the significance of a mark on a given form and automatically and immediately generate accurate data output files.

Advantages: Very accurate and very high speed. Costs are predictable and defined. Overall data capture speeds of the order of 4,000 OMR forms an hour can reasonably be expected in a live census project using specialist DRS OMR scanners.

Disadvantages: Requires specially printed forms and special scanners. Tick box responses are not suited to all types of question.


4. Intelligent Character Recognition (ICR)

Forms are scanned and images are captured. The captured images are interpreted by ICR software which is able to recognise numbers and letters written in response boxes on the forms.

Advantages: Forms designed for ICR processing are relatively easy to fill in, and locally printed forms can be used. ICR works well with numeric characters.

Disadvantages: ICR does not work as well with alphabetic responses, which may need a large amount of manual intervention to ensure accuracy. ICR software is not able to recognise all handwriting and is not always reliable in its recognition; a great deal of manual intervention is needed when the recognition falls below certain predefined standards of certainty. Consequently it is not easy to predict accurately the timescale or the costs of the data capture process. ICR software and the necessary computer infrastructure can be expensive. High calibre IT staff are required to support the ICR system.
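The "predefined standards of certainty" mentioned above can be pictured as a confidence threshold below which a recognised field is passed to an operator. The sketch below assumes a hypothetical ICR engine that reports a confidence between 0 and 1 for each field; it does not reflect the interface of any particular ICR product.

```python
from typing import NamedTuple


class IcrField(NamedTuple):
    form_id: str
    field: str
    text: str
    confidence: float  # 0.0 to 1.0, as reported by the hypothetical engine


def split_by_confidence(fields, threshold):
    """Separate fields that can be accepted from those needing manual review."""
    accepted, review = [], []
    for f in fields:
        (accepted if f.confidence >= threshold else review).append(f)
    return accepted, review


# Hypothetical results: numeric fields score higher than alphabetic ones.
fields = [
    IcrField("F1", "age", "34", 0.98),
    IcrField("F1", "occupation", "FARMER", 0.62),
    IcrField("F2", "age", "7", 0.91),
    IcrField("F2", "occupation", "TEACHER", 0.88),
]

accepted, review = split_by_confidence(fields, threshold=0.90)
review_rate = len(review) / len(fields)
print(len(accepted), len(review), review_rate)  # 2 2 0.5
```

The review rate is what makes ICR cost and timescale hard to predict: it depends on handwriting quality, which is only known once real forms arrive.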

5. Emerging technology, combining OMR & ICR – sometimes referred to as IMR (Intelligent Mark Recognition)

This is a relatively new approach to data capture. The OMR tick box data is immediately and accurately recognised and captured on the special IMR scanners, and the data is ready for immediate import into software such as CSPro. At the same time, images of the form are captured where necessary: for example where an OMR response is not appropriate, or where the OMR scanner has highlighted logic or validation errors on the form. Digital images of such forms are then sent to ICR software or manual entry software for capture of the relevant fields.

Advantages: Combines the benefits of “traditional” OMR technology with the potential of the latest ICR techniques. An image archive of all census forms scanned is automatically created. A reliable and predictable outcome can be achieved from a given investment in these techniques.

Disadvantages: This technique still needs specially printed forms.
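The combined workflow can be sketched as a per-form routing decision: OMR responses that pass validation are accepted directly, and everything else is captured from the image. The function, field names and validation rules below are hypothetical and only illustrate the idea.

```python
def route_form(omr_data: dict, handwritten_fields: list, validators: dict) -> dict:
    """Decide what happens to one scanned form in a combined OMR/image workflow.

    omr_data maps field name -> OMR value (None if no valid mark was read).
    handwritten_fields lists fields for which an OMR response is not appropriate.
    validators maps field name -> function returning True for an acceptable value.
    """
    accepted = {}
    image_fields = list(handwritten_fields)  # always captured from the image
    for field, value in omr_data.items():
        check = validators.get(field)
        if value is None or (check is not None and not check(value)):
            image_fields.append(field)  # send to ICR or manual entry
        else:
            accepted[field] = value     # ready for immediate export
    return {"accepted": accepted, "image_fields": image_fields}


# Hypothetical validation rules and one scanned form.
validators = {
    "age": lambda v: 0 <= v <= 120,
    "sex": lambda v: v in {"M", "F"},
}
form = {"sex": "M", "age": 134, "marital_status": None}

print(route_form(form, ["occupation"], validators))
# {'accepted': {'sex': 'M'}, 'image_fields': ['occupation', 'age', 'marital_status']}
```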

The success of data capture from paper in a census depends substantially on five critical factors:

Critical Success Factor One: Scanning Hardware

If document scanning is going to be carried out, then the right scanner for the job needs to be selected. Census organisations should choose a manufacturer whose scanner has been shown in other, similar census projects to be entirely fit for the purpose.

Consideration should be given to the following:

• Scanners should be designed with a long duty cycle in mind to help ensure general reliability and consistent data capture during non-stop or lengthy scanning periods.

• If an OMR form is to be used and it is intended to process this form on a dedicated OMR scanner, then this scanner must have a heavy duty metal construction to ensure accuracy of read during the entire scanning process.

• The scanner must allow any jammed census forms to be removed successfully; an open paper path will help.

• The scanner must have the ability to detect and prevent double form feeds during scanning.

• The scanner must be capable of being networked to integrate with the necessary software systems and servers.

• The scanner must have a user friendly interface and consumables that are easy to replace.

• Careful thought must be given to the availability and experience of local partners to provide full technical back up and service if required. Ideally, local support engineers should have been trained at the manufacturer’s head office.

• It is recommended to use scanners which have actually been used in a census before.

There are also benefits to choosing a scanner which can:

• Route forms to different output trays depending on certain predefined business rules.

• Achieve high throughput rates during peak loads in the scanning process.

Best practice dictates that a contingency is in place in the event of a scanner failure. It is recommended that more scanners are put in place than the number required. There are two main advantages to this approach:

• Any extra scanners can be utilised during forms processing, so the census processing can be kept ahead of schedule.

• If any scanner fails there is minimal impact on the schedule, as the replacements are already in place and in use.
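The effect of spare scanners on the schedule can be estimated with simple arithmetic. The sketch below uses the throughput of 4,000 OMR forms an hour quoted earlier; the census size, working hours, deadline and number of spares are hypothetical.

```python
import math


def scanners_required(total_forms: int, forms_per_scanner_hour: int,
                      hours_per_day: float, working_days: int) -> int:
    """Minimum number of scanners needed to finish within the schedule."""
    capacity_per_scanner = forms_per_scanner_hour * hours_per_day * working_days
    return math.ceil(total_forms / capacity_per_scanner)


def days_to_finish(total_forms: int, scanners: int,
                   forms_per_scanner_hour: int, hours_per_day: float) -> int:
    """Working days needed with a given number of scanners running."""
    return math.ceil(total_forms / (scanners * forms_per_scanner_hour * hours_per_day))


# Hypothetical census: 20 million forms, 6 scanning hours a day, 100-day deadline.
FORMS, RATE, HOURS, DEADLINE = 20_000_000, 4_000, 6, 100

minimum = scanners_required(FORMS, RATE, HOURS, DEADLINE)  # 9
installed = minimum + 2                                    # two spares (assumption)

print(days_to_finish(FORMS, installed, RATE, HOURS))      # 76 days, all running
print(days_to_finish(FORMS, installed - 1, RATE, HOURS))  # 84 days, one failed
print(days_to_finish(FORMS, minimum, RATE, HOURS))        # 93 days, no spares
```

With the spares in use from the start the work runs ahead of schedule, and a single failure still leaves the project well inside the deadline.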

Critical Success Factor Two: Census Form Design

Time and effort invested to ensure that the census forms are completed as accurately as possible and returned in the best condition possible will pay significant dividends during the data capture exercise.


enumerators can be more complex and often much shorter (possibly even one side of A4 per household in the case of OMR forms with consequent cost savings).

• Paper quality needs to meet or exceed minimum standards, both from the point of view of paper handling in the field and from the point of view of the requirements of different scanner manufacturers and scanning techniques.

Manufacturers of both OMR scanners and image scanners recommend that paper should meet certain performance standards laid down by such organisations as ISO, BS and BS EN.

These standards define such features as:

• Grammage, Thickness, Bendtsen Roughness, Static Friction, Stiffness, Air resistance, Internal tearing resistance, Folding Endurance, Luminous reflectance, Opacity, Grain Direction, Folds & Creases, Moisture

DRS typically supplies paper designed using a specific mixture of these standards, based around the generally available APACS [8] standard CBS2.

It is highly recommended that paper quality is discussed with the scanner supplier and the paper supplier, to identify the most appropriate standards before production commences. The environment in which the form is to be used may dictate the specification for each of these standards. For example, if a form is to be used in a humid environment, the customer should consider the effect that this may have on any paper type selected, to ensure accurate completion and good scanning. The paper quality requirements for OMR scanners are more stringent than those for image scanners, not only in the paper features above but also in the type of ink that can be used and the accuracy of the printing process.

The best results using OMR technology for census have been where imported forms supplied direct from the scanner manufacturer have been used. The scanner manufacturer then takes responsibility and guarantees that their form will be able to be scanned and recognised on the OMR scanners.

When OMR and ICR techniques are selected, it is beneficial for the forms to be printed with sequential unique barcodes. A sequential barcode enables all the information on a form to be recorded and referenced against that form's individual barcode number. Printing such sequential barcodes is not a simple technique; it requires printers with a high level of skill. It is recommended that printers should be qualified to ISO 9001:2000 for Specialist Forms Printing.
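One practical use of sequential barcodes is reconciling what was scanned against what was issued. The sketch below assumes, purely for illustration, that barcodes are plain integers issued to a batch as one contiguous range; real numbering schemes may differ.

```python
from collections import Counter


def reconcile_barcodes(first_issued: int, last_issued: int, scanned: list):
    """Compare scanned barcode numbers with the range issued to a batch.

    Returns (missing, duplicates, unexpected), each as a sorted list.
    """
    counts = Counter(scanned)
    issued = range(first_issued, last_issued + 1)
    missing = [n for n in issued if n not in counts]        # never scanned
    duplicates = sorted(n for n, c in counts.items() if c > 1)
    unexpected = sorted(n for n in counts if n not in issued)
    return missing, duplicates, unexpected


# Hypothetical batch: forms 1001 to 1010 were issued to one enumeration area.
scanned = [1001, 1002, 1002, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 2001]

missing, duplicates, unexpected = reconcile_barcodes(1001, 1010, scanned)
print(missing, duplicates, unexpected)  # [1003] [1002] [2001]
```

A report of this kind could sit alongside the batch control sheet as part of the audit trail for each batch.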

Critical Success Factor Three: Training & Support

No matter what technique is used to capture the data from the forms, the techniques themselves cannot turn “bad forms” into “good forms”. If the forms have been badly filled out, then costs will escalate, delays will occur and inaccurate results will be recorded. Clearly the best solution is to get the forms filled out correctly in the first place:

• Training of enumerators is fundamental to success: use the correct pencils, shade the bubbles properly, keep the forms dry and treat the forms with respect.

• Motivate the enumerators – not just money but other higher values as well.

Local support via the supplier and/or its accredited trained partners during the processing of the census forms will help achieve the best results. Experience has shown that properly training scanner operators significantly improves the throughput of forms through the workflow processes.

Critical Success Factor Four: Computer Hardware

A complete assessment of the existing computer hardware available at the census organisation needs to be made early in the process of selecting the data capture technique. Different data capture approaches have different PC, server and storage requirements, and it is unlikely that the existing PCs at the census organisation will be totally suited to the demands of the latest high speed data capture techniques. There will inevitably need to be some investment in this area no matter which technique is used. However, computer hardware and data storage are now much cheaper than in previous years. Software investment will include data capture software as well as storage and data retrieval software. Consideration should be given to the amount of data to be stored, and adequate redundancy should be built into all server systems to protect the data and images. Local power continuity should also be accounted for when specifying new computer hardware; uninterruptible power supplies (UPS) are potentially the best option.
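A first estimate of image storage can be made before any hardware is specified. Every figure in the sketch below is an assumption chosen for illustration (form count, pages per form, compressed image size and number of redundant copies); real values depend on the scanner, resolution and image format used.

```python
def image_storage_gb(forms: int, pages_per_form: int,
                     kb_per_page: float, copies: int) -> float:
    """Approximate storage for form images, in decimal gigabytes.

    copies is the number of full copies held for redundancy, for example
    2 for a primary store plus one mirror or backup.
    """
    total_kb = forms * pages_per_form * kb_per_page * copies
    return total_kb / 1_000_000  # 1 GB = 1,000,000 kB in decimal units


# Hypothetical census: 20 million two-page forms at about 50 kB per page image,
# with two full copies kept.
print(image_storage_gb(20_000_000, 2, 50, 2))  # 4000.0
```

The captured data files themselves are usually far smaller than the images, so the image archive tends to dominate the storage requirement.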

Critical Success Factor Five: Software & Workflow

The workflow for the data processing operation will be tailored to suit the individual circumstances of a census authority. Consequently suppliers should be flexible in their approach rather than prescriptive in their solution. It is recommended that a review is undertaken during the pilot census to get the best results from the workflow. This may include:

• Moving storage closer to the data processing area

• Designating or changing job roles and functions of staff

• Reviewing the security of the data at each part of the process

• Adding or adjusting quality control processes

The software selected for census processing should be compatible with the technical skill set available in house within the census authority. It is critical to try to align the local skills and software with the business processes required, to get the best out of the system.

A local understanding of the data capture software and the operating systems it is used on will increase the level of success and enable the early resolution of issues during the data processing exercise. The use of standard proven software packages used in the census market place will reduce risk, give confidence to the users and potentially enable information and experiences gathered from other census processing exercises to be effectively used.

Summary

The final decision of the choice of data capture technique will be a compromise which aims to balance out often competing decision drivers. A balance needs to be achieved which will have taken into account all five critical success factors discussed and which has also reflected the following factors:

• Available budgets and funding

• Cultural traditions and previous methods used

• The local geography

• Total required speed of the data processing operation

• The local economics, the local infrastructure and logistic capabilities

• The skill set of the local staff and their experience

• The current infrastructure in place within the census authority

• Willingness and practicality of using imported forms, or the availability of high quality locally printed forms

• The need to use form designs which can be understood by the local population

• The level of education of the enumerators or the population. This will help decide the potential layout of the form and then the best data capture method.

• Historical and cultural factors

• Information and experience gained during any pilot exercise

• Experiences of other census authorities and geographical member organisations

If a decision is taken to use ordinary image scanners for census data processing, with the data captured using ICR software or key from image techniques, then the forms can usually be printed locally. Printing ICR forms locally might at first appear to offer a cost benefit compared with importing specially printed forms; however, care should be taken when making this calculation. The immediate cost savings of using locally produced forms in an ICR or keying solution might well result in much higher overall costs in other parts of the census, e.g. much higher software costs and potentially a much longer time to capture the data.

An OMR solution has the strong advantage of offering a clearly defined and calculable data capture cost. By comparison, the costs of an ICR data capture solution (even though the forms might initially be cheaper to print) can be far more difficult to estimate accurately in advance of the census. Nevertheless, the ease of filling in ICR forms makes ICR techniques an increasingly attractive option for census authorities. Any census authority looking to implement ICR data capture techniques will, however, inevitably need more staff with advanced IT skills than are required to manage other techniques. It is interesting to note that the Australian Bureau of Statistics [9] has announced publicly that it has over 400 IT staff. Inevitably, census authorities with fewer IT resources may make different decisions on the methods of capturing the data from their forms from those census authorities which have more IT resources at their disposal.

DRS Data Services Limited has expertise in all five data capture methods which have been discussed in this paper. Each census project that DRS is involved in is assessed on its own merits, and the company is able to advise and recommend the best data capture method for each project based upon the particular local circumstances prevailing. After a thorough review of the technology the two most recent census projects that DRS has been involved with are using the combined approach:

The Central Statistical Agency (CSA) of Ethiopia awarded a contract to DRS in 2006 to supply a combined OMR/image census data processing solution (using the “IMR” technique) for 2007. CSA undertook a pilot census in 2006 with the combined technology, was very happy with the results of the pilot exercise and will subsequently be using a combined system for the national census in 2007. CSA specifically chose this combined system because it had encountered many issues in manually keying the data from its previous national census. After the pilot exercise it was decided to streamline some of the validation and business rules applied to the forms during processing. This simplified the validation and verification of the data being captured.


[1] Author – International Manager, DRS Data Services Ltd, UK.

[2] Co-author – Regional International Manager, DRS Data Services Ltd, UK.

[3] DRS Data Services Limited is a leading international supplier of scanning products and services. The company has particular expertise in areas such as census, elections registration and examinations solutions. DRS has over 35 years' experience both in the UK and internationally.

[4] Defined in the UNSD report Principles & Recommendations for Population & Housing Censuses (Series M, No. 67/Rev. 1, 1998).

[5] Description of recommended process in the UNSD report Principles & Recommendations for Vital Statistics Systems - Control of receipt of statistical reports (Series M, No. 19/Rev. 2, 2001).

[6] UNSTATS - http://unstats.un.org/unsd/demographic/sources/census/censusquest.htm#P

[7] Provided by the US Census Bureau. CSPro (Census and Survey Processing System) is a public-domain software package for entering, editing, tabulating and mapping census and survey data.

[8] APACS is a trade association for the payments and banking sector in the UK – www.apacs.gov.uk

[9] 14th British Commonwealth conference on managing statistics, South Africa, 2005 - Recent Advances in the Use and Management of Technology at the Australian Bureau of Statistics.
