7 Prevention of Poor-Quality Data

7.1 Controlling quality at the source

Each of the causes of quality problems enumerated in section 4 should be addressed in an attempt to detect, prevent, and/or correct such problems at the source.

7.1.1 Intrinsically limited data content

Subjects with limited intrinsic characteristics may not be a preventable problem. The best solution is likely to be collection of additional data: use of multiple instances (e.g. fingerprints from ten fingers) or multiple biometric modalities (e.g. fingerprint and iris) reduces or eliminates the impact of an individual with a single poor-quality biometric instance. If the system is limited to a single biometric modality and instance, multiple samples should be captured and the best of the set retained; findings from US-VISIT and NIST’s MBARK project have indicated that quality improves if multiple images are captured automatically and a quality metric is used to choose the best of the set.
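The best-of-set selection described above can be sketched as follows. This is a minimal illustration, not the method used by US-VISIT or MBARK: the `Sample` structure, the `select_best` function, and the `toy_quality` metric are all hypothetical stand-ins, and a real system would substitute its own quality metric.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Sample:
    """One captured image plus an identifier (hypothetical structure)."""
    sample_id: str
    pixels: bytes


def select_best(
    samples: Sequence[Sample],
    quality: Callable[[Sample], float],
    min_quality: float = 0.0,
) -> Optional[Sample]:
    """Return the highest-scoring sample, or None if none reaches min_quality."""
    if not samples:
        return None
    best = max(samples, key=quality)
    return best if quality(best) >= min_quality else None


def toy_quality(sample: Sample) -> float:
    """Stand-in metric (assumption): fraction of distinct byte values present."""
    return len(set(sample.pixels)) / 256.0


captures = [
    Sample("a", bytes([0] * 16)),
    Sample("b", bytes(range(16))),
    Sample("c", bytes(range(8)) * 2),
]
best = select_best(captures, toy_quality)
```

Returning `None` when no capture reaches the floor lets the caller decide whether to recapture or to record the instance as an exception.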

Subjects with missing biometrics (such as amputations) should be clearly designated at the time of capture, and that designation should be permanently attached to the biometric record; otherwise, an unexplained blank image will continue to cause problems in the system. The same is true of temporarily unobtainable biometrics (such as bandaged features). Procedures should be clear to operators so that missing biometrics are consistently treated.
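One way to make such a designation part of the record itself is sketched below. The `CaptureStatus` values and the `InstanceRecord` structure are hypothetical names chosen for illustration; the point is only that a record without an image must carry an explicit reason, so that a blank image can never be stored unexplained.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CaptureStatus(Enum):
    CAPTURED = "captured"
    PERMANENTLY_MISSING = "permanently_missing"          # e.g. amputation
    TEMPORARILY_UNAVAILABLE = "temporarily_unavailable"  # e.g. bandaged feature


@dataclass(frozen=True)
class InstanceRecord:
    """One biometric instance (hypothetical structure), e.g. one finger."""
    position: str
    status: CaptureStatus
    image: Optional[bytes] = None

    def __post_init__(self) -> None:
        has_image = self.image is not None and len(self.image) > 0
        if self.status is CaptureStatus.CAPTURED and not has_image:
            raise ValueError("a captured instance must include image data")
        if self.status is not CaptureStatus.CAPTURED and has_image:
            raise ValueError("a missing instance must not include image data")


missing = InstanceRecord("right_index", CaptureStatus.PERMANENTLY_MISSING)
```

Because the record is frozen, the designation cannot be silently dropped later; changing it requires creating a new record through whatever audited process the system defines.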

7.1.2 Degraded or obscured characteristics

Degraded biometrics that cannot be corrected at the time of capture should be treated in the same way as subjects with limited intrinsic characteristics. Operators should know which obscured characteristics they are expected to correct (such as glasses, clothing, hair, and dirty fingers) and which should not be corrected (such as bandages, or clothing such as veils that cannot be removed for social or cultural reasons).

7.1.3 Subject behavior

The system and its procedures should be designed to minimize the negative effects of subject behavior. For cooperative or ambivalent subjects, this is primarily a human factors issue: it should be as intuitive as possible to subjects what is expected of them and how they are to accomplish it. The complexity of the interface visible to the subject should be minimized so that as little training or habituation as possible is required. Usability evaluations of the system interface and physical configuration are important. Issues that may be considered during a human factors evaluation include:

• Collection devices and processes need to be designed for ease of use of both subject and operator. Unnecessary complexity reduces the chance of collecting good quality data.

• Provide clear and simple feedback to subjects and operators. There may be opportunity to re-collect data if needed. Consider the use of a 2-tiered approach when quality determination may cause queuing problems: fast metric to eliminate obvious problems, a rigorous one for final determination.

• Devices need to be properly aligned for good quality data collection. Subjects need to be provided sufficient room to adequately position themselves. For example, reach across the body can skew fingerprint images.

• Design specialized processes for handling subjects who attempt to evade or spoof the system.

• In environments in which subjects may be actively uncooperative or hostile, consider that the collection device could be used as a weapon, and that close proximity to subjects can be a risk to operators.

• Design for exceptions as well as the norm. Operators or subjects may have disabilities and can vary in size.

• Design for consistency of use and consistency of data.

• Design for anticipated operator fatigue. Data collection is a repetitive process and fatigued operators can let quality lapse.

• Include operator performance metrics to identify poor or lapsed training and/or data collection device degradation.
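The two-tiered quality check mentioned in the list above might look like the following sketch. The function name, the thresholds, and both metrics are assumptions made for illustration; the only idea taken from the text is that a cheap metric screens out obvious failures so that the expensive metric runs only on plausible samples.

```python
from typing import Callable

Metric = Callable[[bytes], float]


def two_tier_check(
    sample: bytes,
    fast_metric: Metric,
    rigorous_metric: Metric,
    fast_floor: float,
    accept_threshold: float,
) -> str:
    """Return "accept" or "recapture" for one sample."""
    # Tier 1: cheap screen; obvious failures never reach the slow metric.
    if fast_metric(sample) < fast_floor:
        return "recapture"
    # Tier 2: rigorous metric makes the final determination.
    if rigorous_metric(sample) >= accept_threshold:
        return "accept"
    return "recapture"


def fast_metric(sample: bytes) -> float:
    """Stand-in (assumption): fraction of non-zero bytes."""
    return sum(1 for b in sample if b) / max(len(sample), 1)


def rigorous_metric(sample: bytes) -> float:
    """Stand-in (assumption): fraction of distinct byte values present."""
    return len(set(sample)) / 256.0
```

A blank sample such as `bytes(64)` is rejected by the first tier alone, which is what keeps the queue moving when the rigorous metric is slow.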

The appropriate procedures to deal with actively uncooperative subjects are presumably specific to each system. Policies regarding uncooperative subjects should be made clear to operators so that they are uniformly and appropriately managed.

7.1.4 Fraud

The first line of defense against spoofing and evasion is during data collection, through the use of trained operators and supporting technology. Operators should be made aware that evasion or spoofing may be attempted, and be informed of how to identify suspicious behaviors. Subjects acting suspiciously should be confronted. Technology can help prevent fraudulent samples: [Van der Putte] describes using temperature, conductivity, heartbeat, dielectric constant, blood pressure, and detection under the epidermis as methods of verifying livescan fingerprints. Special image processing algorithms can be implemented to detect suspicious data. For example, samples that are too perfect, are exactly repeated, are blurred, or contain edges or suspicious marks may be fabricated.
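Of the suspicious patterns listed above, exact repetition is the simplest to check automatically. The sketch below flags a sample whose bytes are identical to an earlier submission by comparing SHA-256 digests; the `RepeatDetector` class is a hypothetical helper, and it detects only byte-identical repeats, not near-duplicates or the other patterns mentioned.

```python
import hashlib
from typing import Dict, Optional


class RepeatDetector:
    """Remembers digests of earlier samples and reports exact repeats."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # digest -> id of first sample seen

    def check(self, sample_id: str, data: bytes) -> Optional[str]:
        """Return the id of an earlier identical sample, or None if new."""
        digest = hashlib.sha256(data).hexdigest()
        earlier = self._seen.get(digest)
        if earlier is None:
            self._seen[digest] = sample_id
        return earlier


detector = RepeatDetector()
first = detector.check("enrol-001", b"example image bytes")
repeat = detector.check("enrol-002", b"example image bytes")
```

Two genuine captures of the same finger are essentially never byte-identical, so a match here suggests a replayed or copied file rather than a new capture.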

Collection of multiple biometric instances or modalities helps reduce risk by collecting more data of different types. Successful evasion or spoofing is not simple for a single biometric instance, but is made much more complex if multiple biometric instances or modalities are collected. For example, an attacker who wants to be identified as someone else would need to forge multiple fingerprints in a multi-fingerprint system and would need a disguise if face images are collected.

Unattended collection of biometrics should be suspected as a source of fraudulent samples.

7.1.5 Collection devices

Collection devices should not be a serious source of errors if the specific device being used is compliant at the time of use with well-thought-out quality standards, and it is used according to well-thought-out procedures. This means that

• Standards for collection devices need to be defined, carefully evaluated, and enforced.

• Certification processes need to be implemented; use of certified products should be enforced.

• There should be an ongoing process to monitor the quality of specific devices (see section 7.2.3). Procedures should be in place for rapid replacement of problem devices.
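Ongoing monitoring of individual devices could be as simple as tracking a rolling mean of sample quality scores per device, as in the sketch below. The class name, window size, and thresholds are assumptions for illustration, and the quality scores are assumed to lie on a scale where higher is better.

```python
from collections import defaultdict, deque
from statistics import mean
from typing import Deque, Dict, List


class DeviceQualityMonitor:
    """Flags devices whose recent mean quality falls below a floor."""

    def __init__(self, window: int = 50, min_mean: float = 0.6,
                 min_samples: int = 10) -> None:
        self.min_mean = min_mean
        self.min_samples = min_samples
        # Each device keeps only its most recent `window` scores.
        self._scores: Dict[str, Deque[float]] = defaultdict(
            lambda: deque(maxlen=window)
        )

    def record(self, device_id: str, score: float) -> None:
        self._scores[device_id].append(score)

    def flagged(self) -> List[str]:
        """Devices with enough recent samples and a low rolling mean."""
        return sorted(
            device_id
            for device_id, scores in self._scores.items()
            if len(scores) >= self.min_samples and mean(scores) < self.min_mean
        )
```

The same structure, keyed by operator instead of device, would support the operator performance metrics discussed in section 7.1.3.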

A good example of such standards and certification is the FBI’s certification process for fingerprint scanners. The most widely used standard for biometric capture devices is the FBI’s IAFIS Image Quality Specifications, Appendix F of [EFTS], which was described in section 6.3. The FBI uses these standards to certify fingerprint collection devices, card printers, integrated products, and identification flats systems. To date, 252 products from 41 different companies have been certified [CERT]. The FBI certification program, defined in [Nill], requires vendors to use their products to collect images of standard test targets. The FBI tests the images and certifies products if they pass the test. Compliance is not intended to endorse any product over any other.

This type of testing shows that the product design can be compliant with the standard and requires vendors to produce at least one compliant product. It provides acquirers with an initial reference to identify applicable products. The FBI’s certification program does not address validation that individual devices are compliant at a given point in time.

7.1.6 Collection processes

Policies and procedures should carefully define the process of sample collection so that it is performed correctly and consistently. Some of the aspects that should be covered include:

• Collection policy and procedures should be defined and enforced.

• Whenever possible, operators should be made aware of quality problems in real time so that poor-quality samples can be recaptured.

• Operators should be trained and tested for proficiency. Ongoing monitoring or sampling of operator proficiency should be put in place, through a combination of human and automated processes (see section 7.2.3).

• Operator workload should be considered so that it does not become a source of quality problems.

• Operators should be trained in correct and consistent guidance and oversight of subjects, including procedures to handle exceptions (such as missing biometrics or uncooperative subjects).

• The physical configuration of equipment and the collection environment should be clearly defined in standards or best practices documents. This configuration should be designed so that it can be readily replicated in different locations.

• Illumination, temperature, and humidity need to be considered. Fingerprint, face, and iris capture all rely on imaging that varies with lighting. Sunlight can cause harsh shadows and has been observed to cause halos in livescan fingerprint devices. Temperature and humidity can change the characteristics of collected samples, or increase the risk of device failure.
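The real-time recapture step in the list above can be sketched as a bounded retry loop. The function name, the attempt limit, and the threshold are assumptions; `capture` and `quality` stand for whatever device call and quality metric the system actually uses.

```python
from typing import Callable, Optional, Tuple, TypeVar

S = TypeVar("S")


def capture_with_retries(
    capture: Callable[[], S],
    quality: Callable[[S], float],
    threshold: float,
    max_attempts: int = 3,
) -> Tuple[Optional[S], float, int]:
    """Capture until a sample meets the threshold or attempts run out.

    Returns (best sample seen, its score, number of attempts made).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    best_sample: Optional[S] = None
    best_score = float("-inf")
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        sample = capture()
        score = quality(sample)
        if score > best_score:
            best_sample, best_score = sample, score
        if score >= threshold:
            break  # good enough; stop asking the subject to retry
    return best_sample, best_score, attempts
```

Capping the number of attempts reflects the concern about operator workload and queuing: if the threshold is never met, the best sample seen is still returned so that policy can decide how to handle the exception.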

One example of a successful collection policy is from the DHS Benefits program, which was notable in terms of quality and consistency for an operational data set in the SlapSeg evaluation. Those collection guidelines are included in Appendix B.11 of [SlapSeg].

7.1.7 Compression and sample processing

Compression and sample processing problems are determined by the algorithms used. The compression and sample processing algorithms must be evaluated to determine their impact on sample quality. Once acceptable use of the algorithms is determined, correct use of the algorithms and use of the correct parameters must be defined and enforced. When possible, parameters that should be set system-wide should not be available for manipulation by operators.
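One way to keep system-wide parameters out of operators' hands is to hold them in an immutable policy object and reject any operator setting that names a locked field. The sketch below is illustrative only: `CompressionPolicy`, `validate_operator_settings`, and the parameter values are hypothetical placeholders, not recommended settings.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass(frozen=True)
class CompressionPolicy:
    """System-wide parameters; frozen so they cannot be changed in place."""
    algorithm: str
    target_ratio: float


# Placeholder values (assumption); real values come from algorithm evaluation.
SYSTEM_POLICY = CompressionPolicy(algorithm="example-codec", target_ratio=10.0)

LOCKED_FIELDS = {f.name for f in fields(CompressionPolicy)}


def validate_operator_settings(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Accept operator settings only if they leave locked fields alone."""
    illegal = sorted(LOCKED_FIELDS.intersection(settings))
    if illegal:
        raise PermissionError(f"operators may not set: {', '.join(illegal)}")
    return dict(settings)
```

Attempting `SYSTEM_POLICY.target_ratio = 30.0` raises an error because the dataclass is frozen, and passing `{"target_ratio": 30.0}` to `validate_operator_settings` is rejected before it can reach the compression step.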

7.1.8 Feature extraction

Feature extraction fidelity is determined by the algorithms used. Systems that have developed their own feature extraction methods (or have engineering oversight over development) should evaluate their effectiveness rather than assuming that they are optimal by definition. In many systems, however, such evaluation is not possible.

7.1.9 Matching accuracy

Matcher accuracy is a key concern in every system, and improving error rates is presumably a goal for any system. The fact that matcher errors cause database errors as a secondary effect underscores this concern.

7.1.10 Administrative and database issues

The processes of entering, maintaining, and updating biometric data in the database all need to be made as foolproof as possible:

• The system should be designed to minimize human data entry, and to provide immediate lookups of any cross-references to minimize typographic errors.

• Any significant human-initiated changes, such as merging or deleting records, should be flagged and verified.

• Keys from other systems that purport to be unique should not necessarily be regarded as unique.

• Preventing administrative fraud is a system-specific issue, but one that should not be overlooked. Biometric systems can follow the model of forensic applications, which require processes designed to prevent fraud and administrative attack. [Wertheim] suggests keeping evidence intact when possible, logging and tracking evidence carefully, using witnesses to corroborate key processing steps, and using reviews and audits to show that processes and procedures are followed.
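The caution about externally supplied keys can be checked directly rather than assumed. The sketch below groups records by external key and reports any key attached to more than one internal record; the function name and the `(internal_id, external_key)` record layout are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_key_collisions(
    records: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Map each external key shared by several internal records to those records.

    `records` yields (internal_id, external_key) pairs.
    """
    by_key: Dict[str, Set[str]] = defaultdict(set)
    for internal_id, external_key in records:
        by_key[external_key].add(internal_id)
    return {
        key: sorted(ids) for key, ids in by_key.items() if len(ids) > 1
    }


collisions = find_key_collisions([
    ("rec-1", "EXT-100"),
    ("rec-2", "EXT-200"),
    ("rec-3", "EXT-100"),  # same external key as rec-1
])
```

Any collision found this way is a candidate for the flag-and-verify review described above, not for automatic merging.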

In document The Role of Data Quality in Biometric Systems (Page 55-60)