Data Quality and Validation

Module-4: Data Quality and Validation

Learning objectives:

After reading this module you will be able to understand:

1. What data quality is and why it is important for HMIS.

2. How to check data quality at the point of data entry.

3. How to create data validation rules.

4. How to carry out data triangulation.

4.1 Overview of data quality check

Ensuring data quality is a key concern in building an effective HMIS. Data quality has different dimensions, including:

• Correctness: Data should be within the normal range for data collected at that facility. There should be no gross discrepancies when compared with data from related data elements.

• Completeness: Data for all data elements for all health facilities/blocks/Taluka/districts should have been submitted.

• Consistency: Data should be consistent with data entered during earlier months and years, while allowing for changes due to reorganization, increased workload, etc., and consistent with data from other similar facilities.

• Timeliness: All data from all health facilities/blocks/Taluka/districts should be submitted on time.

Poor quality data leads to “garbage in, garbage out” situations, and decisions based on such data are ill-informed. The HMIS software should therefore have built-in tools for data quality checks and validation.

4.1.1 Data quality checks

Data quality checking can be done through various means, including:

1. At the point of data entry, the software can check whether the value entered falls within the min-max range of that data element over the last six months, or within a range defined by the user.

2. Validation rules can be defined and run once the user has finished data entry. The user can also check the entered data for a particular period and Organization Unit(s) against the validation rules, and display the violations of these rules.

3. Analysis of data sets, i.e., examining gaps in data.

4. Data triangulation, which means comparing the same data or indicator from different sources.

4.2 Data quality check at the point of data entry

Data quality can be checked at the point of data entry in the following two ways:

a) By setting the minimum and maximum value range for each element manually, or

b) By generating the min-max values using DHIS2, if historical data is available for that data element.

a) Setting the minimum and maximum value range manually

If you are using the default data entry screen, click on the element for which you want to set the min-max value, as shown below.

A pop-up window will appear as shown below. Here you can enter the min-max values.

On subsequent data entry, if the value entered does not fall within the set min-max range, the text box will turn red. The user will also get a pop-up as shown below. This change in colour is a prompt to check the data entered and make any necessary correction.
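The min-max check described above can be pictured with a minimal Python sketch. It is not how DHIS2 implements the check internally; the `MinMaxRange` class, the `check_entry` function and the example range are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MinMaxRange:
    """User-defined range for one data element (hypothetical structure)."""
    minimum: int
    maximum: int


def check_entry(value: int, allowed: MinMaxRange) -> Optional[str]:
    """Return a warning message if the value is outside the range, else None."""
    if value < allowed.minimum or value > allowed.maximum:
        return (f"Value {value} is outside the range "
                f"{allowed.minimum}-{allowed.maximum}; please check the entry.")
    return None


# Illustrative range for a monthly count at one facility (assumed values).
example_range = MinMaxRange(minimum=20, maximum=60)
print(check_entry(45, example_range))   # within range, so no warning
print(check_entry(450, example_range))  # outside range, so a warning is returned
```

A value outside the range is not necessarily wrong. As in the data entry screen, the warning is only a prompt to re-check the figure.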

On the data entry screen, users also have the option to add a comment explaining the discrepant figure (if required). You can do this using the drop-down menu of the ‘comment’ box.

If you are using the custom data entry screen, which is displayed when you deselect the ‘default data entry form’ option in the top right corner of the screen, the minimum and maximum values can be added by double-clicking on the data entry box instead of the data element.

b) Generated min-max values

If you have at least six months of data entered in DHIS2, it is possible to generate the min-max values, element-wise, using DHIS2. In that case you merely need to click on the ‘Generate min-max’ tab as shown below.

With the default data entry screen, the generated min and max values will appear on the left and right side of the data entry box.

If you deselect the default data entry form, the generated values will appear at the top right end of the screen, as shown in the following screenshot.
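To make the idea of generated min-max values concrete, here is a small Python sketch. This module does not describe the exact rule DHIS2 uses, so the sketch makes a simplifying assumption: it takes the lowest and highest values observed over the last six months. The function name `generate_min_max` and the sample figures are hypothetical.

```python
from typing import Sequence, Tuple


def generate_min_max(history: Sequence[int], months_required: int = 6) -> Tuple[int, int]:
    """Derive a min-max range from the most recent monthly values.

    Assumption: the range is simply the lowest and highest observed value.
    """
    if len(history) < months_required:
        raise ValueError(
            f"Need at least {months_required} months of data, got {len(history)}"
        )
    recent = history[-months_required:]
    return min(recent), max(recent)


# Six months of illustrative values for one data element at one facility.
monthly_values = [32, 41, 38, 45, 36, 40]
low, high = generate_min_max(monthly_values)
print(low, high)  # prints: 32 45
```

With fewer than six months of history the function raises an error, mirroring the requirement stated above.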

4.3 Defining Validation Rules

Validation rules are a data quality check mechanism based on verifying the logical relation between related data elements. A validation rule is a relational expression made up of related data elements and an operator that states the expected (logical) relation between them. For example, the number of infant deaths cannot be greater than the number of deliveries. As the example shows, a validation rule has a left side and a right side. Each side must contain a data element or a combination of data elements. The two sides are separated by a validation operator, which states the relation between them. Because validation rules are relational, a rule needs at least two data elements.

4.3.1 Types of validation operators (equal to, less than, greater than):

The following validation operators are used for data quality analysis in DHIS2.

• Equal to: The rule is satisfied only if both sides are equal.

• Not Equal to: The rule is satisfied if the two sides are not equal.

• Greater Than: The rule is satisfied if the left side is greater than the right side.

• Greater Than or Equal to: The rule is satisfied if the left side is greater than or equal to the right side.

• Less Than: The rule is satisfied if the left side is smaller than the right side.

• Less Than or Equal to: The rule is satisfied if the left side is smaller than or equal to the right side.
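The six operators can be illustrated with a short Python sketch that applies the infant deaths example from section 4.3. The `ValidationRule` class, the operator keys and the data element names are hypothetical and are not DHIS2 identifiers; the comparisons themselves come from Python's standard `operator` module.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict

# Map each validation operator to a standard comparison function.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "equal_to": operator.eq,
    "not_equal_to": operator.ne,
    "greater_than": operator.gt,
    "greater_than_or_equal_to": operator.ge,
    "less_than": operator.lt,
    "less_than_or_equal_to": operator.le,
}


@dataclass
class ValidationRule:
    """A rule of the form: left side <operator> right side (hypothetical)."""
    name: str
    left_side: str
    operator_name: str
    right_side: str

    def passes(self, values: Dict[str, float]) -> bool:
        compare = OPERATORS[self.operator_name]
        return compare(values[self.left_side], values[self.right_side])


# Infant deaths cannot be greater than the number of deliveries.
rule = ValidationRule(
    name="Infant deaths vs deliveries",
    left_side="infant_deaths",
    operator_name="less_than_or_equal_to",
    right_side="deliveries",
)
print(rule.passes({"infant_deaths": 3, "deliveries": 120}))    # True
print(rule.passes({"infant_deaths": 130, "deliveries": 120}))  # False
```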

4.3.2 Adding new validation rule

Follow the steps below to add a new validation rule.

First select the Data Quality module from the drop down menu of the Services module located on the main tool bar.

In the Validation Rule Management screen that is displayed, click on ‘Add new’.

Enter the first three fields: the validation rule name, a description of the validation rule, and the operator that forms the validation rule. Next, click on the ‘Edit left side’ button to enter the left side details of the validation rule.

The following steps can be used by the user:

1. Add Description

2. Select data element from the Available Data Elements options shown on the right side.

3. Add Operators in between the data elements to generate the desired formula.

4. When you have entered the required fields, click on ‘update’. This will return you to the previous window. Here, click on ‘Edit right side’ and follow the same steps that you followed for ‘Edit left side’.
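Each side of a rule can be a formula that combines several data elements. The following Python sketch shows one possible way to represent such a formula as signed terms and evaluate it. The data element names, the `Term` representation and the `evaluate_side` function are assumptions made for illustration and do not reflect how DHIS2 stores expressions.

```python
from typing import Dict, List, Tuple

# A term is (+1 or -1, data element name), so a side is a signed sum.
Term = Tuple[int, str]


def evaluate_side(terms: List[Term], values: Dict[str, float]) -> float:
    """Add or subtract the data element values that make up one side."""
    return sum(sign * values[name] for sign, name in terms)


# Hypothetical rule: home deliveries + institutional deliveries = total deliveries
left_side = [(+1, "home_deliveries"), (+1, "institutional_deliveries")]
right_side = [(+1, "total_deliveries")]

values = {
    "home_deliveries": 40,
    "institutional_deliveries": 80,
    "total_deliveries": 120,
}
print(evaluate_side(left_side, values) == evaluate_side(right_side, values))  # True
```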

4.4 Validation Checks

When you open the Data Quality module you will see a menu on the left side that lists different options related to validation rules. For purposes of validation checks you will need to use the ‘Run validation’ option. This is described below.

4.4.1 Run validation:

1. If you select the run validation option the following screen will be displayed.

2. Specify the period for which you want to run the validation check by selecting the start and end dates. You can do this using the drop-down calendar provided for the date fields.

3. Next, select the organisation unit(s) for which you want to run the validation.

4. Finally, click on the ‘Validate’ button.

5. When you click on the ‘Validate’ button (number 5 on the screenshot), a pop-up will be displayed listing the validation rules that are violated, together with the data values of the elements that make up each rule.
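The steps above amount to checking every rule against the data of the selected organisation units within the chosen period. A minimal Python sketch of that logic follows. The `Rule` and `Violation` structures, the period format (`YYYY-MM` strings) and the sample data are all hypothetical and serve only to illustrate the idea.

```python
import operator
from typing import Callable, Dict, List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    name: str
    left: str
    compare: Callable[[float, float], bool]
    right: str


class Violation(NamedTuple):
    org_unit: str
    period: str
    rule: str
    left_value: float
    right_value: float


def run_validation(
    data: Dict[Tuple[str, str], Dict[str, float]],
    rules: List[Rule],
    org_units: Set[str],
    start: str,
    end: str,
) -> List[Violation]:
    """List every rule violated by the selected org units within the period."""
    violations = []
    for (org_unit, period), values in sorted(data.items()):
        # Skip data outside the selected org units or period.
        if org_unit not in org_units or not (start <= period <= end):
            continue
        for rule in rules:
            left_value = values[rule.left]
            right_value = values[rule.right]
            if not rule.compare(left_value, right_value):
                violations.append(
                    Violation(org_unit, period, rule.name, left_value, right_value)
                )
    return violations


rules = [Rule("Infant deaths <= deliveries", "infant_deaths", operator.le, "deliveries")]
data = {
    ("PHC A", "2007-04"): {"infant_deaths": 2, "deliveries": 90},
    ("PHC A", "2007-05"): {"infant_deaths": 95, "deliveries": 90},
    ("PHC B", "2007-05"): {"infant_deaths": 1, "deliveries": 60},
    ("PHC A", "2007-09"): {"infant_deaths": 99, "deliveries": 90},
}
found = run_validation(data, rules, {"PHC A", "PHC B"}, "2007-04", "2007-08")
for violation in found:
    print(violation)
```

Only the May entry for "PHC A" is reported here. The September entry also breaks the rule, but it lies outside the selected period.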

4.4.2 Diagnosing the source of the validation violation:

You can accomplish this by selecting the ‘Run validation by average’ option. You can run this validation after entering the required fields of the ‘Run validation’ screen, namely the organisation unit(s) and the period for which you want to run the validation. The result of ‘Run validation by average’ is a pop-up screen (see screenshot below) that displays the percentage of validation rules violated by the selected organisation unit(s).

Amongst the orgunits showing violations, select the one with the highest violation percentage and drill down to get its detailed validation list. You can do the same for any other organisation unit as well.

The screen that is displayed presents the list of validation rules that have been violated by the specific organisation unit (see screenshot below).

To drill down into one validation rule, click on it. This gives the detailed validation analysis for the selected orgunit and its immediate children.

If you click on any orgunit, you can drill down to its children.
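The percentage of validation rules violated by each organisation unit, and the choice of the unit to drill into first, can be sketched in Python as follows. The input format (a list of pass/fail outcomes per org unit) and the function name `violation_percentage` are assumptions for illustration and are not part of DHIS2.

```python
from typing import Dict, List


def violation_percentage(results: Dict[str, List[bool]]) -> Dict[str, float]:
    """Percentage of rules violated per org unit (True means the rule passed)."""
    return {
        org_unit: 100.0 * outcomes.count(False) / len(outcomes)
        for org_unit, outcomes in results.items()
        if outcomes  # skip org units with no rule results
    }


# Illustrative outcomes of four validation rules for three org units.
results = {
    "PHC A": [True, False, True, True],
    "PHC B": [False, False, True, True],
    "PHC C": [True, True, True, True],
}
percentages = violation_percentage(results)
worst = max(percentages, key=percentages.get)
print(percentages)  # {'PHC A': 25.0, 'PHC B': 50.0, 'PHC C': 0.0}
print(worst)        # PHC B
```

The org unit with the highest percentage is the natural starting point for the drill-down described above.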

4.5 Analysis of data status

The purpose of analysing data status is to see what percentage of data is missing or unreported, either by data element or by facility.

4.5.1 Types of missing data

Missing data can be listed by facility and/or by data element. Missing data creates different kinds of problems, such as:

1. Incomplete reports.

2. Indicator calculations will be misleading as there will be some numerator or denominator that is missing.

… needs more care and support.

4.5.2 Generating missing data reports by facilities, data elements and periods

The ‘Data Status’ option provides a tool to analyze how much data has been entered. You can find this option in the Dashboard module, which is in turn displayed in the drop-down menu of the Services module.

Clicking on the dashboard module will lead to the following screen where you can find the ‘Data status’ option.

Once you click on the ‘Data Status’ option, you will get the following screen, where you can select the orgunit, dataset and period for which you want to generate the data status. Once you have done this, click on ‘View data status’ as shown below.

You will get the following output.

From here you can drill down to the immediate children of any orgunit by clicking on it, to obtain more detailed data status for each sub-facility.
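The data status idea, i.e. how much of the expected data has actually been entered, can be illustrated with a short Python sketch. The `data_status` function, the facility names and the data element names are hypothetical, and the calculation shown (entered values as a percentage of expected values per facility) is an assumption about what such a report summarises.

```python
from typing import Dict, List, Optional


def data_status(
    entries: Dict[str, Dict[str, Optional[int]]],
    data_elements: List[str],
) -> Dict[str, float]:
    """Percentage of expected data elements that have a value, per facility."""
    status = {}
    for facility, values in entries.items():
        entered = sum(
            1 for element in data_elements if values.get(element) is not None
        )
        status[facility] = 100.0 * entered / len(data_elements)
    return status


# Data elements expected in the dataset (illustrative names).
expected = ["anc_registrations", "deliveries", "infant_deaths", "immunisations"]
entries = {
    "Sub-centre 1": {"anc_registrations": 30, "deliveries": 12,
                     "infant_deaths": 0, "immunisations": 25},
    "Sub-centre 2": {"anc_registrations": 18, "deliveries": None},
    "Sub-centre 3": {},
}
print(data_status(entries, expected))
```

A value of zero counts as entered, whereas a blank or absent value counts as missing. This distinction matters when reading completeness figures.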

4.6 Data Triangulation

Data (for example on institutional deliveries) is collected from different sources such as routine health data and NFHS surveys. By plotting data on this data element across the three surveys and juxtaposing it with routine data, we can have a method of data triangulation.

In the boxes below, the NFHS trends in institutional delivery are compared with trends of monthly figures from the state routine HMIS.

Data Triangulation

• Routine Health Data: Collected and reported routinely every month.

• NFHS: Large-scale, multi-round survey conducted in a representative sample of households throughout India, once in 5 years.

• Census: Largest single source of a variety of statistical information on different characteristics of the people of India, once in 10 years.

[Chart: Trends in Institutional Deliveries (%). NFHS 1: 37; NFHS 2: 46; NFHS 3: 55; Apr-Aug 07: 70.]

[Chart: Institutional Deliveries (%): 66.9, 69, 70.7, 70.3, 73.2.]
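The comparison can also be done numerically. The Python sketch below uses the figures quoted in the charts above. It assumes that the five values between 66.9 and 73.2 are the monthly routine HMIS figures behind the Apr-Aug 07 point. The variable names are illustrative only.

```python
from statistics import mean

# Institutional deliveries (%) from the three survey rounds.
survey_trend = {"NFHS 1": 37, "NFHS 2": 46, "NFHS 3": 55}

# Assumed to be the monthly routine HMIS figures for Apr-Aug 07.
routine_monthly = [66.9, 69, 70.7, 70.3, 73.2]

routine_average = mean(routine_monthly)
latest_survey = survey_trend["NFHS 3"]
gap = routine_average - latest_survey

print(f"Routine HMIS average: {routine_average:.1f}%")
print(f"Latest survey (NFHS 3): {latest_survey}%")
print(f"Difference: {gap:.1f} percentage points")
```

The average of the monthly figures is about 70%, which matches the Apr-Aug 07 value in the first chart. A difference between sources does not by itself show which source is wrong. It indicates where the data should be examined more closely.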