The Elements of Data Accuracy:
A Step-by-Step Process for
Improving Data Quality
Margaret Leonard
Redwood Community Health Coalition
Santa Rosa, California
Carol McHale
Redwood Community Health Coalition
Santa Rosa, California
Margie Powers
Margie Powers Consulting
San Anselmo, California
Prepared through the
Accelerating Quality Improvement through Collaboration Project
For the
Overview
Community health centers (CHCs) provide crucial medical, dental and mental health care for millions of individuals who otherwise would lack access to medical care. As the economy weakens, the number of people served by CHCs is rising, while increased scrutiny is placed on the CHCs to demonstrate that they are providing high quality care. Only with accurate and credible data can CHCs show that they are providing excellent care. Accurate data allows comparisons within and across CHCs. CHCs can then identify successful and effective systems, and replicate their program across additional health centers.
Accurate data also allows comparisons to those organizations outside of the safety net system. With health reform imminent, much can be learned from the experiences of CHCs. It is imperative that safety net organizations are included in reform discussions early and often. Using data to demonstrate a high caliber of care both within and external to the safety net gives CHCs credibility and authority. It is for these reasons that CHCs must commit themselves to high quality data.
Problem Description
Increasingly, health centers are using electronic systems to collect patient data. Recent governmental emphases on electronic systems and the availability of affordable electronic tools, together with increased awareness of population management, have combined to create a reliance on large sets of data to manage populations of patients. Clinical information systems such as electronic health records and chronic disease registries provide advantages in that CHCs can now synthesize electronic data from multiple sources, providing a more complete picture of a patient’s health status. Information that historically had been available only by looking in multiple sources (medical charts, lab reports or referral forms) is now available in a single record. Rapid decisions can be made about a patient’s care with this data, such as prescribing medications, requesting tests and providing instant feedback to patients on their progress.
However, the collection of large quantities of data electronically requires CHCs to move further away from source data, such as patient paper charts, and to rely on systems that collect and collate data from multiple, disparate, and possibly unconfirmed data sources. As clinical information systems grow more complex, inaccurate data becomes more apparent and the risk of poor data quality can increase. There is a real and common need emerging for clinics to validate their data. While the need is real, there is no standard, accepted method for data validation. This paper provides a framework for assessing data accuracy and performing data validation, so that clinics can feel confident about their clinical data.
What is Data?
Before describing data validation, it is important to understand the evolution of data collection. Clinics are in the midst of transitioning to primary reliance on electronic data for managing their patients’ care. Until recently, the majority of clinics obtained data about their patients from paper charts. These charts were kept on-site at the health center, pulled from files for patient appointments, and providers made notes in the chart to document treatment and progress.
As clinics begin utilizing electronic systems, such as disease registries, the charts continue to be maintained, but electronic systems are developed in tandem—either maintained by manual data entry or electronic data feeds. Within a health center, multiple systems can be utilized to inform patient treatment decisions, each of which require on-going maintenance by clinic staff.
Given the intense demands on clinic staff time, and the high staff turnover faced by clinics, set-up and maintenance of these systems is not always performed consistently, and errors are introduced into the systems. Errors can take any number of forms: electronic systems may not have been programmed at the outset to accurately capture data, manual data entry could be inconsistent or erroneous, external data sources could be set up incorrectly, or patient information could change and not be captured on a timely basis. Thus, there is a risk of clinic staff working with data that is not correct. There is a very real need for standardized data validation processes at health centers to ensure that patient care is delivered effectively and efficiently.
What is Data Validation?
An early barrier to successful data validation is a lack of standard nomenclature to describe it. In a report published by the United States General Accounting Office, data verification is defined as “the assessment of data completeness, accuracy, consistency, timeliness, and related quality control processes”, while data validation is defined as “the assessment of whether data are appropriate for the performance measure”.1
For our purposes, data validation is a process used to improve or maintain data quality. In a clinical setting, data quality means ensuring that patient data is valid, complete, accurate, consistent, timely and easy to use.1 Data validation can occur in myriad ways: through the examination of paper charts, through the review of patient lists generated by clinical information systems, or typically, through some combination of the two.
Performing data validation can provide enormous value to a clinic by reassuring them that their patient data is accurate and complete, or surfacing errors that can be fixed. Accurate and complete data creates opportunities for change in clinics, allowing clinics to make decisions based on solid evidence, identify areas for improvement, disseminate accurate information, and strengthen the case for securing program funding. On the other hand, not having accurate and complete data results in the clinic not knowing where patients are, how they are feeling, how their health is changing, or how to support their providers in caring for their patients.
1 Report to the Chairman, Committee on Governmental Affairs, U.S. Senate. Performance Plans: Selected Approaches for Verification and Validation of Agency Performance Information. United States General Accounting Office. July, 1999.
Clinics that are interested in data validation, are motivated, and have the resources to do it are often stymied by the “how” of data validation. There are few publicly available, vetted resources to facilitate the validation, there is no accepted terminology to help clinics define the types of work they want to do, and the literature review performed for this paper found only a very small, fragmented body of literature on data quality in clinics.
This paper takes a step toward overcoming that resource gap. What follows is a proposed process for improving data accuracy at health centers. The process has been developed through conversations with health centers in California that are now validating their data, and with those that want to start. While it is certainly not the only method of performing data validation, it is a process that can be utilized, tested and re-worked by any clinic that relies on electronically captured patient data.
The Data Cycle
Data validation is just one activity within a data cycle. As discussed, there is no standard vocabulary to describe data validation and the other activities that occur in the data cycle. Thus, for purposes of this paper the data cycle is defined as follows:
Data Collection: Different types of data are collected into a single location, such as a disease registry, for use in patient care or for reporting purposes. Data typically comes from more than one source, and can be electronic or manually collected.
Data Validation: Data is reviewed by clinic staff for overall accuracy. Any deviation from what is expected will then trigger data validation activities, those steps taken to verify that data is a) accurate and b) complete. Data validation may entail looking at charts and patient lists, and comparing data from different data sources, and is frequently an iterative process.
Data Correction: Steps taken to correct data after it is validated. This typically will require some type of change to workflow, as inaccurate data is frequently attributed to human error.
Data Use: Upon completion of validation and correction, data is ready for analysis and use in patient care.
Step-by-Step Guide for Improving Data Quality
The seven-step process that follows is designed to improve data quality. The process can be used with data from virtually any clinical information system, including Practice Management Systems (PMS), disease registries and Electronic Health Records (EHRs). The step-by-step process is summarized in the table below, with each step listed next to the corresponding portion of the data cycle.
Data Cycle          Step-by-Step Process
Data Collection     1. Identify data element for validation
                    2. Define goal for data
                    3. Assign data validation to specific person
                    4. Create timeline for completion
Data Validation     5. Validate data
Data Correction     6. Correct errors
                    7. Re-validate data
Data Use            When CHC is confident that data is correct, it can be used for its intended purpose.
For a detailed example of a fictitious clinic doing data validation for the first time, see Appendix A.
Step 1: Identify data element for validation
Data validation is typically triggered by one of three things: 1) data is to be submitted for a specific purpose or project, 2) data does not look right, or 3) data is validated on an on-going basis as a routine clinical activity.
Regardless of the reason for initiating data validation, clinics are encouraged to pick a single element of data to test per validation cycle, expanding data validation as each data element is corrected. There are innumerable data elements that can be validated, and within each element are various types of errors that could be made. The table below suggests just a few possibilities:
Data Element                                  Types of Errors to Look For
Total number of patients with a certain       CPT, ICD-9 coding errors from encounter form
condition (per a PMS or registry report)      Duplicate patients in registry
                                              Deceased, inactive patients in registry
Total number of patients that received a      Patients received the test from external provider, and documentation is not in chart
certain test or procedure (e.g. mammogram)    Wrong patients identified as needing test (e.g. outside age, gender parameters)
Average value of a certain test               Invalid values outside the accepted range
(e.g. HbA1c)                                  Test results are available, but are not entered into the registry
For example, a CHC may want to confirm that its disease registry is accurately capturing all of its adult diabetes patients, so the total number of adults with diabetes would be the data element to be validated.
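Several of the error types listed above lend themselves to simple automated checks. The following is a minimal sketch only: the row layout, the patient identifiers, and the HbA1c plausibility window are assumptions made for illustration, not part of any particular registry product, and each clinic would set its own accepted range.

```python
from collections import Counter

# Hypothetical registry extract: (patient_id, hba1c_percent).
# None means no result has been entered for that patient.
registry_rows = [
    ("P001", 7.2),
    ("P002", 72.0),  # implausible value, possibly a misplaced decimal point
    ("P003", None),  # result never entered into the registry
    ("P001", 7.2),   # same patient listed twice
]

# Assumed plausibility window in percent; an illustration, not a clinical standard.
HBA1C_MIN, HBA1C_MAX = 3.0, 20.0


def find_duplicates(rows):
    """Return patient IDs that appear more than once."""
    counts = Counter(patient_id for patient_id, _ in rows)
    return sorted(pid for pid, n in counts.items() if n > 1)


def find_out_of_range(rows, low=HBA1C_MIN, high=HBA1C_MAX):
    """Return patient IDs with a recorded value outside the accepted range."""
    return sorted({pid for pid, value in rows
                   if value is not None and not (low <= value <= high)})


def find_missing(rows):
    """Return patient IDs with no recorded value."""
    return sorted({pid for pid, value in rows if value is None})


duplicates = find_duplicates(registry_rows)
out_of_range = find_out_of_range(registry_rows)
missing = find_missing(registry_rows)
print("Duplicates:", duplicates)
print("Out of range:", out_of_range)
print("Missing results:", missing)
```

Checks like these only narrow down which records to look at; confirming whether a flagged record is truly wrong still requires going back to the chart or other source data.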
Step 2: Define Goal for Data
Once a data element is selected, the clinic should set a goal for the selected data element. Clinics may want to look for standards for this data element, and what is a reasonable value to expect. For example, a clinic may know the prevalence of diabetes in their community, and so should have an expectation of what the total number of patients with diabetes should be in their registry.
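The expected value described above is simple arithmetic: total patients multiplied by the prevalence in the community. The sketch below uses the figures from the fictitious clinic in Appendix A (100,000 patients, 9.0 percent prevalence); the observed registry count of 7,650 and the function names are hypothetical, included only to show how a deviation from the expectation might be expressed.

```python
def expected_count(total_patients, prevalence):
    """Expected number of patients with the condition, given a prevalence (0-1)."""
    return round(total_patients * prevalence)


def relative_deviation(observed, expected):
    """Signed deviation of the observed count from the expected count."""
    return (observed - expected) / expected


# Figures from Appendix A.
expected = expected_count(100_000, 0.09)

# Hypothetical registry total, for illustration only.
observed = 7_650
deviation = relative_deviation(observed, expected)

print(f"Expected: {expected}, observed: {observed}, deviation: {deviation:+.0%}")
```

A large deviation does not by itself say which number is wrong. It only signals that the data element is worth validating.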
Also, an acceptable error rate should be established. It is virtually impossible to perform data validation on 100 percent of a clinic’s data. Typically, clinics select a sample for validation, and based on findings from looking at the sample, extrapolate an error rate to the entire population. There are numerous methods for calculating the appropriate sample size for data validation. One method is that proposed by HRSA for testing the UDS measures;2 alternatively, some clinics simply determine what is a reasonable number of charts for a first-time audit. For example, a clinic may decide to pull charts for ten percent of the patients in the list, and determine the error rate. If the error rate exceeds their acceptable rate, they will pull additional charts.
For example, a clinic may be participating for the first time in a pay-for-performance program that pays an incentive if the CHC Pap rate is above the state-wide average. The clinic has never done data validation before so does not know what their clinic-wide Pap rate is; only that the state-wide Pap rate is 75%. The clinic may go into the data validation expecting to see a rate of 75% if they intuitively believe that they are performing the tests appropriately. Any large deviation above or below this rate may indicate that the data is not accurate.
The clinic may also decide that their acceptable error rate is five percent. This means that they will allow for five percent of the total results to be erroneous without performing additional validation. So, in this example, the clinic may have a list of patients generated from their PMS that shows patients who had Pap tests in the measurement period. They may decide to select 20 names from this list and review those patients’ charts to confirm that the Pap was performed. If more than one out of those 20 charts shows the patients did not receive the test, then the clinic will review additional charts. If one, or none of, the charts have errors, the clinic can feel confident about their data accuracy.
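The Pap example above can be sketched in a few lines. This is an illustration under stated assumptions: the patient identifiers, the size of the PMS list (400), the random seed, and the function names are all hypothetical, and the chart review results are filled in by hand to show the two outcomes the text describes.

```python
import random

ACCEPTABLE_ERROR_RATE = 0.05  # five percent, as in the example above
SAMPLE_SIZE = 20              # number of charts to pull


def draw_sample(patient_ids, sample_size, seed=2008):
    """Pick charts at random, without replacement, so the audit is not hand-selected."""
    rng = random.Random(seed)
    return rng.sample(list(patient_ids), sample_size)


def summarize_audit(chart_confirms_test, acceptable_error_rate):
    """Count charts that do not confirm the test and compare to the acceptable rate."""
    errors = sum(1 for confirmed in chart_confirms_test.values() if not confirmed)
    error_rate = errors / len(chart_confirms_test)
    return {
        "errors": errors,
        "error_rate": error_rate,
        "review_more_charts": error_rate > acceptable_error_rate,
    }


# Hypothetical PMS list of patients reported as having had a Pap test.
pms_pap_list = [f"PT{n:04d}" for n in range(1, 401)]
sample = draw_sample(pms_pap_list, SAMPLE_SIZE)

# Chart review results entered by the reviewer: True if the chart confirms the test.
chart_confirms_test = {patient_id: True for patient_id in sample}
chart_confirms_test[sample[0]] = False
one_error = summarize_audit(chart_confirms_test, ACCEPTABLE_ERROR_RATE)

chart_confirms_test[sample[1]] = False
two_errors = summarize_audit(chart_confirms_test, ACCEPTABLE_ERROR_RATE)

print(one_error)
print(two_errors)
```

With 20 charts and a five percent threshold, one unconfirmed chart stays within the acceptable rate, while two unconfirmed charts trigger the review of additional charts.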
Step 3: Assign to Specific Person
The data validation process should be managed by a single person within the clinic. This person will work with others in the clinic, such as providers, information technology, and administrative and clerical staff, to perform the validation, but the person assigned the task should have responsibility for tracking and documenting progress. If a quality improvement staff person is available, this person can be a good candidate to manage the process, as quality improvement staff are familiar with the various disciplines that are required for completing data validation.
One challenge faced by clinics is that most staff people are managing multiple job responsibilities, leaving little time for data accuracy activities. To overcome this, some clinics are moving towards utilizing a “Data Steward” for data quality work. A Data Steward is a staff person whose primary responsibility is overseeing data quality through the entire data cycle.
Step 4: Create Timeline for Completion
While the hope is that CHCs are perpetually performing data validation, it is well understood that clinics have limited resources to commit to these tasks. Thus, from the outset, a timeline should be established including deadlines, and the amount of time clinic staff can commit to data validation.
Frequently, data validation is initiated to meet the requirements of a specific project—a report may be due, or a pay-for-performance project may require data validation before payment. The clinic should establish first if there is a due date for the work. If there is no externally mandated deadline, the clinic should establish its own timeline. It should be noted that data validation activities frequently surface numerous issues that need additional scrutiny, so the timeline should be generous enough to allow for that work. Also, data validation frequently is performed multiple times on the same data element. Clinics should allow ample time to repeat validation processes several times, and then time to fix any errors found during the processes.
Step 5: Validate Data
Up to this point, clinics have been focusing on making decisions about the data validation process. Now, the clinic will begin validating their data. The first step in the process is to identify all relevant data sources.
As already discussed, clinics rely on information from multiple sources to make clinical decisions. Chronic disease registry data, for example, can include electronic data from PMS and laboratories, and also data that is manually entered by staff. Ideally, the source data (for example, PMS information) should be accurately transferred into the disease registry. Frequently, clinics are surprised to find that this is not the case, and data validation work will uncover the reasons for the discrepancy.
A common validation technique is to select a data element from one report or system, and compare it to the same data element in a different report or system. For example, selecting the total number of patients with diabetes in a PMS and comparing it to the total number of patients with diabetes in a disease registry as shown in the diagram below.
Report A (from PMS): Adult Patients with Diabetes
Patient 1, Patient 2, Patient 3, . . . Patient 500
Total Adult Patients with Diabetes: 500
Report B (from Registry): Adult Patients with Diabetes
Patient 1, Patient 2, Patient 3, . . . Patient 450
If the data is accurate, the two totals should be the same. The CHC can now see that there is a discrepancy between the two reports and some data validation is required. A likely next step would be for the CHC to select a sample of patients from Report A and confirm that they are included in Report B, or select a sample of patients from Report B and confirm that they are also in Report A. The findings of this test will help reveal the source of any errors that exist.
There are several very common errors that a CHC may encounter in their data validation:
Registry contains inactive patients: The CHC may learn that patients are added to the registry as they are diagnosed, but never removed when they move away or are found to be deceased. The result is an overstated registry total.
PMS or registry contains duplicate patients: CHCs may include the same patient more than once in the PMS, the registry, or both, resulting in overstated totals.
PMS contains patients incorrectly coded for diabetes: Errors may be made in assigning ICD-9 or CPT codes to patients, either through misunderstandings or inconsistencies in how codes are used, or through human error in data entry.
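The comparison of a PMS report against a registry report described in this step can be sketched with set operations. The sketch assumes both systems can export a list of patient identifiers that match across systems, which is not guaranteed in practice; the identifiers and the function name are hypothetical, and the totals of 500 and 450 mirror the example reports.

```python
def compare_reports(pms_ids, registry_ids):
    """Compare two patient lists and report who appears in only one of them."""
    pms, registry = set(pms_ids), set(registry_ids)
    return {
        "pms_total": len(pms),
        "registry_total": len(registry),
        "only_in_pms": sorted(pms - registry),
        "only_in_registry": sorted(registry - pms),
    }


# Hypothetical exports: 500 patients in the PMS report, 450 in the registry report.
pms_report = [f"Patient {n}" for n in range(1, 501)]
registry_report = [f"Patient {n}" for n in range(1, 451)]

result = compare_reports(pms_report, registry_report)
print("PMS total:", result["pms_total"])
print("Registry total:", result["registry_total"])
print("In PMS but not registry:", len(result["only_in_pms"]))
print("In registry but not PMS:", len(result["only_in_registry"]))
```

The two "only in" lists are natural starting points for chart review, since each patient on them is either missing from one system or wrongly included in the other. Because sets discard repeats, duplicate patients within a single report need a separate check.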
Step 6: Correct Errors
While data errors can certainly be created by computer or system errors, in many cases, errors in data are due to inconsistencies in clinic workflow. As errors are uncovered through validation, it is important to examine both system-generated and human errors. For system-generated errors, a clinic can closely examine how reports are generated, confirming that the right codes and encounters are being selected and that the interfaces are feeding the proper data into the system.
Human errors typically require examining workflow. Even with an electronic record-keeping system, clinics depend on staff to perform manual processes. This may mean manual data entry of some information into a system, or it may mean a provider checking off a code during an encounter. When investigating errors, clinics must closely examine their workflow to determine which staff or systems are involved in data collection. Each step of the process is vulnerable to error.
Discovering errors may lead to creating new workflow processes within the clinic so as to prevent future errors. For example, a CHC may learn that several providers are using an HbA1c test to screen for diabetes, but their chronic disease registry classifies all patients who have had an HbA1c test as having diabetes. Providers and staff may need to be educated on how to appropriately utilize the HbA1c test so as not to overstate the number of patients that have diabetes.
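One way to surface the HbA1c situation described above is to list registry patients who have no diabetes diagnosis code on record, since they may have been added on the strength of a screening test alone. The sketch below is illustrative only: the data layout, patient identifiers, and function name are assumptions, and it relies on diabetes mellitus diagnoses falling under ICD-9-CM category 250. A flagged patient is a candidate for chart review, not a confirmed error.

```python
DIABETES_ICD9_PREFIX = "250"  # ICD-9-CM category for diabetes mellitus


def flag_patients_without_diagnosis(registry_ids, diagnosis_codes_by_patient):
    """Return registry patients with no diabetes diagnosis code on record."""
    flagged = []
    for patient_id in registry_ids:
        codes = diagnosis_codes_by_patient.get(patient_id, [])
        if not any(code.startswith(DIABETES_ICD9_PREFIX) for code in codes):
            flagged.append(patient_id)
    return sorted(flagged)


# Hypothetical data: patients in the diabetes registry and their coded diagnoses.
registry_ids = ["P001", "P002", "P003"]
diagnosis_codes_by_patient = {
    "P001": ["250.00"],  # diabetes diagnosis on record
    "P002": ["V77.1"],   # screening code only (illustrative)
    # P003 has no coded diagnoses at all
}

flagged = flag_patients_without_diagnosis(registry_ids, diagnosis_codes_by_patient)
print("Review charts for:", flagged)
```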
Step 7: Re-validate Data
Once errors have been corrected, data should be re-validated. Clinics may run the same reports as in Step 5, and compare the totals again. With each round of data validation activities, and subsequent workflow corrections, the CHC should expect to see errors reduced.
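Tracking the error rate from one validation round to the next makes it easy to see whether corrections are working and whether the goal has been reached. The sketch below is a minimal illustration; the per-round error rates are hypothetical, and the five percent goal is the acceptable error rate used in the examples in this paper.

```python
def review_rounds(error_rates, acceptable_error_rate):
    """Summarize progress across validation rounds, listed oldest first."""
    improving = all(later < earlier
                    for earlier, later in zip(error_rates, error_rates[1:]))
    at_goal = error_rates[-1] <= acceptable_error_rate
    return {"improving": improving, "at_goal": at_goal}


# Hypothetical error rates from two validation cycles.
round_error_rates = [0.20, 0.10]
progress = review_rounds(round_error_rates, acceptable_error_rate=0.05)
print(progress)
```

In this illustration the error rate has fallen but is still above the acceptable rate, so another round of correction and re-validation would follow.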
It may be helpful for the CHC to view data validation as an on-going process and not as a single event. Clinics can then allocate sufficient resources to maintain and continually improve the quality of their clinical data.
Conclusion
This paper provides a framework for improving data quality. The framework is intentionally a simple one, with the expectation that clinics will modify it to suit their own needs.
As increased scrutiny is placed on clinical data for FQHC reporting, grant reporting requirements and even pay-for-performance initiatives, the importance of data quality will only increase. Clinic leadership is encouraged to allocate sufficient resources for their staff to undertake data validation on a consistent, on-going basis. Funders of clinic initiatives are encouraged to include data collection and validation activities in grant requirements for funding. And clinic staff is encouraged to promote data quality across their clinics as an integrated component of high-quality patient care.
APPENDIX A
Data Validation Steps—Activity Summary Sample
The clinic participates in the AQIC project, and has a report due for the project in two months. The clinic uses i2iTracks as their chronic disease registry, and will be validating elements within the CSLC report generated by i2iTracks.
Step in Process                           Summary/Status
Step 1: Identify what data element(s)     Total number of patients with diabetes in the registry, per i2iTracks “CSLC Report”.
need validation
Step 2: Define Goal for Data              Total patients in clinic: 100,000
                                          Prevalence of diabetes in population: 9.0%
                                          Expected value for data: 9,000
                                          Acceptable error rate: 5%
Step 3: Assign to Specific Person         Staff person: Quality Improvement Coordinator
Step 4: Create Timeline for Completion    Time period for validation: December 1-31, 2008
                                          Goal: Two validation cycles
Step 5: Validate / Analyze Data           Reports to compare:
                                          a) CSLC Report Total Patient Count
                                          b) Total diabetes patient list from PMS
                                          Findings: Providers not using encounter forms, so patients are not coded as having diabetes. Patients in i2iTracks as having diabetes are NOT in PMS.
Step 6: Correct Errors                    Correction: QI Coordinator to have one-on-one meetings with each provider to review form, importance of completing correctly.
Step 7: Re-validate Data                  Compared reports, and found error rate reduced although not at goal. Will do targeted outreach to providers with lowest incidence of correct codes.