Data Quality and Consistency
Section Intent:
The ultimate reward for participating in the STS Adult Cardiac Surgery National
Database is meaningful data output to improve patient care delivery. This is achieved by building on good, accurate data abstraction using STS data variable definitions, accurate data entry, data validation and data cleaning by querying and cross-checking your data. This section discusses how to query and cross-check your data.
A. How to query your database
**Note: Due to the variety of software, this portion of the manual will be a broad, generic discussion.
The end result of data entry into the STS database is data output at the site level and to DCRI via the harvest process. Thanks to the hard work of the vendors, the harvest process for all or most of the software products is automated.
Data output at the site level is where there is a lot of variance among users. Some sites choose not to do any reporting at the site level. Other sites get data out of their STS software as often as every 2 to 4 weeks. However your site chooses to report data, to get useful, meaningful data out of your software, some querying of the data must be performed.
A simple definition of query is “to ask or inquire” or “to question as doubtful or obscure” (Source: infoplease.com 2001 – 2002. Family Education Network, Inc.). To query your STS software means to set up criteria, filters, or conditions to answer a question.
The requirements for efficient database query development are basic knowledge of: Cardiac surgery
STS data abstraction/STS data collection form
STS Adult Cardiac Surgery Database Specifications v2.52 The “in’s and out’s” of your software beyond data entry
How to incorporate “AND” and “OR” statements for multiple field queries
The specs include the following elements: Field Name – the field’s display name
Short Name – an 8-letter code that the field is known by DCRI and all vendors Status – status of the field in v2.52
Format – field format (i.e. text, integer)
Data Source – where the field is entered (automatic, lookup, user, calculated) Default – the value that the software automatically enters if field is left blank
Parent Field – 1st level of dependency to another field; determines what data should be entered in the child field(s)
Parent Value – the value that needs to be entered in the parent field to require an entry in the child field
Missing Data – how the software will alert you to missing data (no action, report, report and warn)
Valid Data – values that the software restricts for data entry Usual Range – usual range for numeric fields
Description – a short description of the field Definition – the field definition
Harvest Coding – the numeric coding that all field choices get assigned for DCRI harvest
Variation in software design from vendor to vendor makes it difficult to delve deeper into querying your database or to give examples. Knowing how to tie together all of the concepts discussed will produce useful, meaningful data output.
B. Data Elements to cross check
**Note: Due to the variety of software, this portion of the manual will be a broad, generic discussion.
The ultimate reward in participating in the STS Adult Cardiac Surgery National Database is meaningful data output in some form of a report to improve patient care delivery. To obtain the ultimate reward, you must have accurate abstraction using STS data element definitions, accurate data entry, and data validation and data cleaning by cross-checking your data as your foundation for meaningful data output.
Time Management for the STS database at the site level:
Although Validation/Cleaning should represent 15% of the total time allotted for STS database management, it is a vital step to produce accurate reports/analysis, where half of the total time allotted for STS database management is represented.
Categories for data cross-check:
There are four categories to consider for data cross-check:
(1) Case volume by operative category (Attachment A)
Attachment A lists all the criteria needed to query your STS data by the 6 major
operative categories – CAB Only, AVR, MVR, AVR+CAB, MVR+CAB, AVR+MVR. Any time data is pulled by operative category at the site level, these criteria should be used in order to compare to the STS regional and national benchmarks.
This step is also necessary to complete the Harvest Verification Form as part of the bi-annual DCRI harvest process.
Attachment B lists all of the parent fields with their corresponding child fields. There are 69 parent fields in v2.52 with a varying number of child fields for each parent.
To cross-check the parent-child relationships, refer to the STS Adult Cardiac Database Full Specifications v2.52. The key items to pull out of the specs are:
(1) Missing Data – how the software will alert you to missing data (no action, report, report and warn)
(2) Parent Field – 1st level of dependency to another field; determines whether the child field(s) should have any data
(3) Parent Value – the value that needs to be entered in the parent field to require an entry in the child field
Examples:
RF-Diabetes-Control, Sequence No. 410 Missing Data: No Action
Parent Field: RF-Diabetes Parent Value: Yes
The query would be set up to pull all cases where: RF-Diabetes = Yes AND
RF-Diabetes-Control = null
When there are multiple child fields for one parent field, a parent field might be answered as “Yes” with all of the child fields answered as “No”.
When a parent field = Yes, there must be at least one child field that = Yes. A query could be set up to check that the parent = Yes and all of the child fields do not = No:
Comps-Complications, Sequence No. 2710 Missing Data: Report
Parent Field: None, this is a parent field Parent Value: Not Applicable
The query would be set up to pull all cases where: Complications = Yes AND
Comps-Op-ReOp Bleed = No AND
Comps-Op-ReOp Vlv Dys = No AND
Comps-Op-ReOp Gft Occl = No AND
Comps-Op-ReOp Other Card = No AND
Comps-Op-Reop Other NonCard=No AND
Comps-Op-Perioperative MI = No AND Comps-Infect-Stern-Deep = No AND Comps-Infect-Thoracotomy = No AND Comps-Infect-Septicemia = No AND Comps-Infect-Leg = No AND Comps-Infect-UTI = No AND
Comps-Neuro-Stroke Perm = No AND
Comps-Neuro-Stroke Trans = No AND
Comps-Neuro-Cont Coma>=24 = No AND
Comps-Pulm-Pulm Embolism = No AND
Comps-Pulm-Pulm Pneumonia = No AND
Comps-Renal-Renal Failure = No AND
Comps-Vasc-Ao Dissect = No AND
Comps-Vasc-Illiac/Fem Dissect = No AND
Comps-Vasc-Acute Limb Isch = No AND
Comps-Other-Heart Block = No AND
Comps-Other-Card Arrest = No AND
Comps-Other Anticoag Comps = No AND
Comps-Other Tamponade = No AND
Comps-Other-GI Comps = No AND
Comps-Other-Multi Sys Fail = No AND
Comps-Other-AFib = No
The above examples should yield no records. If there are records that meet the criteria, those records need to be looked up, verified, and if need be, changed.
These are only a few examples of how to cross-check parent-child relationships. Always remember that fellow STS users, your vendor, and DCRI are your resources to help you set up cross-checks.
(3) DCRI consistency edits (Attachment C)
Attachment C lists the 9 DCRI consistency edits. A consistency edit is a standard rule for editing the data to make information consistent. These changes ensure that each data item is consistent with the other data on the record.
The criteria needed to cross-check the consistency edits is included in Attachment C.
These edits are done at the DCRI level as part of the harvest process. All data that is changed based on these edits is reported back to the site as part of the post-harvest data quality report.
Data changed as a result of the consistency edits will affect the STS risk modeling. This means that the STS risk ratios that are calculated at the site level will not match the STS risk ratios for the same site that are calculated at the DCRI level during the harvest process.
It is recommended that the data variables involved in the consistency edits be cross-checked and changed at the site level pre-harvest so that no data will need to be changed at the DCRI level.
(4) Variables involved in STS risk modeling (Attachment D)
Attachment D lists all of the data variables involved in STS risk modeling. If time dedicated to the STS database is an issue, prioritizing which data variables to cross-check may need to be considered.
The bare minimum data variables to cross-check are all data variables involved in STS risk modeling. After referring to Attachment D, the specs should be checked for the information needed to create a query, and the query should then be created to check for missing data and data that is out of range (for numeric data).
If a numeric data variable is null and is used for STS risk modeling, DCRI assigns a normal value to this variable. This means that the STS risk ratios that are calculated at the site level will not match the STS risk ratios for the same site that are calculated at the DCRI level during the harvest process.
Always remember that fellow STS users, your vendor, and DCRI are your resources to help you set up cross-checks.