Family Practice
© Oxford University Press 1996
Vol. 13, No. 5 Printed in Great Britain
Real world data—retrieval and validation of
consultation data from four general practices
Richard D Neal, Philip L Heywood and Stephen Morley*
Neal RD, Heywood PL and Morley S. Real world data—retrieval and validation of consultation data from four general practices. Family Practice 1996; 13: 455-461.
Objective. We aimed to retrieve data on consultations from general practice databases
and to develop and use appropriate methods of validation for these data.
Method. MIQUEST software was used to retrieve the data from four practices. The data
were validated by comparing them with figures generated by practice-based searches, measuring the uptake of recording of consultations over time, and comparing records of consultations in the case notes with those on the practice computers.
Results. The required data were retrieved from general practice databases, but the path
to success was difficult, and typified by uncertainty and unpredictability. The recording of consultations on the computers of four practices was more complete than the recording in the paper case records. There was a time period in the early months of computer use when the recording of consultations was less complete. There were differences in the completeness of recording consultations between practices, doctors, and patients.
Conclusions. This study confirms the potential of general practice databases for research,
demonstrates how MIQUEST software can be a useful tool in retrieving data from general practice databases, and indicates how the completeness of data recording permitted further analysis for the purposes of our study.
Keywords. Data retrieval, general practice, consultation, MIQUEST, validation.
Introduction
In primary care settings, increasing quantities of data are now being collected and held electronically. Databases in primary care therefore offer a rich potential source of data for research.1 The ease with which these data can be retrieved, and the ways in which they can be interpreted and validated, are therefore of importance. Researchers considering using data that have been collected by others for a different reason first need to gain access to the data,2 and second they need to ascertain
the validity of the data—do the data actually record what they purport to record?
As the validity of data is often unknown, it is important to develop appropriate and rigorous methods of validation. Medical records, whether paper or electronic, record events. Records are valid when all events are recorded and all entries in the record signify an event. The researcher must be able to clarify exactly what has been recorded, and what each recording actually means; this is the process of validation. Data from one source may be compared with those from another (e.g. electronic and paper records), although there may be no way of knowing which is the more accurate. In this study, computerized consultation data were compared with written consultation data. Both of these may be regarded as proxies for the consultation, an event in the real world.
Received 2 May 1996; Accepted 17 May 1996.
Centre for Research in Primary Care, University of Leeds, 30/32 Hyde Terrace, Leeds LS2 9LN and *Division of Psychiatry and Behavioural Science in Relation to Medicine, University of Leeds, 15 Hyde Terrace, Leeds LS2 9LT, UK.
The paper describes the process of retrieving and validating large sets of data routinely collected about consultations in four general practices. We were collecting data for a study looking at attendance patterns of a large number of patients from 1990 to 1995. The paper shows how the data were retrieved from the practices, and how difficulties encountered in retrieving the data were overcome. We also demonstrate how the validation process can be designed to address the specific needs of the data sets in a study.
Method
The data set
Our research required a data set that included a date record of all consultations made by every patient on the practice lists between 1990 and 1995, linked to a limited amount of associated information. This included, for all patients fully registered for General Medical Services: sex, dates of birth and death, dates of joining and leaving the practice, the dates of all consultations, where they took place and whom they were with. Practices were approached to determine whether computerized consultation data had been collected, and whether access to these data was possible.
Morbidity Information Query and Export Syntax (MIQUEST) software,3,4 which anonymizes the data by generating a unique reference code, was used to run the search, facilitated by a Health Query Language (HQL)3 interpreter supplied by Egton Medical Information Systems (EMIS).
Uptake of recording of consultations
The practices were uncertain about the reliability of their data in the early months following introduction of their computer systems, so the number of consultations recorded per month, over successive calendar years, was plotted for each practice. The graphs were then analysed to identify a learning period after which the number of consultations reached a consistent monthly pattern. Other factors that may have affected the consultation rate or the recording of consultations were also considered (e.g. stability of the list size, appointment systems, differing use of the computer). The data relating to this learning period were discarded, and only data subsequent to it were subject to further validation.
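In the study the learning period was identified by inspecting the monthly graphs. Purely as an illustration, the Python sketch below shows one way such a period might be flagged programmatically: it finds the earliest month from which every later monthly count stays at or above a chosen fraction of a late-period median. The function name, the 0.8 fraction, the 12-month baseline and the sample counts are all assumptions made for this example and are not taken from the study.

```python
from statistics import median

def first_stable_month(monthly_counts, fraction=0.8, baseline_months=12):
    """Return the index of the first month of 'steady state' recording.

    monthly_counts: consultations recorded per month, in date order.
    The baseline is the median of the last `baseline_months` values, on the
    (hypothetical) assumption that recording had stabilised by then.
    """
    if not monthly_counts:
        raise ValueError("monthly_counts must not be empty")
    baseline = median(monthly_counts[-baseline_months:])
    threshold = fraction * baseline
    start = len(monthly_counts)
    # Walk backwards from the latest month while counts stay above threshold.
    for i in range(len(monthly_counts) - 1, -1, -1):
        if monthly_counts[i] >= threshold:
            start = i
        else:
            break
    return start

# Invented example: recording climbs in the early months, then levels off.
counts = [200, 600, 1100, 1700, 2200, 2600, 2900,
          3100, 3000, 3200, 2900, 3100, 3300, 3000,
          3100, 3200, 2950, 3050, 3150]
start = first_stable_month(counts)
stable = counts[start:]  # months before `start` would be discarded
```

A rule of this kind cannot separate a genuine seasonal trough from incomplete recording, so it would complement visual inspection of the graphs and not replace it.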
Comparison with practice data
The data were compared with figures produced by the practices' own search software for the current list size, the annual consultation rate, the night visit rate, and the workload of each doctor.
Comparison of notes with computer records
As all of the practices endeavoured to keep thorough case notes and computer records of consultations, comparisons of recording between the two were possible. A random sample of patients was selected from each practice, and their paper and computer records matched. The dates of all consultations recorded in the notes were compared with those from the data sets, and vice versa. Continuation sheets and all other relevant documents in the notes were searched. Where discrepancies occurred in the data (either the recording of consultations on the computer but not in the notes, or vice versa), the nature of the contact and the name of the health professional were, wherever possible, identified.
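The two-way comparison of consultation dates described above can be sketched in Python as a pair of set differences. This is a minimal illustration, assuming that a consultation is identified only by its date for a given patient; the function name and the example dates are invented.

```python
from datetime import date

def compare_records(notes_dates, computer_dates):
    """Classify one patient's consultation dates by where they were recorded."""
    notes, computer = set(notes_dates), set(computer_dates)
    return {
        "both": sorted(notes & computer),
        "notes_only": sorted(notes - computer),
        "computer_only": sorted(computer - notes),
    }

# Invented example dates for a single patient.
notes = [date(1992, 3, 4), date(1992, 5, 11), date(1993, 1, 20)]
computer = [date(1992, 3, 4), date(1992, 5, 11),
            date(1992, 9, 2), date(1993, 1, 20)]
result = compare_records(notes, computer)
```

Matching on the exact date is a simplifying assumption: a consultation entered a day late in one record would appear here as two discrepancies, one in each direction.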
The use of the MIQUEST software, for this type of search, was validated in Practice A, prior to its use in subsequent practices. As it was not known whether the MIQUEST search would identify all the patients and their consultations recorded on the practice computer, two random samples of patients were generated in different ways and compared. Random numbers were used to generate a list from the data set, and case notes were identified by using random numbers to define the position of the notes in the general practice filing cabinets. As a consequence of the validation in Practice A, and because non-doctor consultations were not recorded in a standard way between the practices, all non-doctor consultations were removed from the data sets of Practices B, C and D. The data sets of Practices B and D contained a number of double consultations, where a second consultation was erroneously recorded with the same doctor on the same day. Consequently, in Practices B, C and D, second and subsequent consultations were removed when patients had more than one consultation in a day.
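The two cleaning rules applied to Practices B, C and D (dropping non-doctor consultations, then keeping only the first consultation per patient per day) might be expressed as in the following Python sketch. The tuple layout, the clinician labels and the function name are hypothetical; the study does not describe how the cleaning was implemented.

```python
from datetime import date

def clean_consultations(consultations, doctors):
    """Drop non-doctor consultations, then keep one per patient per day.

    consultations: iterable of (patient_id, consultation_date, clinician)
    tuples, assumed to be in the order in which they were recorded.
    doctors: set of clinician identifiers treated as doctors (hypothetical).
    """
    seen = set()
    kept = []
    for patient_id, when, clinician in consultations:
        if clinician not in doctors:
            continue  # non-doctor consultation
        key = (patient_id, when)
        if key in seen:
            continue  # second or subsequent consultation on the same day
        seen.add(key)
        kept.append((patient_id, when, clinician))
    return kept

rows = [
    ("P1", date(1993, 2, 1), "Dr X"),
    ("P1", date(1993, 2, 1), "Dr X"),     # double entry, same doctor and day
    ("P1", date(1993, 2, 1), "Dr Y"),     # another consultation that day
    ("P1", date(1993, 2, 8), "Nurse N"),  # non-doctor consultation
    ("P2", date(1993, 2, 1), "Dr Y"),
]
cleaned = clean_consultations(rows, doctors={"Dr X", "Dr Y"})
```

Note that this rule also removes genuine second consultations on the same day, which is the trade-off the study accepted in order to eliminate the erroneous double entries.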
Results
Description of the practices
Thirty-eight practices were contacted, of which four were recruited. Three other practices had appropriate data; one failed to respond to an initial approach and discussion, one practice was willing to participate but was not eventually recruited, and one was unwilling to provide access to the data. The remaining 31 practices had not collected the required data.
The practices were all stable and well-established group practices, in terms of both the doctors and their patients. Some of the features of the practices are shown in Table 1. All of the practices endeavoured to keep accurate computer records (using EMIS software) and to maintain complete and thorough A5 ('Lloyd George') notes. The doctors consulted with the computer set to 'consultation mode'; the fact that a consultation occurred was recorded by using the computer during the consultation, or by pressing the return key. There were minor differences in the recording of telephone and third-party consultations. Although recording of home visits was carried out in a variety of ways, each of the practices had an established system of entering the data onto the computer.
Retrieving the data
Searching for and downloading the data was a difficult and time-consuming process. It took 4 months from the first contact with Practice A to obtain the data, and similar difficulties continued with the other practices. Altogether, over 50 telephone calls for assistance were made to the software suppliers, and over 55 visits made to the practices. The reasons for these unexpected difficulties were centred around problems with both the MIQUEST software and the HQL interpreter, compounded by the magnitude of the data. It proved impossible to predict whether the data retrieval would ultimately be successful; how long the entire process would take; how many times the searches would have to be run before they ran correctly; how long it would take to run the searches and download the data; what effect running the searches would have on the normal running speed of the practice computer; or, indeed, what was the best way of downloading the data. Many of these questions were answered during the process, and the best method of downloading the data was only discovered in the fourth practice (transferring the files to the practice's hard disk, 'zipping' them up, and then downloading to floppy disks).

TABLE 1 The practices: list sizes, number of partners, and number of consultations

Practice  Number of  List size    List size    Difference in  Total number  Number of
          partners   (data set)a  (practice)a  list size (%)  of patientsb  consultations
A         4.5        7874         7860         +14 (0.2)      10112         134312
B         4          6535         6565         -30 (0.5)      8727          122826
C         6          11476        11487        -11 (0.1)      14027         151355
D         10         22439        22329        +110 (0.5)     30894         344616

a As of date of validation.
b On practice list at any time from cut off date to time of search.

Figures provided by the practices were consistent with those from the data sets. The differences between the list sizes generated by each practice compared with those generated from the data set were small (Table 1). The MIQUEST search accurately identified all the patients from the sample derived from the filing cabinets and 100% of their consultations. The software was therefore validated for the purposes of this study.
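The list size differences reported in Table 1 can be reproduced with a few lines of Python. This is a sketch under one assumption: the paper does not say which figure was used as the denominator, so the practice's own list size is used here (using the data set figure instead gives the same values to one decimal place).

```python
def list_size_difference(data_set_size, practice_size):
    """Return (signed difference, absolute difference as % of practice figure)."""
    diff = data_set_size - practice_size
    return diff, round(abs(diff) / practice_size * 100, 1)

# List sizes from Table 1 as (data set, practice).
table_1 = {
    "A": (7874, 7860),
    "B": (6535, 6565),
    "C": (11476, 11487),
    "D": (22439, 22329),
}
differences = {p: list_size_difference(*sizes) for p, sizes in table_1.items()}
```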
Uptake of recording of consultations
There were no significant external reasons for changes in the consultation rate in any of the practices. The graphs were therefore taken at face value, and the start dates for useful data analysis were defined. During the first 6-12 months of computer use in each practice, the number of consultations increased until an 'annual cycle' of consultations per month was reached, with peaks and troughs in certain months of the year that were consistent between practices (Fig. 1).
Comparison of case notes with computer records
Two sets of notes were unavailable for the validation process in Practice B, and one in Practice D. From the four practices, eight patients had left the lists and two had died between the end of the study period and time
of validation. Thirteen more sets of notes were therefore selected randomly.
The samples of patients from each practice were compared with the full lists of patients for each practice, and found to be representative in terms of age and sex, number of consultations, surgery:visit ratios and the distribution of consultations between doctors.
Consultations that were recorded either in the notes or on the computer, but not on both, were spread evenly over the time period; there was no evidence that the number of discrepancies lessened as time passed. There were up to 10-fold differences in discrepancies between doctors within practices. In Practice A, over 30% of the discrepancies were either practice nurse or community midwife consultations. This became a major reason for removing non-doctor consultations prior to analysis in the other three practices.
The overall recording of consultations in the notes and on computer is shown in Table 2. For all practices, a larger percentage of visits compared to surgery consultations were not recorded in the notes (Table 3). In all four practices, doctors were more likely to fail to record consultations with patients who were female, older, and had more consultations. One patient's entire set of continuation sheets was missing, and two other patients had several consecutive continuation sheets missing.
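The percentages in Table 2 follow directly from three counts per practice: consultations found in both records, on the computer only, and in the notes only. The Python sketch below recomputes the Practice A figures; the function and argument names are illustrative, and the denominator is the number of consultations found in at least one record, since consultations recorded in neither could not be counted.

```python
def completeness(both, computer_only, notes_only):
    """Percentage of all identified consultations found in each record type."""
    total = both + computer_only + notes_only
    return {
        "computer": round((both + computer_only) / total * 100, 1),
        "notes": round((both + notes_only) / total * 100, 1),
    }

# Practice A counts from Table 2.
practice_a = completeness(both=2616, computer_only=257, notes_only=47)
```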
Discussion
There is great potential to use data from primary care for research purposes. However, the problems highlighted here show some of the difficulties that may be encountered, particularly with the use of new software. Data retrieval can be extremely time consuming and difficult, and help may be needed from practices and the software suppliers. The process of retrieving the data was characterized by uncertainty and unpredictability. However, as the use of software such as MIQUEST increases, data retrieval should become easier. MIQUEST has been specifically designed to operate on different general practice software systems, and has the further advantage that, with appropriate safeguards, it can be used via a modem link into the surgeries; researchers can therefore access data from a variety of software systems from outside of the surgery.

[Figure 1: four line graphs, one per practice (A to D), each plotting the number of consultations per month (vertical axis) against month, January to December (horizontal axis), with the years 1990 to 1995 shown in the legend.]
FIGURE 1 Annual cycle of consultations per month. Practice A: start date for useful data July 1991, includes same-day and non-doctor consultations. Practice B: start date for useful data January 1991, excludes same-day and non-doctor consultations. Practice C: start date for useful data October 1991, excludes same-day and non-doctor consultations. Practice D: start date for useful data July 1990, excludes same-day and non-doctor consultations.

TABLE 2 Number of consultations (surgery and visits) recorded on computer and in notes

                          Notes
Practice  Computer      Recorded (%)   Not recorded (%)  Totals (%)
A*        Recorded      2616 (89.6)    257 (8.8)         2873 (98.4)
          Not recorded    47 (1.6)       0 (0.0)           47 (1.6)
          Total         2663 (91.2)    257 (8.8)         2920 (100)
B         Recorded      1505 (91.6)    124 (7.5)         1629 (99.1)
          Not recorded    14 (0.9)       0 (0.0)           14 (0.9)
          Total         1519 (92.5)    124 (7.5)         1643 (100)
C         Recorded       942 (88.5)     75 (7.0)         1017 (95.5)
          Not recorded    48 (4.5)       0 (0.0)           48 (4.5)
          Total          990 (93.0)     75 (7.0)         1065 (100)
D         Recorded      1317 (87.1)    147 (9.7)         1464 (96.8)
          Not recorded    48 (3.2)       0 (0.0)           48 (3.2)
          Total         1365 (90.3)    147 (9.7)         1512 (100)

* Includes non-doctor and 'same-day' consultations.

TABLE 3 Number of visits recorded on computer and in notes

                          Notes
Practice  Computer      Recorded (%)   Not recorded (%)  Totals (%)
A*        Recorded       179 (81.7)     38 (17.4)         217 (99.1)
          Not recorded     2 (0.9)       0 (0.0)            2 (0.9)
          Total          181 (82.6)     38 (17.4)         219 (100)
B         Recorded       166 (88.7)     19 (10.2)         185 (98.9)
          Not recorded     2 (1.1)       0 (0.0)            2 (1.1)
          Total          168 (89.8)     19 (10.2)         187 (100)
C         Recorded       101 (80.8)     20 (16.0)         121 (96.8)
          Not recorded     4 (3.2)       0 (0.0)            4 (3.2)
          Total          105 (84.0)     20 (16.0)         125 (100)
D         Recorded       150 (86.7)     19 (11.0)         169 (97.7)
          Not recorded     4 (2.3)       0 (0.0)            4 (2.3)
          Total          154 (89.0)     19 (11.0)         173 (100)

The four practices in this study may not be typical: they were recording consultations on their computers in 1990, when less than one-third of practices were doing so;6 furthermore, they were willing to share
their data.
Although there were minor differences between practices in how they recorded indirect consultations, all practices recorded face-to-face consultations in the same way. In Practice A, 30% of the discrepancies involved consultations with the practice nurses or community midwives, and this was one reason for removing all non-doctor consultations in subsequent practices. This apparent difference in accuracy between nurses and doctors most probably reflects lack of access to either the computer or the notes at the time of consultation, but may warrant further study. Comparisons between the results of Practice A and Practices B, C and D must therefore be undertaken with care, as data from the latter practices did not contain non-doctor or 'same-day' consultations.
Differences in the list sizes generated by the practices and by MIQUEST were small, and were due to a combination of factors including: delays in coding those patients who join, leave or die, a difference of a few days in the calculation of the figures, and coding differences.
The production and analysis of the monthly charts was found to be an integral and useful part of the validation process. We correctly predicted that the charts would provide a simple and graphic representation of the uptake of recording of consultations, in order to exclude data from a time where the reliability of data recording was questionable. There was a remarkable similarity between the practices in the patterns revealed in the graphs. They show a similar learning period of 6-12 months before a 'steady state' was reached in recording consultations.
Some consultations may have taken place that were not recorded either in the notes or on the computer. Similarly, some consultations may have been recorded that did not actually take place. No allowance could be made for either possibility.
As would be expected, those who had more consultations (females and older patients) had more discrepancies. There are several reasons why there were discrepancies in data recording. First, the computer may be unavailable, for example consulting when the computer is turned off, or when on a home visit. Second, the notes may be unavailable, for example if they are in use by another person, they are mislaid or missing, or when patients are seen out of hours. Third, there are issues such as motivation, human error (for example using the wrong patient's records), and forgetfulness, which vary between individuals and practices (highlighted by inter- and intra-practice variations in discrepancies). Our experience is similar to that of a study which found that the success of a computerized record system is dependent upon the users of the system.7
In each practice, more consultations were recorded on the computer; therefore analysis of computer records is likely to be more complete. These findings are in keeping with two other studies. The first found that 91% of all consultations were entered onto the computer, with 12% being entered only on the computer.8 The second was from a study on data recording in four EMIS practices, and found that the computer data were at least as accurate as the manual records for diagnoses, prescriptions and referrals.9 Furthermore, it has long been established that paper records may be inaccurate and be generally of a low standard.10
We believe that the findings from this paper have important implications. First, the paper demonstrates an example of using routinely collected general practice data for research, and by doing so helps to confirm Pringle and Hobbs' assertion that large general practice databases can be of sufficient quality to be of value in research.1 Second, it shows how and why data can be validated; although the methods described may be of use in other settings, the important principle is in designing specific validation tools for individual studies. Third, it confirms the potential of MIQUEST software as a tool for general practice research. Lastly, it has shown the data recording to be sufficiently complete in each of the four practices to permit further analysis of the data for the purposes of our own study.
Acknowledgements
We would like to thank the four practices for their co-operation, and Chris Storah at EMIS and Kevin Allan at Northumberland Health for their technical assistance.
References
1 Pringle M, Hobbs R. Large computer databases in general practice. Br Med J 1991; 302: 741-742.
2 Murphy E, Spiegal N, Kinmonth A-L. "Will you help me with my research?" Gaining access to primary care settings and subjects. Br J Gen Pract 1992; 42: 162-165.
3 Allan K, Markwell D. MIQUEST Project Report. Leeds: NHS Executive, 1994.
4 The Clinical Information Consultancy. Collecting health information from general practices. J Informatics in Prim Care 1995; 10.
5 Allan K, Murphy P. Health Query Language can be used for collecting data from general practices. Br Med J 1996; 312: 978.
6 NHS Management Executive. Computerisation in GP Practices 1993 Survey. Leeds: Department of Health, 1993.
7 Njalsson T, Sigurdsson JA. Doctors, computers and quality of registration. An audit on prescription items and x-ray requests. Eur J Gen Pract 1995; 1: 59-62.
8 Scobie S, Basnett I, McCartney P. Can general practice data be used for needs assessment and health care planning in an inner London district? J Publ Health 1995; 17: 475-483.
9 Pringle M, Ward P, Chilvers C. Assessment of the completeness and accuracy of computer medical records in four practices committed to recording data on computer. Br J Gen Pract 1995; 45: 537-541.
10 Mansfield BG. How bad are medical records? A review of the notes received by a practice. J R Coll Gen Pract 1986; 36: 405-406.