As with any research study, the methods and findings were limited in various ways, and caution must therefore be taken in drawing conclusions.
The use of ProMES as the principal method carries some limiting factors and therefore needs to be considered in context. The nature of ProMES is that it is a bottom-up approach, with decisions about measurement driven by those closest to the activity being measured. Generally, this is a sensible strategy, but there can sometimes be a benefit in viewing from afar. Although the workshops attempted to get participants to take a step back to consider the objectives and indicators involved in general practice, it is unlikely that they would ever be able to present a truly independent view. There is also a potential limitation caused by the non-random nature of the participants. Although it was ensured that the sample came from a wide variety of geographical locations and types of practice, the participants were still self-selecting and there is no way of knowing what conclusions a different group of participants (whether professionals or patients) might have reached. This was mitigated to some extent by having at least two different workshops looking at each set of questions; there was generally good agreement between them. However, it is impossible to know how those who could not participate might have differed in their views.
In addition, the large-scale implementation of ProMES that was used in this study, with representatives of multiple teams contributing to the process, is less well tested than the implementation in a single team (although large-scale implementation has been done successfully multiple times before). There was no reason to suggest that this did not work appropriately, but some bias is more likely to have been introduced by the fact that a small number of self-selecting participants (relative to everyone working in general practice in England) were responsible for making the decisions. The additional complexity of the general practice task and working environment means that the resulting tool needs to be subject to extra scrutiny and comparison. The researchers attempted to mitigate this by using a further consensus exercise and a detailed evaluation of the pilot study, but further consensus work might be needed before the GPET is implemented on a larger scale. In particular, it may be useful to carry out similar exercises within a single team (or a range of teams) to examine what differences, if any, would result.
The sheer scale of activity in general practice required a much larger number of indicators than had been anticipated. This had several knock-on effects: extra workshops needed to be arranged, and the consensus exercise had to be condensed into a shorter timescale.
The nature of the indicators needed for such a tool was also a limiting factor. As a result of the requirements of the ProMES process, indicators needed to be collectable using little or no extra effort (otherwise practices would not have been willing to participate) and needed to be measurable on a regular basis, preferably monthly, or at least quarterly. Less regular data could be included, but this would limit the scope for scores to change over a shorter period of time. Although practices might, in principle, have been willing to put more effort into data collection for a helpful result, the reality is that, with current pressures, most practices cannot spare many hours on such a regular basis. Therefore, it seems essential that any future versions of this tool (or other similar tools) are designed in such a way as to reduce the burden on practices to the greatest possible extent, to give them the maximum chance of success.
The regular nature of this data collection certainly caused some issues. There are a number of indicators that are collected annually or biannually (e.g. Public Health England practice profiles or General Practice Patient Survey data) that could not be included as a result. This particularly limited the options for indicators in one area: public health. In addition, many of the indicators that were included, even though measurable monthly, were not likely to change very much (if at all) in a short timescale. Therefore, future iterations of the GPET might consider a range of different timescales for indicators. This would mean that effectiveness scores would be likely to change less on a regular basis, but that might be a price worth paying for the increased inclusion.
A different approach entirely would have been to determine what components an ideal measurement of productivity should include, and then develop ways to measure these. Although that may have been more successful in producing a standardised, comprehensive measure of productivity, as per the original objective, it would have necessitated a completely different research design, and it is not possible to know how successful that might have been.
As discussed earlier in this chapter, some of the indicators that were included were not gathered with complete fidelity (or at all) by all practices. There were a variety of reasons for this, with the single biggest reason being the accuracy of practice records in some areas. In the future, the use of such a tool should be accompanied by very clear advice about what data to collect and how such data can be stored to enable easier use of the tool.
DISCUSSION
NIHR Journals Library www.journalslibrary.nihr.ac.uk 106
As a result of the first stage of the study being longer than anticipated, the second stage (the pilot study) was compressed to a significant degree. This meant that not all practices could complete the full 6 months and it also meant that the evaluation overlapped with the pilot period itself. Although this was beneficial in some respects (i.e. it meant that more current responses could be given to many of the questions), it did mean that practices did not have the benefit of reflecting from a distance on the usefulness of the GPET. In addition, the short timescale meant that practices had only a few months to make changes that could affect their effectiveness. In reality, many such changes could take longer to make, and far longer still to take effect. Therefore, to evaluate how changes might actually make more of a difference, a longer time horizon would be needed: tracking change over the course of at least 1 year, and preferably longer, would ensure that there was more scope for this.
The short timescale (even without the compression) meant that it was not possible to take stock after the consensus exercise as much as would have been desirable. Given the findings, one option at this point would have been to reframe the tool as a QI tool; with added time and input, and a different theoretical framework, this may have led to an improved tool. The findings that may have led to this conclusion were not anticipated when designing the study, but, in future, other studies using a similar approach may build in this opportunity more clearly from the start.
Likewise, the lack of a control group was a significant limitation. There is no way of knowing to what extent the practices in the evaluation would have improved on the chosen indicators over this time period anyway; some such improvement is plausible because of the time of year (in particular with many practices starting the pilot at the end of the winter period, and continuing into the summer months), natural maturation or an observation effect (i.e. being aware of being studied). Although it is tempting to suggest that another study, involving a control group, would be beneficial here, such a study would have to be very carefully designed, as the process of gathering the data itself (some of which would need to come from within the practice) could be seen as an intervention in its own right. Therefore, it is likely to remain impossible to estimate the true effect of the use of the GPET on effectiveness, although the combination of the effectiveness changes observed, the evidence on changes made from the evaluation and the prior literature on changes made by teams using a ProMES process suggests that at least some of the change can be attributed to participation in the study.
The incompleteness of evaluation data also meant that there was a limited view of outcomes overall. For most of the evaluation questions, most practices did provide data via both the telephone interviews and the practice manager questionnaire; however, for some questions the data were less complete, and some practices did not find it easy to provide financial data, meaning that it was not possible to construct a productivity index (ratio of quality-adjusted outputs to inputs) from the data. Any future efforts to gather such data might try alternative methods of collecting the financial data, for example working with practices to give clearer advice on which financial data should be included, and doing this separately rather than as a section of a longer questionnaire.