
Evaluation of Telemedical Services

Rolf Holle and Gudrun Zahlmann, Member, IEEE

Abstract—With the rapidly increasing development of telemedicine technology, the evaluation of telemedical services becomes more and more important. However, professional views of the aims and methods of evaluation differ between the perspective of computer science and engineering and that of medicine and health policy. We propose that a continuous evaluation strategy should be chosen which guides the development and implementation of telemedicine technologies and applications. The evaluation strategy is divided into four phases in which the focus of evaluation is shifted from technical performance of the system in the early phases to medical outcome criteria and economic aspects in later phases. We review the study design methodology established for clinical trials assessing therapeutic effectiveness and diagnostic accuracy and discuss how it can be adapted to evaluation studies in telemedicine. As an example, we describe our approach to evaluation in a teleconsultation network in ophthalmology.

Index Terms— Evaluation, study design, telemedical services, telemedicine.

I. INTRODUCTION

TELEMEDICAL services are being developed all over the world [1]–[4]. They differ not only in their application area, e.g., radiology, dermatology, and almost all other medical subdisciplines, but also with respect to the medical task and the different partners involved, as in teleconsultation, telescreening, or telesurgery. From another viewpoint, there is a distinction according to the implementation of services of disciplines like computer science, engineering, statistics, etc., within the telemedical service. The complexity ranges from teleconsultation connecting two medical partners from the same subspecialty by a single telecommunication channel to teleinformation services using "intelligent" agents for worldwide knowledge retrieval.

Telemedical projects rely heavily on interdisciplinary cooperation due to the diversity of problems that must be solved, ranging from medical tasks to telecommunication solutions. This makes the assessment of the overall delivered service—the telemedical service—very complex. Some authors have pointed out the distinction between the evaluation of the telemedical technology (telemedical system) and of healthcare employing this technology (telemedical service) [3], [5].

In computer science, engineering, and other technologically oriented professions, it is good scientific practice that technical solutions are designed according to very detailed specifications. The test procedures for this purpose are well established. The evaluation of complete medical services is a task beyond the limits of these professional fields. However, since the development of knowledge-based systems and the attempts to establish them in medical practice, it has been realized that the overall evaluation is a much more serious task than just checking some sophisticated test examples [6]. From the medical perspective, it is not sufficient to describe the quality of single components; the overall outcome for the patient has to be the primary focus and should be optimized under specific circumstances such as available medical staff, resources, reimbursement, necessary equipment, etc.

Manuscript received January 4, 1999; revised March 2, 1999.

The authors are with GSF-National Research Center for Environment and Health, Institute of Medical Informatics and Health Services Research, 85764 Neuherberg, Germany (e-mail: holle@gsf.de, zahlmann@gsf.de).

Publisher Item Identifier S 1089-7771(99)04671-3.

There are practical reasons why the evaluation of telemedical services often falls short. Between the lifetime of a telemedical project, whatever the basis of funding was, and the real availability of the telemedical service in the market place, there is often a considerable gap. One reason is that within a three-year term—the normal funding period—the problem description, a first technological solution, and only first experiences from an application can be given. Most projects end with more questions and ideas than they started with, but the funding stops. At this project level, it is almost impossible to organize large-scale projects, and, therefore, real-life experiences are too restricted to convince policy makers that such services may have an impact on improving the quality of service or that they can reduce costs. One possible way out of this dilemma could be the foundation of cooperative groups for telemedicine evaluation [3].

In this paper, we want to emphasize the necessity of a comprehensive evaluation strategy for applications in telemedicine, which starts with the first step into medical practice and, if the application proves to be successful, may end with an evaluation of cost-effectiveness. Evaluation is regarded as a continuing empirical process in which study design issues play an important part. We will review the methodology suited for the evaluation of telemedicine and illustrate its importance by some examples.

Our main application focus is on teleconsultation services. Teleconsultation connects two or more partners for, e.g., giving a second opinion about a single case by a remote medical expert and discussing it with the local healthcare provider and/or the patient. As an example, we describe our approach to evaluation in a teleconsultation network in ophthalmology.

II. EVALUATION METHODOLOGY FOR TELEMEDICINE?

To start with, it has to be clear that the evaluation of telemedical services requires no unique and special methodological approach of its own. Telemedicine is nearly as heterogeneous as conventional medical practice, and the evaluation methods of telemedicine are in the first place those needed for studying conventional diagnostic or therapeutic procedures.


Nevertheless, there are some characteristic problems common to many telemedical applications which complicate the design and conduct of rigorous studies and which call for specific solutions. Some of these problems are similar to those encountered in other areas of health technology assessment.

• Rapid technological progress makes long-term studies irrelevant because the equipment is outdated by the time the results are published.

• Telemedicine often requires a distribution of new technology to former nonusers and therefore depends greatly on the acceptability of this new equipment.

While the latter fact requires the researcher to provide enough time for training and an initial learning phase, the former may cause him or her to keep it as short as possible.

Other problems are more specific for the “distance” aspect of telemedicine.

• Many relevant telemedicine applications are confined to specific geographic regions with a sparse population and, thus, large sample sizes that are needed for statistical reasons are hard to obtain.

• Telemedical applications often involve many institutions using different systems and technologies, which complicates the necessary technical standardization.

Another characteristic of telemedicine is the fact that typically a new intervention is not provided, but an intervention of known or generally assumed effectiveness is provided in a different way. This has implications for the choice of outcome criteria. While the efficacy of healthcare will always be the primary objective and cost the limiting factor, there may be other outcome measures for which substantial gains for the patient and for the healthcare provider are expected from telemedical services, e.g., by:

• saving time and money by improving the direct access to other healthcare providers and by reducing the need for transport;

• improving patient satisfaction, e.g., by actively participating in a synchronous teleconsultation.

On the physicians’ side, these gains may be counterbalanced by additional efforts and costs, at least during the initial phase. Evaluation studies have to investigate whether in the long run the net benefit from cost reductions for communication, higher patient numbers, etc., will be substantial.

III. PHASES OF THE EVALUATION PROCESS

The choice of an adequate design for an evaluation study depends on the aim of the study, which very much depends upon the phase of development of the telemedical technology. Therefore, a distinction of phases in the evaluation of telemedical technology will give a helpful orientation for designing an adequate study. The characterization of different phases, usually four, is well known from the field of drug development and testing and has been adapted in slightly different ways to decision support systems by several authors [7]. We propose the following division into four phases of evaluation in telemedicine.

• Phase I—Technical Pilot Study: The quality and performance of the hardware and software used for the communication of partners in telemedical applications is critical for the usefulness of the system. This becomes a major issue when medical images are to be transmitted. In phase I, technical pilot studies are performed to find the optimum technical conditions under which the telemedical application promises to yield the best result. Phase I studies are purely exploratory and may often be planned in several trial-and-error steps.

• Phase II—Feasibility Study: The first step out of the laboratory aims at showing the feasibility of the telemedical application under well-controlled practical conditions. Typically, only a single or few institutions participate in the study, optimum technical equipment is provided, and only specially trained and highly motivated personnel are involved. Phase II studies are often, but not necessarily, performed without a control group, because the focus is on how the system functions in interaction with its potential users and with the communication network into which it should be integrated. Therefore, evaluation criteria are in the first place directly related to the system (process quality, user satisfaction).

• Phase III—Controlled Effectiveness Study: Only when the feasibility of the application has been shown and the functioning and the acceptability of the system have been optimized has the time come for a controlled trial comparing the effectiveness of the telemedicine service with standard healthcare. The evaluation is focused on outcome quality, which should either improve or at least not decline if the system has other evident advantages (reductions in cost or time). Phase III studies should be performed in a representative sample of many institutions on the basis of standard technical equipment and in an unselected sample of users with minimal training, thus reflecting routine conditions of medical practice.

• Phase IV—Cost-Effectiveness Study: When the effects of the telemedical application have been demonstrated in a phase III field trial, the question of cost effectiveness will still be unanswered in most cases. Data for the analysis of cost effectiveness may come from a phase III study, but usually a model-based analysis using methods like decision trees or simulated Markov models is performed.

A different terminology representing a different focus in the evaluation process is common in the domains of computer science and engineering. Here we find a distinction between verification, validation, and evaluation [8]. Main contributions were given by the artificial intelligence community when facing the problem of assessing the content of knowledge bases. Verification and validation were the main focus here and were supported by the development of automated tools, special languages, and mathematical models [9].

The whole process is described as a spiral life-cycle model [10]–[12]. Each developed part or module of the overall solution must be verified in its technical appropriateness, validated according to the medical task, and then evaluated considering the functionality, user friendliness, and clinical/medical value. Verification ("doing the system right" [13]) refers to the checks of the technical components of telemedical services. For teleconsultation services, for instance, verification may start by assessing the synchronous or asynchronous telecommunication channels according to technical benchmarks like bandwidth, transmission speed, data quality, etc. [1], [14]. "Validation ('doing the right system' [13]) is the determination of the correctness of the final program or software produced from a development project with respect to user needs and requirements" [15]. For the teleconsultation service, this means that the overall communication equipment as a whole (hardware and software) must be assessed, as well as the acceptance of the technical and content-related solutions.
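To make the notion of technical benchmarks concrete, the following sketch times the transfer of a test image through an arbitrary channel stub and checks throughput and bit-level integrity against an assumed threshold. The threshold, payload size, and loop-back stub are illustrative assumptions, not values from the OPHTEL network.

import hashlib
import time

# Assumed benchmark threshold for this sketch; not a value from the study.
MIN_THROUGHPUT_KBIT_S = 56.0

def verify_transfer(payload, send_and_receive):
    """Time one transfer through `send_and_receive` and check integrity.

    `send_and_receive` stands in for the real telecommunication channel and
    must return the bytes received on the other end.
    """
    start = time.perf_counter()
    received = send_and_receive(payload)
    elapsed = time.perf_counter() - start
    throughput_kbit_s = (len(payload) * 8 / 1000.0) / elapsed
    intact = hashlib.sha256(received).digest() == hashlib.sha256(payload).digest()
    return {
        "throughput_kbit_s": throughput_kbit_s,
        "data_intact": intact,
        "passes_benchmark": intact and throughput_kbit_s >= MIN_THROUGHPUT_KBIT_S,
    }

if __name__ == "__main__":
    test_image = bytes(200_000)                              # dummy 200-kB image
    print(verify_transfer(test_image, lambda data: data))    # loop-back stub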

Engelbrecht et al. [10] distinguish between three broad steps of the evaluation process, which they call the micro, midi, and macro stage. However, since these authors have a different focus, directed more to the development of a system, our phases II and III both belong to the macro stage, which Engelbrecht et al. [10] differentiate as "limited field testing under controlled conditions" and "extended field testing."

In summary, studies in phases I and II of our approach will mainly be initiated by the developer of a telemedical system aiming to show that it is worthwhile to carry on with its development and implementation. These phases refer to verification and validation of the second approach. Phases III and IV are oriented toward the medical community and society, aiming to show that the telemedical service should be introduced on a large scale. They refer in part to validation and to evaluation of the second approach. A systematic economic evaluation as in phase IV studies is not included in the evaluation description of the second approach.

IV. DESIGN OF EVALUATION STUDIES

As with many classification schemes, the distinction of phases I–IV is not clear-cut, but nevertheless they can be seen as helpful landmarks in the continuing evaluation process. Evaluation methods are usually different in different phases of the evaluation process. Since phase I studies share no common design features, we will start the discussion of the evaluation methodology with phase II. For the discussion of methodological principles, the typical distinction according to the medical specialty of the telematics application, e.g., teleradiology, telepathology, teledermatology, etc., is not helpful. Instead, the main differentiation in phase III should be between diagnostic accuracy studies and comparative outcome evaluation studies, because these require different methodological approaches.

However, before going into detail, we would like to emphasize the most fundamental principle: evaluation should be seen as a process based on prospectively planned separate steps. Each step should ideally be guided by prospectively defined aims which allow the researcher to check whether the evaluation criteria have been fulfilled. Thus, a comparative approach, either against a reference procedure or against a predefined standard, should be chosen. Many so-called evaluation studies ignore this basic requirement and therefore often end up with inconclusive results. The development of accepted standards has not yet been sufficiently approached in telemedicine research, and we see this as a first important step toward an evaluation culture in this field.

A. Study Design in Phase II

We have characterized phase II studies as feasibility studies performed under optimum conditions (the term "laboratory conditions" does not seem quite appropriate). The basic study design will be noncomparative in many cases or there will be before–after comparisons, i.e., a historical control group. Since this is usually the first step of studying the telemedical application under somewhat realistic conditions, phase II studies often end up with several suggestions for improvement. It is therefore more important to locate the deficits of the application than to perform an unbiased comparison of the effects of using or not using the telemedicine application. Thus, in early phase II studies, several different evaluation criteria are used that are indicators of the process quality, e.g., the number and types of technical failures, operation time, user satisfaction, etc.

The choice of specific evaluation criteria and the design of questionnaires and documentation forms are important tasks in planning a phase II study. If there is an instrument that has proven to be useful in a comparable study, it should be adapted and used. This will not only save time and effort but it also makes comparisons possible and thereby leads to an establishment of standards. If no suitable instrument is available, one must start by collecting a list of concepts and items that seem to be important evaluation criteria. The questionnaire should cover all relevant aspects but, on the other hand, it has to be kept short in order to ensure acceptability and thereby the data quality.

The attention given to questionnaire design is often inadequate compared to the total amount of time devoted to other aspects of the study. Established rules for questionnaire development exist, mainly from the field of social sciences research, which are often overlooked because researchers approach this task using common sense only. Some important rules are:

• present questions in clear wording, e.g., avoid negative formulations;

• do not put two questions into one phrase;

• quantify, if possible, and categorize if quantification is based on guesses only;

• use scales with four labeled categories for asymmetric questions and scales with five or seven categories or visual analog scales (VAS) with clearly defined endpoints for symmetric questions;

• do not use different response formats unless it is neces-sary;

• always do a pretest.

A small pilot study is always helpful to select the most useful questions and hint at other important aspects for which detailed information should be collected in the main study. After reviewing the questionnaire, this procedure should be repeated until the contents, wording, and response formats of the questions have been optimized.

B. Study Design for Diagnostic Accuracy Studies

In the evaluation of telediagnostic or telescreening services, the assessment of diagnostic accuracy under routine conditions will often be the first step in phase III. Comparative studies of the clinical usefulness of diagnostic procedures are only rarely performed.

1) Specification of the Study Aim: In a telediagnostic setting, a common scenario is to have one or more unspecialized local physicians requesting a diagnosis for a patient and one or more distant experts providing the diagnosis. The central aim of a study would be to estimate the gain in diagnostic accuracy provided by the experts and the extent to which the expert diagnosis is compromised by quality problems in data transfer. Thus, one may either want to compare the diagnostic performance of the local physicians and the distant experts or the performance of the experts with and without telecommunicated images. In the first case, the aim of the study is to show an improvement, and in the second it is to show that there is no deterioration of the results.
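As a minimal illustration of such a comparison, diagnostic accuracy for each reader reduces to sensitivity and specificity estimated against the reference diagnosis; the counts below are invented.

def sensitivity_specificity(tp, fp, fn, tn):
    """Estimate sensitivity and specificity from a 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)

# Invented counts: local physician versus remote expert, same gold standard.
print("local :", sensitivity_specificity(tp=38, fp=12, fn=22, tn=128))
print("expert:", sensitivity_specificity(tp=52, fp=9, fn=8, tn=131))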

2) Selection of Patients: In a phase III study under realistic conditions, any selection of "suitable" patients must be avoided. Inclusion criteria for patients to whom the telemedicine service should be applied have to be defined before the start of the study, and these criteria must be strictly adhered to. If, for example, patients who are particularly difficult to diagnose (so-called borderline cases) are excluded, this may severely bias the estimation of the diagnostic accuracy and thus the evaluation of clinical usefulness.

3) Definition of Diagnostic Categories: Clear-cut diagnostic categories should be used which are relevant to the further treatment of the patient. The addition of a probability statement quantifying the certainty of the diagnosis may be useful because it allows one to compare the diagnostic accuracy of different experts using different implicit diagnostic thresholds. Also, equivocal findings as well as findings based on data with quality problems can be included in this way. Ignoring poor-quality cases leads to an overestimation of diagnostic accuracy; therefore, an analysis of these patients should be included (the "intention-to-diagnose" approach, as compared to the well-known "intention-to-treat" approach in clinical trials) in addition to the analysis based only on cases with acceptable data quality.
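The effect of the intention-to-diagnose principle can be sketched by analyzing the same invented data twice, once discarding nondiagnostic cases and once counting them as diagnostic failures (an assumed convention for this sketch).

cases = (                                # (disease present, test result)
    [(True, "positive")] * 40
    + [(True, "negative")] * 5
    + [(True, "nondiagnostic")] * 15     # poor image quality or equivocal
    + [(False, "negative")] * 120
    + [(False, "positive")] * 10
    + [(False, "nondiagnostic")] * 10
)

def sensitivity(cases, count_nondiagnostic_as_failure):
    diseased = [t for d, t in cases if d]
    if not count_nondiagnostic_as_failure:
        diseased = [t for t in diseased if t != "nondiagnostic"]
    return sum(t == "positive" for t in diseased) / len(diseased)

print(round(sensitivity(cases, False), 3))   # acceptable-quality cases only: 0.889
print(round(sensitivity(cases, True), 3))    # intention-to-diagnose: 0.667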

4) Choice of Gold Standard: Whenever an accepted gold standard exists, e.g., intraoperative or autopsy findings, the telediagnostic system is validated against this gold standard. Of course, the gold standard must be judged independently of the other tests, i.e., assessment of the gold standard must be blind with respect to the diagnostic test and vice versa. If a gold standard may only be defined by expert opinion, this will be subject to intra- and inter-rater reliability problems, and, thus, this variation has to be considered explicitly in the study design (e.g., by consensus methods) and in the analysis. Cases for which the gold standard is missing (e.g., original documents lost) have to be excluded, but it should be checked whether these exclusions lead to systematic attrition in the sample. An extreme example of this will occur when the gold standard procedure is too invasive, costly, or time-consuming and is therefore restricted to a subsample of cases, e.g., those with a positive test. The resulting so-called "work-up bias" may be adjusted for if a random sample of negative test results is also checked by the gold standard. However, a loss in precision in the estimates of diagnostic accuracy will inevitably occur [19].
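Where verification of test-negative cases is restricted to a random fraction, the naive estimates can be adjusted by weighting the verified negatives up by the inverse of that fraction, in the spirit of the correction in [19]. All numbers below are invented.

def corrected_accuracy(tp, fp, fn_verified, tn_verified, neg_verified_fraction):
    """Adjust sensitivity and specificity for work-up bias when all test
    positives but only a random fraction of test negatives are verified."""
    fn = fn_verified / neg_verified_fraction   # projected false negatives in the full cohort
    tn = tn_verified / neg_verified_fraction   # projected true negatives in the full cohort
    return tp / (tp + fn), tn / (tn + fp)

# Invented example: all 60 test positives verified, 20% of 400 test negatives verified.
naive_sensitivity = 50 / (50 + 6)                       # about 0.89, biased upwards
adjusted = corrected_accuracy(tp=50, fp=10, fn_verified=6,
                              tn_verified=74, neg_verified_fraction=0.20)
print(round(naive_sensitivity, 2), [round(x, 2) for x in adjusted])   # 0.89 vs. 0.62 / 0.97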

5) Sample Size: Sample size calculations for diagnostic studies require a precise statement of the tested null hypothesis and of the clinically relevant difference as well as a choice of error probabilities. As an alternative, sample size calculation is based on confidence intervals with a given width. There is, however, often a tradeoff between performing a study in a naturalistic setting and attaining high statistical power, because the effective sample size for a diagnostic study depends on the prevalence of the cases. Whenever this prevalence is low in a real-life study, the estimation of the sensitivity of a diagnostic test will be imprecise.
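For the confidence-interval approach, a short sketch: given an expected sensitivity, the desired half-width of its confidence interval, and the expected prevalence, the required number of diseased cases is inflated to a total sample size (normal approximation; the planning values are assumptions for illustration).

import math

def total_sample_size(expected_sensitivity, half_width, prevalence, z=1.96):
    """Total sample size so that the confidence interval for sensitivity
    has the requested half-width (normal approximation)."""
    diseased = (z ** 2) * expected_sensitivity * (1 - expected_sensitivity) / half_width ** 2
    return math.ceil(diseased / prevalence)

# Assumed planning values: sensitivity 0.85, half-width 0.07, prevalence 10%.
print(total_sample_size(0.85, 0.07, 0.10))   # about 1000 patients overall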

C. Study Design for Comparative Outcome Studies

The aim of comparative outcome studies is to show that the intervention, i.e., the telemedical service, leads to the desired effects with respect to process quality and clinical outcomes. A causal conclusion presupposes that the study has internal validity, which means that the observed effects cannot be attributed to other systematic or random influences. Several types of confounding factors, e.g., the Hawthorne effect, have been described which may cause systematic error [16], [17]. What we usually call effects means the quantitative improvement in outcome criteria in the comparison of intervention to no intervention or control. When these effects can be assumed to exist in similar clinical situations outside the study, the study is said to have external validity as well. An elaborate methodology of study design has been developed in clinical trial research which includes various approaches to guarantee a high internal validity, for example, randomization and double blinding [18]. This methodology has proven to be useful in assessing drug efficacy, and it has been successfully adapted to the evaluation of other interventions that are less easy to standardize. We will review those issues that are important for the evaluation of telemedical services.

1) Choice of Control Group: The primary question of study design is that of the appropriate control. In drug research, we distinguish between explanatory trials testing the specific pharmaceutical effects (and thus requiring a placebo control group) and pragmatic trials testing the overall effect in comparison to the best available standard. In phase III studies of telemedicine evaluation, the pragmatic approach should be chosen, and, therefore, in the near future, telemedical services have to be compared to standard health care, at least until some of these applications have been established in medical practice as a new standard.

2) Randomization: Randomization is the only way to assure structural homogeneity of the intervention and the control group with respect to known and unknown confounding factors. Therefore, randomized comparative trials have to be regarded as the gold standard for the evaluation of telemedical interventions. However, the implementation of randomization is difficult in many practical situations, and sometimes it may even be difficult to study parallel nonrandomized groups with and without the intervention. This will depend on the degree of integration of the telemedical service into the healthcare system. An integrated system requires its environment to be restructured with respect to technical interfaces, user skills, and administrative procedures. As a consequence, a parallel study group not using the telemedical service will be difficult or even impossible to implement in the same environment at the same time.

As a solution, when telemedical services are provided on an institutional basis, randomization of patients can be substituted by randomization of institutions. In practice, this theoretically valid approach is often limited due to problems of recruiting a sufficient number of participating institutions and technical problems resulting from the different technology used by different institutions. Nevertheless, a large-scale introduction of a telemedical service will justify a large-scale randomized study of many institutions.
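Organizationally, such a randomization of institutions can be prepared in advance, for instance by a simple balanced allocation of the participating institutions; the practice names below are placeholders, not study sites.

import random

def randomize_institutions(institutions, seed=None):
    """Balanced allocation of institutions to the telemedicine and control arms."""
    rng = random.Random(seed)
    shuffled = list(institutions)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"telemedicine": shuffled[:half], "control": shuffled[half:]}

practices = [f"practice_{i:02d}" for i in range(1, 13)]   # placeholder names
print(randomize_institutions(practices, seed=42))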

3) Comparability of Unrandomized Groups: Nonrandomized studies with historical or concurrent control groups still form the majority of trials in the evaluation of telemedicine, and, of course, this is much better than to have no evaluation studies at all. In this case, however, other aspects of study design become even more important in order to guarantee some level of internal validity. Structural homogeneity can partly be attained by applying strict inclusion criteria. These must not be too narrow, which would endanger the external validity of the study, but they have to be identical for the intervention and the control group. The comparability of the intervention and the control group can also be improved by design measures like matching, which may facilitate the use of statistical methods of confounder adjustment in the analysis of the study. A close cooperation with a biostatistician will allow one to choose the best suited strategy. In any case, a sample size calculation based on statistical considerations should be performed in order to minimize the influence of sampling error.
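Such a sample size calculation might look as follows for a binary outcome, using the standard two-proportion approximation; the assumed event rates are purely illustrative.

import math

def n_per_group(p_control, p_intervention):
    """Approximate patients per group for comparing two proportions
    (normal approximation, two-sided alpha = 0.05, power = 0.80)."""
    z_alpha, z_beta = 1.959964, 0.841621          # standard normal quantiles
    variance = p_control * (1 - p_control) + p_intervention * (1 - p_intervention)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_intervention) ** 2)

# Assumed rates: 70% satisfactory outcome with standard care, 80% with teleconsultation.
print(n_per_group(0.70, 0.80))   # roughly 290 patients per group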

4) Standardization of the Intervention: When including multiple institutions in a study, it is especially important to standardize the complete intervention as far as possible. This includes the technical standardization as well as the usage of the system. Treatment homogeneity should also be guaranteed, which means that the application of the telemedical service should not be confounded with other aspects of treatment quality, e.g., if the intervention groups comprise hospitals with better skilled and better motivated personnel or better equipment. This will easily be the case if only one ward, hospital, or area is compared with another.

5) Outcome Criteria: Evaluation criteria in phase III studies should be medical outcome variables in the first place, since it is the primary aim of these studies to show that the outcome quality of medical care can be improved by means of telemedicine or at least stays the same if there is a considerable gain in process quality. Medical outcomes that can serve as indicators of the quality of care may be objective or subjective clinical criteria. The comparison of intervention and control with respect to criteria of process quality will only make sense if this is not either obvious or already demonstrated in phase II studies. Costs are usually not the main endpoints in a phase III study, but cost data should be collected as a byproduct of the study because they are needed in the subsequent phase IV cost-effectiveness evaluation.

D. Study Design in Phase IV

An evaluation strategy for telemedicine would not be complete without consideration of financial aspects. The costs of healthcare are already at an upper limit, and telemedical innovations, which often bear enormous initial costs at their introduction, should either lead to subsequent savings or improve the quality and outcome of healthcare in a way which can be considered as cost-effective. Economic evaluation of telemedicine is, therefore, the important final step in the evaluation process. We have placed it in phase IV because promising results from phase III should be available before a complicated economic assessment is encouraged.

The most straightforward approach to economic evaluation is to collect cost data within a comparative phase III effectiveness study. One has to decide in advance which types of costs are to be included. In the literature a distinction is made between different possible perspectives (patient, healthcare provider, society) and different types of cost (e.g., direct and indirect costs). Direct costs refer to expenditures for goods, services, and other resources which are necessary to provide an intervention including its future consequences. Indirect costs refer to gains and losses in productivity which are associated with the intervention. Since phase III studies usually include only some hundred patients and run no longer than a few years, they are not suited to assess costs and savings which accumulate in the long run or which occur only in connection with certain rare events (e.g., severe complications).

Therefore, model-based analyses have become common practice in health economic evaluation. These are synthetic studies with model-based calculations or simulations where information about those variables which cannot be observed in a prospective randomized study is estimated from other sources or simply by an educated guess. By using a specific model, the associated costs and outcomes of each treatment path can be summed up and weighted by the calculated probability of this path, which yields average costs and outcomes for the population under study. Costs and outcomes occurring in later years are down-weighted by a discounting procedure to allow for time preference.
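A minimal sketch of such a model-based calculation, with entirely invented numbers: each treatment path carries a probability, a stream of yearly costs, and a stream of yearly outcomes that are discounted before being averaged over the paths.

def discounted_sum(yearly_values, rate=0.05):
    """Present value of a stream of yearly values."""
    return sum(v / (1 + rate) ** t for t, v in enumerate(yearly_values))

def expected_cost_and_outcome(paths, rate=0.05):
    """Each path: (probability, yearly costs, yearly outcomes, e.g., QALYs)."""
    cost = sum(p * discounted_sum(c, rate) for p, c, _ in paths)
    outcome = sum(p * discounted_sum(q, rate) for p, _, q in paths)
    return round(cost, 1), round(outcome, 3)

# Invented model: teleconsultation shifts patients away from the costly complication path.
telemedicine = [(0.90, [1200, 300, 300], [0.95, 0.95, 0.95]),
                (0.10, [1200, 4000, 2000], [0.80, 0.75, 0.75])]
standard = [(0.80, [1000, 300, 300], [0.95, 0.95, 0.95]),
            (0.20, [1000, 4000, 2000], [0.80, 0.75, 0.75])]
print(expected_cost_and_outcome(telemedicine))
print(expected_cost_and_outcome(standard))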

In order to combine expected costs and outcomes in a one-dimensional measure of cost effectiveness, the following approach has become most popular. The difference of average costs as well as the difference in expected outcomes (e.g., life expectancy) are calculated for the interventions under comparison, and the ratio of these two quantities is taken (the so-called marginal cost-effectiveness ratio). Different outcome dimensions for effectiveness (e.g., survival time and quality of life) are made commensurable by transforming them into so-called quality adjusted life years (QALY’s), as in cost-utility analysis, or into monetary units, as in cost-benefit analysis.
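In code, the ratio is a one-line calculation; the expected values below are invented and would in practice come from a phase III study or from a model such as the one sketched above.

def marginal_cost_effectiveness_ratio(cost_new, outcome_new, cost_std, outcome_std):
    """Extra cost per extra unit of outcome (e.g., per QALY gained)."""
    return (cost_new - cost_std) / (outcome_new - outcome_std)

# Invented expected values per patient.
print(marginal_cost_effectiveness_ratio(cost_new=2150.0, outcome_new=2.65,
                                         cost_std=2050.0, outcome_std=2.60))
# 2000.0 currency units per QALY gained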

The development of methodology for health economic evaluation has been rapid within the last decades. However, some basic controversies have remained unresolved, and skeptical comments now seem to increase after the first wave of over-optimism. To a critical reader, it may appear strange that most published cost-effectiveness studies claim to support the cost effectiveness of the drug or technology under consideration. This may partly be due to publication bias but, in our opinion, it is more likely a case of bias in the analysis, because existing methodological standards are not strict enough and therefore give enough room to come up with completely different results. It is the uncertainty of the model formulation and of the model parameters which gives room for manipulation and undermines the validity of the results. Methods to handle or estimate the uncertainty exist in the form of various techniques for sensitivity analysis, but they are often not properly used and they give only vague results.
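A one-way sensitivity analysis, the simplest of these techniques, varies a single uncertain parameter over a plausible range and reports the resulting cost-effectiveness ratios. In the invented toy model below, the assumed complication rate under standard care drives the result; negative ratios indicate that the telemedicine arm is both cheaper and more effective, which illustrates how strongly the conclusion can depend on one parameter.

def icer_for_complication_rate(rate_standard, rate_telemedicine):
    """Toy model: the complication rate drives both cost and outcome differences;
    all constants are invented for illustration."""
    extra_cost_per_complication = 4500.0
    qaly_loss_per_complication = 0.15
    fixed_extra_cost_telemedicine = 250.0      # equipment and communication
    avoided = rate_standard - rate_telemedicine
    delta_cost = fixed_extra_cost_telemedicine - avoided * extra_cost_per_complication
    delta_qaly = avoided * qaly_loss_per_complication
    return delta_cost / delta_qaly

for rate in (0.10, 0.15, 0.20, 0.25):          # assumed complication rate, standard care
    print(rate, round(icer_for_complication_rate(rate, rate / 2), 1))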

In this paper, there is not enough room for a detailed presentation of the state of the art of cost-effectiveness research, nor for a well-founded criticism. We refer the interested reader to a recent textbook [20] and to an article that makes special reference to telemedicine [21].

Little concern has been voiced about whether a single number such as the cost-effectiveness ratio can be accepted to convey so much complex information for decisions with far-reaching consequences. In agreement with other authors, we suggest that the complexity of a global evaluation of cost effectiveness should be made transparent to the decision-maker in a cost–consequence analysis [21] and not be ignored by simply providing values of "dollars per QALY gained" which are difficult, if not impossible, to interpret.

V. EXAMPLE—TELECONSULTATION IN OPHTHALMOLOGY

Within the European Union (EU)-funded project OPHTEL (TELematics in OPHthalmology) and the "Bayern-online" initiative of the Bavarian state government, an ophthalmological teleconsultation network has been established [14]. Six private ophthalmologists are connected among each other and to a university eye department and a computer science research institute (GSF). Synchronous (ISDN-based desktop video conferencing equipment) and asynchronous (PGP-encrypted Internet e-mail) teleconsultations can be performed. The main communication contents are patient referrals (to and from ophthalmic surgeons) and complicated rare cases. The telecommunication equipment is implemented in a "communication PC," which is separated from the practice network but connected to the Internet via ISDN.

A. Technical Pilot Study

A technical pilot study on the basis of lab tests was performed after installation of the special communication PC’s [14].

B. Feasibility Study

After a training phase, a feasibility study was performed by allowing a 12-month free communication period. The contents of the teleconsultations were chosen according to the medical demand of the participating physicians. At the end, we asked the participating physicians for their opinion about the telemedical service by giving them questionnaires covering three main topics: 1) utilization behavior; 2) acceptance; and 3) an overall assessment [22].

1) Separately, for each teleconsultation type (synchronous, asynchronous), we asked for:

• the number of teleconsultations per month;

• the time effort for preparing and conducting the teleconsultation and the result assessment;

• the medical content of the teleconsultation.

2) Considering the acceptance of teleconsultation services, three main areas were assessed:

• technology acceptance;
• acceptance of the application;
• acceptance of the telemedical service.

3) An overall assessment was requested concerning:

• the importance of the telemedical service for the medical discipline—ophthalmology;

• the effect on patient care;
• the cost-saving potential;
• the investment effort;
• the time effort;

• the necessity of integrating the teleconsultation equipment into the existing technical equipment.

The participating ophthalmologists had a demand of two synchronous teleconsultations per month. The average time effort to perform the teleconsultation was 15 min for the synchronous type. Asynchronous teleconsultations conducted in the study group were related to a special research project and therefore cannot be considered for validation purposes. The overall time effort for preparation, conduct, and analysis was 30 min. Medical contents consisted of injuries and their consequences, macula degeneration, cataract (referrals to surgery), diabetic retinopathy, and complicated inflammations. On a scale from 1 (not) to 4 (very), the integration into the work place was rated as less difficult (2.7). The learning effort required to handle the new technology and the real application afterwards were evaluated as rather low (2.3). The synchronous teleconsultation required a higher learning effort than the asynchronous one.

The telemedical service—teleconsultation was ranked as very valuable (1.3) and the patients were very satisfied with this new technology as assessed by the physicians (1.0). The cost-savings potential was rated rather high (2.0). At the same time, the investment and the time effort were assessed as also rather high (2.0/2.3).

C. Comparative Outcome Study

In order to demonstrate that teleconsultation in ophthalmology may lead to a measurable improvement in patient care, a test scenario had to be chosen which fulfilled several criteria:

• important medical task;

• comparable regular scenario available;
• validated telemedical service;

• no legal restrictions;

• both types of teleconsultation applicable;
• controlled study design.

According to the experiences from the feasibility study, cataract OP management was chosen as the appropriate scenario for the teleconsultation evaluation study. Cataract describes any type of opacity of the eye lens. At a certain state of cataract, the replacement of the natural lens by an artificial one may be indicated. This procedure is called a cataract OP (operation).


Usually, the patient visits the private ophthalmologist when vision is impaired. If, in his or her opinion, a cataract OP is indicated, the patient is referred to an ophthalmic surgeon. In the usual setting, patient findings are sent to the surgeon by surface mail and the patient gets an appointment for visiting the surgeon in order to decide whether he or she will perform the operation. After surgery has been conducted, the patient is referred back to the private ophthalmologist for postoperative care.

In our evaluation study, two private ophthalmologists refer patients to three ophthalmic surgeons for cataract OP. In the telescreening setting, the basic information about the patient is sent to the surgeon by asynchronous teleconsultation. To discuss all problems concerning the cataract surgery between the patient, the private ophthalmologist, and the ophthalmic surgeon, a synchronous teleconsultation is conducted. The clinical usefulness of this service is expected to lie in the improved direct communication between the three persons involved and in reduced visits.

We are trying to compare these two settings in a small prospective evaluation study. Two patient groups are included—the historical control group of patients from the preceding time period who have not been subject to the teleconsultation and the experimental group using the teleconsultation scenario. Questionnaires are given to the patients and physicians as well.

The main contents of the patient’s questionnaires of both groups after the surgical treatment are:

• the number of visits at the office of the private ophthalmologist and the ophthalmic surgeon;

• overall time effort spent;

• the satisfaction level concerning the communication between the ophthalmologist and the surgeon;

• the involvement level of the patient within the decision process;

• the satisfaction level concerning the overall treatment;
• the level of the patient's activity within the decision process;

• clarity and understanding of the physicians' activities.

The patient group experiencing synchronous teleconsultation received a second questionnaire after the teleconsultation but prior to surgery. Questions are related to:

• the patient’s impression concerning the new possibilities of teleconsultation;

• estimated time effort;

• quality of video and audio transmission;
• active participation during the teleconsultation;
• level of understanding of the physicians' discussion;
• the value of the telemedical service concerning the quality of care.

Two questionnaires have been developed for the ophthalmologists. One is directly case-related, asking the referring ophthalmologists and ophthalmic surgeons for:

• the time spent for both types of teleconsultation;
• the value of teleconsultation for each individual case;
• the patient's reaction to synchronous teleconsultation.

The second questionnaire (with different versions for the two groups of physicians) must be completed as a summary at the end of the study. It contains:

• the technical effort necessary for conducting the telemedical service;

• the effort of practicing;

• the effort of implementing the system into the daily working routine;

• the potential for improving the communication between the referring ophthalmologists and the ophthalmic surgeons;

• the potential for improving the communication with the patient;

• cost increase or reduction for physicians and patients;
• time effort;

• the necessary reimbursement level for the telemedical service.

The detailed results of this study are not yet available and will be published later on. If the results of this study are promising, a second phase III study based on a larger sample and focusing on fewer outcome criteria may be indicated.

VI. CONCLUSIONS

The evaluation of telemedical services has become an important issue in telemedicine research. Because of objective and subjective problems, the question of cost effectiveness remains unanswered for most of the telemedical services that are developed worldwide. Objective factors are:

• small project sizes (few participating institutions, small patient numbers);

• limited project funds and funding periods;
• rapid technological progress;

• differences in healthcare systems.

On the subjective side, the awareness of the system developers concerning the necessity of early evaluation of the nontechnical components of the telemedical service is insufficient. Only a close cooperation between experts from the technical and the medical disciplines may guarantee that a telemedical service can be successfully implemented in medical practice. The early consultation of a biostatistician is recommended, because efficient study design and analysis plays an important role in the evaluation process.

From our experience, we strongly recommend developing an evaluation strategy during the project proposal formulation. Much attention should be given to a prospective protocol development for evaluation studies, because only on this basis can convincing evidence, e.g., by hypothesis testing, be given. For the future, the real challenge will be to organize cooperation between similar projects in order to reach a "critical mass" for controlled studies of the effectiveness of telemedical services. If the "subjective" and methodological problems are solved, it will be possible to obtain more results at this study level worldwide.

Due to several unresolved methodological issues, cost-effectiveness studies will remain a major problem in the evaluation of telemedical services. Today such studies are performed within one country and for a well-defined telemedical service. The ability to generalize the results remains in doubt because of the differences in the healthcare systems of different countries. For instance, the results of a study from one of the most developed countries in the field of telemedicine (Norway) are difficult to compare with results of studies from other countries, because the Norwegian healthcare system is publicly funded [23]. Therefore, more telemedical applications with a clear reimbursement strategy are required to "survive" long enough for such studies, together with a clear statement by policy makers and insurance companies to fund such studies.

REFERENCES

[1] E. B. Allely, "Synchronous and asynchronous telemedicine," J. Med. Syst., vol. 19, no. 3, pp. 207–212, 1995.
[2] R. L. Bashshur, "Telemedicine effects: Cost, quality, and access," J. Med. Syst., vol. 19, no. 2, 1995.
[3] D. A. Perednia, "Telemedicine system evaluation and a collaborative model for multi-centered research," J. Med. Syst., vol. 19, no. 3, pp. 287–294, 1995.
[4] E. G. Tangalos, "Clinical trials to validate telemedicine," J. Med. Syst., vol. 19, no. 3, pp. 281–285, 1995.
[5] P. Taylor, "A survey of research in telemedicine—Part 1: Telemedicine systems," J. Telemed. Telecare, vol. 4, pp. 1–17, 1998.
[6] C. Ohmann. (1996). "MEDWIS Arbeitskreis Evaluation—Leitfaden zur Evaluation von Wissensbasen," vol. 4. [Online]. Available: http://www.tc.uni-duesseldorf.de/partners/MEDWIS/evaluation/leitfaden.htm
[7] C. Ohmann et al., "Evaluation procedure in the TELEGASTRO project," Theor. Surgery, vol. 9, pp. 90–103, 1994.
[8] D. E. O'Leary, "Verifying and validating expert systems: A survey," in Expert Syst. Business and Finance: Issues and Applications, ch. 9, 1993.
[9] D. O'Leary, "Verification and validation of intelligent systems: Five years of AAAI workshops," Int. J. Intell. Syst., vol. 9, pp. 653–657, 1994.
[10] R. Engelbrecht et al., "Verification and validation," in Assessment and Evaluation of Inform. Technol., 1995.
[11] C. Ohmann et al., "Evaluation of knowledge-based systems workshop in cooperation with and supported by the German MEDWIS programme," Theor. Surg., vol. 9, pp. 230–237, 1994.
[12] K. Clarke et al., "A methodology for evaluation of knowledge-based systems," Lecture Notes in Medical Informatics, vol. 45, pp. 361–370, 1991.
[13] R. O'Keefe, O. Balci, and E. Smith, "Validating expert system performance," IEEE Expert, pp. 81–90, 1987.
[14] G. Zahlmann et al., "Telekonsultationsnetzwerk für die Ophthalmologie—Erfahrungen und Ergebnisse," Klinische Monatsblätter für Augenheilkunde, vol. 212, no. 2, pp. 111–115, 1998.
[15] W. Adrion and M. Branstad, "Validation, verification and testing of computer software," Comp. Surv., vol. 14, pp. 159–192, 1982.
[16] C. P. Friedman and J. C. Wyatt, Evaluation Methods in Medical Informatics. New York: Springer, 1997.
[17] J. Wyatt et al., "Evaluating medical expert systems: What to test and how?," Med. Inform., vol. 15, no. 3, pp. 205–217, 1997.
[18] S. J. Pocock, Clinical Trials—A Practical Approach. Chichester, U.K.: Wiley, 1983.
[19] C. Begg and R. Greenes, "Assessment of diagnostic tests when disease verification is subject to selection bias," Biometrics, vol. 39, pp. 207–215, 1983.
[20] M. Gold et al., Cost-Effectiveness in Health and Medicine. New York: Oxford Univ. Press, 1996.
[21] E. McIntosh and J. Cairns, "A framework for the economic evaluation of telemedicine," J. Telemed. Telecare, vol. 3, pp. 132–139, 1997.
[22] G. Zahlmann, M. Obermaier, and A. Wegner, "Telemedical systems and services—Objects of and tools for evaluation/verification procedures," in Proc. 20th Int. Conf. IEEE EMBS, Hong Kong, 1998.
[23] T. S. Bergmo, "An economic analysis of teleradiology versus a visiting radiologist service," J. Telemed. Telecare, vol. 2, pp. 136–142, 1996.

Rolf Holle received the M.S. degree in mathematics from the University of Marburg, Germany, in 1979 and the Ph.D. degree in human sciences from the University of Giessen, Germany, in 1990.

He worked as a Biostatistician in the Institute of Medical Biometry and Informatics, University of Heidelberg, Germany, where he received his habilitation for medical biometry in 1995. Since 1996, he has been head of the working group for Quantitative Methods in Evaluation Research of the MEDIS-Institute of the GSF-Research Centre for Environment and Health, Neuherberg, Germany. His research interests are statistical methods for the evaluation of diagnostic tests and prognostic models as well as study designs for intervention studies.

Dr. Holle received the certificate “Biometry in Medicine” in 1990, given by the German Society of Medical Informatics, Biometry and Epidemiology (GMDS) and the German Region of the International Biometric Society. He is also a member of the International Society of Clinical Biostatistics.

Gudrun Zahlmann (M’91), for photograph and biography, see this issue, p. 83.
