
Appointment scheduling in healthcare

Alex Kuiper

December 9, 2012

Master Thesis

Supervisors: prof.dr. Michel Mandjes

dr. Benjamin Kemper

IBIS UvA

Faculty of Economics and Business
Faculty of Science


Abstract

Purpose: Appointment scheduling has been studied since the mid-20th century. The literature on appointment scheduling often models an appointment system as a queueing system with deterministic arrivals and random service times. In order to derive an optimal schedule, it is common to minimize the system's loss, in terms of the patients' waiting times and the server's idle time, by using a loss function. This approach translates to a D/G/1 queue, which is complex to analyze for a broad range of settings. Therefore many studies assume service time distributions that lead to tractable solutions, which oversimplifies the problem. Also, many studies overcome the complexity through simulation studies, which are often case specific. The purpose of this thesis is to offer an approach that deals with arbitrary service times and to give practitioners guidance in finding optimal schedules.

Approach: First, we approximate service time distributions by a phase-type fit. Second, we compute the waiting and idle times per patient. Finally, we run algorithms that minimize, simultaneously or sequentially, the system’s loss. This approach enables us to find optimal schedules for different loss functions in both the transient and steady-state case.

Findings: Our approach is an explicit and effective procedure for finding optimal schedules under arbitrary service times. Optimal schedules are derived for different scenarios, i.e. for sequential and simultaneous optimization, linear and quadratic loss functions, and a broad range of service time distributions.

Practical implications: The procedure can be used to compute optimal schedules for many practical scheduling issues that can be modeled as a D/G/1 queue.

Value: We present a guideline for optimal schedules that is of value to practitioners in services and healthcare. For researchers on appointment scheduling we present a novel approach to the classic problem of scheduling clients on a server.

Information

Title: Appointment scheduling in healthcare

Author: Alex Kuiper, akuiper@science.uva.nl, 5647169
Supervisors: prof.dr. Michel Mandjes, dr. Benjamin Kemper
Second readers: dr.ir. Koen de Turck, dr. Maurice Koster
Date: December 9, 2012

IBIS UvA

Plantage Muidergracht 12
1018 TV Amsterdam
http://www.ibisuva.nl


Contents

1 Introduction
2 Background of appointment scheduling
  2.1 Literature review
    2.1.1 Dynamic versus static scheduling
    2.1.2 The D/G/1 model
    2.1.3 The arrival process
    2.1.4 The service time distribution and queue discipline
    2.1.5 Some remarks on scheduling
  2.2 Model description
3 Approximation by a phase-type distribution
  3.1 Phase-type distribution
  3.2 Phase-type approximation
  3.3 Phase-type fit
  3.4 Recursive procedure for computing sojourn times
    3.4.1 Exponentially distributed service times
    3.4.2 Phase-type distributed service times
4 Optimization methods
  4.1 Simultaneous optimization
  4.2 Sequential optimization
    4.2.1 Quadratic loss
    4.2.2 Absolute loss
  4.3 Lag-order method
  4.4 Computational results for transient cases
5 Limiting distributions
  5.1 The D/M/1 queue
    5.1.1 Limit solutions in the sequential case
    5.1.2 Limit solutions in the simultaneous case
  5.2 The D/E_{K−1,K}/1 queue
    5.2.1 The limiting probabilities
    5.2.2 The sojourn time distribution
  5.3 The D/H_2/1 queue
    5.3.1 The limit probabilities
    5.3.2 The sojourn time distribution
  5.4 Computational results in steady-state
6 Optimal schedules in healthcare
  6.1 Performance under Weibull distributed service times
  6.2 Performance under log-normal distributed service times


Chapter 1

Introduction

In this thesis we study the classic appointment scheduling problem that practitioners often encounter in service and healthcare processes. In such a setting an appointment is an epoch that fixes the moment of the client's or patient's arrival in time. The client then receives a service from the service provider. For example, a doctor sees several patients during a clinic session in a hospital. A patient arrives exactly at the appointed time epoch. Upon arrival, the patient either waits for the previous patient to be served, or is seen directly by the doctor. In the latter case, the doctor was idle for a certain time period. Ideally the schedule is such that the patient has no waiting time and the doctor has no idle time. Unfortunately, this ideal is never realized, because e.g. the treatment time of a patient is not constant but a random variable. Therefore, we must find the appointment schedule that minimizes the expected waiting and idle times over all patients. In this thesis we study different approaches to achieve this.

Above we gave an example of an appointment scheduling problem in a healthcare setting, but there are many more, such as the scheduling of MRI and CT patients. MRI and CT scanners are expensive devices and it is therefore crucial to maximize their utilization, i.e. to minimize idle times. So it seems optimal to have a tight schedule for these scanners. But in case of complications during scans, too tight a schedule results in long waiting times for patients. Another typical example is scheduling the usage of operating rooms in a hospital or clinic. There are only a small number of these special rooms, on which various surgeries have to be scheduled. Therefore, the utilization of each room should be maximized, i.e. the idle time minimized, at the expense of the patients' (waiting) time. But it is known that poor scheduling performance, which results in long waiting times, leads to patients choosing other hospitals or clinics for their surgeries.

In addition, there are numerous examples outside the healthcare setting. For instance, ship-to-shore container cranes, where the ships can be seen as clients, the service time is the total unloading time, and the cranes act as the service provider. Too tight a schedule results in waiting ships, which incurs extra costs for the ships. On the other hand, unused cranes do not make a profit. Furthermore, there is a risk that ships choose competing harbors with lower waiting times. These examples show that the cost is twofold: we have both waiting times for clients and idle time for the server. Our aim is to find an optimal schedule that minimizes both costs.

Now we have a feeling for how practical problems translate into an appointment scheduling problem. Assume for now that there is a finite number of patients to be scheduled, say N, where the service times B_i for i ∈ {1, . . . , N} are independent random variables, in most cases also identically distributed. The waiting and idle time per patient are denoted by W_i and I_i, respectively. The waiting and idle times are random variables as well, since they depend on the service times of the previously scheduled patients. Our goal is to minimize the sum of all waiting and idle times over all possible schedules. A naive approach would be to schedule all patients equidistantly based on a patient's average service time. This schedule, denoted by T, can be written as t_1 = 0 and t_i = Σ_{j=1}^{i−1} E[B_j]. However, a session could take longer than the expected service time. When this happens, all subsequent patients have positive waiting times in expectation. We will see in Chapter 2 that this schedule is far from optimal, since it leads to infinite waiting times as N tends to infinity.

An important factor affecting the optimal appointment schedule is the trade-off between W_i and I_i in the cost function R_i. If the patient's time is relatively more expensive, more weight is put on W_i and the schedule becomes less tight. Vice versa, if the doctor's time is considered relatively more expensive, more weight is put on I_i and the schedule becomes tighter.

The outline of this thesis is as follows. In the upcoming chapter we discuss the background in the form of a literature review and some preliminaries. Because we are interested in a healthcare setting, we consult related literature. This gives us a proper framework to derive a mathematical model with relevant cost functions. In the next chapter, Chapter 3, we introduce phase-type distributions, which can be used to approximate any positive distribution arbitrarily accurately. We give a solid theoretical basis to support this claim. Furthermore, phase-type distributions exhibit a specific property which allows us to use a recursive procedure.

This procedure will first be used to compute optimal schedules for various cost functions for a finite number of patients, i.e. the transient case, in Chapter 4. We use two different optimization approaches: simultaneous and sequential. Simultaneous optimization finds an optimal schedule for all patients jointly, while sequential optimization is a recursive approach in which one optimizes patient by patient. The latter has the advantage that it reduces the scheduling problem to finding the optimal schedule for one patient at a time. At the end of that chapter we compare our findings with relevant literature. Second, in Chapter 5, we derive a method to compute the optimal schedule in steady-state for different coefficients of variation, where the coefficient of variation is a measure of the variability of the service times. Setting the expectation equal to one allows us to compare the performance of optimal schedules under changes in the coefficient of variation. We compute the optimal schedule in steady-state for both the sequential and simultaneous approach and for the different cost functions used in Chapter 4.

In Chapter 6 we check the performance of our approach in simulated examples based on real-life settings. We model these examples by data generated from either the Weibull or log-normal distribution. The choice of these particular distributions originates from the healthcare setting as explained in Chapter 2. In Chapter 7 we summarize our results and suggest topics for further research.

Finally, I am thankful to my supervisors Benjamin Kemper and Michel Mandjes for their scientific support and guidance during my six-month internship at the Institute for Business and Industrial Statistics of the University of Amsterdam (IBIS UvA). I would also like to thank the second readers Maurice Koster and Koen de Turck, who made the effort to read and comment on my work.


Chapter 2

Background of appointment scheduling

In this chapter we perform a background study on appointment scheduling. In the upcoming section we review the literature on appointment scheduling with a focus on healthcare. Further, we derive assumptions for a model, so that at the end of this chapter we have a well-defined optimization problem in a properly justified model.

2.1 Literature review

The research on appointment scheduling dates back to the work of Bailey (1952,1954) [6, 7], Welch and Bailey (1952) [5] and Welch (1964) [28]. The authors study the phenomenon of scheduled patients who are to be treated in a typical healthcare delivery process. This phenomenon of priorly scheduled arrivals that are treated by a service provider is often studied as a queueing system in which jobs arrive following a deterministic arrival process and receive a service which varies in duration.

After this pioneering work on appointment scheduling, the subject has been extensively researched in both services and healthcare. The article by Cayirli and Veral (2003) [10] gives a good overview of the state of the art in appointment scheduling. We use this article to highlight special features of appointment scheduling in healthcare.

2.1.1 Dynamic versus static scheduling

To begin with, the objective of outpatient scheduling is to find an appointment system that optimizes the system's loss, which is the sum of the expected losses incurred by waiting and idle times. Literature on appointment scheduling can be divided into two categories with respect to outpatient scheduling: static and dynamic. In the static environment the appointment system is completely determined in advance, in other words offline scheduling. This is in contrast to the dynamic case in which changes in the schedule are permitted, so-called online scheduling. Most literature focuses on the static case; only a few papers, such as Fries and Marathe (1981) [14] and Liao et al. (1998b) [18], consider the dynamic case in which the schedule of future arrivals is revised continuously.


2.1.2 The D/G/1 model

The outpatient scheduling problem can be modeled by a queueing model: the so-called D/G/1 queueing model, in Kendall's three-factor (A/B/C) classification (1953) [17]. The three parameters are:

• A, the arrival process, in our case the schedule, is deterministic, denoted by D.

• B, the service time distribution, is general, denoted by G. This means explicitly that any positive distribution can be taken to model the service times.

• C, the number of servers, is set to 1, since we have a single doctor or practitioner.

An optimal schedule in this context is the deterministic arrival process of the patients which minimizes the sum of R_i over all i. A few studies in healthcare investigated so-called multi-stage models in which patients have to go through several facilities, such as Rising et al. (1973) [20], who performed an extensive case study, and Swisher et al. (2001) [21], who did a broad simulation study. We will consider practitioners and doctors as independent queues, because of the doctor-patient relationship, often seen in the literature (Rising et al. (1973) and Cox et al. (1985) [11]) and also justified by contemporary medical ethics. It is a mathematical fact that when there are multiple doctors it is better to have a single flow of patients, who are scheduled to the first available doctor.

2.1.3 The arrival process

The arrival process itself can come in many forms. The first important factor is the unpunctuality of patients, i.e. the difference between the time a patient arrives and his appointment time. Generally patients arrive early rather than late, as shown in many studies cited in Cayirli and Veral (2003). On the other hand, the doctors' unpunctuality can be measured as lateness to the first appointment. This kind of lateness is considered in only some studies. A second important factor, also addressed in Cayirli and Veral's paper, is the presence of no-shows, i.e. patients who do not show up at all. Empirical studies show that the probability of a no-show ranges from 5% to as much as 30%. Simulation studies show that the no-show probability has a greater impact on the appointment schedule than the coefficient of variation and the number of patients per clinic session; see the extensive simulation study by Ho and Lau (1992) [9].

On the other hand, patients can also drop by without a prior appointment. These are called walk-in patients; their presence is not often incorporated in studies. This is of course in line with the static approach to appointment scheduling, since scheduling walk-in patients on the fly results in a modification of the schedule. The presence of walk-ins need not automatically harm the static nature of the appointment schedule, since one can take walk-ins into account by seeing them only at instances when the doctor is idle. Another factor in the arrival process is the presence of companions. The D/G/1 queueing model has infinite waiting capacity, so the presence of companions is not taken into account; moreover, there is no restriction on the number of waiting patients. However, as remarked in Cayirli and Veral, for hospitals it is important to know how many people are likely to use the waiting room, since hospitals have to accommodate all the waiting patients and their companions.


2.1.4 The service time distribution and queue discipline

So far we discussed the arrival process and the number of service providers (doctors). Remaining are the service time and the queue discipline. The queue discipline is in all studies on a first-come, first-served (FCFS) basis. In case of punctual patients this discipline is the same as serving patients in order of arrival. But if patients are unpunctual, doctors can choose to see the next patient when that patient has already arrived; this reduces the doctor's idle time. However, assuming that lateness is always less than the scheduled interarrival times, we can assume that the order of arrival equals the scheduled order.

Now we discuss the most important factor for this thesis: the service time (per patient). The total service time is defined as the sum of all the time a patient claims the doctor's attention, preventing him or her from seeing other patients, see Bailey (1952). An important quantity in queueing theory is the (squared) coefficient of variation, denoted by (S)CV, of a random variable B_i for patient i:

CV = σ/µ, and SCV = σ²/µ²,

where µ = E[B_i] and σ² = Var[B_i]. Many analytical studies assume B_i to be exponential, obviously because their analytical approach would be intractable otherwise, see Wang [27], Fries and Marathe [14], and Liao et al. [18]. However, the one-parameter exponential distribution sets the CV = 1, which is too restrictive and not seen in practice. More common in healthcare is data with 0.35 < CV < 0.85. Furthermore, empirical data collected from clinics and frequency distributions of observed service times display forms that are unimodal and right-skewed, see Welch and Bailey (1952), Rising et al. (1973), Cox et al. (1985), Babes and Sarma (1991) [4] and Brahimi and Worthington (1991) [8]. Examples of such distributions are the log-normal and the (highly flexible) Weibull distribution.

In 1952 Bailey already observed that the performance of the system is very sensitive to small changes in appointment intervals. Since then many studies have reported that an increase in the variability of service times, i.e. in the SCV, leads to an increase of both the patients' waiting times and the doctor's idle times, and thus incurs extra costs. Examples of this phenomenon can be found in Rising et al. (1973), Cox et al. (1985), Ho and Lau (1992), Wang (1997), and Denton and Gupta (2003) [12]. The choice of an optimal appointment schedule depends mainly on the mean and variance, see Ho and Lau (1992) and Denton and Gupta (2003). Hence it is important to model the observed data well, that is, by matching the first two moments; a possible way to do so is by a phase-type fit, see Adan and Resing (2002) [1]. Wang (1997) [26] already showed how one can numerically compute an optimal appointment schedule for general phase-type distributions.
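Chapter 3 gives the explicit fit formulas; as a preview, the standard two-moment recipe described by Adan and Resing can be sketched as follows. The function name and return convention are ours, not the thesis's: for SCV ≤ 1 it mixes two Erlang distributions with a common rate, for SCV > 1 it uses a hyperexponential with balanced means.

```python
import math

def fit_phase_type(mean, scv):
    """Two-moment phase-type fit in the style of Adan and Resing [1].

    Assumes mean > 0 and scv > 0. Returns ('erlang_mix', k, p, mu) for
    SCV <= 1: with probability p the variable is Erlang(k-1, mu),
    otherwise Erlang(k, mu). Returns ('h2', p1, mu1, mu2) for SCV > 1:
    a hyperexponential H2 with balanced means.
    """
    if scv <= 1:
        # Choose k such that 1/k <= scv <= 1/(k-1).
        k = max(2, math.ceil(1.0 / scv))
        p = (k * scv - math.sqrt(k * (1 + scv) - k ** 2 * scv)) / (1 + scv)
        mu = (k - p) / mean
        return ('erlang_mix', k, p, mu)
    else:
        p1 = 0.5 * (1 + math.sqrt((scv - 1) / (scv + 1)))
        mu1 = 2 * p1 / mean          # balanced means: p1/mu1 = p2/mu2
        mu2 = 2 * (1 - p1) / mean
        return ('h2', p1, mu1, mu2)
```

For the empirically relevant range 0.35 < CV < 0.85 (so SCV < 1) this always produces the mixed-Erlang branch; both branches match the prescribed first two moments exactly.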

2.1.5 Some remarks on scheduling

We finish with some remarks. The appointment scheduling problem can also be considered in discrete time, so that appointments can only be set at certain times. This is the approach followed by Kaandorp and Koole (2007) [15]. The main advantage of this approach is that only a finite number of combinations is possible, so that a faster minimization algorithm can be used. We consider the continuous-time case, which can be seen as the limit of the discrete problem of Kaandorp and Koole, a limit that is impossible to solve with their methodology due to dimension issues.


Based on experience we assume that one patient having to wait 20 minutes is worse than 10 clients having to wait only 2 minutes each. We can incorporate this effect with cost functions that penalize the loss incurred by one long waiting time more than the sum of many short waiting times; a common choice for this purpose is a quadratic cost function. Another choice is a linear cost function, which is more appropriate in a production environment, where we do not have to incorporate people's experiences. We can reason along the same lines for the doctor (or server). We then impose a cost function, either linear or quadratic, on the doctor's idle times.

Since we deal with randomness, driven by the variable service times, we look at these losses in expectation, which we call risk. A more general discussion of choices for cost functions can be found in Cayirli and Veral (2003), who define the expected total loss (risk) as a weighted sum of expected waiting times, idle times and overtime. We only focus on the linear and quadratic cost functions used in Wang (1997) and Kemper et al. (2011). In the next section we present our model, settings and cost functions in more detail.

2.2 Model description

In this section we present our model. The optimization problem as stated in the introduction can be written as a minimization of the expected patients’ waiting times and server’s idle times over the arrival epochs:

min_{t_1,...,t_{N+1}} Σ_{i=1}^{N+1} (E[I_i] + E[W_i]),   (2.1)

in which W_i refers to patient i's waiting time and I_i refers to the server's idle time before it serves the i-th patient. This problem is defined in the following setting:

• There are N + 1 patients to be priorly scheduled in one clinic session. So there are no walk-in patients.

• Let B_1, . . . , B_{N+1} be the service times of the N + 1 patients.

• The B_i's are independent and identically distributed.

• The t_i are the scheduled times, i ∈ {1, . . . , N+1}, and let x_i = t_{i+1} − t_i, for i ∈ {1, . . . , N}, be the interarrival times.

• Scheduled patients are punctual, always show up and are served in order of arrival.

• One wants to minimize a convex cost function R_i depending on the expected waiting time of patient i and the expected idle time of the doctor.

We schedule N + 1 patients, so that we end up with exactly N interarrival times. The relation between the interarrival times x_i and the arrival epochs t_i is shown in Figure 2.1. We now consider the naive schedule of the introduction. We propose it as an example which proves that even a simple but naive heuristic can have a major impact on (expected) waiting times. It shows the relevance of optimal appointment scheduling.


Figure 2.1: the relation between N + 1 arrival epochs t_i and N interarrival times x_i.

Example 2.1 (A naive appointment schedule). Consider again the appointment schedule T obtained by setting the interarrival times x_i equal to the expected service time of patient i:

t_1 = 0 and t_i = Σ_{j=1}^{i−1} E[B_j]

for i = 2, . . . , N + 1. This is a very simple approach, but in fact the service load is equal to 1. It means that per unit time the in- and outflow of patients is equal to 1, which leads to infinite waiting times by the following proposition, see Kemper et al. (2011).

Proposition 2.2. In a D/G/1 queue with load 1 starting empty, with the service times B_i having variance σ² < ∞, the mean waiting time of the N-th patient satisfies

E[W_N] ∼ σ √(2N/π), as N → ∞.

The result also holds in the more general setting of a G/G/1 queue, where σ² = Var[A_i] + Var[B_i] and A_i is the interarrival time of patient i.

Proof. Let A_i be the i-th patient's interarrival time and B_i his service time. By the Spitzer-Baxter identities [2, 13] we have to study

E[W_N]/√N = (σ/√N) Σ_{k=1}^{N−1} (1/√k) I_k,   (2.2)

with W_N the waiting time of patient N and

I_k = ∫_0^∞ P[ (1/√k) Σ_{i=1}^{k} (B_i − A_i)/σ > y ] dy.

By Chebyshev's inequality the integrand is bounded,

P[ (1/√k) Σ_{i=1}^{k} (B_i − A_i)/σ > y ] ≤ min{ 1, (1/y²) Var[ (1/√k) Σ_{i=1}^{k} (B_i − A_i)/σ ] },

so that

I_k ≤ ∫_0^∞ (1 ∧ 1/y²) dy = ∫_0^1 dy + ∫_1^∞ y^{−2} dy = 2.

This gives us a majorant, so that we can apply both the Dominated Convergence Theorem and the Central Limit Theorem:

lim_{k→∞} I_k = ∫_0^∞ (1 − Φ(y)) dy = 1/√(2π).

Furthermore,

(1/√N) Σ_{k=1}^{N−1} (1/√k) I_k = ∫_0^1 1_{[x ≤ 1−N^{−1}]} (⌈Nx⌉/N)^{−1/2} I_{⌈Nx⌉} dx.

This integrand is again bounded, knowing that I_{⌈Nx⌉} < 2, by 2x^{−1/2}, and ∫_0^1 2x^{−1/2} dx = 4. Using the Dominated Convergence Theorem gives

lim_{N→∞} (1/√N) Σ_{k=1}^{N−1} (1/√k) I_k = ∫_0^1 (1/√(2π)) x^{−1/2} dx = √(2/π).

We finish the proof by multiplying the latter expression by σ, which gives the right-hand side of equation (2.2).
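The √N growth of Proposition 2.2 can be illustrated with a quick Monte Carlo experiment for a D/M/1 queue at load 1 (exponential service times with mean 1, so σ = 1). The helper below is our own sketch, not part of the thesis:

```python
import math
import random

def mean_wait_naive(n_patients, reps=4000, seed=1):
    """Monte Carlo estimate of E[W_N] for the naive equidistant schedule
    (x_i = E[B_i] = 1) with Exp(1) service times, i.e. a D/M/1 queue at
    load 1, via Lindley's recursion W_i = max(W_{i-1} + B_{i-1} - x_{i-1}, 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        w = 0.0                       # W_1 = 0: the system starts empty
        for _ in range(n_patients - 1):
            w = max(w + rng.expovariate(1.0) - 1.0, 0.0)
        total += w
    return total / reps

# Proposition 2.2 predicts E[W_N] ~ sigma * sqrt(2N/pi); here sigma = 1.
```

For moderate N the estimate already lies close to σ√(2N/π), confirming that under the naive schedule the expected waiting time grows without bound.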

What happens in the last example is that the occupation rate, defined as

ρ = E[B_i] / E[A_i],

is equal to 1. The occupation rate ρ is also known as the service load. To ensure stability, i.e. that with probability zero infinitely many patients are waiting as t → ∞, we need E[B_i] < E[A_i], i.e. the service load should be less than 1.

Let R_i = E[g(I_i)] + E[h(W_i)] be the risk per patient, where g, h : R≥0 → R≥0 are cost functions. Also, we demand that g, h are convex and satisfy g(0) = h(0) = 0, so that the problem is

min_{t_1,...,t_{N+1}} R = min_{t_1,...,t_{N+1}} Σ_{i=1}^{N+1} R_i = min_{t_1,...,t_{N+1}} E[ Σ_{i=1}^{N+1} (g(I_i) + h(W_i)) ],   (2.3)

cf. equation (2.1). The risk R can also be thought of as the system's loss, since it captures all the loss incurred by idle and waiting times.

Proposition 2.3. It is always optimal to schedule the first patient at time zero.

Proof. Suppose one has an optimal schedule V where t_1 = a > 0, and consider the alternative (shifted) schedule W where t*_1 = 0 and t*_i = t_i − a for all i. Then the expected idle time before the first patient is larger than 0, because by punctuality

E[g(I_1)] = g(a) + E[g(I*_1)] = g(a) and E[h(W_1)] = E[h(W*_1)] = 0.

Furthermore, for all i ∈ {2, . . . , N + 1},

E[g(I_i)] = E[g(I*_i)] and E[h(W_i)] = E[h(W*_i)],

as patients arrive at their arrival epochs t_i or t*_i = t_i − a. Observe that R_V = R_W + g(a) ≥ R_W, so the shifted schedule W is at least as good.


By Lindley's recursion formulas, see [19], which are graphically demonstrated in Figure 2.2, we have

I_i = max{(t_i − t_{i−1}) − W_{i−1} − B_{i−1}, 0} = max{x_{i−1} − S_{i−1}, 0},   (2.4)

W_i = max{W_{i−1} + B_{i−1} − (t_i − t_{i−1}), 0} = max{S_{i−1} − x_{i−1}, 0}.   (2.5)

Figure 2.2: the relations between interarrival, idle, sojourn and waiting times. We see that the idle time I_i can be written as the interarrival time x_{i−1} minus the sojourn time S_{i−1}; the waiting time W_i can be written as the sojourn time S_{i−1} minus the interarrival time x_{i−1}. The figure originates from Vink (2012) [24].

Define now the loss function l(x),

l(x) = g(−x) 1_{x<0} + h(x) 1_{x>0}.

The function l(x) is also convex, which can easily be deduced from the fact that g(x) and h(x) are convex functions satisfying g(0) = h(0) = 0. By Proposition 2.3 and the fact that either the idle time is zero and the waiting time positive, or vice versa, one can reduce the optimization problem to minimizing over N arrival epochs or interarrival times:

R = Σ_{i=1}^{N+1} R_i = E[ Σ_{i=2}^{N+1} (g(I_i) + h(W_i)) ] = E[ Σ_{i=1}^{N} l(S_i − x_i) ].
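This reduction suggests a direct way to evaluate the risk of a candidate schedule by simulation. The sketch below (function and parameter names are ours) applies Lindley's recursion to estimate R = E[Σ l(S_i − x_i)] for an arbitrary service time sampler and loss function:

```python
import random

def estimate_risk(x, sample_service, loss, reps=10000, seed=42):
    """Estimate R = E[sum_i l(S_i - x_i)] for interarrival times
    x = (x_1, ..., x_N) via Lindley's recursion, where S_i = W_i + B_i
    is patient i's sojourn time and `loss` is the loss function l."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        w = 0.0                           # W_1 = 0: the first patient never waits
        for xi in x:
            s = w + sample_service(rng)   # sojourn time S_i = W_i + B_i
            total += loss(s - xi)         # l(S_i - x_i)
            w = max(s - xi, 0.0)          # Lindley: next patient's waiting time
    return total / reps
```

For example, `estimate_risk(x, lambda rng: rng.expovariate(1.0), lambda d: d * d)` estimates the quadratic-loss risk of a schedule x under Exp(1) service times; with deterministic unit service times and x_i = 1 the risk is exactly zero.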

There are many possible loss functions, but often one considers the absolute loss l(x) = |x|, which corresponds to linear costs, or the quadratic loss l(x) = x², which corresponds to quadratic costs of waiting and idle times. We can generalize these loss functions to weighted versions, i.e. let α ∈ (0, 1):

R(α) = Σ_{i=2}^{N+1} E[ α|x_{i−1} − S_{i−1}| 1_{S_{i−1}−x_{i−1}<0} + (1−α)|S_{i−1} − x_{i−1}| 1_{S_{i−1}−x_{i−1}>0} ]
     = Σ_{i=2}^{N+1} E[ α|S_{i−1} − x_{i−1}| + (1−2α)|S_{i−1} − x_{i−1}| 1_{S_{i−1}−x_{i−1}>0} ]   (2.6)
     = E[ Σ_{i=1}^{N+1} (αI_i + (1−α)W_i) ],   (2.7)


and

R(α) = Σ_{i=2}^{N+1} E[ α(x_{i−1} − S_{i−1})² 1_{S_{i−1}−x_{i−1}<0} + (1−α)(S_{i−1} − x_{i−1})² 1_{S_{i−1}−x_{i−1}>0} ]
     = Σ_{i=2}^{N+1} E[ α(S_{i−1} − x_{i−1})² + (1−2α)(S_{i−1} − x_{i−1})² 1_{S_{i−1}−x_{i−1}>0} ]   (2.8)
     = E[ Σ_{i=1}^{N+1} (αI_i² + (1−α)W_i²) ].   (2.9)

Furthermore, there are other variables which can incorporate costs, for example the approach by Wang [26, 27], who minimizes the expected sojourn time per patient:

Σ_{i=1}^{N+1} E[S_i] = Σ_{i=1}^{N+1} E[W_i + B_i],

where E[W_1] = 0. However, the E[B_i] do not depend on the choice of schedule, so this is equivalent to minimizing the waiting times only. But minimizing waiting times only would let the interarrival times x_i → ∞, so we add the expected system completion time. The expected system completion time is defined as the sum of the arrival time of the last patient, t_{N+1}, and his expected sojourn time E[S_{N+1}]. We can generalize this to a weighted version by giving weights to the sojourn and completion times.

Since the system completion time can also be seen as the sum of all idle times plus all service times, minimizing it is equivalent to minimizing idle time only. So Wang's choice of cost functions is equivalent to linear costs. An equivalent definition is the sum of all the interarrival times plus the expected sojourn time of the last patient. We summarize: let α ∈ (0, 1),

R*(α) = α Σ_{i=1}^{N+1} E[S_i] + (1−α)(t_{N+1} + E[S_{N+1}])
      = α Σ_{i=1}^{N+1} (E[W_i] + E[B_i]) + (1−α) Σ_{i=1}^{N+1} (E[I_i] + E[B_i])
      = α Σ_{i=1}^{N+1} E[W_i] + (1−α) Σ_{i=1}^{N+1} E[I_i] + Σ_{i=1}^{N+1} E[B_i]
      = α Σ_{i=1}^{N+1} E[S_i] + (1−α)( E[S_{N+1}] + Σ_{i=1}^{N} x_i ).   (2.10)


So we distinguished four different costs in terms of time:

• Completion time: the time when the doctor (server) finishes seeing the last patient.

• Idle time: the time that the server is idle before the next patient comes in.

• Waiting time: the time that a patient has to wait before he or she is seen (served).

• Sojourn time: the sum of waiting and service time.

Moreover, we observed that

R*(α) = R(α) + Σ_{i=1}^{N+1} E[B_i],   (2.11)

which shows the relation between the system's loss in the sense of Kemper et al. and that of Wang. Minimizing the absolute loss, equation (2.6), is thus equivalent to minimizing Wang's R*. This allows us to compare the two methods at the end of Chapter 4.
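As a sanity check of this relation, both risks can be computed exactly when service times are deterministic, since all expectations then reduce to plain sums. The helper below is our own sketch; we take α = 1/2, the symmetric case in which the weightings of idle and waiting times in R(α) and R*(α) agree.

```python
def risks(x, b, alpha):
    """Compute R(alpha) and R*(alpha) exactly for constant service
    times b and interarrival times x = (x_1, ..., x_N), using
    Lindley's recursion (2.4)-(2.5). All quantities are deterministic."""
    waits, idles, sojourns = [0.0], [0.0], [float(b)]  # patient 1: W_1 = I_1 = 0, S_1 = b
    for xi in x:
        w = max(sojourns[-1] - xi, 0.0)                # W_{i+1}, eq. (2.5)
        idles.append(max(xi - sojourns[-1], 0.0))      # I_{i+1}, eq. (2.4)
        waits.append(w)
        sojourns.append(w + b)                         # S_{i+1} = W_{i+1} + B_{i+1}
    r = sum(alpha * i + (1 - alpha) * w for i, w in zip(idles, waits))
    completion = sum(x) + sojourns[-1]                 # t_{N+1} + S_{N+1}
    r_star = alpha * sum(sojourns) + (1 - alpha) * completion
    return r, r_star
```

For any schedule x and constant service time b the two outputs differ by exactly (N + 1) · b, i.e. by Σ E[B_i], in line with equation (2.11).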

In this chapter we proposed and motivated our model. In addition, we showed some preliminary results, such as that finding an optimal appointment schedule is equivalent to finding optimal interarrival times. Moreover, we reduced the problem of minimizing functions of idle and waiting times to minimizing a function of the sojourn time only. In the next chapter we focus on a rich class of distributions which lies dense in the class of all positive distributions and has appropriate properties: the phase-type distributions. We will use these distributions in our minimization problem to compute optimal schedules for the transient and steady-state cases.


Chapter 3

Approximation by a phase-type distribution

In this chapter we start with an overview of phase-type distributions. The reason we focus on phase-type distributions is that they are very flexible in capturing data structures, for example via moment matching [1] or via the EM algorithm [3]. In the next section we give a theoretical framework, see [23] and [13], which proves that phase-type distributions can approximate any positive distribution arbitrarily accurately. After showing this result we give explicit formulas for fitting a phase-type distribution based on its first two moments. In the final section we show a recursive procedure for computing the sojourn time distribution of patients in the D/PH/1 setting, where their service time distribution is phase-type.

3.1 Phase-type distribution

The assumption of exponential service times is often made because of its simplicity. The memoryless property is what makes this assumption so attractive, that is,

P[X > t + s] = P[X > t] P[X > s].

The reason we focus on phase-type distributions is that they can be seen as a generalization of exponential service times. They do not satisfy the condition above, except of course in the special case of exponential service times. They have another attractive property: loosely speaking, when the process is stopped at an arrival time t_i, the probability mass is distributed over the possible numbers of phases in the system. This distribution of probability over the phases can be taken as a new probability vector, which can be used as a starting vector of the same phase-type form, with time starting at t = 0 instead of t_i. This property can be exploited to (numerically) compute optimal arrival times.

First, we start with the precise definition of a phase-type distribution and give some key examples. Consider a Markov process J_t on a finite state space {0, 1, ..., m}, where 0 is absorbing and the other states are transient. The infinitesimal generator Q is given by

Q = [  0    0
      S^0   S ],


where S is an m×m matrix and S^0 = −S1 is the so-called exit vector. The vector 1 is a column vector of ones of length m, so that each row of Q sums to zero.

Definition 3.1. A random variable X distributed as the absorption time inf{t > 0 : J_t = 0}, with initial distribution (α_0, α) (a row vector of length m + 1), is said to be phase-type distributed. In phase-type representation we write X ∼ PH(α, S), since α and S completely define the characteristics of the phase-type distribution.

The transition matrices

P_t = exp(Qt) = Σ_{n=0}^∞ Q^n t^n / n!

of the Markov process J_t can also be written in block partitions,

P_t = [  1            0
        1 − e^{St}1   e^{St} ],

which gives an expression for the distribution function

F(t) = 1 − α e^{St} 1.

Other basic characteristics of phase-type distributions can also be derived:

1. The density: f(t) = α e^{St} S^0 = −α e^{St} S 1.

2. The Laplace–Stieltjes transform: ∫_0^∞ e^{−st} F(ds) = α (sI − S)^{−1} S^0.

3. The n-th moment: E[X^n] = (−1)^n n! α S^{−n} 1.

Phase-type distributions do not have unique representations, see Example 3.2.
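The characteristics above translate directly into a few lines of numerical code. The following sketch (our own function names, assuming NumPy and SciPy are available) evaluates the cdf, density and moments of a PH(α, S) random variable, and checks them on the Erlang representation discussed in Example 3.2 below.

```python
import math

import numpy as np
from scipy.linalg import expm

def ph_cdf(alpha, S, t):
    """F(t) = 1 - alpha exp(St) 1 for X ~ PH(alpha, S)."""
    return 1.0 - alpha @ expm(S * t) @ np.ones(len(S))

def ph_pdf(alpha, S, t):
    """f(t) = alpha exp(St) S0, with exit vector S0 = -S 1."""
    return alpha @ expm(S * t) @ (-S @ np.ones(len(S)))

def ph_moment(alpha, S, n):
    """E[X^n] = (-1)^n n! alpha S^{-n} 1."""
    S_inv_n = np.linalg.matrix_power(np.linalg.inv(S), n)
    return (-1) ** n * math.factorial(n) * alpha @ S_inv_n @ np.ones(len(S))

# Erlang-3 with rate mu = 2: alpha = (1, 0, 0) and a bidiagonal S
mu = 2.0
alpha = np.array([1.0, 0.0, 0.0])
S = np.array([[-mu,  mu, 0.0],
              [0.0, -mu,  mu],
              [0.0, 0.0, -mu]])

print(ph_moment(alpha, S, 1))   # K / mu = 1.5
print(ph_moment(alpha, S, 2))   # K(K+1) / mu^2 = 3.0
print(ph_cdf(alpha, S, 1.0))    # 1 - 5 e^{-2} ≈ 0.3233
```

The same three functions work unchanged for any of the representations in the examples that follow, since they only use (α, S).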

Example 3.2 (Erlang distribution). This distribution is denoted by E_K(μ) and is a special case of the gamma distribution in which the shape parameter is a natural number. Its probability density function is given by

f(t) = μ (μt)^{K−1} / (K−1)! · e^{−μt}.

The interpretation is that a random variable has to go through K exponential phases with the same scale parameter, so we can also write this in phase-type representation (α, S), namely α = (1, 0, ..., 0) of dimension 1×K and a K×K matrix

S = [ −μ   μ
           −μ   μ
                ⋱   ⋱
                    −μ   μ
                         −μ ],

with all other entries zero. Its moments are given by

E[X^n] = (K+n−1)!/(K−1)! · 1/μ^n,


so that its SCV = 1/K.

A further generalization is to take a mixture of two Erlang distributions with the same scale parameter, denoted by E_{K−1,K}(μ). This distribution is of special interest to us, since we will use it to approximate other distributions. Let the mixture be an E_K(μ) with probability 1 − p and an E_{K−1}(μ) with probability p, so α = (1−p, 0, ..., 0, p, 0, ..., 0) (dimension 1×(K + (K−1))) and

S = [ S_K    0
       0    S_{K−1} ],

where S_K and S_{K−1} are the Erlang blocks of Example 3.2, so that S is a (K + (K−1))×(K + (K−1)) matrix (the upper left block has dimension K×K and the lower right block (K−1)×(K−1)). This is equivalent to the following, more parsimonious, representation: α = (1, 0, ..., 0) and the K×K matrix

S = [ −μ   μ
           −μ   μ
                ⋱   ⋱
                    −μ   (1−p)μ
                         −μ ].

Its moments are given by

E[E^n_{K−1,K}] = p (K+n−2)!/(K−2)! · 1/μ^n + (1−p) (K+n−1)!/(K−1)! · 1/μ^n,

and so its SCV = (K − p²)/(K − p)², which lies in the interval [1/K, 1/(K−1)] as p varies over [0, 1]. So we can use a mixture of Erlang distributions with K ∈ {2, 3, ...} to approximate distributions with an SCV < 1, which we do in Section 3.3. In fact, by Theorem 3.5 we know that a mixture of Erlang distributions can approximate any distribution arbitrarily accurately. Finally, this example demonstrates that phase-type representations PH(α, S) are not unique.
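The parsimonious representation and the closed-form SCV above can be cross-checked numerically. The sketch below (our own construction, assuming NumPy) builds the K×K generator of E_{K−1,K}(μ; p) and verifies the mean (K − p)/μ and SCV (K − p²)/(K − p)² through the phase-type moment formula.

```python
import math

import numpy as np

def erlang_mixture(K, mu, p):
    """Parsimonious PH representation of E_{K-1,K}(mu; p): K - 1 phases at
    rate mu, then with probability 1 - p one further phase, so absorption
    occurs after K - 1 phases with probability p."""
    S = np.diag(np.full(K, -mu)) + np.diag(np.full(K - 1, mu), 1)
    S[K - 2, K - 1] = (1.0 - p) * mu
    alpha = np.zeros(K)
    alpha[0] = 1.0
    return alpha, S

def ph_moment(alpha, S, n):
    S_inv_n = np.linalg.matrix_power(np.linalg.inv(S), n)
    return (-1) ** n * math.factorial(n) * alpha @ S_inv_n @ np.ones(len(S))

K, mu, p = 4, 1.0, 0.3
alpha, S = erlang_mixture(K, mu, p)
m1 = ph_moment(alpha, S, 1)
scv = ph_moment(alpha, S, 2) / m1 ** 2 - 1.0
print(m1)    # (K - p) / mu = 3.7
print(scv)   # (K - p^2) / (K - p)^2 ≈ 0.2856
```

As expected, the computed SCV falls inside [1/K, 1/(K−1)].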

Example 3.3 (Hyperexponential distribution). The hyperexponential distribution can be seen as a mixture of exponential random variables with different parameters μ_1, ..., μ_n > 0 and with α_i > 0 and Σ_{i=1}^n α_i = 1, such that its density is

f(t) = Σ_{i=1}^n α_i μ_i e^{−μ_i t}.


This distribution is often denoted by H_n(μ_1, ..., μ_n; α_1, ..., α_n). In phase-type representation the probability vector is α = (α_1, ..., α_n) and the matrix is

S = [ −μ_1
            −μ_2
                  ⋱
                     −μ_n ].

Consider the special case n = 2, where α = (p_1, p_2) with p_i > 0, p_1 + p_2 = 1, and

S = [ −μ_1    0
        0   −μ_2 ].

Then we have

F(t) = 1 − Σ_{i=1}^2 p_i e^{−μ_i t}   and   f(t) = Σ_{i=1}^2 p_i μ_i e^{−μ_i t},

and its moments are given by

E[H_2^n] = p_1 n!/μ_1^n + p_2 n!/μ_2^n,

so that

SCV = 2 (p_1/μ_1² + p_2/μ_2²) / (p_1/μ_1 + p_2/μ_2)² − 1 ≥ 1,

because, writing a_i = 1/μ_i,

p_1 a_1² + p_2 a_2² = p_1² a_1² + p_1 p_2 (a_1² + a_2²) + p_2² a_2² ≥ p_1² a_1² + 2 p_1 p_2 a_1 a_2 + p_2² a_2² = (p_1 a_1 + p_2 a_2)².

Hence we can use this distribution to approximate distributions with an SCV ≥ 1, as we show in Section 3.3; the theoretical basis for this is given by Theorem 3.10.

Example 3.4 (Coxian distribution). The Coxian distribution, with notation C_K, is a wide class that contains the Erlang mixture as the special case in which all rates are equal. The phase-type representation is given by the vector α = (1, 0, ..., 0) and the matrix

S = [ −μ_1   p_1μ_1
             −μ_2    p_2μ_2
                     ⋱        ⋱
                       −μ_{K−1}   p_{K−1}μ_{K−1}
                                  −μ_K ].

We restrict ourselves to the Coxian-2, since this distribution is sometimes used in approximations whenever SCV ≥ 1/2. By straightforward computations we have

E[C_2] = 1/μ_1 + p/μ_2,   E[C_2²] = 2/μ_1² + 2p/(μ_1μ_2) + 2p/μ_2²,

so that E[C_2²] ≥ (3/2) E[C_2]². This gives

SCV = E[C_2²]/E[C_2]² − 1 ≥ 1/2.

In general the SCV of C_K is greater than or equal to 1/K, since the minimum SCV is obtained by setting p_1 = p_2 = ... = p_{K−1} = 1 and μ_1 = μ_2 = ... = μ_K, i.e. an Erlang-K distribution, for which SCV = 1/K.

In general the Coxian distribution is extremely important, as any acyclic phase-type distribution has an equivalent Coxian representation, but this lies outside the scope of this thesis.

3.2 Phase-type approximation

Before we fit distributions by phase-type distributions, we have to prove the validity of these fits. In this section we prove that we can approximate any distribution with positive support arbitrarily accurately by a mixture of Erlang distributions, see Tijms (1994) [23]. Furthermore, we prove that a certain type of distribution, namely one with a completely monotone density, can also be approximated arbitrarily accurately by a mixture of exponential distributions, the so-called hyperexponential distribution. The main idea of this proof can be found in Feller [13]; here we modify the proof to obtain the required result.

Theorem 3.5. Let F(t) be the cumulative distribution function of a positive random variable, possibly with positive mass at t = 0, i.e. F(0) > 0. For fixed Δ > 0 define the probability distribution function

F_Δ(t) = Σ_{K=1}^∞ p_K(Δ) [ 1 − Σ_{j=0}^{K−1} e^{−t/Δ} (t/Δ)^j / j! ] + F(0),  t ≥ 0,

where p_K(Δ) = F(KΔ) − F((K−1)Δ), K = 1, 2, .... Then

lim_{Δ→0} F_Δ(x) = F(x)

for any continuity point x of F(x) (i.e. pointwise convergence).

Proof. Let Δ, x > 0 be fixed and let U_{Δ,x} be a (scaled) Poisson distributed random variable with

P[U_{Δ,x} = jΔ] = e^{−x/Δ} (x/Δ)^j / j!,  j = 0, 1, ....

We have

E[U_{Δ,x}] = Σ_{j=0}^∞ e^{−x/Δ} (x/Δ)^j / j! · jΔ = x Σ_{j=1}^∞ e^{−x/Δ} (x/Δ)^{j−1} / (j−1)! = x,

Var[U_{Δ,x}] = E[U²_{Δ,x}] − x² = x Σ_{j=1}^∞ e^{−x/Δ} (x/Δ)^{j−1}/(j−1)! · jΔ − x²
             = x² Σ_{j=2}^∞ e^{−x/Δ} (x/Δ)^{j−2}/(j−2)! + xΔ Σ_{j=1}^∞ e^{−x/Δ} (x/Δ)^{j−1}/(j−1)! − x² = xΔ.


We prove for any continuity point x that

lim_{Δ→0} E[F(U_{Δ,x})] = F(x).

Fix ε > 0 and take a continuity point x of F(x). Then there exists a δ > 0 such that |F(t) − F(x)| ≤ ε/2 for all t with |t − x| ≤ δ, so that:

|E[F(U_{Δ,x})] − F(x)| ≤ E[|F(U_{Δ,x}) − F(x)|]  (Jensen's inequality)
  = Σ_{k=0}^∞ |F(kΔ) − F(x)| P[U_{Δ,x} = kΔ]
  ≤ (ε/2) Σ_{k: |kΔ−x|≤δ} P[U_{Δ,x} = kΔ] + 2 Σ_{k: |kΔ−x|>δ} P[U_{Δ,x} = kΔ]
  ≤ ε/2 + 2 P[|U_{Δ,x} − x| > δ]
  = ε/2 + 2 P[|U_{Δ,x} − E[U_{Δ,x}]| > δ]
  ≤ ε/2 + 2xΔ/δ²  (Chebyshev's inequality)
  < ε,  if Δ < εδ²/(4x).

So, letting Δ → 0,

F(x) = lim_{Δ→0} E[F(U_{Δ,x})]
     = lim_{Δ→0} Σ_{j=0}^∞ F(jΔ) e^{−x/Δ} (x/Δ)^j / j!
     = lim_{Δ→0} Σ_{j=0}^∞ e^{−x/Δ} (x/Δ)^j / j! ( Σ_{K=1}^j p_K(Δ) + F(0) )
     = lim_{Δ→0} Σ_{K=1}^∞ p_K(Δ) Σ_{j=K}^∞ e^{−x/Δ} (x/Δ)^j / j! + F(0)
     = lim_{Δ→0} Σ_{K=1}^∞ p_K(Δ) [ 1 − Σ_{j=0}^{K−1} e^{−x/Δ} (x/Δ)^j / j! ] + F(0),

which gives the desired result.

So every positive random variable with a distribution function, possibly with positive mass at time zero, can be approximated by a mixture of Erlang distributed random variables. Theorem 3.5 also holds for approximation by a Coxian distribution, since a mixture of Erlang distributions is a special case of a Coxian distribution. We prove this statement in the following corollary.
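A minimal numerical illustration of Theorem 3.5 (our own construction, not part of the thesis): take as target the Weibull cdf F(t) = 1 − exp(−t²), build F_Δ as the p_K(Δ)-weighted mixture of Erlang-K cdfs with scale Δ, and watch the approximation error at a fixed point shrink as Δ → 0.

```python
import math

def F(t):
    """Target cdf: Weibull with shape 2, so F(0) = 0."""
    return 1.0 - math.exp(-t * t)

def erlang_cdf(K, delta, t):
    """Erlang-K cdf with rate 1/delta: 1 - sum_{j<K} e^{-t/d} (t/d)^j / j!.
    The Poisson terms are accumulated recursively for numerical stability."""
    x = t / delta
    term = math.exp(-x)
    s = term
    for j in range(1, K):
        term *= x / j
        s += term
    return 1.0 - s

def F_delta(t, delta, K_max=400):
    """The mixture-of-Erlangs approximation of Theorem 3.5."""
    total = F(0.0)
    for K in range(1, K_max + 1):
        pK = F(K * delta) - F((K - 1) * delta)
        total += pK * erlang_cdf(K, delta, t)
    return total

errors = {d: abs(F_delta(1.0, d) - F(1.0)) for d in (0.5, 0.1, 0.02)}
print(errors)   # the error shrinks as delta decreases
```

The mass p_K(Δ) placed on the Erlang-K component is exactly the probability F assigns to the interval ((K−1)Δ, KΔ], as in the theorem.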

Corollary 3.6. Let F(t) be the probability distribution function of a positive random variable, possibly with positive mass at t = 0, i.e. F(0) > 0. Then it can be approximated arbitrarily accurately by a Coxian distribution function at any continuity point x of F(x).


Proof. The statement follows from the fact that a mixture of Erlang distributions is a special case of a Coxian distribution, which we now prove. Let n ∈ N and Z = Σ_{K=1}^n p_K E_K with p_K > 0 and Σ_{K=1}^n p_K = 1, in which E_K has an Erlang-K distribution with parameter μ. The unique Laplace–Stieltjes transform of the Erlang mixture Z is given by

Ẑ(s) = Σ_{K=1}^n p_K Ê_K(s),

where Ê_K(s) = ( μ/(μ+s) )^K is the Laplace–Stieltjes transform of E_K. On the other hand, consider a Coxian distribution Z′ defined as

Z′ = Y_1                 w.p. (1−q_1),
     Y_1 + Y_2           w.p. q_1(1−q_2),
     Y_1 + Y_2 + Y_3     w.p. q_1 q_2 (1−q_3),
     ...
     Σ_{i=1}^{n−1} Y_i   w.p. (1−q_{n−1}) Π_{i=1}^{n−2} q_i,
     Σ_{i=1}^{n} Y_i     w.p. Π_{i=1}^{n−1} q_i,

in which Y_i ∼ Exp(μ_i) and the probabilities sum to 1. Hence Z′ is a mixture of the random variables E′_K = Σ_{i=1}^K Y_i, which have Laplace–Stieltjes transforms

Ê′_K(s) = Π_{i=1}^K μ_i/(μ_i + s),

so that

Ẑ′(s) = Σ_{K=1}^{n−1} ( Π_{i=1}^{K−1} q_i ) (1−q_K) Ê′_K(s) + Π_{i=1}^{n−1} q_i Ê′_n(s).

Now we set μ_i = μ for all i and choose the q_i such that p_K = (Π_{i=1}^{K−1} q_i)(1−q_K) for K < n and p_n = Π_{i=1}^{n−1} q_i, to find

Ẑ′(s) = Σ_{K=1}^n p_K Ê_K(s) = Ẑ(s).

By uniqueness of the Laplace–Stieltjes transform the statement is proven.

The hyperexponential distribution can also be used to approximate a specific class of distributions arbitrarily accurately, namely those with a completely monotone probability density function. Several distributions have a completely monotone density, such as the exponential, the hyperexponential, the Weibull (with shape parameter at most 1) and the Pareto distribution. The Weibull distribution is seen in some healthcare settings, see Babes and Sarma (1991) [4]. Before we prove the statement above in a theorem, we present the Extended Continuity Theorem, which relates limits of Laplace–Stieltjes transforms to measures. First, we give the definition of a completely monotone function.

Definition 3.7. A probability density function f is completely monotone if all derivatives of f exist and satisfy

(−1)^n f^{(n)}(t) ≥ 0,  for all t > 0 and n = 1, 2, ....


Theorem 3.8. If f(t) and g(t) are completely monotone functions, then any positive linear combination of the two is completely monotone. Furthermore, the product of two completely monotone functions is completely monotone.

Proof. Let a, b > 0; then h(t) = a f(t) + b g(t) is completely monotone by linearity of differentiation. For the second statement let h(t) = f(t) g(t), so that by Leibniz' rule we have

d^n h(t)/dt^n = Σ_{i=0}^n (n choose i) f^{(i)}(t) g^{(n−i)}(t);

each summand has sign (−1)^i (−1)^{n−i} = (−1)^n by the complete monotonicity of f and g, which proves the claim.

Theorem 3.9 (Extended Continuity Theorem). For n = 1, 2, ... let G_n be a measure with Laplace–Stieltjes transform F_n. If F_n(x) → F(x) for x > x_0, then F is the Laplace–Stieltjes transform of a measure G and G_n → G. Conversely, if G_n → G and the sequence {F_n(x_0)} is bounded, then F_n(x) → F(x) for x > x_0.

Proof. See Feller (1967) [13].

Theorem 3.10. Let F(t) be a probability distribution function with a completely monotone probability density function f(t). Then there are hyperexponential distribution functions F_m, m ≥ 1, of the form

F_m(t) = 1 − Σ_{n=1}^{M_m} p_{mn} e^{−μ_{mn} t},  t ≥ 0,

with μ_{mn} < ∞ and Σ_{n=1}^{M_m} p_{mn} = 1, where p_{mn} > 0 for all n, such that

lim_{m→∞} F_m(t) = F(t),  for all t > 0,

i.e. pointwise convergence.

Proof. Let F(t) be the probability distribution function and consider F(a − ay) for fixed a > 0 and variable y ∈ (0, 1). By Taylor expansion around y = 0 (by the complete monotonicity of f all derivatives exist for y ∈ [0, 1)) we have

F(a − ay) = Σ_{n=0}^∞ (−a)^n F^{(n)}(a)/n! · y^n = F(a) + Σ_{n=1}^∞ (−a)^n f^{(n−1)}(a)/n! · y^n,

which holds for y ∈ [0, 1). We change the variable y to x ∈ (0, ∞) by y = e^{−x/a}, so that

F_a(x) := F(a − a e^{−x/a}) = F(a) − Σ_{n=1}^∞ C_a(n) e^{−(n/a)x},  with C_a(n) = −(−a)^n f^{(n−1)}(a)/n!,

which is the Laplace transform of an arithmetic measure G_a giving mass C_a(n) ≥ 0 (by complete monotonicity) to the points z = n/a for n = 1, 2, .... In detail, we have that

1 ≥ F_a(x) = F(a) − ∫_0^∞ e^{−zx} dG_a(z) = F(a) − Σ_{n=1}^∞ C_a(n) e^{−(n/a)x} ≥ 0.


We observe that for any x, as a → ∞,

F_a(x) = F(a(1 − e^{−x/a})) → F(x),

since a(1 − e^{−x/a}) → x. Applying the Extended Continuity Theorem 3.9 gives the existence of a measure G(z) with

G_a(z) → G(z)

and with 1 − F(x) = ∫_0^∞ e^{−zx} dG(z), i.e. 1 − F is the Laplace–Stieltjes transform of G.

So, if F(t) is a cumulative distribution function with a completely monotone probability density function f(t), then there are hyperexponential cumulative distribution functions F_m(t) of the form

F_m(t) = 1 − Σ_{n=1}^{M_m} p_{mn} e^{−μ_{mn} t},

which converge to F(t) for all t > 0 as m and M_m tend to infinity. Observe that F_m is indeed a hyperexponential distribution function for every m ∈ N.

3.3 Phase-type fit

In practice it often occurs that the only information on a random variable that is available is its mean and standard deviation, estimated from data. On the basis of these two quantities one can fit (approximate) the underlying distribution by a phase-type distribution. The only condition is that the random variable whose distribution we approximate must be positive, and in some cases completely monotone. The syllabus by Adan and Resing (2002) [1] suggests specific distributions for this fitting purpose.

Let X be a positive random variable and (S)CV its (squared) coefficient of variation. In case 0 < CV < 1 one fits an Erlang mixture distribution: with probability p it is an Erlang-(K−1) and with probability 1−p an Erlang-K, in shorthand notation E_{K−1,K}(μ; p). First K is chosen such that

1/K ≤ SCV ≤ 1/(K−1),  for K = 2, 3, ....

Secondly we choose p and μ as

p = 1/(1 + SCV) · [ K·SCV − sqrt( K(1 + SCV) − K²·SCV ) ],   μ = (K − p)/E[X];

then the E_{K−1,K} distribution matches the expectation E[X] and the coefficient of variation CV.

Because we use an Erlang mixture model with the same scale parameter but different shape parameters, the coefficient of variation is always less than one. To get a coefficient of variation greater than 1 we have to vary the scale parameter as well. The hyperexponential distribution is the simplest case with different scale parameters. So in case CV ≥ 1 we choose to fit a hyperexponential distribution with parameters p_1, p_2, μ_1 and μ_2, in shorthand notation H_2(p_1, p_2; μ_1, μ_2). We have four parameters to estimate, so we set p_2 = 1 − p_1, so that we do not have an atom at zero. Furthermore, we use balanced means:

p_1/μ_1 = p_2/μ_2.


Then the probabilities are given by

p_1 = (1/2) ( 1 + sqrt( (SCV − 1)/(SCV + 1) ) )  and  p_2 = (1/2) ( 1 − sqrt( (SCV − 1)/(SCV + 1) ) ),

with rates

μ_1 = 2p_1/E[X]  and  μ_2 = 2p_2/E[X].

If SCV = 1 this reduces to p_1 = 1/2 = p_2 and μ_1 = μ_2, which is equivalent to the exponential distribution; when SCV → ∞, p_1 → 1.

In case SCV ≥ 1/2 a Coxian-2 distribution can also be used, with parameters

μ_1 = 2/E[X],   p = 1/(2·SCV),   μ_2 = μ_1 p.

Remark that we use only two moments to fit the data, so that we do not match the skewness and kurtosis of the particular distribution. On the other hand, there is enough freedom to match more moments when we use more general phase-type distributions for fitting. However, choosing a parsimonious model is more practical and in most cases sufficient.
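The two-moment recipe above fits in a few lines. The following sketch (our own function name; a direct transcription of the formulas of this section) returns the fitted parameters and we verify that both regimes reproduce the requested mean and SCV exactly.

```python
import math

def fit_two_moments(mean, scv):
    """Two-moment phase-type fit: E_{K-1,K}(mu; p) when SCV < 1,
    H2 with balanced means when SCV >= 1."""
    if scv < 1.0:
        K = math.ceil(1.0 / scv)   # smallest K with 1/K <= SCV
        p = (K * scv - math.sqrt(K * (1 + scv) - K ** 2 * scv)) / (1 + scv)
        mu = (K - p) / mean
        return ('E_{K-1,K}', K, p, mu)
    r = math.sqrt((scv - 1.0) / (scv + 1.0))
    p1, p2 = 0.5 * (1 + r), 0.5 * (1 - r)
    return ('H2', p1, p2, 2 * p1 / mean, 2 * p2 / mean)

kind, K, p, mu = fit_two_moments(2.0, 0.3)
print(kind, (K - p) / mu, (K - p ** 2) / (K - p) ** 2)   # mean 2.0, SCV 0.3

kind, p1, p2, mu1, mu2 = fit_two_moments(1.0, 2.5)
m1 = p1 / mu1 + p2 / mu2
m2 = 2 * (p1 / mu1 ** 2 + p2 / mu2 ** 2)
print(kind, m1, m2 / m1 ** 2 - 1)                        # mean 1.0, SCV 2.5
```

For SCV ≥ 1/2 one could analogously return the Coxian-2 parameters; the structure of the function would not change.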

3.4 Recursive procedure for computing sojourn times

Up till now we have studied phase-type distributions as an approximation tool. The reason is that in the general D/G/1 case we cannot compute the waiting and idle times. The solution is to translate the D/G/1 setting into a D/PH/1 setting by approximating the general service time distributions by phase-type distributions.

Wang (1997) [27] introduced a recursive procedure for calculating the sojourn times of customers at a stochastic server. We can translate this to our scheduling problem, in which a doctor is seeing patients. Wang's procedure makes use of an attractive property of phase-type distributions mentioned in Section 3.1. In this section we explain Wang's approach in detail. First, we show the approach for the specific case of exponentially distributed service times. Second, we explain it for phase-type distributed service times, in which the phase-type distributions are allowed to differ among patients; this is a further generalization of Wang's result.

3.4.1 Exponentially distributed service times

In this section we present the iterative procedure for exponentially distributed service times, where the service rates are not necessarily identical. The procedure gives a good idea of what happens in the next section, where the service times are phase-type distributed. From now on, 1 denotes a column vector of ones of appropriate size.

Suppose we have a service order 1, ..., N+1. We are interested in the sojourn time distribution

F_{S_i}(t) = P[S_i ≤ t],  t ≥ 0,

for all patients i ∈ {1, ..., N+1}. Let p_{i,k}(t) be the probability that, a time t after patient i's arrival, patient i has not yet completed service and k patients are in front of him, so that

P[S_i ≤ t] = 1 − Σ_{k=0}^{i−1} p_{i,k}(t).

Define the row vector p_i(t) = (p_{i,i−1}(t), p_{i,i−2}(t), ..., p_{i,0}(t)) of dimension i, so that

P[S_i ≤ t] = 1 − p_i(t) 1.

The first patient, who is served immediately, has infinitesimal generator

Q_1 = [  0     0
        μ_1  −μ_1 ]

and submatrix S_1 = −μ_1. Then we have, by the definition of phase-type distributions,

p_1(t) = p_{1,0}(t) = e^{−μ_1 t}  and  F_{S_1}(t) = 1 − e^{−μ_1 t},  t ≥ 0.

The second patient, who arrives at time x_1, finds the first patient still in the system with probability p_1(x_1), or an empty system with probability 1 − p_1(x_1). The first patient's service follows an exponential distribution with parameter μ_1; because of the memoryless property of exponential random variables, the remaining service time of the first patient is again exponentially distributed with parameter μ_1. Hence the second patient's sojourn time is governed by the continuous-time Markov chain with infinitesimal generator

Q_2 = [  0     0     0
         0   −μ_1   μ_1
        μ_2    0   −μ_2 ].

Let S_2 be the submatrix

S_2 = [ −μ_1   μ_1
          0   −μ_2 ];

then p_2(t) satisfies the system of differential equations

dp_2(t)/dt = p_2(t) S_2,  with  p_2(0) = (p_1(x_1), 1 − p_1(x_1)),

for which the solution is given by

p_2(t) = (p_1(x_1), 1 − p_1(x_1)) e^{S_2 t} = (e^{−μ_1 x_1}, 1 − e^{−μ_1 x_1}) e^{S_2 t},  t ≥ 0.

Since the Markov chain has an acyclic structure, the transient solution p_i can be derived from a system of differential equations. In general, for the i-th patient,

Q_i = [  0     0
        S_i^0  S_i ],


where S_i is the i×i submatrix

S_i = [ −μ_1   μ_1
              −μ_2   μ_2
                     ⋱     ⋱
                       −μ_{i−1}   μ_{i−1}
                                 −μ_i ].

Then p_i(t) = (p_{i−1}(x_{i−1}), 1 − p_{i−1}(x_{i−1})1) e^{S_i t} for t ≥ 0, since it is the solution of

dp_i(t)/dt = p_i(t) S_i,  with  p_i(0) = (p_{i−1}(x_{i−1}), 1 − p_{i−1}(x_{i−1})1).

We point out that time for the i-th patient starts running when he arrives; he then either waits or is served directly. The corresponding interarrival time x_i is used in the initial condition for the subsequent patient. We summarize the above procedure in a proposition.

Proposition 3.11. If the interarrival times are x_i for i = 1, 2, ..., N, then the sojourn time distribution of the i-th patient (i = 1, 2, ..., N+1) is given by

F_i(t) = P[S_i ≤ t] = 1 − p_i(t) 1,

where

p_1(t) = e^{S_1 t},   p_i(t) = (p_{i−1}(x_{i−1}), 1 − p_{i−1}(x_{i−1})1) e^{S_i t}  for i = 2, 3, ..., N+1.

Furthermore, we have

E[S_1] = 1/μ_1,   E[S_i] = (p_{i−1}(x_{i−1}), 1 − p_{i−1}(x_{i−1})1) ( Σ_{j=0}^{i−1} 1/μ_{i−j},  Σ_{j=0}^{i−2} 1/μ_{i−j},  ...,  1/μ_i )^T  for i = 2, 3, ..., N+1.

This proposition is proved by induction; the expression for the mean sojourn times follows from the properties of phase-type distributions. Furthermore, we observe that in each iteration the dimension of p_i(t) increases by 1. The underlying continuous-time Markov chain is observed at the arrival epochs t_i. At these points the state of the Markov chain is the number of patients waiting in the system; in case no patients are waiting, the arriving patient is in service. We remark that only the first i−1 patients affect the sojourn time of patient i.
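Proposition 3.11 is straightforward to implement numerically. The sketch below (our own function names, assuming NumPy and SciPy) propagates p_i through the arrival epochs and reads off the mean sojourn times as −p S_i^{−1} 1; as a sanity check, for two patients with rate μ = 1 and gap x_1 the memoryless property gives E[S_2] = e^{−x_1} · (1/μ + 1/μ) + (1 − e^{−x_1}) · 1/μ = 1 + e^{−x_1}.

```python
import math

import numpy as np
from scipy.linalg import expm

def mean_sojourn_exponential(mus, x):
    """Mean sojourn times E[S_i] via the recursion of Proposition 3.11.
    mus[i-1] is patient i's exponential service rate, x[i-1] the
    interarrival time between patients i and i+1."""
    p = np.array([1.0])                 # p_1(0): patient 1 enters service
    S = np.array([[-mus[0]]])
    means = [float(-p @ np.linalg.inv(S) @ np.ones(1))]   # = 1/mu_1
    for i in range(1, len(mus)):
        seen = p @ expm(S * x[i - 1])   # state probabilities at the arrival
        p = np.append(seen, 1.0 - seen.sum())
        n = len(p)                      # rebuild the bidiagonal S_i
        S = np.diag(-np.asarray(mus[:n])) + np.diag(np.asarray(mus[:n - 1]), 1)
        means.append(float(-p @ np.linalg.inv(S) @ np.ones(n)))
    return means

m = mean_sojourn_exponential([1.0, 1.0], [1.0])
print(m)   # [1.0, 1 + e^{-1} ≈ 1.3679]
```

The same loop evaluates any schedule x, which is exactly what the optimization methods of Chapter 4 need.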

3.4.2 Phase-type distributed service times

In this section we generalize the recursive procedure of the previous section to phase-type distributed service times. We follow the article by Wang (1997) to describe the procedure.


He assumes independent and identically phase-type distributed service times, where the matrix S of the phase-type representation has an acyclic structure. We extend this by letting the (acyclic) phase-type distributions vary among patients. In detail, let patient i have a phase-type distributed service time with probability vector α_i (of dimension 1×m_i) and m_i×m_i matrix S_i.

Now define the bivariate process {Y_i(t), K_i(t), t ≥ 0} for patient i = 1, ..., N+1, where Y_i(t) ∈ N represents the number of patients in front of the i-th patient and K_i(t) ∈ N is the phase the service is in if the server is busy; otherwise K_i(t) = 0. Furthermore, the state (0, 0) is absorbing and all other states are transient, for every patient i. Let p^{(i)}_{(j,k)}(t) be the probability that {Y_i(t), K_i(t)} is in state (j, k) (j patients before him and the server in phase k),

p^{(i)}_{(j,k)}(t) = P[(Y_i(t), K_i(t)) = (j, k)],

and collect these probabilities (noting that when j patients are in front of patient i, it is patient i−j who is in service, with m_{i−j} phases) in the row vector

p_i(t) = ( p^{(i)}_{(i−1,1)}(t), ..., p^{(i)}_{(i−1,m_1)}(t), p^{(i)}_{(i−2,1)}(t), ..., p^{(i)}_{(i−2,m_2)}(t), ..., p^{(i)}_{(0,1)}(t), ..., p^{(i)}_{(0,m_i)}(t) ),

where α_i = (α_{i,1}, α_{i,2}, ..., α_{i,m_i}).

For the first patient, at t_1 = 0, there is no patient in the system, so the first patient's sojourn time is phase-type distributed with matrix S_1 and vector α_1:

p_1(t) = (p^{(1)}_{(0,1)}(t), ..., p^{(1)}_{(0,m_1)}(t)) = α_1 e^{S_1 t}  ⇒  F_1(t) = 1 − p_1(t) 1.

Hence P[S_i ≤ t] = 1 − p_i(t) 1. For the next patient, whose service time is phase-type distributed with representation (α_2, S_2) and who arrives at t_2 = x_1, there are two cases:

• The process {Y_2(t), K_2(t), t ≥ 0} starts at state (1, k) with probability p^{(1)}_{(0,k)}(x_1), where k = 1, 2, ..., m_1.

• The process {Y_2(t), K_2(t), t ≥ 0} starts at state (0, k) with probability α_{2,k} F_1(x_1), where k = 1, 2, ..., m_2.

So the sojourn time of the second patient is governed by a continuous-time Markov chain with infinitesimal generator

Q_2 = [   0     0     0
          0    S_1   S^0_1 α_2
        S^0_2   0    S_2 ],

with initial-state distribution (0, p_1(x_1), α_2 F_1(x_1)). So if we let

𝒮_2 = [ S_1   S^0_1 α_2
         0    S_2 ],

then the sojourn time of the second patient is given by the system of differential equations

dp_2(t)/dt = p_2(t) 𝒮_2,

with initial condition p_2(0) = (p_1(x_1), α_2 F_1(x_1)). The unique solution is

p_2(t) = (p_1(x_1), α_2 F_1(x_1)) e^{𝒮_2 t},  t ≥ 0.


In general, when patient i arrives at time t_i, the following cases arise:

• The process {Y_i(t), K_i(t), t ≥ 0} starts at state (i−1, k) with probability p^{(i−1)}_{(i−2,k)}(x_{i−1}), where k = 1, 2, ..., m_1.

• The process {Y_i(t), K_i(t), t ≥ 0} starts at state (i−2, k) with probability p^{(i−1)}_{(i−3,k)}(x_{i−1}), where k = 1, 2, ..., m_2.

...

• The process {Y_i(t), K_i(t), t ≥ 0} starts at state (1, k) with probability p^{(i−1)}_{(0,k)}(x_{i−1}), where k = 1, 2, ..., m_{i−1}.

• The process {Y_i(t), K_i(t), t ≥ 0} starts at state (0, k) with probability α_{i,k} F_{i−1}(x_{i−1}), where k = 1, 2, ..., m_i.

So the sojourn time of the i-th patient is governed by a continuous-time Markov chain with infinitesimal generator

Q_i = [   0     0     0        ...      0
          0    S_1   S^0_1 α_2  ...      0
          0     0    S_2        ⋱        0
         ...                 S_{i−1}   S^0_{i−1} α_i
        S^0_i   0     0        ...     S_i ],

with initial-state distribution (0, p_{i−1}(x_{i−1}), α_i F_{i−1}(x_{i−1})) and submatrix

𝒮_i = [ S_1   S^0_1 α_2
              S_2     ⋱
                      S^0_{i−2} α_{i−1}
              S_{i−1}   S^0_{i−1} α_i
                        S_i ],

i.e. 𝒮_i is block bidiagonal, with diagonal blocks S_1, ..., S_i and superdiagonal blocks S^0_j α_{j+1}.

So the sojourn time of the i-th patient is given by the system of differential equations

dp_i(t)/dt = p_i(t) 𝒮_i,

with initial condition p_i(0) = (p_{i−1}(x_{i−1}), α_i F_{i−1}(x_{i−1})). The solution is given by

p_i(t) = (p_{i−1}(x_{i−1}), α_i F_{i−1}(x_{i−1})) e^{𝒮_i t},  t ≥ 0.

So for a finite number of patients we can compute the individual sojourn time distributions. We summarize this recursive procedure in a proposition.

Proposition 3.12. If the interarrival times are x_i for i = 1, 2, ..., N, then the sojourn time distributions F_i(t) for patients i = 1, 2, ..., N+1 are given by

F_i(t) = 1 − p_i(t) 1,


where

p_1(t) = α_1 e^{𝒮_1 t},  with 𝒮_1 = S_1,

p_i(t) = (p_{i−1}(x_{i−1}), α_i F_{i−1}(x_{i−1})) e^{𝒮_i t}  for i = 2, 3, ..., N+1.

The mean sojourn time is E[S_i] = −(p_{i−1}(x_{i−1}), α_i F_{i−1}(x_{i−1})) 𝒮_i^{−1} 1.

This proposition can be proved by induction. The expression for the mean sojourn times follows directly from the properties of phase-type distributions, see Section 3.1. Observe that p_i relies completely on p_{i−1} and that its dimension is expanded from Σ_{j=1}^{i−1} m_j to Σ_{j=1}^{i} m_j: the phases of the new patient are added to the continuous-time Markov chain. Also, the distribution of the i-th patient is a function of x_1, ..., x_{i−1} only, because of the first-come, first-served discipline. When all phase-type distributions are exponential distributions we are in the simple case described extensively in the previous subsection.
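Proposition 3.12 translates into the same kind of recursion as before, now growing a block matrix at each arrival. The sketch below (our own construction, assuming NumPy and SciPy) builds 𝒮_i by appending patient i's block S_i and coupling it via the exit vector of the previous system; in the all-exponential special case it reproduces the result of Section 3.4.1.

```python
import math

import numpy as np
from scipy.linalg import expm

def mean_sojourn_ph(reps, x):
    """Mean sojourn times via Proposition 3.12. reps[i] = (alpha, S) is
    patient i+1's phase-type representation, x the interarrival times."""
    p = np.asarray(reps[0][0], dtype=float)      # p_1(0) = alpha_1
    Sc = np.asarray(reps[0][1], dtype=float)     # script-S_1 = S_1
    means = [float(-p @ np.linalg.inv(Sc) @ np.ones(len(Sc)))]
    for i in range(1, len(reps)):
        alpha_i = np.asarray(reps[i][0], dtype=float)
        S_i = np.asarray(reps[i][1], dtype=float)
        seen = p @ expm(Sc * x[i - 1])           # p_{i-1}(x_{i-1})
        F_prev = 1.0 - seen.sum()                # F_{i-1}(x_{i-1})
        p = np.concatenate([seen, alpha_i * F_prev])
        exit_prev = -Sc @ np.ones(len(Sc))       # exit vector of script-S
        Sc = np.block([[Sc, np.outer(exit_prev, alpha_i)],
                       [np.zeros((len(S_i), len(Sc))), S_i]])
        means.append(float(-p @ np.linalg.inv(Sc) @ np.ones(len(Sc))))
    return means

# exponential services are the PH special case of Section 3.4.1
expo = (np.array([1.0]), np.array([[-1.0]]))
print(mean_sojourn_ph([expo, expo], [1.0]))   # [1.0, 1 + e^{-1}]
```

Higher moments of each sojourn time follow the same way from powers of 𝒮_i^{−1}, so loss functions built on the sojourn times can be evaluated with the same objects.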

In this chapter we introduced phase-type distributions and proved their ability to approximate distributions with positive support arbitrarily accurately. We then gave explicit and easy-to-use formulas to fit distributions on the basis of two moments. In order to do so, we divided distribution functions into two categories:

• Distribution functions with an SCV ≤ 1 are fitted by an E_{K−1,K}(μ; p) distribution.

• Distribution functions with an SCV ≥ 1 are fitted by an H_2(p_1, p_2; μ_1, μ_2) distribution.

Furthermore, we demonstrated a recursive procedure for phase-type distributions to compute each patient's individual sojourn time distribution. This procedure will be used in the next chapter for optimization.


Chapter 4

Optimization methods

In this chapter we obtain optimal schedules in different settings. The fact that the service time distributions can be approximated by phase-type distributions gives us the opportunity to use the recursive procedure described in the previous chapter. So the first step is to approximate the service times by an appropriate phase-type distribution. The second step is to implement the recursive procedure on these phase-type 'distributed' service times to find the sojourn times. Third, we optimize over these sojourn times to find optimal schedules.

Since optimizing simultaneously for all patients is highly complex, we use numerical methods to compute optimal interarrival times. Further, we compare the simultaneous approach with its sequential counterpart introduced by Kemper et al. (2011). This approach arose as a trade-off between computational time and a sufficiently close-to-optimal schedule. Vink et al. (2012) introduced the so-called lag-order method, which spans all trade-offs between the sequential and the simultaneous approach. For the sake of completeness we describe the idea behind the lag-order method in Section 4.3.

Observe that the optimization problem is over t_2, ..., t_{N+1}. By the relation between the t_i's and the x_i's, t_i = Σ_{j=1}^{i−1} x_j (see Chapter 2), the problem is equivalent to minimizing over the interarrival times x_1, ..., x_N only. Let us first describe the different optimization methods.

4.1 Simultaneous optimization

In the classical case one minimizes the system's loss over all possible schedules. This means that all patients are scheduled simultaneously, so that we minimize as follows:

min_{t_2,...,t_{N+1}} R(t_2, ..., t_{N+1}) = min_{x_1,...,x_N} R(x_1, ..., x_N) = min_{x_1,...,x_N} Σ_{i=2}^{N+1} E[l(S_{i−1} − x_{i−1})] = min_{x_1,...,x_N} Σ_{i=1}^{N} E[l(S_i − x_i)],   (4.1)

where l(x) is a convex function. For simultaneous optimization almost no tractable derivation is possible; only the exponential case has a tractable solution, see Wang (1997). The difficulty is that the x_i's are too strongly interlinked: the choice of an optimal x_i depends


implicitly on the previous optimal interarrival times x_{i−1}, ..., x_1, because the arrivals of preceding patients influence the waiting and idle times of patient i.

Moreover, the fact that phase-type distributions have no 'nice' closed-form representations makes computations intractable. Therefore one resorts to numerical algorithms to find the optimal interarrival times; see the appendix for an outline of such an algorithm. An approach that does lead to tractable solutions is to consider the problem sequentially, as we do in the next section.
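To make such a numerical search concrete, here is a small sketch of our own (not the appendix algorithm): exponential services with rate 1, symmetric quadratic loss, and a derivative-free Nelder–Mead search over (x_1, x_2). Each objective evaluation runs the recursion of Section 3.4.1 and expands E[(S_i − x_i)²] via the first two phase-type moments.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

MU = 1.0   # common exponential service rate (illustrative choice)

def patient_states(x):
    """(p_i(0), S_i) pairs from the recursion of Section 3.4.1."""
    out = [(np.array([1.0]), np.array([[-MU]]))]
    for xi in x:
        p, S = out[-1]
        seen = p @ expm(S * xi)
        p_new = np.append(seen, 1.0 - seen.sum())
        n = len(p_new)
        S_new = np.diag(np.full(n, -MU)) + np.diag(np.full(n - 1, MU), 1)
        out.append((p_new, S_new))
    return out

def total_loss(x):
    """sum_i E[(S_i - x_i)^2]: the symmetric (alpha = 1/2) quadratic loss."""
    loss = 0.0
    for (p, S), xi in zip(patient_states(x)[:-1], x):
        Sinv = np.linalg.inv(S)
        m1 = float(-p @ Sinv @ np.ones(len(S)))          # E[S_i]
        m2 = float(2.0 * p @ (Sinv @ Sinv) @ np.ones(len(S)))  # E[S_i^2]
        loss += m2 - 2.0 * xi * m1 + xi ** 2
    return loss

res = minimize(total_loss, x0=np.ones(2), method='Nelder-Mead')
x_seq = np.array([1.0, 1.0 + np.exp(-1.0)])   # the sequential schedule
print(res.x, res.fun, total_loss(x_seq))
```

In this toy instance the simultaneous optimum strictly improves on the sequential schedule, illustrating the trade-off discussed above.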

4.2 Sequential optimization

A solution to the complexity of simultaneous optimization is to consider the optimization problem sequentially. This approach was introduced by Kemper et al. (2011) [16]. If one optimizes the schedule sequentially, one minimizes for all i ∈ {2, ..., N+1}, given t_{i−1}, t_{i−2}, ..., t_1,

min_{t_i} R(t_i, t_{i−1}, ..., t_1) = min_{t_i} E[g(I_i) + h(W_i)] = min_{x_{i−1}} E[l(S_{i−1} − x_{i−1})].   (4.2)

In this formula S_{i−1} depends implicitly on the previous arrival epochs t_{i−1}, ..., t_1, or equivalently x_{i−2}, ..., x_1, but these are fixed, since we consider the optimization problem sequentially. This reduces the problem drastically, such that there is an optimal schedule under some conditions; see Theorem 4.1 below, from Kemper et al. (2011) [16].

Theorem 4.1. Let l : R → R_{≥0} be a convex function with l(0) = 0. Let B_1, ..., B_{N+1} be independent non-negative random variables such that

E[ l( Σ_{i=1}^N B_i + y ) ] < ∞

holds for all y ∈ R_{≥0}. Let the schedule W be defined by

t_1 = 0,   t_i = Σ_{j=1}^{i−1} x*_j,  for all i = 2, ..., N+1,

where x*_j ≥ 0 is the value at which

∂R^{(l)}/∂x_j = −E[ l′(S_j − x_j) ] = 0

or changes sign. In case there is no x*_j satisfying this condition, it is set equal to ∞. Then the schedule W sequentially minimizes the loss.

Proof. First we show finiteness:

E[l(S_j − x_j)] ≤ E[ l( Σ_{i=1}^N B_i − x_j ) ] < ∞,  for x_j < 0,
E[l(S_j − x_j)] ≤ E[l(S_j)] + l(−x_j) ≤ E[ l( Σ_{i=1}^N B_i ) ] + l(−x_j) < ∞,  for x_j ≥ 0.


Because of convexity, l′(x_j) is monotone, which together with the non-negativity of l(x_j) implies that for all a ≤ b

∫_a^b l′(x_j) dx_j ≤ l(b) + l(a).

By Fubini's theorem we have

∫_a^b E[ l′(S_j − x_j) ] dx_j = E[ ∫_a^b l′(S_j − x_j) dx_j ];

combining this with the results above we find

∫_a^b E[ l′(S_j − x_j) ] dx_j ≤ E[l(S_j − b)] + E[l(S_j − a)] < ∞,

so that E[l(S_j − x_j)] is absolutely continuous in x_j with derivative −E[l′(S_j − x_j)], and therefore convex. Hence a minimum exists, since l(x) ∈ R_{≥0}. Moreover, E[l′(S_j − x_j)] is non-increasing in x_j and non-negative at x_j = 0, i.e. E[l′(S_j)] ≥ 0 always. Therefore x*_j is non-negative for all j.

Since weighted linear and quadratic loss functions satisfy the conditions of Theorem 4.1, we can derive the corresponding optimal interarrival times, which we present here as examples.

4.2.1 Quadratic loss

Consider R_i the weighted quadratic cost function, cf. equation (2.8). We optimize sequentially, so one has to minimize the following function over x_i, given x_{i−1}, ..., x_1:

R_i(x_i, ..., x_1) = E[l(S_i − x_i)] = E[ α (S_i − x_i)² 1_{S_i − x_i < 0} + (1−α)(S_i − x_i)² 1_{S_i − x_i > 0} ].

Using Theorem 4.1 one obtains x*_i by solving (dropping a factor 2)

(1−α) E[(S_i − x*_i) 1_{S_i − x*_i > 0}] + α E[(S_i − x*_i) 1_{S_i − x*_i < 0}]
  = α E[S_i − x*_i] + (1−2α) E[(S_i − x*_i) 1_{S_i − x*_i > 0}]
  = α E[S_i − x*_i] + (1−2α) ∫_{x*_i}^∞ P[S_i > t] dt = 0,

where the latter integral follows from

E[(S_i − x*_i) 1_{S_i − x*_i > 0}] = ∫_{x*_i}^∞ (s − x*_i) dF_{S_i}(s)
  = ∫_{x*_i}^∞ ∫_0^{s − x*_i} dt dF_{S_i}(s)
  = ∫_0^∞ ∫_{t + x*_i}^∞ dF_{S_i}(s) dt   (Fubini)
  = ∫_0^∞ P[S_i > t + x*_i] dt
  = ∫_{x*_i}^∞ P[S_i > t] dt.



In the special case $\alpha = \tfrac{1}{2}$ (waiting and idle times are equally weighted) the solution reduces to
\[
x_i^* = \mathbb{E}[S_i].
\]
Thus the optimal interarrival times are equal to the means of the corresponding sojourn times.
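To illustrate the first-order condition, suppose for the moment that $S_i$ is Exp(1) distributed, so that $\mathbb{E}[x_i^* - S_i] = x_i^* - 1$ and $\int_{x_i^*}^\infty \mathbb{P}[S_i > t]\,\mathrm{d}t = e^{-x_i^*}$. The following sketch (the Exp(1) choice and $\alpha = 0.3$ are illustrative assumptions only, not a case from this thesis) solves the condition by bisection and checks the root against a grid minimizer of a Monte-Carlo estimate of the weighted quadratic loss.

```python
import math
import random

random.seed(1)
# Illustrative stand-in: sojourn times S ~ Exp(1), so E[S] = 1
samples = [random.expovariate(1.0) for _ in range(20_000)]

def quad_loss(x, alpha):
    """Monte-Carlo estimate of E[alpha (S-x)^2 1{S<x} + (1-alpha)(S-x)^2 1{S>x}]."""
    total = 0.0
    for s in samples:
        d = s - x
        total += alpha * d * d if d < 0 else (1 - alpha) * d * d
    return total / len(samples)

def foc(x, alpha):
    """First-order condition alpha E[x-S] - (1-2 alpha) int_x^inf P[S>t] dt, Exp(1) case."""
    return alpha * (x - 1.0) - (1 - 2 * alpha) * math.exp(-x)

# Solve the first-order condition by bisection for the illustrative alpha = 0.3
alpha, lo, hi = 0.3, 1e-9, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if foc(lo, alpha) * foc(mid, alpha) <= 0:
        hi = mid
    else:
        lo = mid
x_star = 0.5 * (lo + hi)

# The grid minimiser of the Monte-Carlo loss should agree with x_star
grid = [0.05 * i for i in range(81)]  # x in [0, 4]
x_grid = min(grid, key=lambda x: quad_loss(x, alpha))
```

For $\alpha = \tfrac{1}{2}$ the condition reduces to $\tfrac{1}{2}(x-1) = 0$, recovering $x_i^* = \mathbb{E}[S_i] = 1$ as derived above.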

4.2.2 Absolute loss

Consider $R_i$ the weighted absolute loss function, cf. equation (2.6). Optimizing sequentially means that we minimize the following function over $x_i$ given $x_{i-1}, \ldots, x_1$:
\[
R_i(x_i, \ldots, x_1) = \mathbb{E}[l(S_i - x_i)] = \mathbb{E}\left[ \alpha |S_i - x_i|\, \mathbb{1}_{\{S_i - x_i < 0\}} + (1-\alpha) |S_i - x_i|\, \mathbb{1}_{\{S_i - x_i > 0\}} \right].
\]
By Fubini's theorem the latter can be rewritten as
\[
\begin{aligned}
\mathbb{E}[l(S_i - x_i)] &= \int_0^{x_i} \int_s^{x_i} \alpha\, \mathrm{d}t\, \mathrm{d}F_{S_i}(s) + \int_{x_i}^\infty \int_{x_i}^s (1-\alpha)\, \mathrm{d}t\, \mathrm{d}F_{S_i}(s) \\
\text{(Fubini)} &= \int_0^{x_i} \int_0^t \alpha\, \mathrm{d}F_{S_i}(s)\, \mathrm{d}t + \int_{x_i}^\infty \int_t^\infty (1-\alpha)\, \mathrm{d}F_{S_i}(s)\, \mathrm{d}t \\
&= \alpha \int_0^{x_i} F_{S_i}(t)\, \mathrm{d}t + (1-\alpha) \int_{x_i}^\infty \left( 1 - F_{S_i}(t) \right) \mathrm{d}t.
\end{aligned}
\]
Now, using Theorem 4.1 one obtains $x_i^*$ by solving
\[
\mathbb{E}\left[ l'(S_i - x_i^*) \right] = \alpha F_{S_i}(x_i^*) - (1-\alpha)\left( 1 - F_{S_i}(x_i^*) \right) = 0 \quad \Leftrightarrow \quad x_i^* = F_{S_i}^{-1}(1-\alpha).
\]

So in this example the optimal interarrival times are quantiles of the (corresponding) sojourn time distributions.
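The quantile formula is the classical newsvendor-type solution and is easy to verify by simulation. The sketch below (again with the illustrative assumption $S_i \sim$ Exp(1), for which $F^{-1}(q) = -\ln(1-q)$ and hence $x_i^* = -\ln\alpha$; the value $\alpha = 0.25$ is likewise only an example) compares the quantile $F_{S_i}^{-1}(1-\alpha)$ with a grid minimizer of the Monte-Carlo weighted absolute loss.

```python
import math
import random

random.seed(2)
samples = [random.expovariate(1.0) for _ in range(20_000)]  # illustrative S ~ Exp(1)

def abs_loss(x, alpha):
    """Monte-Carlo estimate of E[alpha |S-x| 1{S<x} + (1-alpha) |S-x| 1{S>x}]."""
    total = 0.0
    for s in samples:
        total += alpha * (x - s) if s < x else (1 - alpha) * (s - x)
    return total / len(samples)

alpha = 0.25
x_star = -math.log(alpha)              # F^{-1}(1 - alpha) for Exp(1): the claimed optimum
grid = [0.02 * i for i in range(251)]  # x in [0, 5]
x_grid = min(grid, key=lambda x: abs_loss(x, alpha))
```

With 20,000 samples the empirical minimizer is simply an estimate of the $(1-\alpha)$-quantile and lands very close to $x^* = \ln 4 \approx 1.386$.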

4.3 Lag-order method

In this section we discuss the lag-order method introduced by Vink et al. (2012) [25]. The key observation is that patient $i$'s waiting time depends on all preceding interarrival times. However, we may neglect preceding interarrival times that lie far from patient $i$'s appointment, since interarrival times further away from $x_i$ affect $S_i$ less. Hence, we consider

for all $i = 1, \ldots, N$
\[
\min_{x_i} R_i(x_i, x_{i-1}, \ldots, x_{i-k}) = \min_{x_i} \mathbb{E}\left[ l\left( S_i(x_{i-1}, \ldots, x_{i-k}) - x_i \right) \right], \tag{4.3}
\]
instead of
\[
\min_{x_i} R_i(x_i, x_{i-1}, \ldots, x_1) = \min_{x_i} \mathbb{E}\left[ l\left( S_i(x_{i-1}, \ldots, x_1) - x_i \right) \right].
\]

This procedure yields an optimization method that contains every compromise between optimizing sequentially, $k = 1$, and optimizing simultaneously, $k = N-1$. For practical considerations of this approach we refer to the paper by Vink et al.
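The decaying influence of distant interarrival times can be made visible with a small simulation (every choice below, namely Exp(1) service times, interarrival times fixed at 1.5, $N = 10$ patients and a perturbation of 0.5, is an illustrative assumption, not a case studied here): shrinking the interarrival time $k$ slots before patient $N$ changes that patient's mean sojourn time less and less as $k$ grows.

```python
import random

random.seed(3)
N, RUNS = 10, 20_000
# Pre-sample all service times so base and perturbed schedules share randomness
B = [[random.expovariate(1.0) for _ in range(N)] for _ in range(RUNS)]

def mean_sojourn(x):
    """Mean sojourn time of patient N under interarrival times x, via the Lindley
    recursion W_{j+1} = max(W_j + B_j - x_j, 0) and S_N = W_N + B_N."""
    total = 0.0
    for b in B:
        w = 0.0
        for j in range(N - 1):
            w = max(w + b[j] - x[j], 0.0)
        total += w + b[N - 1]
    return total / RUNS

base_x = [1.5] * (N - 1)  # illustrative fixed schedule
base = mean_sojourn(base_x)

def effect(k):
    """Increase in patient N's mean sojourn when the interarrival time
    k slots back shrinks by 0.5 (common random numbers keep the noise small)."""
    x = list(base_x)
    x[N - 1 - k] -= 0.5
    return mean_sojourn(x) - base
```

In this setting effect(1) clearly dominates effect(3), which in turn dominates effect(6); this rapidly decaying influence is precisely what motivates replacing the full history $x_{i-1}, \ldots, x_1$ by the truncated window $x_{i-1}, \ldots, x_{i-k}$.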



4.4 Computational results for transient cases

In this section we evaluate the simultaneous and sequential optimization for three different SCV values, where we set the mean equal to one. We choose the same SCV values as in Wang (1997) [26]. Therein, Wang chooses (as an example) a Coxian-3, an exponential and a Coxian-2 distribution to model an SCV of 0.7186, 1 and 1.6036, respectively. We use our moment-matching procedure, see Section 3.3, to match mean one and the above SCV values. We summarize our parameters.

• To model an SCV of 0.7186, Wang uses a Coxian distribution with three phases, namely $(\mu_1, \mu_2, \mu_3) = (\tfrac{4}{3}, \tfrac{8}{3}, 4)$ and $(p_1, p_2) = (\tfrac{1}{2}, \tfrac{1}{2})$, so that $\alpha = (1, 0, 0)$ and
\[
S = \begin{pmatrix} -\tfrac{4}{3} & \tfrac{2}{3} & 0 \\ 0 & -\tfrac{8}{3} & \tfrac{4}{3} \\ 0 & 0 & -4 \end{pmatrix}.
\]

We model the same SCV by an $E_{K-1,K}(\mu; p)$ distribution with parameters $K = 2$, $\mu = 1.60026$ and $p = 0.399744$, i.e. $\alpha = (1, 0)$ and
\[
S = \begin{pmatrix} -1.60026 & 0.960563 \\ 0 & -1.60026 \end{pmatrix}.
\]

• To model an SCV of 1, Wang uses an exponential distribution with $\mu = 1$; by using the phase-type fit we get exactly the same distribution.

• To model an SCV of 1.6036, Wang uses a Coxian distribution with only two phases, $(\mu_1, \mu_2) = (1.3, 0.4333)$ and $p_1 = 0.1$, so that $\alpha = (1, 0)$ and
\[
S = \begin{pmatrix} -1.3 & 0.13 \\ 0 & -0.4333 \end{pmatrix}.
\]
We model this SCV by an $H_2(\mu_1, \mu_2; p_1, p_2)$ distribution with parameters chosen by our matching method.
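The fits above can be checked against the standard phase-type moment formula $\mathbb{E}[X^n] = n!\, \alpha (-S)^{-n} \mathbb{1}$. The sketch below recomputes the mean and SCV of Wang's Coxian-3 representation and of the $E_{1,2}$ fit directly from the matrices given in the bullets; both come out with mean (approximately) one and SCV $\approx 0.7186$.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def mean_scv(alpha, S):
    """Mean and SCV of a phase-type law (alpha, S) via E[X^n] = n! alpha (-S)^{-n} 1."""
    n = len(S)
    negS = [[-S[i][j] for j in range(n)] for i in range(n)]
    v1 = solve(negS, [1.0] * n)  # (-S)^{-1} 1
    v2 = solve(negS, v1)         # (-S)^{-2} 1
    m1 = sum(a * v for a, v in zip(alpha, v1))
    m2 = 2.0 * sum(a * v for a, v in zip(alpha, v2))
    return m1, m2 / (m1 * m1) - 1.0

# Wang's Coxian-3 representation
cox3 = [[-4 / 3, 2 / 3, 0.0], [0.0, -8 / 3, 4 / 3], [0.0, 0.0, -4.0]]
m_cox, scv_cox = mean_scv((1.0, 0.0, 0.0), cox3)

# The E_{K-1,K} fit with K = 2, mu = 1.60026, p = 0.399744
mu, p = 1.60026, 0.399744
ek = [[-mu, (1 - p) * mu], [0.0, -mu]]
m_ek, scv_ek = mean_scv((1.0, 0.0), ek)
```

The same two-line moment computation applies unchanged to the Coxian-2 and $H_2$ representations of the high-variability case.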
