Managing Customer Arrivals in Service Systems with Multiple Servers


Christos Zacharias

Department of Management Science, School of Business Administration, University of Miami, Coral Gables, FL 33146.

czacharias@bus.miami.edu

Michael Pinedo

Department of Information, Operations & Management Sciences, Stern School of Business, New York University, New York, NY 10012.

mpinedo@stern.nyu.edu

We analyze a discrete multi-server model for scheduling customer arrivals under no-shows. Customers may have different waiting cost coefficients and different no-show rates, reflecting their type and their history in attending scheduled appointments, respectively. The challenge is to assign customers to time slots so that the service system utilizes its resources efficiently and customers experience short waiting times. Theoretical and heuristic guidelines are provided for the effective practice of appointment overbooking to offset no-shows. For the case of heterogeneous customers, we provide structural properties of an optimal schedule and we introduce a new sequencing rule. When customers come from a homogeneous pool, we derive recursive expressions for the performance measures of interest and provide an upper bound for the optimal overbooking level. Extensive numerical experiments reveal further properties and patterns that appear in the optimal solution, and motivate the development of two computationally inexpensive heuristic solutions that perform very well. Our analysis demonstrates the benefits of resource pooling in containing operational costs and increasing customer throughput.

Key words: service systems; scheduling; no-shows; overbooking; discrete queues; parallel servers.

History: working paper, last edited on August 15 2015.

1. Introduction

Appointment scheduling systems are widely used as a tool for managing customer arrivals and matching supply and demand for services. It is common for customers to not show up for their scheduled services. Missed appointments result in under-utilization of a service system’s valuable resources and limit the access for other customers who could have filled the empty slots.

Appointment overbooking is one operational strategy employed by service providers to address the issue of no-shows and at the same time increase customers’ access to services. On the other hand, overbooking can result in an overcrowded facility, with longer customer waits and more system overtime. In this study we demonstrate that a sensible practice of appointment overbooking can significantly improve the operational performance of a service system, while customers experience short waiting times and better access to services.

We address the problem of scheduling customer arrivals at a parallel-server system under no-shows. Customers have different waiting cost coefficients and different no-show rates, reflecting their type and their history in attending scheduled appointments, respectively. An optimal schedule balances the trade-offs between the benefits of efficient resource utilization and the costs of customers’ waiting time.

This study demonstrates that an informed strategy for appointment overbooking can significantly improve the operational performance of a service system, while customers experience short waiting times and better access to care. A discrete queueing model captures the random evolution of the system’s workload, based on which we derive recursive expressions for the performance measures of interest. The task of finding an optimal schedule is modeled as an integer stochastic program, which is analytically intractable and computationally expensive. A tight upper bound (a solution to a convex program) restricts our search for an optimal schedule to a contained solution space. Our theoretical and experimental analysis reveals properties and patterns that appear in the optimal scheduling strategy, and informs the development of two highly efficient heuristic solutions.

While motivated by the pressing needs of the healthcare sector, our model is applicable to a wide range of service systems with appointment-driven arrivals. We avoid reference to a particular application domain throughout this study by using generic terms such as “servers” and “service system”. In outpatient care, our “service system” can be used to model a diagnostic facility where it is crucial to utilize resources (e.g., CT scan, X-ray generator, MRI) efficiently. Doctors are modeled as “parallel servers” in settings where continuity of care (see Balasubramanian et al. (2010)) is not a concern. Nurses are modeled as “parallel servers” when they are the bottleneck resource, and/or the presence of a doctor is not required (e.g., vaccination and immunization, routine lab testing, etc.). Other application domains include in-office consultation (e.g., financial, legal), on-site customer support (e.g., Apple Genius Bar), entertainment, and cosmetic services.

2. Related Literature

Many papers have appeared in the literature on appointment scheduling, mostly motivated by healthcare applications. Cayirli and Veral (2003) and Gupta and Denton (2008) provide overviews of the literature, the research challenges and opportunities. Hall (2012) provides a comprehensive review of models and methods used for scheduling the delivery of patient care for all parts of the healthcare system. The analysis may be based on any one of a variety of approaches, including stochastic programming (e.g., Mancilla and Storer (2012), Mak et al. (2014)), queueing theory (e.g., Green and Savin (2008), Liu and Ziya (2013), Kuiper et al. (2014)), and stylized scheduling models (e.g., Robinson and Chen (2010), LaGanga and Lawrence (2012), Zacharias and Pinedo (2014)).

In the case of homogeneous customers it is of interest to determine how many customers to schedule on any given day and how to allocate these customers to slots. The sequencing of the customers is also of interest when customers have different characteristics. In most cases, finding an optimal schedule is analytically intractable, and thus the majority of the literature uses enumeration, search algorithms, simulation-based techniques and/or heuristics.

A service system typically starts empty at the beginning of a working day, operates for a finite amount of time, and shuts down until the next period. Therefore, it is important to perform transient analysis for the random evolution of such systems. As pointed out in Bandi and Bertsimas (2012), transient queues are difficult to analyze via classical queueing techniques. Typically the analysis of rich queueing systems over finite time horizons is addressed either by computer simulation (e.g., Millhiser and Veral (2014), Klassen and Yoogalingam (2009)) or approximations (e.g., Araman and Glynn (2012), Honnappa et al. (2014), Zacharias and Armony (2015)).

Most of the literature focuses on single-server models. Kaandorp and Koole (2007), Hassin and Mendel (2008), Klassen and Yoogalingam (2009), Robinson and Chen (2010), Millhiser and Veral (2014) are some recent works that consider the appointment scheduling problem with homogeneous customers who arrive on time for their scheduled appointments, if they do show up. Begen and Queyranne (2011), Cayirli et al. (2012), LaGanga and Lawrence (2012), Zacharias and Pinedo (2014) account for customer heterogeneity as well.

Even though the literature on the single-server system is quite extensive, the multi-server case has received limited attention. As Gupta and Wang (2012) also point out, appointment scheduling models become intractable if multiple features are considered simultaneously. Very few studies analyze service systems with more than one server and, to the best of our knowledge, in that case only simulation studies have been conducted. For example, Sickinger and Kolisch (2009) propose and evaluate heuristic scheduling policies for a medical center with two computer tomography (CT) scanners, Liu and Liu (1998) study a block appointment system for clinic operations with multiple randomly arriving doctors, and Zhu et al. (2009) construct a discrete event simulation model to study specialist outpatient clinics. A stylized discrete queueing model with s ≥ 1 servers has not been studied analytically in the literature and poses unique challenges.

The goal of this study is to develop and analyze a discrete multi-server queueing model for designing appointment systems. We address the following questions: (a) how to effectively overbook capacity in order to offset customer no-shows, (b) how to allocate the available service slots throughout the working day, and (c) how to account for customer heterogeneity. The objective function is the weighted sum of the system’s expected idle time, the system’s expected overtime and the customers’ expected waiting time.

3. Homogeneous Customers

In this section we introduce a stylized discrete queueing model that captures the random evolution of the system’s workload over time, based on which we derive recursive expressions for the performance measures of interest in transient state. We then present the optimization problem under study and characterize the optimal overbooking strategy.

3.1. Discrete Multi-Server Scheduling Model

Consider $s$ identical service providers working in parallel. Each one has in her regular schedule $n$ time slots available to serve customers in a working day. Beyond these $n$ regular slots, each one can serve customers in overtime slots as well. The system’s regular capacity is therefore $n \times s$ customers per day.

Customer arrivals are driven by scheduled appointments and the scheduler’s task is to assign a number of customers to each time slot. We assume that customers show up with probability $p = 1 - q \in (0, 1)$ at the beginning of their assigned slot and require deterministic service of one time slot. The number of customers to be scheduled throughout the working day (a decision variable) is denoted by $m$, and we let $y = m - n \times s$ denote the overbooking level.

We consider only schedules that assign all $m$ customers to the $n$ regular time slots. That is, no customer is assigned a priori to an overtime slot. However, if the service providers have not served all customers by the end of the $n$-th slot, then some or all of them have to continue working overtime until the queue empties out, while overtime costs are being incurred.

Since customers are homogeneous in this section, their sequencing is irrelevant, and a schedule can be completely characterized by the vector $x = (x_1, \ldots, x_n) \in \mathbb{Z}^n$, where $x_t$ is the number of customers assigned to slot $t$, with $m = \sum_{t=1}^{n} x_t$. In §5 we address the optimal sequencing of heterogeneous customers.

Before we introduce the optimization problem, we characterize the random evolution of the system’s workload under any given schedule $x$. The number of new arrivals at the beginning of each slot is a binomial random variable; these variables are not necessarily identically distributed, since different slots may be assigned different numbers of customers. The backlog of customers at the beginning of slot $t$, denoted by $Z_t$, is captured by the recursion

$$Z_t = \max\{Z_{t-1} + A_{t-1} - s,\ 0\} \quad \text{for } t \ge 2, \qquad \text{and } Z_1 = 0, \tag{1}$$

where $A_t \sim \text{Binomial}(x_t, p)$ denotes the number of new arrivals at slot $t$. Our service system evolves randomly over time as a discrete multi-server queue with group arrivals, i.e., a $D^{A_t}/D/s$ queue.

Note that a recursion similar to (1) describes the waiting time in a D/G/1 queue; see Janssen and van Leeuwaarden (2005).

The probability distribution of the queue length in transient state is derived as follows. Let $f(k; n, p)$ be the probability that a Binomial$(n, p)$ random variable takes a value equal to $k$, i.e., $f(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}$, and let $\pi_t^j(x) = \Pr(Z_t = j)$ denote the probability of a backlog of $j$ customers at the beginning of slot $t$ under schedule $x$. Let $l_t = \sum_{\tau=1}^{t-1} (x_\tau - s)$ denote the maximum possible backlog at the beginning of slot $t$. Assuming that the system is empty at the beginning of the working day, $\pi_1^0(x) = 1$ and $\pi_t^i(x)$ can be expressed recursively for $t = 2, 3, \ldots, n+1$ as

$$\pi_t^i(x) = \begin{cases} \displaystyle\sum_{j=0}^{\min(s,\, l_{t-1})} \pi_{t-1}^j(x) \sum_{k=0}^{s-j} f(k; x_{t-1}, p) & \text{for } i = 0, \\[2ex] \displaystyle\sum_{j=\max(0,\, s+i-x_{t-1})}^{\min(s+i,\, l_{t-1})} \pi_{t-1}^j(x)\, f(s+i-j; x_{t-1}, p) & \text{for } 1 \le i \le l_t, \\[2ex] 0 & \text{otherwise.} \end{cases} \tag{2}$$

Let $I(x)$, $O(x)$ and $W(x)$ denote the expected servers’ total idle time, overtime, and customers’ aggregate waiting time, respectively, associated with schedule $x$. Note that

#(idle slots) + #(customers who show up) = $ns$ + #(overtime slots),

and therefore

$$I(x) = O(x) + ns - mp. \tag{3}$$
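The counting identity behind (3) holds on every sample path, not only in expectation, and this can be checked by simulating recursion (1) directly. The following is a minimal sketch under that reading; the function name `simulate_day` and the example schedule are illustrative assumptions, not part of the model description.

```python
import random

def simulate_day(x, s, p, rng):
    """Simulate one working day under schedule x; return (idle, shows, overtime)."""
    backlog = 0                      # Z_1 = 0: the system starts empty
    idle = shows = 0
    for x_t in x:
        # A_t ~ Binomial(x_t, p): each booked customer shows up independently.
        arrivals = sum(rng.random() < p for _ in range(x_t))
        present = backlog + arrivals
        idle += max(s - present, 0)    # idle server-slots in this regular slot
        shows += arrivals
        backlog = max(present - s, 0)  # recursion (1)
    # The final backlog Z_{n+1} is served one customer per overtime slot.
    return idle, shows, backlog

rng = random.Random(0)
x, s, p = [4, 2, 3, 2, 2], 2, 0.8      # hypothetical schedule and parameters
for _ in range(1000):
    idle, shows, overtime = simulate_day(x, s, p, rng)
    # #(idle slots) + #(show-ups) = n*s + #(overtime slots) on every path
    assert idle + shows == len(x) * s + overtime
```

Averaging the three counts over many simulated days would give Monte Carlo estimates of $I(x)$, $mp$ and $O(x)$, which is one way to sanity-check an exact implementation.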

The performance measures of interest can be expressed respectively based on (2) and (3) as

$$O(x) = E(Z_{n+1}) = \sum_{j=0}^{l_{n+1}} j\, \pi_{n+1}^j(x), \tag{4}$$

$$I(x) = \sum_{j=0}^{l_{n+1}} j\, \pi_{n+1}^j(x) + ns - pm, \tag{5}$$

$$W(x) = \sum_{t=1}^{n} \sum_{i=1}^{x_t} \sum_{j=\max(0,\, s-i+1)}^{l_t} \pi_t^j(x) \sum_{k=\max(0,\, s-j)}^{i-1} p\, f(k; i-1, p) \left\lceil \frac{j+k-s+1}{s} \right\rceil. \tag{6}$$
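The three performance measures can be evaluated exactly by propagating the backlog distribution slot by slot. The sketch below uses a forward convolution that is equivalent to recursion (2), then applies (4) and (3). For the waiting term it assumes that the $i$-th customer booked in a slot, facing a backlog of $j$ and $k$ earlier show-ups in the same slot, waits $\lfloor (j+k)/s \rfloor$ slots (equal to $\lceil (j+k-s+1)/s \rceil$) and incurs that wait only if she shows up, which contributes the factor $p$. That is this sketch's reading of (6), and the function names are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """f(k; n, p): probability that a Binomial(n, p) variable equals k."""
    return comb(n, k) * p**k * (1 - p)**(n - k) if 0 <= k <= n else 0.0

def performance(x, s, p):
    """Return (I, O, W) for schedule x, s servers and show probability p."""
    pi = [1.0]   # pi[j] = Pr(Z_t = j); the day starts empty, so Z_1 = 0
    W = 0.0
    for x_t in x:
        for i in range(1, x_t + 1):           # i-th customer booked in slot t
            for j, prob_j in enumerate(pi):   # backlog found at the slot start
                for k in range(i):            # show-ups among the i-1 booked ahead
                    wait = (j + k) // s       # whole slots spent in the queue
                    # Assumption: the wait counts only if the customer shows (factor p).
                    W += p * prob_j * binom_pmf(k, i - 1, p) * wait
        # Push the backlog distribution one slot forward via recursion (1).
        nxt = [0.0] * (max(len(pi) - 1 + x_t - s, 0) + 1)
        for j, prob_j in enumerate(pi):
            for k in range(x_t + 1):
                nxt[max(j + k - s, 0)] += prob_j * binom_pmf(k, x_t, p)
        pi = nxt
    O = sum(j * prob_j for j, prob_j in enumerate(pi))   # (4): E[Z_{n+1}]
    I = O + len(x) * s - p * sum(x)                      # (3)
    return I, O, W

I, O, W = performance([3, 2, 2, 2], s=2, p=0.8)   # hypothetical example schedule
```

The nested loops cost on the order of $n$ times the square of the largest backlog, which is fine for day-sized instances but would need vectorizing for large-scale searches.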

3.2. Optimization Problem

There are three costs (penalties) associated with an appointment schedule: customers’ waiting cost, servers’ idle time cost and overtime cost. If there are fewer than $s$ customers present at the beginning of any one of the $n$ regular time slots, then one or more providers remain idle, and for each idle provider an idle time cost $c_I$ is incurred. The scheduler may overbook certain time slots and assign more than $s$ customers in order to compensate for the no-show behavior. If more than $s$ customers are present at the beginning of a time slot due to overbooking, then all but $s$ of these customers have to wait. A waiting cost $w$ is incurred for each time slot that a customer has to wait before starting service. Finally, an overtime cost $c_O$ is incurred for each overtime slot. We normalize the objective function with respect to $c_I$, i.e., $c_I = 1$, and we consider the following nonlinear integer program:

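$$\begin{aligned} \min_{(m,\, x)} \quad & V(x) = I(x) + c_O\, O(x) + w\, W(x) \\ \text{s.t.} \quad & x_t \text{ nonnegative integer for all } t = 1, 2, \ldots, n, \\ & \sum_{t=1}^{n} x_t = m. \end{aligned} \tag{P1}$$

We denote the optimal solution by $(m^*, x^*)$ and the optimal overbooking level by $y^* = m^* - ns$.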

Lemma 1. $x_t^* \ge s$ for all $t = 1, 2, \ldots, n$.

Lemma 1 is a direct consequence of the recursion in (1) and our model’s assumptions. The optimal schedule has at least $s$ customers assigned to each slot, and the optimization problem now becomes to identify which slots (if any) to overbook and by how much. Let

$$X_n^y = \Bigl\{ x : \sum_{t=1}^{n} x_t = ns + y,\ x_t \ge s \text{ for } t = 1, 2, \ldots, n \Bigr\}$$

denote the set of all feasible schedules that allocate $y$ overbooked customers to the $n$ slots, with every slot having at least $s$ customers assigned to it.
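For small instances the feasible set can simply be enumerated, which is useful for checking any heuristic against brute force. The sketch below distributes the $y$ overbooked customers over the $n$ slots and adds the baseline of $s$ per slot; the generator name `feasible_schedules` is an illustrative assumption. By a standard stars-and-bars count the set has $\binom{y+n-1}{y}$ elements, which the final line verifies for one case.

```python
from math import comb

def feasible_schedules(n, s, y):
    """Yield every schedule with n slots, at least s per slot, and n*s + y in total."""
    def extras(slots, remaining):
        # Distribute `remaining` overbooked customers over `slots` slots.
        if slots == 1:
            yield (remaining,)
            return
        for first in range(remaining + 1):
            for rest in extras(slots - 1, remaining - first):
                yield (first,) + rest
    for e in extras(n, y):
        yield tuple(s + extra for extra in e)

schedules = list(feasible_schedules(n=4, s=2, y=3))
assert len(schedules) == comb(3 + 4 - 1, 3)   # stars and bars: 20 schedules
```

The count grows quickly in both arguments, so enumeration is only a validation tool, not a solution method.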

3.3. Upper-Bound on the Optimal Overbooking Level

In this section we demonstrate that the optimal solution y∗ of (P1) is bounded above by the solution of a discrete convex optimization problem.

Let $x^y = (s+y, s, \ldots, s)$ for some nonnegative integer $y$ be a schedule where all the overbooking (if any) occurs during the first time slot and exactly $s$ customers are assigned to slots $2, 3, \ldots, n$, and let $A = \{x^y : y \text{ nonnegative integer}\}$. Such schedules turn out to be optimal when we focus on optimizing the system’s efficiency by ignoring customers’ waiting cost:

$$\begin{aligned} \min_{(m,\, x)} \quad & I(x) + c_O\, O(x) \\ \text{s.t.} \quad & x_t \text{ nonnegative integer for all } t = 1, 2, \ldots, n, \\ & x_t \ge s \text{ for all } t = 1, 2, \ldots, n, \\ & \sum_{t=1}^{n} x_t = m. \end{aligned} \tag{P2}$$

The optimization problem (P2) balances the trade-offs between maintaining high resource utilization during the regular length of the workday and incurring low overtime costs.

Lemma 2. There is a $\bar{y} \ge 0$ such that $x^{\bar{y}}$ is a solution to the optimization problem (P2).

When we look past customers’ waiting cost, it is optimal to overbook a number of customers at the very beginning of the working day. This policy guarantees that the system is as busy as possible, while overtime costs are kept at a moderate level (the overbooked customers are absorbed by potential no-shows throughout the working day). A schedule within class $A$ is a solution to (P1) (minimizing the weighted sum of all three costs) if the servers’ cost coefficients, $c_I$ and $c_O$, are an order of magnitude larger than the customers’ weight. In practice, such a system could correspond to one where providers’ availability and the system’s resources are considerably more costly than having customers wait. Further, the optimal overbooking level $\bar{y}$ for (P2) turns out to provide an analytically tractable upper bound on the optimal overbooking level $y^*$ for (P1).

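Let $M_y \sim \text{Binomial}(ns+y, p)$ denote the total number of customers who will show up for their appointment under schedule $x^y = (s+y, s, \ldots, s)$. Then

$$\begin{aligned} O(x^y) &= E[\max(0, M_y - ns)] \\ &= E[M_y - ns] + E[\max(0, ns - M_y)] \\ &= pm - ns + \sum_{k=0}^{ns} (ns-k)\, f(k; m, p), \end{aligned} \tag{7}$$

and from (5),

$$\begin{aligned} I(x^y) &= O(x^y) + ns - pm \\ &= \sum_{k=0}^{ns} (ns-k)\, f(k; m, p). \end{aligned} \tag{8}$$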

Recall that all appointment schedules within class $A$ minimize the idle time and overtime costs, and therefore (P2) can be written as

$$\min_{y \ge 0,\ x \in X_n^y} [I(x) + c_O\, O(x)] = \min_{y \ge 0,\ x \in X_n^y \cap A} [I(x) + c_O\, O(x)] = \min_{y \ge 0} [I(x^y) + c_O\, O(x^y)].$$

Theorem 1. (i) $I(x^y)$ is decreasing and discretely convex in $y$ on $\{0, 1, \ldots\}$.

(ii) $O(x^y)$ is increasing and discretely convex in $y$ on $\{0, 1, \ldots\}$.

(iii) $y^* \le \bar{y}$.

Since $O(x^y)$ and $I(x^y)$ are discretely convex in $y$, efficient computational procedures can provide $\bar{y}$, which is an upper bound on $y^*$. As demonstrated in §4, this upper bound is tight and contains useful information regarding the optimal solution to (P1).
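Because the objective of (P2) restricted to front-loaded schedules is discretely convex in $y$, one simple procedure is to increase $y$ from zero until the cost stops decreasing. The sketch below does this with the closed forms (7) and (8). It assumes $c_O > 0$ so that the search terminates, and the function names are illustrative rather than the authors' implementation.

```python
from math import comb

def binom_pmf(k, n, p):
    """f(k; n, p): probability that a Binomial(n, p) variable equals k."""
    return comb(n, k) * p**k * (1 - p)**(n - k) if 0 <= k <= n else 0.0

def idle_front(y, n, s, p):
    """I(x^y) from (8): expected idle server-slots with all y extras in slot 1."""
    cap, m = n * s, n * s + y
    return sum((cap - k) * binom_pmf(k, m, p) for k in range(cap + 1))

def overtime_front(y, n, s, p):
    """O(x^y) from (7): expected overtime slots."""
    return p * (n * s + y) - n * s + idle_front(y, n, s, p)

def overbooking_upper_bound(n, s, p, c_o):
    """Smallest minimizer of I(x^y) + c_O * O(x^y) over y = 0, 1, 2, ..."""
    def cost(y):
        return idle_front(y, n, s, p) + c_o * overtime_front(y, n, s, p)
    y = 0
    while cost(y + 1) < cost(y):   # discrete convexity: stop at the first non-decrease
        y += 1
    return y

y_bar = overbooking_upper_bound(n=10, s=2, p=0.8, c_o=1.5)   # hypothetical instance
```

A bisection on the forward differences would reduce the number of cost evaluations, but for realistic day lengths the linear scan is already cheap.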

For the rest of §3.3 we provide an analytic characterization for the upper bound $\bar{y}$ based on the continuous relaxation of (P2). Consider the differentiable extension of the binomial coefficient to non-integers $0 \le v \le u$ defined as $\binom{u}{v} = \frac{\Gamma(u+1)}{\Gamma(v+1)\,\Gamma(u-v+1)}$, where $\Gamma(t) = \int_0^\infty x^{t-1} e^{-x}\, dx$ is the Gamma function. Note that

$$\frac{d}{du}\binom{u}{v} = \frac{\Gamma(u+1)}{\Gamma(v+1)\,\Gamma(u-v+1)} \bigl[ \Psi(u+1) - \Psi(u-v+1) \bigr],$$

where $\Psi(t) = \Gamma'(t)/\Gamma(t)$ is the Digamma function, the logarithmic derivative of the Gamma function. When $v$ is an integer, $\frac{d}{du}\binom{u}{v} = \binom{u}{v} \sum_{i=0}^{v-1} \frac{1}{u-i}$. Therefore, either $\bar{y} = 0$ (boundary solution), or $\bar{y} = \lfloor \hat{y} \rfloor$, or $\bar{y} = \lceil \hat{y} \rceil$, where $\hat{y}$ satisfies the first order condition

$$(1 + c_O) \sum_{k=0}^{ns} (ns-k) \binom{ns+\hat{y}}{k} p^k (1-p)^{ns+\hat{y}-k} \left[ \ln(1-p) + \sum_{i=0}^{k-1} \frac{1}{ns+\hat{y}-i} \right] + p\, c_O = 0. \tag{9}$$

The optimal overbooking level $y^*$ is bounded above by $\bar{y} = \bar{m} - ns$.
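The step from (7) and (8) to the first order condition (9) can be sketched as follows, assuming that $y$ is treated as a real variable and that the finite sum is differentiated term by term. Substituting (7) and (8) into the objective of (P2) and differentiating each binomial term gives:

```latex
\begin{aligned}
I(x^y) + c_O\, O(x^y)
  &= (1 + c_O) \sum_{k=0}^{ns} (ns - k)\, f(k; ns + y, p)
     + c_O \bigl( p\,(ns + y) - ns \bigr), \\
\frac{\partial}{\partial y} f(k; ns + y, p)
  &= f(k; ns + y, p) \Bigl[ \ln(1 - p) + \sum_{i=0}^{k-1} \frac{1}{ns + y - i} \Bigr].
\end{aligned}
```

The second line combines the derivative of the extended binomial coefficient for integer $k$ with the derivative of $(1-p)^{ns+y-k}$. Setting the derivative of the first line to zero at $y = \hat{y}$ yields (9), with the constant $p\, c_O$ coming from the linear term.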

3.4. Periodic Overbooking Heuristic

The optimization problem (P1) is computationally very intense; the size of the solution space is $|X_n^y| = \frac{(y+n-1)!}{y!\,(n-1)!}$, exponential in both $y$ and $n$, with $y$ being subject to optimization as well. In §3.3 an upper bound for the overbooking level was developed by considering the problem where $w = 0$. It is demonstrated in §4 that, as the waiting cost coefficient $w$ increases, the optimal schedules become more uniform, without necessarily observing a decrease in the optimal overbooking level. Furthermore, periodic patterns appear in the middle segment of the schedule for large values of $n$. It appears that one can think of an optimal schedule as consisting of three segments: a beginning start-up segment, a middle stationary segment, and a final emptying-out segment. In the start-up segment, the schedule tends to be subject to more overbooking than in the other two. The middle segment appears to be more uniform and regular. The overbooked customers in the last segment of the schedule tend to taper off (in order to avoid high overtime costs). If $n$ is large then the middle segment of the schedule is quite substantial. However, when $n$ is small, the middle segment of the schedule may tend to disappear. The uniformity and regularity of the middle segment of the schedule is a motivation for the heuristic described in what follows.

We propose a computationally inexpensive heuristic solution based on the evolution of a discrete queue. Let $v_{n_0,y_0} = (s+y_0, s, s, \ldots, s)$ be a sub-schedule of length $n_0 \le n$. Let $m_0$ be the number of customers allocated to segment $v_{n_0,y_0}$, i.e., $m_0 = n_0 s + y_0$, and let $N_0 = \lfloor n/n_0 \rfloor$ be the number of consecutive such segments. We consider periodic schedules of the form

$$x_{n_0,y_0} = (\underbrace{v_{n_0,y_0};\ v_{n_0,y_0};\ \ldots;\ v_{n_0,y_0}}_{N_0 \text{ times}}).$$

In other words, we consider schedules that demonstrate a periodic spike of size $y_0$ every $n_0$ slots, see Figure 1.

Periodic schedules give a renewal flavor to our arrival process, rendering our transient queue tractable. If we focus only on the queue length at the beginning of time slots $t_1, t_2, \ldots, t_{N_0+1}$, where

Figure 1 A schedule with periodic overbooking. [Plot omitted: horizontal axis $t$ (slot index) with ticks at $t_1, t_2, t_3, \ldots$; vertical axis $x_t$ (customers per slot) with ticks at $s$ and $s + y_0$.]

and overtime. Consider the discrete-time bulk queueing system that evolves according to (1), with $A_i$ being independent and identically distributed as $A \sim \text{Binomial}(m_0, p)$, $i = 1, 2, \ldots, N_0$. In the queueing literature such a system is referred to as a discrete $D^A/D/sn_0$ queue. As in Janssen and van Leeuwaarden (2005), the expected queue length right before the beginning of segment $t_i$ is given explicitly, not recursively, by Spitzer’s identity (see Spitzer (1956)) as

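$$\begin{aligned} E(Z_{t_i}) &= \sum_{\tau=1}^{i-1} \frac{1}{\tau} E[\max(0, Y_\tau)] \\ &= \sum_{\tau=1}^{i-1} \frac{1}{\tau} \bigl[ E[Y_\tau] + E[\max(0, -Y_\tau)] \bigr] \\ &= \sum_{\tau=1}^{i-1} \frac{1}{\tau} \Bigl[ p \tau m_0 - \tau s n_0 + \sum_{k=0}^{\tau s n_0} (\tau n_0 s - k)\, f(k; \tau m_0, p) \Bigr] \\ &= (i-1)(p m_0 - s n_0) + \sum_{\tau=1}^{i-1} \frac{1}{\tau} \sum_{k=0}^{\tau s n_0} (\tau s n_0 - k)\, f(k; \tau m_0, p), \end{aligned} \tag{10}$$

where $Y_\tau = \sum_{i=1}^{\tau} (A_i - s n_0)$ denotes the $\tau$-fold convolution of $(A - s n_0)$. The expected system’s overtime and idle time can now be expressed as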

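$$O(x_{n_0,y_0}) = E(Z_{t_{N_0+1}}) = p m_0 N_0 - s n_0 N_0 + \sum_{\tau=1}^{N_0} \frac{1}{\tau} \sum_{k=0}^{\tau s n_0} (\tau s n_0 - k)\, f(k; \tau m_0, p), \tag{11}$$

and

$$I(x_{n_0,y_0}) = O(x_{n_0,y_0}) + s n_0 N_0 - p m_0 N_0 = \sum_{\tau=1}^{N_0} \frac{1}{\tau} \sum_{k=0}^{\tau s n_0} (\tau s n_0 - k)\, f(k; \tau m_0, p). \tag{12}$$

Note that (7) and (8) are special cases of (11) and (12), respectively, when $n_0 = n$.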

Theorem 2. (i) $I(x_{n_0,y_0})$ is decreasing and discretely convex in $y_0$ on $\{0, 1, \ldots\}$.

(ii) $O(x_{n_0,y_0})$ is increasing and discretely convex in $y_0$ on $\{0, 1, \ldots\}$.
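The closed forms (11) and (12) can be evaluated directly, without propagating the backlog distribution. The sketch below is one way to do so; the function names and argument order are illustrative assumptions. With `N0 = 1` and `n0 = n` the two functions reduce to (8) and (7).

```python
from math import comb

def binom_pmf(k, n, p):
    """f(k; n, p): probability that a Binomial(n, p) variable equals k."""
    return comb(n, k) * p**k * (1 - p)**(n - k) if 0 <= k <= n else 0.0

def periodic_idle(n0, y0, N0, s, p):
    """I(x_{n0,y0}) from (12): expected idle time of a periodic schedule."""
    cap, m0 = n0 * s, n0 * s + y0   # capacity and bookings per period
    total = 0.0
    for tau in range(1, N0 + 1):
        inner = sum((tau * cap - k) * binom_pmf(k, tau * m0, p)
                    for k in range(tau * cap + 1))
        total += inner / tau
    return total

def periodic_overtime(n0, y0, N0, s, p):
    """O(x_{n0,y0}) from (11): expected overtime of a periodic schedule."""
    m0 = n0 * s + y0
    return N0 * (p * m0 - n0 * s) + periodic_idle(n0, y0, N0, s, p)

# Hypothetical instance: 12 slots, 2 servers, one extra booking every 3 slots.
idle = periodic_idle(n0=3, y0=1, N0=4, s=2, p=0.85)
overtime = periodic_overtime(n0=3, y0=1, N0=4, s=2, p=0.85)
```

As a hand check, the schedule $(2, 2)$ with $s = 1$ and $p = 0.5$ has $E(Z_3) = 0.75 \cdot 0.25 + 0.25 \cdot 1 = 0.4375$ by direct conditioning on $Z_2$, which matches `periodic_overtime(1, 1, 2, 1, 0.5)`.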

Periodic schedules, besides being analytically tractable, turn out to yield computationally inexpensive and very well performing heuristic solutions. Algorithm 1 describes the Periodic Overbooking Heuristic (POH) in terms of solving $n$ convex programs. As demonstrated in §4, the middle segment of the optimal schedule often has the pattern of one of the following three special cases:

(a) $y_0 = 1$ for some $n_0 > 1$, corresponding to moderate overbooking,

(b) $n_0 = 1$, corresponding to frequent overbooking when no-show rates are high,

(c) $n_0 = n$, when it is optimal to overbook only at the beginning of the schedule.

Algorithm 1 Periodic Overbooking Heuristic (POH)
procedure POH(n, s, w, q, c_O)
    x_POH ← s·e_n                          ▷ initial state, no overbooking; e_n is the vector of n ones
    cost_POH ← (1 − p)·n·s                 ▷ only idle time cost
    for n_0 = 1, 2, ..., n do
        y_0* ← arg min over y_0 of [I(x_{n_0,y_0}) + c_O·O(x_{n_0,y_0})]    ▷ discrete convex program
        l ← n mod n_0
        x ← (x_{n_0,y_0*}; s·e_l)          ▷ e_l is the vector of l ones
        cost ← I(x) + c_O·O(x) + w·W(x)
        if cost < cost_POH then
            x_POH ← x
            cost_POH ← cost
        end if
    end for
    return x_POH and cost_POH
end procedure
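The procedure can be sketched in Python as follows. This is not the authors' implementation: for brevity it evaluates every schedule with an exact forward propagation of the backlog distribution instead of the closed forms (11) and (12), it finds $y_0^*$ by increasing $y_0$ until the efficiency cost stops decreasing (relying on the discrete convexity in Theorem 2 and on $c_O > 0$), and it assumes that a customer's wait is $\lfloor (j+k)/s \rfloor$ slots, counted only if she shows up. The helper names are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """f(k; n, p): probability that a Binomial(n, p) variable equals k."""
    return comb(n, k) * p**k * (1 - p)**(n - k) if 0 <= k <= n else 0.0

def performance(x, s, p):
    """Return (I, O, W) for schedule x by propagating the backlog distribution."""
    pi, W = [1.0], 0.0                        # Z_1 = 0 with probability 1
    for x_t in x:
        for i in range(1, x_t + 1):           # i-th customer booked in this slot
            for j, prob_j in enumerate(pi):
                for k in range(i):            # show-ups among the i-1 booked ahead
                    # Assumed reading of (6): wait floor((j+k)/s), weighted by p.
                    W += p * prob_j * binom_pmf(k, i - 1, p) * ((j + k) // s)
        nxt = [0.0] * (max(len(pi) - 1 + x_t - s, 0) + 1)
        for j, prob_j in enumerate(pi):
            for k in range(x_t + 1):
                nxt[max(j + k - s, 0)] += prob_j * binom_pmf(k, x_t, p)
        pi = nxt
    O = sum(j * prob_j for j, prob_j in enumerate(pi))
    return O + len(x) * s - p * sum(x), O, W

def poh(n, s, w, q, c_o):
    """Periodic Overbooking Heuristic: return (schedule, cost)."""
    p = 1 - q
    best_x, best_cost = [s] * n, (1 - p) * n * s   # no overbooking: idle cost only
    for n0 in range(1, n + 1):
        periods, tail = divmod(n, n0)

        def efficiency_cost(y0):
            # I + c_O * O of the periodic part, ignoring waiting costs.
            I, O, _ = performance(([s + y0] + [s] * (n0 - 1)) * periods, s, p)
            return I + c_o * O

        y0 = 0
        while efficiency_cost(y0 + 1) < efficiency_cost(y0):
            y0 += 1
        # Periodic part followed by the l = n mod n0 leftover slots at level s.
        x = ([s + y0] + [s] * (n0 - 1)) * periods + [s] * tail
        I, O, W = performance(x, s, p)
        cost = I + c_o * O + w * W
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost

schedule, cost = poh(n=8, s=2, w=0.1, q=0.2, c_o=1.5)   # hypothetical instance
```

Replacing `efficiency_cost` with the closed forms (11) and (12) would remove most of the computation, since the full evaluation of $W$ is then needed only once per value of $n_0$.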

3.5. Front-Loading Heuristic

The overbooking level $y^*$ contains much of the information regarding the optimal schedules. As demonstrated in the following section, the overbooked customers are allocated according to a front-loaded pattern: more customers are scheduled towards the beginning of the working day (in order to get an empty system running), and towards the end of the working day the schedule becomes less dense (in order to avoid high overtime costs).

We propose a second heuristic, the Front-Loading Heuristic (FLH), which predicts the optimal overbooking level based on our numerical analysis in §4.2, and allocates the overbooked customers in a front-loaded manner. Algorithm 2 (see Appendix) describes the FLH procedure in detail. In §4.3 we compare the two heuristic solutions and evaluate their performance.

4. Numerical Experiments

In this section we display and discuss the results of our numerical experiments. Our analysis reveals further properties and patterns that appear in the optimal schedules, and provides us with additional insights into their overall structure. Throughout our numerical analysis, following the literature (see for example Robinson and Chen (2010, 2011), Zacharias and Pinedo (2014)), we consider an overtime cost coefficient $c_O = 1.5$ and values of the waiting cost coefficient $w$ between 0 and 0.60. A provider’s idle time costs more than a customer’s wait, but less than a provider’s overtime. We interpret values of $w$ between 0 and 0.10 as corresponding to an efficiency regime, values of $w$ above 0.20 as corresponding to a quality regime, and values between 0.10 and 0.20 as corresponding to a hybrid quality and


Figure 2 Optimal schedules. $n = 10$, $s \in \{1, 2, 3, 4, 5\}$, $w \in \{0.10, 0.15, 0.20\}$, $q \in \{0.05, 0.10, 0.15, 0.20, 0.25, 0.30\}$.