4.7 Open Questions
5.1.3 Skills-Based Routing in the Efficiency-Driven Regime
For an efficiency-driven operation, one lets the agents’ utilization approach 100% in a way that, in the limit, all customers are delayed in queue. (The agent selection problem then becomes irrelevant.) As one takes limits, the number of agents either remains fixed, in which case the backlog of waiting calls grows without bound, or it is allowed to increase while controlling ASA, but at a rate slow enough so that the fraction delayed still approaches 100%.
In these conventional heavy-traffic conditions, the results of Gans and van Ryzin [55] imply that, even though there may be many ways in which arriving calls can be assigned to various pools of CSRs, one need only consider a small number of possible assignments when minimizing the system backlog. Harrison and Lopez [72] further characterize the nature of type-skill matchings (or minimal skill-overlaps) that make such small sets of assignments most efficient. In particular, they identify that, in heavy traffic, efficient sets of assignments enable complete resource pooling (CRP), a condition in which the set of CSRs acts as a single, pooled, virtual “super” server. Note that all of the designs of Figure 16, except for “X”, satisfy this CRP condition; eliminating any of the four arrows in “X” would do as well.
This pooling condition is the cornerstone for analysis of efficiency-driven operations, and it seems likely to be relevant in the QED regime. Section 5 of Stolyar [135], which characterizes optimal policies for a more general resource pooling condition, compactly lays out the relevant literature. Section 3 and the beginning of Section 5 in Williams [149] have the full story.
Harrison and Lopez [72] is the first skills-based-routing model that resembles the reality of a call center. In [55] both the model, which lists only call-center-wide processing rates, and the measure of system congestion, the minimum time required to work off the entire backlog of waiting calls, are aggregate. In contrast, in [72] the assignment of individual calls to CSRs is explicitly modelled, and occupancy costs are defined as growing linearly with the backlog of each type of call. Both [55, 72] consider discrete-review policies that process sets of calls in large batches, however. This class of policies is reasonable for emails, for example, but it is clearly inappropriate for inbound calls.
In contrast, for the N-design of Figure 16, Bell and Williams [20] prove the asymptotic optimality of threshold controls. More specifically, they assume linear occupancy costs and that type-1 customers are VIP. They then establish that, whenever the length of the type-1 queue exceeds a critical threshold, type-1 calls should get priority over type-2 calls at CSR pool 2. Williams [149] conjectures that dynamic, threshold-based policies are also asymptotically optimal for the model in [72]. It is important to note, however, that the calculation of the conjectured thresholds requires prior processing (the solution of linear programs) that intimately depends on model parameters and topology.
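The threshold rule described above can be illustrated with a minimal Python sketch. It is not the construction of [20]: the function name `pool2_next_call_type` and its arguments are hypothetical, the threshold is taken as a given input rather than computed, and the fallback of serving type-1 calls when no type-2 call is waiting is a simplifying assumption made here.

```python
from typing import Optional


def pool2_next_call_type(q1: int, q2: int, threshold: int) -> Optional[int]:
    """Pick the call type a newly idle pool-2 agent serves next.

    q1, q2    -- number of waiting type-1 and type-2 calls
    threshold -- critical type-1 queue length (assumed given; not computed here)

    Returns 1 or 2, or None if both queues are empty.
    """
    if min(q1, q2, threshold) < 0:
        raise ValueError("queue lengths and threshold must be non-negative")

    # "Emergency" mode: the type-1 queue has crossed the threshold,
    # so type-1 calls get priority at pool 2.
    if q1 > threshold:
        return 1

    # Normal mode: type-2 calls get priority at pool 2.
    if q2 > 0:
        return 2

    # Assumption of this sketch: rather than idle, serve a waiting type-1 call.
    if q1 > 0:
        return 1

    return None
```

For example, with a threshold of 4, a pool-2 agent facing queue lengths (5, 3) would take a type-1 call, whereas with queue lengths (4, 3) it would take a type-2 call.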
An alternative to thresholds is provided by index controls: each queue is assigned an index that depends only on that queue’s state; the queue chosen for service is then the one with the highest index. A striking example is van Mieghem’s [141] analysis of the V-design with a single server, which proves the asymptotic optimality of a simple Generalized cµ (Gcµ) rule for waiting costs that are convex increasing. By equipping each agent with its own index for call selection, Mandelbaum and Stolyar [109] verify that these Gcµ rules remain asymptotically optimal in the context of skills-based routing.
To elaborate, consider a general skills-based design in which type-i calls are served by pool-j agents at rate µij. (Here µij is the reciprocal of an average service time, and µij = 0 if pool-j agents cannot serve type-i calls.) Delay costs are quantified in terms of type-dependent increasing convex functions: Ci(w) is the cost incurred by a type-i customer that waits in queue w units of time before being served. Then each server j that becomes idle at time t adheres to the following Gcµ rule: choose to serve the longest-waiting type-i∗ customer for which
\[
  i^* \in \arg\max_i \; C_i'(W_i(t))\, \mu_{ij}. \tag{22}
\]
Here Ci′ is the derivative of Ci, and Wi(t) is the longest waiting time (that of the head-of-the-line customer) in queue i at time t. In [109] it is proved that, under complete resource pooling (as in [72, 149]), and for costs with Ci(0) = Ci′(0) = 0, the above parsimonious Gcµ rule is asymptotically optimal in heavy traffic. Qualitatively speaking, the result demonstrates that an exceedingly simple call-selection index performs well for systems that are efficiency-driven, even within complex routing designs. (In these circumstances, agent selection arises infrequently enough to be handled arbitrarily.)
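To make the call-selection step of rule (22) concrete, the following Python sketch computes the index for one idle server j. The function name `gcmu_select`, its argument layout, and the tie-breaking convention (lowest type number wins) are illustrative assumptions and are not taken from [109].

```python
from typing import Callable, Optional, Sequence


def gcmu_select(
    head_waits: Sequence[Optional[float]],
    service_rates: Sequence[float],
    marginal_costs: Sequence[Callable[[float], float]],
) -> Optional[int]:
    """Return the call type an idle server j should serve under rule (22).

    head_waits[i]     -- W_i(t), wait of the head-of-the-line type-i call,
                         or None if queue i is empty
    service_rates[i]  -- mu_ij for this server's pool j (0 if j cannot serve i)
    marginal_costs[i] -- the derivative C_i' of the type-i delay cost
    """
    best_type: Optional[int] = None
    best_index = float("-inf")
    for i, (wait, mu, d_cost) in enumerate(
        zip(head_waits, service_rates, marginal_costs)
    ):
        # Skip empty queues and call types this pool cannot serve.
        if wait is None or mu <= 0:
            continue
        index = d_cost(wait) * mu  # C_i'(W_i(t)) * mu_ij
        # Strict inequality: ties go to the lowest type number (an assumption).
        if index > best_index:
            best_type, best_index = i, index
    return best_type


# Illustration with hypothetical quadratic costs C_i(w) = a_i * w**2 / 2,
# whose derivatives are C_i'(w) = a_i * w, with a_0 = 1 and a_1 = 3.
derivs = [lambda w: 1.0 * w, lambda w: 3.0 * w]
choice_a = gcmu_select([10.0, 2.0], [1.0, 1.0], derivs)  # indices 10 vs 6
choice_b = gcmu_select([10.0, 4.0], [1.0, 1.0], derivs)  # indices 10 vs 12
```

Note that the function takes only head-of-the-line waiting times, service rates, and cost derivatives as inputs; arrival rates do not enter the index.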
We note that quadratic costs recover the aging factor of [86, 87, 118] that is introduced in the previous subsection. The assumption Ci′(0) = 0 rules out linear costs, but it is conjectured in [109] that these can be accommodated by carefully choosing aging factors that vary with system parameters.
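As a worked illustration of the quadratic case, suppose (as an assumption made here, with the coefficient $a_i > 0$ introduced purely for illustration) that $C_i(w) = \tfrac{1}{2} a_i w^2$. This cost satisfies the conditions required for rule (22), and the index becomes linear in the head-of-the-line waiting time:

```latex
\begin{align*}
  C_i(w) &= \tfrac{1}{2}\, a_i w^2
    &&\Longrightarrow\quad C_i(0) = 0, \\
  C_i'(w) &= a_i w
    &&\Longrightarrow\quad C_i'(0) = 0, \\
  i^* &\in \arg\max_i \; a_i\, \mu_{ij}\, W_i(t).
\end{align*}
```

Under this assumption, the priority of queue i at server j grows in proportion to the waiting time Wi(t), at rate a_i µij. By contrast, a linear cost Ci(w) = ci w has the constant derivative Ci′(0) = ci > 0, which is why it falls outside the stated result.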
More importantly, the natures of threshold and Gcµ controls differ fundamentally. The former require careful prior calculations (of thresholds) and management by exception: type-2 calls get priority at CSR pool 2 until the number of waiting type-1 calls “crosses” an “emergency” boundary. In contrast, Gcµ rules are simple and robust (surprisingly enough, they do not depend even on arrival rates), but they are based on a continuous reevaluation of the state-dependent indices.
To summarize, conventional heavy-traffic analysis has yielded strikingly simple classes of policies that should perform well in efficiency-driven environments. This regime is appropriate for slower-turnaround work, such as emails or faxes, that may be processed after some delay. It is not appropriate for work that must be performed in the quality or QED regimes, however.