
Hospital physicians can’t get no long-term satisfaction – An indicator for fairness in preference fulfillment on duty schedules

Theorem 3. The cost induced by preference violations in the linearized model is equal to that in the quadratic model.

Proof. By Theorem 2, the cost value of each individual physician in the linearized model equals that in the quadratic model. Since the total cost induced by preference violations is the sum of these individual costs over all physicians j, the totals of both models must be equal as well.

With these modifications, we obtain an exact MILP formulation of the ESD strategy whose objective value with respect to physician preference violations is provably equal to that of the quadratic model. In summary, the models for all our strategies are now linear and can be solved by any MILP solver.

4 Case Study

We use our satisfaction indicator $\sigma_j$ as presented in (16) to quantify individual physicians' satisfaction with a schedule. As each schedule spans a month (4 or 5 weeks) and we need to calculate the satisfaction indicator for each schedule, we add an index $m \in M$ to $\sigma_j$, where $M$ is the set of all months in our study, and define $\sigma_{jm}$ as the satisfaction of physician $j$ with the schedule in month $m$. Our evaluation focuses on two key performance indicators:

1. Variance of the average satisfaction indicator per physician over all planning horizons, i.e., months m (APS)

2. Average of the satisfaction indicator variance per physician between planning horizons, i.e., months m (ASV)

For both key performance indicators, lower values indicate higher schedule quality.

For the APS indicator, low values indicate that, averaged over all planning horizons, all physicians are similarly satisfied. Lower values of the ASV indicator signify that physicians' personal satisfaction fluctuates less between planning horizons.
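To make the two definitions concrete, the following Python sketch computes APS and ASV from a matrix of monthly indicator values $\sigma_{jm}$ (one row per physician j, one column per month m). It assumes population variance, which the text does not specify; the function names `aps` and `asv` and the toy numbers are hypothetical and not data from the study.

```python
from statistics import mean, pvariance

# sigma[j][m]: satisfaction of physician j with the schedule in month m.
# Assumption: population variance (divide by the number of values).

def aps(sigma: list[list[float]]) -> float:
    """Variance across physicians of each physician's average satisfaction over all months."""
    per_physician_avg = [mean(row) for row in sigma]
    return pvariance(per_physician_avg)

def asv(sigma: list[list[float]]) -> float:
    """Average across physicians of each physician's satisfaction variance between months."""
    per_physician_var = [pvariance(row) for row in sigma]
    return mean(per_physician_var)

# Toy example (hypothetical values): 3 physicians, 4 months.
sigma = [
    [1.0, 1.0, 1.0, 1.0],      # always fully satisfied
    [0.5, 1.0, 0.5, 1.0],      # satisfaction fluctuates between months
    [0.75, 0.75, 0.75, 0.75],  # constant, but lower than physician 0
]
print(round(aps(sigma), 6), round(asv(sigma), 6))
```

Physician 1 and physician 2 have the same average, so APS only reflects the gap to physician 0, while ASV is driven entirely by physician 1's month-to-month fluctuation.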

4.1 Real-life data and computational setup

To verify the impact of our model on long-term fairness, we apply it to real-world data from the anesthesiology department of our partner hospital. Note that our model builds on the scheduling model currently in use at this department, which we also developed. The department employs about 100 physicians at any time and must cover 6 overnight duties per day, each requiring one physician.

The data spans 24 months, from November 2015 to October 2017, and contains a total of 9 082 requests, each either for a duty or for being off duty. Over this period, 158 distinct physicians were employed at the department for varying lengths of time. However, only 47 of them were employed for the entire time horizon, and they account for 4 142 requests. As schedules are generated monthly, our data is also collected monthly, with each planning horizon spanning one month (4 or 5 weeks).

We run all three variants of our model, each implementing a different updating strategy for the satisfaction-based weights, for each of the 24 months, resulting in 72 model runs. Our model is implemented in CMPL 1.11.0 and solved with IBM ILOG CPLEX 12.7.1.0 on Xubuntu 16.04.3 (kernel 4.10.0-40). The system runs as a VirtualBox 5.2.2 virtual machine with one virtual core of an Intel Core i5-4310M CPU and 4 GB of main memory. The Python 3 software we use to conduct our experiments is available on GitHub¹, along with the data required to run them. For privacy reasons, we do not publish the raw data from our partner hospital, but only generated data derived from it (see below).

| | C | ESA | ESD |
|---|---|---|---|
| APS | 0.006532 | 0.006493 | 0.006503 |
| APS: difference to C | | −0.000039 | −0.000029 |
| APS: difference in % | | −0.60 % | −0.44 % |
| ASV | 0.011292 | 0.011321 | 0.011191 |
| ASV: difference to C | | +0.000029 | −0.000101 |
| ASV: difference in % | | +0.26 % | −0.89 % |

Table 1: Values of the performance indicators for all three satisfaction indicator updating strategies using the real-life data set
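The "difference to C" and "difference in %" rows in Table 1 are taken relative to the C strategy. The short Python sketch below recomputes them from the indicator values in the table; the helper name `differences` is hypothetical.

```python
# Indicator values from Table 1; C is the reference strategy.
table = {
    "APS": {"C": 0.006532, "ESA": 0.006493, "ESD": 0.006503},
    "ASV": {"C": 0.011292, "ESA": 0.011321, "ESD": 0.011191},
}

def differences(row: dict[str, float], baseline: str = "C") -> dict[str, tuple[float, float]]:
    """Return (absolute difference, difference in %) of each strategy relative to the baseline."""
    base = row[baseline]
    return {
        name: (round(value - base, 6), round(100 * (value - base) / base, 2))
        for name, value in row.items()
        if name != baseline
    }

for indicator, row in table.items():
    print(indicator, differences(row))
```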

For our smoothing constant γ, we need to choose a value between 0 and 1. Higher values make the satisfaction indicator depend more strongly on recent values, whereas lower values make it depend more strongly on older data.
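The following sketch illustrates how γ shifts weight between recent and older months. It assumes that the satisfaction indicator is updated by standard exponential smoothing, new value = γ · (current month) + (1 − γ) · (previous value); this is an assumption for illustration and may differ in detail from (16). Under this assumption, the observation from k months back carries weight γ(1 − γ)^k, so with γ = 0.8 the current month and the two preceding months receive weights 0.8, 0.16 and 0.032. The function names and the initial value of 1 (mirroring the historic satisfaction assigned to newly joining physicians) are hypothetical choices.

```python
def smoothed(observations: list[float], gamma: float, initial: float = 1.0) -> float:
    """Exponentially smoothed satisfaction after processing monthly observations oldest-first.

    Assumed update rule (generic exponential smoothing, not necessarily identical to (16)):
        new = gamma * current_observation + (1 - gamma) * previous_smoothed
    """
    value = initial
    for obs in observations:
        value = gamma * obs + (1 - gamma) * value
    return value

def weight_of_month(k: int, gamma: float) -> float:
    """Weight of the observation from k months ago (k = 0 is the current month)."""
    return gamma * (1 - gamma) ** k

gamma = 0.8
print([round(weight_of_month(k, gamma), 3) for k in range(4)])
print(round(smoothed([1.0, 1.0, 0.5], gamma), 3))
```

With γ = 0.8, months older than two months contribute less than 1 % each, which matches the intuition that physicians mainly remember the last one or two schedules.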

Since most physicians will likely remember their satisfaction with schedules at most one or two months back, we recommend choosing values closer to 1 than to 0. For our case study, we select γ = 0.8; preliminary tests with other values of γ yielded the same key results. We apply the raw data from our partner hospital to our model with all three updating strategies and solve each run to optimality within 2 seconds.

¹ https://github.com/chrisnig/long-term-fairness

As only 47 physicians are employed continuously for the entire time horizon, our evaluation considers only these physicians. For all other physicians newly joining our workforce in a given month, we set their historic satisfaction to 1 but exclude them from the calculation of our performance indicators. Table 1 displays the values of the APS and ASV indicators for the C, ESA, and ESD strategies for the continuously employed physicians. The differences between the satisfaction indicator updating strategies are very small, all below ±1 %.

This can be explained by the structure of the preferences inherent in the data. Our partner hospital uses a web-based software system to track physician preferences. Each physician can enter their own preferences and simultaneously see all preferences of other physicians with the same qualification. The system counts the existing preferences, compares them with the demand for each duty, and shows the physician which duty days have not been requested yet. While the system does not prevent physicians from requesting a duty that another physician has already requested, the department's workplace culture generally discourages submitting requests that compete with those of other physicians. In light of this preference elicitation method, it is not surprising that 97 % of the preferences in the data can be fulfilled under all our updating strategies. Because there are hardly any competing preferences, our model rarely has to trade off the preferences of different physicians. In conclusion, respecting long-term satisfaction through our satisfaction indicator has little effect when preferences rarely compete.

Most real-world settings do not share this near-absence of competing preferences. Even so, our model does not perform noticeably worse than strategies without long-term satisfaction: relative to C, ESD slightly improves both indicators, while ESA is marginally worse only on ASV (+0.26 %). Our model respecting long-term fairness can therefore also be used for scheduling problems without conflicting preferences, where it produces results that are practically equal to or slightly better than those of models without long-term fairness considerations. In practice, however, competing preference requests are common, so we next demonstrate the superior behavior of our models in such situations.

4.2 Generated data

In order to evaluate the differences between our updating strategies for satisfaction-based weights, we generate several data instances for our model. We characterize the generated data by preference rate and conflict rate. We define the preference rate as the number of preferences divided by the number of days of physician availability.

As we only generate preferences for duties, and working time regulations do not allow a physician to work two consecutive duties, the preference rate of our generated data can be at most about 50 %. A physician's preference is in conflict with another preference if another physician has a preference for the same duty on the same date. The conflict rate is then defined as the number of preferences that are in conflict with at least one other preference, divided by the total number of preferences.
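A minimal Python sketch of both rates, assuming each preference is represented as a hypothetical (physician, day, duty) tuple; all names and the example duties "D1" and "D2" are illustrative only. It also computes the bound behind the preference-rate limit: a physician who may not take duties on two consecutive days can request at most ⌈n/2⌉ duties in n consecutive available days, which for odd n is slightly above 50 % (18 of 35 days in a 5-week month).

```python
from collections import defaultdict

# Hypothetical representation: (physician, day, duty) means
# "this physician requests this duty on this day".
Preference = tuple[str, int, str]

def preference_rate(preferences: list[Preference], availability_days: int) -> float:
    """Number of preferences divided by the number of days of physician availability."""
    return len(preferences) / availability_days

def conflict_rate(preferences: list[Preference]) -> float:
    """Share of preferences that collide with at least one other physician's
    preference for the same duty on the same day."""
    requesters: dict[tuple[int, str], set[str]] = defaultdict(set)
    for physician, day, duty in preferences:
        requesters[(day, duty)].add(physician)
    in_conflict = sum(
        1 for _, day, duty in preferences if len(requesters[(day, duty)]) > 1
    )
    return in_conflict / len(preferences) if preferences else 0.0

def max_preference_rate(consecutive_days: int) -> float:
    """Upper bound per physician when no two requested duties may be on consecutive days."""
    return ((consecutive_days + 1) // 2) / consecutive_days

prefs = [("A", 1, "D1"), ("B", 1, "D1"), ("A", 3, "D2"), ("C", 2, "D2")]
print(preference_rate(prefs, 10), conflict_rate(prefs), round(max_preference_rate(35), 3))
```

Here A and B both request D1 on day 1, so two of the four preferences are in conflict.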

Note that a target conflict rate also bounds the attainable preference rate: as the total number of preferences is limited, reaching a target conflict rate may require removing conflicting preferences.

Assumptions for our generated data are based on the first planning horizon of the real-life data, which spans 5 weeks in November 2015. Taking absences into account, we derive a supply of 2 978 days of physician availability (including weekends, 7 days per week). This number of days can be covered by $2\,978 / (5 \text{ weeks} \cdot 7 \text{ days}) \approx 85$ physicians employed full-time without any absences. Furthermore, we find that 20 % of physicians have two qualifications, whereas the rest have only one. Based on these observations, we generate data for 85 physicians, each of whom is employed for the entire time horizon of 24 months and holds at least one qualification for a duty, with a 20 % probability of holding an additional qualification for a different duty. This ensures that all physicians are employed continuously, so that we can calculate our performance indicators over all physicians. Additionally, removing absences completely ensures that all physicians have the same chance of high satisfaction in a given time span; keeping qualifications unchanged during the planning horizon serves the same purpose. Finally, we generate data for 6 duties, each with a demand of one physician per day (unchanged from the real-life data).
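The qualification setup can be sketched as follows. The text does not say how the first qualification is chosen, so this sketch assumes it is drawn uniformly from the 6 duties; the constants, the function name `generate_qualifications`, and the seed handling are hypothetical. The last line reproduces the estimate of 85 physicians from 2 978 availability days.

```python
import random

NUM_PHYSICIANS = 85           # ≈ 2 978 availability days / (5 weeks · 7 days)
NUM_DUTIES = 6
P_SECOND_QUALIFICATION = 0.2  # 20 % of physicians hold a second qualification

def generate_qualifications(seed: int = 0) -> list[set[int]]:
    """Give each physician one qualification and, with probability 0.2,
    an additional qualification for a different duty (duties are 0..5)."""
    rng = random.Random(seed)
    qualifications = []
    for _ in range(NUM_PHYSICIANS):
        first = rng.randrange(NUM_DUTIES)  # assumption: uniform over duties
        quals = {first}
        if rng.random() < P_SECOND_QUALIFICATION:
            others = [d for d in range(NUM_DUTIES) if d != first]
            quals.add(rng.choice(others))
        qualifications.append(quals)
    return qualifications

print(round(2978 / (5 * 7), 2))  # 85.09
```

Fixing the seed keeps the generated instance reproducible, which matters when the same data is fed to all three updating strategies.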

Our preference generation routine is described by algorithm 1 and operates as follows.

On each day and for each physician, we assume a probability of $p_{on} = 80\,\%$ that the physician