3.4 Methodology and Analytics
3.4.1 Behavioral Measures
In this section, we define a number of behavioral measures that can be computed for a single respondent. We had the following goals and preferences in mind while defining these measures:
First, preferably the measures should be based on the choices made by an individual respondent, since we are looking for a way to describe the behavior of individuals.
Second, the measures should be based on information available or reasonably obtainable by a public transport operator, such as information on delays and disruptions (i.e. the realized timetable), the utilization rate of the vehicles (by combining smart card data and a rolling stock schedule) or the information communicated to passengers via apps or websites.
Third, the measures should be as simple as possible. Although advanced time series techniques could be applied, we prefer measures that look back a single round for reasons of simplicity.
Fourth and finally, the measures should have a scale that is easy to interpret, preferably [0, 1].
In order to introduce the measures we picked based on these preferences, we will use the notation introduced in Section 3.3.2. Since we want to measure the behavior of a single respondent, we first assume that the respondent number j is fixed, as are the choice vector C and the satisfaction vector S. The measures are defined on either one or both of these vectors and a window k. The window restricts the number of observations that are used, so if we want to compute a certain measure for only the first ten rounds, we use a k of 10. Furthermore, while some measures such as the number of switches can be computed without specific knowledge of the respondent group, others require the specific group of j, i.e. $q_j$, $\delta_j$ and $\rho_j$. Only the measures that require knowledge of the respondent group will be parametrized by j.
The choices of a respondent j that were recorded by the online environment are stored in a vector $C = [c_1, c_2, \ldots, c_{20}]$ and the satisfaction scores in a vector $S = [s_1, s_2, \ldots, s_{20}]$. Formally, the set $\mathcal{C}$ of all possible choice vectors is defined as $\mathcal{C} = \{1, 2, \ldots, 7\}^{20}$. In addition, we define a vector $X = [x_1, x_2, \ldots, x_{19}]$ indicating whether the next choice will be different from the current one. Formally,

$$x_r = \begin{cases} 1 & \text{if } c_r \neq c_{r+1} \\ 0 & \text{if } c_r = c_{r+1} \end{cases}$$
Note that X solely depends on C and is only defined in order to simplify the mathematical expression of some of our measures.
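As a minimal sketch of this definition, the switch indicators can be derived from a choice vector as shown below. The function name `switch_indicators` and the example choice vector are hypothetical and serve only as an illustration; only the rule for $x_r$ is taken from the text.

```python
from typing import List, Sequence


def switch_indicators(choices: Sequence[int]) -> List[int]:
    """Return X, where entry r is 1 if the choice in round r+1 differs
    from the choice in round r, and 0 otherwise."""
    return [1 if cur != nxt else 0 for cur, nxt in zip(choices, choices[1:])]


# Hypothetical respondent: 20 rounds, options numbered 1..7.
C = [3, 3, 4, 4, 4, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
X = switch_indicators(C)
print(len(X))  # one indicator fewer than there are rounds
print(X)
```

Because each indicator compares two consecutive rounds, a choice vector of 20 rounds yields 19 indicators.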
In order to keep the measures as simple as possible, we will mostly focus on the switching behavior of a respondent.
A central measure for the behavior of a respondent is the probability the respondent switches. This can be computed by dividing the number of switches by the number of rounds. We will call this measure sp. This measure is formally defined as follows:
Table 3.3: Overview of behavioral measures used for analysis

| Measure (short name) | Description of measure |
|---|---|
| Crowd avoidance (ca) | The probability of switching in case of high crowding, or of remaining in case of low crowding, during the previous round. Formally, it can be viewed as $\Pr\big((x_i = 1 \mid cr_j(i, C) = 2) \cup (x_i = 0 \mid cr_j(i, C) = 0)\big)$. |
| Reactive switching coefficient (rsc) | The probability of switching when the satisfaction was either 4 or 5, or of remaining if the satisfaction was either 1 or 2, during the previous round. Formally, it can be written as $\Pr\big((x_i = 1 \mid s_i < 3) \cup (x_i = 0 \mid s_i > 3)\big)$. |
| Adjustive reactive switching coefficient (rsca) | The probability of switching if the satisfaction during the previous round was higher than the average observed satisfaction of respondent j. Its formal definition is the same as the definition of rsc, where the constant 3 is replaced with the observed average. |
| Delay responsive switching (dr) | The probability of switching when the delay of the previous round was higher than the average witnessed delay, or of remaining at the previous travel option otherwise. Formally, it can be regarded as $\Pr\big((x_i = 1 \mid trd_j(i, c_i) \geq avgtrd_j(i)) \cup (x_i = 0 \mid trd_j(i, c_i) < avgtrd_j(i))\big)$. |
| Last switch (ls) | The index of the round in which the last switch occurred, divided by the total number of rounds considered. Indicates the time a respondent sticks to a single choice. |
| Minimum time for a choice set cardinality of 2 (mtsc2) | The index of the last round after which only two choices are considered. Indicates the time someone restricts their choices to two options. |
| Average satisfaction (avgS) | The average of the satisfaction scores reported by the respondent. |
| Maximum streak (maxstr) | The longest sequence of rounds where no switch was observed. An indication of the sequential stability of a respondent's choices. |
| Sensitivity to information (si) | Measures whether a respondent seems to ignore information completely or whether a respondent always chooses an option with a certain type of information. Can only be computed in the second phase of the experiment. |
$$sp(C, k) := \frac{1}{k} \sum_{r=1}^{k} x_r$$
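The definition of sp can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation used in the study: the function name `sp` mirrors the short name of the measure, the example choice vector is hypothetical, and the window k is assumed to be limited to the number of available switch indicators (19 for 20 rounds), since the text does not specify how larger windows are treated.

```python
from typing import Sequence


def sp(choices: Sequence[int], k: int) -> float:
    """Switching probability: mean of the first k switch indicators x_1..x_k."""
    x = [1 if a != b else 0 for a, b in zip(choices, choices[1:])]
    # Assumption: the window cannot exceed the number of indicators.
    if not 1 <= k <= len(x):
        raise ValueError("k must be between 1 and len(choices) - 1")
    return sum(x[:k]) / k


# Hypothetical respondent: 20 rounds, options numbered 1..7.
C = [3, 3, 4, 4, 4, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
print(sp(C, 10))  # window restricted to the first ten indicators
print(sp(C, 19))  # all available indicators
```

In this example all three switches fall inside the first ten indicators, so the full window gives a lower value than the window of ten.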
A primary design principle behind many of our proposed behavioral measures is that we want to capture whether a certain event during one round of the experiment will have an influence on the switching behavior of the respondent for its choice during the next round. Let us consider the probability that an individual will switch in round i given a choice set, which can be written as
$$\Pr(x_i = 1) = sp(C, i)$$
Now suppose that event E occurred during round i. A typical way to detect whether this event affects the switching behavior is to check whether switching is independent from the event E, i.e. if
$$\Pr(x_i = 1 \mid E) = \Pr(x_i = 1)$$
In case of inequality, it is also useful to know whether $\Pr(x_i = 1 \mid E)$ is greater or smaller than $\Pr(x_i = 1)$.
Through application of Bayes' theorem, we can derive

$$\Pr(x_i = 1 \mid E) = \frac{\Pr(E \mid x_i = 1)\,\Pr(x_i = 1)}{\Pr(E)} = \frac{\Pr(E \mid x_i = 1)}{\Pr(E)}\, sp(C, i)$$
This shows that when $\Pr(x_i = 1 \mid E) > \Pr(x_i = 1)$ holds, the fraction $\Pr(E \mid x_i = 1) / \Pr(E)$ is greater than 1. When $\Pr(x_i = 1 \mid E)$ is smaller than $\Pr(x_i = 1)$, the fraction is strictly smaller than 1.
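The relation between the conditional switching probability and this fraction can be checked numerically. The sketch below estimates all probabilities as relative frequencies from a short, hypothetical sequence of switch indicators and a hypothetical event indicator (for example, "high crowding in round i"). The function name `bayes_check` and the data are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def bayes_check(x: Sequence[int], e: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (Pr(x=1), Pr(x=1|E), Pr(E|x=1)/Pr(E)) as relative frequencies.

    Assumes at least one switch and at least one occurrence of the event.
    """
    if sum(x) == 0 or sum(e) == 0:
        raise ValueError("need at least one switch and one event occurrence")
    n = len(x)
    p_x = sum(x) / n                      # base switching rate, i.e. sp
    p_e = sum(e) / n                      # Pr(E)
    both = sum(1 for xi, ei in zip(x, e) if xi == 1 and ei)
    p_x_given_e = both / sum(e)           # Pr(x=1 | E)
    p_e_given_x = both / sum(x)           # Pr(E | x=1)
    fraction = p_e_given_x / p_e          # the fraction discussed in the text
    return p_x, p_x_given_e, fraction


# Hypothetical data: ten rounds, three switches, three event occurrences.
x = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
e = [False, True, False, False, True, True, False, False, False, False]
p_x, p_x_given_e, fraction = bayes_check(x, e)
print(p_x, p_x_given_e, fraction)
print(abs(fraction * p_x - p_x_given_e) < 1e-12)  # Bayes' theorem holds
```

In this example two of the three event occurrences are followed by a switch, so the conditional switching probability exceeds the base rate and the fraction is greater than 1.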
Thus, this fraction gives useful information about the influence of an event on the switching probability. However, since the fraction equals $\Pr(x_i = 1 \mid E) / sp$ and $\Pr(x_i = 1 \mid E) \leq 1$, its scale is $[0, sp^{-1}]$ and thus dependent on sp. As such, it is not suitable for comparing different respondents, since the base rate sp will likely differ between respondents. For that reason, we prefer to use measures based on $\Pr(x_i = 1 \mid E)$ instead. We developed the following measures that fit this general pattern, which are presented in the first rows of Table 3.3.
In addition to the effect of a certain event on the switching probability, we also defined a number of measures on other aspects of behavior. The following measures all have a natural maximum (e.g. the number of rounds considered) which can be used to normalize them to the [0, 1] scale. These measures are presented in the bottom rows of Table 3.3.
The formal definitions of these measures are discussed in Section 3.4.2.
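To illustrate the normalization to [0, 1], the following sketch computes two of the measures from the bottom rows of Table 3.3, last switch (ls) and maximum streak (maxstr), using only their verbal descriptions. It is not the formal definition of Section 3.4.2. The function names, the choice to divide both measures by the window k, the value 0 for a respondent who never switches, and the example data are all assumptions.

```python
from typing import List, Sequence


def window_switches(choices: Sequence[int], k: int) -> List[int]:
    """First k switch indicators (1 if the next choice differs from the current one)."""
    x = [1 if a != b else 0 for a, b in zip(choices, choices[1:])]
    if not 1 <= k <= len(x):
        raise ValueError("k must be between 1 and len(choices) - 1")
    return x[:k]


def last_switch(choices: Sequence[int], k: int) -> float:
    """1-based index of the last switch in the window, divided by k.

    Assumption: returns 0.0 when no switch occurs in the window.
    """
    x = window_switches(choices, k)
    last = max((r for r, xr in enumerate(x, start=1) if xr == 1), default=0)
    return last / k


def max_streak(choices: Sequence[int], k: int) -> float:
    """Longest run of consecutive non-switches in the window, divided by k."""
    longest = current = 0
    for xr in window_switches(choices, k):
        current = current + 1 if xr == 0 else 0
        longest = max(longest, current)
    return longest / k


# Hypothetical respondent: 20 rounds, options numbered 1..7.
C = [3, 3, 4, 4, 4, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
print(last_switch(C, 19), max_streak(C, 19))
print(last_switch(C, 8), max_streak(C, 8))
```

Dividing by k keeps both values in [0, 1] for any window, which makes them comparable across respondents and window sizes.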