Chapter 3 A Chain Event Graph
3.1.2 The Data Set
The data set comprises 476 tourists, of whom 402 arrived in Dunedin by cruise ship and the other 74 arrived by other means, such as aircraft, car or bus. In my analyses, I restrict attention to the cruise passengers for two reasons. Methodologically, the small sample of non-cruise tourists does not allow us to obtain robust results, because very few individuals are associated with each category of non-cruise tourist. In terms of domain interest, the decision is also justified because the goal is to understand the train booking process of cruise passengers, not of all types of tourists.
During the interview a tourist could indicate up to 17 different types of PCs. However, I have decided to represent this information as a binary variable that distinguishes between a tourist who goes to a PC under the control of the cruise company (s - Ship) and one who visits any other type of PC (o - Others). The category Ship therefore represents only one type of PC, namely a PC on the cruise ship or on the cruise company's website, whereas the category Others aggregates the other 16 non-Ship types of PC.
per category and a huge number of parameters to learn. These problems would get particularly worse if the effects of variable interactions were modelled. Adopting a binary variable therefore enables us to construct CEG models that are not only simpler, and consequently more easily interpreted by decision makers, but also more robust. Table 3.1 shows that at each stage i of the PC sequence the numbers of tourists who visited a PC Ship and a PC Others are of almost the same order of magnitude. The maximum number of clients visiting a PC Others at any stage i is not greater than 200. Tourists also visit the 16 different types of PCs included in the category Others in a sparse way, so disaggregating them into subcategories would cause some instability in the results, since each subcategory would have only about 12 tourists on average. The binary simplification is also supported by the particularities of the domain: the train company has different marketing strategies for PCs under the control of a cruise company and for the other PCs, where a potential customer can be expected to face fewer restrictions in pursuing their objectives.
Category   PC1   PC2   PC3   PC4   PC5   PCa   PCb   PCc
Ship       223   198    50    16     4   209   196    66
Other      179   191    85    23     1   193   193    69
Table 3.1: Number of clients that visit each Point of Contact. PCi, i = 1, ..., 5, is the ith PC visited when a sequence of five PCs is considered. PCj, j = a, b, c, is the jth PC visited when only the last three visited PCs are considered. If a client visited fewer than four PCs, then PCa = PC1, PCb = PC2, PCc = PC3. If a client went to four PCs, then PCa = PC2, PCb = PC3, PCc = PC4. If a client went to five PCs, then PCa = PC3, PCb = PC4, PCc = PC5.
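As a quick check of the sparsity argument, the counts in Table 3.1 can be tabulated directly. The following Python sketch is illustrative only: the variable names are my own, and spreading the Others count evenly over the 16 subcategories is an averaging assumption, not an observed distribution.

```python
# Counts copied from Table 3.1 for the five-PC sequence.
ship = {"PC1": 223, "PC2": 198, "PC3": 50, "PC4": 16, "PC5": 4}
other = {"PC1": 179, "PC2": 191, "PC3": 85, "PC4": 23, "PC5": 1}

N_OTHER_TYPES = 16  # PC types aggregated into the category Others

# Number of clients who reached each stage of the PC sequence.
reached = {stage: ship[stage] + other[stage] for stage in ship}

# Largest Others count at any stage, and the average per subcategory
# if that count were spread evenly over the 16 types (an assumption).
max_other = max(other.values())
avg_per_type = max_other / N_OTHER_TYPES

for stage, n in reached.items():
    print(f"{stage}: {n} clients ({ship[stage]} Ship, {other[stage]} Others)")
print(f"max Others = {max_other}, about {avg_per_type:.1f} per subcategory")
```

Even at the busiest stage, an even split would leave roughly 12 clients per subcategory, which is the order of magnitude that motivates the binary coding.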
Table 3.2 reveals that most tourists (87%) book a train when they are visiting their second or third PC. This means that, although the data are trustworthy, the fourth and fifth stages in the PC sequence do not have sufficient individuals to support reliable results. Note that this small number of clients would also have to be split according to their previously visited PCs. To avoid this technical problem, I therefore restricted the PC sequence to three visited PCs. This implies that I will use only the last three PCs visited by a client who went to four or more different PCs. Observe that there is no change if a client visited fewer than four PCs: PCa = PC1, PCb = PC2, PCc = PC3. However, if a client went to four PCs, then PCa, PCb and PCc are, respectively, the second (PC2), third (PC3) and fourth (PC4) PCs that he visited. Note that in this case PC1 is discarded. If a client went to five PCs, then PCa, PCb and PCc are, respectively, the third (PC3), fourth (PC4) and fifth (PC5) PCs that he visited. Here we do not consider PC1 and PC2. An analogous transformation is also applied to variable F.
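The restriction to the last three PCs amounts to truncating each client's sequence from the left. A minimal Python sketch is given below, assuming each client's visits are stored as an ordered list. The function name last_three and the example clients are hypothetical and only serve to illustrate the rule.

```python
def last_three(visits):
    """Return the last three entries of one client's visit sequence.

    Sequences with fewer than four entries are returned unchanged.
    """
    return list(visits[-3:])


# Hypothetical clients: "s" = Ship, "o" = Others.
pc_five = ["s", "o", "o", "s", "o"]  # PC1..PC5
pc_four = ["o", "s", "s", "o"]       # PC1..PC4
pc_two = ["s", "o"]                  # PC1..PC2

print(last_three(pc_five))  # keeps PC3, PC4, PC5
print(last_three(pc_four))  # keeps PC2, PC3, PC4
print(last_three(pc_two))   # unchanged
```

The same function can be applied to a client's sequence of F values, since the transformation of F is analogous.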
Category    F1    F2    F3    F4    F5    Fa    Fb    Fc
Booked      13   254    96    34     5    13   254   135
Searching  389   135    39     5     0   389   135     0
Table 3.2: Number of clients that booked a train over time. Fi, i = 1, ..., 5, indicates whether the client booked a train during his ith visit when a sequence of five PCs is considered. Fj, j = a, b, c, indicates whether the client booked a train during his jth visit when only the last three PCs are considered. If a client visited fewer than four PCs, then Fa = F1, Fb = F2, Fc = F3. If a client went to four PCs, then Fa = F2, Fb = F3, Fc = F4. If a client went to five PCs, then Fa = F3, Fb = F4, Fc = F5.
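The 87% figure and the small sample sizes at the later stages can be recovered from the five-stage columns of Table 3.2. The Python sketch below is illustrative only, and the variable names are my own.

```python
# Counts copied from Table 3.2 for the five-PC sequence.
booked = {"F1": 13, "F2": 254, "F3": 96, "F4": 34, "F5": 5}
searching = {"F1": 389, "F2": 135, "F3": 39, "F4": 5, "F5": 0}

# The Booked counts sum to the 402 cruise passengers in the data set.
n_clients = sum(booked.values())

# Share of clients who booked at their second or third PC.
share_2_or_3 = (booked["F2"] + booked["F3"]) / n_clients

# Clients still in the process at the fourth and fifth stages.
at_stage_4 = booked["F4"] + searching["F4"]
at_stage_5 = booked["F5"] + searching["F5"]

print(f"{n_clients} clients, {share_2_or_3:.0%} booked at PC2 or PC3")
print(f"stage 4: {at_stage_4} clients, stage 5: {at_stage_5} clients")
```

With only 39 and 5 clients remaining at stages four and five, any further split by previously visited PCs would leave very few individuals per path.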
The demographic variables are defined as follows:
1. Country (C) - a binary variable differentiating between passengers from Australia or New Zealand (l - Local) and passengers from other world regions (o - Overseas);
2. Age (A) - a binary variable differentiating between young passengers (y - Young) (≤45) and mature passengers (m - Mature) (≥46); and
3. Visit (V) - a categorical variable differentiating passengers with a weak (w) (at most 1 cruise journey), a moderate (m) (between 2 and 5 cruise journeys) or a strong (s) (6 or more cruise journeys) tendency to enjoy cruise ships.
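The three definitions above can be written as simple coding rules. The Python sketch below assumes that the raw interview record holds a country name, an age in years and a count of prior cruise journeys. This raw format and the function names are my own assumptions, since the layout of the original data file is not described here.

```python
LOCAL_COUNTRIES = {"Australia", "New Zealand"}


def code_country(country):
    """l = Local (Australia or New Zealand), o = Overseas."""
    return "l" if country in LOCAL_COUNTRIES else "o"


def code_age(age):
    """y = Young (45 or younger), m = Mature (46 or older)."""
    return "y" if age <= 45 else "m"


def code_visit(n_cruises):
    """w = weak (at most 1), m = moderate (2 to 5), s = strong (6 or more)."""
    if n_cruises <= 1:
        return "w"
    if n_cruises <= 5:
        return "m"
    return "s"


# Hypothetical passenger record.
print(code_country("New Zealand"), code_age(52), code_visit(3))
```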
all, the variables Country and Visit have a well-balanced number of individuals per category. This does not happen with the variable Age, where the great majority of tourists (67%) are mature. In particular, Australian or New Zealand tourists prefer local trains (58%), whilst overseas individuals have a stronger disposition (64%) to buy a train package organised by the cruise company. The proportion of young tourists who chose public trains (56%) is almost the same as that of mature tourists who took the cruise-organised trains (55%). Passengers with a low inclination for cruise journeys do not appear to have any clear preference between train options 1 and 2. However, passengers with a moderate and a strong propensity for cruise journeys prefer to take a local train (58%) and a cruise-organised train (61%), respectively.
Train     Country            Age              Visit
Option    Local  Overseas    Young  Mature    Weak  Moderate  Strong
1           133        62       74     121      55        81      59
2            97       110       57     150      57        58      92
Total       230       172      131     271     112       139     151
Table 3.3: Number of passengers that booked each type of train according to their nationality, age and number of prior cruise travels.
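The percentages discussed for Table 3.3 are within-group shares, that is, the count for a train option divided by the group total. The Python sketch below recomputes them from the table. The helper name option_shares is my own, and the printed values may differ from the quoted percentages by a point, depending on the rounding convention.

```python
# Counts copied from Table 3.3: (train option 1, train option 2).
counts = {
    "Local": (133, 97),
    "Overseas": (62, 110),
    "Young": (74, 57),
    "Mature": (121, 150),
    "Weak": (55, 57),
    "Moderate": (81, 58),
    "Strong": (59, 92),
}


def option_shares(group):
    """Return the shares of train options 1 and 2 within one group."""
    opt1, opt2 = counts[group]
    total = opt1 + opt2
    return opt1 / total, opt2 / total


for group in counts:
    s1, s2 = option_shares(group)
    print(f"{group:<9} option 1: {s1:.1%}  option 2: {s2:.1%}")
```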