6.3 Data Description and Hypotheses
6.3.1 Data description
All data were obtained from the TV channel VOX, which broadcasts “The Perfect Dinner”. We collected them from the channel's website and re-watched episodes to fill in missing data.³ The ratings for the show during the period investigated (2006-2011) were kindly provided by VOX. We exclude the first 24 weeks of our dataset from the estimation of all but one specification of the econometric models. That one specification tests for the effect of the initial 24 weeks on voting behavior. These weeks cover the time span between the filming and the broadcast of the first show. One of the factors we want to measure is the impact of past evaluation behavior, and before the first broadcast contestants had no way of knowing how earlier contestants had voted. We must therefore exclude this period from all other specifications. Without the first 24 weeks, the final sample consists of 3735 cooking assessments observed in 237 rounds of the game. Including the first 24 weeks yields 4322 observations.
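The sample restriction can be sketched in Python as follows. This is a minimal illustration, not the authors' code. The record layout (dicts with a hypothetical `filmed_on` key), the function names, the example dates and the convention of counting whole weeks from the first recording date are all our assumptions.

```python
from datetime import date

# Initial weeks filmed before the first broadcast; during this period no
# contestant could have known how earlier contestants had voted.
BURN_IN_WEEKS = 24


def weeks_since_first_recording(filmed_on: date, first_recording: date) -> int:
    """Whole weeks elapsed between the first recording and a given shoot."""
    return (filmed_on - first_recording).days // 7


def estimation_sample(assessments, first_recording, include_burn_in=False):
    """Return the assessments used for estimation.

    `assessments` is a list of dicts with a hypothetical "filmed_on" date.
    By default, assessments from the first 24 weeks are dropped; set
    include_burn_in=True for the single specification that keeps them.
    """
    if include_burn_in:
        return list(assessments)
    return [
        a for a in assessments
        if weeks_since_first_recording(a["filmed_on"], first_recording) >= BURN_IN_WEEKS
    ]


# Hypothetical example: one shoot inside the burn-in period, one after it.
first = date(2006, 1, 2)
data = [
    {"filmed_on": date(2006, 3, 1), "points": 7},   # week 8: dropped
    {"filmed_on": date(2006, 9, 1), "points": 8},   # week 34: kept
]
print(len(estimation_sample(data, first)))
print(len(estimation_sample(data, first, include_burn_in=True)))
```

Passing `include_burn_in=True` gives the larger sample used in the one specification that tests the effect of the initial 24 weeks.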
Table 6.1 gives an overview of the descriptive statistics for the variables used in the econometric analysis, which are discussed in more detail in Section 4. The dependent variable is the number of points given to the cook by the other contestants (“Points”). Its distribution is skewed to the left: most assessments lie at the upper end of the scale, and poor evaluations are very rare. The mode and the median both take the value of eight and are therefore relatively close to the highest possible value of ten. Table 6.2 displays the distribution of “Points” for the estimation sample. The weekday (Monday to Friday) on which the contestant being evaluated cooks enters the econometric model as a set of four binary indicators, one for each weekday from Tuesday to Friday. Monday is the base category and therefore does not appear in the table. The schedule does not vary across rounds, so the weekday and the order of cooking are equivalent. That is, cooking on Monday implies cooking first, cooking on Tuesday implies cooking second, and so on. For this reason, Table 6.1 labels the indicators by cooking order. “Evaluator already cooked” equals one if a participant has already cooked at the time he or she must evaluate a dinner, and zero otherwise. “Number of ingredients” is the absolute number of ingredients used in a dinner. “Number of ingredients²/100” is the square of “number of ingredients”, divided by 100. This scaling factor ensures that the explanatory variables are of similar numerical magnitude. That makes the estimation procedure more stable and the coefficients easier to compare. “Level” measures the difficulty level of the menu, and “price” its price level.⁴ “Av. evaluation level” reports the average number of points given during the last 24 weeks before a contest. “Av. share viewers” represents the TV market share of the respective show. Its minimum is 7.85% and its maximum 12.23%. For example, if 10 million people were watching TV on Monday during the airing of “The Perfect Dinner”, a 10% share implies that 1 million people were watching the show. A difference of roughly 4.4 percentage points between the lowest and the highest share might not seem large. For a show in the relevant market segment, however, it represents the difference between a mediocre and a successful show. “Population” measures the population of the town in which the show is filmed, in millions of inhabitants. For example, the largest city in our sample, with 8.1 million inhabitants, has a value of 8.1, and the smallest, with 8000 inhabitants, a value of 0.008. “Foreign” indicates filming outside Germany: it equals 1 if the venue is abroad and 0 if the show takes place in Germany. “Time” is the number of days since the first recording of the show divided by 1000. “Time²/1000” is the square of “time”, i.e., the squared number of days divided by 1000².⁵ The division factor of 1000 has been chosen to ensure better comparability between the coefficients in our later analysis. We also report descriptive statistics on a range of individual characteristics such as gender, migration status, age, profession and hair color. Apart from age, all individual characteristics are dummy variables. “Dissimilar” measures the social dissimilarity of the evaluator and the cook, based on the individual social characteristics listed in Table 6.1. The closer the value is to one, the less similar the two contestants are. In computational terms, “dissimilar” is the rescaled Mahalanobis distance between the vectors of the two participants’ socioeconomic characteristics.⁶
³ Part of our data set is available in disaggregated form on the homepage of “Das perfekte Dinner” on the website of the German TV channel VOX: http://www.vox.de/kochen/das-perfekte-dinner/details.
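Several of the regressors described above can be illustrated with a short Python sketch. All function and key names are hypothetical. `av_evaluation_level` also rests on an assumption: that the 24-week window pools all points awarded in the weeks strictly before the contest week. The text does not spell out the exact window definition used in the estimation.

```python
from statistics import mean
from typing import Optional

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def cooking_order_dummies(weekday: str) -> dict:
    """Indicators for cooking 2nd-5th (Tuesday-Friday); Monday is the base."""
    if weekday not in WEEKDAYS:
        raise ValueError(f"unexpected weekday: {weekday}")
    order = WEEKDAYS.index(weekday) + 1  # Monday -> 1, ..., Friday -> 5
    return {f"order_{k}": int(order == k) for k in range(2, 6)}


def scaled_regressors(n_ingredients: int, days_since_first_recording: int) -> dict:
    """Rescaled size and trend variables of similar numerical magnitude."""
    time = days_since_first_recording / 1000
    return {
        "ingredients": n_ingredients,
        "ingredients_sq_100": n_ingredients ** 2 / 100,
        "time": time,
        "time_sq": time ** 2,  # = days**2 / 1000**2
    }


def av_evaluation_level(points_by_week: dict, week: int,
                        window: int = 24) -> Optional[float]:
    """Mean points awarded during the `window` weeks before `week`."""
    past = [p for w, pts in points_by_week.items()
            if week - window <= w < week for p in pts]
    return mean(past) if past else None


def viewers(total_tv_audience: float, share_pct: float) -> float:
    """Audience implied by a market share given in percent."""
    return total_tv_audience * share_pct / 100


print(cooking_order_dummies("Wednesday"))
print(scaled_regressors(134, 1908))
print(viewers(10_000_000, 10))
```

Take the sample maxima in Table 6.1: 134 ingredients and time = 1.908. For these, the sketch returns 179.56 and about 3.640 for the two squared terms, which matches the reported maxima.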
As a rescaled Mahalanobis distance, “dissimilar” is constructed from a weighted sum of the squared deviations (and of the cross-products of deviations) in the considered socioeconomic characteristics. It is therefore meant to capture a potential effect of the overall socioeconomic proximity between cook and evaluator. One may, however, question a general effect of dissimilarity. It may instead be dissimilarity with respect to specific characteristics that matters. Moreover, similarity in these relevant characteristics may not have a homogeneous effect on the dependent variable. Dissimilarity with respect to age, for example, might result in less generous evaluations, while heterogeneity with respect to gender might result in more positive ones. To address this argument, we also estimated specifications that separately include squared deviations in specific socioeconomic characteristics as explanatory variables. Most characteristics enter the model as indicators. As a result, these variables are, with the exception of age, dummies indicating a difference with respect to the considered characteristic. Because of (near) collinearity, the data do not allow us to estimate a model with the full set of squared deviations. We therefore focus on a subset of characteristics (gender, immigration status, age, hair color); see Table 6.1.
⁴ The difficulty level and the price level are determined by experts on the website http://www.kochbar.de.
⁵ The division factor of 1000 is chosen to ensure similar magnitudes of the explanatory variables; see above.
⁶ The Mahalanobis distance $MD_{ij}$ between individuals $i$ and $j$ is defined as
$$MD_{ij} = \sqrt{(x_i - x_j)' \, V(x)^{-1} \, (x_i - x_j)},$$
where $x_i$ and $x_j$ denote the column vectors of socioeconomic characteristics of individuals $i$ and $j$, respectively. $V(x)$ denotes the (estimated) variance-covariance matrix of the socioeconomic characteristics $x$. The variable “dissimilar” is defined as $MD_{ij} / \max(MD)$. That is, “dissimilar” is normalized to one for the most dissimilar pair of individuals in the sample. It takes the value of zero for a pair of individuals who share all considered socioeconomic characteristics.
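The definition in footnote 6 translates directly into a few lines of NumPy. This is a sketch under stated assumptions: the example characteristics (female, immigrant, age, blond), the data and the function names are hypothetical. We also use the Moore-Penrose pseudo-inverse `np.linalg.pinv`. It coincides with $V(x)^{-1}$ whenever the covariance matrix is nonsingular, but stays defined if some indicators are (nearly) collinear.

```python
import numpy as np


def dissimilarity_matrix(X) -> np.ndarray:
    """Pairwise Mahalanobis distances, rescaled so that the maximum is one.

    X has one row per contestant and one column per socioeconomic
    characteristic (0/1 indicators and, e.g., age in years).
    """
    X = np.asarray(X, dtype=float)
    V = np.cov(X, rowvar=False)        # estimated variance-covariance matrix V(x)
    V_inv = np.linalg.pinv(V)          # equals inv(V) when V is nonsingular
    diff = X[:, None, :] - X[None, :, :]                    # (n, n, k): x_i - x_j
    md_sq = np.einsum("ijk,kl,ijl->ij", diff, V_inv, diff)  # quadratic forms
    md = np.sqrt(np.clip(md_sq, 0.0, None))                 # guard round-off below 0
    return md / md.max()                                     # MD_ij / max(MD)


def squared_difference(x_i: float, x_j: float) -> float:
    """Characteristic-specific regressor; for 0/1 indicators it is itself 0/1."""
    return (x_i - x_j) ** 2


# Hypothetical contestants; columns: female, immigrant, age, blond.
X = np.array([
    [1, 0, 25, 1],
    [0, 0, 40, 0],
    [1, 1, 60, 0],
    [1, 0, 25, 1],   # identical to the first contestant
    [0, 1, 33, 1],
    [0, 0, 52, 1],
])
D = dissimilarity_matrix(X)
print(np.round(D, 3))
```

Identical contestants (rows 0 and 3) get a dissimilarity of zero, and the most distant pair gets exactly one. Applied to an indicator, `squared_difference` returns 1 if the two contestants differ in that characteristic and 0 otherwise. This is the dummy structure described for the characteristic-specific specifications.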
Table 6.1: Descriptive Statistics
                                             Mean     S.D.     Median   Min      Max
Points                                       7.616    1.363    8        1        10
cooking order (base: first)
  second                                     0.203    0.402    0        0        1
  third                                      0.202    0.401    0        0        1
  fourth                                     0.200    0.400    0        0        1
  fifth                                      0.193    0.395    0        0        1
evaluator already cooked                     0.500    0.500    0        0        1
number of ingredients                        52.633   16.945   51       16       134
number of ingredients²/100                   30.573   21.594   26.01    2.56     179.56
level                                        1.875    0.346    2        1        3
price                                        1.860    0.362    2        1        3
av. evaluation level (previous 24 weeks)     7.597    0.272    7.552    7.073    8.123
av. share viewers (previous 24 weeks)        9.470    1.564    8.332    7.846    12.234
population (city of venue)                   0.989    1.099    0.580    0.008    8.100
foreign (city of venue)                      0.035    0.185    0        0        1
time                                         1.110    0.491    1.206    0.273    1.908
time²/1000                                   1.474    1.094    1.454    0.075    3.640
female                                       0.534    0.499    1        0        1
immigrant                                    0.074    0.263    0        0        1
age                                          38.284   11.177   38       18       71
student                                      0.070    0.255    0        0        1
civil servant                                0.032    0.175    0        0        1
artist                                       0.071    0.256    0        0        1
entrepreneur                                 0.210    0.408    0        0        1
pensioner                                    0.006    0.077    0        0        1
employee                                     0.533    0.499    1        0        1
academic                                     0.089    0.285    0        0        1
trainee                                      0.009    0.096    0        0        1
pupil                                        0.009    0.092    0        0        1
blond                                        0.377    0.485    0        0        1
dissimilarity
  overall dissimilarity (Mahalanobis dist.)  0.344    0.162    0.337    0        1
  female (squared diff.)                     0.593    0.491    1        0        1
  immigrant (squared diff.)                  0.133    0.340    0        0        1
  age (squared diff.)                        0.262    0.341    0.121    0        2.500
  blond (squared diff.)                      0.471    0.499    0        0        1
Table 6.2: Distribution of dependent variable (Points)
value 1 2 3 4 5 6 7 8 9 10
abs. frequency 1 2 17 53 152 450 1031 1073 643 313
cum. percentage 0.03 0.08 0.54 1.95 6.02 18.07 45.68 74.4 91.62 100
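The summary statistics of “Points” in Table 6.1 can be recomputed from the frequencies in Table 6.2 as a consistency check. The Python sketch below uses only the reported frequencies. Computing the S.D. with divisor n - 1 (the sample standard deviation) is our assumption; the population version rounds to the same three decimals here.

```python
from itertools import accumulate
from math import sqrt

# Frequencies of "Points" in the estimation sample, copied from Table 6.2.
points = list(range(1, 11))
freq = [1, 2, 17, 53, 152, 450, 1031, 1073, 643, 313]

n = sum(freq)  # number of cooking assessments
mean_pts = sum(p * f for p, f in zip(points, freq)) / n
# Central moments of the frequency distribution.
m2 = sum(f * (p - mean_pts) ** 2 for p, f in zip(points, freq)) / n
m3 = sum(f * (p - mean_pts) ** 3 for p, f in zip(points, freq)) / n
sd_pts = sqrt(m2 * n / (n - 1))   # sample standard deviation (divisor n - 1)
skew_pts = m3 / m2 ** 1.5         # negative: longer tail toward low scores

cum_counts = list(accumulate(freq))
cum_pct = [round(100 * c / n, 2) for c in cum_counts]
# n is odd, so the median is the value at position (n + 1) / 2.
median_pts = next(p for p, c in zip(points, cum_counts) if c >= (n + 1) // 2)
mode_pts = points[freq.index(max(freq))]

print(n, round(mean_pts, 3), round(sd_pts, 3), median_pts, mode_pts)
print(cum_pct)
```

The frequencies sum to 3735, the size of the estimation sample. Rounded to three decimals, the mean and standard deviation agree with Table 6.1 (7.616 and 1.363). The median and mode are both 8, the cumulative percentages reproduce the last row of Table 6.2, and the negative skewness confirms the left skew noted in the text.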