Validation of Data and Variable Selection in the Context of DEA

CHAPTER V: OPERATIONALISATION

V.3 Sampling Frame, Dataset and Variable Selection

V.3.3 Validation of Data and Variable Selection in the Context of DEA

In this section, we justify and validate the definition and selection of the dataset and variables for carrying out performance benchmarking by means of DEA.

V.3.3.1 Data accuracy

Inaccurate data for a DMU can distort efficiency scores by making the DMU in question appear efficient or inefficient when it is not. Collected data for all DMUs must therefore be as accurate as possible. For this reason, we used several data sources and cross-checked the information provided by each of them. Where sources conflicted, we recorded data from primary sources. We also relied on our expert understanding of container terminal operations to review and correct reported data that appeared inconsistent with the size and operational arrangements of the container terminals in the sample.

We also checked data and variable selection for congestion. In economics, congestion occurs when reductions (increases) in one or more inputs generate an increase (decrease) in one or more outputs, for instance when an increase in the number of stevedores and other port labour is associated with lower throughput and production levels. Many of the problems associated with congestion are attributable to the choice of input and output variables. The DEA literature provides several models for measuring congestion (see for instance Brocket et al., 1984 and Cooper et al., 2004), but none of these models was needed in this study because both input and output variables were selected in ways that avoid the occurrence of congestion.

V.3.3.2 Homogeneity

As discussed in Chapter III, variations in traffic and operational arrangements between world container ports and terminals may breach the requirement of homogeneity across sampled terminals. To improve homogeneity, we defined and selected terminal DMUs according to their operational and technology features, as specified in the previous sections. Even so, instances of non-homogeneity may remain in the dataset. For instance, the summary statistics in Table 17 show that the standard deviation of the yard-stacking index is higher than its mean, implying that the sample is not very homogeneous in this respect. This is simply because the sample contains large terminals alongside small ones, each with a different set of crane equipment and handling configuration. To account for this, we additionally apply returns-to-scale (DEA-BCC) and sensitivity (e.g. measure-specific DEA) models in order to identify different scale properties and performance layers of the production frontier.
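The dispersion check described above (standard deviation larger than the mean) amounts to a coefficient of variation above 1. The following is a minimal sketch of that check; the function name and the index values are hypothetical and are not taken from Table 17.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(values) / m


# Hypothetical yard-stacking index values (illustrative only, not from Table 17):
# four small terminals and one very large one.
yard_stacking_index = [2.0, 3.0, 4.0, 5.0, 60.0]

cv = coefficient_of_variation(yard_stacking_index)
# A value above 1 means the standard deviation exceeds the mean.
is_dispersed = cv > 1.0
```

A value above 1 only signals wide dispersion in one variable; it does not by itself show that the DMUs operate under different technologies.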

V.3.3.3 Number of DMUs

In DEA, the number of units in the dataset should be greater than the number of inputs and outputs combined to ensure sufficient degrees of freedom (see for instance Dyson et al. (1990) and Bowlin (1998) for a review of this aspect). A general rule of thumb is that three (3) DMUs are needed for each input and output variable. In our case, the use of composite indicators such as the STS-crane index and the yard-stacking index helped reduce the size of the input/output set. When DEA cross-sectional analysis is applied, the ratio of DMUs (60) to the number of inputs and outputs (8) is 7.5 (>3), which ensures sufficient degrees of freedom. When DEA panel data analysis is applied, the number of DMUs increases to 420 (60 terminals × 7 years), which raises the ratio of DMUs to the number of variables to 52.5 (>3).
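The arithmetic behind the two ratios can be reproduced with a short sketch. The figures (60 terminals, 7 years, 8 variables, minimum ratio of 3) are those reported above; the function names are illustrative assumptions.

```python
def dmu_ratio(n_dmus, n_variables):
    """Ratio of DMUs to the combined number of input and output variables."""
    return n_dmus / n_variables


def meets_rule_of_thumb(n_dmus, n_variables, min_ratio=3.0):
    """Check the rule of thumb of at least three DMUs per variable."""
    return dmu_ratio(n_dmus, n_variables) >= min_ratio


N_TERMINALS = 60   # terminals in the final sample
N_YEARS = 7        # years in the panel
N_VARIABLES = 8    # inputs and outputs combined

cross_sectional_ratio = dmu_ratio(N_TERMINALS, N_VARIABLES)        # 7.5
panel_ratio = dmu_ratio(N_TERMINALS * N_YEARS, N_VARIABLES)        # 52.5
```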

V.3.3.4 Data scaling

Whenever possible, data should be scaled down so that input and output levels do not take excessively large values, which reduces potential round-off errors in solving DEA models. This is why we recorded terminal throughput and terminal area in 1000 TEUs and 1000 m², respectively.
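As an illustration of this rescaling, the sketch below converts raw throughput and area figures into thousands. The raw values and variable names are hypothetical and serve only to show the conversion.

```python
def scale_down(values, factor=1000.0):
    """Divide every value by a common factor (here, thousands)."""
    return [v / factor for v in values]


# Hypothetical raw figures for three terminals (illustrative only).
throughput_teu = [1_250_000, 480_000, 3_100_000]   # TEUs
area_m2 = [350_000, 120_000, 900_000]              # square metres

throughput_1000_teu = scale_down(throughput_teu)   # in 1000 TEUs
area_1000_m2 = scale_down(area_m2)                 # in 1000 m²
```

Dividing a variable by a constant changes only its unit of measurement, so the ranking of terminals on that variable is unaffected.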

V.3.3.5 Exclusivity and exhaustiveness

The property of exclusivity and exhaustiveness requires, subject to the exogeneity of the variables under consideration, that only the selected inputs influence the output levels and that this influence be limited to the selected output variables. It is important to recognise this property because, in many instances, the output produced or the input utilised may be an assigned task that is exogenously determined.

To establish exclusivity and exhaustiveness between variables, we first narrowed down the input and output variables of the model by identifying the type of performance being assessed (operational efficiency) and the spatial and operational scope of the DMU under study (container terminal). We then drew on expert analysis and the results of IDEF0 modelling to include the input variables that capture all container terminal operational resources and the output variables that account for all the outcomes of terminal operations.

V.3.3.6 Positivity

Generally, the DEA formulation requires that all input and output variables be strictly positive (greater than zero). In Chapter III, we discussed the problems related to zero values under DEA and in the context of container-port operations. In our case, all input and output values are positive and no further treatment is necessary.
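A positivity check of this kind can be sketched as a simple scan of the dataset for zero or negative values. The dataset layout (a dictionary of variable names to value lists), the variable names, and the values are assumptions made for illustration; the zero entry is planted deliberately to show what the check reports.

```python
def find_non_positive(dataset):
    """Return (variable, row_index, value) for every value that is not > 0."""
    problems = []
    for name, values in dataset.items():
        for i, v in enumerate(values):
            if not v > 0:  # also catches NaN, which fails every comparison
                problems.append((name, i, v))
    return problems


# Hypothetical dataset with one deliberately planted zero value.
sample = {
    "terminal_area_1000_m2": [350.0, 120.0, 900.0],
    "internal_trucks": [40, 0, 110],
    "throughput_1000_teu": [1250.0, 480.0, 3100.0],
}

issues = find_non_positive(sample)
```

An empty result corresponds to the situation reported above, in which no further treatment is necessary.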

V.3.3.7 Isotonicity

To satisfy the isotonicity premise, we carried out a Pearson correlation test. The correlation coefficients (r²) in Table 18 show a p-value of less than 0.05 (p<0.05) across all inter-correlations, which satisfies the isotonicity requirement. Where relevant, some variables are reported in ways that satisfy the isotonicity requirement. For instance, the output variable cargo dwell time, which is used later in the analysis, is reported as the reciprocal of the average number of days during which containers remain in the yard.

Table 18: Correlation coefficients between input and output variables

Variable               Terminal throughput
Terminal area          r²=0.486 (p=0.0001)
Maximum draft          r²=0.9678 (p=0.0001)
Length overall         r²=0.7361 (p=0.0001)
STS crane index        r²=0.9199 (p=0.0001)
Yard stacking index    r²=0.9372 (p=0.0001)
Internal trucks        r²=0.9124 (p=0.0001)
Gates                  r²=0.4225 (p=0.0001)
Throughput             r²=0.4897 (p=0.0001)
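The isotonicity check and the reciprocal transformation of dwell time can be illustrated with a minimal sketch. It computes the Pearson coefficient by hand and does not include the significance test (p-values), which would require a statistics package. All data values and variable names below are hypothetical and are not taken from the study's dataset.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def reciprocal(values):
    """Return 1/v for each value, so that a shorter dwell time scores higher."""
    return [1.0 / v for v in values]


# Hypothetical observations for four terminals (illustrative only).
sts_crane_index = [10.0, 20.0, 30.0, 40.0]    # input
throughput = [120.0, 210.0, 330.0, 390.0]     # output
dwell_days = [8.0, 5.0, 4.0, 2.0]             # average days in the yard

r_throughput = pearson_r(sts_crane_index, throughput)
r_dwell_raw = pearson_r(sts_crane_index, dwell_days)
r_dwell_reciprocal = pearson_r(sts_crane_index, reciprocal(dwell_days))
```

In this illustration the raw dwell time falls as the input rises, so its correlation with the input is negative, whereas its reciprocal rises with the input, which is the direction isotonicity requires.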

V.4 Chapter Conclusion

Following the design of the research approach in the previous chapter, this chapter deals with the operationalisation and formalisation of the analytical methods and techniques selected for this study, as well as the sampling frame, data collection, and variable selection.

We started by mapping container terminals’ flow processes through IDEF0 modelling. Following the specification of a top-level diagram for container terminal operations and its corresponding ICOM semantics, the parent function was decomposed into three linked functions, each reflecting the operations of a terminal site or sub-system. Further decomposition by operational and process flow arrangements resulted in three IDEF0 models corresponding to import, export, and transhipment flows, respectively. The results of IDEF0 modelling were later used to identify the spatial scope of security regulations and define the relevant variables for benchmarking and productivity change analyses.

Regarding the formalisation of the analytical models, we formulated several DEA models, namely the conventional slack-based model, the measure-specific model, and the supply chain model, and justified the benefit of applying both contemporaneous and inter-temporal analyses. We then specified the Malmquist Productivity Index (MPI) and decomposed it into three sources of efficiency: technical efficiency, scale efficiency, and technological change. In order to measure productivity change before and after security implementation, we applied a step-wise MPI in terms of multi-year and regulatory-run assessments.

Starting with an original sample of 127 terminals from 43 ports and ending with a final sample of 60 container terminals belonging to 39 ports, we defined the sampling frame and procedures with the objective of achieving homogeneity and operational consistency. We then relied on the results of IDEF0 modelling and the previous discussion of container-port operations and security regulations to define the relevant variables (8 primary variables and 3 additional variables) and the time frame (the period from 2000 to 2006) for the study, the combination of which resulted in a panel dataset of 420 terminal-years or DMUs. We described the methods and sources of data collection. We then validated variable selection in view of DEA analysis, including such aspects as number of DMUs, data scaling, homogeneity, exclusivity and exhaustiveness, positivity, and isotonicity.

In document A Benchmarking Study of the Impacts of Security Regulations on Container Port Efficiency (Page 111-115)