2.2 Sources of Data and Construction of the Data Set
2.2.2 Undertaking Data
The second source of the data set is the figures reported by the undertakings, which cover 18 months. These include information about:
• Providing facility (origin of production)
• Type of the product
• Destination county
• Distance from the facility to the buyer’s location, as reported by undertakings
• Identity and type of the customer (vertical relations, buyer’s line of business)
• Total revenue, quantity and transportation cost (if any) in each transaction
• Monthly unit price of some inputs
• Information on rebates and discounts, if any are provided to consumers, and how they are set

3 In the estimations in Chapters 3 and 5, only data spanning 18 months are used.
4 $(1 - L)(1 - L^{12})\,Q_m = (1 + \gamma_1 L)(1 + \gamma_1 L^{12})\,e_m$, where $Q$ is quantity, $m$ is the month, $e$ is the error term and $L$ is the lag operator. This model is also called the airline model, following Cleveland and Tiao (1976), who study airlines.
5 Three years in the data exhibit more volatility than the remaining five. If the volatile years are excluded, [(0,1,1)(0,1,1)] has the highest likelihood in all specifications. If the volatile years are included, N1P1 has a different form, [(1,0,0)(1,0,0)]. For the sake of consistency, I use [(0,1,1)(0,1,1)] in all specifications.
One problem encountered in handling the dataset is structural. The data set is fairly old, and access is provided to the data in its rawest format. This necessitates substantial refinement in a setting where interaction with the original providers of the data is not possible6.
The dataset contains revenue and volume for each transaction but not the price. There are some observations with zero quantity, zero revenue or no customer identity7. These transactions have been omitted. The transaction price is found by dividing total revenue by total quantity. If multiple transactions are reported at the customer/location/provider/month level, they are aggregated by taking the quantity-weighted average price. To give the coefficients a more intuitive interpretation, price is normalized by the average price in the competitive period.

There are two types of sales. Some sales are delivered to the buyer’s location by the provider, so the reported revenue includes freight. In other sales, the product is picked up by the customer at the origin of production, so freight is not included in the invoiced revenue. To ensure conformity between the two types, all sales have been converted into delivered sales. This has been preferred to the reverse conversion for two reasons. First, delivered sales are more frequent in the dataset, even though the difference in frequency is small (8%). Second, opting for delivered pricing assumes that any buyer is as efficient as the producing firms in transporting the product. This is more reasonable than the alternative assumption that the transportation costs reported by all firms do not diverge from the transportation costs they actually incur.
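The price construction described above (omitting unusable observations, aggregating at the customer/location/provider/month level, and normalizing) can be sketched as follows. This is a minimal illustration, not the code used for the thesis: the field names (`customer`, `location`, `provider`, `month`, `revenue`, `quantity`) and the function name are hypothetical, and the normalizing constant is assumed to be the simple mean of the aggregated prices observed in the competitive period, which the text does not specify. Dividing summed revenue by summed quantity within a cell is algebraically the same as taking the quantity-weighted average of the transaction prices.

```python
from collections import defaultdict


def build_normalized_prices(transactions, competitive_months):
    """Return {(customer, location, provider, month): normalized price}.

    `transactions` is a list of dicts with the hypothetical keys
    customer, location, provider, month, revenue, quantity.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # cell -> [revenue, quantity]
    for t in transactions:
        # Omit observations with no customer identity, zero quantity or zero revenue.
        if not t["customer"] or t["quantity"] == 0 or t["revenue"] == 0:
            continue
        cell = (t["customer"], t["location"], t["provider"], t["month"])
        totals[cell][0] += t["revenue"]
        totals[cell][1] += t["quantity"]

    # Summed revenue / summed quantity = quantity-weighted average price.
    prices = {cell: rev / qty for cell, (rev, qty) in totals.items()}

    # Assumption: normalize by the simple mean price over competitive-period cells.
    base = [p for cell, p in prices.items() if cell[3] in competitive_months]
    if not base:
        raise ValueError("no observations in the competitive period")
    base_price = sum(base) / len(base)
    return {cell: p / base_price for cell, p in prices.items()}
```

With this normalization, a value of 1.2 reads as a price 20% above the competitive-period average.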
6 However, this has been considerably mitigated by the helpful input of the experts at the TCA.

Undertakings report transportation cost differently. For any consumer $c$ located at $l$, provider $j$, month $t$ and transaction $i$,

$$TC_i = v_i \, d_{jl} \, t_{jt} \tag{2.1}$$

holds, where $TC_i$ is the total transportation cost incurred in transaction $i$, $v_i$ is the volume of the transaction, $d_{jl}$ is the distance between provider $j$ and location $l$, and $t_{jt}$ is the transportation cost of provider $j$ per unit of volume per unit of distance in month $t$. Some undertakings report $TC_i$, $v_i$ and $d_{jl}$; some others report $v_i$ along with $d_{jl}$ and $t_{jt}$; and some others report $v_i$, $d_{jl}$, $t_{jt}$. Mill prices are converted using the transportation cost calculations of one of the firms that reports $t_{jt}$ for each month. As transportation is highly standardized, I do not expect the monthly unit transportation cost to differ greatly across undertakings, and hence use that monthly figure for all undertakings: $t_{jt} = \bar{t}_{jt} = t_t$. The following transformation is used to calculate the counterfactual delivered price for sales whose revenue does not include transportation cost:
$$p_i = \frac{REV_i + v_i \, d_{jl} \, t_t}{v_i} \tag{2.2}$$

where $REV_i$ refers to the revenue reported for transaction $i$. Price is then the ratio of revenue, including actual or potential freight, to total quantity8.
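As a small illustration of equation (2.2), the sketch below computes the delivered price of a single transaction. The function and argument names are hypothetical; `unit_cost` stands for the common monthly unit transportation cost $t_t$, and the flag `includes_freight` marks sales whose reported revenue already contains freight, for which no freight is added.

```python
def delivered_price(revenue, volume, distance, unit_cost, includes_freight):
    """Delivered price per unit of volume, following equation (2.2).

    revenue          -- REV_i, revenue reported for the transaction
    volume           -- v_i, volume of the transaction
    distance         -- d_jl, distance between provider and buyer location
    unit_cost        -- t_t, cost per unit of volume per unit of distance
    includes_freight -- True if REV_i already contains freight
    """
    if volume <= 0:
        raise ValueError("volume must be positive")
    # Potential freight is added only to sales picked up at the facility.
    freight = 0.0 if includes_freight else volume * distance * unit_cost
    return (revenue + freight) / volume


# A pick-up sale of 20 units over 50 distance units at a unit cost of 0.5:
# (1000 + 20 * 50 * 0.5) / 20 = 75.0
example = delivered_price(1000.0, 20.0, 50.0, 0.5, includes_freight=False)
```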
Some of the revenue/volume figures are flawed, as they imply prices equal to zero or infinity9. In some cases it is straightforward to detect the source of the anomaly, e.g. a skipped or added decimal, or the transportation cost reported as the price. In these cases corrections have been made. For less straightforward cases, the following rule of thumb is applied: i) If there is one other entry for that customer in the same month, that value is taken as the price of both shipments. ii) If there are multiple other entries, the average price of the non-anomalous entries is used as the price of the anomalous entry. iii) If there are none, the average of the first month before and the first month after is taken as the price. iv) If the anomalous entry belongs to the last month, the average of the two preceding months is used; if it belongs to the first month, the average of the two following months is used. Inspection of the data showed that the sales reports of one undertaking are inaccurate for one month. For these cases, the price of the shipment is set equal to the price in the following month10.
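The rule of thumb (i) to (iv) can be written as a short routine. The sketch below is one possible reading of it, under assumptions the text does not settle: prices are held per customer as a hypothetical mapping from month number to the list of that customer's non-anomalous prices, a month with several valid entries is represented by their simple average, "first month before/after" is read as the nearest month with a valid price for that customer, and the first-month and last-month cases are detected by the absence of earlier or later valid months.

```python
def _mean(values):
    return sum(values) / len(values)


def impute_price(valid_prices_by_month, month):
    """Impute the price of an anomalous entry for one customer.

    valid_prices_by_month -- {month: [non-anomalous prices of this customer]}
    month                 -- month of the anomalous entry
    """
    same_month = valid_prices_by_month.get(month, [])
    if same_month:
        # Rules (i) and (ii): use the other entry, or the average of the others.
        return _mean(same_month)

    monthly = {m: _mean(p) for m, p in valid_prices_by_month.items() if p}
    earlier = sorted((m for m in monthly if m < month), reverse=True)
    later = sorted(m for m in monthly if m > month)

    if earlier and later:
        # Rule (iii): average of the nearest month before and after.
        return (monthly[earlier[0]] + monthly[later[0]]) / 2
    if len(earlier) >= 2:
        # Rule (iv), last month: average of the two preceding months.
        return (monthly[earlier[0]] + monthly[earlier[1]]) / 2
    if len(later) >= 2:
        # Rule (iv), first month: average of the two following months.
        return (monthly[later[0]] + monthly[later[1]]) / 2
    raise ValueError("not enough valid prices to impute")
```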
Regarding the customer location, in some cases no province information is present. These transactions have been omitted11. In some other cases, the province is provided without accompanying county information. These transactions are taken to be destined for the administrative centre of the province12. In some cases, county names are reported as province names or are matched with incorrect provinces. These cases have been corrected under the assumption that the most specific information, the county, is accurate.

8 For delivered sales of some undertakings, it is difficult to tell whether the reported revenue already includes the transportation cost or whether the transportation cost should be added on top of that revenue figure. Where there is doubt, the transactions are analysed on a customer/location/provider basis to see whether any one customer demanded both delivered and mill sales from a certain location in the same month. In such cases it becomes easier to tell whether the reported figures already include freight. A separate log outlining the decision-making process has been kept and may be provided upon request.
9 Observations omitted in this way account for 0.3% of the sample.
10 Observations in this category account for 0.1% of the sample.
11 Observations omitted in this way account for 0.4% of the sample.
12 Observations in this category account for 1.1% of the sample.
The dataset includes customer identity information. This information is first used to distinguish transactions within a vertically integrated undertaking and transactions across rival undertakings (consumers). These two types of transactions are omitted from the analysis, and attention is confined to what we may think of as “commercial sales”, that is, sales destined for third parties. Second, I use customer identity information to approximate customer size. In assessing customer size, transactions from all providers are taken into account. Since the same customers are not always registered under the same name across different providers, the first task is to harmonize customer identities. In many cases, the difference is simply a matter of abbreviations or spelling. For these cases, taking the name, province, county and customer type information together, it is not difficult to identify the same customers registered under different names13. Customers with different identities are treated as independent of each other14.
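The harmonization of customer identities can be illustrated with a simple matching key. The sketch below is only indicative and is not the procedure used for the thesis, which the text does not describe in detail. The abbreviation table and the function names are hypothetical. Records that yield the same key (normalized name, province, county, customer type) would be treated as the same customer; anything else remains a separate identity.

```python
import re
import unicodedata

# Hypothetical abbreviation table; a real one would be built from the data.
ABBREVIATIONS = {"ltd": "limited", "co": "company", "const": "construction"}


def normalize_name(name):
    """Lower-case a name, strip accents and punctuation, expand abbreviations."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in text.split())


def customer_key(name, province, county, customer_type):
    """Key on which records from different providers are matched."""
    return (
        normalize_name(name),
        province.strip().lower(),
        county.strip().lower(),
        customer_type.strip().lower(),
    )
```

A key of this kind only catches differences in spelling and abbreviation; cases that need judgement would still have to be resolved by hand.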
Finally, we have information about the buyer’s line of business and the presence of any vertical relation between the buyer and the provider. In this work, to analyse the potential impact of vertical relations on price, I construct an indicator variable that marks the transactions between a provider and a vertically related buyer.