Integration of Registers and Survey-based Data in the Production of Agricultural and Forestry Economics Statistics


Paavo Väisänen,

Statistics Finland, e-mail: [email protected]

Abstract

The agricultural enterprise and income statistics describe farm income subject to taxation, the expenditure, assets and debts of farms, and changes in the fixed assets of farms. Income data were collected from the Tax Register, and production sectors were obtained from the Farm Register. The challenge was to combine two administrative registers with different register units into a statistical register, for which a new statistical unit, the agricultural enterprise, was defined. The new statistical unit was defined to correspond as closely as possible to the real units of the farm population. In addition, a sample survey of agricultural enterprises was used to collect more detailed data on the sales of agricultural products and on expenses. An exceptional feature was high item nonresponse, because large farms seemed to have difficulties in answering the questionnaire. The statistical register of agricultural enterprises was used to complete partially filled questionnaires. In most cases simple ratio imputation was used, but for farmers with several different products multiple imputation was applied. After imputation, the item nonresponse group was accepted as respondents in the final data, which raised the response rates to 76% - 79%. The response behaviour of nonrespondents was studied using the variables of the statistical register. The sample design contained a rotating panel in which half of the sample remained the same in successive years. Data collection is being automated by building the questionnaire into bookkeeping software, which will decrease the farmers' response burden in the future, and automated submission of the questionnaire via the Internet is expected to increase the number of respondents. In the estimation, the register data were used as auxiliary information to adjust for nonresponse bias and to increase accuracy.

Keywords: Agricultural statistics, statistical register, sample survey

1. Introduction

The use of administrative data in statistics production has been increasing at Statistics Finland over the past decade. For example, the Population Register, the Business Register, the Register of Completed Education and Degrees, the Tax Register and the Farm Register have been used as data sources. One of the main problems has been the suitability of a register unit as a survey or statistical unit. In most cases register units need some modification for statistical purposes, as became evident when data from the Tax Register, the Farm Register and a statistical survey of farm enterprises were combined for the Agricultural and Forestry Economics Statistics (AFES). The statistical units are agricultural enterprises, and the data collection concerns farming income subject to taxation, the expenditure, assets and debts of farms, and changes in the fixed assets of farms. The AFES describe and analyse the formation of income from agricultural economic activity. The survey population of the AFES consists of the active farms taxed under the Farm Income Tax Act, and the statistical register of agricultural enterprises constitutes the complete list of the population.

The production of the new AFES started in 2004. It replaced two previous sets of statistics, the Agricultural Enterprise and Income Statistics and the Income and Taxation Statistics of Farms, which had been compiled unchanged since 1973 (Väisänen, 1995), although the history of producing corresponding statistics on agricultural income and expenses goes back more than 30 years. Several changes in agricultural tax legislation and in the source data have made comparisons between years and time series analysis difficult. The target population of the AFES is formed by combining the registers of the Ministry of Agriculture and Forestry with the Tax Register. Changes in the taxation data made it possible to compile statistics on farm income from total data. The previous method was based on a sample drawn from the Farm Register; data were collected using statistical forms on which the tax authorities entered the taxable income of the selected farms. The reform of agricultural taxation opened the possibility of utilising the taxation data of all units taxed under the Farm Income Tax Act. Since 2004, the tax forms of all farms have been entered into the database of the Tax Administration, and the data are now available for 2004, 2005 and 2006. During this period the number of agricultural taxation units varied between 143,900 and 145,000, while the Farm Register contained slightly under 70,000 farms. The difference between the registers arises because the Tax Register contains all farms taxed under the Farm Income Tax Act, of which 74,000 - 75,000 are passive farms: they had no agricultural activities or production but still had to file an agricultural taxation form. Passive farms have, for instance, rental income from arable land, or they are hobby farms. In 2005, the Farm Register contained 2,687 farms that were not taxed under the Farm Income Tax Act (Maliniemi, 2008:2). These were active farms that were treated as hobby farms in taxation, and some of them were taxed under business taxation. Since 2006, the target population has also contained forest owners living in cities (Maliniemi, 2008:1).

The biggest challenge was to combine two administrative registers with different register units into a statistical register, for which a new statistical unit, the agricultural enterprise, was defined. The Farm Register uses a farm identification number, whereas the Tax Register uses an enterprise or personal identification code. The matching variables were the farm identification number, the personal identification code and the enterprise code, which were obtained from a customer register of the Ministry of Agriculture and Forestry. The customer register and the Farm Register were matched by farm number, resulting in an auxiliary register that linked farms to their owners. The definition of an agricultural enterprise was based on the farm identification numbers of the Farm Register and the person numbers of the Tax Register, with which the statistical register was composed to correspond to the farm population. Six different functions can be defined for the variables of a statistical register (Wallgren, 2007): identifying variables, such as farm and person numbers; communication variables, such as the addresses of farms; reference variables, containing relations between statistical units; time references; technical variables; and statistical variables used in statistics production. In general, register variables cannot be used directly for statistics, which is why editing processes are needed. Farm income, subsidies, expenses, profits and losses were collected and edited from the variables of the Tax Register. The statistical register of agriculture was used as the sampling frame of the statistical survey of farm enterprises as well as in the imputation of missing values, in the calibration estimators and in the nonresponse analysis.
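The linkage described above can be illustrated with a minimal sketch. It is not the production system of Statistics Finland: the field names (`farm_id`, `person_id`, `production_sector`, `farm_income`), the toy records and the assumption of exactly one owner per farm are all hypothetical and chosen only to show how a customer register can bridge two registers that use different identifiers.

```python
# Hypothetical toy registers; all field names and values are illustrative assumptions.
farm_register = [
    {"farm_id": "F001", "production_sector": "dairy"},
    {"farm_id": "F002", "production_sector": "cereal"},
    {"farm_id": "F003", "production_sector": "pig"},
]
customer_register = [  # bridges farm numbers to owner identifiers
    {"farm_id": "F001", "person_id": "P10"},
    {"farm_id": "F002", "person_id": "P20"},
]
tax_register = [
    {"person_id": "P10", "farm_income": 120000},
    {"person_id": "P20", "farm_income": 55000},
    {"person_id": "P30", "farm_income": 8000},
]


def build_statistical_register(farms, customers, taxes):
    """Link farms to tax records through the customer register.

    Assumes one owner per farm (a simplification). Returns the linked
    units and the farm numbers for which the linkage failed.
    """
    owner_by_farm = {row["farm_id"]: row["person_id"] for row in customers}
    tax_by_person = {row["person_id"]: row for row in taxes}
    linked, unlinked = [], []
    for farm in farms:
        person_id = owner_by_farm.get(farm["farm_id"])
        tax = tax_by_person.get(person_id)
        if tax is None:
            # Failed linkage: such units would end up in under-coverage.
            unlinked.append(farm["farm_id"])
            continue
        linked.append({
            "farm_id": farm["farm_id"],
            "person_id": person_id,
            "production_sector": farm["production_sector"],
            "farm_income": tax["farm_income"],
        })
    return linked, unlinked


linked, unlinked = build_statistical_register(
    farm_register, customer_register, tax_register
)
```

In this toy example the farm without an entry in the customer register cannot be linked, and the tax record of person `P30` has no matching farm, which mirrors the situation in which the two registers contain different sets of units.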

The AFES is published annually. A typical feature of using administrative data as a source in statistics production is a delay between the reference period and the time of publication. The statistical register of the AFES is completed one year after the taxation year, so the income statistics are published with a delay of one year. Statistics based on the agricultural survey are published two months later.

2. Survey of agricultural enterprises

The register data were supplemented by a sample survey in which data on income by type of farm production were collected by questionnaire. Farmers returned the questionnaire by mail or filled it in on the Internet. A new data collection method was introduced in which the questionnaire was built into farm bookkeeping software, which retrieves the sales and purchases of products, fills in the form and sends the data automatically from the farmer's computer to Statistics Finland via the Internet. In 2006, the statistical questionnaire was included in most agricultural bookkeeping software packages. Automated data collection decreases the farmers' response burden and speeds up the data collection process. Once the majority of farmers have updated their bookkeeping software, we can expect increased response rates and especially lower item nonresponse. So far, most questionnaires have been returned by mail, and only about 5% via the Internet.

The statistical register of the AFES served as the sampling frame, from which 9,000 agricultural enterprises were selected. The sampling design was stratified simple random sampling with a rotating panel in which one half of the sample changes yearly, so that a survey unit stays in the sample for two years. The stratification variables were production sector and farm size, which were cross-classified into 31 strata in 2006. In 2004 and 2005 there was small variation in the stratification, depending on changes in production sectors and size classes (Maliniemi, 2008:1). The sample was allocated to the strata using Neyman allocation, with agricultural profit as the allocation variable. The frame in 2004 and 2005 included over-coverage because, during the transition to the new AFES, the tax data were not yet available when the sample was selected; in future, the tax data of the preceding year can be used to remove over-coverage already at sample selection. In 2004 and 2005, over-coverage was removed from the sample after data collection, when it was detected. Under-coverage is more serious because it cannot be observed. Farms missing from the Farm Register and farm owners missing from the Tax Register remain in under-coverage, and perhaps the most common reason for under-coverage was failed linkage of the register data.
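As a rough illustration of Neyman allocation, the sketch below distributes a total sample size over strata in proportion to N_h * S_h, where N_h is the number of frame units in stratum h and S_h is the standard deviation of the allocation variable in that stratum. The three strata and their figures are invented for the example and are not the 31 strata of the AFES; only the total sample size of 9,000 comes from the text.

```python
def neyman_allocation(total_n, strata):
    """Return unrounded Neyman sample sizes n_h = n * N_h * S_h / sum(N_k * S_k).

    strata maps a stratum name to (N_h, S_h): frame size and standard
    deviation of the allocation variable (here, agricultural profit).
    """
    products = {name: size * sd for name, (size, sd) in strata.items()}
    total = sum(products.values())
    return {name: total_n * p / total for name, p in products.items()}


# Hypothetical strata: (number of enterprises, std. dev. of profit in EUR).
strata = {
    "dairy_large": (5000, 40000.0),
    "dairy_small": (12000, 15000.0),
    "cereal": (30000, 8000.0),
}
allocation = neyman_allocation(9000, strata)

# Sampling fractions n_h / N_h: strata with more variable profit are sampled more densely.
fractions = {name: allocation[name] / strata[name][0] for name in strata}
```

In practice the unrounded sizes would still need to be rounded and capped at the stratum size N_h; those steps are left out of the sketch.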

Unit nonresponse arose when a farmer did not return the questionnaire. Data from the statistical register of agriculture were available for the unit nonrespondents, so our unit nonresponse differed from total nonresponse as usually defined, where no data are received for the selected units. Item nonresponse was defined to occur when a farmer responded but some parts of the questionnaire were not filled in, for instance when sales of products or purchase expenses were missing. In many Finnish surveys, response rates are higher in the countryside than in cities. For instance, in the Labour Force Survey (LFS) the average response rate in 2007 was 77% in bigger cities, 83% in densely populated municipalities and 85% in the countryside. People co-operate in surveys more willingly in the countryside than in cities, so in general nonresponse is lower among farmers than in the whole population. The LFS is a short telephone interview, whereas the AFES is a mail enquiry and, in addition, more complex because it collects income from different sources. This is why the response rates of the AFES were lower than those of the LFS, varying between 41% and 48% in the survey period 2004 - 2006. Item nonresponse, where the sources of income were missing, was 30% - 36% in 2004 - 2006, which is almost the same size as the response rates (Appendix, Table 1). After the item nonresponse cases were imputed to complete responses, the combined rates of respondents and the item nonresponse group were 79% in 2004, 76% in 2005 and 78% in 2006.
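The combined rates quoted above can be reproduced from the "All" row of Table 1 in the Appendix. The short sketch below only restates that arithmetic; the variable and function names are illustrative.

```python
# "All" row of Table 1: (respondent %, item nonresponse %, nonresponse %) by year.
rates_all = {2004: (43, 36, 21), 2005: (41, 35, 24), 2006: (48, 30, 22)}


def combined_response_rate(respondent, item_nonresponse):
    """Share of the sample usable after imputing the item nonresponse group."""
    return respondent + item_nonresponse


combined = {
    year: combined_response_rate(resp, item)
    for year, (resp, item, _nonresp) in rates_all.items()
}
```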

The item nonresponse group had higher income, expenses and subsidies than the respondents. The analysis of item nonresponse showed that farms with high income from cattle, pig and poultry farming fell into the item nonresponse group with higher probability than cereal farms. Farms with high income seemed to have difficulties in filling in the questionnaire with detailed data on their sales of agricultural products or expenditure. An exceptional feature was that in the cereal and other crop production sectors the income of the nonresponse group was higher than the income of the item nonresponse or respondent groups (Appendix, Table 3). Cereal and other crop production had higher nonresponse rates than livestock production (Appendix, Table 1). Cereal farmers often had side jobs outside the farm: they might work as excavation contractors or taxi drivers, provide leisure services, rent out weekend cottages, etc. The agricultural enterprises of the nonresponse group were smaller and had lower average income, expenses and profit than the respondents, but there are great differences between the production sectors, as the following citation from Maliniemi (2008:2) shows: the level of farm-specific result is determined, among other things, by the enterprise's line of production and farm size; e.g. the result of farms engaged in pig or poultry production exceeded EUR 30,000, whereas the result of cereal farms stood at roughly EUR 7,500, and the result of dairy farms was EUR 28,117. The same distribution according to production sector was found in the item nonresponse and nonresponse groups (Appendix, Table 2). The item nonresponse group had higher income, expenses and profit from farming than the other groups. In livestock production, most of the sample belonged to the item nonresponse group, and these farms had higher income than the farmers in the respondent and nonresponse groups (Appendix, Table 3).

Missing entries were imputed for the item nonresponse group. Ratio imputation using the statistical register was applied when a farm had production in only one production sector. Imputation was applied more often to livestock products than to cereal production, because production lines normally remained the same in cattle, pig, sheep and poultry farming, whereas in cereal production crop rotation brought changes in the cultivated plants. Because imputation of cereal production was not possible, these farms were classified as nonresponse, which was one reason for the higher nonresponse rates in cereal production than in other production sectors. When several entries were missing, multiple imputation was applied. Because the distributions of income, expenses, profits and losses were skewed, the values were transformed to a logarithmic scale before multiple imputation to obtain approximately normally distributed variables, and 50 completed data sets were created for each imputation group. The calculations were done with the SAS MI procedure. Imputation enabled us to include the farmers of the item nonresponse group as respondents in the final data.
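A minimal sketch of ratio imputation is given below, under the assumption that a register variable x is known for every farm and the survey variable y is known only for complete responses: the ratio R = sum(y) / sum(x) is estimated from the complete cases, and a missing y is replaced by R * x. The field names and values are hypothetical, and the exact imputation classes and variables used for the AFES are not specified in this paper.

```python
def ratio_impute(records, survey_key, register_key):
    """Fill missing survey values with ratio * register value.

    The ratio is estimated from records where the survey value is present.
    Returns the ratio and a new list of records with an 'imputed' flag.
    """
    donors = [r for r in records if r[survey_key] is not None]
    ratio = sum(r[survey_key] for r in donors) / sum(r[register_key] for r in donors)
    completed = []
    for record in records:
        filled = dict(record)  # do not modify the input records
        filled["imputed"] = filled[survey_key] is None
        if filled["imputed"]:
            filled[survey_key] = ratio * filled[register_key]
        completed.append(filled)
    return ratio, completed


# Hypothetical single-sector (dairy) farms: register income known for all,
# survey item 'milk_sales' missing for farm C.
farms = [
    {"farm": "A", "register_income": 100000.0, "milk_sales": 80000.0},
    {"farm": "B", "register_income": 50000.0, "milk_sales": 40000.0},
    {"farm": "C", "register_income": 150000.0, "milk_sales": None},
]
ratio, completed = ratio_impute(farms, "milk_sales", "register_income")
```

The `imputed` flag is kept so that imputed values can be distinguished from reported ones in later analysis.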

Register data were used to increase the precision of the survey estimates through post-stratification by the legal form of farms and by agricultural subsidy district. The edited income data of the statistical register were utilised by applying calibration techniques, in which the estimates of the totals of the register variables were benchmarked to the true values. The CLAN program was used for the estimation (Andersson and Nordberg, 1998). The vector of auxiliary variables comprised legal form, production sector, province, agricultural region, arable land, income from animal sales, livestock production, cereal income, other production, purchases subject to 22% value added tax (VAT), purchases subject to 17% VAT, other expenses, government subsidies, and profits and losses. These variables gave 65 elements in the calibration vector.
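Post-stratification is the simplest case of calibration and can be sketched in a few lines: within each post-stratum the design weights are scaled so that their sum equals the known register count. The example below is an illustration only, with invented post-strata, counts and incomes; it does not reproduce the CLAN estimation or the 65-element calibration vector.

```python
def poststratify(sample, population_counts):
    """Scale design weights so weighted counts match register counts per post-stratum."""
    weighted_counts = {}
    for unit in sample:
        key = unit["stratum"]
        weighted_counts[key] = weighted_counts.get(key, 0.0) + unit["design_weight"]
    calibrated = []
    for unit in sample:
        g = population_counts[unit["stratum"]] / weighted_counts[unit["stratum"]]
        calibrated.append({**unit, "weight": unit["design_weight"] * g})
    return calibrated


# Hypothetical respondents, post-stratified by legal form.
sample = [
    {"stratum": "private", "design_weight": 250.0, "income": 60000.0},
    {"stratum": "private", "design_weight": 250.0, "income": 40000.0},
    {"stratum": "private", "design_weight": 250.0, "income": 50000.0},
    {"stratum": "company", "design_weight": 40.0, "income": 200000.0},
    {"stratum": "company", "design_weight": 40.0, "income": 300000.0},
]
register_counts = {"private": 900, "company": 100}  # assumed known from the register

calibrated = poststratify(sample, register_counts)
estimated_total_income = sum(u["weight"] * u["income"] for u in calibrated)
```

After the adjustment the weighted number of units in each post-stratum equals the register count, which is the benchmarking property referred to above; calibration to several register totals generalises the same idea.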

In the panel, farms tended to behave in the same way in both years. A farm that belonged to the respondent group in the first year also belonged to it in the next year in 29% - 32% of all cases. The same kind of behaviour was observed for the other response groups: 21% - 24% of the units belonged to item nonresponse and 15% - 16% to nonresponse in both successive years. Each type of change in response behaviour mostly accounted for less than 10% of the panel units. For example, 4% of all units changed from nonresponse in 2004 to response in 2005, and 6% changed from nonresponse in 2005 to response in 2006 (Appendix, Table 2). In the panel part of the 2004 sample, 70% of the farms in the respondent group were also respondents in 2005, and correspondingly 81% of the respondents of 2005 were respondents in 2006. Of the item nonresponse group of 2004, 63% belonged to item nonresponse in 2005, and correspondingly 59% of the item nonresponse group of 2005 belonged to item nonresponse in 2006. Of the nonrespondents of 2004, 71% were nonrespondents in 2005, and correspondingly 65% of the nonrespondents of 2005 were nonrespondents in 2006. About one fifth of the nonrespondents of 2004 were respondents in 2005, and one quarter of the nonrespondents of 2005 were respondents in 2006. The share of respondents changing to nonresponse was between 7.4% and 8.5%.
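The conditional percentages in this paragraph follow from the counts in Table 2 of the Appendix: each count is divided by the total of its first-year group rather than by the whole panel. The sketch below redoes that calculation; the counts are those of Table 2, while the function and variable names are illustrative.

```python
# Counts from Table 2: (status in first year, status in second year) -> N.
counts_2004_2005 = {
    ("response", "response"): 1231, ("response", "item"): 368, ("response", "non"): 148,
    ("item", "response"): 408, ("item", "item"): 1012, ("item", "non"): 181,
    ("non", "response"): 165, ("non", "item"): 98, ("non", "non"): 641,
}
counts_2005_2006 = {
    ("response", "response"): 1353, ("response", "item"): 189, ("response", "non"): 124,
    ("item", "response"): 518, ("item", "item"): 888, ("item", "non"): 98,
    ("non", "response"): 254, ("non", "item"): 99, ("non", "non"): 655,
}


def conditional_shares(counts):
    """Percentage of each first-year group ending up in each second-year group."""
    group_totals = {}
    for (first, _second), n in counts.items():
        group_totals[first] = group_totals.get(first, 0) + n
    return {pair: 100 * n / group_totals[pair[0]] for pair, n in counts.items()}


shares_04_05 = conditional_shares(counts_2004_2005)
shares_05_06 = conditional_shares(counts_2005_2006)

for label, shares in (("2004-2005", shares_04_05), ("2005-2006", shares_05_06)):
    stayed = {group: round(shares[(group, group)]) for group in ("response", "item", "non")}
    print(label, stayed)
```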

The composite data opened new possibilities for research into agricultural economics by allowing detailed analyses of production sectors, decisions on the activities of farms, and changes in the economic situation.

3. Concluding remarks

Recent global food crises have highlighted the importance of efficient agricultural production and increased decision-makers' demand for proper statistics. The new statistical register of agricultural enterprises opens possibilities for more accurate agricultural statistics and research. In future, the yearly statistical registers will form a longitudinal database giving time series on the development of agricultural income, expenses and profit, and data on the whole population offer possibilities for studies of small groups of farmers or small areas. The main part of the AFES was based on register data, the gathering of which was cost-effective. The agricultural survey supported the register-based statistics. The data collection of the survey is being automated by adding the statistical questionnaire to the farmers' bookkeeping software, which decreases the response burden and speeds up the editing processes. The statistical register was used as auxiliary information in the survey, making stratification and calibration possible, and calibration benchmarked the estimated values to the true values. Ratio, regression and multiple imputation were used to complete partially filled questionnaires.

References

Andersson, C., and L. Nordberg (1998), A User's Guide to CLAN 97 - a SAS program for computation of point and standard error estimates in sample surveys, Statistics Sweden, ISBN 91-618-0965-9, Örebro.

Maliniemi, Hannu (2008:1) Quality Report of Agricultural and Forestry Economics Statistics, Statistics Finland, http://tilastokeskus.fi/til/mmtal/2006/mmtal_2006_2008-07-09_laa_001_fi.html (in Finnish)

Maliniemi, Hannu (2008:2) Statistics on the finances of agricultural and forestry enterprises. Web publication of Agricultural Statistics, Statistics Finland (http://tilastokeskus.fi/til/mmtal)

Wallgren, A., Wallgren, B. (2007): Register-based Statistics - Administrative Data for Statistical Purposes. John Wiley & Sons, Ltd.

Väisänen, P. (1995). The sampling design of Finnish Agricultural Income Statistics. Proceedings of the 1995 Kansas State University Conference on Applied Statistics in Agriculture. Kansas State University, Kansas. 8+3

APPENDIX

Table 1 Response, item nonresponse and nonresponse rates (%) by production sector in 2004 - 2006

(R = respondent %, INR = item nonresponse %, NR = nonresponse %)

                           2004               2005               2006
Production sector         R  INR   NR       R  INR   NR       R  INR   NR
Dairy products           41   58    1      38   61    1      33   67    0
Beef products            44   47    8      39   49   12      49   50    1
Pig farming              46   45    8      43   47   10      45   45   10
Poultry                  49   41   11      44   28   28      53   40    7
Other livestock          31   44   25      37   33   30      54   23   23
Cereal production        45   17   38      42   14   43      41   36   23
Other crop production    43   22   36      43   19   38      48   13   39
Other production         39   20   41      40   29   31      39   26   35
All                      43   36   21      41   35   24      48   30   22

Table 2 Response model of the rotating panel

Year 2004         Year 2005              N      %
RESPONSE          RESPONSE           1,231   29.0
RESPONSE          ITEM NONRESPONSE     368    8.6
RESPONSE          NONRESPONSE          148    3.5
ITEM NONRESPONSE  RESPONSE             408    9.6
ITEM NONRESPONSE  ITEM NONRESPONSE   1,012   23.8
ITEM NONRESPONSE  NONRESPONSE          181    4.3
NONRESPONSE       RESPONSE             165    3.9
NONRESPONSE       ITEM NONRESPONSE      98    2.3
NONRESPONSE       NONRESPONSE          641   15.1

Year 2005         Year 2006              N      %
RESPONSE          RESPONSE           1,353   32.4
RESPONSE          ITEM NONRESPONSE     189    4.5
RESPONSE          NONRESPONSE          124    3.0
ITEM NONRESPONSE  RESPONSE             518   12.4
ITEM NONRESPONSE  ITEM NONRESPONSE     888   21.3
ITEM NONRESPONSE  NONRESPONSE           98    2.3
NONRESPONSE       RESPONSE             254    6.1
NONRESPONSE       ITEM NONRESPONSE      99    2.4
NONRESPONSE       NONRESPONSE          655   15.7

Source: the micro data of the statistical register of agricultural enterprises

Table 3 Total income of respondents, item nonresponse and nonresponse by production sector

                        Respondent                Item nonresponse          Nonresponse
Production sector         2004    2005    2006      2004    2005    2006      2004    2005    2006
Dairy products          117321  117344  133992    118151  123571  140691     68802   66023   89764
Beef products            80335   84136  107381     86865  103279  110536     59460   65024   67863
Pig farming             175732  221889  202826    177690  210111  224949    141864  178014  153368
Poultry                 229646  229303  235514    265631  324881  272713    269435  175790  224122
Other livestock          18718   17413   16007     21046   26488   29754     22827   17409   26050
Cereal production        54129   55270   56552     55198   53276   45564     59092   61136   56750
Other crop production    64499   63572   63282     49340   44760   35936     59875   70161   71554
Other production         23771   26742   43127     48862   20425   57124     80622   58064   19944
All                      92032   93421  101704    106849  114872  120517     64799   69694   68589
