Chapter 4 RESEARCH METHODOLOGY
4.2 Data Collection
4.2.2 Data sources
In this study, secondary data are used as the sole basis of the analysis, which is one of the three main uses of secondary data (Emory & Cooper, 1991). When using secondary data, the most important issue is the fit of the data to the research questions (Smith, 2008). This study requires longitudinal data to uncover how the impact of the explanatory variables on RIC changes over time, and secondary data meet this requirement better than primary data. The official documents and statistics used in this thesis have further advantages. Official statistics are permanent, and obtaining them is neither time-consuming nor costly (Bryman & Bell, 2003; Emory & Cooper, 1991). They can also lead to unforeseen discoveries (Saunders, Lewis, & Thornhill, 2003). More importantly, they are feasible for longitudinal studies (Bryman & Bell, 2003; Saunders et al., 2003) and can be used to make powerful comparisons between different groups, societies and nations (Smith, 2008). Moreover, official statistics, such as statistical yearbooks, are not based on samples, so a complete picture can be obtained.
This study investigates the impact of the drivers on RIC from 1991 to 2005. Following the transitional process, the period is divided into two phases. Phase One runs from 1991 to 1998, which corresponds to the fourth stage of the reform. Phase Two runs from 1999 to 2005, which corresponds to the fifth stage. To allow for the time lag between input and output, the output variables cover the period from 1992 to 2009.
Quantitative data, such as official statistics, are the main data used in this study, and qualitative data, such as government documents, supplement the quantitative analysis. The quantitative data for the thesis come from three types of yearbooks:
Patent Statistic Yearbook (PSY) from 1991 to 2009; China Statistic Yearbook (CSY) from 1992 to 2011; and China Statistic Yearbook on Science and Technology (CSYST), from 1992 to 2009. The sources of each variable are listed in Table 4.1.
PSY was retrieved from the website of the State Intellectual Property Office of the P.R.C.[12], and all data on patent counts come from PSY. CSY 1992 to 1995 was retrieved from the China National Knowledge Infrastructure (CNKI) database[13], and CSY 1996 to 2009 was retrieved from the website of the National Bureau of Statistics of China[14]. Data on GDP, population, FDI, imports and exports, and the employment rate were all collected from CSY. Data on funding for S&T, scientists and engineers employed full time, the value of domestic technology contracts, the number of higher education institutions and the number of large and medium-sized enterprises were extracted from CSYST, all editions of which were retrieved from CNKI[15].
Aside from the statistics, qualitative data were also collected. They were used to supplement the quantitative analysis, to assist in uncovering the big picture, and to aid understanding of the results. The main qualitative data for the thesis were government documents (including implemented development plans, policies, laws, and regulations), published development and research reports written during the study period, and related information from newspapers, journals, and industry associations. Government documents guide innovation activities, record history, and may lead to historical changes. Hence, the information from qualitative data will help in understanding the transitional path of innovation development and the changing impact of drivers over time and across the regions.
[12] http://www.sipo.gov.cn/tjxx/
[13] CNKI is an e-library in China that holds information from journals, conferences, newspapers, and statistics published by government departments, across various areas.
[14] http://www.stats.gov.cn/tjsj/ndsj/
Government documents were collected from government websites, particularly those of the Ministry of Science and Technology and the National Development and Reform Commission[16]. Other documents were sourced from newspapers, for instance China Daily, and from industry associations, such as the China Association for Science and Technology.
4.3 Measures
Based on the framework developed above, this section describes how the variables were operationalised for the empirical analysis. The definitions and sources of the variables are summarised in Table 4.1. To enable comparison of regions of vastly different sizes, all financial variables were divided by regional GDP and the other variables were divided by regional population. To make the distributions approximately normal, most metric variables were log-transformed.
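The scaling and transformation described above can be illustrated with a minimal sketch. It rests on assumptions that the text does not state: the logarithm is taken as base 10 (suggested only by the "lg" prefix of the variable names), "per thousand GDP" is read as the amount per 1,000 units of GDP in the same currency unit, and the function names and figures are hypothetical illustrations rather than thesis data.

```python
import math


def log_per_population(count, population, per=1_000_000):
    """Return log10 of `count` per `per` people (e.g. patents per million people)."""
    if count <= 0 or population <= 0:
        raise ValueError("count and population must be positive to take a logarithm")
    return math.log10(count * per / population)


def log_per_gdp(amount, gdp, per=1_000):
    """Return log10 of `amount` per `per` units of GDP (both in the same currency unit)."""
    if amount <= 0 or gdp <= 0:
        raise ValueError("amount and GDP must be positive to take a logarithm")
    return math.log10(amount * per / gdp)


# Hypothetical region-year record; the figures are illustrative, not thesis data.
region = {
    "patent_applications": 5_000,
    "population": 50_000_000,
    "fdi": 2.0e9,
    "gdp": 4.0e11,
}

# Assumed base-10 logarithm; the source only says "in logarithm".
lgPApm = log_per_population(region["patent_applications"], region["population"])
lgFDIpthGDP = log_per_gdp(region["fdi"], region["gdp"])

print(f"lgPApm = {lgPApm:.3f}")            # 100 applications per million people
print(f"lgFDIpthGDP = {lgFDIpthGDP:.3f}")  # 5 units of FDI per thousand units of GDP
```

The sketch raises an error for non-positive values because the logarithm is undefined there; how zero counts were treated in the actual analysis is not stated in the text.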
Table 4.1 Definitions and sources of variables

Variable      Definition                                                                                 Source

Dependent variables
lgPApm        The number of total patent applications per million people (in logarithm)                  PSY: 1992-2006
lgPGpm        The number of overall granted patents per million people (in logarithm)                    PSY: 1992-2009
lgIPGpm       The number of granted invention patents per million people (in logarithm)                  PSY: 1992-2009
lgUMPGpm      The number of granted utility model patents per million people (in logarithm)              PSY: 1992-2009

Independent variables
Innovation actors
lgNHEIpb      Number of higher education institutions per billion people (in logarithm)                  CSY: 1992-2006; CSYST: 1992-2006
lgNLMEpm      Number of large and medium-sized industrial enterprises per million people (in logarithm)  CSY: 1992-2006; CSYST: 1992-2006

Innovation inputs
lgGDPpp       GDP per person (in logarithm)                                                              CSY: 1992-2006
lgFSTpthGDP   Funding for science and technology activities per thousand GDP (in logarithm)              CSY: 1992-2006; CSYST: 1992-2006
lgFTE_SEpm    Full-time employed scientists and engineers per million people (in logarithm)              CSY: 1992-2006; CSYST: 1992-2006
Emprate       Employment rate                                                                            CSY: 1992-2006

Interaction
lgFDIpthGDP   Inward foreign direct investment per thousand GDP (in logarithm)                           CSY: 1992-2006
lgEITpthGDP   Import and export trade per thousand GDP                                                   CSY: 1992-2006
lgVDTCpthGDP  Value of domestic technology contracts per thousand GDP (in logarithm)                     CSYST: 1992-2006
Note: before the logarithmic transformation, the scale of each variable is as follows: lgPApm, lgPGpm, lgIPGpm, and lgUMPGpm are in items per million people; lgNHEIpb is in units per billion people; lgNLMEpm is in units per million people; the remaining variables have no scale, as each is calculated from two indicators originally measured on the same scale.