
Chapter 4 RESEARCH METHODOLOGY

4.2 Data Collection

4.2.2 Data sources

This study uses secondary data as its sole basis, which is one of the three main uses of secondary data (Emory & Cooper, 1991). When using secondary data, the most important issue is the fit of the data to the research questions (Smith, 2008). This study requires longitudinal data to uncover the change in the impact of explanatory variables on RIC, and secondary data serve this requirement better than primary data. The official documents and statistics used in this thesis also have several advantages. Official statistics are permanent, and are neither time-consuming nor costly to obtain (Bryman & Bell, 2003; Emory & Cooper, 1991). They can also result in unforeseen discoveries (Saunders, Lewis, & Thornhill, 2003). More importantly, they are feasible for longitudinal studies (Bryman & Bell, 2003; Saunders et al., 2003) and can be used to make powerful comparisons between different groups, societies and nations (Smith, 2008). Moreover, official statistics, such as statistical yearbooks, are not based on samples, so a complete picture can be obtained.

This study investigates the impact of the drivers on RIC from 1991 to 2005. According to the transitional process, the period can be divided into two phases. Phase One runs from 1991 to 1998, which corresponds to the fourth stage of the reform. Phase Two runs from 1999 to 2005, which corresponds to the fifth stage. To allow for the time lag between input and output, the output variables cover the period from 1992 to 2009.
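The phase boundaries and the output window described above can be expressed as a small helper. This is only an illustrative sketch: the function names `phase_of` and `output_year` are hypothetical, and the lag length is left as a parameter because this section does not state which lag was used.

```python
# Study windows as stated in the text.
INPUT_YEARS = range(1991, 2006)    # 1991..2005 inclusive
OUTPUT_YEARS = range(1992, 2010)   # 1992..2009 inclusive


def phase_of(year: int) -> int:
    """Return 1 for Phase One (1991-1998) or 2 for Phase Two (1999-2005)."""
    if 1991 <= year <= 1998:
        return 1
    if 1999 <= year <= 2005:
        return 2
    raise ValueError(f"{year} is outside the 1991-2005 input period")


def output_year(input_year: int, lag: int) -> int:
    """Pair an input year with the output year observed `lag` years later.

    The lag is a hypothetical parameter; the text does not fix its value.
    """
    if input_year not in INPUT_YEARS:
        raise ValueError(f"{input_year} is outside the 1991-2005 input period")
    year = input_year + lag
    if year not in OUTPUT_YEARS:
        raise ValueError(f"output year {year} is outside 1992-2009")
    return year
```

For example, `output_year(1991, 1)` pairs the first input year with 1992, and `output_year(2005, 4)` pairs the last input year with 2009, the two ends of the stated output window. These lag values are chosen only because they happen to span that window.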

Quantitative data, such as official statistics, are the main data used in this study, and qualitative data, such as government documents, are used to supplement the quantitative analysis. The quantitative data for the thesis come from three types of yearbooks: the Patent Statistic Yearbook (PSY) from 1991 to 2009; the China Statistic Yearbook (CSY) from 1992 to 2011; and the China Statistic Yearbook on Science and Technology (CSYST) from 1992 to 2009. The sources of each variable are listed in Table 4.1.

PSY was retrieved from the website of the State Intellectual Property Office of the P.R.C.[12], and data on patent counts are all from PSY. CSY 1992 to 1995 was retrieved from the database of China National Knowledge Infrastructure (CNKI)[13], and CSY 1996 to 2009 was retrieved from the website of the National Bureau of Statistics of China[14]. Data on GDP, population, FDI, imports and exports, and employment rate were all collected from CSY. Data on funding for S&T, engineers and scientists employed full time, value of domestic technology contracts, the number of higher education institutions and the number of large and medium-sized enterprises were extracted from CSYST, and all CSYST volumes were retrieved from CNKI[15].

Aside from the statistics, qualitative data were also collected. They were used to supplement the quantitative analysis, to assist in uncovering the big picture, and to help in understanding the results. The main qualitative data for the thesis were government documents, including implemented developmental plans, policies, laws, and regulations; published development and research reports written during the study period; and related information from newspapers, journals, and industry associations. Government documents serve as guidelines for innovation activities, record history, and may lead to historical changes. Hence, the information from qualitative data will help in understanding the transitional path of innovation development and the changing impact of drivers over time and across the regions.

[12] http://www.sipo.gov.cn/tjxx/

[13] CNKI is an e-library in China that holds knowledge information in various areas, such as information from journals, conferences, newspapers, and published statistics from government departments.

[14] http://www.stats.gov.cn/tjsj/ndsj/

Government documents were collected from government websites, particularly those of the Ministry of Science and Technology and the National Development and Reform Commission[16]. Other documents were sourced from newspapers, for instance China Daily, and industry associations, such as the China Association for Science and Technology.

4.3 Measures

Based on the framework developed above, this section describes how the variables were operationalised for the empirical analysis. The definitions and sources of the variables are summarised in Table 4.1. To enable comparison of regions of vastly different sizes, all financial variables were divided by regional GDP and other variables were divided by regional population. To bring the distributions closer to normal, most metric variables were log-transformed.
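The scaling and transformation steps can be sketched as follows. This is a minimal illustration, not the procedure actually used in the thesis. It assumes that the "lg" prefix in the variable names denotes a base-10 logarithm and that "per thousand GDP" means the amount per thousand units of regional GDP. The helper names and the regional figures are invented for the example.

```python
import math


def per_million_people(count: float, population: float) -> float:
    """Scale a count to units per million inhabitants of the region."""
    return count * 1_000_000 / population


def per_thousand_gdp(amount: float, gdp: float) -> float:
    """Scale a financial amount to units per thousand units of regional GDP."""
    return amount * 1_000 / gdp


def lg(value: float) -> float:
    """Base-10 logarithm (the base is an assumption, inferred from the 'lg' prefix)."""
    if value <= 0:
        raise ValueError("logarithm is undefined for non-positive values")
    return math.log10(value)


# Hypothetical region; every figure here is invented for illustration only.
region = {
    "patent_applications": 5_000,
    "population": 50_000_000,
    "fdi": 2.0e9,
    "gdp": 1.0e11,
}

# 5,000 applications among 50 million people is 100 per million.
lgPApm = lg(per_million_people(region["patent_applications"], region["population"]))

# FDI of 2e9 against GDP of 1e11 is 20 per thousand units of GDP.
lgFDIpthGDP = lg(per_thousand_gdp(region["fdi"], region["gdp"]))
```

Dividing by population or GDP before taking the logarithm keeps large and small regions on a comparable footing. The guard in `lg` is a reminder that zero counts would need separate handling, which this section does not discuss.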

Table 4.1 Definitions and sources of variables

| Variable | Definition | Source |
|---|---|---|
| Dependent variables | | |
| lgPApm | The number of total patent applications per million people (in logarithm) | PSY: 1992-2006 |
| lgPGpm | The number of overall granted patents per million people (in logarithm) | PSY: 1992-2009 |
| lgIPGpm | The number of granted invention patents per million people (in logarithm) | PSY: 1992-2009 |
| lgUMPGpm | The number of granted utility model patents per million people (in logarithm) | PSY: 1992-2009 |
| Independent variables: innovation actors | | |
| lgNHEIpb | Number of higher education institutions per billion people (in logarithm) | CSY: 1992-2006; CSYST: 1992-2006 |
| lgNLMEpm | Number of large and medium-sized industrial enterprises per million people (in logarithm) | CSY: 1992-2006; CSYST: 1992-2006 |
| Independent variables: innovation inputs | | |
| lgGDPpp | GDP per person (in logarithm) | CSY: 1992-2006 |
| lgFSTpthGDP | Funding for science and technology activities per thousand GDP (in logarithm) | CSY: 1992-2006; CSYST: 1992-2006 |
| lgFTE_SEpm | Full-time employed scientists and engineers per million people (in logarithm) | CSY: 1992-2006; CSYST: 1992-2006 |
| Emprate | Employment rate | CSY: 1992-2006 |
| Independent variables: interaction | | |
| lgFDIpthGDP | Inward foreign direct investment per thousand GDP (in logarithm) | CSY: 1992-2006 |
| lgEITpthGDP | Import and export trade per thousand GDP (in logarithm) | CSY: 1992-2006 |
| lgVDTCpthGDP | Value of domestic technology contracts per thousand GDP (in logarithm) | CSYST: 1992-2006 |

Note: prior to the logarithmic transformation, the scale of each variable is as follows: lgPApm, lgPGpm, lgIPGpm, and lgUMPGpm are in items per million people; lgNHEIpb is in units per billion people; lgNLMEpm is in units per million people; the remaining variables have no scale, as each is calculated from two indicators originally measured on the same scale.

In document Determinants of regional innovation capacity in China (Page 90-94)