CHAPTER 4. Data and Measures
4.2. Variables and Measures
4.2.1. Dependent Variables
The GT/RIETI survey asks respondents whether the patented invention was commercially used and, if not, why it was not. The modes of commercial use asked about are 1) commercialization in a product, process, or service by the applicant/owner; 2) licensing, that is, being “licensed by (one of) the patent-holder(s) to an independent party”; 3) if the patent is licensed, whether the license is part of a cross-license; and 4) commercial exploitation by the respondent or any of the respondent’s co-inventors to start a new company. First, we constructed a variable “any use,” coded 1 if the patented invention was used in any of the three modes: internal commercialization, licensing, or start-up. We coded 0 for observations that explicitly reported that the patented invention was used in none of the three modes. Some respondents answered only some of the three questions, which complicates the identification of nonuse. However, the survey also asked for the reasons a patent was not used. We therefore treat as nonuse those observations that gave reasons for nonuse but reported nonuse in only some of the three use questions. Out of 1,239 complete cases, 657 (53.0%) patents are reported to be used. There may be a lag between the time an invention is completed and the time it is put into actual use, and the length of that lag may vary with industry- or technology-specific factors. All patents in the sample were first filed20 between 2000 and 2003 inclusive, but their grant dates range from 2000 to 2006. We therefore tested whether the rate of actual use differs by filing year or issue year. Unequal-variance t-tests show no significant differences between used and unused patents in either issue year or filing year.21
19
For example, the shares of all types of R&D effort should sum to 100%. We removed observations whose shares summed to neither 99% nor 100%.
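The coding rule for “any use” can be sketched as a small function. This is a minimal sketch under stated assumptions, not the authors' code: the argument names are hypothetical, each use question is represented as True (used), False (explicitly not used), or None (left unanswered), and “reported nonuse in only some of the questions” is read as at least one explicit “not used” answer with the remaining questions blank.

```python
from typing import Optional


def code_any_use(commercialized: Optional[bool],
                 licensed: Optional[bool],
                 startup: Optional[bool],
                 gave_nonuse_reason: bool) -> Optional[int]:
    """Return 1 (used), 0 (not used), or None (cannot be classified).

    Hypothetical encoding: True = used in that mode, False = explicitly
    not used, None = question left unanswered.
    """
    modes = [commercialized, licensed, startup]
    if any(m is True for m in modes):
        return 1  # used in at least one of the three modes
    if all(m is False for m in modes):
        return 0  # explicit nonuse reported for all three modes
    if gave_nonuse_reason and any(m is False for m in modes):
        return 0  # partial nonuse answers plus a stated reason for nonuse
    return None  # not enough information to classify


# Hypothetical respondent: not commercialized in-house, other questions blank,
# but a reason for nonuse was given, so the patent is coded as unused.
print(code_any_use(False, None, None, gave_nonuse_reason=True))
```

Observations returning None would drop out of the complete cases under this reading.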
4.2.2. Explanatory Variables
We operationalize technological maturity using the component familiarity index devised by Fleming (2001). Component familiarity captures the degree to which a patentee is familiar with the technological components used in the patent. The underlying assumption is that as a technology matures (and the population of technological artifacts therefore grows), technological trajectories based on that technology become more foreseeable (Dosi, 1982). Component familiarity, as proposed by Fleming, averages the number of patents previously assigned to the same technology classes as the focal patent, weighting each earlier patent by a knowledge attenuation factor that depends on its temporal distance from the focal patent. Fleming showed empirically that component familiarity has an inverted-U relationship with the uncertainty of a patent's usefulness, as measured by the variation in forward citation counts.
20
More precisely, the filing date of the first priority patent in the triadic family.
21
Chi-square tests show significant differences in issue year by use status (used2; chi-square statistic = 11.84, d.f. = 5, p = .0371). However, the trend is not clearly monotonic: the ratio of commercialization was highest for 2001 (67.44%) and lowest for 2006 (48.21%), while the second highest ratio was for 2005 (56.41%). Therefore, we could not confirm that commercialization is right-truncated in the sample.
To construct this variable, we first count the number of U.S. patents filed from 1976 to 1999 in each patent subclass.
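Component familiarity for patent i is

$$\text{Component familiarity}_i = \frac{1}{N_{C_i}} \sum_{j_c \in C_i} \; \sum_{k \in \{\text{all patents filed from 1976 to 1999}\}} \delta\{\text{patent } k \text{ assigned to subclass } j_c\} \times \text{kattenuation}_k$$

where $C_i$ = {patent subclasses assigned to patent i}, $j_c$ is a patent subclass identifier, $N_{C_i}$ is the number of different patent subclasses assigned to patent i, and $\delta\{\cdot\}$ equals 1 if the condition in braces holds and 0 otherwise.

The knowledge attenuation factor is

$$\text{kattenuation}_k = \exp\left(-\frac{\text{temporal distance of patent } k}{\text{time constant of knowledge loss}}\right)$$

where

$$\text{temporal distance of patent } k = \begin{cases} 4.5 & \text{if patent } k \text{ was filed from 1995 to 1999} \\ 9.5 & \text{if patent } k \text{ was filed from 1990 to 1994} \\ 16.5 & \text{if patent } k \text{ was filed from 1976 to 1989} \end{cases}$$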
Time constant of knowledge loss is set to 5 following Fleming (2001). We rescaled component familiarity by dividing it by 1,000.
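The formula above can be sketched in a few lines. This is an illustrative implementation under assumptions, not the authors' code: the input format (a list of `(filing_year, subclasses)` pairs for the 1976-1999 patents) and all function names are hypothetical. It follows the equation directly: each earlier patent contributes its attenuation factor once to every subclass it is assigned to, and the focal patent's score is the average of those subclass totals over its distinct subclasses, divided by 1,000.

```python
import math
from typing import Dict, Iterable, List, Tuple

TIME_CONSTANT = 5.0  # time constant of knowledge loss, as set in the text


def temporal_distance(filing_year: int) -> float:
    """Temporal distance assigned to an earlier patent k by its filing period."""
    if 1995 <= filing_year <= 1999:
        return 4.5
    if 1990 <= filing_year <= 1994:
        return 9.5
    if 1976 <= filing_year <= 1989:
        return 16.5
    raise ValueError(f"filing year {filing_year} is outside 1976-1999")


def attenuation(filing_year: int) -> float:
    """kattenuation_k = exp(-temporal distance / time constant)."""
    return math.exp(-temporal_distance(filing_year) / TIME_CONSTANT)


def subclass_weights(prior_patents: Iterable[Tuple[int, List[str]]]) -> Dict[str, float]:
    """Sum of attenuation factors of all 1976-1999 patents in each subclass."""
    weights: Dict[str, float] = {}
    for filing_year, subclasses in prior_patents:
        factor = attenuation(filing_year)
        for sc in set(subclasses):  # the indicator counts each patent once per subclass
            weights[sc] = weights.get(sc, 0.0) + factor
    return weights


def component_familiarity(focal_subclasses: List[str], weights: Dict[str, float]) -> float:
    """Average attenuated count over the focal patent's distinct subclasses, rescaled by 1/1,000."""
    distinct = set(focal_subclasses)
    total = sum(weights.get(sc, 0.0) for sc in distinct)
    return total / len(distinct) / 1000.0


# Hypothetical data: three earlier patents and a focal patent in subclasses "A" and "B".
w = subclass_weights([(1997, ["A"]), (1992, ["A", "B"]), (1980, ["B"])])
print(component_familiarity(["A", "B"], w))
```

Precomputing the subclass totals once keeps the per-patent calculation cheap when scoring many focal patents against the same 1976-1999 population.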
Complex technology areas are identified using the survey. The GT/RIETI survey asks inventors “how many domestic patents are jointly used in the commercial application of the invention,” with eight response categories: 1, 2 to 5, 6 to 10, 11 to 50, 51 to 100, 101 to 500, 501 to 1,000, and more than 1,000 patents. We took the median value of each response category, averaged these values within each of 30 technology subgroups, and call the result “technological complexity.” A dichotomous variable, complexity of product technology, is then coded 1 if the technological complexity of the subgroup to which the focal patent belongs exceeds the median technological complexity across subgroups. The complex technologies classified in this way include information technology, semiconductors, telecommunication, electronics, biotechnology, and chemical engineering. The non-complex (or discrete) technologies include textiles, pharmaceuticals, agriculture and food, construction, and transportation. This classification is consistent with the previous literature (Cohen, Nelson, and Walsh, 2000; Kusunoki, Nonaka, and Nagata, 1998; Reitzig, 2004) but, we believe, more evidence-based.
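The construction of the complexity dummy can be sketched as follows. This is a hedged illustration, not the authors' procedure: the midpoints assigned to each response category and the subgroup names are hypothetical, and because the top category (“more than 1,000”) is open-ended, the value 1,000 used for it here is only a placeholder assumption.

```python
from statistics import mean, median
from typing import Dict, List

# Hypothetical representative (median) values for the eight response categories.
# The open-ended top category has no true median; 1,000 is a placeholder.
CATEGORY_VALUE = {
    "1": 1.0, "2-5": 3.5, "6-10": 8.0, "11-50": 30.5,
    "51-100": 75.5, "101-500": 300.5, "501-1000": 750.5, ">1000": 1000.0,
}


def complexity_by_subgroup(responses: Dict[str, List[str]]) -> Dict[str, float]:
    """Average category value of all responses within each technology subgroup."""
    return {group: mean(CATEGORY_VALUE[c] for c in cats)
            for group, cats in responses.items()}


def complex_dummy(complexity: Dict[str, float]) -> Dict[str, int]:
    """1 if a subgroup's complexity exceeds the median across subgroups, else 0."""
    cutoff = median(complexity.values())
    return {group: int(value > cutoff) for group, value in complexity.items()}


# Hypothetical survey responses for three subgroups.
scores = complexity_by_subgroup({
    "semiconductors": ["101-500", "51-100"],
    "textiles": ["1", "2-5"],
    "telecommunication": ["11-50", "501-1000"],
})
print(complex_dummy(scores))
```

With 30 subgroups (an even number), `statistics.median` averages the two middle values, so exactly half of the subgroups fall above the cutoff unless there are ties.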
Following Hall and Ziedonis (2001) and Ziedonis (2004), we measure capital intensity as the deflated book value (in constant 2000 U.S. dollars) of property, plant, and equipment divided by the number of employees. To smooth yearly fluctuations and reduce missing values, we use a three-year running average centered on the filing year of the focal patent. The main data sources are COMPUSTAT North America–Fundamentals Annual and COMPUSTAT Global–Fundamentals Annual.22 For a few firms, we obtained the data directly from their websites. About a quarter of the firms in the sample are private or foreign firms whose financial information is not available in COMPUSTAT or the alternative sources mentioned above. These observations are flagged with a dummy variable, “dummy for missing capital intensity.”
22
We use consolidated financial reports. Therefore, many subsidiaries in our sample are represented by their parent companies, whose financial information is available.
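A three-year centered running average of capital intensity might look like the sketch below. The input layout (fiscal year mapped to nominal net PP&E, a price deflator with 2000 = 1.0, and employee count) is a hypothetical assumption, as is the choice to average the yearly ratios over whichever of the three years are available; the text does not specify whether ratios or their components are averaged.

```python
from typing import Dict, Optional, Tuple


def capital_intensity(firm_years: Dict[int, Tuple[float, float, float]],
                      filed_year: int) -> Optional[float]:
    """Centered three-year average of deflated PP&E per employee.

    firm_years maps fiscal year -> (nominal net PP&E in $M,
    price deflator with 2000 = 1.0, number of employees). Hypothetical layout.
    """
    ratios = []
    for year in (filed_year - 1, filed_year, filed_year + 1):
        record = firm_years.get(year)
        if record is None:
            continue  # averaging over available years reduces missing values
        ppe, deflator, employees = record
        if employees > 0:
            ratios.append(ppe / deflator / employees)  # constant-2000 $M per employee
    if not ratios:
        return None  # would be flagged by the missing-capital-intensity dummy
    return sum(ratios) / len(ratios)


# Hypothetical firm with data for 2000 and 2001 only; focal patent filed in 2001.
ci = capital_intensity({2000: (100.0, 1.0, 1000), 2001: (110.0, 1.1, 1000)}, 2001)
print(ci)
```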
4.2.3. Controls
We control for the area of technology, the technological value of the focal patent, the nature of the invention (product vs. process), the initial purpose of the research that led to the patented invention, the proportion of R&D effort devoted to basic research, the technological breadth of the patent as measured by the number of different technology classes assigned to the focal patent, the number of inventors listed in the patent document, the number of independent claims in the patent document, and the logarithm of the age of the invention at the time of the survey, measured in months from the filing date of the patent to the start date of the completed survey.
Technological assets
We use patent stock as a proxy for a firm's technological assets. Patent stock is calculated from the number of granted U.S. patents that are assigned to the first assignee of the focal patent and were filed before the filing year of the focal patent. Patent stock of firm i for a focal patent filed in year t is:
where d represents the constant rate of knowledge depreciation, which is set to 15% following previous studies (Grimpe and Hussinger, 2008; Hall, 1990).
As with capital intensity, subsidiary firms are consolidated into their ultimate parents. The patent stocks of merged and acquired firms are likewise consolidated into the surviving firm. We use the PATSTAT database (April 2008 version) compiled by the EPO. PATSTAT offers two advantages for this purpose. First, it provides relational tables and an SQL interface to the bibliographic information on U.S. patents, which makes data extraction much easier than with other available data sources. Second, it provides standardized assignee ID numbers that resolve many small differences in spelling. We further cleaned the data by manually checking and correcting the list of assignees in our sample.
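A depreciated patent stock with parent consolidation can be sketched as below. The exact formula is an assumption here: the sketch uses the common declining-balance form, stock(i, t) = sum over filing years s < t of (1 - d)^(t - 1 - s) × patents(i, s), with d = 0.15. The record layout and the `parent_of` mapping from subsidiary or acquired-firm IDs to ultimate parents are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

DEPRECIATION = 0.15  # d, as set in the text


def patent_stock(patents: Iterable[Tuple[str, int]],
                 parent_of: Dict[str, str],
                 firm: str,
                 year: int,
                 d: float = DEPRECIATION) -> float:
    """Depreciated count of granted patents filed before `year` by `firm`.

    patents: (assignee_id, filing_year) pairs for granted U.S. patents.
    parent_of: hypothetical map from subsidiary/acquired assignee IDs to the
    ultimate parent; IDs not in the map are treated as their own parent.
    """
    counts: Dict[int, int] = defaultdict(int)
    for assignee, filed in patents:
        if parent_of.get(assignee, assignee) == firm and filed < year:
            counts[filed] += 1
    # Declining-balance weighting: patents filed in year t-1 get weight 1.
    return sum(n * (1 - d) ** (year - 1 - filed) for filed, n in counts.items())


# Hypothetical records: "A-sub" is a subsidiary of "A"; the 2001 patent is too late to count.
records = [("A", 1999), ("A-sub", 1998), ("B", 1999), ("A", 2001)]
print(patent_stock(records, {"A-sub": "A"}, firm="A", year=2000))
```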
External knowledge flows
The GT/RIETI survey asks how important various knowledge sources were in either suggesting or completing the research that led to the patented invention. Importance is measured on a six-point Likert scale, with 0 for “did not use,” 1 for “not important,” and 5 for “very important.” The sources listed are scientific and technical literature, patent literature, fairs or exhibitions, technical conferences and workshops, standard documents, universities, government research organizations, customers or product users, suppliers, competitors, and others. Responses in the others category include consultants, education, and experience. We classified six items (patent literature, fairs or exhibitions, standard documents, customers or product users, suppliers, and competitors) as “industrial knowledge” and the remaining four items (scientific literature, technical conferences and workshops, universities, and government research organizations) as “public knowledge.”
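The text does not state how item scores are combined into the two indices, which range from 0 to 1 in Table 4.2. One plausible construction, offered only as an assumption, is the mean score of the items in each group divided by the scale maximum of 5, with unanswered items treated as 0 (“did not use”). The item keys below are hypothetical labels.

```python
from typing import Dict, Sequence

# Hypothetical item keys for the two groups defined in the text.
INDUSTRIAL = ("patent literature", "fairs or exhibitions", "standard documents",
              "customers or product users", "suppliers", "competitors")
PUBLIC = ("scientific literature", "technical conferences and workshops",
          "universities", "government research organizations")


def knowledge_index(scores: Dict[str, int], items: Sequence[str]) -> float:
    """Mean 0-5 importance score of `items`, rescaled to 0-1 (assumed construction).

    Items missing from `scores` are treated as 0 ("did not use"), an assumption.
    """
    values = [scores.get(item, 0) for item in items]
    return sum(values) / (5 * len(values))


# Hypothetical respondent.
answers = {"patent literature": 5, "suppliers": 4, "competitors": 3, "universities": 5}
print(knowledge_index(answers, INDUSTRIAL), knowledge_index(answers, PUBLIC))
```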
External collaboration
Networks are well known to affect the outcomes of innovation (Afuah, 2000; Dyer and Nobeoka, 2000; Gulati and Higgins, 2003; Gulati and Sytch, 2007; Powell, Koput, and Smith-Doerr, 1996; Rothaermel and Deeds, 2004; Shan, Walker, and Kogut, 1994). The network literature consistently finds that firm performance is positively associated with R&D collaboration (Powell, Koput, and Smith-Doerr, 1996; Rothaermel and Deeds, 2004), networking with suppliers (Dyer and Nobeoka, 2000; Gulati and Sytch, 2007), and the quality of networks (Powell, Koput, and Smith-Doerr, 1996; Uzzi, 1996). The GT/RIETI survey asks whether the focal patent was developed with inventors who belong to external organizations and whether it was developed through formal or informal collaboration with external organizations. The survey lists eight distinct categories of external organizations (suppliers, customers and product users, competitors, non-competitors within the same industry, other firms, universities, government research organizations, and hospitals) plus an “other” category. We construct a collaboration dummy variable coded 1 for inventions with any external collaborators.
Inventor in manufacturing unit
In Teece's framework, complementary assets affect the choice of commercialization mode at three points. If the invention does not require complementary assets, the inventor commercializes it immediately. When the invention does require complementary assets for commercialization, the degree of specialization of those assets and their ownership come into play. Empirically, it is hard to assess whether a particular invention requires complementary assets for its commercialization and how specialized those assets must be. We therefore assume that every invention requires some type of downstream asset, such as a manufacturing facility, and that those assets are somewhat co-specialized. An invention from a manufacturing unit is already coupled, or ready to be coupled, with downstream co-specialized assets. The GT/RIETI survey asks which organizational unit the inventor belongs to. The variable “Inventor in manufacturing unit” is coded 1 if the inventor belongs to a manufacturing unit and 0 if the inventor belongs to an R&D unit (either an independent unit or a sub-unit attached to a non-R&D function), software development, sales and marketing, or another unit.
R&D for base technology
This variable captures the business purpose of the invention. Using the survey, we code this variable 1 if the reported purpose of the research is “enhancing the technology base of the firm or the long-term cultivation of technology seeds.”
Proportion of basic R&D
This variable is a proxy for the position of the invention on the basic-applied spectrum. The survey asks the inventor what percentage of the research effort was devoted to basic research; the other categories presented are “applied research,” “design and/or development,” and “technical services.”
Technological value of patents
In our survey, we ask the inventor to assess the technical significance of the invention relative to other technical developments in the field during the year the focal patent was applied for. We code 4 for the top 10%, 3 for the top 25% (but not top 10%), 2 for the top 50% (but not top 25%), and 1 for the bottom half.
Number of inventors
We control for the number of inventors listed in the U.S. patent publication.
Type of innovation
Product innovation has been observed to differ in some respects from process innovation (Cohen, Nelson, and Walsh, 2000). We therefore control for the type of innovation with a variable, “product innovation,” constructed from the survey. The reference category comprises process innovations and mixed innovations that combine product and process elements.
Number of claims
We control for the scope of the patent by including the number of independent claims. Each claim may be regarded as an independent patent (Tong and Frame, 1994)23; thus, the number of claims is commonly used to measure the breadth of utility or applicability of a patent. Under U.S. patent law, there are two types of claims: independent claims and dependent (including multiple dependent) claims. While an independent claim stands alone, a dependent claim refers back to a previously set forth claim and specifies a particular embodiment or further limitation of the invention (35 U.S.C. 112). Because of this distinction, counting dependent claims may not properly reflect the technological scope of inventions, or may reflect it only fractionally. We therefore count only independent claims: we treat any claim that refers to another claim as dependent and subtract such claims from the total number of claims. We take the natural logarithm of the resulting count, assuming diminishing marginal effects.
23
In U.S. patent infringement cases, infringing any single claim of a patent constitutes infringement of the patent.
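The rule “any claim that refers to another claim is dependent” can be approximated with a regular expression over the claim texts. This is a sketch under an assumption: that a reference takes the form “claim N” or “claims N” (as in “The method of claim 1” or “as in claims 2 or 3”). The pattern is illustrative, not the authors' exact rule, and real claim texts may need more patterns.

```python
import math
import re
from typing import List

# Assumed pattern: a claim is dependent if it mentions "claim <number>" or "claims <number>".
CLAIM_REFERENCE = re.compile(r"\bclaims?\s+\d+", re.IGNORECASE)


def count_independent_claims(claims: List[str]) -> int:
    """Total claims minus claims that refer to another claim."""
    dependent = sum(1 for text in claims if CLAIM_REFERENCE.search(text))
    return len(claims) - dependent


def log_independent_claims(claims: List[str]) -> float:
    """Natural log of the independent-claim count (undefined if the count is 0)."""
    return math.log(count_independent_claims(claims))


# Hypothetical claim set with two independent and two dependent claims.
example_claims = [
    "A method comprising steps A and B.",
    "The method of claim 1, wherein step A is performed under heat.",
    "A device comprising a sensor and a controller.",
    "The device as in claims 2 or 3, further comprising a display.",
]
print(count_independent_claims(example_claims))
```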
Age of invention
The mode of use may vary with how long the invention has been out and publicly disclosed. The variable “age of invention” measures the number of months elapsed between the filing date of the patent and the time of the survey.
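Age of invention in months, and its logarithm as used among the controls, can be sketched as follows. The assumption here is that age is counted in whole calendar months, ignoring the day of the month; the dates in the example are hypothetical.

```python
import math
from datetime import date


def age_in_months(filed: date, survey_start: date) -> int:
    """Whole calendar months from the filing date to the survey start date.

    Assumption: the day of the month is ignored.
    """
    return (survey_start.year - filed.year) * 12 + (survey_start.month - filed.month)


def log_age(filed: date, survey_start: date) -> float:
    """Natural log of age in months, as entered in the models."""
    return math.log(age_in_months(filed, survey_start))


# Hypothetical patent filed in March 2001 and surveyed in January 2007.
print(age_in_months(date(2001, 3, 15), date(2007, 1, 10)))
```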
Industry dummies
We distinguish six industries using the OST/INPI/ISI nomenclature24 based on the International Patent Classification (IPC).
24
This nomenclature, which focuses on industry characteristics, is widely used, especially among European researchers. It was developed and updated by three European research institutes: the Observatoire des Sciences et des Techniques (OST), the Institut National de la Propriété Industrielle (INPI), and the Fraunhofer Institute for Systems and Innovation Research (ISI).
Table 4.2 Variables and descriptions (N = 1,239)
Variable                             Mean    Std. Dev.  Min  Max     Data source
Any commercialization                0.530   0.499      0    1       Survey
Explanatory variables
Component familiarity (/1000)        0.087   0.159      0    2.489   USPTO
Capital intensity (M$/employee)      0.073   0.118      0    1.086   COMPUSTAT
Dummy for missing capital intensity  0.262   0.440      0    1       COMPUSTAT
Controls
Large firm (employees > 500)         0.859   0.348      0    1       Survey & Patent
Ln(patent stock)                     5.466   2.753      0    9.865   PATSTAT
Inventor in manufacturing unit       0.084   0.277      0    1       Survey
Industrial knowledge                 0.268   0.189      0    1       Survey
Public knowledge                     0.266   0.208      0    1       Survey
Dummy for collaboration              0.293   0.455      0    1       Survey & Patent
Technological value                  2.211   1.069      1    4       Survey
No immediate demand                  0.224   0.417      0    1       Survey
% Basic R&D (/100)                   0.082   0.176      0    1       Survey
Product invention                    0.513   0.500      0    1       Survey
Man-month (normalized)               0.182   0.229      0    1       Survey
Number of inventors                  2.796   1.911      1    16      Patent
Complexity of technology (# USPC)    4.431   3.535      1    30      Patent
Number of claims                     22.826  15.689     1    181     Patent
Age of invention (months)            68.873  12.029     37   92      Patent
Electrical engineering               0.256   0.437      0    1       Patent
Instruments                          0.209   0.407      0    1       Patent
Chemistry, pharmaceuticals           0.237   0.426      0    1       Patent
Process eng., special equipment      0.136   0.343      0    1       Patent
Mechanical eng., machinery           0.134   0.341      0    1       Patent