IV. Drivers of scientists’ entry into new fields
2 Literature Review
3.1 Dependent variables: entry into new scientific fields
We define entry into new research fields by first constructing, for each researcher, an indicator that takes the value one whenever a scientist publishes in a field in which she has not published before,100 and follows up with at least one more publication in that field in subsequent years.101 This requirement means that a new field entry must achieve a minimum level of ‘stickiness’, which excludes one-time excursions into another field.102
To do so, we use historical publication data from Thomson ISI dating back to 1971 in order to accurately depict the fields in which each scientist has published in every year. The taxonomy of scientific fields defined in WoS has 247 unique fields, shown in Table IV.4 in the Appendix. The fields cover all domains of science (sciences, social sciences, arts & humanities) and are attributed to journals and other publications. The researchers in our sample publish in 216 of these fields throughout their careers at the University of Leuven.103
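The entry indicator defined above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used to build the data set: the function name `sticky_field_entries` and the `(year, field)` input format are hypothetical, and we assume that footnote 101 means that fields appearing in a researcher's first publication year are never counted as entries.

```python
from collections import defaultdict


def sticky_field_entries(publications):
    """Return {year: set of fields newly entered in that year}.

    publications: list of (year, field) tuples for ONE researcher
    (hypothetical input format, chosen for illustration).
    """
    # Collect every year in which the researcher published in each field.
    years_by_field = defaultdict(set)
    for year, field in publications:
        years_by_field[field].add(year)

    # Assumption: fields of the first publication year are not entries.
    first_year = min(year for year, _ in publications)

    entries = defaultdict(set)
    for field, years in years_by_field.items():
        entry_year = min(years)
        if entry_year == first_year:
            continue
        # 'Stickiness': at least one more publication in a later year.
        if max(years) > entry_year:
            entries[entry_year].add(field)
    return dict(entries)


pubs = [
    (1990, "Microbiology"),
    (1992, "Cell Biology"),
    (1993, "Ecology"),       # one-time excursion, not counted
    (1994, "Cell Biology"),  # follow-up that makes the 1992 entry 'sticky'
]
entries = sticky_field_entries(pubs)
```

The yearly indicator equals one in every year that appears as a key of the returned dictionary, and the number of new fields in a year is the size of the corresponding set.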
Since some fields have many sub-classifications (e.g. chemistry is subdivided into analytical, applied, inorganic & nuclear) while others do not (e.g. microbiology is not further subdivided), using the complete taxonomy would imply that almost all entries into a subfield are classified as a ‘new field entry’. To deal with this and with remaining imbalances in the WoS taxonomy that may affect our measure of field entry, i.e. some fields may be narrower than others and/or may differ less from neighbouring fields, we construct an alternative dependent variable. For each scientist, we define in each year t the maximum dissimilarity or distance between her main field (the field in which she published most before year t) and all new fields that she enters in year t.
98 The construction of the data set is described in Kelchtermans & Veugelers (2005).
99 In order to retrieve each researcher’s publication history, we have performed searches by name and affiliation at the University of Leuven. For the latter, we used the Organization-Enhanced field provided by Thomson Reuters, which links all possible variations of an organisation’s name found in WoS publications. The search term ‘KU Leuven’ captures all the different names used in publications to refer to the University of Leuven; we also added the search term ‘University Hospital Leuven’ to retrieve all publications by scientists affiliated to the university’s medical departments.
100 We take into account the following types of publications: ‘Article’, ‘Article; Book Chapter’, ‘Article; Proceedings Paper’, ‘Letter’, ‘Meeting Abstract’, ‘Note’, ‘Proceedings Paper’, ‘Review’.
101 The first-ever publication while at KU Leuven is not counted as a new field entry.
102 This restriction also partially addresses the concern that a field entry may be due to co-authorships. While this is not a problem per se, it exaggerates new field entry for the focal scientist if the co-author simply supplies her own expertise without there being much interaction between co-authors.
103 Researchers’ publication records were retrieved from WoS using name and, to avoid false matches due to homonyms, affiliation. While we took into account spelling variations in both names and affiliation, the retrieved publications are restricted to those where the scientist mentions her University of Leuven affiliation. Although this may imply incomplete coverage of publication records, mobility of researchers is known to be very limited, especially in the time period we consider, and it is reasonable to assume that the publication records are generally accurate.
We first calculate a yearly cosine similarity matrix between fields based on cross-citations between journals and their respective scientific fields attributed by Thomson Reuters.104 Dissimilarity between each field pair is then defined as (1-similarity), and pairs of fields which are completely dissimilar have a distance value of one.105 We finally calculate our dependent variable as the maximum dissimilarity or distance between each researcher’s main field prior to year t and all fields that she publishes in for the first time in t (and at least once more afterwards).
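The distance measure can be illustrated as follows. This is a sketch under stated assumptions rather than the exact computation behind our variable: we assume each field is represented by its row of citation counts to all fields in a given year, the function names are hypothetical, and the distance is set to zero in years without a new field entry (an assumption that the text does not state explicitly).

```python
import numpy as np


def cosine_distance_matrix(cross_citations):
    """Pairwise (1 - cosine similarity) between field citation profiles.

    cross_citations: square array for one year; row i holds the citation
    counts from field i to every field (assumed representation).
    """
    c = np.asarray(cross_citations, dtype=float)
    norms = np.linalg.norm(c, axis=1)
    norms[norms == 0] = 1.0  # all-zero profiles get similarity 0 to everything
    unit = c / norms[:, None]
    return 1.0 - unit @ unit.T


def max_distance_to_new_fields(distance, main_field, new_fields):
    """Largest distance between the main field and the fields entered in t."""
    if not new_fields:
        return 0.0  # assumption: no entry in year t means distance zero
    return max(distance[main_field, f] for f in new_fields)


# Toy citation matrix for three fields (made-up numbers).
citations = [[10, 0, 0],
             [0, 5, 5],
             [0, 0, 8]]
dist = cosine_distance_matrix(citations)
close_entry = max_distance_to_new_fields(dist, main_field=1, new_fields=[2])
far_entry = max_distance_to_new_fields(dist, main_field=1, new_fields=[0, 2])
```

In the toy example, fields 1 and 2 share part of their citation profile, so entering field 2 alone yields a distance well below one, while additionally entering field 0, which shares nothing with field 1, raises the maximum distance to one.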
Overall, the yearly average number of fields a researcher publishes in is 4.44 in the 1992-2001 period, while the average maximum number of fields per scientist and per year is 7.43. Three quarters of researchers enter new scientific fields at least once in the ten-year period, and the average maximum distance to new fields is 0.28, indicating that exploration happens mostly in closely related fields. However, the standard deviation of the maximum distance variable is 0.42, indicating a fair amount of variation between scientists.
The left panel of Table IV.1 shows the variation in entry rates and in the number of new fields; the right panel shows the variation in the maximum distance and in the sum of distances to new fields. The between-variation shows how much average behaviour differs from one scientist to another, while the within-variation captures how much an individual scientist's behaviour changes over time. We observe that, overall, both the entry indicators and the distances to new fields vary more across time than between scientists.
104 For example, the most similar scientific fields in 1992 were ‘Remote Sensing’ and ‘Imaging Science & Photo Technology’, with a cosine similarity value of 0.692. The least similar in the same year (with a non-zero cosine value) were ‘Astronomy & Astrophysics’ and ‘Biochemistry & Molecular Biology’. In 2002, ‘Cell Biology’ and ‘Biochemistry & Molecular Biology’ were most similar, while ‘Astronomy & Astrophysics’ and ‘Health Policy & Services’ were farthest apart.
Table IV.1 Variation in field entry rates and distances from main fields

                              Mean  s.d.                                Mean  s.d.
Entry in new fields  overall  0.30  0.46     Maximum distance  overall  0.28  0.42
                     between        0.27                       between        0.25
                     within         0.39                       within         0.36
Sum of new fields    overall  0.55  1.07     Sum distances     overall  0.64  1.71
                     between        0.60                       between        0.86
                     within         0.91                       within         1.50
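For readers unfamiliar with the overall/between/within split reported in Table IV.1, the sketch below shows one common convention for computing it on panel data. The convention is an assumption on our part (between = standard deviation of scientist-level means, within = standard deviation of deviations from each scientist's own mean); the function name and the toy panel are hypothetical and do not reproduce the figures in the table.

```python
import numpy as np


def variation_decomposition(panel):
    """Overall, between and within standard deviations of a panel variable.

    panel: dict mapping a scientist id to her list of yearly values
    (hypothetical input format).
    """
    series = [np.asarray(v, dtype=float) for v in panel.values()]
    all_values = np.concatenate(series)
    # Between: spread of the scientist-level averages.
    person_means = np.array([s.mean() for s in series])
    # Within: spread of each observation around the scientist's own mean.
    deviations = np.concatenate([s - s.mean() for s in series])
    return {
        "overall": all_values.std(ddof=1),
        "between": person_means.std(ddof=1),
        "within": deviations.std(ddof=1),
    }


# Toy entry indicators for two scientists over four years.
toy_panel = {"a": [0, 1, 0, 1], "b": [0, 0, 0, 0]}
sd = variation_decomposition(toy_panel)
```

A within value larger than the between value, as in Table IV.1, indicates that a scientist's behaviour differs more from year to year than average behaviour differs across scientists.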
Looking more closely at entry patterns per main discipline in Table IV.6 in the Appendix, we notice that Agriculture and Biology researchers are, on average, more prone to enter new fields in the observation period.106 Researchers in these disciplines publish in new fields 40.3% and 37% of the time respectively, while, for example, the rate among mathematicians is 19%. Engineers and agriculture researchers have the highest average number of new fields entered per scientist and per year, at 0.76 and 0.79 respectively. Similarly, scientists with agriculture or biology as main discipline stray farthest from their main fields when entering new ones, followed by engineers.
As the large standard deviations in Table IV.1 already indicate, there is still substantial heterogeneity within disciplines. As an example, Figure IV.1 displays the cumulative number of field entries of four bioscientists who, to improve comparability, all belong to the post-1992 entry cohort. Different patterns occur: no variation at all, and modest or strong increases in the stock of field entries, at both low and high levels of output.107
106 The classification of disciplines was developed by the Centre for Research & Development Monitoring (ECOOM) from the University of Leuven, independently from the Web of Science field classification. It distinguishes among 13 main disciplines, shown in Table IV.5 in Appendix. A researcher’s main discipline is defined as the one in which she has the majority of her publications. Including it in the analysis allows controlling for both the fact that it may be ‘easier’ to enter new fields due to imbalances in the field taxonomy, as well as for different propensities between disciplines to enter new fields, e.g. due to scientific opportunities.
107 Note that scientists enter the analysis from the moment they become professors at KU Leuven. Since we track
Figure IV.1 Cumulative number of new field entries for 4 bioscientists