Boston University
OpenBU http://open.bu.edu
University Libraries BU Libraries Scholarly Papers and Presentations
2017-10
Evaluating and promoting open
data practices in open access
journals
https://hdl.handle.net/2144/24792
Evaluating and Promoting Open Data Practices in Open Access Journals
Eleni Castro, Mercè Crosas, Alex Garnett, Kasey Sheridan, and Micah Altman
The last decade has seen a dramatic increase in attention from the scholarly communications and research community to open access (OA) and open data practices. These are potentially related because journal publication policies and practices both signal disciplinary norms and provide direct incentives for data sharing and citation. However, there is little research evaluating the data policies of OA journals. In this study we analyse the state of data policies for OA journals by employing random sampling of the Directory of Open Access Journals and Open Journal Systems journal directories and applying a coding framework that integrates both previous studies and emerging taxonomies of data sharing and citation. This study reveals, for the first time, the low prevalence of data-sharing policies and practices in OA journals, which differs from the findings of previous studies of commercial journals in specific disciplines.
Keywords: open access, open data, data sharing, data citation
Introduction
With the open access (OA) movement celebrating its fifteenth anniversary in 2017, and open data (OD) moving closer to becoming an established research practice, we now have sufficient information to observe how these two worlds intersect when it comes to the data-sharing and citation practices of OA journals. In this study we look at the prevalence of data-sharing policies and analyse the data-sharing and citation characteristics of OA journals. We review previous studies on journal data policies from various disciplines, along with related efforts from the research data community, scholarly societies, funders, and flagship journals to help understand the prevalence of policies and best practices for data sharing and citation. While a number of studies have analysed data-sharing policies and practices in scholarly journals (mostly commercial) within specific disciplines, little is known about the overall prevalence and characteristics of OA journals’ data policies. We evaluate the state of such policies by employing random sampling of the Directory of Open Access Journals (DOAJ) and Open Journal Systems (OJS) journal directories and then applying to that sample a coding framework that integrates both previous studies and emerging taxonomies of data sharing and citation.
The Increasing Importance of Open Access and Open Data
Since the 2002 Budapest Open Access Initiative, which gave a name to making published research freely available online, the number of OA peer-reviewed journals listed in the DOAJ has grown to over 9419 (as of March 2017). According to Peter Suber, director of the Harvard Open Access Project and one of the de facto leaders of the OA movement, ‘OA makes knowledge a public good in practice’ and allows researchers to share knowledge and accelerate research without the economic and sharing restrictions put in place by the commercial publishing model.1 Many recent studies have shown clear incentives for authors to choose OA publishing over the traditional commercial model. For example, Wagner’s 2010 annotated bibliography lists thirty-nine studies where researchers found a significant citation advantage for OA articles.2 Since then, in 2015, Scholarly Publishing and Academic Resources Coalition (SPARC) Europe listed on its website seventy studies, of which forty-nine observed a citation advantage for articles published in OA journals from various disciplines.3 Furthermore, given the continuous rise in subscription costs for scholarly journals,4 it is becoming increasingly important for OA journals to help make research more widely accessible so that scholarly communication will thrive in the twenty-first century.
In parallel with the OA movement, the desire to have OD5 as a tool for improving scholarly communication and research has also increased in importance over the past decade. The increased importance of OD is particularly evident in recent government agency, funder, and scholarly society mandates for researchers to make their publicly funded data openly available for other researchers to access and reuse.6 Moreover, OA scholarly societies such as SPARC include OD in their current policy priorities. SPARC’s 2017 plan discusses the need to ‘promote partnerships that leverage resources to sustain crucial infrastructure supporting Open Access, Open Data, and OER [Open Educational Resources].’7 More broadly, advocates of OD practices claim that they promote transparency, innovation, and efficiency in the public and private sectors. However, despite widespread support for data sharing, recent research has found that most academic researchers are not making their research data available to others and that more direct incentives are thus needed to encourage data sharing.8
Given the overall lack of strong data-sharing policies for scholarly journals that require authors to submit data with their article, OA journals can play a critical role in helping researchers openly publish the research data associated with their articles.9 Over fifteen years ago, OA journals started a paradigm shift in publishing, and since they are already the best advocates for making research articles publicly available, they can do the same by pushing for OD practices. A data policy introduced in 2014 by the Public Library of Science (PLOS), one of the most influential OA publishers, provides a clear example of this potential role for OA journals. PLOS’s data policy directly connects the OA mission to the sharing of data: ‘Access to research results, immediately and without restriction, has always been at the heart of PLOS’ mission and the wider Open Access movement. However, without similar access to the data underlying the findings, the article can be of limited use’ (emphasis original).10 Given the potential for OA to support OD with an alignment of interests, it is important to evaluate the current state of data sharing in OA journals.
In the next two sections we review key elements in the data policies of journals that were identified in previous studies and we explain our sampling method and coding framework for evaluating the data policies of OA journals. In the sections to follow we present the results from coding our journal sample and review some exemplary OD practices in place elsewhere. Finally, we interpret the findings from our own sample and point readers to some resources that provide guidance on promoting OD.
Key Characteristics of Formal Data Citation and Sharing Policies
Multiple studies have characterized and analysed the data-sharing and citation policies of (mostly) commercial journals in specific domains and disciplines. These studies share a number of key elements that can be assembled into a discipline-agnostic rubric to evaluate the data policies of OA journals.
Life and Environmental Sciences
In the biomedical domain, several studies have examined journal policies related to data sharing and citation. From data collected in 2006, Piwowar and Chapman looked at high-level characteristics of data-sharing policies (i.e., policy was absent, weak, or strong) in journals that primarily publish articles on gene expression profiling, and their policies on sharing microarray data.11 At the time of their study, they found that only 6 per cent of journals had an OA publishing model, so most of the journals they analysed were commercial. Their findings showed that the prevalence of mandatory data sharing was quite low, even for journals with very strict sharing requirements (persistent identifier or accession number made available prior to publication). Writing some years later, in 2012, Stodden, Guo, and Ma found that some journals in bioinformatics and life sciences were 1) making their requirements stricter by requiring data sharing as a condition for publication (barring exceptions) and 2) including a policy for sharing code, which would help with verifiability and allow others to reuse the data or replicate the results more easily.12 With regard to OA journals, the authors opined that, with such small changes to OA policy from 2011 to 2012, OA did not appear to be driving changes in data and code sharing policies. A more recent study by Vasilevsky and co-authors in 2016, which used a rubric adapted from Stodden, Guo, and Ma, confirmed the results of earlier research, namely, that only a small number of biomedical journals required data sharing. They also found that OA journals ‘were not more likely to require data sharing than subscription journals’ and that most data-sharing policies lacked any specific guidance on how to make data available and reusable.13
With respect to incentives for authors, evidence that OD policies boosted citation was found by Piwowar and Vision in their 2013 study, in which they concluded that, at least for gene expression microarray data, there was a robust citation benefit from OD that had steadily increased from 2003.14
As for the environmental sciences, in a 2010 study, Weber, Piwowar, and Vision looked for the presence of data-sharing and citation policies in journals and discovered that some journals were also explicitly indicating where researchers should deposit their data, as well as offering peer-review guidelines. However, they found that an overwhelming majority of journals (seven out of every eight) ‘fail to provide explicit directions for sharing and citing data.’15 The authors concluded that funding agencies and journals could encourage researchers to share more if they required data submission as a condition for publication, provided researchers with some guidelines or best practices, and, most important, made researchers aware of the benefits of sharing, such as increased citation rates.
Social Sciences
Similarly, studies of data-sharing and citation practices in disciplines of the social sciences have looked at the prevalence of relevant policies in scholarly journals. However, unlike similar studies of other disciplines, these studies have also focused on the presence of replication policies, which, in addition to transparency, permit verifiability of published research.16
A study by Gherghina and Katsanidou in 2013 found, however, that only 18 of 120 political science and international relations journals had such a policy of replication.17 They also found that while many journals had mandatory data-sharing policies, not many of them provided specific guidelines for when and where to deposit data for long-term preservation and access. However, most of the journals they considered did provide authors with guidelines on what they should make available (raw data, documentation, code, etc.). In addition, a 2016 study by Key found that the strongest predictor of data availability was whether a journal had a policy mandating that data and/or code be made publicly available at the time of publication.18
In the field of economics, building on the work of US economist B. D. McCullough, Vlaeminck (2013) found that out of 141 journals, 20.6 per cent had a data availability policy (only one of these was OA) and even fewer (7.8 per cent) had a replication policy.19 In addition to studying the extent of such policies, Vlaeminck also looked at the quality of the available policies. The author found that the majority of journals with policies had one similar to journals published by the American Economic Association; those journals make data submission mandatory (whenever possible) and specify what data and files should be submitted to the journal prior to publication. Furthermore, although some journals had a replication policy, none had a dedicated replication section in the journal that would encourage authors to put in the effort required to provide replication data for their articles.
Data and Methods
For our own study we wanted to look at the prevalence of data policies in OA journals in particular, but not OA journals in any one academic domain. We thus used the DOAJ as a sampling source since it is an actively maintained and well-established OA journal index with clear inclusion criteria. Once we defined our population of OA journals, we conducted a simple random sampling of all scholarly journals in the DOAJ, removing any that were predatory,20 theoretical, non-empirical, or non-English. As a comparison to see if we would find a similar prevalence in policies from a different sampling source, we also did a parallel random sampling of all active21 journals using OJS as their journal management system. Approximately 10,000 OA journals worldwide use OJS.22 We gathered our samples between January and May 2015, with a targeted follow-up in March 2017.
We did a test of our initial sample of journals and made some exclusions based on three criteria. As shown in Table 1, a relatively small percentage were potentially predatory and/or non-empirical, but the initial OJS sample yielded a substantially larger percentage of non-English journals, reflecting the popularity of the OJS system internationally, and especially in the Global South. We eliminated journals that were non-empirical, non-English, or predatory and drew additional random samples to obtain a set of fifty randomly selected journals, stratified by database (twenty-five journals from the DOAJ and twenty-five journals from OJS) that met all selection criteria. All further analysis was performed using this set.
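The screening and redrawing procedure described above can be sketched in a few lines. This is a minimal illustration and not the authors' actual tooling: the `Journal` record, its field names, and the seed are all assumptions, and the three boolean fields stand in for judgments that a human coder made by hand.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Journal:
    # All fields are hypothetical stand-ins for what a coder records by hand.
    name: str
    source: str      # "DOAJ" or "OJS"
    predatory: bool
    empirical: bool
    english: bool


def eligible(journal):
    """A journal is kept only if it passes all three screening criteria."""
    return journal.empirical and journal.english and not journal.predatory


def draw_stratum(directory, target, rng):
    """Visit one directory in random order, keeping eligible journals
    until `target` of them have been collected (or the list runs out)."""
    order = list(directory)
    rng.shuffle(order)
    kept = []
    for journal in order:
        if eligible(journal):
            kept.append(journal)
        if len(kept) == target:
            break
    return kept


def stratified_sample(directories, per_stratum=25, seed=0):
    """Draw `per_stratum` eligible journals from each directory."""
    rng = random.Random(seed)
    return {source: draw_stratum(journals, per_stratum, rng)
            for source, journals in directories.items()}
```

Walking a shuffled list and skipping ineligible entries is equivalent to drawing further random journals whenever one is excluded, so each stratum ends up as a simple random sample of its eligible journals.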
Table 1. Distribution of selection criteria characteristics in the initial OA journal sample that warranted exclusion

Characteristic            DOAJ   OJS
Questionable/predatory    16%    0%
Non-empirical             20%    18%
Non-English language      24%    60%

For our coding framework we included the data source, journal name, journal homepage HTTP Uniform Resource Identifier (URI), field of study, and whether the journal was questionable/predatory, in the English language, and empirical in its published research. We manually reviewed each journal’s website looking for relevant guidance by checking for submission guidelines, journal policies, author guidelines, and similar terms; if we discovered no guidance or reference on the journal website, we coded the journal as having no data policy. To identify relevant sections on the website, we searched for the terms ‘data,’ ‘citation/cite,’ ‘share,’ ‘sharing,’ ‘replication,’ ‘reproducible,’ ‘repository,’ ‘supplemental materials,’ and ‘supplemental data.’
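As an illustration of the keyword screen, the sketch below flags page text that contains any of the search terms listed above. The review in the study was manual; the function names here are hypothetical, and splitting ‘citation/cite’ into two separate terms is an assumption.

```python
import re

# Search terms quoted in the text; "citation/cite" is split into two terms.
SEARCH_TERMS = [
    "data", "citation", "cite", "share", "sharing", "replication",
    "reproducible", "repository", "supplemental materials",
    "supplemental data",
]


def matched_terms(page_text):
    """Return the search terms that occur as whole words or phrases."""
    text = page_text.lower()
    found = []
    for term in SEARCH_TERMS:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text):
            found.append(term)
    return found


def needs_manual_review(page_text):
    """Flag a page for a human coder if any search term appears."""
    return bool(matched_terms(page_text))
```

Whole-word matching keeps unrelated words such as ‘excited’ from triggering ‘cite’, at the cost of missing variants such as ‘datasets’; a screen meant only to direct a coder's attention could reasonably use plain substring matching instead.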
To code the strength of data-sharing policies, we adapted the five-point scale of Stodden, Guo, and Ma,23 and in addition we recorded whether a non-required but explicit policy actively encourages data sharing. We applied this same scale to measure the strength of data citing policies. To allow for comparison of our study with other data-sharing studies, we measured additional characteristics of data-sharing policies, including whether the place of deposit is specified (for comparability with Weber, Piwowar, and Vision24), when data sharing is required (for comparability with Gherghina and Katsanidou25), and whether there are exemptions from the data policy (for comparability with Vlaeminck26). For comparison with the broadly accepted Joint Declaration of Data Citation Principles (see Altman and Crosas27), we measured additional characteristics of citation policies, including recommended/required location of data citation, recommended/required elements of data citation, and presence of example data citations. To support replication and analysis, these coded data and the full list of coding measures are permanently archived and available through the Harvard Dataverse, a public data repository.28
Results
The representation of our fifty sampled OA journals across academic domains is depicted in Figure 1. The journals were distributed across all the sciences, but the domain of health science showed the greatest concentration.
In Figure 2 we summarize the frequency of journals with a data-sharing policy (left panel) and with a data-citation policy (right panel) across the sample. A number of patterns emerge from this frequency. The vast majority of OA journals sampled (74 per cent) do not have any data policy, not even an implied one. Furthermore, only 6 per cent of these journals require data sharing. Moreover, the journals’ policies on data citation are even weaker, as data citation is discussed by only 4 per cent of the journals sampled and never explicitly required.
To detect differences between our sampling sources, we compared the proportion of journals with any data policy in the DOAJ and OJS subsets. Data policies were approximately 25 per cent more frequent in DOAJ journals than in OJS journals. This difference was only marginally statistically significant (p < 0.10) and should only be considered suggestive. We conjecture that the difference may be due to the greater proportion of international journals in the OJS database.
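The text does not name the statistical test behind this comparison. One common choice for two samples of twenty-five journals each is Fisher's exact test on the 2x2 table of source by policy presence; the sketch below implements it with the standard library only. The choice of test is an assumption made for illustration, as is the function name.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two sampling sources; columns are "has a policy" and
    "has no policy". The p-value sums the probabilities of all tables
    with the same margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k policy journals in row 1.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small tolerance so ties with the observed table are counted.
    return sum(p for p in (prob(k) for k in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))
```

The function would be called with the four cell counts, for example the number of DOAJ journals with and without any data policy followed by the same two counts for OJS; those counts come from the coded data set and are not reproduced here.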
Table 2 provides more detail on the specific elements of journal data policies. The vast majority of policies we found in journals did not include requirements more specific than a general assurance of data availability. The most common specific policy details included an example of data citation (14 per cent) and specification of the place of deposit (8 per cent).
For comparison with previous studies, we aggregated data-sharing policies into three categories: strong (a stated policy with any requirements), weak (any explicit or implied recommendations or referrals to the area, lacking specific requirements), and none. We then compared the proportion of OA journals in our sample with previous samples from five prior studies (Figure 3). Finally, we conducted a targeted follow-up in March 2017 for each of the journals in our samples and evaluated the 2017 policy using only this three-level coding (Figure 4).
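The three-level aggregation can be written down as a small rule. The sketch below assumes each coded journal has been reduced to two yes/no judgments; these two flags and the function names are hypothetical simplifications of the fuller coding scheme, which is archived with the data.

```python
from collections import Counter


def policy_strength(has_requirement, has_recommendation):
    """Collapse a coded policy into the three comparison categories.

    has_requirement:    the journal states a policy with at least one
                        requirement
    has_recommendation: the journal makes an explicit or implied
                        recommendation or referral, without specific
                        requirements
    """
    if has_requirement:
        return "strong"
    if has_recommendation:
        return "weak"
    return "none"


def strength_shares(coded_journals):
    """Return the share of journals in each category as fractions."""
    counts = Counter(policy_strength(req, rec) for req, rec in coded_journals)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no journals to summarize")
    return {level: counts[level] / total
            for level in ("strong", "weak", "none")}
```

Checking for a requirement first means that a journal with both a requirement and additional recommendations is counted once, as strong.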
Figure 1. Distribution by fields in our OA journal sample
Figure 2. Distributions of data-sharing policies (left) and data-citation policies (right) in the OA journals sample
Several surprises are in evidence from these comparisons. First, the overall level of strong data sharing in this sample is smaller than in most other studies with samples including commercial and/or area-specific journals. Second, OA journals in this sample are less likely to have strong policies than are the commercial and area-specific journals previously studied. Third, policies do not show signs of increasing in strength over time in our sampled journals.
Comparison with Flagship Policies
As a point of comparison with the OA journals we sampled, we also identified and reviewed some exemplary data policies from flagship journals and major publishers (OA and commercial), disciplinary associations, scholarly societies, and funders of research grants.29
Table 2. Most frequent specific elements in journals’ data policies

Element of policy                          Percentage of sample (n = 50)
Citation: example citations provided       14%
Sharing: place of deposit specified        8%
Sharing: deposit is required               6%
Citation: persistent identifier required   6%
Sharing: replication data required         4%

Figure 3. Comparison of policy strength in the sample of OA journals, where strong sharing is indicated in blue and weak sharing in light red
Figure 4. Change in policy strength from 2015 to 2017

Several notable disciplinary associations and scholarly societies have put together helpful guidelines for journals wanting to adopt strong data policies. For example, in 2014 the American Political Science Association (APSA) worked with publishers to jointly publish ‘Data Access and Research Transparency (DA-RT): A Joint Statement by Political Science Journal Editors’ to improve the quality of data-sharing, citation, and replication guidelines for authors submitting data to political science journals.30 In addition, the APSA Section for Qualitative and Multi-Method Research has created its own website to address research transparency in qualitative research.31 One discipline-agnostic example is the Transparency and Openness Promotion (TOP) Guidelines32 (partly inspired by APSA’s DA-RT33), which were developed in 2015 by a group of journal editors and the Center for Open Science to help ensure the availability as well as the reproducibility of research published in scholarly journals. As of March 2017, 757 journals and 63 organizations have become TOP signatories, many of them from OA publishers like BioMed Central and Ubiquity Press. The guidelines offer three levels of transparency, from milder/entry-level to stronger requirements for authors. For data citation in particular, both the FORCE11 Joint Declaration of Data Citation Principles (JDDCP),34 discussed by Altman and co-authors,35 and DataCite36 provide guidance on community-driven data-citation practices that are both human understandable and machine actionable, requiring both a persistent identifier for the data set and a minimum amount of metadata to allow for attribution and reuse. In addition, several of the authors behind the JDDCP are currently working on a publisher-agnostic road map (now in preprint) with detailed instructions to help with implementing JDDCP-compliant data citation.37
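To make the idea of a persistent identifier plus minimum metadata concrete, the sketch below checks a data citation for completeness and renders it as a reference string. The five elements chosen and the layout of the string are assumptions loosely modelled on common repository practice, not a statement of what the JDDCP or DataCite mandate; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    creators: str      # e.g. "Surname, Given; Surname, Given"
    year: int
    title: str
    publisher: str     # the repository or archive holding the data
    identifier: str    # a persistent identifier such as a DOI


REQUIRED = ("creators", "year", "title", "publisher", "identifier")


def missing_elements(citation):
    """List the required elements that are empty or absent."""
    return [name for name in REQUIRED if not getattr(citation, name)]


def format_citation(citation):
    """Render a human-readable citation; DOIs become resolvable links."""
    gaps = missing_elements(citation)
    if gaps:
        raise ValueError("incomplete data citation: " + ", ".join(gaps))
    identifier = citation.identifier
    if identifier.startswith("10."):
        # A bare DOI is turned into a URL so that machines can follow it.
        identifier = "https://doi.org/" + identifier
    return (f"{citation.creators} ({citation.year}). {citation.title}. "
            f"{citation.publisher}. {identifier}")
```

A journal could use a check like `missing_elements` at submission time to prompt authors for whichever element is absent, which covers the machine-actionable half of the guidance while the formatted string covers the human-readable half.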
From the OA publisher perspective, several OA journals (all of which appear to focus on the sciences) have stronger data-sharing policies than those of journals in our sample. PLOS, as previously mentioned, has had a data policy in place since 2014, stipulating that authors must provide a ‘data availability statement’ to be published with the accepted article. While this does not, strictly speaking, require deposition of data in a publicly available repository, PLOS recommends data repositories and stresses that refusal to share data is grounds for rejection. Since 2013, BioMed Central has had an OD policy that requires authors to apply a Creative Commons CC0 waiver to all published data in their articles,38 which ensures that data are easier for other researchers to reuse. GigaScience, an OA, OD, and open peer-review journal,39 has a data policy requiring authors to deposit their data in a publicly accessible repository, such as GigaDB, and requiring that any data cited in their article follow the JDDCP guidelines. Another noteworthy example is F1000Research, an open publishing platform that provides transparent refereeing of articles, which is unique among all the journals we reviewed in providing a specific list of requirements for data repositories that host data linked to any of its articles. Similar to the aforementioned exemplary journals, F1000Research requires that data be made available and includes detailed guidelines for data set submission.40
From the flagship commercial journals we reviewed, we found a few exemplary data policies. In the sciences domain, the publisher Nature (now Springer Nature41) has had a mandatory data-sharing policy since 2013, requiring that authors make their materials, data, code, and associated protocols promptly available to readers. In 2016, following the JDDCP, Nature introduced an updated policy with mandatory data citation, which encourages including a persistent identifier (a digital object identifier, or DOI) for the data set and the minimum information recommended by DataCite.42 In contrast, Elsevier’s research data policy, mindful of the challenges of sharing and making data accessible, does not make OD mandatory for publication but does encourage OD practices.43 In the social sciences, several flagship journals have had long-standing data-sharing policies. In the field of economics, the American Economic Review, and by extension any of the journals from the American Economic Association, has a data availability policy in which authors are required to make their data available to reviewers.44 Several political science journals take it one step further by mandating that authors share data not only with reviewers but also with the journal’s readers. For example, the journal Political Analysis requires that authors make replication materials (data, code, and documentation) publicly available in the Harvard Dataverse prior to publication and that authors appropriately cite all ‘original and archival’ data (with citation examples given).45 The American Journal of Political Science goes even further than most journals by having a replication and verification policy, which, as part of the publication workflow, states that all articles submitted must be replicable and will be verified by a third party to ensure this requirement is met prior to publication.46 The American Journal of Political Science also provides very detailed guidelines for what files and documentation authors should include to ensure that a study can be properly replicated.
In parallel, many funders have instituted strong data policies for research they fund or have put an emphasis on awarding grants to projects that look to make science more open. One notable example is the Laura and John Arnold Foundation, which awards Research Integrity grants47 to help support transparency, reproducibility, and rigorous research standards. Such grants have helped organizations such as the Center for Open Science push for more transparent and open research practices. More recently, the Bill and Melinda Gates Foundation updated its OA policy48 to include making any underlying data openly available immediately with no embargo, as of 1 January 2017, and many of their data sets are already shared through the Harvard Dataverse. From the federal funder perspective, the National Institutes of Health have had a data-sharing policy since 2003, a decade before the US Office of Science and Technology Policy (OSTP) memo, and they continue to strengthen it while also providing further guidance for researchers on what data they should share depending on the kind of research they produce (e.g., genomic data sharing).49
Discussion
Given the revolutionary nature of the OA movement, which strives to make all research outputs open,50 the data-sharing policies of the OA journals we sampled are surprisingly weak. In comparison with the commercial journals examined in earlier studies of data-sharing policies, OA journals are at best no more likely to have a policy, and much less likely to have a strong one.
There seems to be a stark contrast between the desire for openness of published results and the desire for openness of process and evidence. Approximately three-fourths of the OA journals we looked at have no data-sharing policies at all, even an implied one. Only 6 per cent have a formal requirement. Data-citation policies are even weaker, and are rarely mentioned explicitly. We observed that the policies in place lack specificity and do not provide guidance to the researcher for sharing data, including where to deposit, how to cite, and, where applicable, how to ensure the data can be replicated. According to McCullough, this is paradoxical: when one considers the OA movement’s ‘emphasis on making articles readily available, one would think that OA journals also would want to make data and code readily available.’51
What could be some of the possible reasons for such a low prevalence of data policies in OA journals? Excluding PLOS and some other exceptions, OA journals may lack the resources or backing of older established journals and publishers, which would be helpful in pushing for strong data requirements from authors. In addition, some OA journals already have an article processing charge, so requiring data sharing would put an additional responsibility on authors. A few studies of commercial journals have found that the lower the Impact Factor of a journal, the less likely a journal is to have a data-sharing requirement.52 In a 2015 study of data policies for commercial and OA economics journals, Vlaeminck and Hermann lucidly noted that, in most cases, journals with strong data policies were among the top journals in their discipline and so could afford to implement such guidelines, ‘while a medium or low-ranked journal planning to implement a DAP [data availability policy] could see a reduction in the amount of submissions it receives.’53 Given the relatively young age of most OA journals, it may take more time for them to establish a reputation.
Resources to Help OA Journals Adopt Data Policies
Conjectures aside, there are some notable resources for OA journals wishing to implement data-sharing policies. Although not specifically aimed at OA journals, several current projects and initiatives from the scholarly community at large are actively working to develop best practices and guidelines on data sharing, data citation, and replication.
For example, the previously mentioned and widely endorsed TOP Guidelines provide discipline-agnostic instructions for journals wanting to adopt a data-sharing or replication policy.54 OA journals can also use the exemplary data-sharing policies of the previously mentioned flagship journals and publishers such as PLOS, American Journal of Political Science, and Springer Nature. For guidance on data-citation policy, journals should look at GigaScience, Springer Nature, and Political Analysis, which provide exemplary policy text on their respective websites. For further examples of citation policies, the FORCE11 JDDCP website has a list of many publisher signatories.55
OA journals also require resources to enable them to enforce such policies. Journals using OJS (v.2+) as their journal management system can set controls to deposit their data in the Harvard Dataverse,56 which is open to any scholar regardless of institutional affiliation. Through OJS, journals can use the Dataverse plug-in, which adds a data deposition step to the article submission workflow.57 This plug-in automatically submits, via SWORD API, research data associated with a journal article to a Dataverse repository and links the data back to the journal article.58 Boilerplate data policies have also been included in OJS to help journals get started.59
Journals can also directly partner with data archives and curated repositories, which provide services for data management, curation, and/or verification for replication. Given the large and growing number of data repositories, journals can use re3data.org, a large registry of data repositories, to help find (by content type or subject) a suitable archive that can help with managing research data. Some publishers, such as PLOS and Springer Nature, have also compiled lists of recommended repositories, which are recognized and trusted within their respective communities, divided into domain-specific and generalist data repositories.60
In addition to data management support, some data archives also provide curation services, such as ICPSR61 for social science data (including codebooks and documentation) and Dryad62 for data files associated with any published article in the sciences or medicine, as well as software scripts and other files. For data-verification services, one notable example is the American Journal of Political Science’s commitment to verifying and guaranteeing the reproducibility of analytical results using author-submitted replication materials. This replication policy has led to arrangements with the University of North Carolina’s Odum Institute for Research in Social Science to verify quantitative data and the staff at the Qualitative Data Repository at Syracuse University to verify qualitative analyses. Alternatively, if journals lack the resources to support this kind of verification, Key describes a different option, which involves members of the scholarly community providing voluntary verification of the data sets that interest them.63 This community-based model also provides an opportunity for students to learn through replication.
Extensions to Research
Since the original research we have reported here is preliminary in nature, further research is needed to develop more actionable conclusions. Such research could inform international scholarly societies and the OA community at large in coming up with creative ways to incentivize OA journals to start implementing strong data policies.
The design of our study only enabled sub-group comparisons between DOAJ and OJS journals. Conducting studies with larger sample sizes would allow sub-group analysis of characteristics such as discipline, age of the journal, and peer-review policy to determine if such characteristics of OA journals are associated with stronger data sharing. Since our sample covered 2015 to 2017, a longitudinal study could follow up in a few years’ time to check for any positive change in how OA journals (in the DOAJ and OJS) are implementing data policies. In addition, an analysis could be done of journals with existing strong data-sharing policies to see if they are enforcing these policies by looking for the presence of data citations in their published articles.
Summary
Our preliminary research has shown surprisingly weak adoption of data policies by OA journals (excluding notable exceptions not in our sample such as PLOS, BioMed Central, and GigaScience). There are, however, many freely available tools and resources for OA journals that would like to institute data-sharing, citation, and replication policies. In addition, there are plenty of opportunities to expand on the research we have done by focusing on particular characteristics of OA journals as they correlate with having a data policy.
Acknowledgement and Author Statement
The writing of this report was supported by awards from the Sloan Foundation. We describe our contributions to the paper following a standard taxonomy (see L. Allen et al., ‘Credit Where Credit Is Due,’ Nature 508 [2014]: 312–3). Altman and Castro were the lead authors, taking responsibility for content and revisions. Both lead authors contributed to the conception of the report (including core ideas and statement of research questions), to the methodology, and to the writing through critical review and commentary. Altman authored the first draft of the manuscript and was responsible for the initial conceptualization and for the data analysis. Sheridan provided data collection and management. All authors contributed to review and commentary.
Eleni Castro is OpenBU and ETD Program Librarian at Boston University. Eleni has broad experience in technology and a passion for improving access to knowledge. She is an advocate for open data, open-source software, and open access practices that help increase interoperability and accessibility, while decreasing the digital divide.
Mercè Crosas is the Chief Data Science and Technology Officer at the Institute for Quantitative Social Science (IQSS) at Harvard University. Together with the Director of IQSS, she leads the vision and strategic direction of all software projects at IQSS, including the Dataverse project for data sharing and archiving, the Zelig project for statistical analysis, and the Consilience project for text analysis. Her team includes data science specialists who offer training and consulting, as well as information scientists specializing in usability, data curation, and data management who provide expertise on these areas to all IQSS data projects.
Alex Garnett is Research Data Management & Systems Librarian at Simon Fraser University. His responsibilities include implementing and overseeing use of the university research data management system; working with faculty on creating data management plans to comply with new mandates and best practices; maintaining an open-source repository stack; collaborating with external partners (Harvard’s Institute for Quantitative Social Science and Stanford University’s mediaX) on related initiatives; and liaising with University IT on behalf of University Archives.
Kasey Sheridan is Research & Acquisitions Librarian at SunTrust. She received her MLIS from Simmons College in 2015. She previously worked as a research intern for the MIT Program on Information Science, focusing on open data and open access.
Micah Altman is Director of Research and Head/Scientist of the Program on Information Science for the MIT Libraries at the Massachusetts Institute of Technology. Dr Altman previously served as a Non-Resident Senior Fellow at The Brookings Institution, as the Associate Director of the Harvard-MIT Data Center at Harvard University, as Archival Director of the Henry A. Murray Archive, and as Senior Research Scientist at the Institute for Quantitative Social Science. Dr Altman conducts work primarily in the fields of social science, information privacy, information science and research methods, and statistical computation, focusing on the intersections of information, technology, privacy, and politics. He also works on the dissemination, preservation, reliability, and governance of scientific knowledge.
Notes
1. P. Suber, ‘Open Access to Research,’ Open Society Foundations – Voices, September 12, 2012, https://www.opensocietyfoundations.org/voices/opening-access-research; P. Suber, Open Access (Cambridge, MA: MIT Press, 2012), http://bit.ly/oa-book.
2. A. B. Wagner, ‘Open Access Citation Advantage: An Annotated Bibliography,’ Issues in Science and Technology Librarianship (Winter 2010), doi:10.5062/F4Q81B0W.
3. The Scholarly Publishing and Academic Resources Coalition (SPARC) Europe, ‘Setting the Default to Open: Open Access Citation Advantage (OACA) List,’ accessed May 23, 2015, http://sparceurope.org/what-we-do/open-access/sparc-europe-open-access-resources/open-access-citation-advantage-service-oaca/oaca-list/.
4. T. A. Chavez, ‘Numeracy: Open-Access Publishing to Reduce the Cost of Scholarly Journals,’ Numeracy 3, no. 1 (2010): article 8, doi:10.5038/1936-4660.3.1.8.
5. According to the Open Knowledge Foundation’s Open Data Handbook (2015), ‘open data’ are defined as data that are open to others’ inspection, without any restrictions on reuse. See https://okfn.org/about/our-impact/handbook/.
6. P. Jones, ‘Are We at a Tipping Point for Open Data?,’ The Scholarly Kitchen (blog), March 18, 2015, https://scholarlykitchen.sspnet.org/2015/03/18/are-we-at-a-tipping-point-for-open-data/.
7. SPARC, ‘2017 SPARC Program Plan,’ accessed September 1, 2017, https://sparcopen.org/who-we-are/program-plan/.
8. B. Fecher, S. Friesike, and M. Hebing, ‘What Drives Academic Data Sharing?,’ PLOS ONE 10, no. 2 (2015): e0118053.
9. S. Gherghina and A. Katsanidou, ‘Data Availability in Political Science Journals,’ European Political Science 12, no. 3 (2013): 333–49; B. D. McCullough, ‘Open Access Economics Journals and the Market for Reproducible Economic Research,’ EAP: Economic Analysis and Policy 39, no. 1 (2009): 117–26.
10. Liz Silva, ‘PLOS’ New Data Policy: Public Access to Data,’ PLOS Blogs (blog), February 24, 2014, http://blogs.plos.org/everyone/2014/02/24/plos-new-data-policy-public-access-data-2/.
11. H. A. Piwowar and W. W. Chapman, ‘A Review of Journal Policies for Sharing Research Data’ (paper, ELPUB 2008 Conference on Electronic Publishing, Toronto, Canada, June 2008), http://precedings.nature.com/documents/1721/version/1/files/npre20081721-1.pdf.
12. V. Stodden, P. Guo, and Z. Ma, ‘How Journals Are Adopting Open Data and Code Policies’ (presentation, International Association for the Study of the Commons, Louvain-la-Neuve, Belgium, September 12, 2012), http://hdl.handle.net/10535/9584.
13. N. A. Vasilevsky, J. Minnier, M. A. Haendel, and R. E. Champieux, ‘Reproducible and Reusable Research: Are Journal Data Sharing Policies Meeting the Mark?,’ PeerJ Preprints 4 (November 2016): e2588v1, doi:10.7287/peerj.preprints.2588v1.
14. H. Piwowar and T. J. Vision, ‘Data Reuse and the Open Data Citation Advantage,’ PeerJ Preprints 1 (October 2013): e1v1, doi:10.7287/peerj.preprints.1v1.
15. N. M. Weber, H. A. Piwowar, and T. J. Vision, ‘Evaluating Data Citation and Sharing Policies in the Environmental Sciences,’ Proceedings of the Association for Information Science and Technology 47, no. 1 (November 2010): 1–2.
16. G. King, ‘Replication, Replication,’ PS: Political Science & Politics 28, no. 3 (1995): 444–52; McCullough, ‘Open Access Economics Journals’; J. Ishiyama, ‘Replication, Research Transparency, and Journal Publications: Individualism, Community Models, and the Future of Replication Studies,’ PS: Political Science & Politics 47, no. 1 (2014): 78–83.
17. Gherghina and Katsanidou, ‘Data Availability in Political Science Journals.’
18. E. M. Key, ‘How Are We Doing? Data Access and Replication in Political Science,’ PS: Political Science & Politics 49, no. 2 (2016): 268–72, doi:10.1017/S1049096516000184.
19. S. Vlaeminck, ‘Data Management in Scholarly Journals and Possible Roles for Libraries – Some Insights from EDaWaX,’ LIBER Quarterly 23, no. 1 (2013): 48–79.
20. We used Beall’s list of predatory publishers to flag journals as predatory initially and then confirmed those through case-by-case analysis. Some of the predatory journals we found in 2014 may no longer be listed by the DOAJ today given its recent house-cleaning efforts to remove any journals that did not meet its stricter criteria. See http://www.nature.com/news/open-access-website-gets-tough-1.15674.
21. We define ‘active’ as an OJS journal having published at least ten articles within the last year.
22. J. Alperin, K. Stranack, and A. Garnett, ‘On the Peripheries of Scholarly Infrastructure: A Look at the Journals Using Open Journal Systems’ (presentation, 21st International Conference on Science and Technology Indicators, València, Spain, September 14–16, 2016), http://summit.sfu.ca/item/16763.
23. Stodden, Guo, and Ma, ‘How Journals Are Adopting Open Data and Code Policies.’
24. Weber, Piwowar, and Vision, ‘Evaluating Data Citation and Sharing Policies.’
25. Gherghina and Katsanidou, ‘Data Availability in Political Science Journals.’
26. Vlaeminck, ‘Data Management in Scholarly Journals.’
27. M. Altman and M. Crosas, ‘The Evolution of Data Citation: From Principles to Implementation,’ IASSIST Quarterly 37, no. 1–4 (2013): 62–70.
28. The record for our study archived with Harvard Dataverse has the following identifier: doi:10.7910/DVN/Y3WOOE.
29. This is not an exhaustive or comprehensive list of what exemplary data policies are currently out there. The Journal Research Data Policy Bank (JoRD) is working toward developing such a database. See L. Naughton and D. Kernohan, ‘Making Sense of Journal Research Data Policies,’ Insights 29, no. 1 (2016): 84–9, doi:10.1629/uksg.284.
30. A. Lupia and C. Elman, ‘Openness in Political Science: Data Access and Research Transparency: Introduction,’ PS: Political Science & Politics 47, no. 1 (2014): 19–42, doi:10.1017/S1049096513001716.
31. APSA Qualitative Transparency Deliberations, https://www.qualtd.net/.
32. Center for Open Science, ‘TOP Guidelines,’ accessed August 25, 2017, https://cos.io/top/.
33. The DA-RT-informed TOP Guidelines, with which the authors agree, are available at http://www.dartstatement.org/2015-cos-top-guidelines. For an alternative view, see J. C. Isaac, ‘For a More Public Political Science,’ Perspectives on Politics 13, no. 2 (2015): 269–83.
34. Force11, ‘Joint Declaration of Data Citation Principles — Final,’ accessed August 26, 2017, https://www.force11.org/group/joint-declaration-data-citation-principles-final.
35. M. Altman, C. Borgman, M. Crosas, and M. Martone, ‘An Introduction to the Joint Principles for Data Citation,’ Bulletin of the Association for Information Science and Technology 41, no. 3 (2015): 43–5.
36. DataCite, ‘Cite Your Data,’ accessed August 26, 2017, https://www.datacite.org/cite-your-data.html.
37. H. Cousijn et al., ‘A Data Citation Roadmap for Scientific Publishers,’ bioRxiv Preprint Server (January 2017), doi:10.1101/100784.
38. I. Hrynaszkiewicz, S. Busch, and M. J. Cockerill, ‘Licensing the Future: Report on BioMed Central’s Public Consultation on Open Data in Peer-Reviewed Journals,’ BMC Research Notes 6 (2013): 318, doi:10.1186/1756-0500-6-318.
39. GigaScience, ‘Instructions to Authors,’ accessed August 26, 2017, https://academic.oup.com/gigascience/pages/instructions_to_authors#PreparingSupportingInformation.
40. F1000Research, ‘Data Guidelines,’ accessed August 26, 2017, https://f1000research.com/for-authors/data-guidelines.
41. Springer Nature, ‘Research Data Policies and Services,’ accessed August 26, 2017, http://www.springernature.com/gp/group/data-policy.
42. Nature, ‘Announcement: Where are the data?,’ September 7, 2016, doi:10.1038/537138a.
43. Elsevier, ‘Research Data,’ accessed August 26, 2017, https://www.elsevier.com/about/our-business/policies/research-data.
44. American Economic Association, ‘Data Availability Policy,’ accessed August 26, 2017, https://www.aeaweb.org/journals/policies/data-availability-policy.
45. Political Analysis, ‘Instructions for Contributors,’ accessed September 1, 2017, https://www.cambridge.org/core/journals/political-analysis/information/instructions-contributors#.
46. American Journal of Political Science, ‘AJPS Replication and Verification Policy,’ accessed August 26, 2017, https://ajps.org/ajps-replication-policy/.
47. Laura and John Arnold Foundation, ‘Research Integrity,’ accessed August 26, 2017, http://www.arnoldfoundation.org/initiative/research-integrity/.
48. Bill and Melinda Gates Foundation, ‘Open Access Policy,’ accessed August 26, 2017, https://www.gatesfoundation.org/How-We-Work/General-Information/Open-Access-Policy.
49. National Institutes of Health, ‘NIH Sharing Policies and Related Guidance on NIH-Funded Research Resources,’ accessed August 26, 2017, https://grants.nih.gov/policy/sharing.htm.
50. P. Suber, ‘Open Access Overview,’ last modified December 5, 2015, http://bit.ly/oa-overview.
51. McCullough, ‘Open Access Economics Journals,’ 117.
52. H. A. Piwowar and W. W. Chapman, ‘Public Sharing of Research Datasets: A Pilot Study of Associations,’ Journal of Informetrics 4, no. 2 (2010): 148–56; Vasilevsky, Minnier, Haendel, and Champieux, ‘Reproducible and Reusable Research.’
53. S. Vlaeminck and L.-K. Herrmann, ‘Data Policies and Data Archives: A New Paradigm for Academic Publishing in Economic Sciences?,’ in New Avenues for Electronic Publishing in the Age of Infinite Collections and Citizen Science: Scale, Openness and Trust, ed. B. Schmidt and M. Dobreva (Amsterdam: IOS Press, 2015), 145–55.
54. Although this has yet to materialize, there are a large number of non-English OA journals (as evidenced from our initial OJS sample) that would benefit from having access to translated versions of exemplary journal data policies and guidelines from TOP.
55. For the JDDCP signatories list, see Force11, ‘Endorse the Data Citation Principles,’ accessed August 26, 2017, https://www.force11.org/datacitation/endorsements.
56. The Dataverse Project is an open-source research data repository framework sponsored by the Institute for Quantitative Social Science at Harvard University. See http://dataverse.org.
57. E. Castro and A. Garnett, ‘Building a Bridge between Journal Articles and Research Data: The PKP-Dataverse Integration Project,’ International Journal of Digital Curation 9, no. 1 (2014): 176–84, doi:10.2218/ijdc.v9i1.311; M. Altman, E. Castro, M. Crosas, P. Durbin, A. Garnett, and J. Whitney, ‘Open Journal Systems and Dataverse Integration – Helping Journals to Upgrade Data Publication for Reusable Research,’ Code4Lib Journal 30 (2015), http://journal.code4lib.org/articles/10989.
58. G. King, ‘An Introduction to the Dataverse Network as an Infrastructure for Data Sharing,’ Sociological Methods and Research 36 (2007): 173–99; M. Crosas, ‘The Dataverse Network: An Open-Source Application for Sharing, Discovering and Preserving Data,’ D-Lib Magazine 17, no. 1/2 (2011), http://www.dlib.org/dlib/january11/crosas/01crosas.html.
59. For boilerplate data policies in OJS, see http://projects.iq.harvard.edu/files/ojs-dvn/files/journaldatapoliciesguidelinestemplateojsdataverseplugin.pdf.
60. PLOS ONE, ‘Recommended Repositories,’ accessed August 26, 2017, http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories; Springer Nature, ‘Recommended Data Repositories,’ accessed August 26, 2017, http://www.nature.com/sdata/policies/repositories.
61. Interuniversity Consortium for Political and Social Research (ICPSR), ‘Data Management and Curation,’ accessed August 26, 2017, http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/index.html.
62. Dryad Digital Repository has archived data sets from more than 560 journals. Its curation service and streamlined integration with journals is outlined in Dryad, ‘Frequently Asked Questions,’ last modified January 5, 2016, http://datadryad.org/pages/faq.