Infobox 2: Data repository versus (clinical trial) registry
Registries: A clinical trial registry is a collection of records about clinical trials, structured according to an agreed-upon set of metadata (27). In registries accepted by the World Health Organization (WHO) and included in its International Clinical Trials Registry Platform (ICTRP; see section 9.1), these records contain at least the minimum information defined in the WHO Data Set (25). As of 2019, this data set neither defines nor requires attached artifacts or files. Confusingly, the WHO calls the database behind its Search Portal the "Central Repository" (27), although it is in fact a registry.
Data repositories: In contrast, a data repository is a (digital) collection of digital datasets. Although this is not strictly required, the term nowadays implies that the repository makes these datasets findable, accessible, and reusable (5) and supports longer-term storage. Technically, a repository consists at least of a backend, comprising a database that stores metadata and other information and a file server that stores the datasets and other digital artifacts, and a web-based frontend that gives users access to the backend.
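To make the three components concrete, the following toy model maps them onto plain Python. It is a minimal sketch under stated assumptions, not how any real repository is built: an in-memory SQLite table stands in for the metadata database, a dictionary stands in for the file server, and the `search`/`download` methods stand in for what a web frontend would expose. The class name `MinimalRepository` and its methods are hypothetical, and the content-hash identifier is only a placeholder for a persistent identifier such as a DOI.

```python
import hashlib
import sqlite3


class MinimalRepository:
    """Hypothetical toy model: metadata database + file store + access layer."""

    def __init__(self) -> None:
        # Backend, part 1: database holding the metadata (in-memory SQLite here).
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE datasets (id TEXT PRIMARY KEY, title TEXT, keywords TEXT)"
        )
        # Backend, part 2: stand-in for the file server (dataset id -> file bytes).
        self.files: dict[str, bytes] = {}

    def deposit(self, title: str, keywords: str, content: bytes) -> str:
        # Placeholder identifier derived from the content; a real repository
        # would mint a persistent identifier (e.g., a DOI) instead.
        dataset_id = hashlib.sha256(content).hexdigest()[:12]
        self.db.execute(
            "INSERT INTO datasets VALUES (?, ?, ?)", (dataset_id, title, keywords)
        )
        self.files[dataset_id] = content
        return dataset_id

    # Frontend stand-in: the operations a web interface would offer to users.
    def search(self, term: str) -> list[tuple[str, str]]:
        # Metadata are indexed and searchable (cf. FAIR principle F3).
        pattern = f"%{term}%"
        rows = self.db.execute(
            "SELECT id, title FROM datasets WHERE title LIKE ? OR keywords LIKE ?",
            (pattern, pattern),
        )
        return rows.fetchall()

    def download(self, dataset_id: str) -> bytes:
        return self.files[dataset_id]


if __name__ == "__main__":
    repo = MinimalRepository()
    ds = repo.deposit("Trial X anonymised dataset", "oncology; RCT", b"id,arm\n1,A\n")
    print(repo.search("oncology"))
```

The separation matters because metadata and files have different needs: metadata must be searchable, whereas datasets mainly need reliable storage and controlled delivery.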
13.1. Principles
According to the FAIR data principles, research data should be findable, accessible, interoperable, and reusable (4,5); see section 2. Principle F3 mandates that "(meta)data are registered or indexed in a searchable resource" (4). Although the principles do not explicitly mention data repositories, principle F3 implies that research data should be stored in an appropriate repository that follows all principles (5). The European Clinical Research Infrastructure Network (ECRIN) data sharing statement is more explicit and states that "data and trial documents made available for sharing should be transferred to a suitable data repository" (3). We support this view.
When selecting a repository, clinical researchers should therefore ensure that the repository respects, at a minimum, all FAIR data principles. Although there are alternative initiatives such as CoreTrustSeal (28), the FAIR principles seem to be the most widely accepted. However, other initiatives might evolve over time and become generally agreed standards. Given the current lack of generally agreed standards and certification processes, researchers will need to assess the suitability of a repository for their purposes themselves.
13.2. Time point
Ideally, the appropriate repository is identified before the Data Management Plan is written (see section 7) and is then described in it. Because we assume that a sponsor/investigator uses the same repository for all her/his projects, this should be feasible.
13.3. Identifying potential repositories
So far, no repository exists specifically for clinical research projects. Therefore, clinical researchers need to identify an appropriate repository themselves. Many institutions involved in clinical research, such as universities, currently maintain their own institutional repository, which might be a good starting point for the evaluation. Alternatively, universities usually have a central contact point that supports researchers with questions related to data sharing and open science in general (29).
For projects funded by extramural grants, the funder may impose specific requirements on the repository or even mandate a particular one. For example, the Bill & Melinda Gates Foundation maintains a list of approved repositories for articles published in Gates Open Research (30). The planned European Open Science Cloud (EOSC) is also expected to affect how data from projects funded by the European Union will be shared (31).
Repository registries maintain searchable databases of repositories. The largest one is probably re3data, a collaborative project of large academic institutions. re3data can help locate topic-specific repositories, which may be a better choice than an institutional repository because data are more likely to be found in a search for that particular topic. Furthermore, Swiss academic research institutions are currently developing Olos (32), a digital repository for the long-term preservation and publication of research data, to meet funders' publication requirements and help researchers manage their research data.
Another choice might be Zenodo, which is hosted by CERN (European Organization for Nuclear Research). There are also repositories operated on commercial or fee-based models, such as Figshare and Dryad, although we do not explicitly recommend their use.
13.4. Selection criteria
After identifying a set of potential repositories, a researcher will need explicit criteria to select one. To structure this process, we suggest a checklist (Table 10, p. 54) based on a report by the Digital Curation Centre in Edinburgh (33). Some items are very specific; others cannot be defined exactly and must be adapted to each project, and not all aspects may be assessable.
Another useful resource is the Levels of Digital Preservation published by the National Digital Stewardship Alliance (34).
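The checklist approach described above can be recorded in a simple, reproducible form when several candidate repositories are compared. The sketch below is illustrative only: Table 10 is not reproduced here, so the questions in `CRITERIA` are hypothetical placeholders, as are the `essential` flag, the `Status` values, and the `assess` function. It reflects the point that not every aspect can be assessed by keeping "not assessable" answers separate instead of counting them as failures.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    MET = "met"
    NOT_MET = "not met"
    UNKNOWN = "not assessable"  # some aspects cannot be checked for every repository


@dataclass
class Criterion:
    question: str
    essential: bool  # hypothetical: failing an essential item rules the repository out


# Hypothetical placeholder items; substitute the actual checklist items.
CRITERIA = [
    Criterion("Assigns persistent identifiers to datasets?", essential=True),
    Criterion("Supports restricted (controlled) access?", essential=True),
    Criterion("Metadata indexed in a searchable resource (FAIR F3)?", essential=True),
    Criterion("Documented long-term preservation policy?", essential=False),
]


def assess(answers: dict[str, Status], criteria: list[Criterion]) -> dict:
    """Summarise checklist answers for one candidate repository."""
    summary: dict[Status, list[str]] = {status: [] for status in Status}
    for c in criteria:
        # Items the reviewer left unanswered are treated as not assessable.
        summary[answers.get(c.question, Status.UNKNOWN)].append(c.question)
    failed_essential = [
        c.question
        for c in criteria
        if c.essential and answers.get(c.question) == Status.NOT_MET
    ]
    return {
        "summary": summary,
        "excluded": bool(failed_essential),
        "failed_essential": failed_essential,
    }


if __name__ == "__main__":
    answers = {
        CRITERIA[0].question: Status.MET,
        CRITERIA[1].question: Status.NOT_MET,
        CRITERIA[2].question: Status.MET,
    }
    print(assess(answers, CRITERIA))
```

Separating essential from desirable items mirrors the idea that some criteria are non-negotiable for a given project, while others only help rank the remaining candidates.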
Box 8: Recommendations for selecting a repository
R21. Select a suitable repository and record this choice in the data management plan. Institutional repositories might be a good choice.
R22. Make data as open as possible, but as closed as necessary (FAIR).