
Data Sharing in Research: Four Key Concerns


Academic year: 2021


Sabina Leonelli

Exeter Centre for the Study of Life Sciences (Egenis) & Department of Sociology, Philosophy and Anthropology

University of Exeter @sabinaleonelli


The Epistemology of Data-Intensive Science

• Ensemble of methods, skills, infrastructures, technologies and institutions that is widely viewed as advancing the practice and results of research

• Main questions in my work:

– Is data-intensive science a distinct and/or new mode of research, and how does it differ from other/previous approaches?

– Does it vary across research fields?

– How, where and when can it be implemented?

– What kind of mechanisms are needed to support it, and what are their long-term implications?

– Are those implications desirable from the scientific, social, economic, political and cultural viewpoints?

– Who should adopt it, who should support it and who should be responsible for its results?

• Methods: philosophical analysis based on the study of historical roots and current data-intensive practices across fields and geographical locations; engagement with data science policies (e.g. the Global Young Academy Position Statement)


Data Sharing

Central mechanism and concern for data-intensive research

Here I want to discuss four aspects that I view as crucial for effective data sharing, but which raise serious questions about how this is done and who is involved in those efforts:

1. Re-Use

2. Sustainability

3. Size

4. Openness


1. Re-Use

• Key motivation for data sharing: analysis of extensive datasets by multiple stakeholders can lead to discovery in ways unforeseen by the original data producers

• For re-use to happen effectively, data sharing needs to be extensive, comprehensive, global and long-term

• This requires:

– Habitual data donation

– Adequate standards and guidelines for data formatting

– Well-organised databases

– Sharing of related materials

Leonelli, S. (under review) Researching Life in the Digital Age: A Philosophical Study of Data-Intensive Biology. Monograph.

Leonelli, S. (2012) When Humans Are the Exception: Cross-Species Databases at the Interface of Clinical and Biological Research. Social Studies of Science.


2. Sustainability

• In space:

– Data donation requires manual labour and a considerable shift in research habits → challenge to the current credit system in research

– Sharing practices (how data are stored, disseminated, retrieved and visualised) require intelligent curation to avoid data dumps

– Data sharing entails specific visions of how data should be organised to encourage re-use → data curation crucially shapes those visions

• In time:

– Data infrastructures need to work in the long-term

– Standards/formats for data need to be continuously updated to keep up with shifts in technology and knowledge

– However, no clear structures to support these requirements

Bastow, R. and Leonelli, S. (2010) Sustainable digital infrastructure. EMBO Reports.

Leonelli, S., Smirnoff, N., Moore, J., Cook, C. and Bastow, R. (2013) Making Open Data Work in Plant Science. Journal of Experimental Botany.


3. Size

Size and comprehensiveness of data collections matter ("big data"),

and yet:

• Sampling remains crucial → importance of metadata

• Correlation does not trump causality

• Accuracy and reliability are key goals → importance of mechanisms for updating data collections and checking data quality

Leonelli, S. (2014) What Difference Does Quantity Make? On the Epistemology of Big Data in Biology. Big Data and Society.

Leonelli, S. (2013) Global Data for Local Science: Assessing the Scale of Data Infrastructures in Biological and Biomedical Research. BioSocieties.


4. Openness

Long-standing, crucial scientific value (Royal Society 2012), yet:

Problematic implementation: the idea of Open Data is relatively new. Research ethos and practices, career structures and incentives have not yet adapted.

Semantic ambiguity: Openness means different things to different people, even within the same discipline. This is fruitful when explicitly discussed, but can lead to substantial misunderstandings if unacknowledged.

Obstacles: social, ethical, legal, political, economic

– Data are not only evidence: they can be tokens of personal identity, commodities and/or political currency.

– Data sharing policies can be in tension with measures of excellence and impact (e.g. in the UK).


Timing: the timing of disclosure of results is perceived as crucial and needs to be assessed on a case-by-case basis. Late release is potentially as damaging as early release, depending on the context, including in cases of high commercial interest.

IP: confusion around which modes of intellectual property apply, and to whom (individual researchers, labs, projects, networks, universities, funders)

Universities and the state: confusion around the role of national governments in establishing and enforcing data sharing policies

Bias: databases mostly display outputs of English-speaking labs from widely reputed research traditions, which have the funds to curate data and the visibility to determine data formats. The involvement of poor or unfashionable labs, scientists in developing countries and non-scientists remains very low; they are almost always at the receiving end.

Leonelli, S. (2013) Why the Current Insistence on Open Access to Scientific Data? Big Data, Knowledge Production and the Political Economy of Contemporary Biology. Bulletin of Science, Technology and Society.

Levin, N., Weckowska, D., Castle, D., Dupré, J. and Leonelli, S. (under review) How Do Scientists Understand Openness? The Impact of Open Science Policies on Biological Research.


Conclusions

• Extensive data sharing and Open Data policies have a potentially transformative impact on scientific research

• However, current data collections are partial and difficult to re-use by outsiders

• Effective data sharing requires

1. Shifts in research ethos and institutional structures: appropriate acknowledgment and reward of donation and curation efforts

2. Investment in long-term data infrastructures across the globe, as well as venues to coordinate and continuously update common standards

3. Promotion of data curation as an integral part of research, since involvement in developing databases is key to effective data re-use

4. Promotion of critical discussions about what counts as data and openness in each research community / centre / project, taking into account specific ethical, legal and political concerns

• Beware of the term "sharing": it suggests, but does not entail, reciprocity and common ground among stakeholders


The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement n° 335925; and the Leverhulme Trust, grant award RPG-2013-153.

More information on history, philosophy and social studies of data science:

www.datastudies.eu

@DataScienceFeed

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit


Abstract

Extensive data sharing and Open Data policies have a potentially transformative impact on scientific research. I discuss four aspects that I view as crucial for effective data sharing, but which raise serious concerns about how this is currently done and who should be involved in those efforts: (1) re-use; (2) sustainability; (3) size and (4) openness. I point out that effective data sharing requires shifts in research ethos and institutional structures, as well as large investments in long-term data infrastructures across the globe, including venues to coordinate and continuously update common standards. In the absence of such conditions, big data collections are destined to remain extremely partial and difficult to re-use by outsiders. I conclude by suggesting that caution is needed in the use of the term "sharing", which suggests, but does not necessarily entail, reciprocity and common ground among stakeholders. My analysis is grounded in ongoing empirical research on the conditions under which researchers share data in the UK, Europe, USA and South Africa, and on the scientific and social implications of data handling practices around the globe. This research is currently carried out through an ERC Starting Grant on "The Epistemology of Data-Intensive Science" and a Leverhulme Trust award exploring the digital divide. It has also been funded by the UK Economic and Social Research Council, the Max Planck Institute for the History of Science and the British Academy; and is closely aligned with the Global Young Academy Position Statement on Open Science, which I coordinated and co-wrote (
