Libraries supporting e-Science
combining cultures …
Pauline Simpson
National Oceanography Centre, University of Southampton, UK
Digital Libraries à la Carte 2007, Tilburg University
Overview
Not about libraries supporting research per se
e-Science
Open Data
Digital Repositories and Open Access
Vision of ‘joined up research’
Issues for e-Science
Combining cultures, connecting people – New roles for libraries
Who are you?
Librarian
Documentalist
Information Specialist
Information Scientist
Information Manager
Information Advisor
Data Librarian
Computer Specialist
Data Manager
Data Technician
Data Processor
Anoraks
Scientific evolution
Thousand years ago: Experimental Science
- description of natural phenomena
Last few hundred years: Theoretical Science
- Newton’s Laws, Maxwell’s Equations …
Last few decades: Computational Science
- simulation of complex phenomena
Today: e-Science or Data-centric Science
- unify theory, experiment, and simulation
- requires data exploration and data mining
(With thanks to Jim Gray)
‘e-Science’ is shorthand for a set of technologies to support collaborative networked science.
HPC and Information Management are key technologies to support this e-Science revolution.
e-Science – not only
e-Science - data driven
(Natural and Physical Sciences)
e-Research
(includes e-Science, with Arts & Humanities now joining in – ACLS report 2004 – a great opportunity to bring new analytic and interpretive power to humanities and social science)
Cyberinfrastructure
NSF: Revolutionizing science and engineering through CyberInfrastructure, 2003 (Atkins Report) describes the new research environments in which advanced computational, collaborative, data acquisition and management services are available to researchers through high-performance networks … more than just hardware and software, more than bigger computer boxes and wider network wires.
It is also a set of supporting services made available to researchers by their home institutions as well as through federations of institutions and national and international disciplinary programs.
Key elements of e-Infrastructure
• Research Network
• International authentication and authorisation
• OS Middleware Engineering and Software Repository
• Access to international Data Sets and Publications
• Portals and Discovery Services
• Digital Curation and Preservation
• Remote Access to Large Scale facilities
• Interoperable Institutional and Thematic Repositories
• Support for International Standards
• Tools and Services to support collaboration
• International Grid
Early Vision of the Grid
J.C.R. Licklider - The Computer as a Communication Device (1968)
Predicted the use of computer networks to support communities of common interest and collaboration without regard to location. Foretold graphical computing, point-and-click interfaces, digital libraries, e-commerce, online banking, and software that would exist on a network and migrate wherever it was needed.
“Lick had this concept of the intergalactic network which he believed was everybody could use computers anywhere and get at data anywhere in the world. …, but he had the same concept – all of the stuff linked together throughout the world, that you can use a remote computer, get data from a remote computer, or use lots of computers in your job. The vision was really Lick’s originally.”
Larry Roberts – Principal Architect of the ARPANET
1990’s The Web
• Tim Berners-Lee developed the Web at CERN as a tool for exchanging information between the partners in physics collaborations
• It was the international particle physics community who first embraced the Web
• The first Web Site in the USA was a link to the SLAC Library Catalogue (Stanford Linear Accelerator Center)
Web+
Scientists developing collaboration technologies that go far beyond the capabilities of the Web
– To use remote computing resources
– To integrate, federate and analyse information from many disparate, distributed, data resources
– To access and control remote experimental equipment
Capability to access, move, manipulate and mine data is the central requirement of these new collaborative science applications
– Data held in file or database repositories
– Data generated by accelerators or telescopes
– Data gathered from mobile sensor networks
Grid = set of services for sharing computing power and data storage. Use middleware to handle the complex authentication and scheduling, linking together applications, devices and computing resources as seamlessly as possible
Web 2.0
Web as a platform
– Regardless of operating system – open network enables collaboration and communication – serves + involves
– Social software all relevant to scientific research
• File sharing - much used documents, videos, slides
• Tagging (for later retrieval)
• Folksonomies (informal ontologies developed by community)
• Virtual Worlds eg Second Life - education, conferences
• Wikis (OpenWetWare), Blogs (Useful Chemistry) etc
– Library 2.0 (Web 2.0 + Library) – reinvention of the Library?
– Scientific Web?
• Web invented for sharing scientific communication – relatively few scientists have embraced the potential
• Barriers: social, psychological, technical
Some e-Science projects
Particle Physics
– global sharing of data and computation
Astronomy
– ‘Virtual Observatory’ for multi-wavelength astrophysics
Chemistry
– remote control of equipment and electronic logbooks
Bioinformatics
– data integration, knowledge discovery and workflow
Healthcare
– sharing normalized mammograms
Environment
– climate modelling
– undersea sensors
climateprediction.net
Since September 2003:
61,000 registered participants in 130 countries have…
Data Workbench
The ‘Data Deluge’ – science is turning to e-Science
In the next 5 years e-Science projects will produce more scientific data than has been collected in the whole of human history
Some normalizations:
The Bible = 5 Megabytes
Annual refereed papers = 1 Terabyte
Library of Congress = 20 Terabytes
Internet Archive (1996 – 2002) = 100 Terabytes
New high throughput devices, sensors and surveys
In terms of bytes - moving beyond giga (10^9) through tera (10^12) onto peta (10^15) and onto exabytes (10^18)
(petabyte (PB): 10^15 = 1000^5 = one quadrillion bytes)
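The normalizations above lend themselves to a quick back-of-envelope calculation. The following is a minimal sketch, assuming decimal (SI) prefixes and the rough sizes quoted on this slide; the names `PREFIX`, `SIZES` and `equivalents` are illustrative only.

```python
# Decimal (SI) byte prefixes, as used on the slide.
PREFIX = {
    "mega": 10**6,
    "giga": 10**9,
    "tera": 10**12,
    "peta": 10**15,
    "exa": 10**18,
}

# Sizes quoted on the slide (rough normalizations, not measurements).
SIZES = {
    "The Bible": 5 * PREFIX["mega"],
    "Annual refereed papers": 1 * PREFIX["tera"],
    "Library of Congress": 20 * PREFIX["tera"],
    "Internet Archive (1996-2002)": 100 * PREFIX["tera"],
}


def equivalents(total_bytes: int, item: str) -> float:
    """Return how many copies of `item` fit into `total_bytes`."""
    return total_bytes / SIZES[item]


if __name__ == "__main__":
    one_petabyte = PREFIX["peta"]
    for name in SIZES:
        print(f"1 PB = {equivalents(one_petabyte, name):,.0f} x {name}")
```

On these figures a single petabyte corresponds to about fifty Libraries of Congress, which gives a sense of scale for the devices and surveys listed above.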
Data-centric 2020 vision resulting from Microsoft ‘Towards 2020 Science’ (2006); Nature 440 (23 March 2006)
Data gold-mine
‘Multidisciplinary databases also provide a rich environment for
performing science; that is, a scientist may collect new data,
combine them with data from other archives, and ultimately
deposit the summary data back into a common archive. Many
scientists no longer 'do' experiments the old-fashioned way.
Instead they 'mine' available databases, looking for new patterns
and discoveries, without ever picking up a pipette.’
Data Loss
e-Science is about improving the use and reuse of research data
Huge amounts of research data cease to exist each year.
– Hardware, software obsolescence
– Represents the loss of expensive intellectual resources; a huge opportunity cost for comparative and longitudinal research
Unintentional data loss in the sciences due to:
– lack of incentives to maintain them, or due to neglect (benign and otherwise – forget where or what it is!)
– personal computers, Web sites, blogs, wikis, e-mails, digital photo and film etc.
Digital Preservation
“Digital information lasts forever, or five years, whichever comes first”
Jeff Rothenberg, RAND 2001
Medium          Practical Physical Lifetime   Av. Time to Obsolete
Optical (CD)    5-59 years                    5 years
Digital tape    2-30 years                    5 years
Magnetic disk   5-10 years                    5 years
Preservation - Trusted Repositories
Preservation
– UK Digital Curation Centre: advice, tools & services
• RepInfo Registry – representation information adding meaning to data for preservation
– EU CASPAR Integrated Project http://www.casparpreserves.info/pages/1/index.htm
– EU Task Force on the Permanent Access to the Records of Science http://tfpa.kb.nl/
– EU projects DPE and PLANETS
(leverage library and archive experience)
Long-term access: trust, responsibility, policy
– Trusted DR Audit Checklist for Certification Draft - Research Libraries Group-NARA Taskforce 2005
• Defined criteria under 4 categories
– Organisation
– Functions, processes & procedures
– Designated community & usability
– Technologies & technical infrastructure
• Can these concepts be extended to data repositories?
Key drivers for e-Science
Access to Large Scale Facilities and Data Repositories
– eg CERN, EBI, etc
Need for production quality, open source versions of open standard GRID middleware
– eg OMI, NMI, C-Omega
Imminent ‘data deluge’: particle physics, astronomy, bioinformatics
Data Loss/Preservation
Open Access movement
Open Data
Open Access to Data – European Research Council
It is the firm intention of the ERC Scientific Council to issue specific guidelines for the mandatory deposit in open access repositories of research results -- that is, publications, data and primary materials -- obtained thanks to ERC grants, as soon as pertinent repositories become operational.
The ERC Scientific Council moreover hopes that research funders across Europe will join forces in establishing common open-access rules and in building European open access repositories that will help make these rules operational.
To facilitate this process for EU funded research, it recommends that the European Commission sets up a task force including representatives from the various FP7 programmes … to develop an operational FP7 policy on open access by the end of 2007 …
http://ec.europa.eu/erc
Organization for Economic Co-operation and Development (OECD)
Jan 2004: Promoting Access to Public Research Data for Scientific, Economic, and Social Development
Open access to, and unrestricted use of, data promotes scientific progress and facilitates the training of researchers
Open access will maximise the value derived from public investments in data collection efforts
The risk that undue restrictions on access to and use of research data from public funding could diminish the quality and efficiency of scientific research and innovation
Dec 2006: Recommendation of the Council concerning Access to Research Data from Public Funding
– “each Member country, to develop policies and good practices related to the accessibility, use and management of research data”
Open Access to Data following OA for Publications
1990’s – Subject Repositories (high energy physics, economics, mathematics etc).
HEP (arXiv) - very successful
Economics (RePEc) - successful
Limited success otherwise
1994 ‘Subversive proposal’ (Harnad)
2000’s - Institutional Repositories
Powered by project funding, driven by the Information Community (Libraries)
Libraries already supporting e-Science by development of OA digital repositories of research publications – providing global and immediate discovery and access to new research
Repository Growth
112 Repositories in 2002
2007: over 900/1400 repositories
Over 20 different software systems
ROAR - Registry of Open Access Repositories
Directory of Open Access Repositories (DOAR)
Repository Landscape
• Subject - arXiv, Cogprints, RePEc
• Institutional – Universities, Research Institutes - Southampton, Glasgow, Nottingham (SHERPA), Max Planck
• National - DARE (all universities in the Netherlands), Scotland (IRIS)
• National / Subject - OceanDocsAfrica
• International - Internet Archive ‘Universal’, OAIster (Harvester)
• Regional - White Rose UK
• Consortia - SHERPA-LEAP (London E-prints Access Project)
• Funding Agency – NIH (PubMed), Wellcome Trust (UK PubMed), NERC (NORA)
• Project - Public Knowledge Project EPrint Archive
• Conference - 11th Joint Symposium on Neural Computation, May 2004
• Personal – Peer to peer
• Media Type - VCILT Learning Objects Repository, NTDL (Theses), Museum Objects, Exhibitions
• Publisher – Journal archives
• Data Repositories - UK Data Archive; World Data Centre System; National
Support - Declarations on Open Access
Berlin Declaration in Support of Open Access 2003 (50+ signatories)
Germany: Fraunhofer Society, Wissenschaftsrat, HRK, Max Planck Society, Leibniz Association, Helmholtz Association, German Research Foundation, Deutscher Bibliotheksverband
France: CNRS, INSERM
Austria: FWF Der Wissenschaftsfonds
Belgium: Fonds voor Wetenschappelijk Onderzoek – Vlaanderen
Greece: National Hellenic Research Foundation
The IFLA Statement on Open Access to Scholarly Literature and Research Documentation 2004
http://www.ifla.org/V/cdoc/open-access04.html
India, Australia, China, Africa, USA …
Scotland (2005) 16 Universities and Research Orgs
Russell Group (UK Universities) 2005
Buenos Aires, British Columbia, Bethesda Statement (2003)
Budapest Open Access Initiative Feb 2002 (Soros Open Society Institute)
Support - Mandates for Open Access
Real mandates:
– Wellcome Trust
– RCUK (Research Councils UK)
– Universities – Southampton UK, Minho Portugal, NIT India, CERN …
Proposed mandates: public funders
– DFG, Germany; FWF, Austria; DARE network; Finland; USA; Sweden (?)
– Canada, Australia, India, S.Africa, Ukraine etc
NIH: strengthening now very likely; require, not request
CURES Act: 6-month delay to OA permitted but must deposit at acceptance
FRPAA (Federal Research Public Access Act): mandatory deposit of all research funded by the largest agencies
EU: Petition for guaranteed public access to publicly-funded research results, Feb 2007
Open Access Research Repositories - developing
• There will be many types of repository software and more powerful interoperability protocols such as OAI-ORE (need more than OAI-PMH to enable sharing and reuse)
• Thematic and Institutional Repositories contain not only full text versions of research papers but also ‘grey’ literature such as technical reports and theses
• In addition, repositories in the future will also contain data, images and software
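As context for the interoperability point above, OAI-PMH harvesting works by sending HTTP requests that carry a `verb` and its arguments to a repository's base URL. The sketch below only builds such a request URL and makes no network call; the base URL `https://repository.example.org/oai` is a placeholder rather than a real endpoint, and the helper name `list_records_url` is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint: substitute a real repository's OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec  # selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(list_records_url(BASE_URL, from_date="2007-01-01"))
```

A harvester built this way exchanges metadata records only, which is one reading of why the slide says sharing and reuse of data, images and software needs more than OAI-PMH.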
UK JISC Projects – linking text and data
Source-to-Output Repositories
CITATION, LOCATION, And DEPOSITION IN DISCIPLINE & INSTITUTIONAL REPOSITORIES
New JISC Call for data related projects 2007
More digital repositories, more content
Publications, working papers, primary data, audiovisual, images
Hardware in research labs will automatically deposit experimental data
Desktop tools will deposit content
Rich data flow between networks of repositories
Rich data flows between repositories and other components in information landscape
Federation of Digital Repositories
• Global
• Inter-disciplinary
• Cross-sectoral
• Multiple format types
• Data, publications, images…
• e-Research Framework
• Defining common services + domain-specific services + repository services
From Andy Powell: http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/presentations/jiie-jcs-2005
[Diagram: a fusion layer (‘repository federator’) sits between a row of repositories and a row of portals]
Repositories: heterogeneous - metadata formats, content formats, identifiers, packaging standards
Portals: homogeneous - metadata formats, content formats, identifiers, packaging standards
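One way to picture the fusion layer is as a set of per-repository mappings that translate heterogeneous metadata into one common record shape for portals. This is a minimal sketch under that assumption; the repository labels, field names and sample records are all invented for illustration and are not taken from the JISC architecture.

```python
# Per-repository field mappings: source field name -> common field name.
# All names here are hypothetical.
MAPPINGS = {
    "repo_a": {"dc:title": "title", "dc:creator": "creator", "dc:identifier": "id"},
    "repo_b": {"Title": "title", "Author": "creator", "Handle": "id"},
}


def normalise(repo, record):
    """Translate one repository-specific record into the common shape."""
    mapping = MAPPINGS[repo]
    return {
        common: record[source]
        for source, common in mapping.items()
        if source in record
    }


def federate(records_by_repo):
    """Merge records from many repositories into one homogeneous list."""
    merged = []
    for repo, records in records_by_repo.items():
        merged.extend(normalise(repo, r) for r in records)
    return merged


if __name__ == "__main__":
    sample = {
        "repo_a": [{"dc:title": "Ocean data", "dc:creator": "A. Author", "dc:identifier": "a:1"}],
        "repo_b": [{"Title": "Climate runs", "Author": "B. Writer", "Handle": "b:7"}],
    }
    for rec in federate(sample):
        print(rec)
```

Keeping the mappings as data rather than code means a new repository can be added to the federation by describing its fields, without touching the portal side.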
Issues for e-Science
Macro and micro issues are similar for both text and data repositories:
IPR and Licenses **
Distributed over many researchers
Over national boundaries
Lack of awareness amongst researchers
Cultural roots and resistance to change
Funding
Policies
Standards
Interoperability
Vocabularies
** Necessary to understand science practices (technical, social and communicative structure) in order to adapt licensing solutions to the practice of e-Science.
Arzberger, P. et al. Science 2004, 303, 1777–1778. DOI: 10.1126/science.1095958
Research Issues: information retrieval, information modelling, ontologies, authentication, systems interoperability, and policy issues
Combining Cultures
NSF Report “Long lived digital data collections”. 2005
– “Data scientist” - hybrid skills
Facilitate collaboration
– “Multidisciplinary teams”: computer scientists, domain scientists, digital library experts, statisticians/modellers
– Lessons learnt: e-Science Human Factors Audit Report (to be published)
Many of the same research issues that the international digital library
Combining Cultures
NSF Report:
“It is timely to seriously consider the role that digital libraries can and should play in this emerging e-Science computational infrastructure”.
“Bringing the digital library and the emerging scientific infrastructure worlds together can lay the foundation for providing truly integrated support for the entire process of science, from formulation of research questions to the publication of the outcomes”.
“Specifically, the e-Science and digital libraries research communities need to work together to identify the potential contributions of each of these communities for supporting the conduct of science and to articulate a shared research agenda”
Calls for Combining Cultures for e-Science needs
– EU Framework 7 – e-Research Infrastructure development
– UK – JISC – Report on future requirements for curation and preservation
– Australia DEST – e-Infrastructures Reflection Group includes CAUL (Council of Australian University Librarians) member - Interim Report
http://www.dest.gov.au/sectors/research_sector/policies_issues_reviews/key_issues/e_research_consult/interim_report.htm
– UK –CURL/SCONUL Joint e-Research Task Force (2006)
– USA – ARL Libraries and Changing Research Practices
CURL/SCONUL Joint e-Research Task Force
Nov 2006
1. To raise awareness and understanding of the issues associated with support of e-research in CURL and SCONUL member libraries and to stimulate discussion about them at institutional level.
2. To position CURL and SCONUL member libraries’ staffs to engage with their local e-research stakeholders and to encourage them to make appropriate inputs at the research proposal stage.
3. To identify skills gaps in relation to support of e-research and to assist member libraries in addressing them.
4. To work with other e-research stakeholders, including the DCC, RLN and BL, to ensure that information management to support e-research is a high priority for future investment by funders.
5. To advise the CURL Board and the SCONUL Executive Board on matters relating to the support of e-research.
6. To monitor, and report on, the Group’s progress against an action plan agreed annually by the CURL Board and SCONUL Executive Board.
Has your library engaged with the e-Research agenda?
Management of large datasets is a likely area of involvement.
Why are libraries not already involved?
– Lack of foresight by librarians?
– No e-Science funds for development of data management in libraries – no call for projects
– No customer demand for data curation
WP 1 - Information and awareness
• Recruit network of e-research liaison contacts in HE library & information services; establish JISCmail list
• Survey of research activity, research support requirements, and e-research support work within HEIs (coordinate with WP2 needs analysis)
• Survey of the policy and practice of research funders in relation to data curation
• Disciplinary mapping of existing data curation services and gaps in provision
• Training & development needs analysis (link with WP1 survey activity on researcher support requirements)
• Design, commissioning and delivery of training and development events for HEI library & information services staff
• Maintain awareness of funding and bidding opportunities for the eRTF
• Lead on bid drafting
• (with WP1) identify potential case studies/exemplar projects for development with DCC
WP 2 - Workforce development
Engaging ARL members in the development of new roles for libraries as e-Science infrastructure and service needs emerge at research institutions and promoting the contributions of research libraries in this arena.
Identify the skills needed as information professionals move into the emerging
ARL Workshop Recommendations
NSF should fund projects in which university research libraries develop deep archives of irreplaceable data, assuring descriptions of these data at a minimal level (floor, not ceiling) and facilitating discovery and access to these data, according to prevailing community standards
NSF should partner with IMLS to train information and library professionals (extant and future) to work more credibly and knowledgeably on data curation as members of research teams
NSF should foster the training and development of a new workforce in data science
Promote new curricula
Develop new programs
Link to training of domain scientists and information/library scientists
ARL Workshop on New
Shared Goals and Responsibilities
NSF – Report NSB-05-40, Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century
Combining cultures, connecting people
Librarian
Documentalist
Information Specialist
Information Scientist
Information Manager
Information Advisor
Data Librarian
Computer Specialist
Data Scientist
Data Manager
Data Technician
Anoraks
What is a data scientist?
Data Scientist
• New skills requirements:
• interdisciplinary
• quantitative
• data curation
• Integrate data management within the LIS curriculum
• Various approaches to develop and obtain digital curation skills
• Skills are there but often in discrete communities: we need to bring communities together
• Integration within the curriculum: undergraduate students, library & information science, archival studies, computer science
• Provide recognition and a career path for emerging ‘data scientists’
There must be a blurring of the boundaries between previously well defined silos that existed between information managers and
Connecting People – how to
Encouraging partnerships - Inter-institutional partnerships
Institutional management support
Competencies – shared and developed
Common issues – authentication, metadata, ontologies, standards, IPR, licenses
New curricula - CPE
Share experience of Institutional repositories - libraries fundamentally transforming research publication practices and scholarly communication, a forerunner of e-Science
Libraries involved in the research underlying the design of e-Science. Is there new research on Digital Libraries over the Grid?
Funding Agencies – JISC, NSF, EU etc
– define project members from library and data communities
– promote the necessary international dialog between
Role for Libraries in digital data universe
• Data as primary source material – Libraries :
– Will not be primary providers of large scale storage infrastructure required
– Will not provide the specialized tools to work with data
– Will not provide the detailed information about the data
– Unlikely to provide the solutions to digital preservation because of cost
• Can contribute library practices:
• Collection policies (appraisal, selection, weeding, destruction etc)
• Data clean up, normalisation, description and submission to repositories
• Data Citation
• Curation and Preservation
• Collaboration with researchers re scholarly communication, deposit, education and training
• Innovative discovery and presentation mechanisms
• Data as part of ‘enhanced publications’ – Libraries:
– Well positioned to define standards for
• Taxonomies and ontologies
(for complex publications that include data)
• Persistent identifiers
• Consistent description practices
• Data structuring conventions
• Interoperability protocols for searching and retrieval
– Well positioned to exploit IR experiences
Role of Digital Libraries - IRs
Institutional repository is a key component of e-infrastructure
– Mostly in library domain
– Access and preservation
– Digitization – data archaeology
– Interoperable with departmental, national, subject repositories
Data Curation
– Creation, metadata, preservation of institutional intellectual assets
• but disparate data types and ontologies
Training Provision
– Research methods training for researchers
• Data creation, documentation, management
Advocacy, policy setting
– Cross disciplinary approach to key issues
• Expand OA agenda
Roles for Libraries
Institutional repositories accept ‘small’ datasets (size or subject outside remit of Data Repositories). Data deposited in IR until accepted by data repository
Development of Regional or Discipline Repositories alongside IRs (singly or consortia). Research libraries a natural home for content curation (with funding)
Mapping of commonalities (eg metadata) across disciplines, maintaining ready interoperability
Management of metadata throughout a research project
Address conditional and role-based access requirements for scientific data
Support e-Science interface functions for local users
Adding Value: linking, annotation, visualization
Libraries and researchers can add value by creating ‘e-Science Mashups’ - data needs to be re-used in multiple ways, on multiple occasions and at multiple locations (reuse, remix)
The “mash-up”: data from FAO, WHO + Earth
Role for Libraries – build on the strengths
Serving the needs of the scientific community
Systematically managing and making accessible information from heterogeneous sources
– Metadata, discovery mechanisms, portals, VRE , “Science World”
– Publication and Citation
– Selection and use of tools and resources
– Digitization of legacy content
– access management, copyright, IPR, Licenses
– Curation and Preservation advice
Provide specialist assistance to end-users
– expertise in user services and training
Exploit strengths in designing and implementing innovative and useful e-Science information infrastructure.
Reduce the risk of "re-inventing the wheel".
SWOT
Facing the future
Build Institutional Repositories
Develop leadership & vision for e-Research engagement
– Web 3.0?, Semantic web - publishing?
Review organisational structures
– Extend & re-profile the Faculty/Subject/Reference Librarian role?
– Closer collaboration with Computing Services, and Data Services?
Provide eServices for data
– We “do” e-Learning so why not e-Research?
– Include in institutional digital asset management
Promote professional development of staff
– Awareness-raising activities, new skills
– Greater engagement, hybrid roles and hybrid teams
Build new partnerships, new business models, new research projects
Facilitate Transformational Change in Libraries
Acknowledgements
With special thanks to Prof Tony Hey, Vice-President for Technical Computing, Microsoft Corporation (previously Director of UK e-Science Programme)