E.V . E.V . O RMA T ION O RMA T ION
Measuring impact revisited
W ERKINF O W ERKINF O ÜR NETZ W ÜR NETZ W
Frank Scholze
ITIA T IVE F ITIA T IVE FStuttgart University Library
5th Workshop on Innovations in Scholarly Communication
TSCHE IN I TSCHE IN I
p
y
CERN, Geneva, 18.4.2007
DEUDEUOutline
E.V . E.V .1. Scope / definitions
O RMA T ION O RMA T ION1. Scope / definitions
2. Issues
3 Ongoing o k / p ojects
W ERKINF O W ERKINFO
3. Ongoing work / projects
ÜR NETZ W ÜR NETZ W ITIA T IVE F ITIA T IVE F TSCHE IN I TSCHE IN I DEUDEU
Impact
E.V . E.V . O RMA T ION O RMA TION
“Impact is any change resulting from an
activity, project, or organization. It includes
W ERKINF O W ERKINF O
y, p j
,
g
intended as well as unintended effects,
negative as well as positive, and long-term as
ÜR NETZ
W
ÜR NETZ
W
negative as well as positive, and long term as
well as short-term.”
ITIA T IVE F ITIA T IVE FFrom: Susan Wainwright. Measuring Impact - A Guide to Resources. NCVO, 2002
TSCHE IN
I
TSCHE IN
I
Measuring I
E.V . E.V .• What
O RMA T ION O RMA TION
– Impact vs. output
W ERKINF O W ERKINF O
• Publications are the quantifiable output of the
research process
ÜR NETZ
W
ÜR NETZ
W
• Electronic publications as a limited but rather
ITIA T IVE F ITIA T IVE F
Electronic publications as a limited but rather
well defined sub-field of research output
TSCHE IN
I
TSCHE IN
I
• Which publications represent „e-science“?
Measuring II
E.V . E.V .• How
O RMA T ION O RMA TION
– Choose indicators for publications (citation, usage,
„endorsement“, post-publication peer-review …)
W ERKINF O W ERKINF O
– Collect information
– Analyse / digest the information
ÜR NETZ
W
ÜR NETZ
W
Analyse / digest the information
ITIA T IVE F ITIA T IVE F TSCHE IN I TSCHE IN I DEUDEU
Indicators and methods
E.V
.
E.V
.
• Web Analytics = Usage
O RMA T ION O RMA T ION
y
g
– Logfile analysis
– Server side scripts (e.g. Linkresolver)
W ERKINF O W ERKINF O
– Page tagging, web bugs, cookies
– Hybrid methods
ÜR NETZ W ÜR NETZ W• Citation Analysis
E
i
ti
f th f
d
tt
f it ti
ITIA T IVE F ITIA T IVE F– Examination of the frequency and pattern of citations
in publications
– Impact factor (IF)
TSCHE IN
I
TSCHE IN
I
p
( )
A taxonomy of metrics
E.V
.
E.V
. authors
scholarship evaluation now
O RMA T ION O RMA T ION Google’s PR W ERKINF O W ERKINF O Google s PR Technorati
IF, citation counts
ÜR NETZ W ÜR NETZ W Amazon.com Collaborative filtering Flickr.org Slashdot.org Frequentist “counting” Structural “pattern” ITIA T IVE F ITIA T IVE F g g TSCHE IN I TSCHE IN I users novel methods of resource evaluation
LANL experiments for scholarship evaluation since 1999
DEUDEU From: Bollen, Johan and Van de Sompel, Herbert (2005) A framework for assessing the impact of units of
Measuring impact:
the elements (schematic)
E.V
.
E.V
.
Services
(ranking, recommender, collection building
the elements (schematic)
O RMA T ION O RMA T ION
Metrics
(structural, quantitative)
and management...)
s
e
W ERKINF O W ERKINF OData mining
Metrics (structural, quantitative)
analy
s
ÜR NETZ W ÜR NETZ WLink Resolvers
Logfile analysis
Citation analysis
Compare, aggregate
ect
ITIA T IVE F ITIA T IVE FLink-Resolvers
Collect
A
t
Logfile analysis
Collect
A
t
Citation analysis
Parse
E t
t
coll
TSCHE IN I TSCHE IN IAggregate
Aggregate
Extract
Aggregate
f
(
o
ose
DEUDEU
Basic set of documents
(Journals, repositories,
primary data etc.)
ch
Outline
E.V . E.V .1. Scope / definitions
O RMA T ION O RMA T ION1. Scope / definitions
2. Issues
3 Ongoing o k / p ojects
W ERKINF O W ERKINFO
3. Ongoing work / projects
ÜR NETZ W ÜR NETZ W ITIA T IVE F ITIA T IVE F TSCHE IN I TSCHE IN I DEUDEU
What are the basic items we want to count?
E.V
.
E.V
.
• Granularity (journals, articles, etc.)
O RMA T ION O RMA T ION
• Beyond “publications” (primary data learning
W
ERKINF
O
W
ERKINF
O
• Beyond publications (primary data, learning
objects, etc.)
ÜR NETZ
W
ÜR NETZ
W
• Identifying items (=referent deduplication)
ITIA T IVE F ITIA T IVE F
– Metadata based heuristics, persistent identifiers
TSCHE IN
I
TSCHE IN
I
• Identifying users (=agent deduplication)
– User and session identification, pseudonymization
Indicators and methods
E.V
.
E.V
.
• Log data processing
O RMA T ION O RMA T
ION
– Identifying / filtering robots and crawlers
– Proxies, caching
– “click spans”
W ERKINF O W ERKINF Oclick spans
– Grouping, isolating and aggregating useful usage patterns
– Comparison and validation to citation data
ÜR NETZ
W
ÜR NETZ
W
• Aggregation and scalability
– Technical issues
ITIA T IVE F ITIA T IVE FTechnical issues
• Different architectural frameworks: linking
server-based, other, scalability, pseudonymization
– Social/Policy issues
TSCHE IN
I
TSCHE IN
I
Social/Policy issues
• Networks of trust, sharing usage data
– How is data collated with external sources (publisher data
…)
DEUDEU
Confidence in data and stats?
E.V . E.V .• Data validity
O RMA T ION O RMA TION
– Usage definition, recording and representation, quality
benchmarks, falsification issues
W ERKINF O W ERKINF O
• Fraud and data manipulation
– Standardized and transparent collection and aggregation of
ÜR NETZ
W
ÜR NETZ
W
Standardized and transparent collection and aggregation of
usage data
– Auditing standards and procedures
ITIA T IVE F ITIA T IVE F TSCHE IN I TSCHE IN I DEUDEU
Legal issues, metrics and services
E.V
.
E.V
.
• Privacy and other legal issues
O RMA T ION O RMA T ION
y
g
– User and session identification, pseudonymization
– Legal implications of log storage and aggregation
W ERKINF O W ERKINF O
• Metrics and services
ÜR NETZ
W
ÜR NETZ
W
– Investigating and tailoring metrics
– Interfaces with existing bibliometric products
– Definition of end-user services
ITIA T IVE F ITIA T IVE F TSCHE IN I TSCHE IN I DEUDEU
Background Issues
E.V
.
E.V
.
• What is the context for talking about impact?
O RMA T ION O RMA T ION
g
p
• Why is promoting enhanced and alternative
metrics important?
W ERKINF O W ERKINF Ometrics important?
– as a licensing issue: consortia and publishers want to
investigate value for money issues
ÜR NETZ
W
ÜR NETZ
W
– as a research policy issue:
• scholarly communication changes rapidly
i
i
OA
i
ITIA T IVE F ITIA T IVE F• impact in new OA environment
• trend analysis (“in-progress” communication)
• assessment and evaluation of research
TSCHE IN
I
TSCHE IN
I
• assessment and evaluation of research
Outline
E.V . E.V .1. Scope / Definitions
O RMA T ION O RMA T ION1. Scope / Definitions
2. Issues
3 Ongoing o k / p ojects
W ERKINF O W ERKINFO
3. Ongoing work / projects
ÜR NETZ W ÜR NETZ W ITIA T IVE F ITIA T IVE F TSCHE IN I TSCHE IN I DEUDEU
Knowledge Exchange Workshop
on Institutional Repositories
E.V . E.V .on Institutional Repositories
P
ti l d fi iti
f ’
’
O RMA T ION O RMA T IONPractical definition of ’usage’
Raw format for exchanging usage events
Items to be counted
Lobby COUNTER to add article level stats
W
ERKINF
O
W
ERKINF
O
Lobby COUNTER to add article level stats
Other academic “output types”
Standard reports
Agree on a small set of standard useful statistical reports that
ÜR NETZ
W
ÜR NETZ
W
Agree on a small set of standard useful statistical reports that
repositories should produce
Policies for stats
Compliance with local laws on e.g. privacy
ITIA T IVE F ITIA T IVE F
p
g p
y
Enhance SHERPA policy tool
Collection and aggregation
Normalisation
TSCHE IN
I
TSCHE IN
I
Specify issues of aggregation and deduplication for later study
Collation with external sources
Aggregating COUNTER stats at consortium level
I
ti
t SUSHI i t
bilit
ith
it i
d OAI PMH
DEUDEU
Investigate SUSHI interoperability with repositories and OAI-PMH +
OpenURLContextObjects
Ongoing work
E.V . E.V .g
g
• LANL
bX (with CalState ExLibris)
O RMA T ION O RMA T ION
– bX (with CalState, ExLibris)
– MESUR
– …
W ERKINF O W ERKINF O• UK
ÜR NETZ W ÜR NETZW
– University of Southampton (IRS, EPStats …)
– University College London
–
ITIA T IVE F ITIA T IVE F…
• Germany (DINI / DFG)
TSCHE IN I TSCHE IN IGermany (DINI / DFG)
– Göttingen State and University Library
– Stuttgart University Library
DEUDEU
– Computer and Media Service Humboldt University Berlin
Germany
E.V . E.V .y
W k h
E h
d
d Alt
ti
M t i
O RMA T ION O RMA TION
• Workshop on Enhanced and Alternative Metrics
of Publication Impact, 20–21 February 2006,
b ld
l
W ERKINF O W ERKINFO
Humboldt University Berlin
ÜR NETZ
W
ÜR NETZ
W
• Cluster of proposals to the DFG
– Network of certified open access repositories
2y
9
ITIA T IVE F ITIA T IVE F
– Network of certified open access repositories
2y
9
– Usage statistics
demonstrator
9
Distributed open access reference citation service
TSCHE IN
I
TSCHE IN
I
– Distributed open access reference citation service
demonstrator
9
Germany II
E.V . E.V .y
G
ll ti
i t f
i ht h
O RMA T ION O RMA TION
• German collecting society for copyright charges
(VG Wort) has started a project on statistics
(
S)
W ERKINF O W ERKINF O(METIS)
ÜR NETZ W ÜR NETZ W• IFABC (International Federation of Audit
Bureaux of Circulation) standards for robot
ITIA T IVE F ITIA T IVE F
)
detection and click spans (30 min)
TSCHE IN
I
TSCHE IN
I
Distributing VG Wort IDs
E.V . E.V . O RMA T ION O RMA T IONDoc comes with or gets
VG Wort registers ID
if not known
W ERKINF O W ERKINFO
Doc. comes with or gets
new VG Wort counter ID
e.g.
490271fca856ba65cc
VG Wort
ÜR NETZ W ÜR NETZ W
DNB
ITIA T IVE F ITIA T IVE FRepository ingest
Open access
Repository
TSCHE IN I TSCHE IN I
ID is made available
through OAI PMH
DEUDEUthrough OAI-PMH
Collecting data
E.V . E.V .Repository
User
O RMA T ION O RMA T IONRewrite Module
PDFaddressRepository
User
W ERKINF O W ERKINF ORewrite Module
mod_rewrite and
PHP
PDFaddresshttp://vg00.met.vgwort.de/na/490271fca856ba65cc?l=PDFaddress
ITIA T IVE F ITIA T IVE F
Log
Repository
Log
harvester
VG Wort
TSCHE IN I TSCHE IN IIFABC metrics
DEUDEU COCOCOMeasuring impact:
the technical elements for usage data
E.V . E.V . OpenURL ContextObjects Log Repository Link
the technical elements for usage data
O RMA T ION O RMA T ION Link Resolver L CO CO CO Aggregated W ERKINF O W ERKINF O Data Metrics Log DB Log Repository Link Resolver CO CO CO Aggregated
Usage Data e.g.
ÜR NETZ W ÜR NETZ W Data Mining Filtering Services Log Log harvester (Service Provider) ITIA T IVE F ITIA T IVE F Filtering Aggregated logs g Repository CO CO CO Log DB Webserver -Log Aggregated Usage Data e.g. TSCHE IN I TSCHE IN I og g Rewrite OpenURL ContextObjects SUSHI Normalise COUNTER DEUDEU e e module
Normalise (optional) -> Robots, psydonymization or SUSHI
Based on: Bollen, Johan and Van de Sompel, Herbert, OAI4, Geneva
Conclusion
E.V
.
E.V
.
• Infrastructure for collecting and aggregating usage data
O RMA T ION O RMA T ION
g
gg g
g
g
is conceptually available, has to be deployed and
implemented in practice on a large scale
W ERKINF O W ERKINF O
• Investigating metrics for different needs and purposes
ÜR NETZ
W
ÜR NETZ
W