• No results found

here

N/A
N/A
Protected

Academic year: 2020

Share "here"

Copied!
8
0
0

Loading.... (view fulltext now)

Full text

(1)

Evaluating the Scientific Impact of XSEDE

Fugang Wang

Indiana University Bloomington, Indiana, U.S.A.

Gregor von Laszewski

Indiana University Bloomington, Indiana, U.S.A.

[email protected]

Timothy Whitson

Indiana University Bloomington, Indiana, U.S.A.

Geoffrey C. Fox

Indiana University Bloomington, Indiana, U.S.A.

Thomas R. Furlani

Center for Computational

Research

University at Buffalo, SUNY 701 Ellicott Street Buffalo, New York, 14203

Robert L. DeLeon

Steven M. Gallo

University at Buffalo, SUNY

Buffalo, New York, 14203

ABSTRACT

We use the bibliometrics approach to evaluate the scientific impact of XSEDE. By utilizing publication data from vari-ous sources, e.g., ISI Web of Science and Microsoft Academic Graph, we calculate the impact metrics of XSEDE publica-tions and show how they compare with non-XSEDE

pub-lication from the same field of study, or non-XSEDEpeers

from the same journal issue. We explain in detail how we re-trieved, cleaned, and curated millions of related publication entries. We then introduce the metrics we used for evalu-ation and comparison, and the methods used to calculate them. Detailed analysis results of Field Weighted Citation Impact (FWCI) and the peers comparison will be presented and discussed. We also explain how the same approaches could be used to evaluate publications from a similar organi-zation or institute, to demonstrate the general applicability of the present evaluation approach providing impact even beyond XSEDE.

Categories and Subject Descriptors

H.4 [Information Systems Applications]: Miscellaneous;

D.2.8 [Software Engineering]: Metrics—complexity

mea-sures, performance measures

General Terms

Theory, Measurement

Keywords

Scientific impact, bibliometrics, h-index, Technology Audit Service, XDMoD, XSEDE

1.

INTRODUCTION

Corresponding Author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

PEARC18 ...2018,USA

Copyright 2018 ACM TBD...$15.00. http://dx.doi.org/TBD ...$15.00.

To identify the impact ofscientific advancements enabled

by enhanced cyberinfrastructure, it is important to conduct a comprehensive analysis of achievements that can be at-tributed to the use of the advanced infrastructure, such as that provided by the Extreme Science and Discovery Envi-ronment (XSEDE) [5, 18].

We use the bibliometrics approach to evaluate the scien-tific impact of XSEDE. By acquiring related publication and citation data from multiple sources we calculate various met-rics that show the impact of the publications and how they compare to their non-XSEDE peers that were published in the same journals, or in the same field of study. By process-ing millions of publication data entries we normalized the ci-tation count by field of study. This essentially eliminates the problem that different fields of study have different publica-tion characteristics. We introduced a novel [19] method to compare the target publications group with their peers pub-lished in the same publication venue to further show how the target publications group performs compared to their peers within the same publication venue.

2.

RELATED WORKS

Bibliometrics based analysis has been the most commonly used method to evaluate the research impact of an individ-ual, a research group, or even an organization. Publication count and citation count based metrics provide an effective way to show the quantity and quality, and the impact of scientific research activities. For instance, it was used to evaluate the quality of research in the United Kingdom [17, 16]. Most popular college/university rankings use citation based bibliometrics as an important factor to evaluate the quality of their research, e.g., the overall publication count and citation count in a certain year or a year range; the num-ber of papers published in certain top journals; the numnum-ber of highly cited papers; rating the citation count of a paper or its percentile ranking, etc.

Compute Canada, a virtual organization similar to XSEDE, also uses a bibliometrics based analysis to evaluate the im-pact of their research [1].

Some previous limited work studied the impact of Tera-Grid [6], the early version of XSEDE, by analyzing the pub-lications for one specific research allocation quarter, which involved a very limited number of researchers and

publica-tions. Our work is unique in that it provides acomprehensive

(2)

ap-proach such as Field Weighted Citation Impact and journal publication-based peers comparison.

In addition to the more intuitive direct metrics of publica-tion and citapublica-tion count, some other derivative metrics such as h-index [13] and g-index [11] combines both publication and citation count to generate one metric. I10-index [2] in contrast measures only the count of those publications that received at least ten references by other publications. In our evaluation we calculated such metrics for various XSEDE research entities to show the impact and comparison of the entities on the same level, e.g., individual, project, research field of study, organization, etc. The results are presented on the XDMoD scientific impact portal [4].

Usage based metrics [7, 8] have also been proposed in-cluding metrics such as views and downloads, instead of the more formal citations of publications. However the appli-cability of this approach may be limited because the usage data may not be available from a publisher, or different pub-lishers may have different criteria to measure the usage data. Thus, it would create an inconsistent comparison for papers published by different publishers. For this reason we did not present such metrics in this paper.

3.

METHODOLOGY

We first introduce the methodology we use in the biblio-metrics analysis to evaluate the scientific impact of XSEDE publications. This includes the specification of the dataset and data sources used in the analysis, the approaches to define and calculate the various metrics, and the informa-tion about the sophisticated software and service framework developed to facilitate this to XSEDE unique and compre-hensive evaluation study.

3.1

Dataset and data sources

Several data sets and sources are involved in this study which includes XSEDE publications, Microsoft Academic Graph (MAG), and Web of Science Data. For all data we used the same time period between 2005 and 2016, which is the same as for the XSEDE publications. We describe the basic features of the datasets next.

XSEDE publications. This dataset includes publications from XSEDE as well as from TeraGrid. This data is col-lected from two sources. One is from the user-submitted data from the XSEDE user portal; another is from the past TeraGrid/XSEDE project reports submitted to NSF. For the latter, we extracted the publications appendix from the reports and then parsed and curated with significant ef-fort the publication records text, before putting them into a structured database. There were over 20 thousand raw entries.

Microsoft Academic Graph (MAG). This dataset was retrieved with the API provided by Microsoft. The data was then curated and cleaned and put into a MongoDB database. This dataset has about 58 million entries.

ISI Web of Science (WoS). For the publication venues with at least 10 XSEDE papers appearing in them, we re-trieved all the publications published in them to facilitate the peers comparison study. This dataset has about 2 mil-lion entries.

3.2

Field Weighted Citation Impact Analysis

The field Weighted Citation Impact (FWCI) metric is pro-posed as one of the snowball metrics [10]. It calculates the

average citation count of a target group of publications based on their field of science, and then compare that with the av-erage citation count of the whole field of science in the same time period. The result is a ratio

F W CI=avg(CCgroup)/avg(CCf ield)

A FWCI value greater than 1 indicates that the pertinent

publication group had more citations than the expected value of the field of science, while a value less than 1 indicates that the average citation count that the group received was less than the expected value for the applicable field of science. In this study we introduced the following process to calculate

theFWCI values for the XSEDE publications.

1. Query every raw XSEDE publication by title against the MAG data set, and verify the matching ones by checking other properties such as published year. After this process we identify the verified matching records in the MAG for all valid XSEDE publications. During this process we use elasticsearch [12] to improve both the accuracy and the performance of the query. This is important because of the size of the dataset.

2. For each of the 58 million MAG data records, we use the assigned field of study values, along with other

re-lated data form MAG, totraceupward to the top levels

of the hierarchical fields. This process narrowed down the 30k different assigned fields of study to 19 overall

top levelfields of studyas defined in the MAG dataset.

One thing to note is that each publication is assigned to multiple science fields in the original publication records, and the final top level science field category of a publication may not be unique either. However as a lot of research publications are themselves multi-disciplinary we think that such results are valid and acceptable.

In the following analysis we counted a publication in all the top level science fields that we found following this tracing process.

3. Once we have each and every publications in the MAG dataset, we can calculate the average citation count by each top level field, for all the MAG publications and XSEDE publications respectively. Following that we can calculate the ratio to get the FWCI values.

3.3

Metric for Journal Publication-based Peer

Comparison

An importnat achievement is our novel and sophisticated

Journal Publication-based Peer Comparison (JPPC)metric as discussed in [19]. We used the following process to obtain the data needed for the analysis.

(3)

2. From this verified publications list, we find the sub-set of all publication venues with at least 10 XSEDE publications. For each of the publications published in these venues we retrieve from WoS the extended meta-data to get the exact volume and issue number of the publication venue. The reasons why we chose a thresh-old value of 10 to identify a publication venue subset are:

(a) This ensures the statistical significance of the anal-ysis results.

(b) This eases the data retrieval work substantially.

While we have~1400 distinct publication venues

identified from all the verified XSEDE publica-tions, the subset when we use 10 as the minimum number of publications appearing in the venue

was reduced to~120 publication venues.

(c) Using this criterion, the number of XSEDE pub-lications in the peers comparison was about five thousand, or about 56% of all the verified ones. This represents a good portion of all the data.

3. For all ~120 publication venues, we retrieved all the

publications data published in them during the same time period as the TeraGrid/XSEDE publications (2005-2016).

Based on this data, we can establish suitable comparison peer groups, which are based each on a single journal issue (or journal volume when no issue data available for some publications) that an XSEDE publication appeared in. For each comparison peer group, we rank the citation count of each publication (including the XSEDE ones and the peers). The calculated percentile ranking values serve as the basis of the peer comparison study. The comparison is between publications that we identified as XSEDE papers and those that were not.

To apply the percentile ranking to the field of science of XSEDE publications among the journal issues where each publication was published, we aggregate them based on the project field of science data obtained from the XSEDE cen-tral database (XDcDB). These XSEDE fields of science are self-reported by the researchers. We then calculate the av-erage and median percentile rank for each field of science (FOS).

3.4

Software Architecture Supporting the Study

We have developed a sophisticated software framework sup-porting the study, which includes data acquisition, cleanup, processing and presentation. The framework is based on a distributed set of software services. This service-oriented framework is integrated as part of a layered architecture consisting of components for:

• A data layer that retrieves publication and citation

data from external sources. This includes data from the ISI Web of Knowledge; Microsoft Academic Graph; Google Scholar, and very importantly the NSF award database.

• Business logic layer that deals with:

– parsing and processing while correlating data from

various databases and services, such as the XSEDE central database (XDcDB).

– a metrics generation and analysis system for

dif-ferent aggregation levels – users, projects, organi-zation, field of science.

• a presentation layer using a lightweight portal in

addi-tion to exposing some data via a RESTful API [20].

Due to the use of the Software as a Service (SaaS) approach,

our framework isexpandableas we are able to integrate new

services and data resources as required. Hence our frame-work can be adapted to other resource providers as demon-strated in [19]. Obviously, adaptation could mean that we have to change the bibliometric data, which could mean that we need to integrate new data sources and curation services spending significant effort to integrate such data.

3.4.1

Service Integration into XSEDE and XDMoD

Our current framework for XSEDE includes services that are motivated by our initial findings from the XSEDE

bibliomet-ric data. A RESTful service is integrated into the XSEDE

User Portal as part of the publication discovery service. The various impact metrics of different levels of XSEDE entities - person, project, organization, field of study - as

well as part of the analyses are available on the XDMoD

scientific impact portal [4].

4.

RESULTS AND DISCUSSIONS

In this paper, we discuss results specifically targeting the analysis of data related to XSEDE.

4.1

Field Weighted Citation Impact Metrics

First we show the calculated FWCI values in Figure 1. The plot lists the FWCI for the top-level fields of science as de-fined from the MAG data. Each data point also has the number of XSEDE publications as well as the number of all publications in that field. The red vertical line indicates the point at which FWCI=1. Figure 1 shows all fields but one (political science, with only 3 publications) that had FWCI values greater than 1, with the majority fields having much higher values.

In Figure 2 we display the same data but sort it based on the number of XSEDE publications in the field. This emphasizes the FWCI for the fields that the majority of the XSEDE publications fell within.

Figure 3 compares the expected citation count based on all publications in a given field of science with the actual aver-age citation count for XSEDE publications in each field of science, while including the FWCI values at the same time. The plot was sorted based on the number of XSEDE publi-cations in the field. This again indicates that XSEDE pub-lications received much higher citation counts and implies a higher scientific impact than their non-XSEDE peers. In Figure 4 we show the extra citations XSEDE publications receive for each field, compared to the expected overall field of science value. Figure 4 also indicates how much impact that access to XSEDE resources has on each individual field of science.

(4)

political science environmental science psychology geology chemistry mathematics medicine biology physics materials science ALL economics computer science business sociology engineering philosophy history geography art

FWCI

0 5 10 15 20 25 30

30.5 ( 62 / 2,360,681 )

0.1 ( 3 / 1,845,790 ) 1.9 ( 25 / 654,264 ) 2.2 ( 454 / 4,747,669 ) 2.3 ( 1,205 / 2,352,750 ) 2.3 ( 6,691 / 7,505,924 ) 2.6 ( 2,439 / 5,981,276 ) 2.7 ( 1,468 / 12,061,137 ) 2.8 ( 3,461 / 9,042,998 ) 2.9 ( 8,126 / 7,025,455 ) 3.3 ( 2,385 / 4,001,730 )

4.3 ( 15,042 / 56,628,566 ) 4.4 ( 390 / 6,825,909 ) 4.4 ( 4,498 / 12,422,929 ) 4.5 ( 19 / 1,754,885 )

6.5 ( 293 / 5,977,362 ) 7.3 ( 1,451 / 11,691,074 )

7.5 ( 82 / 2,365,079 ) 7.8 ( 52 / 2,039,197 )

9.5 ( 170 / 3,592,772 )

[image:4.612.61.288.53.228.2]

FWCI (n_TGXD / n_all)

Figure 1: Field Weighted Citation Impact (FWCI) by Field sorted by FWCI.

political science business environmental science history art philosophy geography sociology economics psychology geology engineering medicine materials science mathematics biology computer science chemistry physics ALL

n_TGXD

0 2k 4k 6k 8k 10k 12k 14k

4.3 ( 15,042 / 56,628,566 )

0.1 ( 3 / 1,845,790 ) 4.5 ( 19 / 1,754,885 ) 1.9 ( 25 / 654,264 ) 7.8 ( 52 / 2,039,197 ) 30.5 ( 62 / 2,360,681 ) 7.5 ( 82 / 2,365,079 ) 9.5 ( 170 / 3,592,772 ) 6.5 ( 293 / 5,977,362 ) 4.4 ( 390 / 6,825,909 ) 2.2 ( 454 / 4,747,669 )

2.3 ( 1,205 / 2,352,750 ) 7.3 ( 1,451 / 11,691,074 ) 2.7 ( 1,468 / 12,061,137 )

3.3 ( 2,385 / 4,001,730 ) 2.6 ( 2,439 / 5,981,276 )

2.8 ( 3,461 / 9,042,998 ) 4.4 ( 4,498 / 12,422,929 )

2.3 ( 6,691 / 7,505,924 ) 2.9 ( 8,126 / 7,025,455 )

[image:4.612.319.552.55.228.2]

FWCI (n_TGXD / n_all)

Figure 2: FWCI by Field sorted by Publication Count of the Field.

0 10 20 30 40 50

political science business environmental science history art philosophy geography sociology economics psychology geology engineering medicine materials science mathematics biology computer science chemistry physics ALL

0 10 20 30 40 50

FWCI

Citation Count

Field

mean_xd mean_all

Figure 3: FWCI with Expected Citation Count and Actual Citation Count from XD Publications.

each category. The results show that for most fields a higher than expected percentage of XSEDE publications fall into

political science business environmental science history art philosophy geography sociology economics psychology geology engineering medicine materials science mathematics biology computer science chemistry physics ALL

0 100,000 200,000 300,000

Overall Citation Count

Field

Extra Citations Expected Citations

Figure 4: Extra Citation Count Achieved by XD Publications.

the highly cited papers categories. E.g., when we consider all the publications and fields together, 4.8% XSEDE pub-lications were in the top 1% highly cited group while 22.5% were in the top 5% highly cited group.

4.2

XSEDE Peer Data Analysis

Now we present a number of graphs and tables that show the results from the peer comparison study. Figure 5 shows the average percentile rank of XSEDE publications grouped by each publication venue. Figure 6 shows the same publication data but presents the median percentile rank values. When we aggregate the results by fields of study instead of by individual journal, we get the results shown in Figure 7. These plots show that for majority of the publications venues, or fields of science, XSEDE publications have a higher per-centile ranking based on citation count.

[image:4.612.62.281.279.440.2]

When we consider the overall comparison results, Figure 8 shows the distribution of the XSEDE publication’s percentile rank in each 10% increment group. Values above 50% in-dicate that the XSEDE publications are cited at a higher rate than their non-XSEDE peers. Again the result show the distribution skewed to the higher end, which means that XSEDE publications are cited more frequently than their non-XSEDE peers.

Figure 9 shows the empirical cumulative distribution of the percentile ranks compared to that of the peers group. The XSEDE publication curve is entirely to the right of the over-all publication curve which is another indication that the XSEDE publications have a higher impact. Figure 10 shows the kernel density of the distributions of XSEDE publica-tions’ percentile ranking and that of peers’. As expected, the non-XSEDE peer publications are evenly distributed by per-centile ranks with the spike at 50% mostly coming from more recently published journal issues where most publications were not yet cited. The XSEDE publications are weighted to the higher percentile ranks side again. This again shows that XSEDE publications tend to be more highly cited com-pared to their peers published in the same journal issue. Table 2 lists the average and median rankings and cita-tions received by XSEDE and non-XSEDE peer publication groups.

[image:4.612.60.289.479.649.2]
(5)

distribu-Table 1: Highly Cited Papers Statistics (in top 1% and 5%)

Field # in top 1% % in top 1% # in top 5% % in top 5% # per 100,000 # XSEDE pubs

ALL 727 4.8 3380 22.5 26.6 15042

physics 292 3.6 1204 14.8 115.7 8126

chemistry 177 2.6 782 11.7 89.1 6691

computer science 223 5.0 1037 23.1 36.2 4498

biology 102 2.9 453 13.1 38.3 3461

mathematics 68 2.8 351 14.4 40.8 2439

materials science 108 4.5 446 18.7 59.6 2385

medicine 42 2.9 213 14.5 12.2 1468

engineering 111 7.6 414 28.5 12.4 1451

geology 33 2.7 183 15.2 51.2 1205

psychology 11 2.4 53 11.7 9.6 454

economics 26 6.7 101 25.9 5.7 390

sociology 18 6.1 63 21.5 4.9 293

geography 19 11.2 87 51.2 4.7 170

philosophy 11 13.4 31 37.8 3.5 82

art 15 24.2 39 62.9 2.6 62

history 6 11.5 19 36.5 2.6 52

environmental science 1 4.0 3 12.0 3.8 25

business 1 5.3 6 31.6 1.1 19

political science 0 0.0 0 0.0 0.2 3

0 10 20 30 40 50 60 70 80 90 100 SCIEN TIFIC REPO RTS JOURN AL O

F NUC LEAR

MATE RIALS

NATU RE CHE

MISTRY

NEW JOURN

AL OF P HYSIC

S

NATU RE MA

TERIA LS

WEA THER A

ND FO RECA STIN G NANO SCAL E JOURN AL O

F LIG HTW

AVE T ECHN

OLOG Y

BULLE TIN O

F THE SEISMO

LOGIC AL SO

CIETY OF A

MERIC A

MOLE CULA

R ECO LOGY

JOURN AL O

F PHY SICS B

-ATO MIC MO

LECU LAR A

ND O PTICA

L PHY SICS

JOURN AL O

F COMP UTAT

IONA L PHY

SICS

ASTRO PHYSIC

AL JO URN

AL SU PPLE MENT SERIE S STRU CTURE ASTRO PHYSIC

AL JO URN

AL

PHYSIC S OF P

LASMA S

ASTRO PHYSIC

AL JO URN

AL LE TTERS

NATU RE PHY

SICS

COMP UTER

PHYSIC S COMMU

NICAT IONS

PHYSIC AL RE

VIEW A

GENO ME BIO

LOGY BIOPHY SICAL JOURN AL JOURN AL O

F BIO MOLE

CULA R NMR SCIEN

CE

JOURN AL O

F THE ELEC TROCHE MICAL SOCIE TY PRO TEINS-S

TRUCT URE FU

NCTIO N AN

D BIO INFO

RMA TICS

JOURN AL O

F FLU ID ME

CHANIC S BIOIN FORMA TICS PHYSIC AL RE

VIEW LETT ERS BIOPO LYME RS JOURN AL O

F POW ER SO

URCES

JOURN AL O

F MO LECU

LAR B IOLO

GY

COMP UTER

METHO DS IN

APPL IED ME

CHANIC S AND

ENGIN EERIN

G

JOURN AL O

F CAT ALYS

IS

PHYSIC AL CHE

MISTRY CHEMIC

AL PHY SICS

JOURN AL O

F PHY SICAL CHE MISTRY A PLOS ONE NANO LETT ERS BIOCHE MISTRY COMB USTIO

N AN D FLA

ME

NUCL EIC AC

IDS RE SEARC

H

BIOCHIMIC A ET B

IOPHY SICA A

CT A-BIOME

MBRA NES

JOURN AL O

F PHY SICAL

CHE MISTRY

C

PHYSIC S OF F

LUID S

JOURN AL O

F ME DICIN

AL CHE MISTRY SOFT MATT ER MOLE CULA

R SIMU LATIO

N

JOURN AL O

F THE ATMO

SPHE RIC SC

IENCE S CHEMIS TRY-A EURO PEAN JOURN AL ABST RACT

S OF P APERS

OF T HE AME

RICAN CHE MICAL SOCIE TY JOURN AL O

F CHE MICAL

INFO RMA

TION AND MO

DELIN G ADVA NCED FUNC TIONA L MA TERIA LS JOURN AL O

F PHY SICAL

CHEMIS TRY LE

TTERS ACS C

ATAL YSIS INORG ANIC CHEMIS TRY CHEMIC AL SC

IENCE

ANGE WAN

DTE C HEMIE

-INTE RNAT

IONA L EDIT

ION

ACCO UNTS

OF C HEMIC

AL RE SEARC H ADVA NCED MATE RIALS ORG ANIC LETT ERS INTE RNAT IONA L JOU

RNAL OF HE

AT AN D MA

SS TRA NSFE

[image:5.612.80.531.66.510.2]

R

Figure 5: Average percentile ranking of XD publications by journal (by ISI)

0 10 20 30 40 50 60 70 80 90 100 ATMO SPHE

RIC EN VIRO NME NT BMC BIOIN FORMA TICS JOM MOLE

CULA R ECO

LOGY

NATU RE CO

MMU NICAT

IONS

CLAS SICAL

AND Q UANT

UM GRA VITY

MONT HLY N

OTICE S OF T

HE RO YAL A

STRO NOMIC

AL SO CIETY NATU

RE MA TERIA

LS

JOURN AL OF P

HYSIC S-CON

DENS ED MA

TTER

JOURN AL O

F PHY SICS B

-ATO MIC MO

LECU LAR A

ND O PTICA

L PHY SICS PHYSIC

AL RE VIEW

D

JOURN AL O

F COMP UTAT

IONA L PHY

SICS

JOURN AL O

F APP LIED P

HYSIC S

ASTRO PHYSIC

AL JO URNAL

LETT ERS

JOURN AL O

F LIG HTW

AVE T ECHN

OLOG Y

ASTRO PHYSIC

AL JO URNAL

STRU CTURE PHYSIC

AL RE VIEW

A OPTIC

S EXP RESS

JOURN AL O

F MO LECU

LAR B IOLO

GY

JOURN AL O

F GEO PHYSIC

AL RE SEARC

H-SP ACE P

HYSIC S

PHYSIC AL RE

VIEW LETT

ERS PHYSIC

S OF P LASMA

S

JOURN AL O

F PHY SICAL

CHEMIS TRY B NATU

RE PHY SICS

SCIEN CE

MOLE CULA

R PHY LOGE

NETIC S AND

EVOL UTIO

N PHYSIC

AL RE VIEW B NANO TECHN OLOG Y ACTA MATE RIALIA BIOCHE MISTRY JOURN AL O

F PHY SICAL OCE ANOG RAPHY COMP UTER METHO DS IN

APPL IED ME

CHANIC S AND

ENGIN EERIN

G

CHEMIS TRY O

F MA TERIA

LS OCEA

N MO DELLIN

G

JOURN AL O

F CAT ALYS

IS

JOURN AL O

F BIO MOLE

CULA R NMR

JOURN AL O

F GEO PHYSIC

AL RE SEARC

H-OC EANS JOURN

AL O F CLIMA

TE

JOURN AL O

F PHY SICAL

CHEMIS TRY C

JOURN AL OF P

HYSIC AL CHE

MISTRY A

PROCE EDIN

GS O F THE

NAT IONA

L ACA DEMY

OF S CIENC

ES O F THE

UNIT ED

MOLE CULA

R SIMU LATIO

N

NUCL EIC AC

IDS RE SEARC

H PHYSIC

S OF F LUID S CHEMIS TRY-A EURO PEAN JOURN AL JOURN AL O

F MA TERIA

LS SC IENCE

ABST RACT

S OF P APERS

OF T HE AME

RICAN CHEMIC

AL SO CIETY

JOURN AL O

F MA TERIA

LS CHE MISTRY

A CARB

ON

JOURN AL O

F CHE MICAL

INFO RMA

TION AND MO

DELIN G

JOURN AL O

F PHY SICAL CHE MISTRY LETT ERS MACRO MOLE CULE S JOURN AL O

F THE AME

RICAN CHEMIC

AL SO CIETY

JOURN AL O

F BON E AND

MIN ERAL RE

SEARC H

JOURN AL O

F THE ATMO

SPHE RIC SC

IENCE S RSC A

DVAN CES

JOURN AL O

F ORG ANIC

CHEMIS TRY CHEMIC

AL SC IENCE ADVA NCED MATE RIALS INTE RNAT IONA L JOU

RNAL OF HE

AT AN D MA

SS TRA NSFE

[image:5.612.77.532.454.676.2]

R

(6)

0 10 20 30 40 50 60 70 80 90

Comp uter an

d Com putati

on Re searc

h

Desig n and Man

ufactur ing Sy

stems

Biolog ical an

d Criti cal Sy

stems Physics

Astro nomi

cal Sc ience

s

Atmos pheri

c Scie nces

Integrati ve Bi

ology and N

euro scien

ce

Math emati

cal an d Phy

sical Scien

ces

Geosci ences

Envir onme

ntal Bi ology

Biolog ical Sc

ience s

Earth Scien

ces

Social and E

cono mic S

cienc e

Engine ering

Chem istry

Ocean Scien

ces

Molec ular Bi

oscie nces

Mate rials Re

searc h

Chem ical, T

herm al Syste

ms

Electr ical an

d Com munic

ation Syste

ms

Math emati

cal Sc ience

s

Mech anical

and S tructu

ral Sy stems

Comp uter an

d Infor mati

on Sc ience

and E ngine

ering

Polar Prog

rams Other

Advan ced S

cienti fic Co

mputi ng

Cross-D iscipl

inary A ctiviti

[image:6.612.80.535.52.253.2]

es

Figure 7: Average percentile ranking of XD publications by Field of Study (by ISI)

Percentile Rank

Density

0 20 40 60 80 100

0

5

10

[image:6.612.343.527.302.434.2]

15

Figure 8: Histogram of Percentile Ranking

0 20 40 60 80 100

0.0

0.2

0.4

0.6

0.8

1.0

Percentile ranking

Cum

ulativ

e distr

ituion

[image:6.612.80.260.302.435.2]

TG/XD publication peers

Figure 9: Empirical Cumulative Distribution of Percentile Ranks

Table 2: Basic statistics of XSEDE publications group and peers group

Number of Rank Citations

Publications Average Median Average Median

XD 5078 59 63 28 12

Peers 356464 49 49 15 5

0.000

0.005

0.010

0.015

0.020

0.025

Percentile rank

Density

10 20 30 40 50 60 70 80 90

[image:6.612.75.270.471.610.2]

TG/XD publication peers

Figure 10: Kernel Density of the distributions of XSEDE publications’ percentile ranking and that of peers’

tions are identical without assuming that they follow a

nor-mal distribution. We used the Mann-Whitney-Wilcoxon

test [15], Mood’s median test [9], and Kruskal-Wallis test [14]. The results are as the following.

Wilcox test for citation count

• W = 1160300000, p-value<2.2e-16. Alternative

hy-pothesis: true location shift is not equal to 0

Wilcox test for percentile ranking

• W = 1090700000, p-value<2.2e-16. Alternative

hy-pothesis: true location shift is not equal to 0

Mood’s median test for citation count

• p-value = 3.299883e-172

Mood’s median test for percentile ranking

• p-value = 8.83052e-71

Kruskal-Wallis Test for citation count

• Kruskal-Wallis chi-squared = 1207.6, df = 1, p-value

(7)

Kruskal-Wallis Test for percentile ranking

• Kruskal-Wallis chi-squared = 632.35, df = 1, p-value

<2.2e-16

All of these results strongly indicate that the differences that we see between the XSEDE and the non-XSEDE publication metrics are statistically significant.

We also performed a T-test to test the citation count dif-ferences and percentile rank difdif-ferences. Even though the distribution of the citation count of the XSEDE publication group and the peers group are not necessarily normally dis-tributed, due to the central limit theorem, when the sample size is large enough, it is rational to use the T-test to not only test if there is a statistical difference between the two groups, as having been shown by the several previous tests, but also to quantify the difference between the means. The t-test results for both citation count and percentile ranking are given below.

• T=9.8328, df=5105.5, p-value< 2.2e-16, 95%

confi-dence interval: [10.90,16.32]

T-test for ranking (Welch Two sample t-test)

• T=25.412, df=5105.5, p-value<2.2e-16, 95% confidence

interval: [9.07,10.59]

The results show that the XSEDE group has a statistically higher citation ranking and a statistically higher mean cita-tion rate than the non-XSEDE peer group.

4.2.1

Journal peer comparison based on MAG data

Although we first integrated the MAG data in order to eval-uate the field weighed impact of the XSEDE publications, we can follow the same approach as was done using the WoS data to conduct a similar peer comparison study. Figure 11 and Figure 12 show the average and median percentile rank-ing for XSEDE publications by each journal usrank-ing MAG data. The overall results are pretty similar to what we got from the study with the WoS data.

5.

CONCLUSION

We evaluated the scientific impact of XSEDE by examin-ing the publications that were enabled by havexamin-ing access to the XSEDE resources. By curating the XSEDE publication data including cleansing, verifying and correlating the var-ious data sources, we obtained a substantial very valuable dataset with which to compare and evaluate the scientific impact of XSEDE itself. While using two distinct

analy-ses -Field Weighted Citation Impact analysis, and another

novelJournal Publications-based Peer Comparisonstudy, we

found that XSEDE publications tend to be cited more than non-XSEDE publications. Various statistical tests show the

results are statistically significant. The results from this

study could potentially be used to inform to the XSEDE leadership team and the funding agency about the manage-ment of the facility, for example, to provide useful informa-tion to the resource allocainforma-tion committee during proposal selection and approval. While the present study dealt ex-clusively with XSEDE data, the approaches and methods developed can be applied to evaluate publication data from a variety of different facilities or groups. In fact we have done similar analyses for NCAR, BlueWaters, and Bridges using the developed methodology and software framework.

6.

ACKNOWLEDGMENTS

This work is part of the XSEDE Metrics Service (XMS) project sponsored by NSF under grant number OCI-1025159. Lessons learned from FutureGrid have significantly influ-enced this work. Gathering publications was first pioneered by FutureGrid, influencing the development in the XSEDE portal. We would like to thank Matt Hanlon and Maytal Dahan for their efforts to integrate part of the services into the XSEDE portal.

7.

REFERENCES

[1] Compute Canada Report: Increasing Canadian Research Impact. Web Page. URL:

https://www.computecanada.ca/compute-canada-report-increasing-canadian-research-impact/.

[2] i-10 index|google scholar citations open to all. URL:

http://googlescholar.blogspot.com/2011/11/ google-scholar-citations-open-to-all.html. [3] ISI Web of Science. Web Page. URL:

http://wokinfo.com/.

[4] XDMoD Scientific Impact Portal for XSEDE. Web

Page. URL:https:

//sciimp.ccr.xdmod.org/xdportalpub/overview/.

[5] XSEDE. Web Page. URL:https://www.xsede.org/.

[6] J. Bollen, G. Fox, and P. R. Singhal. How and where the TeraGrid supercomputing infrastructure benefits

science.Journal of Informetrics, 5(1):114–121, 2011.

[7] J. Bollen, M. A. Rodriguez, and H. Van de Sompel. MESUR: Usage-based Metrics of Scholarly Impact. In

Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL ’07, pages 474–474, New York, NY, USA, 2007. ACM. URL: http://doi.acm.org/10.1145/1255175.1255273, doi:10.1145/1255175.1255273.

[8] J. Bollen, H. Van de Sompel, and M. A. Rodriguez. Towards Usage-based Impact Metrics: First Results

from the Mesur Project. InProceedings of the 8th

ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL ’08, pages 231–240, New York, NY, USA, 2008. ACM. URL:

http://doi.acm.org/10.1145/1378889.1378928, doi:10.1145/1378889.1378928.

[9] G. W. Brown, A. M. Mood, et al. On median tests for

linear hypotheses. InProceedings of the Second

Berkeley Symposium on Mathematical Statistics and Probability. The Regents of the University of California, 1951.

[10] L. Colledge. Snowball metrics recipe book. 2014, 2014. [11] L. Egghe. Theory and practise of the g-index.

Scientometrics, 69(1):131–152, 2006.

[12] C. Gormley and Z. Tong.Elasticsearch: The Definitive

Guide: A Distributed Real-Time Search and Analytics Engine. ” O’Reilly Media, Inc.”, 2015.

[13] J. E. Hirsch. An index to quantify an individual’s

scientific research output.Proceedings of the National

academy of Sciences of the United States of America, 102(46):16569–16572, 2005.

[14] W. H. Kruskal and W. A. Wallis. Use of ranks in

one-criterion variance analysis.Journal of the

(8)
[image:8.612.79.533.47.434.2]

0 10 20 30 40 50 60 70 80 90 Sc ie nc e N atu re Ph ys ic al Re vi ew D Str uc tu re Mo le cu lar S im ul ati on Am er ic an Jo ur nal o f B otan y Jo ur nal o f C om pu tati on al P hy si cs An nal s of B io m ed ic al E ng in ee rin g Mo nth ly N oti ce s of th e Ro yal A str on om ic al S oc ie ty Co m pu tin g in S ci en ce an d En gi ne er in g Jo ur nal o f B io m ol ec ul ar N MR Jo ur nal o f Hi gh E ne rg y Ph ys ic s Co m pu te r P hy si cs C om m un ic ati on s Bu lle tin o f th e Se is m ol og ic al S oc ie ty o f A m er ic a Mo le cu lar P hy lo ge ne tic s an d Ev ol uti on Jo ur nal o f P hy si cs : C on de ns ed Matte r Jo ur nal o f C om pu tati on al C he m is tr y Jo ur nal o f P hy si cs B N atu re Mate rial s As tr op hy si cal Jo ur nal S up pl em en t S er ie s Mo le cu lar P hy si cs An nu al Re vi ew o f A str on om y an d As tr op hy si cs Ph ys ic al Re vi ew B Jo ur nal o f A pp lie d Ph ys ic s Cl as si cal an d Q uan tu m G rav ity O pti cs E xp re ss N atu re S tr uc tu ral & Mo le cu lar B io lo gy Ph ys ic al Re vi ew L ette rs Jo ur nal o f Mo le cu lar B io lo gy Ph ys ic s of A to m ic N uc le i Jo ur nal o f C he m ic al T he or y an d Co m pu tati on Jo ur nal o f C he m ic al P hy si cs Jo ur nal o f C om pu te r-ai de d Mo le cu lar D es ig n Ph ys ic al Re vi ew E Jo ur nal o f P hy si cal C he m is tr y B N uc le ic A ci ds Re se ar ch Pr ote in s Jo ur nal o f P hy si cal C he m is tr y A N ew Jo ur nal o f P hy si cs Co nc ur re nc y an d Co m pu tati on : P rac tic e an d Lan gm ui r Ph ys ic al Re vi ew A PL O S G en eti cs Cr ys tal G ro w th & D es ig n Sc ie nti fic Re po rts Bi oi nf or m ati cs Ch em ic al P hy si cs L ette rs G eo ph ys ic al Re se ar ch L ette rs PL O S Co m pu tati on al B io lo gy Ph ys ic al C he m is tr y Ch em ic al P hy si cs Bi oc he m is tr y O pti cs L ette rs Ch em is tr y of Mate rial s Jo ur nal o f P hy si cal C he m is tr y C Che minfor m Jo ur nal o f B io lo gi cal C he m is tr y Pr oc ee di ng s of th e N ati on al A cad em y of S ci en ce s of Ac ta Mate rial ia N atu re C om m un ic ati on s Bi op hy si cal Jo ur nal In du str ial & E ng in ee rin g Ch em is tr y Re se ar ch In or gan ic C he m is tr y N an ote ch no lo gy Ap pl ie d Ph ys ic s Le tte rs eL ife PL O S O NE N an o Le tte rs Jo ur nal o f P hy si cal C he m is tr y Le tte rs An ge w an dte C he m ie Jo ur nal o f G eo ph ys ic al Re se ar ch AC S Catal ys is Jo ur nal o f C he m ic al In fo rm ati on an d Mo de lin g AC S N an o Ch em is tr y: A E ur op ean Jo ur nal Jo ur nal o f Me di ci nal C he m is tr y Bi om ac ro m ol ec ul es Jo ur nal o f th e Am er ic an C he m ic al S oc ie ty AC S Ap pl ie d Mate rial s & In te rf ac es Jo ur nal o f T he E le ctr oc he m ic al S oc ie ty Jo ur nal o f O rg an ic C he m is tr y O rg an ic L ette rs Mac ro m ol ec ul es Ac co un ts o f C he m ic al Re se ar ch Atm os ph er ic C he m is tr y an d Ph ys ic s

Figure 11: Average percentile ranking of XD publications by journal (by MS)

0 10 20 30 40 50 60 70 80 90 100 Str uc tu re Ph ys ic al Re vi ew D Jo ur nal o f B io m ol ec ul ar N MR Sc ie nc e Mo le cu lar P hy lo ge ne tic s an d Ev ol uti on Mo nth ly N oti ce s of th e Ro yal A str on om ic al S oc ie ty N atu re Am er ic an Jo ur nal o f B otan y Jo ur nal o f C om pu tati on al P hy si cs An nal s of B io m ed ic al E ng in ee rin g Co m pu te r P hy si cs C om m un ic ati on s PL O S G en eti cs Co m pu tin g in S ci en ce an d En gi ne er in g As tr op hy si cal Jo ur nal S up pl em en t S er ie s Jo ur nal o f P hy si cs : C on de ns ed Matte r Mo le cu lar P hy si cs Jo ur nal o f P hy si cs B N atu re Mate rial s Jo ur nal o f C he m ic al P hy si cs Mo le cu lar S im ul ati on An nu al Re vi ew o f A str on om y an d As tr op hy si cs Ph ys ic s of A to m ic N uc le i Jo ur nal o f C om pu te r-ai de d Mo le cu lar D es ig n Jo ur nal o f Hi gh E ne rg y Ph ys ic s Ph ys ic al Re vi ew B Jo ur nal o f A pp lie d Ph ys ic s Cl as si cal an d Q uan tu m G rav ity N atu re S tr uc tu ral & Mo le cu lar B io lo gy Bu lle tin o f th e Se is m ol og ic al S oc ie ty o f A m er ic a Jo ur nal o f C om pu tati on al C he m is tr y Ph ys ic al Re vi ew L ette rs Jo ur nal o f Mo le cu lar B io lo gy Jo ur nal o f C he m ic al T he or y an d Co m pu tati on Ph ys ic al Re vi ew E Jo ur nal o f P hy si cal C he m is tr y B Pr ote in s In or gan ic C he m is tr y Lan gm ui r Cr ys tal G ro w th & D es ig n Co nc ur re nc y an d Co m pu tati on : P rac tic e an d Ph ys ic al Re vi ew A O pti cs E xp re ss Jo ur nal o f P hy si cal C he m is tr y A Ch em ic al P hy si cs L ette rs Sc ie nti fic Re po rts G eo ph ys ic al Re se ar ch L ette rs N ew Jo ur nal o f P hy si cs N uc le ic A ci ds Re se ar ch Jo ur nal o f P hy si cal C he m is tr y C Jo ur nal o f B io lo gi cal C he m is tr y eL ife PL O S Co m pu tati on al B io lo gy Ch em is tr y of Mate rial s Bi oc he m is tr y Bi oi nf or m ati cs Ph ys ic al C he m is tr y Ch em ic al P hy si cs Pr oc ee di ng s of th e N ati on al A cad em y of S ci en ce s of N an ote ch no lo gy Ap pl ie d Ph ys ic s Le tte rs An ge w an dte C he m ie Ac ta Mate rial ia N atu re C om m un ic ati on s PL O S O NE N an o Le tte rs Jo ur nal o f C he m ic al In fo rm ati on an d Mo de lin g O pti cs L ette rs Che minfor m Jo ur nal o f P hy si cal C he m is tr y Le tte rs Jo ur nal o f G eo ph ys ic al Re se ar ch AC S N an o Ch em is tr y: A E ur op ean Jo ur nal AC S Catal ys is Bi om ac ro m ol ec ul es Jo ur nal o f T he E le ctr oc he m ic al S oc ie ty Bi op hy si cal Jo ur nal In du str ial & E ng in ee rin g Ch em is tr y Re se ar ch Jo ur nal o f Me di ci nal C he m is tr y Jo ur nal o f th e Am er ic an C he m ic al S oc ie ty O rg an ic L ette rs Jo ur nal o f O rg an ic C he m is tr y AC S Ap pl ie d Mate rial s & In te rf ac es Ac co un ts o f C he m ic al Re se ar ch Atm os ph er ic C he m is tr y an d Ph ys ic s Mac ro m ol ec ul es

Figure 12: Median percentile ranking of XD publications by journal (by MS)

[15] H. B. Mann and D. R. Whitney. On a test of whether one of two random variables is stochastically larger

than the other.The annals of mathematical statistics,

pages 50–60, 1947.

[16] T. Penfield, M. J. Baker, R. Scoble, and M. C. Wykes. Assessment, evaluations, and definitions of research

impact: A review.Research Evaluation, 23(1):21–32,

2014.

[17] P. Thomas and D. Watkins. Institutional research rankings via bibliometric analysis and direct peer review: A comparative case study with policy

implications.Scientometrics, 41(3):335–355, 1998.

[18] J. Towns, T. Cockerill, M. Dahan, I. Foster,

K. Gaither, A. Grimshaw, V. Hazlewood, S. Lathrop, D. Lifka, G. Peterson, R. Roskies, J. Scott, and N. Wilkins-Diehr. XSEDE: Accelerating Scientific

Discovery.Computing in Science Engineering,

16(5):62–74, Sept 2014.doi:10.1109/MCSE.2014.80.

[19] G. von Laszewski, F. Wang, G. C. Fox, D. L. Hart, T. R. Furlani, R. L. DeLeon, and S. M. Gallo. Peer comparison of xsede and ncar publication data. In

2015 IEEE International Conference on Cluster Computing, pages 531–532, Sept 2015.

doi:10.1109/CLUSTER.2015.98.

[20] F. Wang, G. von Laszewski, G. C. Fox, T. R. Furlani, R. L. DeLeon, and S. M. Gallo. Towards a scientific impact measuring framework for large computing

facilities - a case study on xsede. InProceedings of the

2014 Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE ’14, pages 25:1–25:8, New York, NY, USA, 2014. ACM. URL:

[image:8.612.79.533.49.241.2]

Figure

Figure 2: FWCI by Field sorted by Publication Count of the Field.
Figure 5: Average percentile ranking of XD publications by journal (by ISI)
Figure 7: Average percentile ranking of XD publications by Field of Study (by ISI)
Figure 12: Median percentile ranking of XD publications by journal (by MS)

References

Related documents

AS 1530.1 Combustibility test Define non-combustible materials AS ISO 9705 Room corner test Control wall and ceiling linings AS/NZS 3837 Cone calorimeter Control wall

Check and Ballard (2014) and Hsieh (2016), all scholars in the fields of art edu- cation and LGBTQ+ issues, contend that teachers may want to help queer students feel safe and

IIROC Notice 14-0058 – Rules Notice – Request for Comments – Guidance Respecting Underwriting Due Diligence 9 about an issuer’s public disclosure, financial statements,

Regional Medical Center of San Jose is one of three designated trauma centers that make up the trauma care system in Santa Clara County.Trauma centers offer physicians, nurses and

The adopted sampling technique limits the on-line computation burden by exploiting several quantities computed offline (see subsection II.A), Nataf transformation

This study examined whether service providers and clients have different perceptions of interorganizational networks in the agricultural development field in Burkina Faso.. Although

As a high school English teacher, I have to prepare my students for all levels of college writing, not just English class and “different colleges in the same area have different

In addi­ti­on to the General Edu­cati­on requ­i­rem­ents and the 192 total hou­rs speci­fi­ed by the Uni­versi­ty, all m­ajors i­n the School of