Semantic Research Grid

Open Grid Forum Web 2.0 Workshop OGF21, Seattle Washington

October 15 2007

Geoffrey Fox, Aurel Cami, Ahmet Fatih Mustacoglu, Ahmet E. Topcu

Community Grids Laboratory,

Indiana University Bloomington IN 47404

(2)

Semantic Scholars Grid

[Figure: architecture diagram. Labels: Existing Document-based Tools (Google Scholar, Windows Live Academic Search, Citeseer, Science.gov, Manuscript Central, CMT Conference Management, etc.); Existing User Interface; Web service Wrappers; New Document-enhanced Research Tools; Integration; Enhancement; User Interface; Community Tools; Generic Document Tools]

(3)

Delicious Semantic Web/Grid

• http://del.icio.us purchased by Yahoo for ~$30M
• http://www.CiteULike.org
• http://www.connotea.org (Nature)
• Associate metadata with Bookmarks specified by URLs, DOIs (Digital Object Identifiers)
• Users add comments and keywords (called tags)
• Users are linked together into groups (communities)
• Information such as title and authors extracted automatically from some sites (PubMed, ACM, IEEE, Wiley etc.)
• Bibtex-like additional information in CiteULike
• This is perhaps de facto Semantic Web – remarkable

(4)

Example

• Parallel Computing Collection selected on Cell Tag
• So far no clear “winner” in tagging space
• Maybe CiteULike with different metadata better
• How do I preserve

(5)

General Document Semantic Analysis

• Citeseer and Google Scholar scour the Internet and analyze documents for incidental metadata
  – Title, author and institution of documents
  – Citations with their own metadata allowing one to match to other documents
• These capabilities are sure to become more powerful and to be extended
  – Give “Citation Index” in real time
  – Tell you all authors of all papers that cite a paper that cites you etc. (Note it’s a small world so don’t go too far in link analysis)
  – Tell you all citations of all papers in a workshop
  – Helps journal editor by suggesting referees based on document
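The link-analysis query above ("all authors of all papers that cite a paper that cites you") can be sketched as a bounded walk over a citation graph. This is only an illustration, not how Citeseer or Google Scholar work internally; the `cited_by` and `authors` dictionaries and the paper ids are made-up sample data. The `hops` limit reflects the "small world" caution: the result set grows quickly with each extra step.

```python
from typing import Dict, List, Set


def authors_within_hops(paper: str,
                        cited_by: Dict[str, List[str]],
                        authors: Dict[str, List[str]],
                        hops: int = 2) -> Set[str]:
    """Collect authors of papers that reach `paper` in at most `hops` citation steps."""
    frontier = {paper}
    seen = {paper}
    found: Set[str] = set()
    for _ in range(hops):
        next_frontier: Set[str] = set()
        for current in frontier:
            for citing in cited_by.get(current, []):
                if citing not in seen:
                    seen.add(citing)
                    next_frontier.add(citing)
                    found.update(authors.get(citing, []))
        frontier = next_frontier
    return found


# Hypothetical sample data: B cites A, C cites B, D cites C.
cited_by = {"A": ["B"], "B": ["C"], "C": ["D"]}
authors = {"A": ["you"], "B": ["Ann"], "C": ["Bob"], "D": ["Eve"]}
two_hop_authors = authors_within_hops("A", cited_by, authors, hops=2)
```

With `hops=2` the walk stops before paper D, so its author is not included.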

(6)

Possible challenges

• Use of Web 2.0 tools in science (and business) is very promising but adoption is currently small
• Which of many tools will be popular with your colleagues?
• What happens if the tool you chose is not adopted or worse – just disappears in an industry “shake-up”?
• How to best integrate web-tagged documents with Word and Latex citations?
• Need to tag URIs – e.g. database entries, not just URLs (did for journal control system)
• Is the current security model sufficient?
• Can we link virtual organization of tagging system with

(7)

Roughly what we are doing

• We are NOT building a new tagging or search system
• We are building tools integrating and adding value to existing systems
• We built a mashup linking to del.icio.us, CiteULike, Connotea allowing exchange of tags between sites and between local repositories
• Repositories also link to local sources (PubsOnline) and Google Scholar (GS) and Windows Live Academic (WLA)
  – GS has number of cited publications.
  – WLA has Digital Object Identifier (DOI)
• We implement a rather more powerful access control mechanism
• We build heuristic tools to mine “web lists” for citations
• We have an “event” based architecture (consistency model) allowing change actions to be preserved and selectively changed
  – Supports integrating different inconsistent views of a given document and

(8)

del.icio.us Tags

Download to Local System

(9)
(10)

Key Concepts of System Architecture

• Digital Entity (DE): a digital collection of metadata for a citation
• Event: a time-stamped action on a digital entity. Our event-based model consists of:
  – Major Events: insertion or deletion of a digital entity
  – Minor Events: modifications to an existing digital entity
  – Dataset: collection of major and minor events
• Service-based Framework (SOAP over HTTP)
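The concepts above map naturally onto a few small record types. The following is a minimal sketch under stated assumptions, not the SRG implementation: the class and field names (`EventKind`, `Event`, `Dataset`, `entity_id`, `changes`) are invented for illustration, and a Digital Entity is reduced to an id plus a dictionary of metadata fields.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, List


class EventKind(Enum):
    INSERT = "insert"  # major event: a DE is added
    DELETE = "delete"  # major event: a DE is removed
    MODIFY = "modify"  # minor event: an existing DE is changed


@dataclass
class Event:
    """A time-stamped action on one Digital Entity (hypothetical layout)."""
    entity_id: str
    kind: EventKind
    changes: Dict[str, str] = field(default_factory=dict)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @property
    def is_major(self) -> bool:
        return self.kind in (EventKind.INSERT, EventKind.DELETE)


@dataclass
class Dataset:
    """A collection of major and minor events."""
    events: List[Event] = field(default_factory=list)

    def record(self, event: Event) -> None:
        self.events.append(event)

    def major_events(self) -> List[Event]:
        return [e for e in self.events if e.is_major]

    def minor_events(self) -> List[Event]:
        return [e for e in self.events if not e.is_major]


dataset = Dataset()
dataset.record(Event("de-1", EventKind.INSERT, {"title": "Semantic Research Grid"}))
dataset.record(Event("de-1", EventKind.MODIFY, {"tags": "grid, web2.0"}))
```

Keeping every event, rather than only the latest state, is what makes a history view and rollback possible later.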

(11)

Example Subsystem

• Transfer
• Download/Upload
• Modify Digital Entity (DE)
• Share DE with other users
• Add/Get more info on a DE
• History (as a set of events) of a DE and rollback

[Figure labels: CiteULike, Connotea, Delicious; Research Database (three instances); Core Web]

(12)

SRG System Modules I

• Digital Entity (DE) Management Service
  – Manual DE entry into the system
  – DE history
  – DE versioning and flexible choices (rollback)
  – Editing and more info tools for a DE (Update Model)
• Session and Event Management Services
  – Event and dataset management
  – DE view options
  – User credentials (username/password) - cookie-based
• Annotation Tools Service
  – Transfer Service
  – Download Service
  – Upload Service
  – Extract DE and tags from web lists

(13)

SRG System Modules II

• Search Tools Services
  – Google Scholar/Windows Live Academic
  – Google Scholar Advanced
  – Local Database Search:
    • Via integrated PubsOnline Tool from Indiana University
    • My Research Database
    • My Research Database Advanced
• Authentication and Authorization Services
  – Login and Logout service
  – DE access rights management
  – Database access rights management
  – Administrative tools
• Other Services
  – User Registration
  – Username and password recovery
  – User’s Profile Management
  – DE metadata view options

(14)

Technical Issues

• Event-based model
  – Manipulating data and metadata
  – How to build event-based model?
    • Major and Minor events
    • Datasets (collection of minor events)
  – How to apply event-based model?
  – How to apply modifications to a record (Digital Entity)?
    • Keep them in user’s session and let user apply them
    • Or apply them automatically to a DE
  – How to merge metadata fields of Event and Digital Entity?
    • Identification of metadata fields as dynamic or static field
• How to apply service-based framework as wrapper?
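One possible reading of the two questions above (how to apply modifications, and how to merge event fields into a Digital Entity) is sketched below. The slides do not say which fields are static or what "static" implies, so both are assumptions here: `STATIC_FIELDS` is an invented set, and a static field is treated as one that a modification may fill in but never overwrite. The function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Entity = Dict[str, str]

# Assumption for illustration only: the slides do not list the static fields.
STATIC_FIELDS = {"title", "authors"}


def merge_event(entity: Entity, changes: Entity) -> Entity:
    """Return a copy of the entity with the changes merged in.

    Dynamic fields are overwritten; static fields are only filled if missing.
    """
    merged = dict(entity)
    for name, value in changes.items():
        if name in STATIC_FIELDS and name in merged:
            continue  # keep the existing static value
        merged[name] = value
    return merged


def apply_events(entity: Entity,
                 pending: List[Entity],
                 auto_apply: bool,
                 approve: Callable[[Entity], bool] = lambda changes: False
                 ) -> Tuple[Entity, List[Entity]]:
    """Apply pending modifications automatically, or only those the user approves.

    Modifications that are not applied stay in the (session) list that is returned.
    """
    kept_in_session: List[Entity] = []
    for changes in pending:
        if auto_apply or approve(changes):
            entity = merge_event(entity, changes)
        else:
            kept_in_session.append(changes)
    return entity, kept_in_session


de = {"title": "Semantic Research Grid", "tags": "grid"}
pending = [{"tags": "grid, web2.0"}, {"title": "Other title"}]
auto_result, auto_left = apply_events(de, pending, auto_apply=True)
manual_result, manual_left = apply_events(de, pending, auto_apply=False)
```

In the automatic case the tags are updated and the title change is ignored; in the manual case nothing is applied until the user approves, and both modifications remain in the session.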

(15)

Some recent Features of SRG

• Hybrid Consistency Framework Implementation
  – Data-centric strict consistency model
  – Implements primary-copy based consistency protocol
  – Pull-based:
    • Time-based consistency approach.
    • Communicates with Annotation Tools to collect updates periodically
  – Push-based:
    • Updates are distributed to Annotation Tools immediately once they occur on the primary copy
• Periodic Search Tools Implementation
  – Search, compare and apply the updates made to a Digital Entity (DE) in the system.
• Unique (128 bit) UUID assignment for each Digital Entity
• User Tags view in the system
  – Displays all tags belonging to a user
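The hybrid pull/push scheme above can be illustrated with a small in-memory sketch. It is not the SRG code: `PrimaryCopy`, `subscribers`, and `pull_from` are hypothetical names, annotation tools are modelled as plain callbacks, and the periodic timer that would trigger the pull is left out. Python's standard `uuid.uuid4()` is used as one way to obtain a 128-bit identifier; the slides do not say which UUID variant SRG uses.

```python
import uuid
from typing import Callable, Dict, List

Metadata = Dict[str, str]


class PrimaryCopy:
    """Holds the authoritative copy of each Digital Entity (DE)."""

    def __init__(self) -> None:
        self.entities: Dict[str, Metadata] = {}
        self.subscribers: List[Callable[[str, Metadata], None]] = []

    def create(self, metadata: Metadata) -> str:
        de_id = str(uuid.uuid4())  # random 128-bit identifier
        self.entities[de_id] = dict(metadata)
        return de_id

    def update(self, de_id: str, changes: Metadata) -> None:
        """Push-based: apply the change, then notify every replica at once."""
        self.entities[de_id].update(changes)
        for notify in self.subscribers:
            notify(de_id, dict(self.entities[de_id]))

    def pull_from(self, fetch_updates: Callable[[], Dict[str, Metadata]]) -> None:
        """Pull-based: meant to be called periodically to collect a tool's updates."""
        for de_id, changes in fetch_updates().items():
            if de_id in self.entities:  # ignore ids the primary does not know
                self.update(de_id, changes)


replica: Dict[str, Metadata] = {}


def on_change(de_id: str, metadata: Metadata) -> None:
    replica[de_id] = metadata


primary = PrimaryCopy()
primary.subscribers.append(on_change)
de_id = primary.create({"title": "Semantic Research Grid"})
primary.update(de_id, {"tags": "grid"})                # pushed immediately
primary.pull_from(lambda: {de_id: {"notes": "read"}})  # collected on a pull
```

Routing pulled changes through `update` keeps the replicas in step with the primary copy, which is one way to read the "primary-copy based" protocol.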

(16)
(17)
(18)
(19)
(20)

Metadata Collection from CGL web pages

• The aim is to
  – Eliminate duplicate data entry in different web platforms.
  – Build richer metadata in SRG, using the Digital Entities collected from web pages as a base.
  – Share new Digital Entities with other tools and users in SRG

(21)

Methodology for Collection

• Collect:
  – Digital Entities in Community Grid Publication web pages.
• Analyze:
  – Using heuristic methodology to extract metadata fields of the Digital Entities for CGL publications
• Build:
  – RSS objects using collected Digital Entities.
  – New tags using collected Digital Entities.
• Compare:
  – Collected Digital Entities from CGL web pages with the existing Digital Entities in SRG.
  – If they are:
    • different: Store new Digital Entities in SRG storage.
    • same: Option to update tags and other fields.
• Share:
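The Compare step above can be sketched as follows. How SRG decides that two Digital Entities are "the same" is not stated in the slides; matching on a normalized title is an assumption made purely for this example, and the function names and sample records are hypothetical.

```python
from typing import Dict, List, Tuple

Entity = Dict[str, str]


def normalize(title: str) -> str:
    """Lower-case and collapse whitespace so trivially different titles match."""
    return " ".join(title.lower().split())


def compare(collected: List[Entity],
            existing: List[Entity]) -> Tuple[List[Entity], List[Tuple[Entity, Entity]]]:
    """Split collected entities into new ones and (existing, collected) matches."""
    by_title = {normalize(e["title"]): e for e in existing}
    new: List[Entity] = []
    same: List[Tuple[Entity, Entity]] = []
    for entity in collected:
        match = by_title.get(normalize(entity["title"]))
        if match is None:
            new.append(entity)            # "different": store as a new DE
        else:
            same.append((match, entity))  # "same": candidate for a tag/field update
    return new, same


# Hypothetical sample records.
existing = [{"title": "Semantic Research Grid", "tags": "grid"}]
collected = [
    {"title": "semantic  research grid", "tags": "web2.0"},
    {"title": "A Hypothetical New Paper", "tags": ""},
]
new_entities, matches = compare(collected, existing)
```

The matched pairs are returned rather than merged, which leaves the "option to update tags and other fields" to the caller.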

(22)

Security Model

• Security in Web 2.0 can be limited
• We implement a simple but more powerful security model around local tools that wrap Web 2.0 systems
• We used an access-control matrix model to provide security for our information system
  – Supports multiple groups and multiple users for each object.
  – Similar to UNIX file system
    • The Unix RWX bits correspond to Read, Write, and Execute operations for each file and directory.
  – In SRG, a DE (Digital Entity) corresponds to the file element and a folder corresponds to the directory element.
  – For each DE and folder, there are three types of access rights
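An access-control matrix with several users and groups per object can be sketched as below. The three SRG access rights are not named in the slide text, so the Unix read/write/execute bits from the analogy are used as stand-ins; `AccessMatrix`, `grant_user`, `grant_group`, and `allowed` are hypothetical names, not SRG's API.

```python
from enum import Flag
from typing import Dict, Set


class Right(Flag):
    """Stand-in rights, numbered like the Unix permission bits."""
    NONE = 0
    EXECUTE = 1
    WRITE = 2
    READ = 4


class AccessMatrix:
    """Rows are objects (DEs or folders); columns are users and groups."""

    def __init__(self) -> None:
        self.user_rights: Dict[str, Dict[str, Right]] = {}
        self.group_rights: Dict[str, Dict[str, Right]] = {}
        self.memberships: Dict[str, Set[str]] = {}  # user -> groups

    def grant_user(self, obj: str, user: str, rights: Right) -> None:
        self.user_rights.setdefault(obj, {})[user] = rights

    def grant_group(self, obj: str, group: str, rights: Right) -> None:
        self.group_rights.setdefault(obj, {})[group] = rights

    def allowed(self, obj: str, user: str, needed: Right) -> bool:
        """True if the user's own rights plus all group rights cover `needed`."""
        rights = self.user_rights.get(obj, {}).get(user, Right.NONE)
        for group in self.memberships.get(user, set()):
            rights |= self.group_rights.get(obj, {}).get(group, Right.NONE)
        return (rights & needed) == needed


acl = AccessMatrix()
acl.memberships["alice"] = {"cgl"}
acl.grant_group("de-1", "cgl", Right.READ)
acl.grant_user("de-1", "alice", Right.WRITE)
alice_can_edit = acl.allowed("de-1", "alice", Right.READ | Right.WRITE)
bob_can_read = acl.allowed("de-1", "bob", Right.READ)
```

Unlike the single owner and single group of a Unix file, each object here can carry entries for any number of users and groups.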

(23)

Security Model II

• We have a security model that supports
  – Level of Authorization
    • Roles are defined as Super Administrator (SA), Group Administrator (GA) and User (U)
    • The system allows having more than one SA.
    • An existing SA can add other SAs to the system.
    • SA can assign any U to become GA, and remove GA from group.
    • Each group should have at least one GA. GA add/remove U from group
  – User profile
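The role rules above can be written down as a handful of checks. This is a sketch of the stated rules only, with hypothetical names (`RoleRegistry`, `assign_group_admin`, and so on); how SRG stores roles, and whether a GA is automatically a member of the group, are not covered by the slides.

```python
from typing import Dict, Set


class RoleRegistry:
    """Tracks Super Administrators (SA), Group Administrators (GA) and Users (U)."""

    def __init__(self, first_super_admin: str) -> None:
        self.super_admins: Set[str] = {first_super_admin}
        self.group_admins: Dict[str, Set[str]] = {}  # group -> GAs
        self.members: Dict[str, Set[str]] = {}       # group -> users

    @staticmethod
    def _require(condition: bool, message: str) -> None:
        if not condition:
            raise PermissionError(message)

    def add_super_admin(self, actor: str, user: str) -> None:
        self._require(actor in self.super_admins, "only an SA can add an SA")
        self.super_admins.add(user)

    def assign_group_admin(self, actor: str, group: str, user: str) -> None:
        self._require(actor in self.super_admins, "only an SA can assign a GA")
        self.group_admins.setdefault(group, set()).add(user)

    def remove_group_admin(self, actor: str, group: str, user: str) -> None:
        self._require(actor in self.super_admins, "only an SA can remove a GA")
        admins = self.group_admins.get(group, set())
        self._require(user in admins, "user is not a GA of this group")
        self._require(len(admins) > 1, "a group needs at least one GA")
        admins.discard(user)

    def add_member(self, actor: str, group: str, user: str) -> None:
        self._require(actor in self.group_admins.get(group, set()),
                      "only a GA of the group can add users")
        self.members.setdefault(group, set()).add(user)

    def remove_member(self, actor: str, group: str, user: str) -> None:
        self._require(actor in self.group_admins.get(group, set()),
                      "only a GA of the group can remove users")
        self.members.get(group, set()).discard(user)


roles = RoleRegistry("root")
roles.add_super_admin("root", "sa2")             # an existing SA adds another SA
roles.assign_group_admin("sa2", "cgl", "alice")  # an SA promotes a user to GA
roles.add_member("alice", "cgl", "bob")          # the GA manages group members
```

Removing the last GA of a group raises `PermissionError`, which enforces the "at least one GA" rule.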

(24)
(25)

Current Usage of Semantic Research Grid Project

• We have used/tested Semantic Research Grid (SRG) (a prototype model) for published scientific research publications in Community Grids Lab at Indiana University
• In CGL, 20 students, post-docs and faculty members are testing
• They are using the prototype model for collecting of

(26)

Summary

• Integration
  – We have successfully integrated Google Scholar and Windows Live Academic search tools and CiteULike, Delicious, and Connotea annotation tools, which provides a system that allows dynamic publication.
• Flexibility and Extensibility
  – We provide flexibility allowing integration of different tools having common metadata.
  – Easy to add and extend service mechanism
• Management and Consistency Scheme of Digital Entities
  – Allows the manipulation of a digital entity
  – Applies Event-based model based on the concept of:
    • Major events
    • Minor events
    • Datasets
  – Provides a rollback feature to:
    • Support a history tool for a DE
    • Merge and change the content of a digital entity
  – A service-based framework for using existing annotation tools through web services

(27)

Domain Specific Semantic Document Analysis

• It is natural to develop core document Services such as those used in Citeseer/Google Scholar but applied to “your” documents of interest that may not have been processed yet
  – As just submitted to a conference perhaps
• These tools can help form useful lists such as authors of all cited or submitted papers to a journal
• OSCAR3 (from Peter Murray-Rust’s group at Cambridge) augments the application independent “core” metadata (Title, authors, institutions, Citations) with a list of all chemical terms
  – This tool is a Service that can be applied to “your” document or to a set of documents harvested in some fashion
  – Luis Rocha has developed related ideas for Biology
  – Other fields have natural application specific metadata and OSCAR like tools can be developed for them

(28)

OSCAR3 Chemistry Document analysis

• It detects “magic” chemical strings in text and then stores them as metadata associated with document
• Queries ChemInformatics repositories to tell you lots of information about identified compounds
• Tells you which other

(29)

Initial Results from OSCAR on PubMed

• We have a small sample (100) of full text Chemistry papers selected at random from 15 years of PubMed with over 5 million abstracts
  – OSCAR3 generates 4.17 compound names per abstract and 36.7 compound names per full text
  – 555,007 PubMed abstracts of 2005 – 2006 (part) used for Abstracts (on Big Red)

(30)

CICC Chemical Informatics Cyberinfrastructure Collaboratory

[Figure: workflow diagram. Labels: PubMed Database, OSCAR Text Analysis, Initial 3D Structure Calculation, Toxicity Filtering, Cluster Grouping, Docking, Molecular Mechanics Calculations, Quantum Mechanics Calculations, POV-Ray Parallel Rendering, IU’s Varuna Database, NIH PubChem Database, PubChem Database, MOAD Database]

Product databases are wrapped with Web service interfaces and are suitable for inclusion in Taverna workflows.
