
DC, MODS and CERIF-XML


A Tale of Two Cultures

Ed Simons

Radboud University Nijmegen, NL.


Some personal data

Ed Simons

• Workplace: Information Centre (UCI) of Radboud University.

• UCI takes care of all IT services for RU.

• UCI is also the managing host of SURFnet, the NL university network.

• Project leader of software development projects.

• Initiator and project leader of METIS: the CRIS of all NL universities + the NL Royal Academy of Sciences.

• Last few years: international IT projects within the framework of development cooperation (Africa).


Structure of the Presentation

1. Comparison of the 3 formats.

2. Why XML?

3. Towards another solution for exposure and access.


Comparison of the formats

• DC: Dublin Core.

• MODS: Metadata Object Description Schema. MODS often goes together with DIDL (Digital Item Declaration Language), which is why you often see “DIDL/MODS” mentioned. In these cases MODS is a metadata record in the “DIDL container” describing the bibliographic metadata of the publication, whereas other parts of the DIDL contain the metadata of the publication's object files (location, file size, MIME type, etc.).
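To make the container idea concrete, here is a minimal Python sketch that builds a simplified DIDL document holding one MODS record plus one object file. The element names (DIDL, Item, Descriptor, Statement, Component, Resource) and the two namespace URIs follow MPEG-21 DIDL and MODS v3, but the nesting is simplified and does not follow any particular repository's application profile. The helper names and the file URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of MPEG-21 DIDL and MODS version 3.
DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("didl", DIDL_NS)
ET.register_namespace("mods", MODS_NS)


def d(tag: str) -> str:
    """Qualify a tag name with the DIDL namespace (Clark notation)."""
    return f"{{{DIDL_NS}}}{tag}"


def m(tag: str) -> str:
    """Qualify a tag name with the MODS namespace (Clark notation)."""
    return f"{{{MODS_NS}}}{tag}"


def build_didl_mods(title: str, file_url: str, mime_type: str) -> ET.Element:
    """Hypothetical helper: one DIDL Item with a MODS record plus one object file."""
    didl = ET.Element(d("DIDL"))
    item = ET.SubElement(didl, d("Item"))

    # Bibliographic metadata: a MODS record wrapped in Descriptor/Statement.
    statement = ET.SubElement(ET.SubElement(item, d("Descriptor")), d("Statement"),
                              mimeType="application/xml")
    mods = ET.SubElement(statement, m("mods"))
    ET.SubElement(ET.SubElement(mods, m("titleInfo")), m("title")).text = title

    # Object-file metadata: location and MIME type of the full text.
    component = ET.SubElement(item, d("Component"))
    ET.SubElement(component, d("Resource"), ref=file_url, mimeType=mime_type)
    return didl


doc = build_didl_mods(
    "On the relations between ISE and structure in some RE(Mg)SiAlO(N) glasses",
    "https://repository.example.org/article.pdf",  # hypothetical location
    "application/pdf",
)
print(ET.tostring(doc, encoding="unicode"))
```

The point of the split is visible in the tree: a harvester interested only in bibliographic data reads the MODS record, while one that needs the files reads the Component/Resource parts.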


Comparison of the formats

The following documents represent the same article in the 3 XML formats:

Title: On the relations between ISE and structure in some RE(Mg)SiAlO(N) glasses

Author(s): Dauce R (Dauce, R.)1, Keding R (Keding, R.)2, Sangleboeuf JC (Sangleboeuf, J-C.)1

Source: JOURNAL OF MATERIALS SCIENCE

Abbreviation: J MATER SCI

Volume: 43 Issue: 22 Pages: 7239-7246 Published: NOV 2008

JCR Impact factor: 1.081 Times Cited: 0 References: 47

Abstract: Six oxide and oxynitride glasses were synthesized in the Y-Mg-Si-Al-O-N, Nd-Mg-Si-Al-O-N and La-Mg-Si-Al-O-N systems. As already known, nitrogen introduction increases the T-g, packing factor and mechanical properties of the glasses. Cationic substitution also has an influence on the glasses' behavior, particularly in terms of sensitivity to indentation load/size effect (ISE). The structure of the yttrium-containing glasses was investigated by mean of Al-27 and Si-29 MAS-NMR. Al is found to occur for 2/3 as a network former and for 1/3 as a modifier.

Language: English

Reprint Address: Dauce, R (reprint author), Univ Rennes 1, CNRS, LARMAUR, FRE 2717, F-35042 Rennes, France

Addresses:
1. Univ Rennes 1, CNRS, LARMAUR, FRE 2717, F-35042 Rennes, France
2. Univ Aalborg, Aalborg, Denmark

Corresponding author e-mail address: [email protected]

Publisher: SPRINGER, 233 SPRING ST, NEW YORK, NY 10013 USA

Keywords: AL-O-N; EARTH ALUMINOSILICATE GLASSES; OXYNITRIDE GLASSES; MAS-NMR; FLOPPY MODES; INDENTATION; SYSTEM; MICROHARDNESS; RAMAN; DIFFRACTION

Subject Category: Materials Science, Multidisciplinary

IDS Number: 373FZ

ISSN: 0022-2461 (Print), 1573-4803 (Online)

DOI: 10.1007/s10853-008-2851-3


Dublin Core

• DC is too simple: it is of limited use because of its lack of detail and granularity, e.g.:

  • no separate elements for volume, issue and page;

  • not possible to describe, in the same DC record, the item of which a publication (e.g. a book chapter) is a part;

  • not possible to indicate the exact role of a creator or contributor.
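The granularity problem shows up as soon as the sample article is written as an oai_dc record. In this minimal Python sketch the namespace URIs are the standard oai_dc and DC 1.1 ones; packing volume, issue and pages into dc:source as "43(22), 7239-7246" is only one of several conventions found in practice and is an assumption here, as are the helper names.

```python
import re
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_oai_dc(title, creators, source):
    """Minimal oai_dc record: every element is a flat, unqualified string."""
    record = ET.Element(f"{{{OAI_DC_NS}}}dc")
    ET.SubElement(record, f"{{{DC_NS}}}title").text = title
    for name in creators:
        # dc:creator has no attribute to say whether this person is author, editor, ...
        ET.SubElement(record, f"{{{DC_NS}}}creator").text = name
    # No separate elements for volume, issue and pages: they end up in one free-text value.
    ET.SubElement(record, f"{{{DC_NS}}}source").text = source
    return record


def parse_source(text):
    """A harvester has to guess the convention again to recover volume/issue/pages."""
    match = re.search(r"(\d+)\((\d+)\),\s*(\d+)-(\d+)", text)
    return match.groups() if match else None


record = build_oai_dc(
    "On the relations between ISE and structure in some RE(Mg)SiAlO(N) glasses",
    ["Dauce, R.", "Keding, R.", "Sangleboeuf, J-C."],
    "JOURNAL OF MATERIALS SCIENCE 43(22), 7239-7246",
)
print(parse_source(record.findtext(f"{{{DC_NS}}}source")))  # ('43', '22', '7239', '7246')
```

If another repository writes "Vol. 43, No. 22, pp. 7239-7246" instead, the same parser returns None: the structure is lost at the source and can only be recovered by guessing.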


Dublin Core

• DC reflects the traditional “library culture”: it is an electronic version of the old library card.

• DC possibly also reflects a “political” aspect or culture.

• The OAI community needed a format that was easy to implement everywhere at short notice. In a way, they did not have time to wait until a more suitable, robust solution was worked out.

• DC and DC-based harvesting are indeed a “success”, but in which sense: the success of the tool, or the success of optimally supplying research information?


MODS

• Solves the shortcomings of DC: a more detailed format with good handling of semantics, e.g.:

  • the possibility to express the roles of authors/persons;

  • the possibility to use established classification schemes (controlled vocabularies) by means of the “authority” attribute.

<role>
  <roleTerm authority="marcrelator" ...>aut</roleTerm>
</role>
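Because MODS qualifies each role with an authority, a consumer can read roles without guessing. A minimal Python sketch, assuming a MODS v3 record in which each name carries roleTerm elements with type="code" (an assumption standing in for the attributes elided on the slide). The helper name and the small label table are illustrative, although aut and edt are genuine MARC relator codes.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"

# Small excerpt of MARC relator codes, for display purposes only.
MARC_RELATOR_LABELS = {"aut": "Author", "edt": "Editor"}

SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <name type="personal">
    <namePart>Dauce, R.</namePart>
    <role><roleTerm authority="marcrelator" type="code">aut</roleTerm></role>
  </name>
  <name type="personal">
    <namePart>Keding, R.</namePart>
    <role><roleTerm authority="marcrelator" type="code">aut</roleTerm></role>
  </name>
</mods>"""


def roles_by_person(mods_xml, authority="marcrelator"):
    """Map each personal name to the role codes taken from the given authority."""
    root = ET.fromstring(mods_xml)
    roles = {}
    for name in root.findall(f"{MODS}name"):
        person = name.findtext(f"{MODS}namePart")
        roles[person] = [
            term.text.strip()
            for term in name.findall(f"{MODS}role/{MODS}roleTerm")
            if term.get("authority") == authority and term.text
        ]
    return roles


for person, codes in roles_by_person(SAMPLE).items():
    print(person, [MARC_RELATOR_LABELS.get(c, c) for c in codes])
```

Filtering on the authority attribute matters: a roleTerm from a local, unknown vocabulary is simply ignored instead of being misread as a MARC relator code.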


MODS

MODS can describe, in the same record, the item of which a publication (e.g. a book chapter) is a part:

<titleInfo>
  <title>The provisions of the Corpus Juris on community fraud</title>
  <subTitle>a Belgian and Dutch perspective</subTitle>
</titleInfo>
<relatedItem type="host">
  <titleInfo>
    <title>Das Corpus Juris als Grundlage eines europaeischen Strafrechts : Europaeisches Kolloquium, Trier, 4.-6. Maerz 1999</title>
  </titleInfo>
</relatedItem>
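A consumer can then separate the chapter from its host volume with a few lines of code. A minimal Python sketch, assuming the fragment is wrapped in a mods root element in the MODS v3 namespace; the wrapper and the helper name are assumptions.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"

CHAPTER = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <title>The provisions of the Corpus Juris on community fraud</title>
    <subTitle>a Belgian and Dutch perspective</subTitle>
  </titleInfo>
  <relatedItem type="host">
    <titleInfo>
      <title>Das Corpus Juris als Grundlage eines europaeischen Strafrechts : Europaeisches Kolloquium, Trier, 4.-6. Maerz 1999</title>
    </titleInfo>
  </relatedItem>
</mods>"""


def chapter_and_host(mods_xml):
    """Return (full chapter title, host title or None) from one MODS record."""
    root = ET.fromstring(mods_xml)
    # find() only looks at direct children, so this skips the titleInfo inside relatedItem.
    own = root.find(f"{MODS}titleInfo")
    title = own.findtext(f"{MODS}title")
    subtitle = own.findtext(f"{MODS}subTitle")
    full_title = f"{title}: {subtitle}" if subtitle else title

    host_title = None
    for related in root.findall(f"{MODS}relatedItem"):
        if related.get("type") == "host":
            host_title = related.findtext(f"{MODS}titleInfo/{MODS}title")
    return full_title, host_title


chapter, host = chapter_and_host(CHAPTER)
print("Chapter:", chapter)
print("Part of:", host)
```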


MODS

• Still, MODS heavily reflects the library culture and its vision of research information.

• A rich metadata set to adequately describe the bibliographical aspects of a publication.

• But adequately and optimally exposing research information involves more than just bibliographical aspects and more than just publications, e.g. “contextual” research metadata (e.g. about the


CERIF-XML

CERIF-XML can also describe, in the same record, the item of which a publication (e.g. an article or a book chapter) is a part:

<cfResPublTitle>
  <cfResPublId>ArtTitle4778</cfResPublId>
  <cfTitle cfLangCode="EN" cfTrans="o">On the relations between ISE and structure in some RE(Mg)SiAlO(N) glasses</cfTitle>
</cfResPublTitle>

<cfResPublTitle>
  <cfResPublId>JournalTitle345</cfResPublId>
  <cfTitle cfLangCode="EN" cfTrans="o">JOURNAL OF MATERIALS SCIENCE</cfTitle>
</cfResPublTitle>

<cfResPubl_ResPubl>
  <cfResPublId1>ArtTitle4778</cfResPublId1>
  <cfResPublId2>JournalTitle345</cfResPublId2>
  <cfClassId>is article in</cfClassId>
  <cfClassSchemeId>cfResultPublication-ResultPublication</cfClassSchemeId>
  <cfStartDate>2001-01-01T12:00:00-05:00</cfStartDate>
  <cfEndDate>2001-01-01</cfEndDate>
</cfResPubl_ResPubl>
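Reading such a record amounts to a join: index the titles by cfResPublId, then resolve the two ids of each cfResPubl_ResPubl link. A minimal Python sketch, assuming the fragments are wrapped in a hypothetical CERIF root element without a namespace (real CERIF-XML files declare one) and that only English (cfLangCode="EN") titles are wanted; the function name is an assumption.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """<CERIF>
  <cfResPublTitle>
    <cfResPublId>ArtTitle4778</cfResPublId>
    <cfTitle cfLangCode="EN" cfTrans="o">On the relations between ISE and structure in some RE(Mg)SiAlO(N) glasses</cfTitle>
  </cfResPublTitle>
  <cfResPublTitle>
    <cfResPublId>JournalTitle345</cfResPublId>
    <cfTitle cfLangCode="EN" cfTrans="o">JOURNAL OF MATERIALS SCIENCE</cfTitle>
  </cfResPublTitle>
  <cfResPubl_ResPubl>
    <cfResPublId1>ArtTitle4778</cfResPublId1>
    <cfResPublId2>JournalTitle345</cfResPublId2>
    <cfClassId>is article in</cfClassId>
    <cfClassSchemeId>cfResultPublication-ResultPublication</cfClassSchemeId>
  </cfResPubl_ResPubl>
</CERIF>"""


def resolve_publication_links(xml_text, lang="EN"):
    """Return (title 1, relation, title 2) for every publication-to-publication link."""
    root = ET.fromstring(xml_text)
    # First pass: index titles by publication id.
    titles = {}
    for entry in root.findall("cfResPublTitle"):
        title = entry.find("cfTitle")
        if title is not None and title.get("cfLangCode") == lang:
            titles[entry.findtext("cfResPublId")] = title.text
    # Second pass: resolve both ids of each link entity against the index.
    return [
        (titles.get(link.findtext("cfResPublId1")),
         link.findtext("cfClassId"),
         titles.get(link.findtext("cfResPublId2")))
        for link in root.findall("cfResPubl_ResPubl")
    ]


for left, relation, right in resolve_publication_links(FRAGMENT):
    print(f"{left} --[{relation}]--> {right}")
```

This two-pass lookup is exactly the relational habit discussed on the next slides: the XML mirrors tables plus a link table, so the consumer has to reassemble the record itself.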


CERIF-XML

<cfPers_ResPubl>
  <cfPersId>DauceR</cfPersId>
  <cfResPublId>ArtTitle4778</cfResPublId>
  <cfClassId>is author of</cfClassId>
  <cfClassSchemeId>cfPerson-ResultPublicationRoles</cfClassSchemeId>
  <cfStartDate>2001-01-01T12:00:00-05:00</cfStartDate>
  <cfEndDate>2001-01-01T12:00:00-05:00</cfEndDate>
  <cfCopyright></cfCopyright>
</cfPers_ResPubl>

The link between an author and the publication is done in the


CERIF-XML

• A very strong point of CERIF-XML: all relations between the entities of a publication are expressed in exactly the same, uniform way:

  • authors to publication
  • article to journal
  • chapter to book
  • editors to book
  • etc.

• Secondly, all these relations are at the same time semantically described (role of a person, type of relation between
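The uniformity means one generic routine can read every relation as a (source id, role, target id, classification scheme) tuple, whatever entities it connects. A minimal Python sketch, under two assumptions: link entities are recognised by the underscore in their element name (cfPers_ResPubl, cfResPubl_ResPubl), and the first two *Id children other than cfClassId and cfClassSchemeId are the linked ids. The KedingR person id is hypothetical, and the wrapper root is namespace-free for brevity.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<CERIF>
  <cfPers_ResPubl>
    <cfPersId>DauceR</cfPersId>
    <cfResPublId>ArtTitle4778</cfResPublId>
    <cfClassId>is author of</cfClassId>
    <cfClassSchemeId>cfPerson-ResultPublicationRoles</cfClassSchemeId>
  </cfPers_ResPubl>
  <cfPers_ResPubl>
    <cfPersId>KedingR</cfPersId>
    <cfResPublId>ArtTitle4778</cfResPublId>
    <cfClassId>is author of</cfClassId>
    <cfClassSchemeId>cfPerson-ResultPublicationRoles</cfClassSchemeId>
  </cfPers_ResPubl>
  <cfResPubl_ResPubl>
    <cfResPublId1>ArtTitle4778</cfResPublId1>
    <cfResPublId2>JournalTitle345</cfResPublId2>
    <cfClassId>is article in</cfClassId>
    <cfClassSchemeId>cfResultPublication-ResultPublication</cfClassSchemeId>
  </cfResPubl_ResPubl>
</CERIF>"""

CLASS_TAGS = ("cfClassId", "cfClassSchemeId")


def link_tuples(xml_text):
    """Read every link entity the same way, regardless of which entities it connects."""
    tuples = []
    for link in ET.fromstring(xml_text):
        if "_" not in link.tag:
            continue  # base entities (e.g. cfResPublTitle) are not links
        ids = [child.text for child in link
               if child.tag.endswith(("Id", "Id1", "Id2")) and child.tag not in CLASS_TAGS]
        tuples.append((ids[0], link.findtext("cfClassId"), ids[1],
                       link.findtext("cfClassSchemeId")))
    return tuples


def related(tuples, role, target):
    """All source ids that have the given role towards the target id."""
    return [src for src, r, tgt, _scheme in tuples if r == role and tgt == target]


links = link_tuples(SAMPLE)
print(related(links, "is author of", "ArtTitle4778"))  # ['DauceR', 'KedingR']
```

Adding editors to a book or a chapter to a book needs no new parsing code, only new link elements with their own cfClassId, which is the strength the slide points out.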


CERIF-XML

• Reflects very strongly the relational database culture or way of thinking: it mirrors the relational CERIF model in the XML world.

• Too fragmented: too many schemas and namespaces (e.g. more than 10 different schemas to express an article).

• Could lead to performance issues.

• Difficult to communicate to non-experts or people


CERIF-XML

• Need to combine the schemas into a limited number, corresponding to the major research objects, e.g. one schema for a publication.

• The CERIF task group is aware of this and currently


CERIF-XML

• An extensive set of metadata, not limited to bibliographic or publication metadata but encompassing all aspects of research information (including the bibliographical metadata as expressed by MODS).

• A strong point of CERIF is that it is a uniform, standardized MODEL which allows easy extension or addition of research


Why XML?

• We all seem to uncritically embrace XML as the obvious format for exposing research information in the international context.

• Everything has to be prepared to work with the XML-based architecture and technologies: OAI-PMH, SOA...

• Result: we all copy, transform and double-store data (e.g. from our CRIS repositories we transform a set of metadata into XML which we then upload/store in


Why XML?

• But shouldn't we ask ourselves whether all this copying, transforming and re-storing of data in XML format is necessary and really the way to go?

• Would it not be better to have a solution that needs only the original data sources and leaves them intact, without transforming and re-storing the data somewhere else?


Towards another solution?

• Conclusions: METIS can automatically harvest the metadata already stored in Elsevier’s SCOPUS database, so these metadata do not have to be entered separately in METIS again.

• However, up to now METIS still stores the harvested data in its own database. This should probably not be necessary, so we should consider solutions for it. This brings us to the next step.


Business Intelligence view may be inspiring

• In one sentence, Business Intelligence (BI) could be defined as: knowledge of all aspects of the business in a comprehensive, integrated and manageable way.

• BI tools are software products that supply this knowledge (e.g. Business Objects, Jasper-Reports/iReport).

• Great consumers of BI are managers of big companies who need to know all aspects of their business in a comprehensive, manageable form (statistics, charts, diagrams, etc.).


• The problem BI is confronted with is more or less the same as the one we face when trying to get a full, appropriate and integrated view of research information: the data is dispersed over various heterogeneous resources (databases, XML repositories, files, etc.).

• Solutions are emerging that solve this problem; in other words, they supply timely data from heterogeneous sources in an integrated way, without first copying, transforming and storing these data in intermediate resources.


The following builds upon the ideas expressed by Rick van der Lans, an internationally acclaimed Dutch expert on software architecture and Business Intelligence solutions, and notably on his recent publication:

Rick F. van der Lans, Developing a Data Delivery Platform with Informatica Data Services. A Technical Whitepaper on Next Generation Data Virtualization, February 28th, 2011, Copyright © 2011 R20/Consultancy.

http://vip.informatica.com/RickLans8761?elqPURLPage=6013&docid=1571&lsc=NA-Ongoing-2011Q1-JP-DI_Developing_Data_Delivery_Platform_WP_www


Federation Server

• Works with all kinds of input resources: relational databases, data warehouses, XML resources, Excel sheets, text files, web services, etc.

• Based on the relational database concept, but with virtual tables.

• No (re-)storage of the data.

• On-demand (on-the-fly) transformation of incoming


Federation Server: virtualization


Mapping foreign to virtual table


Mapping XML document to virtual table


Joining Relational and XML data
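The idea behind these slides (virtual tables over heterogeneous sources, transformed on demand and joined without an intermediate copy) can be illustrated in a few lines of Python. This is a toy sketch under stated assumptions, not how the federation server described by van der Lans works internally: an in-memory SQLite table stands in for a CRIS database, an XML string stands in for an XML repository, and each "virtual table" is a generator that maps source rows only when a query iterates over it. Table, column and element names are hypothetical; the title, journal, DOI and citation count are those of the sample article.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML source, e.g. citation data exposed by an external service.
XML_SOURCE = """<records>
  <record doi="10.1007/s10853-008-2851-3"><timesCited>0</timesCited></record>
</records>"""


def open_cris():
    """Stand-in for a relational CRIS database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE publication (doi TEXT PRIMARY KEY, title TEXT, journal TEXT)")
    conn.execute(
        "INSERT INTO publication VALUES (?, ?, ?)",
        ("10.1007/s10853-008-2851-3",
         "On the relations between ISE and structure in some RE(Mg)SiAlO(N) glasses",
         "JOURNAL OF MATERIALS SCIENCE"),
    )
    return conn


def v_publication(conn):
    """Virtual table over the relational source: rows are fetched only when iterated."""
    for doi, title, journal in conn.execute("SELECT doi, title, journal FROM publication"):
        yield {"doi": doi, "title": title, "journal": journal}


def v_citation(xml_text):
    """Virtual table over the XML document: each <record> is mapped to a row on the fly."""
    for rec in ET.fromstring(xml_text).iter("record"):
        yield {"doi": rec.get("doi"), "times_cited": int(rec.findtext("timesCited", "0"))}


def join_on(left, right, key):
    """Hash join of two virtual tables; the lookup index lives only for this query."""
    index = {row[key]: row for row in right}
    for row in left:
        match = index.get(row[key])
        if match is not None:
            yield {**row, **match}


conn = open_cris()
for row in join_on(v_publication(conn), v_citation(XML_SOURCE), "doi"):
    print(row["title"], "|", row["journal"], "|", row["times_cited"])
```

The original sources remain the only place where the data lives; each query re-reads them, trading duplicate storage and synchronisation for work at query time.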


Concrete Application


To conclude:

It may be worthwhile to also explore this kind of technology instead of just sticking to XML-based solutions.
