DC, MODS and CERIF-XML
A Tale of Two Cultures
Ed Simons
Radboud University Nijmegen, NL.
Some personal data
• Workplace: Information Centre (UCI) of Radboud University.
• UCI takes care of all IT services for RU.
• UCI is also the managing host of SURFnet, the NL university network.
• Project leader of software development projects.
• Initiator and project leader of METIS, the CRIS of all NL universities and the NL Royal Academy of Sciences.
• In the last few years: international IT projects within the framework of development cooperation (Africa).
Structure of the Presentation
1. Comparison of the 3 formats.
2. Why XML?
3. Towards another solution for exposure and access
Comparison of the formats
• DC: Dublin Core
• MODS: Metadata Object Description Schema. It often goes together with DIDL (Digital Item Declaration Language), so you often see “DIDL/MODS” mentioned. In these cases MODS is a metadata record in the “DIDL container” that describes the bibliographic metadata of the publication, whereas other parts of the DIDL contain the metadata of the publication's object files (location, file size, MIME type, etc.).
Comparison of the formats
The following documents give a representation of the same article in the 3 XML-formats:
Title: On the relations between ISE and structure in some RE(Mg)SiAlO(N) glasses
Author(s): Dauce R (Dauce, R.)1, Keding R (Keding, R.)2, Sangleboeuf JC (Sangleboeuf, J-C.)1
Source: JOURNAL OF MATERIALS SCIENCE
Abbreviation: J MATER SCI
Volume: 43 Issue: 22 Pages: 7239-7246 Published: NOV 2008
JCR Impact factor: 1.081 Times Cited: 0 References: 47
Abstract: Six oxide and oxynitride glasses were synthesized in the Y-Mg-Si-Al-O-N, Nd-Mg-Si-Al-O-N and La-Mg-Si-Al-O-N
systems. As already known, nitrogen introduction increases the T-g, packing factor and mechanical properties of the glasses. Cationic substitution also has an influence on the glasses' behavior, particularly in terms of sensitivity to indentation load/size effect (ISE). The structure of the yttrium-containing glasses was investigated by mean of Al-27 and Si-29 MAS-NMR. Al is found to occur for 2/3 as a network former and for 1/3 as a modifier.
Language: English
Reprint Address: Dauce, R (reprint author), Univ Rennes 1, CNRS, LARMAUR, FRE 2717, F-35042 Rennes, France
Addresses:
1. Univ Rennes 1, CNRS, LARMAUR, FRE 2717, F-35042 Rennes, France
2. Univ Aalborg, Aalborg, Denmark
Corresponding author: E-mail Addresses: [email protected]
Publisher: SPRINGER, 233 SPRING ST, NEW YORK, NY 10013 USA
KeyWords: AL-O-N; EARTH ALUMINOSILICATE GLASSES; OXYNITRIDE GLASSES; MAS-NMR; FLOPPY MODES; INDENTATION; SYSTEM; MICROHARDNESS; RAMAN; DIFFRACTION
Subject Category: Materials Science, Multidisciplinary
IDS Number: 373FZ
ISSN: 0022-2461 (Print) 1573-4803 (Online)
DOI: 10.1007/s10853-008-2851-3
Dublin Core
• DC is too simple: it is of limited use because it lacks detail and granularity. E.g.:
  - no separate elements for volume, issue and pages
  - not possible to describe, in the same DC record, the item of which a publication (e.g. a book chapter) is a part
  - not possible to indicate the exact role of a creator or contributor
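The granularity problem can be made concrete with a minimal sketch in Python. It builds a simple Dublin Core record for the example article and only accepts the 15 elements of unqualified DC. The `record` wrapper element, the `build_dc_record` helper and the citation layout placed in `dc:source` are assumptions made for illustration, not a prescribed encoding.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

ARTICLE = {
    "title": "On the relations between ISE and structure in some RE(Mg)SiAlO(N) glasses",
    "authors": ["Dauce, R.", "Keding, R.", "Sangleboeuf, J-C."],
    "journal": "JOURNAL OF MATERIALS SCIENCE",
    "volume": "43",
    "issue": "22",
    "pages": "7239-7246",
}


def build_dc_record(article):
    """Build a simple DC record; 'record' is a hypothetical wrapper element."""
    record = ET.Element("record")

    def add(name, text):
        if name not in DC_ELEMENTS:
            raise ValueError(f"simple DC has no element '{name}'")
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = text

    add("title", article["title"])
    for author in article["authors"]:
        # Every person becomes a plain creator; no role can be attached.
        add("creator", author)
    # Journal, volume, issue and pages collapse into one free-text string
    # (the layout of this string is an assumption, not a DC rule).
    add("source", "{journal} {volume}({issue}), {pages}".format(**article))
    return record


if __name__ == "__main__":
    print(ET.tostring(build_dc_record(ARTICLE), encoding="unicode"))
```

A consumer of such a record has to parse the free text in `dc:source` to recover the volume or the page range, which is exactly the loss of detail described above.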
Dublin Core
• DC reflects the traditional “library culture”: an electronic version of the old library card.
• DC possibly also reflects a “political” aspect or culture.
• The OAI community needed a format that was easy to implement everywhere at short notice. In a way, they did not have time to wait until a more suitable, robust solution was worked out.
• DC and DC-based harvesting are indeed a “success”, but in which sense: the success of the tool, or success in optimally supplying research information?
MODS
• Solves the shortcomings of DC. A more detailed format with good handling of semantics, e.g.:
  - possibility to express the roles of authors/persons
  - possibility to use established classification schemes (controlled vocabularies) by means of the “authority” attribute.
<role>
  <roleTerm authority="marcrelator" ...>aut</roleTerm>
</role>
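As a hedged illustration of what the “authority” attribute buys a consumer, the sketch below reads the role of a person from a MODS `name` element and translates the MARC relator code into a label. The sample `name` element, the three-entry code table and the `roles_of` helper are assumptions for this example; the full relator list is maintained by the Library of Congress.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# A small excerpt of the MARC relator code list (illustrative, not complete).
MARC_RELATORS = {"aut": "author", "edt": "editor", "ths": "thesis advisor"}

# Hypothetical MODS name element for the first author of the example article.
MODS_NAME = """<name xmlns="http://www.loc.gov/mods/v3" type="personal">
  <namePart>Dauce, R.</namePart>
  <role>
    <roleTerm type="code" authority="marcrelator">aut</roleTerm>
  </role>
</name>"""


def roles_of(name_xml):
    """Return the roles of one MODS name, resolved via the marcrelator list."""
    name = ET.fromstring(name_xml)
    roles = []
    for term in name.findall("mods:role/mods:roleTerm", MODS_NS):
        if term.get("authority") == "marcrelator":
            code = (term.text or "").strip()
            # Unknown codes are passed through unchanged.
            roles.append(MARC_RELATORS.get(code, code))
    return roles


if __name__ == "__main__":
    print(roles_of(MODS_NAME))
```

Because the code comes from a named controlled vocabulary, two repositories that both use `marcrelator` can exchange roles without agreeing on free-text labels.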
MODS
Describe in the same record the item of which a publication, e.g. a book chapter, is a part:
<titleInfo>
  <title>The provisions of the Corpus Juris on community fraud</title>
  <subTitle>a Belgian and Dutch perspective</subTitle>
</titleInfo>
<relatedItem type="host">
  <titleInfo>
    <title>Das Corpus Juris als Grundlage eines europaeischen Strafrechts : Europaeisches Kolloquium, Trier, 4.-6. Maerz 1999</title>
  </titleInfo>
</relatedItem>
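The part/host construction can be read back with a few lines of Python. This is a minimal sketch: the enclosing `mods` element with the MODS v3 namespace is added here as an assumption so that the fragment parses on its own, and `part_and_host` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# The fragment from the slide, wrapped in a <mods> root (assumed) so it parses.
MODS_RECORD = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <title>The provisions of the Corpus Juris on community fraud</title>
    <subTitle>a Belgian and Dutch perspective</subTitle>
  </titleInfo>
  <relatedItem type="host">
    <titleInfo>
      <title>Das Corpus Juris als Grundlage eines europaeischen Strafrechts : Europaeisches Kolloquium, Trier, 4.-6. Maerz 1999</title>
    </titleInfo>
  </relatedItem>
</mods>"""


def part_and_host(mods_xml):
    """Return (title of the chapter, title of the book that hosts it)."""
    mods = ET.fromstring(mods_xml)
    # The direct child titleInfo describes the publication itself.
    part = mods.findtext("mods:titleInfo/mods:title", namespaces=MODS_NS)
    # The host item lives inside relatedItem type="host" in the same record.
    host = mods.findtext(
        "mods:relatedItem[@type='host']/mods:titleInfo/mods:title",
        namespaces=MODS_NS,
    )
    return part, host


if __name__ == "__main__":
    print(part_and_host(MODS_RECORD))
```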
MODS
• Still, MODS heavily reflects the library culture and its vision of research information.
• Rich metadata set to adequately describe the bibliographical aspects of a publication.
• But adequately and optimally exposing research information involves more than just bibliographical aspects and more than just publications. E.g.
“contextual” research metadata (e.g. about the
CERIF-XML
Describe in the same record the item of which a publication, e.g. an article or a book chapter, is a part:
<cfResPublTitle>
  <cfResPublId>ArtTitle4778</cfResPublId>
  <cfTitle cfLangCode="EN" cfTrans="o">On the relations between ISE and structure in some RE(Mg)SiAlO(N) glasses</cfTitle>
</cfResPublTitle>
<cfResPublTitle>
  <cfResPublId>JournalTitle345</cfResPublId>
  <cfTitle cfLangCode="EN" cfTrans="o">JOURNAL OF MATERIALS SCIENCE</cfTitle>
</cfResPublTitle>
<cfResPubl_ResPubl>
  <cfResPublId1>ArtTitle4778</cfResPublId1>
  <cfResPublId2>JournalTitle345</cfResPublId2>
  <cfClassId>is article in</cfClassId>
  <cfClassSchemeId>cfResultPublication-ResultPublication</cfClassSchemeId>
  <cfStartDate>2001-01-01T12:00:00-05:00</cfStartDate>
  <cfEndDate>2001-01-01</cfEndDate>
</cfResPubl_ResPubl>
CERIF-XML
<cfPers_ResPubl>
  <cfPersId>DauceR</cfPersId>
  <cfResPublId>ArtTitle4778</cfResPublId>
  <cfClassId>is author of</cfClassId>
  <cfClassSchemeId>cfPerson-ResultPublicationRoles</cfClassSchemeId>
  <cfStartDate>2001-01-01T12:00:00-05:00</cfStartDate>
  <cfEndDate>2001-01-01T12:00:00-05:00</cfEndDate>
  <cfCopyright></cfCopyright>
</cfPers_ResPubl>
The link between an author and the publication is done in the
CERIF-XML
• A very strong point of CERIF-XML: all relations between entities of a publication are expressed in exactly the same, uniform way:
  - Authors to publication
  - Article to journal
  - Chapter to book
  - Editors to book
  - Etc.
• Secondly: all these relations are at the same time semantically described (role of a person, type of relation between
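The practical consequence of this uniformity is that one small routine can read every kind of relation. The sketch below is an illustration only: it assumes the link entities from the previous slides are collected under a hypothetical `<CERIF>` root without namespaces, and it recognises link entities simply by the underscore in their element name. The helper name `read_links` is also an assumption.

```python
import xml.etree.ElementTree as ET

# Link entities taken from the slides, wrapped in a hypothetical root element.
CERIF_SNIPPET = """<CERIF>
  <cfResPubl_ResPubl>
    <cfResPublId1>ArtTitle4778</cfResPublId1>
    <cfResPublId2>JournalTitle345</cfResPublId2>
    <cfClassId>is article in</cfClassId>
    <cfClassSchemeId>cfResultPublication-ResultPublication</cfClassSchemeId>
  </cfResPubl_ResPubl>
  <cfPers_ResPubl>
    <cfPersId>DauceR</cfPersId>
    <cfResPublId>ArtTitle4778</cfResPublId>
    <cfClassId>is author of</cfClassId>
    <cfClassSchemeId>cfPerson-ResultPublicationRoles</cfClassSchemeId>
  </cfPers_ResPubl>
</CERIF>"""

# Identifier-like children that describe the relation rather than its endpoints.
NON_KEY = {"cfClassId", "cfClassSchemeId"}


def read_links(root):
    """Return (from_id, relation, to_id) for every link entity under root."""
    links = []
    for entity in root:
        if "_" not in entity.tag:
            continue  # a base entity, not a link entity
        endpoints = [
            child.text
            for child in entity
            if child.tag.endswith(("Id", "Id1", "Id2")) and child.tag not in NON_KEY
        ]
        links.append((endpoints[0], entity.findtext("cfClassId"), endpoints[1]))
    return links


if __name__ == "__main__":
    for link in read_links(ET.fromstring(CERIF_SNIPPET)):
        print(link)
```

The same loop handles the author-to-publication and the article-to-journal relation; a chapter-to-book or editor-to-book link would need no additional code, only another `cfClassId` value.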
CERIF-XML
• Reflects very strongly the relational database culture or way of thinking:
• Mirrors the relational CERIF model into the XML world.
• Too fragmented: too many schemas and namespaces (e.g. more than 10 different schemas to express an article).
• Could lead to performance issues.
• Difficult to communicate to non-experts or people
CERIF-XML
• Need to combine schemas into a limited number, corresponding to major research objects, e.g. one schema for a publication.
• CERIF task group is aware of this and currently
CERIF-XML
• Extensive set of metadata, not limited to bibliographic or publication metadata, but encompassing all aspects of research information (including the bibliographical metadata as expressed by MODS).
• A strong point of CERIF is that it is a uniform, standardized MODEL which allows easy extension or addition of research
Why XML?
• We all seem to uncritically embrace XML as the obvious format for exposing research information in the international context.
• Everything has to be prepared to work with the XML-based architecture and technologies: OAI-PMH, SOA...
• Result: we all copy, transform and double-store data (e.g. from our CRIS repositories we transform a set of metadata into XML which we then upload/store in
Why XML?
• But shouldn't we ask ourselves whether all this copying, transforming and re-storing of data in XML format is necessary and really the way to go?
• Would it not be better to have a solution which only needs the original data sources and leaves these intact, without transforming and re-storing the data somewhere else?
Towards another solution?
• Conclusions: METIS can automatically harvest the metadata already stored in Elsevier’s SCOPUS database, so these do not have to be entered separately in METIS again.
• However, up to now METIS still stores the harvested data in its own database. This should probably not be necessary, so we should consider solutions for this. This brings us to the next step.
Business Intelligence view may be inspiring
• In one sentence, Business Intelligence (BI) could be defined as: knowledge of all aspects of the business in a comprehensive, integrated and manageable way.
• BI tools are software products which supply this knowledge (e.g. Business Objects, JasperReports/iReport).
• Great consumers of BI are managers of big companies who need to know all aspects of their business in a comprehensive, manageable form (statistics, charts, diagrams, etc.).
• The problem that BI is confronted with is more or less the same as the one we face when trying to get a full, appropriate and integrated view of research information: the data is dispersed over various heterogeneous resources (databases, XML repositories, files, etc.).
• There are solutions emerging that solve this problem, in other words: which supply timely data from heterogeneous sources in an integrated way without first copying, transforming and storing these data in intermediate resources.
The following builds upon the ideas expressed by Rick van der Lans, an internationally acclaimed Dutch expert on software architecture and solutions for Business Intelligence, and notably his recent publication:
Rick F. van der Lans, Developing a Data Delivery Platform With Informatica Data Services. A Technical Whitepaper on Next Generation Data Virtualization, February 28th, 2011, Copyright © 2011 R20/Consultancy.
http://vip.informatica.com/RickLans8761?elqPURLPage=6013&docid=1571&lsc=NA-Ongoing-2011Q1-JP-DI_Developing_Data_Delivery_Platform_WP_www
Federation Server
• Works with all kinds of input resources: relational databases, data warehouses, XML resources, Excel sheets, text files, web services, etc.
• Based on the relational database concept, but with virtual tables.
• No (re-)storage of the data.
• On-demand (on-the-fly) transformation of incoming
Federation Server: virtualization
Mapping foreign to virtual table
Mapping XML document to virtual table
Joining Relational and XML data
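The idea behind these slides can be approximated in a few lines of Python. This is a toy sketch, not the API of any federation product: an in-memory SQLite database stands in for a relational CRIS source, an XML string stands in for an XML repository, and generator functions play the role of virtual tables. All function names, the `KedingR` identifier and the layout of the XML document are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Source 2 (stand-in for an XML repository); the layout is hypothetical.
PUBLICATIONS_XML = """<publications>
  <publication id="ArtTitle4778">
    <title>On the relations between ISE and structure in some RE(Mg)SiAlO(N) glasses</title>
    <author ref="DauceR"/>
    <author ref="KedingR"/>
  </publication>
</publications>"""


def open_relational_source():
    """Source 1 (stand-in for a CRIS database): persons in a relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (pers_id TEXT PRIMARY KEY, name TEXT, institute TEXT)")
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [("DauceR", "Dauce, R.", "Univ Rennes 1"),
         ("KedingR", "Keding, R.", "Univ Aalborg")],
    )
    return conn


def virtual_person(conn):
    """Virtual table over the relational source: rows are fetched on demand."""
    query = "SELECT pers_id, name, institute FROM person"
    for pers_id, name, institute in conn.execute(query):
        yield {"pers_id": pers_id, "name": name, "institute": institute}


def virtual_authorship(xml_text):
    """Virtual table over the XML source: one row per (publication, author)."""
    for pub in ET.fromstring(xml_text).iter("publication"):
        for author in pub.iter("author"):
            yield {"title": pub.findtext("title"), "pers_id": author.get("ref")}


def join_on_pers_id(persons, authorships):
    """Join the two virtual tables in memory; no source is copied or modified."""
    by_id = {p["pers_id"]: p for p in persons}
    for row in authorships:
        person = by_id.get(row["pers_id"])
        if person is not None:
            yield {"name": person["name"],
                   "institute": person["institute"],
                   "title": row["title"]}


def authors_with_titles():
    conn = open_relational_source()
    try:
        return list(join_on_pers_id(virtual_person(conn),
                                    virtual_authorship(PUBLICATIONS_XML)))
    finally:
        conn.close()


if __name__ == "__main__":
    for row in authors_with_titles():
        print(row)
```

The result rows exist only for the duration of the request; neither source is transformed into an intermediate XML store, which is the property the federation approach is meant to deliver.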
Concrete Application
To conclude:
Perhaps good to explore also this kind of technologies instead of just sticking to the XML based solutions.