OKC Tools for XML Metadata Management
Marlon Pierce
Overview
• We discuss systems we have built for managing XML metadata.
• Applications include
– Newsgroups
– Bibtex-based citation managers
– Glossary term and abbreviation managers
– RIB compatible browsers
• Running demos available from www.xmlnuggets.org.
• Downloads of revised newsgroup application available soon.
• Challenge: promote scientific metadata usage
Parts of the System
• Each application has one or more XML schemas that serve as a data model.
• The general system contains the following components:
– Form wizards for creating valid XML instances for a particular application.
– Publishers or “feeders” that post messages into the system.
– Unique URI generators for storing each message.
<?xml version="1.0"?>
<rss version="0.91" xmlns:cg="http://grids.ucs.indiana.edu/okc/schema/cg/ver/1">
  <channel>
    <title>Community Grids Project Reports</title>
    <image>
      <title>ptllogo</title>
      <url>http://www.communitygrids.iu.edu/img/smallLOGO.gif</url>
      <link>http://www.communitygrids.iu.edu</link>
      <description>Pervasive Technology Labs Logo</description>
    </image>
    <Item>
      <name>CORBA</name>
      <URI>glossary/C/CORBA</URI>
      <description>Common Object Request Broker Architecture is an open distributed object-computing infrastructure being standardised by the Object Management Group.</description>
    </Item>
    <!-- Other items deleted -->
  </channel>
</rss>
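The sample item above is stored under the URI glossary/C/CORBA. A minimal sketch of a generator that produces URIs of that shape follows. It assumes the pattern is application name, upper-cased first letter, then the term; the class and method names are hypothetical and not taken from the original system, and a real generator would also have to guarantee uniqueness.

```java
import java.util.Locale;

// Hypothetical helper; illustrates the "glossary/C/CORBA" shape only.
final class GlossaryUri {
    private GlossaryUri() {}

    // Builds "<application>/<FIRST LETTER>/<term>".
    // Assumption: the first letter of the term is used as a bucket name.
    static String forTerm(String application, String term) {
        String trimmed = term.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("term must not be empty");
        }
        String bucket = trimmed.substring(0, 1).toUpperCase(Locale.ROOT);
        return application + "/" + bucket + "/" + trimmed;
    }
}
```

For example, `GlossaryUri.forTerm("glossary", "CORBA")` yields the URI used in the sample item.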
Sample Applications
Newsgroup System Features
• Email and browser-based posting.
• Supports attachments.
• Multiple topic subscriptions
• Periodic topic digests
• Multiple user privileges
– Read through browser only
– Post through browser only
Citation Browser
• Supports multiple schema descriptions based on bibtex
– Journal articles, books, book chapters, conference proceedings, tech reports, theses
RIB Compatible Applications
• Basic system can be used with any schema, so we created a version using the Basic Interoperability Data Model (BIDM)
– Developed by the RIB team
– IEEE standard
• BIDM has two important extensions that we do not currently support.
– Asset certification
Steps for a Metadata Generator
• There were common tasks that we performed for each application:
– Design an object model and create a W3C XML Schema to represent it.
– Create a memory object model of the schema, i.e. corresponding Java classes.
– Design an interface, i.e. HTML forms, for user inputs, and bind the interface with the memory model.
– Let users input data.
– Finally, generate XML based on input, and publish it.
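The steps above can be sketched in a few lines of Java. This is a simplified illustration, not the original code: the bean is written by hand rather than generated from a schema, the form is a plain map of parameters, and the element names are borrowed from the glossary sample item. All class and method names are hypothetical.

```java
import java.util.Map;

// Hand-written stand-in for a schema-generated bean (hypothetical).
final class GlossaryItem {
    private final String name;
    private final String uri;
    private final String description;

    GlossaryItem(String name, String uri, String description) {
        this.name = name;
        this.uri = uri;
        this.description = description;
    }

    // Bind submitted form parameters to the memory model.
    // Missing parameters become empty strings.
    static GlossaryItem fromForm(Map<String, String> form) {
        return new GlossaryItem(
                form.getOrDefault("name", ""),
                form.getOrDefault("uri", ""),
                form.getOrDefault("description", ""));
    }

    // Generate XML from the memory model.
    String toXml() {
        return "<Item>"
                + "<name>" + escape(name) + "</name>"
                + "<URI>" + escape(uri) + "</URI>"
                + "<description>" + escape(description) + "</description>"
                + "</Item>";
    }

    // Escape the characters that are significant in XML element content.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The generated string would then be handed to whatever publisher the application uses.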
Generating XML Form Wizards
SchemaWizard and XML
• SchemaWizard maps XML Schema elements to HTML form elements through its schema parser, and creates the framework and logic for an XML form wizard.
• Users use the newly generated wizards to create XML instances that follow a schema and to publish them to any destination, such as publish/subscribe messaging systems or SMTP.
• XML form wizards are Web applications that also serve as validating XML editors and are
SchemaWizard Architecture
• The steps that take place in generating an XML form wizard:
– SchemaWizard unpacks and deploys the Web application package into a Web server’s application repository (e.g. webapps under Tomcat).
– The user provides the location of the schema.
– The schema is read in to create an in-memory representation (SOM) of the schema and also to create Java classes.
• SOM = Castor’s Schema Object Model
• The SOM API provides a convenient interface to access the W3C XML Schema structures.
– Using the SOM, Castor SourceGenerator creates Java classes that correspond to the schema structures. These classes form the memory model (i.e. JavaBeans for JSP) and come with the necessary framework to parse and regenerate (unmarshal and marshal) XML instances.
SchemaWizard Architecture
[Diagram. Components: Annotated XML Schema; Castor Schema Unmarshaller; Castor SOM; Castor Source Generator; JavaBeans; Schema Parser; Velocity Templates; Java Compiler; Web Application Template; Libraries, Classes, JSPs. Output: XML Form Wizard created as a Web Application.]
SchemaWizard Architecture
• The steps that take place in generating an XML form wizard (cont.):
– Using the SOM once again, SchemaParser traverses the in-memory schema and collects structure information, i.e. names, types, whether element or attribute, complex or simple type.
– Based on this information, the parser chooses what type of template will be used, stores the information in a Velocity context, and invokes the template engine to generate the program logic presented in JSP. The parser also gathers the schema annotations, e.g. page color and input sizes, at this level and places the parameters in the context.
SchemaParser Data Flow and Action
[Diagram. The XML Schema location is given to SchemaWizard. From the Castor SOM schema object and its individual types, SchemaParser traverses the schema for types, collects type information, and creates a Velocity context with the type info. It then decides on a template (project page, index page, simple type, enumerated simple type, unbounded simple type, complex type, unbounded complex type) and invokes the Velocity template engine. The XML Form Wizard is generated.]
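The "decide template" step can be sketched as a small decision function over the collected type information. This is an illustration only: the original SchemaParser works on Castor SOM objects, while the TypeInfo class here is a hypothetical stand-in. The order of the checks (unbounded before enumerated for simple types) is an assumption, and the project and index page templates are left out because they are not chosen per type.

```java
// Template kinds named on the data flow slide (per-type templates only).
enum Template {
    SIMPLE_TYPE,
    ENUMERATED_SIMPLE_TYPE,
    UNBOUNDED_SIMPLE_TYPE,
    COMPLEX_TYPE,
    UNBOUNDED_COMPLEX_TYPE
}

// Hypothetical holder for the structure information collected per type.
final class TypeInfo {
    final String name;
    final boolean complex;
    final boolean enumerated;
    final boolean unbounded;

    TypeInfo(String name, boolean complex, boolean enumerated, boolean unbounded) {
        this.name = name;
        this.complex = complex;
        this.enumerated = enumerated;
        this.unbounded = unbounded;
    }
}

final class TemplateChooser {
    private TemplateChooser() {}

    // Picks the template for one schema type.
    // Assumption: for simple types, "unbounded" takes precedence over "enumerated".
    static Template choose(TypeInfo type) {
        if (type.complex) {
            return type.unbounded ? Template.UNBOUNDED_COMPLEX_TYPE : Template.COMPLEX_TYPE;
        }
        if (type.unbounded) {
            return Template.UNBOUNDED_SIMPLE_TYPE;
        }
        if (type.enumerated) {
            return Template.ENUMERATED_SIMPLE_TYPE;
        }
        return Template.SIMPLE_TYPE;
    }
}
```

In the real system the chosen template and the type information would then be placed in a Velocity context and rendered to JSP.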
Schema Annotations
• Users can make cosmetic changes for the final project beforehand with annotations in the schema.
• W3C XML Schema allows developers to embed user-defined languages into the schema using <xs:annotation> and <xs:appinfo> structures.
• Annotations for the whole schema affect the whole page, e.g. page title, background color, default input sizes, leading numbers on and off, XML browsing on and off.
<xs:annotation>
  <xs:appinfo source="title">SchemaWizard Output for Topics Schema </xs:appinfo>
  <xs:appinfo source="inputsize">30</xs:appinfo>
  <xs:appinfo source="bgcolor">#e0e0ff</xs:appinfo>
  <xs:appinfo source="leadingnumbers">false</xs:appinfo>
  <xs:appinfo source="showxml">true</xs:appinfo>
</xs:annotation>
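A minimal sketch of how such schema-level annotations could be read into a settings map follows. It uses the JDK's DOM parser rather than Castor's SOM, which is what the original system uses, and the class and method names are hypothetical. Only xs:annotation elements that are direct children of the schema root are treated as schema-level.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical reader for schema-level appinfo settings.
final class AppInfoReader {
    private static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    private AppInfoReader() {}

    // Returns source -> trimmed text for every xs:appinfo found in the
    // xs:annotation children of the schema root element.
    static Map<String, String> schemaLevelSettings(String schemaXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaXml)));
            Map<String, String> settings = new LinkedHashMap<>();
            Node child = doc.getDocumentElement().getFirstChild();
            for (; child != null; child = child.getNextSibling()) {
                if (!isXsd(child, "annotation")) {
                    continue;
                }
                NodeList infos = ((Element) child).getElementsByTagNameNS(XSD_NS, "appinfo");
                for (int i = 0; i < infos.getLength(); i++) {
                    Element info = (Element) infos.item(i);
                    settings.put(info.getAttribute("source"), info.getTextContent().trim());
                }
            }
            return settings;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("schema could not be parsed", e);
        }
    }

    // True if the node is an element with the given local name in the XSD namespace.
    private static boolean isXsd(Node node, String localName) {
        return node.getNodeType() == Node.ELEMENT_NODE
                && XSD_NS.equals(node.getNamespaceURI())
                && localName.equals(node.getLocalName());
    }
}
```

The resulting map could be copied into the template context so that page title, background color, and default input size are available when the pages are generated.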
Schema Annotations
• Annotations for individual structures override the schema annotations, e.g. input size for each element. Also, labels for each element can be defined, and input fields can be changed to larger text areas with a textarea parameter and row numbers, or to password fields by a password parameter whose value is set to true.
<xs:annotation>
  <xs:appinfo source="label">User Password</xs:appinfo>
  <xs:appinfo source="inputsize">15</xs:appinfo>
  <xs:appinfo source="password">true</xs:appinfo>
</xs:annotation>
…
<xs:annotation>
[Figure callouts: smaller input size; textarea, row count set to 5; unbounded element with its own add/delete buttons; XML browsing turned on; title set.]
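The override rule for per-structure annotations can be sketched as a map merge: start from the schema-level settings and let the element-level settings replace any entries with the same source name. The class and method names are hypothetical, and the settings are assumed to be plain string maps keyed by the appinfo source attribute.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for combining schema-level and element-level settings.
final class AnnotationSettings {
    private AnnotationSettings() {}

    // Element-level values win over schema-level values with the same key.
    static Map<String, String> effective(Map<String, String> schemaLevel,
                                         Map<String, String> elementLevel) {
        Map<String, String> merged = new LinkedHashMap<>(schemaLevel);
        merged.putAll(elementLevel);
        return merged;
    }

    // True when the password parameter is present and set to "true".
    static boolean isPasswordField(Map<String, String> settings) {
        return "true".equals(settings.get("password"));
    }
}
```

With the two snippets shown on the slides, the effective input size for the password element would be 15 instead of the schema default of 30, while the background color would still come from the schema-level annotation.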