Page 1 of 10
Introduction
STELLAR aims to provide tools for non-specialist users to map and extract archaeological datasets into RDF/XML conforming to the CIDOC CRM and its CRM-EH extension.
Aim of this document
The aim of this document is to give a high-level overview of the STELLAR templates for mapping to the CRM-EH and CIDOC CRM and extracting mapped datasets using the STELLAR tools. It should be read in conjunction with the more detailed Manual for using the Templates and Tools and the accompanying Tutorials.
STELLAR Rationale
The benefits for semantic interoperability in mapping and extracting datasets to an integrating conceptual framework, such as the CIDOC CRM, are widely recognized. However, achieving mappings in practice has required specialist knowledge of the ontology and has been resource intensive. Given the complexity of the CIDOC CRM, it is also possible to make multiple valid mappings, dependent on the intended purpose and emphasis.
From experience in the previous STAR project, we identified a set of commonly occurring patterns in the datasets and the CRM. The STELLAR internal templates express these patterns.
The current internal templates correspond to the general aim of cross searching excavation datasets for inter-site analysis and comparison. Different templates that drew on other areas of the ontology could be designed for other purposes, such as project management or detailed intra-site analysis. Output from the templates can also be combined with CRM RDF produced by other mechanisms. The RDF output is produced in a form that allows subsequent expression as Linked Data.
The STELLAR tools convert archaeological data to RDF conforming to the CRM in a consistent manner, without requiring detailed knowledge of the underlying ontology. Various commands are available. To generate RDF, the user chooses a template for a particular data pattern.
Some internal templates are expressed in terms of the English Heritage CRM-EH archaeological extension to the CIDOC CRM. There are also more general CIDOC CRM templates conforming to the CLAROS Project format. Additionally there is a template allowing a glossary/thesaurus connected with the dataset to be expressed in SKOS standard format, which allows controlled data items to be linked via SKOS. Whilst the existing internal templates guarantee consistent and valid output, they lack flexibility: any required changes to the internal templates necessitate rebuilding of the STELLAR application. With this in mind, STELLAR additionally facilitates user-defined external templates for converting data to any user-defined textual form. There is a short tutorial available via the project website detailing creation and use of user-defined templates.
The current set of internal templates focuses on key elements of published excavation data: Contexts, broader interpretive Groups (of contexts), Finds, Samples and their associated attributes.
STELLAR tools
STELLAR.Console is a downloadable command line utility application that performs a variety of data manipulation and conversion tasks. Files of delimited tabular data (TAB, CSV) can be imported and consolidated to an internal database then queried using SQL. When converting to RDF, the user specifies which template to apply. The user also supplies a file with the SQL commands that will generate the required input for the given template from the internal database. Batch processing is possible with STELLAR.Console, which has a wide choice of methods for expressing the inputs to the templates.
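As a rough illustration of the import-then-query idea described above, the following Python sketch loads delimited rows into an in-memory SQLite database and uses SQL column aliases to produce the column names a template expects. It is not STELLAR.Console and does not use its commands. The source table `site_contexts` and its fields (`ctx_no`, `descr`, `ctx_type`, `cut_no`) are hypothetical; only the output names (`context_id`, `context_note`, `context_type`, `within_context_id`) are taken from the CRMEH_CONTEXTS template described later in this document.

```python
import csv
import io
import sqlite3

# Hypothetical excavation export; these field names are invented for illustration.
RAW_CSV = """ctx_no,descr,ctx_type,cut_no
110,Charcoal-rich fill,Hearth,120
120,Shallow oval cut,Cut,
"""

# SQL that renames the (assumed) source fields to the template's column names.
TEMPLATE_QUERY = """
SELECT ctx_no   AS context_id,
       descr    AS context_note,
       ctx_type AS context_type,
       cut_no   AS within_context_id
FROM site_contexts
ORDER BY ctx_no
"""


def rows_for_template(raw_csv: str, query: str) -> list:
    """Load CSV text into an in-memory table, run the query, return dict rows."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE site_contexts "
        "(ctx_no TEXT, descr TEXT, ctx_type TEXT, cut_no TEXT)"
    )
    reader = csv.DictReader(io.StringIO(raw_csv))
    conn.executemany(
        "INSERT INTO site_contexts VALUES (:ctx_no, :descr, :ctx_type, :cut_no)",
        list(reader),
    )
    result = [dict(row) for row in conn.execute(query)]
    conn.close()
    return result


if __name__ == "__main__":
    for row in rows_for_template(RAW_CSV, TEMPLATE_QUERY):
        print(row)
```

Keeping the renaming in the SQL, rather than in the source data, means the original database columns can stay as they are while the query output lines up with whichever template is chosen.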
STELLAR.Web is a simpler browser-based application that performs a subset of the STELLAR.Console functionality using the same internal templates. It allows RDF to be produced directly from CSV data, for situations when users have their own means of producing the initial tabular delimited data (CSV files). The user specifies which template to apply and the CSV column names match the required input for the given template.
Each template is a composition of a set of optional elements with a mandatory ID. It is possible to specify some of the input elements and omit others. Thus not all the columns in a given database Finds Table, for example, may be mapped with the current set of templates. However, key elements for cross search purposes can be mapped.
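The "mandatory ID, optional everything else" rule can be checked before a CSV file is submitted. The sketch below is an illustrative helper, not part of the STELLAR tools: the function name `check_header` is hypothetical, the column list is transcribed from the CRMEH_FINDS table later in this document, and the case-insensitive comparison is a simplifying assumption.

```python
# Column names transcribed from the CRMEH_FINDS template table in this document.
FINDS_COLUMNS = [
    "find_id", "find_note", "find_type", "find_type_uri",
    "find_material", "find_material_uri", "within_context_id",
    "production_period", "within_investigation_id",
]


def check_header(header, template_columns, id_column):
    """Return a list of problems found in a CSV header row (empty if none).

    Assumption: names are compared case-insensitively after trimming spaces.
    """
    supplied = [name.strip().lower() for name in header]
    problems = []
    # The ID element is the only mandatory one.
    if id_column not in supplied:
        problems.append(f"missing mandatory column: {id_column}")
    # Any other template column may be omitted, but unknown columns are flagged.
    for name in supplied:
        if name not in template_columns:
            problems.append(f"column not used by this template: {name}")
    return problems


if __name__ == "__main__":
    print(check_header(["Find_id", "Find_type"], FINDS_COLUMNS, "find_id"))
    print(check_header(["find_note", "weight"], FINDS_COLUMNS, "find_id"))
```

A header with only `Find_id` and `Find_type` passes, reflecting that optional elements may simply be left out.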
Users (data providers) select an appropriate template and provide it with the appropriate data input. This data is either a SQL file (for the Console tool) or a previously prepared CSV file (for the Web tool).
Choosing a template corresponds to making a mapping to the CRM and CRM-EH entities associated with the template. The user provides the input required for the chosen template, choosing which of the optional elements to supply. See the manual for details on the tools and for the CRM and SKOS templates. This document goes on to describe the CRM-EH templates, although the general principles also apply to the other templates.
CRM-EH STELLAR Archaeological Templates
These templates correspond to a core set of elements within the CRM-EH, a subset for STELLAR purposes. The CRM-EH was originally designed to encompass a range of archaeological activities, including excavation processes but also covering finds recording, analysis and conservation; sampling; and environmental processing, etc. Some parts of the model are therefore more appropriate for intra-site analysis than for STELLAR's immediate purposes.
The STAR project held a number of workshops to review user needs and requirements for cross-search and interoperability between project datasets from different organisational systems. We identified four key concepts involved in archaeological activity and have developed a data extraction template with related data for each concept.
STELLAR focuses on the shared archaeological concepts that enable searching between different sites (or projects) that have used the practice of ‘single context recording’ to record individual units of archaeological significance and the stratigraphic relationships that hold between those Contexts. The STELLAR templates also cover the ‘grouping’ of contexts into larger Groups of either shared structural or morphological significance for synthesis and analytical reporting or phasing purposes. STELLAR also includes the general processes of identification, typology and dating of particular Finds objects, along with the taking and recording of different types of Samples and various notes associated with the different concepts. Attributes include materials, measurements, time periods, location, IDs, Notes, Types of element, etc. Attributes that are not present in particular data sets can be omitted. The templates are further described in the Manual, which gives specific details of the template parameters and figures of the CRM-based RDF.
The elements of each template are now described in turn, together with typical patterns in archaeological datasets and examples derived from STAR and STELLAR project work. The ID element is mandatory with the other elements being optional.
Contexts
The eleven column headings used in the STELLAR template for Context data are shown in Table 1 below with examples of the sorts of data involved. There is a choice of template for context_type: the data relationships can either be implemented using (standardized) vocabulary terms or, if available, using the URI of the controlled type in the appropriate SKOS online glossary.
STELLAR CRMEH_CONTEXTS Template column name | Example data
Context_id maps to the field containing the context number given to each individual stratigraphic unit. | 110
Context_note maps to a free text descriptive field, usually the one that most clearly describes and explains the general nature of the context and how it is distinguished from other contexts. Multiple note fields can be supplied for the same Context_id. | free text description of the context.
Context_type maps to the broad general type assigned to the context and will ideally be taken from a controlled vocabulary/glossary of context types. Usually a user would use either context_type (a word) or context_type_uri (URI - preferable). | Hearth
Context_type_uri maps to the URI (online identifier) of the controlled type for the Context_type, where that exists. This might be a concept in a SKOS thesaurus, or another form of unique URI. The STELLAR SKOS_CONCEPTS template converts a database glossary into a SKOS vocabulary, which can provide URIs for this purpose. |
Context_location could use any of a number of spatial referencing systems. For STELLAR Linked Data purposes we have opted simply for a single X,Y,Z point based on WGS84 coordinates, following MIDAS quickpoint syntax. | e.g. x/y/z
Context_period maps to the broad period assigned to a context for dating purposes. It may be in period name or year/year format (e.g. Roman or 43/410). Use negative numbers for BC dates. | Roman or 43/410
Within_context_id maps to the field containing the context identifier (number) of any context, such as a Cut, that contains the current context. | 120
Within_group_id maps to the field containing the group identifier (number) of any group, such as a Building group, that contains the current context. | 12500
Within_investigation_id maps to the field containing the specific (and unique) event identifier (name) of the particular investigation event, or project that contains the current context. | MOLAS. ROP95
Strat_lower_id maps to the field containing the context number(s) of any contexts that are directly stratigraphically below the current context. Using CRM relationship p120 (along with p114, see below) the stratigraphic relationships of every context on a specific site can be logically represented. | 112
any contexts that are recorded as stratigraphically equivalent to the current context. Using CRM relationship p114 (along with p120 – see above) the stratigraphic relationships of every context on a specific site can be logically represented.
Table 1: Column names and mappings used by CRMEH_CONTEXTS template
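To show how rows of Context_id and Strat_lower_id values are enough to reconstruct a site's stratigraphic sequence, the sketch below indexes the "directly below" links and then follows them transitively. It works on plain dictionaries rather than on the RDF that STELLAR produces, and the function names and the extra context numbers (115, 116) are assumptions made for illustration.

```python
from collections import defaultdict


def build_lower_index(rows):
    """Map each context_id to the set of contexts recorded directly below it."""
    below = defaultdict(set)
    for row in rows:
        lower = row.get("strat_lower_id")
        if lower:  # the column is optional, so it may be missing or empty
            below[row["context_id"]].add(lower)
    return below


def all_below(context_id, below):
    """Follow the 'directly below' links to collect every lower context."""
    seen = set()
    stack = [context_id]
    while stack:
        current = stack.pop()
        for lower in below.get(current, ()):
            if lower not in seen:  # also guards against accidental cycles
                seen.add(lower)
                stack.append(lower)
    return seen


if __name__ == "__main__":
    # Hypothetical rows: one row per recorded relationship.
    rows = [
        {"context_id": "110", "strat_lower_id": "112"},
        {"context_id": "112", "strat_lower_id": "115"},
        {"context_id": "112", "strat_lower_id": "116"},
    ]
    index = build_lower_index(rows)
    print(sorted(all_below("110", index)))
```

Only direct relationships need to be recorded in the data; the indirect ones follow from traversing the links.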
Finds
The nine column headings used in the STELLAR template for Finds data are shown in Table 2 below with an explanation of what fields they should map to and examples of the sorts of data involved. There is a choice of template for find_type and find_material: the data relationships can either be implemented using (standardized) vocabulary terms or, if available, using the URI of the controlled type in the appropriate SKOS online glossary.
STELLAR CRMEH_FINDS Template column name | Example data
Find_id maps to the field containing the find identifier (number) given to each individual find object. | SF105
Find_note maps to a free text descriptive field, usually the one that most clearly describes and explains the general nature of the find and how it is distinguished from other finds. Multiple note fields can be supplied for the same Find_id. | Free text description
Find_type maps to the broad general type assigned to the find (e.g. coin) and will ideally be taken from a controlled vocabulary/glossary of find types (e.g. MDA Objects Thesauri). Usually a user would use either find_type (a word) or find_type_uri (URI - preferable). | blade
Find_type_uri maps to the URI (online identifier) of the controlled type for the Find_type, where that exists. This might be a concept in a SKOS thesaurus, or another form of unique URI. The STELLAR SKOS_CONCEPTS template converts a database glossary into a SKOS vocabulary, which can provide URIs for this purpose. |
Find_material maps to the broad general type assigned to the find material and will ideally be taken from a controlled vocabulary/glossary of material types. Usually a user would use either find_material (a word) or find_material_uri (URI - preferable). | Iron
Find_material_uri maps to the URI (online identifier) of the controlled type for the Find_material, where that exists. This might be a concept in a SKOS thesaurus, or another form of unique URI. The STELLAR SKOS_CONCEPTS template converts a database glossary into a SKOS vocabulary, which can provide URIs for this purpose. |
Within_context_id maps to the field containing the context identifier (number) of any context, such as a pit fill, that contains the current find object. | 110
Production_period maps to the broad period (or spot date) assigned to a find for dating purposes. It may be in period name or year/year format. Use negative numbers for BC dates. | Roman or 43/410
Within_investigation_id maps to the field containing the specific (and unique) event identifier (name) of the particular investigation event, or project that contains the current find. | MOLAS. ROP95
Table 2: Column names and mappings used by CRMEH_FINDS template
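The two accepted period forms (a period name, or a year/year range with negative numbers for BC) can be told apart mechanically. The following is a minimal sketch under stated assumptions: `parse_period` is a hypothetical helper, not a STELLAR function, and the choice to treat anything that is not two integers separated by a slash as a period name is ours.

```python
def parse_period(value):
    """Split a period cell into a name or a (start, end) pair of years.

    Assumption: a value such as "43/410" or "-800/43" is a year range,
    with negative numbers standing for BC dates; anything else is a name.
    """
    text = value.strip()
    if "/" in text:
        start, _, end = text.partition("/")
        try:
            return {"name": None, "start": int(start), "end": int(end)}
        except ValueError:
            pass  # not numeric on both sides, so fall through to a name
    return {"name": text, "start": None, "end": None}


if __name__ == "__main__":
    for cell in ["Roman", "43/410", "-800/43"]:
        print(cell, "->", parse_period(cell))
```

Splitting on the first slash keeps a leading minus sign attached to its year, so BC start dates are read correctly.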
Samples
The six column headings used in the STELLAR template for Samples data are shown in Table 3 below with an explanation of what fields they should map to and examples of the sorts of data involved. There is a choice of template for sample_type: the data relationships can either be implemented using (standardized) vocabulary terms or, if available, using the URI of the controlled type in the appropriate SKOS online glossary.
STELLAR CRMEH_SAMPLES Template column name | Example data
Sample_id maps to the field containing the sample identifier (number) given to each individual sample. | 80714
Sample_note maps to a free text descriptive field, usually the one that most clearly describes and explains the general nature of the sample and how it is distinguished from other samples. Multiple note fields can be supplied for the same Sample_id. | Vegetation history of valley
Sample_type maps to the broad general type assigned to the sample (e.g. dendrochronology) and will ideally be taken from a controlled vocabulary/glossary of sample types. Usually a user would use either sample_type (a word) or sample_type_uri (URI - preferable). | Pollen
Sample_type_uri maps to the URI (online identifier) of the controlled type for the Sample_type, where that exists. This might be a concept in a SKOS thesaurus, or another form of unique URI. The STELLAR SKOS_CONCEPTS template converts a database glossary into a SKOS vocabulary, which can provide URIs for this purpose. |
Within_context_id maps to the field containing the context identifier (number) of any context, such as a pit fill, from which the identified sample was taken. | 110
Within_investigation_id maps to the field containing the specific (and unique) event identifier (name) of the particular investigation event, or project that contains the current sample. | MOLAS. ROP95
Table 3: Column names used by CRMEH_SAMPLES template
Groups
The eight column headings used in the STELLAR template for Groups data are shown in Table 4 below with an explanation of what fields they should map to and examples of the sorts of data involved. There is a choice of template for group_type: the data relationships can either be implemented using (standardized) vocabulary terms or, if available, using the URI of the controlled type in the appropriate SKOS online glossary.
STELLAR CRMEH_GROUPS Template column name | Example data
Group_id maps to the field containing the group number given to each uniquely identified group of associated contexts. | 12500
Group_note maps to a free text descriptive field, usually the one that most clearly describes and explains the general nature of the group and how it is distinguished from other groups. Multiple note fields can be supplied for the same Group_id. | C4th timber building fronts on to road 12501
Group_type maps to the broad general type assigned to the group and will ideally be taken from a controlled vocabulary/glossary of group types. Usually a user would use either group_type (a word) or group_type_uri (URI - preferable). | Building
Group_type_uri maps to the URI (online identifier) of the controlled type for the Group_type, where that exists. This might be a concept in a SKOS thesaurus, or another form of unique URI. The STELLAR SKOS_CONCEPTS template converts a database glossary into a SKOS vocabulary, which can provide URIs for this purpose. |
Group_location could use any of a number of spatial referencing systems. For STELLAR Linked Data purposes we have opted simply for a single X,Y,Z point based on WGS84 coordinates, following MIDAS quickpoint syntax. | e.g. x/y/z
Group_period maps to the broad period assigned to a group for dating purposes. It may be in period name or year/year format (e.g. Roman or 43/410). Use negative numbers for BC dates. | Roman or 43/410
Within_group_id maps to the field containing the group identifier (number) of any group, such as a Building or Land Use group, that contains the current group. In CRM-EH grouping is seen as an iterative process of building sub-groups, groups or higher groups into archaeological groupings of associated contexts for the purposes of synthesis, phasing and dating. | 12502
Within_investigation_id maps to the field containing the specific (and unique) event identifier (name) of the particular investigation event, or project that contains the current group. | MOLAS. ROP95
Table 4: Column names used by CRMEH_GROUPS template
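Because Within_group_id can point from a group to a higher group, the iterative grouping described for CRM-EH forms a chain that can be walked upwards. The sketch below illustrates this with plain dictionaries. The function names and the top-level group number 12600 are hypothetical, and each group is assumed to have at most one enclosing group.

```python
def build_parent_index(rows):
    """Map each group_id to the group recorded in its within_group_id column."""
    return {
        row["group_id"]: row["within_group_id"]
        for row in rows
        if row.get("within_group_id")  # optional column: skip empty values
    }


def group_chain(group_id, parent_of):
    """Return the enclosing groups of group_id, nearest first."""
    chain = []
    seen = {group_id}
    current = parent_of.get(group_id)
    while current and current not in seen:  # stop at the top or on a cycle
        chain.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return chain


if __name__ == "__main__":
    # 12500 and 12502 echo the example data above; 12600 is invented.
    rows = [
        {"group_id": "12500", "within_group_id": "12502"},
        {"group_id": "12502", "within_group_id": "12600"},
        {"group_id": "12600", "within_group_id": ""},
    ]
    print(group_chain("12500", build_parent_index(rows)))
```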
Sample Measurements
The six column headings used in the STELLAR template for Sample measurement data are shown in Table 5 below with an explanation of what fields they should map to and examples of the sorts of data involved. There is a choice of template for Measurement_type: the data relationships can either be implemented using (standardized) vocabulary terms or, if available, using the URI of the actual controlled type in the appropriate SKOS online glossary.
STELLAR CRMEH_SAMPLE_MEASUREMENTS Template column name | Example data
Sample_id maps to the field containing the sample identifier (number) given to each individual sample from which the sample measurements are taken. | 80714
Measurement_type maps to the broad general type assigned to the measurement and will ideally be taken from a controlled vocabulary/glossary of measurement types. Usually a user would use either measurement_type (a word) or measurement_type_uri (URI - preferable). | Flotation volume
Measurement_type_uri maps to the URI (online identifier) of the controlled type for the Measurement_type, where that exists. This might be a concept in a SKOS thesaurus, or another form of unique URI. The STELLAR SKOS_CONCEPTS template converts a database glossary into a SKOS vocabulary, which can provide URIs for this purpose. |
Measurement_unit maps to a field that specifies the units used to record the measurements in Measurement_value. | kg
Measurement_unit_uri maps to the URI (online identifier) of the controlled type for the Measurement_unit, where that exists. This might be a concept in a SKOS thesaurus, or another form of unique URI. The STELLAR SKOS_CONCEPTS template converts a database glossary into a SKOS vocabulary, which can provide URIs for this purpose. |
Measurement_value maps to the field with the actual value of the measurement. | 2
Table 5: Column names used by CRMEH_SAMPLE_MEASUREMENTS template
Investigation Projects
The column headings used in the STELLAR template for Investigation Project data are shown in Table 6 below with an explanation of what fields they should map to and examples of the sorts of data involved. In the case of Investigation_type the data relationships can either be implemented using standardized vocabulary terms or, if available, using the URI of the controlled type in the appropriate SKOS online glossary (i.e. the NMR Events Thesaurus).
STELLAR CRMEH_INVESTIGATION_PROJECTS Template column name | Example data
Investigation_id maps to the field containing the specific (and unique) event identifier (name and/or number) of the particular investigation event, or project that the data set derives from. | MOLAS. ROP95
Investigation_type maps to the general type of investigation event that resulted in the data set that is referenced and will ideally be taken from a controlled vocabulary/glossary of event types, such as the EH Event Types Thesaurus. Usually a user would use either investigation_type (a word) or investigation_type_uri (URI - preferable). | Excavation
Investigation_type_uri maps to the URI (online identifier) of the controlled type for the Investigation_type, where that exists. This might be a concept in a SKOS thesaurus, or another form of unique URI. |
Investigation_timespan gives the start and end dates of the particular investigation event, or project, that this data derives from. | 01-03-2005/30-04-2005
Investigation_description maps to a free text descriptive field, usually the one that most clearly describes and explains the general nature and archaeological features of the investigation site and how it is distinguished from other sites. Multiple note fields can be supplied for the same Investigation_id. | Free text summary describing the main archaeological characteristics of the site
Investigation_location gives an overall general spatial location reference for the specific Area of Investigation that the rest of the data derives from. This could use any of a number of spatial referencing systems. For STELLAR Linked Data purposes we have opted simply for a single X,Y,Z point based on WGS84 coordinates, following MIDAS quickpoint syntax. | e.g. a site centroid x,y,z
Table 6: Column names used by CRMEH_INVESTIGATION_PROJECTS template
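A timespan value in the start/end form shown in the example data can be checked before export. The sketch below is a hypothetical helper rather than STELLAR behaviour. It assumes day-month-year order (inferred from the example end date 30-04-2005) and tolerates spaces around the slash.

```python
from datetime import date, datetime

DATE_FORMAT = "%d-%m-%Y"  # assumption: day-month-year, as in 30-04-2005


def parse_timespan(value):
    """Return (start, end) dates from a 'DD-MM-YYYY/DD-MM-YYYY' string."""
    start_text, end_text = (part.strip() for part in value.split("/"))
    start = datetime.strptime(start_text, DATE_FORMAT).date()
    end = datetime.strptime(end_text, DATE_FORMAT).date()
    if end < start:
        raise ValueError(f"timespan ends before it starts: {value!r}")
    return start, end


if __name__ == "__main__":
    start, end = parse_timespan("01-03-2005/ 30-04-2005")
    print(start.isoformat(), end.isoformat())
```

A value with a missing or malformed date raises `ValueError`, which makes it easy to flag rows that need correcting.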
References
STELLAR Guide, Tutorial, Tools. http://hypermedia.research.glam.ac.uk/resources/STELLAR-applications/
STELLAR Project website. http://hypermedia.research.glam.ac.uk/kos/stellar/
CIDOC Conceptual Reference Model (CRM), http://cidoc.ics.forth.gr
CRM-EH: English Heritage Extension to CRM for the archaeology domain, http://hypermedia.research.glam.ac.uk/kos/CRM/
Crofts N, Doerr M, Gill T, Stead S, Stiff M, Definition of the CIDOC Conceptual Reference Model. http://www.cidoc-crm.org/docs/cidoc_crm_version_5.0.2.pdf
English Heritage, http://www.english-heritage.org.uk/
RDFS Encoding of the CIDOC CRM, http://cidoc.ics.forth.gr/rdfs/cidoc_v4.2.rdfs
Doerr, M.: The CIDOC Conceptual Reference Module: an Ontological Approach to Semantic Interoperability of Metadata. AI Magazine, 24(3), 75-92 (2003)
Cripps P, Greenhalgh A, Fellows D, May K, Robinson D. 2004. Ontological Modelling of the work of the Centre for Archaeology. http://www.cidoc-crm.org/docs/Ontological_Modelling_Project_Report
SKOS: Simple Knowledge Organization Systems - W3C Semantic Web Deployment Working Group, http://www.w3.org/2004/02/skos
STAR: Semantic Technologies for Archaeological Resources, http://hypermedia.research.glam.ac.uk/kos/star