A Practical Application of Archaeological Field Drawing Data using Semantic Web Principles
4.3 The data
4.3.1 The data from Cottam
The data from Cottam was downloaded from the project archive for Burrow House Farm, Cottam: an Anglian and Anglo-Scandinavian Settlement in East Yorkshire, held by the Archaeology Data Service (doi:10.5284/1000339) (Richards 2001c). The Archaeology Data Service (ADS) is a UK national archive for primary archaeological data, which promotes standards and creates best practice guidelines. Established in 1996, the ADS is based at the University of York and is made up of a consortium of Higher Education and related national institutions, with an advisory committee of individuals representing interests across the discipline (Archaeology Data Service 2011b). The ADS provides a wide range of services, but primarily maintains an archive of persistent, freely available archaeological data, which plays an important role in responsibly mitigating the destructive process inherent in much archaeological fieldwork.
The ADS archive contains all the primary data from the fieldwork undertaken at Cottam B. The HTML archive includes the overall research design and the Level III reports for the excavations carried out in two areas in 1993 (COT93, Area 1; COT93, Area 3) and in 1995 (COT95). It includes reports for the fieldwalking, geophysics and metal detection carried out over the site, and the relevant reports for the animal and plant evidence found during the excavation. Finds reports are also included for bone and antler, flint, iron, copper alloy and non-metallic objects, non-ferrous metal, post-Roman coins, pottery and stone. In addition to the reports available in the archive, the Burrow House Farm, Cottam archive is linked to the publication Anglian and Anglo-Scandinavian Cottam: linking digital publication and archive (Richards 2001a), published in the online journal Internet Archaeology. The article was originally published in the Journal of The Royal Archaeological Institute, under the title Cottam: An Anglian and Anglo-Scandinavian settlement on the Yorkshire Wolds (Richards 1999), and the electronic version was later created for Internet Archaeology as part of an experiment in electronic publication. The intention was to demonstrate how the interpretative synthesis of a journal article could be linked with the full corpus of digital data from which the synthesis was created. This would allow the investigators' own analysis of the data to be published, while also providing access to the raw data for future use and interpretation, in line with best practices for the preservation of archaeological data as defined at the time (Austin et al. 2000). Both the archive and the interpretative information necessary to understand the archaeology at the site were therefore easily accessed online for this research.
Many of the files and raw data used to create the reports are included in the archive in their original formats, including the resistivity and magnetometry data as DAT files, the geophysics plots with geo-referencing data as TIF files, the report illustrations as GIF files, the metal detector and excavation finds as JPG files and the vector drawings as DWG/DXF/DWF files. Metadata is also included to help the user understand how the files are structured. Database files are available in archival TXT format, and the entity relationship diagrams have also been included for anyone wishing to reconstruct the various databases. The raw dataset has been published in its entirety, using non-proprietary formats whenever possible for long-term accessibility.
Figure 46: The entity relationship diagram for the context database from the Burrow House Farm, Cottam: an Anglian and Anglo-Scandinavian Settlement in East Yorkshire archive held by the Archaeology Data Service (doi:10.5284/1000339) (Richards 2001c).
This research focussed primarily on the TXT files and DWG files. Vector plans are available for COT93.1, COT93.3, COT95, the Cottam B study area, and a plot of the cropmarks found at the site. The database files consist of the context data for the COT95 excavation only, but the content of the finds database includes all the work carried out at Cottam B. After evaluating all of the data available in the Burrow House Farm, Cottam archive, the DWG file from the COT95 excavation was carried forward for use in this research, as COT95 was the only trench containing evidence for Anglo-Scandinavian activity. The availability of the additional context data, which could be incorporated with the data from the DWG file, also made it the richer potential dataset for experimentation when compared to the COT93 datasets. The context data for COT95 includes a table with data and descriptions for each of the contexts, along with tables defining the ‘later than’, ‘contemporary with’ and ‘earlier than’ spatial relationships between the contexts, and a layer/fill table. There are also tables associating sampling, photos and plans with their relevant contexts.
The COT95 plan drawing consists of vector polylines of the cuts of the major excavation contexts, with separate layers for Phase IIb and Phase III. The drawing was created for illustrative purposes only, so no context or annotation information is associated with the polylines and polygons directly. Contexts are simply labelled with their context numbers drawn either on top of, or next to them. This makes the COT95 drawing typical of vector plans created for publication rather than analysis.
Figure 47: Plan drawing from the COT95 excavation trench. Contexts from the Period IIB: Anglian Phase B are shown in yellow, and contexts from the Period III: Anglo-Scandinavian Phase are shown in pink. From the Burrow House Farm, Cottam: an Anglian and Anglo-Scandinavian Settlement in East Yorkshire archive held by the Archaeology Data Service (doi:10.5284/1000339).
To make it ready for Semantic Web use, the drawing had to be cleaned and prepared. Using Autodesk’s AutoCAD 2008, all extraneous information was removed, including the hachures and context number labels. All the contexts then had to be identified and, where necessary, converted into closed polygons.
Polygons with dashed lines indicating ‘edge uncertain’ were converted to solid lines, and polylines representing contexts extending beyond the excavated area were truncated at the excavation wall to form closed polygons. Notes were made about these changes for annotation later in the process.
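The clean-up rules described above could be expressed programmatically along the following lines. This is only an illustrative sketch in Python: the original work was carried out by hand in AutoCAD, and the class name, field names, note wording and the context number used in the example are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class ContextOutline:
    """One context outline taken from the plan (all names are hypothetical)."""
    context_number: int
    vertices: List[Point]
    edge_uncertain: bool = False   # drawn with a dashed line in the source plan
    truncated: bool = False        # cut off at the limit of excavation
    notes: List[str] = field(default_factory=list)


def close_outline(outline: ContextOutline) -> ContextOutline:
    """Return a copy whose vertices form a closed ring, noting what was changed."""
    if len(outline.vertices) < 3:
        raise ValueError("at least three vertices are needed to form a polygon")
    ring = list(outline.vertices)
    notes = list(outline.notes)
    if ring[0] != ring[-1]:
        # Repeat the first vertex at the end so the ring is explicitly closed.
        ring.append(ring[0])
        notes.append("closed open polyline")
    if outline.edge_uncertain:
        notes.append("edge uncertain in source drawing")
    if outline.truncated:
        notes.append("truncated at limit of excavation")
    return ContextOutline(outline.context_number, ring,
                          outline.edge_uncertain, outline.truncated, notes)


# Hypothetical example: an open, dashed outline with three recorded vertices.
example = close_outline(
    ContextOutline(1001, [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)], edge_uncertain=True)
)
```

Keeping the notes alongside the geometry means that the information lost when a dashed or truncated outline becomes a solid, closed polygon can still be carried forward for annotation.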
Although the size of the excavation trench probably did not require assigning a projection to the data, the CAD drawing was then brought into GIS using ESRI’s ArcGIS 9 (ArcMap 9.3.1) to georeference and project the data. New fields were added to the annotation table for context numbers, drawing notes, and the x and y coordinates for the centre point of the context, along with calculations for the area and perimeter. As no attribute data was initially part of the drawing, each context was identified by hand and its context number added to the table. Any context that lost ‘edge uncertain’ information when it was made into a closed polygon, or that was truncated where it extended past the edge of the excavated area (or both), was noted in the table as well. The relevant data from the GIS attribute table was then exported in the Geography Markup Language (GML) format using FWTools.
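For readers unfamiliar with how a GIS derives such values, the area, perimeter and centre point of a simple polygon can be calculated from its vertices with the standard shoelace formulae. The sketch below is in Python and is not the ArcMap implementation; it assumes planar (projected) coordinates, a polygon without holes, and that the 'centre point' is the area centroid. The dictionary keys follow the field names used for Cottam, while the function name is hypothetical.

```python
from math import hypot
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def polygon_metrics(ring: List[Point]) -> Dict[str, float]:
    """Area, perimeter and area centroid of a simple polygon (shoelace formulae)."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]          # work on a closed copy of the ring
    twice_area = 0.0
    cx = cy = 0.0
    perimeter = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0        # signed contribution of this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += hypot(x1 - x0, y1 - y0)
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    signed_area = twice_area / 2.0       # sign depends on vertex order
    return {
        "Area": abs(signed_area),
        "Perimeter": perimeter,
        "CentroidX": cx / (6.0 * signed_area),
        "CentroidY": cy / (6.0 * signed_area),
    }


# A 4 x 3 rectangle: area 12, perimeter 14, centroid at (2, 1.5).
metrics = polygon_metrics([(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)])
```

Note that for irregular or concave contexts the area centroid can fall outside the polygon itself, which matters if the point is later used to label or locate the context.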
Figure 48: Plan drawing (in red) from the COT95 excavation trench, georeferenced and projected in ArcMap 9.3.1, showing its position relative to the Burrow House Farm buildings.
Figure 49: Plan drawing from the COT95 excavation trench as created within the GIS.
FWTools is a set of open source tools for working with GIS data created by Frank Warmerdam. It consists of several kits, including the Geospatial Data Abstraction Library (GDAL), within which is a translation library for GIS vector data called the OGR Simple Feature Library (OGR) (Warmerdam 2011). As the data from Cottam and Hungate is already in vector format, the ogr2ogr translation tool was used. Once in GML, the data was translated into CSV, ready for processing by the STELLAR tool (see page 21), using a small, bespoke program written in Java by Michael Charno, called the STELLARPreloader. Specifically, it converts the geospatial information about each context from the GML file into the Well-Known Text (WKT) format. The STELLARPreloader requires the context number field to be declared explicitly for the extraction, but the choice of other fields drawn from the attribute table is customisable. In the case of Cottam, the fields chosen were: Area, Perimeter, CentroidX, CentroidY, Centroid (a comma-delimited concatenation of CentroidX and CentroidY), DrawNote (where changes made to the context polygons by the author were noted) and Phase. The STELLARPreloader then converted the data into CSV format for import into the next phase of transformation into RDF.
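The kind of conversion described above can be illustrated with a short sketch. It is written in Python rather than Java and is not the STELLARPreloader itself, whose internals are not described here. It assumes a GML 2 layout of the sort typically written by ogr2ogr (features inside gml:featureMember, attributes in the ogr namespace, and polygon rings as gml:coordinates), and the layer name, field values and context number in the sample are hypothetical. Interior rings (holes) are ignored for brevity.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace URIs used by OGR-style GML output.
NS = {"gml": "http://www.opengis.net/gml", "ogr": "http://ogr.maptools.org/"}

# Hypothetical sample: one context polygon with three attribute fields.
SAMPLE_GML = """
<ogr:FeatureCollection xmlns:ogr="http://ogr.maptools.org/"
                       xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <ogr:contexts fid="F0">
      <ogr:geometryProperty>
        <gml:Polygon><gml:outerBoundaryIs><gml:LinearRing>
          <gml:coordinates>0,0 4,0 4,3 0,3 0,0</gml:coordinates>
        </gml:LinearRing></gml:outerBoundaryIs></gml:Polygon>
      </ogr:geometryProperty>
      <ogr:Context>1001</ogr:Context>
      <ogr:Area>12</ogr:Area>
      <ogr:DrawNote>edge uncertain</ogr:DrawNote>
    </ogr:contexts>
  </gml:featureMember>
</ogr:FeatureCollection>
"""


def ring_to_wkt(coordinates_text: str) -> str:
    """Turn GML 2 'x,y x,y ...' text into a WKT POLYGON with a single outer ring."""
    pairs = []
    for token in coordinates_text.split():
        x, y = token.split(",")[:2]      # drop any z value
        pairs.append(f"{x} {y}")
    return "POLYGON ((" + ", ".join(pairs) + "))"


def gml_to_csv(gml_text: str, context_field: str, extra_fields: list) -> str:
    """Write one CSV row per feature: context number, chosen fields, then WKT."""
    root = ET.fromstring(gml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([context_field, *extra_fields, "WKT"])
    for member in root.findall("gml:featureMember", NS):
        feature = member[0]              # the single feature element inside the member
        coords = feature.find(
            ".//gml:outerBoundaryIs/gml:LinearRing/gml:coordinates", NS)
        if coords is None or not coords.text:
            continue                     # skip features without a polygon ring
        row = [feature.findtext(f"ogr:{name}", default="", namespaces=NS)
               for name in [context_field, *extra_fields]]
        writer.writerow(row + [ring_to_wkt(coords.text)])
    return out.getvalue()


csv_text = gml_to_csv(SAMPLE_GML, "Context", ["Area", "DrawNote"])
```

As in the description above, the context number field is named explicitly while the remaining fields are passed as a list, so the same routine could be pointed at a different selection from the attribute table. Because the WKT string contains commas, the CSV writer quotes it automatically.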