
2 PROBLEM DEFINITION

2.4 Data Transfer and Storage

Information required for geoprocessing comes from one table (HerpColl) in the collection database. The content of the location description field is the input to the NLP parser, and the parser output provides the spatial information identified for each location description. The contents of additional fields are ready for use in geoprocessing with no further preparation; these provide location-related attributes. Parser output and unaltered field content are assembled into a structure that is suitable for use as geoprocessing input. In considering suitable input formats, it is reasonable to take into account that the parsing/assembling procedures and the geoprocessing operations may not be run by the same individual or group. Additionally, the operations may not occur on the same machine or the same computing platform. Accordingly, a data storage format is needed that is suitable for a collaborative, cross-platform environment.

The Extensible Markup Language (XML) format is selected for storage of this intermediate geocoding data. As such, it conveys geoparser output to geoprocessor input in a format that is compatible with a variety of computing platforms and environments. XML is well suited for this role for several reasons. It is designed to store data (rather than display it) and can incorporate a document type definition (DTD) that provides a schema defining the permitted elements and their content. Further, it is self-documenting in tagged text format; data elements and their assigned functional category (tag) can easily be read and understood by users. Its hierarchical structure reflects the organization of the basic spatial information that is inherent in traverse-type descriptions and required as geoprocessing input.

Data stored in well-formed XML provides improved opportunity for collaborative efforts between those working in geoparsing and geocoding. The XML schema developed for this project is presented immediately below as a structure only. Examples follow that populate elements with content. The XML schema is fully presented and explained in Appendix 2.

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE Location_Descriptions [
  <!ELEMENT Location_Descriptions (Location_Description)>
  <!ELEMENT Location_Description (Record_Number,Locality_Text,County_Name,
                                  Elevation,Points_Of_Beginning,Segments)>
  <!ELEMENT Record_Number (#PCDATA)>
  <!ELEMENT Locality_Text (#PCDATA)>
  <!ELEMENT County_Name (#PCDATA)>
  <!ELEMENT Elevation (Elevation_Value,Elevation_Unit)>
  <!ELEMENT Elevation_Value (#PCDATA)>
  <!ELEMENT Elevation_Unit (#PCDATA)>
  <!ELEMENT Points_Of_Beginning (Place,Place_and_Route,Route_and_Route)>
  <!ELEMENT Place (PlaceName)>
  <!ELEMENT PlaceName (#PCDATA)>
  <!ELEMENT Place_and_Route (PlaceName,TravelRoute)>
  <!ELEMENT TravelRoute (#PCDATA)>
  <!ELEMENT Route_and_Route (TravelRoute,CrossRoute)>
  <!ELEMENT CrossRoute (#PCDATA)>
  <!ELEMENT Segments (Segment)>
  <!ELEMENT Segment (Segment_Type,Direction,Distance,Remarks)>
  <!ELEMENT Sequence_Number (#PCDATA)>
  <!ELEMENT Segment_Type (#PCDATA)>
  <!ELEMENT Direction (#PCDATA)>
  <!ELEMENT Distance (Distance_Value,Distance_Unit)>
  <!ELEMENT Distance_Value (#PCDATA)>
  <!ELEMENT Distance_Unit (#PCDATA)>
  <!ELEMENT Remarks (#PCDATA)>
]>
<Location_Descriptions>
  <Location_Description>
    <Record_Number></Record_Number>
    <Locality_Text></Locality_Text>
    <County_Name></County_Name>
    <Elevation>
      <Elevation_Value></Elevation_Value>
      <Elevation_Unit></Elevation_Unit>
    </Elevation>
    <Points_Of_Beginning>
      <Place>
        <PlaceName></PlaceName>
      </Place>
      <Place_and_Route>
        <PlaceName></PlaceName>
        <TravelRoute></TravelRoute>
      </Place_and_Route>
      <Route_and_Route>
        <TravelRoute></TravelRoute>
        <CrossRoute></CrossRoute>
      </Route_and_Route>
    </Points_Of_Beginning>
    <Segments>
      <Segment>
        <Segment_Type></Segment_Type>
        <Direction></Direction>
        <Distance>
          <Distance_Value></Distance_Value>
          <Distance_Unit></Distance_Unit>
        </Distance>
        <Remarks></Remarks>
      </Segment>
    </Segments>
  </Location_Description>
</Location_Descriptions>
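The assembly step described earlier, in which parser output and unaltered field content are combined into this structure, might be sketched as follows. This is a minimal illustration rather than the project's actual procedure: the function name `build_location_description` and the layouts of the `record` and `parsed` dictionaries are assumptions made for this example, and only the simple Place form of a point of beginning is handled. The element names are those of the schema above.

```python
import xml.etree.ElementTree as ET


def build_location_description(record, parsed):
    """Assemble one Location_Description element.

    `record` holds unaltered HerpColl field values and `parsed` holds
    parser output; both dictionary layouts are hypothetical.
    """
    loc = ET.Element("Location_Description")
    ET.SubElement(loc, "Record_Number").text = str(record["IDKey"])
    ET.SubElement(loc, "Locality_Text").text = record["Locality"]
    ET.SubElement(loc, "County_Name").text = record["County"]

    # Elevation is stored as a value/unit pair.
    elevation = ET.SubElement(loc, "Elevation")
    ET.SubElement(elevation, "Elevation_Value").text = record.get("ElevationValue", "")
    ET.SubElement(elevation, "Elevation_Unit").text = record.get("ElevationUnit", "")

    # Only the Place form of a point of beginning is covered in this sketch.
    pobs = ET.SubElement(loc, "Points_Of_Beginning")
    place = ET.SubElement(pobs, "Place")
    ET.SubElement(place, "PlaceName").text = parsed["place_name"]

    # One Segment element per traverse leg found by the parser.
    segments = ET.SubElement(loc, "Segments")
    for seg in parsed["segments"]:
        segment = ET.SubElement(segments, "Segment")
        ET.SubElement(segment, "Segment_Type").text = seg["type"]
        ET.SubElement(segment, "Direction").text = seg["direction"]
        distance = ET.SubElement(segment, "Distance")
        ET.SubElement(distance, "Distance_Value").text = str(seg["value"])
        ET.SubElement(distance, "Distance_Unit").text = seg["unit"]
    return loc


if __name__ == "__main__":
    # Values taken from record 809, which is described in the examples that follow.
    record = {
        "IDKey": 809,
        "Locality": "3 mi S. Victorville, San Bernardino Nat'l Forest",
        "County": "San Bernardino",
        "ElevationValue": "1829",
        "ElevationUnit": "ft",
    }
    parsed = {
        "place_name": "Victorville",
        "segments": [
            {"type": "EuclideanTraverse", "direction": "S", "value": 3, "unit": "mi"}
        ],
    }
    element = build_location_description(record, parsed)
    print(ET.tostring(element, encoding="unicode"))
```

Because the result is plain tagged text, it can be written to a file on one platform and read on another without conversion, which is the motivation given above for choosing XML.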

Examples of specimen data from the LACM herpetology collection encoded in this XML format are provided below. Collection table fields contributing the data are identified (italicized in parentheses) in the leading descriptive texts. All other elements are parser output derived from the descriptive text. The first example is record (IDKey) number 809 with a locality description (Locality) of “3 mi S. Victorville, San Bernardino Nat'l Forest.” This is a basic Euclidean traverse style of description with a place name for a point of beginning. County and Elevation fields from the HerpColl table provide “San Bernardino” and “1829 ft” respectively.

The second is record number 2043 with a locality description of “9 mi SE Mecca on Hwy 195.” This is a route traverse style of description with the intersection of a place name and a highway for a point of beginning. No value is listed for elevation in the database.

<Location_Description>
  <Record_Number>809</Record_Number>
  <Locality_Text>3 mi S. Victorville, San Bernardino Nat'l Forest</Locality_Text>
  <County_Name>San Bernardino</County_Name>
  <Elevation>
    <Elevation_Value>1829</Elevation_Value>
    <Elevation_Unit>ft</Elevation_Unit>
  </Elevation>
  <Points_Of_Beginning>
    <Place>
      <PlaceName>Victorville</PlaceName>
    </Place>
  </Points_Of_Beginning>
  <Segments>
    <Segment>
      <Segment_Type>EuclideanTraverse</Segment_Type>
      <Direction>S</Direction>
      <Distance>
        <Distance_Value>3</Distance_Value>
        <Distance_Unit>mi</Distance_Unit>
      </Distance>
    </Segment>
  </Segments>
</Location_Description>
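On the receiving side, a geoprocessing script could read such a record back with any standard XML parser. The sketch below uses Python's standard-library `xml.etree.ElementTree` to pull out the point of beginning and the traverse segments for record 809. The function name `read_location_description` and the shape of the returned dictionary are assumptions made for illustration; they are not part of the project's tooling.

```python
import xml.etree.ElementTree as ET

# The record 809 example, reproduced as a string for this sketch.
EXAMPLE_XML = """
<Location_Description>
  <Record_Number>809</Record_Number>
  <Locality_Text>3 mi S. Victorville, San Bernardino Nat'l Forest</Locality_Text>
  <County_Name>San Bernardino</County_Name>
  <Elevation>
    <Elevation_Value>1829</Elevation_Value>
    <Elevation_Unit>ft</Elevation_Unit>
  </Elevation>
  <Points_Of_Beginning>
    <Place>
      <PlaceName>Victorville</PlaceName>
    </Place>
  </Points_Of_Beginning>
  <Segments>
    <Segment>
      <Segment_Type>EuclideanTraverse</Segment_Type>
      <Direction>S</Direction>
      <Distance>
        <Distance_Value>3</Distance_Value>
        <Distance_Unit>mi</Distance_Unit>
      </Distance>
    </Segment>
  </Segments>
</Location_Description>
"""


def read_location_description(xml_text):
    """Extract geoprocessing inputs from one Location_Description (hypothetical helper)."""
    loc = ET.fromstring(xml_text.strip())

    # Place-type points of beginning; other types are ignored in this sketch.
    places = [p.findtext("PlaceName") for p in loc.findall("Points_Of_Beginning/Place")]

    segments = []
    for seg in loc.findall("Segments/Segment"):
        segments.append({
            "type": seg.findtext("Segment_Type"),
            "direction": seg.findtext("Direction"),
            "distance": float(seg.findtext("Distance/Distance_Value")),
            "unit": seg.findtext("Distance/Distance_Unit"),
        })

    return {
        "record_number": int(loc.findtext("Record_Number")),
        "county": loc.findtext("County_Name"),
        "places": places,
        "segments": segments,
    }


if __name__ == "__main__":
    print(read_location_description(EXAMPLE_XML))
```

The path expressions mirror the nesting of the XML, so the hierarchy of point of beginning, segment, direction and distance carries over directly into the geoprocessing input.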