Enterprise Data - Semantic Assistance for Data Utilization and Curation

The enterprise data consists of a superset of the bathymetric and hyperspectral use cases. A project list was presented to me, which was manually entered into an Excel spreadsheet. Each of the projects listed was an accepted proposal from a single researcher who was considered the principal investigator. This investigator was in charge of the project, and project information would be attributed to this person. For each project, we were given a list of attributes. The rst attribute, a project identier, was a pair of two separate codes. The rst numerical code was a research level of the given project, the second numerical code was the two-digit year in which the project was proposed, and the third numerical code was the index of the research proposal in terms of research level and year.

Algorithm 10 SPARQL Query: Chlorophyll Concentration Date

PREFIX proc : <http :// muckdragon . info / data / processing #> PREFIX sets : <http :// muckdragon . info / data / sets #>

PREFIX prov : <http :// www .w3. org /ns/ prov #>

PREFIX measure : <http :// muckdragon . info / hyperspectral / tags / measure #> PREFIX substance : <http :// muckdragon . info / hyperspectral / tags / substance #> CONSTRUCT {

? file prov : startedAtTime ? start ; prov : endedAtTime ? end } WHERE {

? variable prov : wasGeneratedBy / prov : used / proc : produces ? product ; measure : hasMeasurementTag measure : concentration ;

substance : hasSubstanceTag substance : chlorophyll ; sets : isContainedIn ? file .

? file prov : endedAtTime ? end ; prov : startedAtTime ? start }

Project identier Scientist identier

Type of data (e.g. temperature, bathymetry) Sensors used in experiment

Purpose of data

Location where data was acquired Title of project

4.6.1 Generate Enterprise Data from Spreadsheet

This enterprise data is relatively useless without a way to connect it with the actual data generated by a given project. For this, it was required to convert the spreadsheet into an RDF database, using an autogenerated vocabulary in the RDF translation software. Due to condentiality reasons, the numerical indices referring to the given scientists could not be dereferenced into names or other identifying information. Therefore, the generated enterprise data was integrated manually with a small subset of actual project data (namely, the bathymetry and hyperspectral use cases). This small subset of linking information was stored in enterprise-link.ttl, and the real converted enterprise data was stored in enterprise.ttl. An example of the converted data is shown in Table (11).

4.6.2 Merge Enterprise Data Together with Provenance Framework

After we have translated the enterprise data into the RDF format we wish to use, we needed to merge the Enterprise data with the Bathymetry data set as well as the Hyperspectral data set, otherwise we would not be able to make any sensible queries. We could do trivial tasks such as look up titles of projects, years project was achieved, or sensors used in a project, but this would be simple enough to implement in a conventional database. We wished to integrate data, the main goal of the semantic web, and tie the enterprise data into the other two research projects which we had concrete results from.

To this eect, we utilized the same Provenance Framework Application(6) which we also used in the Hyperspectral use case. We simply added a few new queries to test our integration eorts. We also added a Load External Dataset button for this use case. If this Enterprise Data dataset is made available to it, then all of the queries listed in this use case will be available to be tested.

<http :// muckdragon . info / arv / request /6.1/07/1 > rdf : type arv : Data ;

arv : scientist <http :// muckdragon . info / arv / scientist /1 >;

arv : dataType " Hyperspectral imagery ", "IR imagery ", " topographic LIDAR ", " bathymetric LIDAR ", " thermal imagery ";

arv : sensor " hyperspectral ", " NASA LIDAR ";

arv : purpose " Understand the hydrodynamic sources of features detectable in IR imagery of water surface ";

arv : location " Virginia Coast Reserve ";

arv : title " Joint LIDAR / Hyperspectral Data Collection ". Table 11: Conversion Example: Enterprise Data

Algorithm 11 SPARQL Query: Bathymetry Research Enterprise Query

PREFIX : <http :// muckdragon . info / temp / enterprise - query0001 #> PREFIX proc : <http :// muckdragon . info / data / processing #>

PREFIX prov : <http :// www .w3. org /ns/ prov #>

PREFIX sets : <http :// muckdragon . info / data / sets #> PREFIX arv : <http :// muckdragon . info / arv / schema /> PREFIX dc: <http :// purl . org /dc/ elements /1.1/ >

PREFIX measure : <http :// muckdragon . info / hyperspectral / tags / measure #> PREFIX substance : <http :// muckdragon . info / hyperspectral / tags / substance #> PREFIX category : <http :// muckdragon . info / hyperspectral / tags / category #> CONSTRUCT { ?s1 : usedSensor ? sens1 . } WHERE { OPTIONAL { ?v1 prov : wasAttributedTo ?s1 . ?r1 arv : scientist ?s1 .

?r1 arv : sensor ? sens1 .

?v1 category : hasCategoryTag category : bathymetry . ?v1 measure : hasMeasurementTag measure : depth . }

}

4.6.3 Find Scientists: Bathymetry Research, Sensors Used

A researcher studying bathymetry wants to nd any and all bathymetry related research which has previously been conducted, and what sensors were used in the research. Therefore, the researcher queries for the scientists who have been attached via enterprise data to actual projects. The query is shown in Algorithm (11).

4.6.4 List Scientists' Data: Bathymetry

A researcher knows of a particular scientist's work, for instance they have read a publication by them and would like to see what actual data sets they used in their research. This query is shown in Algorithm (12).

Algorithm 12 SPARQL Query: List Bathymetry Data by Scientist

PREFIX : <http :// muckdragon . info / temp / enterprise - query0001 #> PREFIX proc : <http :// muckdragon . info / data / processing #>

PREFIX prov : <http :// www .w3. org /ns/ prov #>

PREFIX sets : <http :// muckdragon . info / data / sets #> PREFIX arv : <http :// muckdragon . info / arv / schema /> PREFIX dc: <http :// purl . org /dc/ elements /1.1/ >

PREFIX measure : <http :// muckdragon . info / hyperspectral / tags / measure #> PREFIX subst : <http :// muckdragon . info / hyperspectral / tags / substance #> PREFIX category : <http :// muckdragon . info / hyperspectral / tags / category #> CONSTRUCT { ?v1 prov : wasAttributedTo ?s1 . } WHERE { OPTIONAL { ?v1 prov : wasAttributedTo ?s1 . ?r1 arv : scientist ?s1 .

?r1 arv : sensor ? sens1 .

?v1 category : hasCategoryTag category : bathymetry . ?v1 measure : hasMeasurementTag measure : depth . }

}

4.6.5 Find Scientists: Hyperspectral Research, and Sensors Used

A researcher in the hyperspectral eld wants to locate all scientists who performed hyperspectral research on a given data set, and also wants to know what sensors are associated with his or her studies. This is equivalent to the Enterprise query described in Algorithm (11), but is more complex as there are three dierent categorizations that we are looking for, phytoplankton absorption, particulate absorption, and sea surface temperatures.

4.6.6 List Scientists' Data: Hyperspectral

Noting some of the sensor types, the researcher queries the scientists' data which is related to the hyperspectral imaging. This allows the researcher to decide which, if any, of the listed sensors actually apply to the data being studied. This is the same as Algorithm(13), with the line in the Construct section ?v1 prov:wasAttributedTo ?s1 . rather than ?s1 :usedSensor ?sens1.

4.6.7 Review: Enterprise Data Schema

The enterprise data use case was one of the most successful use cases, as it integrated the Bathymetry and Hyperspectral use cases with an additional Enterprise schema together under one schema. This schema became automatically apparent from the union of the three schemas, and a visual representation of such schema is presented as Figure (7). From here, we are able to see the interaction between the Tags (Substance, Category, Algorithm, Sensor, Measurement) from the Hyperspectral data set, the Project Requests in the Enterprise data set, as well as the PROV-O Activities, Entities, and Agents available from both Hyperspectral and Bathymetry data sets.

In document Semantic Assistance for Data Utilization and Curation (Page 37-40)