
2.4 Source Collections Details

2.4.2 Source Collection Persistent Properties

The persistent properties of a Process Target are the properties of the object that are stored centrally in a database. These properties can be grouped into the following types, a categorization that is especially important for Source Collections:

• Progenitors: Other Process Targets from which the catalog represented by this Source Collection is derived, often other Source Collections.

• Process Parameters: Parameters that influence the processing. The progenitors and process parameters together uniquely define the catalog that the Source Collection represents.

• Processing Results: Results of processing the Source Collection, or a reference to the location where the results are stored.

• Processing Status: The status of the processing.

• General Information: Properties that are not related to the processing or the processing results, for example identifiers of the object, a human-readable name of the Source Collection, a reference to its creator, etc. Some of these can be specified by the user, others are set automatically by the information system.



Example: The Source Collection in figure 2.3a with B as identifier has a ‘Filter Sources’ operator, one parent Source Collection (A) and a query as process parameter. The listed composition of sources and attributes is not part of the definition, but is considered a result of the processing.
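The grouping of persistent properties can be made concrete with a minimal Python sketch. It is an illustration only and not the Astro-WISE implementation: the class `SourceCollection`, its field names, the method `definition` and the query string `"mag < 20"` are all hypothetical, and including the operator in the definition is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SourceCollection:
    """Hypothetical container for the five groups of persistent properties."""

    # General information: does not influence the processing.
    identifier: str
    name: str = ""
    creator: str = ""
    # Definition of the catalog: progenitors and process parameters.
    # Storing the operator alongside them is an assumption of this sketch.
    operator: str = ""
    progenitors: List["SourceCollection"] = field(default_factory=list)
    process_parameters: Dict[str, Any] = field(default_factory=dict)
    # Processing results: unknown (None) until the object is processed.
    sources: Optional[List[int]] = None
    attributes: Optional[List[str]] = None
    # Processing status.
    is_processed: bool = False

    def definition(self) -> tuple:
        """Return the part of the object that determines the catalog."""
        return (
            self.operator,
            tuple(p.identifier for p in self.progenitors),
            tuple(sorted(self.process_parameters.items())),
        )


# Source Collection B of the example: a 'Filter Sources' operator,
# one parent (A) and a query (made-up value) as process parameter.
a = SourceCollection(identifier="A")
b = SourceCollection(
    identifier="B",
    operator="Filter Sources",
    progenitors=[a],
    process_parameters={"query": "mag < 20"},
)
```

In this sketch, `b.definition()` contains only the operator, the identifier of the parent and the query. The fields `sources` and `attributes` remain `None` until B is processed, which mirrors the statement that the composition of sources and attributes is a result and not part of the definition.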

Persistent Properties: Progenitors

The progenitors of a Source Collection are other Process Targets from which the catalog data is derived. The focus in our research is on Source Collections with other Source Collections as progenitors.

Some progenitors can also be seen as process parameters: they influence the processing, instead of representing the data set on which the processing takes place. To distinguish between these, we sometimes use the term parent to indicate progenitors that should be regarded as precursors to a Source Collection. For example, we can say that the catalog represented by a Source Collection is derived from its parents, using the process parameters and other progenitors.

The distinction between these is relevant when copying Source Collections. When a Source Collection is copied to apply the same operation to another data set, then the copy will have different parents, but the same process parameters. Vice versa, when a Source Collection is copied to change the behavior of the operator, then the copy will have the same parents, but different process parameters. In these scenarios, the other progenitors should be treated the same as the process parameters. For example, see the Select Sources Source Collection in section 2.5.4.
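The two copy scenarios can be sketched as follows. This is a minimal illustration under stated assumptions: the class `Definition` and the functions `copy_for_other_data` and `copy_with_new_behavior` are hypothetical names, Source Collections are referred to by identifier only, and the identifier `"S"` and the query strings are made up.

```python
from dataclasses import dataclass, replace
from typing import Any, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Definition:
    """Hypothetical definition of a Source Collection, by identifier."""

    parents: Tuple[str, ...]
    other_progenitors: Tuple[str, ...] = ()
    process_parameters: Tuple[Tuple[str, Any], ...] = ()


def copy_for_other_data(d: Definition, new_parents: Iterable[str]) -> Definition:
    """Same operation on another data set: only the parents change."""
    return replace(d, parents=tuple(new_parents))


def copy_with_new_behavior(
    d: Definition,
    other_progenitors: Optional[Iterable[str]] = None,
    **parameter_changes: Any,
) -> Definition:
    """Same data set, different behavior: the parents are kept, while the
    process parameters and the other progenitors may change."""
    parameters = dict(d.process_parameters)
    parameters.update(parameter_changes)
    if other_progenitors is None:
        progenitors = d.other_progenitors
    else:
        progenitors = tuple(other_progenitors)
    return replace(
        d,
        other_progenitors=progenitors,
        process_parameters=tuple(sorted(parameters.items())),
    )


original = Definition(
    parents=("A",),
    other_progenitors=("S",),
    process_parameters=(("query", "mag < 20"),),
)
on_other_data = copy_for_other_data(original, ["C"])
with_new_query = copy_with_new_behavior(original, query="mag < 22")
```

The design choice in this sketch is that the other progenitors travel with the process parameters: `copy_for_other_data` never touches them, and `copy_with_new_behavior` is the only function that can replace them.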

Persistent Properties: Process Parameters

The process parameters are free parameters that are used to tune the behavior of the operator. These are usually primitives, such as character strings and numbers.

Persistent Properties: Processing Results

The result of processing a Process Target can be stored persistently. For Source Collections it is important that the processing results are split into distinct components that, in principle, can be derived and stored separately. The following results can be distinguished:

• The catalog the Source Collection represents: the values of all the attributes for all the sources. This is the primary processing result and can be decomposed into the partial results that follow.

• A partial catalog: the values of the attributes for a subset of the sources or attributes.

• The set of sources the Source Collection represents, which can be seen as a list of identifiers of the sources. This can be further split up into the number of sources, or an identification of the set without actually enumerating all the sources individually.

• The set of attributes of the sources. That is, which physical properties the Source Collection represents, not the actual values of the attributes.

To process a Source Collection partially, a new Process Target is created that only represents the required component, which is subsequently processed in its entirety. Such a component is either stored in its entirety or not at all, and can be shared between Source Collections.

Implementation Detail: In our Astro-WISE implementation, a catalog can be stored in subsets of sources, and the composition of sources or attributes of a Source Collection can be stored as well. It is not possible to store a subset of the attributes, for bookkeeping reasons. Furthermore, the number of sources can be derived separately, but not stored on its own.
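The decomposition of the processing results into separately derivable components can be illustrated with a small Python sketch. It does not reflect the Astro-WISE API: the class `ComponentStore`, the helper functions, the key layout and the toy catalog values are all hypothetical. The sketch only shows that a component is either stored in its entirety or not at all, and that a stored component is reused instead of derived again.

```python
from typing import Any, Callable, Dict, FrozenSet, Hashable, Iterable

# Toy catalog: source identifier -> attribute values (made-up numbers).
CATALOG = {
    1: {"ra": 10.0, "dec": -5.0, "mag": 19.5},
    2: {"ra": 11.0, "dec": -5.5, "mag": 21.0},
    3: {"ra": 12.0, "dec": -6.0, "mag": 18.2},
}


def source_set(catalog: Dict[int, Dict[str, float]]) -> FrozenSet[int]:
    """The set of sources: identifiers only, no attribute values."""
    return frozenset(catalog)


def attribute_set(catalog: Dict[int, Dict[str, float]]) -> FrozenSet[str]:
    """The set of attributes: names only, no values."""
    return frozenset(name for row in catalog.values() for name in row)


def partial_catalog(
    catalog: Dict[int, Dict[str, float]],
    sources: Iterable[int],
    attributes: Iterable[str],
) -> Dict[int, Dict[str, float]]:
    """Attribute values for a subset of the sources and attributes."""
    names = tuple(attributes)
    return {s: {a: catalog[s][a] for a in names} for s in sources}


class ComponentStore:
    """Hypothetical store: a component is kept whole or not at all."""

    def __init__(self) -> None:
        self._stored: Dict[Hashable, Any] = {}
        self.derivations = 0  # counts how often a component was derived

    def get(self, key: Hashable, derive: Callable[[], Any]) -> Any:
        # Derive the component in its entirety only if it is not stored yet.
        if key not in self._stored:
            self._stored[key] = derive()
            self.derivations += 1
        return self._stored[key]


store = ComponentStore()
sources_b = store.get(("B", "sources"), lambda: source_set(CATALOG))
sources_again = store.get(("B", "sources"), lambda: source_set(CATALOG))
magnitudes = store.get(
    ("B", "catalog", (1, 3), ("mag",)),
    lambda: partial_catalog(CATALOG, (1, 3), ("mag",)),
)
number_of_sources = len(sources_b)  # derived here, not stored on its own
```

In this sketch the second request for the set of sources returns the stored object without a new derivation, so two Source Collections that need the same component could share it through the same key.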

Persistent Properties: Processing Status

These properties contain information about the status of the processing of the Source Collection. For example, which processing results are stored, whether they are still considered valid, etc.

Persistent Properties: General Information

Both persistent and transient Source Collections require unique identifiers. The identifiers of persistent Source Collections are used to share or publish data; those of transient Source Collections are required for interaction between the information system and the scientist or auxiliary software such as visualization.

The information system can include persistent properties with other information, such as a creation date, an identifier of the creator of the object, a name for the object, etc. These are useful for users to interpret the processing result, but do not influence the processing itself.



Example: In this thesis we usually use positive integers or capital letters to refer to persistent Source Collections and negative integers or lower case letters for transient Source Collections.
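The integer variant of this notation can be mimicked with a small allocator. This is a sketch of the notational convention only: the class `IdentifierAllocator` and the function `is_persistent` are hypothetical, and nothing here is meant to describe how the information system itself assigns identifiers.

```python
import itertools


class IdentifierAllocator:
    """Hypothetical allocator following the notation of this thesis:
    positive integers for persistent Source Collections and negative
    integers for transient ones."""

    def __init__(self) -> None:
        self._persistent = itertools.count(1)     # 1, 2, 3, ...
        self._transient = itertools.count(-1, -1)  # -1, -2, -3, ...

    def new_persistent(self) -> int:
        return next(self._persistent)

    def new_transient(self) -> int:
        return next(self._transient)


def is_persistent(identifier: int) -> bool:
    """The sign of the identifier tells the two kinds apart."""
    return identifier > 0


ids = IdentifierAllocator()
first_persistent = ids.new_persistent()
first_transient = ids.new_transient()
```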