Schemas Supporting Physical Data Storage

Academic year: 2021


Introduction

A RAQUEL DB is made up of a DB Schema, which itself consists of a set of schemas. These schemas fall into two subsets. One subset supports the relational model in that every schema in it consists of a set of relational variables (= relvars). The other subset supports the physical storage of the DB’s data in that every schema in it consists of a set of members that are to do with physical data storage. Only the physical data storage subset is considered here.

Factors Determining Physical Data Storage

In order to understand the physical data storage schemas, it is first necessary to understand the storage factors recognised and dealt with by the RAQUEL DBMS via its storage schemas. There are three factors and hence three different types of schema, one to handle each factor. The three factors are :

1. The relationship of real relvar values to the relvalues that are actually physically stored.

Traditionally a real relvar’s complete relvalue would be stored as a single, coherent volume of physical data. However this need not be the case. For example, a relvar’s value may be split into several fragments – each of which is a relvalue – and each fragment stored as a single, coherent volume of physical data. Alternatively the relvalues of 2 or more relvars could be merged into a single relvalue which is then stored as a single, coherent volume of physical data. A combination of the fragmentation and merging approaches could also be applied.

Therefore RAQUEL stores the relvalues of ‘stored relvars’ rather than real relvars. A stored relvar is a fragment of a real relvar, or the merge of 2 or more real relvars, or the result of the fragmentation and merging of real relvars.

A relational algebra expression, called a ‘Storage Expression’, is used to define the relationship of a real relvar to the stored relvar(s) whose value(s) are used to derive the real relvar’s value. The storage expression, whose only variables must be stored relvars, is assigned to the real relvar.

It is permissible if desired to assign 2 or more different storage expressions to the same real relvar. Each storage expression represents a different way of storing the real relvar’s value. The real relvar’s value is stored according to each storage expression, and the RAQUEL DBMS is responsible for ensuring consistency between them.

2. The locations and names of the physical storage units that hold relvalue data.

A ‘Storage Unit’ is a named storage vessel in a particular location that holds the data which is the value of a stored relvar. It is permissible if desired to hold a stored relvar’s value in multiple storage units, each typically in a different location.

The term ‘location’ is used in a very general sense to maximise its [...] computers, and/or multiple storage systems managed by a single computer.

Reasons for distribution typically include the need to store data near where it is used, the fact that one computer installation may be insufficient to store all the physical data of a DB, and the need to maintain multiple copies of relvalues for backup and recovery purposes.

Where multiple storage units are used, the RAQUEL DBMS is responsible for ensuring that they all always hold the same relvalue, even if a different kind of physical storage mechanism is used by each copy‟s storage unit.

3. The nature of the physical storage mechanisms used to store relvalue data.

Each storage unit has a ‘Storage Mechanism’ or ‘Storage Facility’ associated with it. The storage mechanism provides the means by which a relvalue is stored in the storage unit. Traditionally a storage mechanism consists of one or more software utilities designed in conjunction with one or more computer file designs so that the software utility/ies move data into and out from a file(s) of that design; examples are hash files, sequential files and index sequential files. In RAQUEL the idea of a storage mechanism is generalised to permit any kind of physical storage; so it can include another DB, a specialised hardware device, etc.
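The first factor, the relationship between real and stored relvars, can be illustrated with a minimal Python sketch. It is not RAQUEL syntax: relvalues are modelled as sets of tuples, and the relvar names (`emp`, `emp_leeds`, `emp_york`, `staff`, `guest`, `people`) and helper functions are hypothetical, chosen only to show fragmentation, merging and a storage expression whose only variables are stored relvars.

```python
# Illustrative model only: a relvalue is a set of tuples, and each tuple is a
# frozenset of (attribute, value) pairs. None of these names are RAQUEL syntax.

def row(**attrs):
    """Build one tuple of a relvalue."""
    return frozenset(attrs.items())

def restrict(relvalue, predicate):
    """Relational restriction: keep the tuples that satisfy the predicate."""
    return {t for t in relvalue if predicate(dict(t))}

def project(relvalue, attr_names):
    """Relational projection onto the named attributes."""
    return {frozenset((a, v) for a, v in t if a in attr_names) for t in relvalue}

def union(*relvalues):
    """Relational union of relvalues that share a heading."""
    return set().union(*relvalues)

# Value of a hypothetical real relvar EMP.
emp = {
    row(id=1, name="Ann", site="Leeds"),
    row(id=2, name="Bob", site="York"),
    row(id=3, name="Cy", site="Leeds"),
}

# Fragmentation: two stored relvars, each holding part of EMP's value.
emp_leeds = restrict(emp, lambda t: t["site"] == "Leeds")
emp_york = restrict(emp, lambda t: t["site"] != "Leeds")

def emp_storage_expression():
    """Derive EMP's value; the only variables used are stored relvars."""
    return union(emp_leeds, emp_york)

# Merging: one stored relvar PEOPLE holds the values of two real relvars.
staff = {row(id=1, name="Ann")}
guest = {row(id=9, name="Zed")}
people = ({t | {("kind", "staff")} for t in staff}
          | {t | {("kind", "guest")} for t in guest})

def staff_storage_expression():
    """Derive STAFF as a restriction and projection of the stored relvar."""
    return project(restrict(people, lambda t: t["kind"] == "staff"), {"id", "name"})
```

In this sketch each real relvar’s value is recoverable from stored relvars alone, which is the property the storage expression is meant to capture.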

The three factors take no account of any need to store data in encrypted form.

Encrypted data can arise on 2 levels, the logical and the physical.

Encrypted data at the logical level means that the relvalue of a real relvar is in encrypted form. So en/decryption takes place either on input to/output from the DB or within the application(s) that use(s) the encrypted relvalues. In both cases this does not affect the physical storage of data within the DB, even if the DBMS carries out the en/decryption.

Encrypted data at the physical level means that a logical relvalue is encrypted before being stored, and inversely the stored form is decrypted before being retrieved; i.e. encryption is incorporated into the process of physically storing data and decryption into the process of physically retrieving data. En/decryption can be applied before the value of a real relvar is mapped to the value(s) of stored relvar(s) or after. In the former case the en/decryption process may well affect how the value of the real relvar is allocated to the value(s) of stored relvar(s). In both cases the nature of the physical storage facilities used may be affected by the en/decryption process.

However the same three factors apply regardless of whether data is stored in encrypted form or not, and so data encryption is henceforth ignored.

Note that stored relvars are only used to physically store the relvalues of real relvars. They are not used to store the values of source relvars or sink relvars.

The values of source and sink relvars are stored outside the DB and not managed by the DBMS. Sources and sinks are the relational interface of locations outside the DB from and to which respectively relvalues are transmitted. (So again any en/decryption of source and sink relvar values will operate outside the jurisdiction of the DBMS, which will be unaffected by it).


Physical Storage Schemas

Physical storage schemas fall into 2 categories, system schemas and default schemas. A system schema is created automatically as part of the creation of a DB Schema and has a standard name which cannot be altered. A default schema is either created as part of DB Schema creation with a default name which can later be changed, or is optionally created after DB Schema creation with a name which is determined at creation time and can also be changed later if desired.

The automatic creation and use of system schemas minimises the work needed to provide the physical storage of data. While default schemas are similarly supportive, they permit flexibility in the way that data storage is spread over one or more physical computer locations. Note that flexibility as regards the means by which data is physically stored is also provided by being able to plug different kinds of physical storage mechanisms into the DBMS; these mechanisms are referenced by the relevant storage schemas and do not affect the DBMS schema architecture.

The physical storage schemas are :

The Storage Schema

This schema is a system schema.

It is the set of all the stored relvars held in the entire DB. A stored relvar is one whose value is directly physically stored as a single-valued entity. Hence the Storage Schema holds all the data in the DB; in this sense it corresponds to the Logical Schema. However the Storage Schema consists of stored relvars whose values are actually physically stored, whereas the Logical Schema consists of real relvars that are the basis of what is available to an application program that accesses the DB.

A real relvar may also be a stored relvar – indeed this is the default – but a real relvar may also have its value stored in fragments, merges, or some combination of the two. The intersection of the Logical and Storage Schemas contains those relvars that are both real and stored.

It is also possible for a real relvar to have two or more copies of its value held via stored relvars, which the RAQUEL DBMS must automatically maintain in a consistent state. Copies may use the same or different arrangements of stored relvars.

Named Stack Schemas

A Stack Schema is a default schema.

The purpose of a Stack Schema is to manage the physical storage at a particular ‘computer location’ of the data that comprises the values of a set of stored relvars. (A computer location is a specific computer or a specific storage device attached to a specific computer).

Since one can expect a DB to contain at least one real relvar, there must be at least one Stack Schema to handle the storage of physical data. So when a DB Schema is created, by default a single Stack Schema is created with a default name at a default computer location (which is typically the computer installation on which the DBMS is installed).


Each Stack Schema consists of a set of 2 system schemas, which are :

1. A Location Schema.

2. A Physical Schema.

When a Stack Schema is created, its 2 member schemas are automatically created within it. The member schemas are empty when created.

In a Unix/Linux environment, the assignment that creates a Stack Schema takes a parameter that consists of a path name. The path name defines where the ‘computer location’ is with respect to the RAQUEL DBMS. Therefore it could contain a URL if the stack were located on another, networked computer.
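The creation of a Stack Schema might be modelled along the following lines. This is a hedged Python sketch, not the RAQUEL assignment itself: the class `StackSchema`, its field names, the example path and the way the member schema names are concatenated from the stack name are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class StackSchema:
    """Hypothetical model of a Stack Schema and its two member schemas."""
    name: str   # stack name, e.g. a default name chosen by the DBMS
    path: str   # path name (possibly a URL) giving the computer location
    # Both member schemas are created automatically and start out empty.
    location_schema: set = field(default_factory=set)     # stored relvar names
    physical_schema: dict = field(default_factory=dict)   # relvar name -> spec

    @property
    def member_names(self):
        # Assumed concatenation; the text only says that the system names
        # 'Location' and 'Physical' are prefixed by the stack name.
        return (self.name + "Location", self.name + "Physical")

# Hypothetical stack at a local path.
stack1 = StackSchema(name="Stack1", path="/var/raquel/stack1")
```

Creating the stack yields two empty member schemas, matching the description that they are created automatically within it.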

A Location Schema

This schema is a system schema.

Each Location Schema comprises a set of stored relvars, namely those whose values are stored at the particular computer location of a Storage Stack.

Thus a Location Schema is a subset (not necessarily a proper subset) of the Storage Schema.

A stored relvar must appear in at least one Location Schema for its value to be physically stored, but it may appear in two or more Location Schemas (in different stacks), in which case the RAQUEL DBMS must automatically maintain the copies in a consistent state. A stored relvar has the same name in each of the Location Schemas in which it appears. Different copies of the stored relvar are differentiated by the Location Schema in which they exist; only one copy can exist in a Location Schema.
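The consistency requirement for copies held in several Location Schemas can be sketched as follows. This is an illustrative Python model under stated assumptions: `LocationSchema`, `ReplicatedStore` and their methods are hypothetical names, and writing the new value to every copy in turn is just one simple way of keeping copies identical, not necessarily how a RAQUEL DBMS does it.

```python
class LocationSchema:
    """Stored relvar copies held at one computer location (hypothetical)."""
    def __init__(self, stack_name):
        self.stack_name = stack_name
        self.values = {}  # stored relvar name -> relvalue; one copy per name

class ReplicatedStore:
    """Tracks which Location Schemas hold a copy of each stored relvar."""
    def __init__(self):
        self._copies = {}  # stored relvar name -> list of LocationSchema

    def place(self, relvar, location):
        """Add a copy of the stored relvar to a Location Schema."""
        copies = self._copies.setdefault(relvar, [])
        if any(loc is location for loc in copies):
            raise ValueError("only one copy can exist in a Location Schema")
        # A new copy starts with the value already held by existing copies.
        current = copies[0].values[relvar] if copies else frozenset()
        location.values[relvar] = current
        copies.append(location)

    def assign(self, relvar, new_value):
        """Assign a new value, writing it to every copy."""
        copies = self._copies.get(relvar, [])
        if not copies:
            raise KeyError(f"{relvar} is not placed in any Location Schema")
        value = frozenset(new_value)
        for location in copies:
            location.values[relvar] = value

# The same stored relvar name is used in both Location Schemas.
stack1 = LocationSchema("Stack1")
stack2 = LocationSchema("Stack2")
store = ReplicatedStore()
store.place("S2", stack1)
store.place("S2", stack2)
store.assign("S2", {("id", 1), ("id", 2)})
```

The copies are told apart only by the Location Schema that holds them, which is why `place` refuses a second copy in the same schema.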

A Physical Schema

This schema is a system schema.

Each Physical Schema consists of a set of physical storage specifications, one for each of the stored relvars in the associated Location Schema. Each specification consists of :

A type of physical storage mechanism or facility (e.g. an index sequential file or a hashed file).

The name of a storage unit (e.g. a file or hardware device). This is defined to be that of its associated stored relvar with the suffix “St”. (It is not a default that can be changed).

The specification denotes that a particular storage mechanism is used to store the value of a particular stored relvar in a particular storage unit.
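A physical storage specification can be pictured as a small record. The Python sketch below is illustrative only: `PhysicalSpec`, `build_physical_schema` and the mechanism strings are hypothetical names, while the fixed “St” suffix for the storage unit name is taken from the description above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhysicalSpec:
    """Hypothetical physical storage specification for one stored relvar."""
    stored_relvar: str
    mechanism: str  # e.g. "hashed file"; the strings are placeholders

    @property
    def storage_unit(self) -> str:
        # The unit name is derived, not chosen: stored relvar name plus "St".
        return self.stored_relvar + "St"

def build_physical_schema(location_schema, mechanism_for):
    """Create one specification per stored relvar in a Location Schema."""
    return {name: PhysicalSpec(name, mechanism_for(name))
            for name in location_schema}

# Hypothetical Location Schema with two stored relvars.
specs = build_physical_schema(
    {"S1", "S2"},
    lambda name: "hashed file" if name == "S1" else "index sequential file",
)
```

Deriving `storage_unit` as a read-only property reflects the statement that the name is not a default that can be changed.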

The Schema Architecture Viewed as Layers

The schemas can be considered as forming a layered architecture :-

[Figure: layered diagram. Top layer: the Storage Schema. Beneath it: Stack 1 and Stack 2, each containing a Location Schema and a Physical Schema.]

Here the diagram assumes that there just happen to be two storage stacks. There could be more, as required.

If one were to add in the relational model schemas lying immediately above, and the actual physical data storage managed by each stack, then the complete RAQUEL schema architecture would be :-

[Figure: layered diagram. From top to bottom: three Subschemas; the Logical, Virtual, Source and Sink Schemas; the Storage Schema; Stack 1 and Stack 2, each containing a Location Schema and a Physical Schema; the Data managed by each stack.]

Note that this architecture does not preclude the addition of ‘materialised views’. In RAQUEL, a materialised view is a method of directly storing the relvalues of virtual relvars (and so is expressed via the formal data storage model, not the logical relational model).

The Schema Architecture Viewed as Sets

The schema architecture can also be considered as forming an inter-related collection of sets.


The following small DB, expressed via a Venn diagram, illustrates this, where Rn represents a real relvar, Sn represents a stored relvar, and SnSt represents the physical storage specification of a stored relvar :-

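[Figure: Venn diagram. Members shown: real relvars R1, R2, R3 and R4; stored relvars S1, S2 and S3; physical storage specifications S1St, S2St (shown twice), R4St and S3St.]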

In the above example, R4 is both a real relvar and a stored relvar.

Real relvars R1, R2 and R3 have their values contained in stored relvars S1, S2 and S3. There cannot be a 1 : 1 relationship between the 3 real relvars and the 3 stored relvars, because that would mean that these real relvars were also stored relvars. It may be that one real relvar uses two stored relvars (say a Join or Union of them) to hold its value, while two real relvars use one stored relvar to hold their value (each being, say, a Projection or Restriction of the stored relvar’s value).

The ‘Storage Expression’ bound to each real relvar via an “==Equate” assignment specifies the precise relationship between the real and stored relvars.

The value of stored relvar S2 is held in both stacks, so the RAQUEL DBMS must ensure that the two values are always identical regardless of the changes in value made to S2. S2 may or may not have a different physical storage specification in each stack; this is not apparent from the identifier S2St used in the Venn diagram.
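The set relationships in this example can be checked with a short Python sketch. The membership of the Logical and Storage Schemas follows the example; the allocation of S1, S3 and R4 to particular stacks is an assumption made for illustration, since the text only establishes that S2 is held in both stacks.

```python
# Schemas modelled as sets of relvar names.
logical_schema = {"R1", "R2", "R3", "R4"}
storage_schema = {"S1", "S2", "S3", "R4"}

# Assumed allocation to stacks; only S2's presence in both is given.
stack1_location = {"S1", "S2"}
stack2_location = {"S2", "S3", "R4"}

def physical_schema(location_schema):
    """One specification per stored relvar, named with the 'St' suffix."""
    return {name + "St" for name in location_schema}

stack1_physical = physical_schema(stack1_location)
stack2_physical = physical_schema(stack2_location)

# Relvars that are both real and stored.
real_and_stored = logical_schema & storage_schema

# Every Location Schema is a subset of the Storage Schema.
locations_are_subsets = (stack1_location <= storage_schema
                         and stack2_location <= storage_schema)

# Every stored relvar appears in at least one Location Schema.
all_stored_somewhere = (stack1_location | stack2_location) == storage_schema

# Stored relvars whose copies the DBMS must keep identical.
replicated = stack1_location & stack2_location
```

Under this assumed allocation the intersection of the Logical and Storage Schemas is just R4, and S2 is the only replicated stored relvar.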

Note : it is important to distinguish between the terms “Stack Schema” and “Storage Stack”. A “Stack Schema” is a schema in the schema architecture used in the formal data storage model of a RAQUEL DBMS whereas a “Storage Stack” is that part of the RAQUEL DBMS software architecture that implements a particular kind of physical storage mechanism or facility.

[Venn diagram set labels: Logical Schema; Storage Schema; Stack1 Schema, containing Stack1’s Location Schema and Stack1’s Physical Schema; Stack2 Schema, containing Stack2’s Location Schema and Stack2’s Physical Schema.]


Using the Physical Storage Schemas

The names of the system schemas within a (default) stack schema are the system names ‘Location’ and ‘Physical’ prefixed by the stack name.

When a real relvar is created, DBMS defaults are used to automatically create physical storage for the value of the relvar. The defaults are as follows :

1. Each real relvar becomes a stored relvar as well, and appears in the Storage Schema as well as the Logical Schema.

2. The real/stored relvar appears in the Location Schema of the default storage stack, which is either the initial stack or, if there are 2 or more stacks, a stack specified as the default from those available.

If there is only one storage stack, its Location Schema will have the same set of members as the Storage Schema.

3. A default physical storage mechanism or facility is specified.

The default mechanism to be used can vary between DBMS installations.

The storage defaults allow a real relvar to be used immediately after it has been created. However if preferred, the defaults may be overridden completely or to whatever extent is required, either immediately after the relvar’s creation or at some later date as required.
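The three defaults can be summarised in a minimal Python sketch. It is a model of the described behaviour under stated assumptions, not DBMS code: the class `Database`, the stack name “Stack1” and the mechanism string “hashed file” are hypothetical.

```python
class Database:
    """Hypothetical model of the storage defaults applied to a new real relvar."""

    def __init__(self, default_mechanism="hashed file"):
        self.logical_schema = set()
        self.storage_schema = set()
        # A single initial stack with empty member schemas (assumed name).
        self.stacks = {"Stack1": {"location": set(), "physical": {}}}
        self.default_stack = "Stack1"
        self.default_mechanism = default_mechanism  # may vary by installation

    def create_real_relvar(self, name):
        # Default 1: the real relvar is also a stored relvar.
        self.logical_schema.add(name)
        self.storage_schema.add(name)
        # Default 2: it appears in the default stack's Location Schema.
        stack = self.stacks[self.default_stack]
        stack["location"].add(name)
        # Default 3: a default mechanism is specified; unit name gets "St".
        stack["physical"][name] = (self.default_mechanism, name + "St")

db = Database()
db.create_real_relvar("R1")
db.create_real_relvar("R2")
```

With only one stack, the Location Schema ends up with the same members as the Storage Schema, as the second default implies.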

When changing a real relvar’s physical data storage from the default arrangement, advantage can be taken of the fact that RAQUEL treats the properties of a relvar as orthogonal to each other (although the properties must be consistent with each other). Thus a change to physical data storage can be specified in whatever is the most effective sequence – a number of sequences are possible, and none of them is guaranteed to be the most effective in all circumstances. However the sequence should make sure that relvar values held in store before the change are not lost in the process of making the change.
