Relational databases - Literature study - Development of a software application utilising class

2 Literature study

2.4 Relational databases

2.4.1 Overview of relational databases

A database can most simply be defined as a collection of logically coherent related data [30]. Databases are created and populated for a specific purpose with a narrow group of intended users

who have a common interest in a specific application [30]. Relational databases store data in such a way that they are easy to use and search, making them ideal for use in scientific studies as file systems [28]. In this section relational database structures, languages and software platforms are discussed.

2.4.1 Database normalisation

The concept of the relational database was first developed by E.F. Codd in 1970, who intended to develop an efficient relational model for data [29, 30]. Codd developed the first normal form of a relational database (and later the second and third) [30], which first introduced the concept of normalization. In database design, normalisation refers to the process whereby the tables, tuples and attributes are arranged to minimise redundancy [30]. This often entail the isolation of data in separate tables that are related in some way, allowing for easy modification of data from a single entry.

Codd defined the following objectives for the first normal form normalisation process [29, 31]:

 To remove any instances of unwanted data dependencies.

 To minimise the database restructuring when additional data is included.  To make the relational database model more explanatory to users.  To remove any effect of time-varying query statistics.

2.4.2 Relational database structures

The primary purpose of relational databases is the storage of data in tables, known as relations.. This data is grouped into tuples, which make up the rows of the relations [28]. These tuples are unique i.e. each tuple consists of a number of attributes, which make up the columns of the relation. Every relation includes a unique attribute known as the Primary Key (PK), which is used purely for identification purposes. The relation structure, including tuples and attributes, is summarised in Figure 2-7 [28].

Attribute 1 Attribute 2 Attribute 3 … Attribute N

Tuple 1 … Tuple 2 … Tuple 3 . . . . . . . . . . . . . . . . . . Tuple N …

2.4.2.1 Primary and foreign keys

Often attributes in a relation may point to attributes in a separate relation, creating a link between the two. These links do not need to be unique, as multiple tuples in one relation may refer to the same tuple in another relation [32]. These tuples must be PKs. Attributes that point to PKs in separate relations are called Foreign Keys (FK). This is illustrated in the example in Figure 2-8 [32].

Relation 1 Relation 2 PK1 Attribute11 FK1 PK2 Attribute21

1 1 1

2 1 2

3 2 3

Figure 2-8: Visual representation showing PK and FK operation [32].

2.4.2.2 Link tables

Link tables refer to relations that link two or more other relations together in a manner similar to that used by PKs and FKs. The use of link tables allows for the easy repurposing, modification and expansion to the structure of an existing database, which brings the database in line with Codd's objectives from section 2.4.1.

A link table will typically have a primary key attribute, like most other relations, and two foreign key attributes. These point to the parent and child relations respectively, as illustrated in Figure 2-9, where Parent ID and Child ID are the foreign keys connecting the parent relation (Relation 1) to the child relation (Relation 2).

While PKs and FKs are specific in their connections, link tables can easily be altered to point to different parent or child relations, allowing a relational database to be modified with significantly less effort.

2.4.3 Database management systems and query languages

The purpose of database management systems (DBMSs) is to allow for the simple retrieval of data from a database, as well as to allow for the creating, populating and management of relations [32]. This is accomplished through queries to the relevant database. Searching for data requires significantly less computational resources when utilising a DBMS. Additionally, DBMSs make sharing data between multiple users possible.

It is generally undesirable to query a database using general programming languages, as these are inefficient and time consuming. Therefore a Query Language (QL) is used. The QL allows for the querying of databases without ambiguity. The user does not need to specify how a query is executed or possess technical knowledge of the database, they only needs to know which results are expected [32].The DBMS usually evaluates queries in an algebraic manner [29]. Currently there are a number of commonly used QLs, some examples of which are listed below [33]:

 SQL  LINQ  ScalaQL  SchemeQL

MySQL was selected for use as the database platform in this project as it is the most commonly used

database system available and, as such, has an extensive base of support and literature available [33]. Although it is very fast and durable, MySQL is open-source, which is preferred in this project. The platform utilises SQL, the most widely used QL [33]. Furthermore, MySQL is easily scalable and can handle small or large datasets with equal ease [33].

2.4.4 WAMP server

Windows Apache MySQL Php (WAMP) server is selected to host the database used in this project.

WAMP is used to host a server locally on Microsoft® Windows™ operating system, which allows for easy testing and development of both the project database and the project software application. Additionally, WAMP is freely available, making it ideal for use in this project.

In document Development of a software application utilising classical efficiency theory, regression and Data Envelopment Analysis in the evaluation of thermal power plant performance. (Page 41-44)