COUNT ( EXAM_MARK )

EXERCISES

According to Figure 5.1, the result of COUNT ( EXAM_MARK ) is 6. The argument to an invocation of COUNT is a relation and so in Tutorial D can be denoted by any legal relational expression. Example 5.2 gives the number of students who have scored more than 50 in at least one exam.

Example 5.2: Using relational operators with COUNT

COUNT ( ( EXAM_MARK WHERE Mark > 50 ) {StudentId} )

Note the projection over StudentId. Without that, the expression would yield the (possibly higher) number of students’ exam scripts scoring more than 50.

Now take a look at Example 5.3.

Example 5.3: Aggregate operator SUM

SUM ( EXAM_MARK WHERE StudentId = SID ( 'S1' ), Mark )

Note the second operand, Mark, being the name of an attribute of the relation denoted by the first operand. Each tuple of the first operand provides a value for this attribute and the result of the invocation is the sum of those values. The result in this example is 219, the sum of the scores obtained by student S1. Note that both appearances of the score 85 in S1’s marks are counted.

Example 5.4: MAX and MIN

MAX ( EXAM_MARK WHERE StudentId = SID ( 'S1' ), Mark )

Download free eBooks at bookboon.com

128

Two more aggregate operators are illustrated in Example 5.4. MAX returns the highest value found for the specified attribute in the given relation and MIN returns the lowest.

Points to note:

• Some aggregations can be thought of in terms of repeated invocation of some dyadic operator, which I shall call the basis operator. In the case of SUM, for example, the basis operator is addition. Because addition is commutative and associative, we could define an

n-adic form of the operator, just as we did in Chapter 4 for operators such as JOIN and UNION. If we call this operator ADD, then we would have, for example, ADD(1,4,1,5) = ((1+4)+1)+5. But those operands, 1, 4, 1, and 5, can be given in any order (thanks, in this case, to the commutativity and associativity of +), and that lack of any significance to the ordering is what allows us to define aggregate operators for relations. The lack of an ordering to the tuples of a relation militates against defining aggregate operators whose results vary according to the order in which the operands are presented. Consider string concatenation, for example. We can concatenate any number of strings together to form a single string, but the result depends on the order in which the input strings are presented. • The basis operators for MAX and MIN might reasonably be called HIGHER and LOWER,

respectively, where HIGHER(x,y) returns x unless y>x, in which case it returns y, and

LOWER(x,y) returns x unless y<x, in which case it returns y. You can confirm for yourself

that HIGHER and LOWER are commutative and associative.

• If the relation operand is empty, then the result of aggregation can be defined only if the basis operator has an identity value, defined thus: if a value i exists such that whenever i is one of the operands of a dyadic operator the result of invoking that operator is the other operand, then i is said to be an identity value under that operator. In the case of SUM, the basis operator is addition, whose identity value is zero. In the cases of MAX and MIN, the type of the result is the type of the attribute given as the second operand. The identity value of the basis operator depends on that type. If the type has a defined least value, min, such that min>v is FALSE for all values v of that type, then min is the identity under HIGHER. If

a least value is not defined, then there is no identity value under HIGHER, and MAX of the empty relation is undefined for attributes of that type. Similarly, MIN(r,a) is defined only when a greatest value is defined for the type of attribute a.

• The examples shown use a simple attribute name as the second operand, and Version 1 of Tutorial D in fact requires that operand to be a simple attribute name. In general, however, the second operand in invocations of SUM, MAX, and MIN should be allowed to be any expression of an appropriate type (obviously a numeric type in the case of SUM). Version 2 of TutorialD does indeed allow this.

Download free eBooks at bookboon.com

Click on the ad to read more

129

• The simple attribute names used in my examples are cases of open expressions, as defined in Chapter 4, Section 4.7. As in other places where open expressions are permitted, closed expressions are also permitted—allowing us to sagely observe, for example, that SUM(r,1) is equivalent to COUNT(r).

Several other aggregate operators are defined in Tutorial D. Here are some that we can now deal with summarily:

AVG ( r,x ) is equivalent to SUM ( r,x ) / COUNT ( r ) and is therefore undefined in the case where r is empty. As an exercise, the reader might like to consider whether there can be a basis operator for AVG.

GOT-THE-ENERGY-TO-LEAD.COM

We believe that energy suppliers should be renewable, too. We are therefore looking for enthusiastic new colleagues with plenty of ideas who want to join RWE in changing the world. Visit us online to ﬁnd out what we are offering and how we are working together to ensure the energy of the future.

Download free eBooks at bookboon.com

130

AND ( r,c ) and OR ( r,c ), where c (a condition) is of type BOOLEAN, are named after their basis operators—recall that logical AND and OR are commutative and associative, with identity values TRUE and FALSE, respectively. Thus, aggregate AND returns TRUE if and only if c evaluates to TRUE for every tuple of r; and aggregate OR returns TRUE if and only if c evaluates to TRUE for some tuple of r. In some languages the names ALL and SOME (or ANY) are used in place of AND and OR. (Indeed, Rel allows ALL and ANY to be used as synonyms for AND and OR.) Some people find it counterintuitive that aggregate AND on an empty relation returns TRUE but this is of course a logical necessity: to say that c is TRUE for every tuple in r is the same as saying there does not exist a tuple in r for which c is FALSE.

Now, suppose we want to find out how many students sat each exam. Do we have to go to the lengths illustrated in Example 5.5? I have used Rel’s explicit OUTPUT statements in that example to emphasise that it involves four distinct queries, one for each course. That would be very tiresome if we had a very large number of courses to consider, impossible if we didn’t even know all the course ids.

Example 5.5: Number of students who sat each exam

OUTPUT COUNT ( EXAM_MARK WHERE CourseId = CID ( 'C1' ) ); OUTPUT COUNT ( EXAM_MARK WHERE CourseId = CID ( 'C2' ) ); OUTPUT COUNT ( EXAM_MARK WHERE CourseId = CID ( 'C3' ) ); OUTPUT COUNT ( EXAM_MARK WHERE CourseId = CID ( 'C4' ) ); Shouldn’t we be able to use just a single query to obtain the desired result, which is shown in Figure 5.4? After all, I have claimed that the operators described in Chapter 4 make Tutorial D relationally complete, so we should, if we support counting at all, be able to obtain the relation representing the predicate, “n

students sat the exam for course CourseId”.

CourseId n

C1 3 C2 1 C3 1 C4 0

Figure 5.4: How many sat each exam

The answer is that we can indeed obtain that relation using a single query and the next section starts to show you the way.

Download free eBooks at bookboon.com

131

5.4 Relations within a Relation

Have a look at Figure 5.5. The figure itself is a two-column table, with a two-column table appearing in every cell of its second column {“second” because columns of tables do necessarily appear in some order, unlike attributes of relations}. The table depicts a relation—let’s call it C_ER—whose attribute named ExamResult is of a certain relation type, namely, RELATION { StudentId SID, Mark INTEGER }. &RXUVH,G ([DP5HVXOW & 6WXGHQW,G 0DUN 6 6 6 & 6WXGHQW,G 0DUN 6 & 6WXGHQW,G 0DUN 6 & 6WXGHQW,G 0DUN Figure 5.5: Relations within a relation

Perhaps you have already noticed that the information represented by the table in Figure 5.5 is exactly the same as that represented by the tables in Figure 5.1, but in a different form. That being the case, we should be able to use relational operators to derive the relation C_ER from the current values of COURSE and EXAM_MARK. In fact it can be done using operators I have already described, as shown in Example 5.6.

Example 5.6: Obtaining C_ER from COURSE and EXAM_MARK

EXTEND COURSE{CourseId} :

{ ExamResult := RELATION { TUPLE { CourseId CourseId } } COMPOSE EXAM_MARK }

Download free eBooks at bookboon.com

132

In document An Introduction to Relational Database Theory - Free Computer, Programming, Mathematics, Technical Books, Lecture Notes and Tutorials (Page 127-132)