Controlling the flow of your program

5.5 Comparing character strings

In Section 5.2 we mentioned that the six relational operators could be used to compare character expressions and constants (or character strings as they are usually referred to), but that the question of determining when one string was greater than another would be left until later. The key to this determination is the collating sequence of letters, digits and other characters. Fortran 90 lays down six rules for this covering letters, digits and the space or blank character.

(1) The 26 upper case letters are collated in the following order:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

(2) The 26 lower case letters are collated in the following order:

a b c d e f g h i j k l m n o p q r s t u v w x y z

(3) The 10 digits are collated in the following order:

0 1 2 3 4 5 6 7 8 9

(4) Digits are either all collated before the letter A, or all after the letter Z

(5) Digits are either all collated before the letter a, or all after the letter z

(6) A space (or blank) is collated before both letters and digits

The other 22 characters in the Fortran character set, and any others which may be available on a particular computer system, do not have any defined position in the collating sequence. In practice they will usually be ordered according to the internal code used by the computer as long as this code satisfies the above rules.
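The six rules can be checked mechanically on any processor. The sketch below is a hypothetical module (the names `collating_check` and `standard_rules_hold` are illustrative only) that tests each guaranteed ordering with the ordinary relational operators. On a conforming processor it should always return true; the orderings the rules leave open, such as upper case against lower case, are deliberately not tested.

```fortran
MODULE collating_check
   IMPLICIT NONE
CONTAINS
   ! Returns .TRUE. if every ordering guaranteed by rules (1) to (6)
   ! holds on this processor (hypothetical helper, not an intrinsic)
   LOGICAL FUNCTION standard_rules_hold()
      CHARACTER(LEN=26), PARAMETER :: upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      CHARACTER(LEN=26), PARAMETER :: lower = "abcdefghijklmnopqrstuvwxyz"
      CHARACTER(LEN=10), PARAMETER :: digits = "0123456789"
      INTEGER :: i

      standard_rules_hold = .TRUE.
      ! Rules 1 and 2: each letter collates before the next one
      DO i = 1, 25
         IF (.NOT. (upper(i:i) < upper(i+1:i+1))) standard_rules_hold = .FALSE.
         IF (.NOT. (lower(i:i) < lower(i+1:i+1))) standard_rules_hold = .FALSE.
      END DO
      ! Rule 3: each digit collates before the next one
      DO i = 1, 9
         IF (.NOT. (digits(i:i) < digits(i+1:i+1))) standard_rules_hold = .FALSE.
      END DO
      ! Rule 4: digits all before "A" or all after "Z"
      IF (.NOT. ("9" < "A" .OR. "0" > "Z")) standard_rules_hold = .FALSE.
      ! Rule 5: digits all before "a" or all after "z"
      IF (.NOT. ("9" < "a" .OR. "0" > "z")) standard_rules_hold = .FALSE.
      ! Rule 6: a blank collates before letters and digits
      IF (.NOT. (" " < "A" .AND. " " < "a" .AND. " " < "0")) standard_rules_hold = .FALSE.
   END FUNCTION standard_rules_hold
END MODULE collating_check
```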

When two character operands are being compared there are three distinct stages in the process:

(1) If the two operands are not the same length, the shorter one is treated as though it were extended on the right with blanks until it is the same length as the longer one.


(2) The two operands are compared character by character, starting with the leftmost character, until either a difference is found or the end of the operands is reached.

(3) If a difference is found, then the relationship between these two different characters defines the relationship between the two operands, with the character which comes earlier in the collating sequence being deemed to be the lesser of the two. If no difference is found, then the strings are considered to be equal.
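The three stages can be written out explicitly. The following is a minimal sketch built around a hypothetical function `string_lt`, which should give the same result as the intrinsic `<` operator. Rather than building a padded copy, it treats any position beyond the end of the shorter operand as a blank, which is equivalent to stage (1).

```fortran
MODULE explicit_compare
   IMPLICIT NONE
CONTAINS
   ! Returns .TRUE. if a < b in the processor's collating sequence,
   ! following the three stages described in the text
   LOGICAL FUNCTION string_lt(a, b)
      CHARACTER(LEN=*), INTENT(IN) :: a, b
      INTEGER :: i, n
      CHARACTER :: ca, cb

      ! Stage 1: compare over the length of the longer operand
      n = MAX(LEN(a), LEN(b))
      string_lt = .FALSE.
      ! Stage 2: scan from the left until a difference is found
      DO i = 1, n
         ! Positions past the end of an operand count as blanks
         ca = " "
         cb = " "
         IF (i <= LEN(a)) ca = a(i:i)
         IF (i <= LEN(b)) cb = b(i:i)
         IF (ca /= cb) THEN
            ! Stage 3: the first differing pair decides the result
            string_lt = ca < cb
            RETURN
         END IF
      END DO
      ! No difference found: the strings are equal, so a < b is false
   END FUNCTION string_lt
END MODULE explicit_compare
```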

The result of this process is that the relational expression always has the value we would instinctively expect it to have. Thus

"Adam" > "Eve"

is false because A comes before E, and is thus less than E.

"Adam" < "Adamant"

is true because, once "Adam" has been extended with blanks and the first four characters have been found to be the same, the relationship reduces to " " < "a". Since a blank comes before a letter, this is true.

"120" < "1201"

is true because the first difference in the strings leads to an evaluation of " "<"1", which is true since a blank also comes before a digit.
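As a quick check, the example expressions can be evaluated directly. This hypothetical `comparison_examples` module returns their values in an array. The fourth entry is an extra case, not taken from the text, showing that the blank padding of stage (1) also makes a string equal to the same string followed by trailing blanks.

```fortran
MODULE comparison_examples
   IMPLICIT NONE
CONTAINS
   ! Evaluates the example expressions from the text
   FUNCTION example_results() RESULT(r)
      LOGICAL :: r(4)
      r(1) = "Adam" > "Eve"          ! .FALSE.: A comes before E
      r(2) = "Adam" < "Adamant"      ! .TRUE.: " " comes before "a"
      r(3) = "120" < "1201"          ! .TRUE.: " " comes before "1"
      r(4) = "Adam" == "Adam   "     ! .TRUE.: padded with blanks, so equal
   END FUNCTION example_results
END MODULE comparison_examples
```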

Notice, however, that the values of the expressions

"ADAM" < "Adam"

"XA" < "X4"

and

"var_1" > "var-1"

are not defined in Fortran. In the first case the standard does not define whether upper case letters come before or after lower case letters or, indeed, whether they are even interleaved, and so the value of "ADAM" < "Adam" will depend upon the particular computer system being used. Similarly, in the second case the standard does not define whether digits come before or after letters. Finally, in the third case the special characters are not defined at all in the collating sequence, so that, once again, the value of "_" > "-" depends upon the computer system.
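A program that needs to know how its own processor resolves one of these questions can ask at run time. The sketch below uses two hypothetical helper functions. By rule (4) a single test against "A" is enough to settle the position of all the digits relative to the upper case letters, but the results themselves may differ from one computer system to another.

```fortran
MODULE processor_order
   IMPLICIT NONE
CONTAINS
   ! .TRUE. if this processor collates the digits before "A".
   ! Rule 4 puts all digits on the same side of A..Z, so one
   ! comparison is enough.
   LOGICAL FUNCTION digits_before_upper()
      digits_before_upper = "9" < "A"
   END FUNCTION digits_before_upper

   ! .TRUE. if "_" collates after "-" on this processor; the
   ! standard gives no position to either character
   LOGICAL FUNCTION underscore_after_hyphen()
      underscore_after_hyphen = "_" > "-"
   END FUNCTION underscore_after_hyphen
END MODULE processor_order
```

On a processor whose internal code is ASCII both functions return true, but a portable program should not rely on that.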

These undefined areas are not normally any problem. It is unlikely that most applications would expect to compare character strings (other than for equality) if the order was to be determined by characters other than letters, digits or blanks. The concepts of alphabetic or numeric ordering are natural ones, as is the concept of shorter strings coming before longer ones which start with the same characters as the shorter one (that is, John comes before Johnson, alpha before alphabet). The only practical area of doubt concerns the question of whether digits come before or after letters.

LGT(S1,S2) is the same as S1 > S2 using ASCII character ordering
LGE(S1,S2) is the same as S1 >= S2 using ASCII character ordering
LLE(S1,S2) is the same as S1 <= S2 using ASCII character ordering
LLT(S1,S2) is the same as S1 < S2 using ASCII character ordering

Figure 5.11 Intrinsic functions for lexical comparison.

If, for reasons of portability, it is required to define the ordering of all characters, then another way of comparing them is available. This uses one of the four intrinsic functions shown in Figure 5.11. These functions return the value true or false after a comparison which uses the ordering of characters defined in the American National Standard Code for Information Interchange (ANSI X3.4-1977), which is usually referred to as ASCII. This code, which is widely used as an internal code, is also defined in the International Reference Version (IRV) of the International Standard ISO 646:1983; it is included, for reference, in Appendix D of this book.
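The undefined comparisons from earlier in this section become well defined when rewritten with these functions. The hypothetical function below evaluates them; the comments give the ASCII codes that decide each case.

```fortran
MODULE lexical_examples
   IMPLICIT NONE
CONTAINS
   ! The three processor-dependent examples, rewritten with the
   ! lexical comparison intrinsics so that ASCII ordering is used
   FUNCTION lexical_results() RESULT(r)
      LOGICAL :: r(3)
      r(1) = LLT("ADAM", "Adam")     ! .TRUE.:  "D" (68) before "d" (100)
      r(2) = LLT("XA", "X4")         ! .FALSE.: "A" (65) after "4" (52)
      r(3) = LGT("var_1", "var-1")   ! .TRUE.:  "_" (95) after "-" (45)
   END FUNCTION lexical_results
END MODULE lexical_examples
```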

Thus, for example, whereas the value of

"Miles" > "miles"

cannot be defined with complete certainty, because the Fortran standard does not state whether upper case letters come before or after lower case letters, the value of

LGT("Miles","miles")

will always be false, because upper case letters do come before lower case letters in the ASCII collating sequence (equivalently, LLT("Miles","miles") is always true).
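Where a program needs a portable ordering, for example when sorting names, the lexical functions can be wrapped in a three-way comparison. The `ascii_compare` function here is a hypothetical helper, not an intrinsic. Like the relational operators, LLT and LGT extend the shorter argument with blanks, so "Miles" and "Miles  " compare as equal.

```fortran
MODULE ascii_ordering
   IMPLICIT NONE
CONTAINS
   ! Three-way comparison in ASCII order: -1 if a comes first,
   ! 0 if the strings are equal (after blank padding), +1 if b comes first
   INTEGER FUNCTION ascii_compare(a, b)
      CHARACTER(LEN=*), INTENT(IN) :: a, b
      IF (LLT(a, b)) THEN
         ascii_compare = -1
      ELSE IF (LGT(a, b)) THEN
         ascii_compare = 1
      ELSE
         ascii_compare = 0
      END IF
   END FUNCTION ascii_compare
END MODULE ascii_ordering
```

With this, ascii_compare("Miles","miles") gives -1 on every processor, whereas the result of "Miles" < "miles" is processor dependent.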

Problem

Write a function which takes a single character as its argument and returns a single character according to the following rules:


• If the input character is a lower case letter then return its upper case equivalent

• If the input character is an upper case letter then return its lower case equivalent

• If the input character is not a letter then return it unchanged


Analysis

The major problem here is establishing the relationship between upper and lower case letters, so that conversions may be easily made. Here we can use the ASCII code (see Appendix D) to good effect due to the existence of the two intrinsic functions IACHAR and ACHAR. The first of these provides the position of its character argument in the ASCII collating sequence, while the second returns the character at a specified position in that sequence. Thus IACHAR("A") is 65, while ACHAR(97) is the character a. An examination of the ASCII character set (see Figure D.1 in Appendix D) quickly shows that every lower case letter is exactly 32 positions after its upper case equivalent. We now have both the information and the means to carry out the conversion and so are ready to design our function.

Although we could simply add or subtract 32 from the ASCII code for the character, as appropriate, it is not then obvious what is happening. We shall therefore define a constant which has the value of this offset, calculated by subtracting the code for an upper case letter from the code for its lower case equivalent.
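The following sketch shows the offset being computed with IACHAR rather than written as a literal 32. The module name `case_offset` and the function `lower_partner` are hypothetical. The function assumes, without checking, that its argument is an upper case letter.

```fortran
MODULE case_offset
   IMPLICIT NONE
   ! Distance in the ASCII sequence between an upper case letter
   ! and its lower case equivalent (32)
   INTEGER, PARAMETER :: upper_to_lower = IACHAR("a") - IACHAR("A")
CONTAINS
   ! Returns the lower case partner of an upper case letter.
   ! No check is made that c really is an upper case letter.
   CHARACTER FUNCTION lower_partner(c)
      CHARACTER, INTENT(IN) :: c
      lower_partner = ACHAR(IACHAR(c) + upper_to_lower)
   END FUNCTION lower_partner
END MODULE case_offset
```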

Furthermore, to avoid unnecessary complication, we shall assume that the upper case letters are contiguous in the processor's character set (that is, there are no other characters intervening) and that the lower case letters are also contiguous in the processor's character set. If we wished to guarantee this then the tests could be carried out using the ASCII collating sequence by means of the intrinsic functions LLE etc., but this would be something of an overkill in this instance!
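For completeness, the ASCII-based tests might look like this. It is a sketch with hypothetical function names. Because LGE and LLE always use the ASCII sequence, in which A to Z and a to z are each contiguous, these tests accept exactly the 26 letters of each case, at least for characters that appear in the ASCII set.

```fortran
MODULE ascii_letters
   IMPLICIT NONE
CONTAINS
   ! .TRUE. if c lies between "A" and "Z" in ASCII order,
   ! independent of the processor's own collating sequence
   LOGICAL FUNCTION is_ascii_upper(c)
      CHARACTER, INTENT(IN) :: c
      is_ascii_upper = LGE(c, "A") .AND. LLE(c, "Z")
   END FUNCTION is_ascii_upper

   ! .TRUE. if c lies between "a" and "z" in ASCII order
   LOGICAL FUNCTION is_ascii_lower(c)
      CHARACTER, INTENT(IN) :: c
      is_ascii_lower = LGE(c, "a") .AND. LLE(c, "z")
   END FUNCTION is_ascii_lower
END MODULE ascii_letters
```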

Data design

   Purpose                                  Type          Name

A  Dummy argument:
   Character to be converted                CHARACTER*1   char

B  Result variable:
   Converted character                      CHARACTER*1   change_case

C  Local constant:
   Offset between upper and lower case      INTEGER       upper_to_lower
   in the ASCII character set

Structure plan

[Figure: Structure plan]

Solution

CHARACTER FUNCTION change_case(char)
   IMPLICIT NONE

   ! This function changes the case of its argument (if it
   ! is alphabetic)

   ! Dummy argument
   CHARACTER, INTENT(IN) :: char

   ! Local constant
   INTEGER, PARAMETER :: upper_to_lower = IACHAR("a") - IACHAR("A")

   ! Check if argument is upper case alphabetic, lower case
   ! alphabetic, or non-alphabetic
   IF ("A" <= char .AND. char <= "Z") THEN
      ! Upper case - convert to lower case
      change_case = ACHAR(IACHAR(char) + upper_to_lower)
   ELSE IF ("a" <= char .AND. char <= "z") THEN
      ! Lower case - convert to upper case
      change_case = ACHAR(IACHAR(char) - upper_to_lower)
   ELSE
      ! Not alphabetic - return unchanged
      change_case = char
   END IF

END FUNCTION change_case
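The same rules extend naturally to a whole string. The sketch below is a hypothetical `swap_case` function, not part of the book's solution, that repeats the tests of change_case for each character in turn. It keeps its own copy of the offset constant so that it is self-contained and, like change_case, it assumes that the letters of each case are contiguous in the processor's character set.

```fortran
MODULE case_swapping
   IMPLICIT NONE
   ! Offset between upper and lower case in the ASCII character set
   INTEGER, PARAMETER :: upper_to_lower = IACHAR("a") - IACHAR("A")
CONTAINS
   ! Applies the change_case rules to every character of a string
   FUNCTION swap_case(string) RESULT(swapped)
      CHARACTER(LEN=*), INTENT(IN) :: string
      CHARACTER(LEN=LEN(string)) :: swapped
      INTEGER :: i
      CHARACTER :: c

      DO i = 1, LEN(string)
         c = string(i:i)
         IF ("A" <= c .AND. c <= "Z") THEN
            ! Upper case - convert to lower case
            swapped(i:i) = ACHAR(IACHAR(c) + upper_to_lower)
         ELSE IF ("a" <= c .AND. c <= "z") THEN
            ! Lower case - convert to upper case
            swapped(i:i) = ACHAR(IACHAR(c) - upper_to_lower)
         ELSE
            ! Not alphabetic - copy unchanged
            swapped(i:i) = c
         END IF
      END DO
   END FUNCTION swap_case
END MODULE case_swapping
```

For example, swap_case("Fortran 90!") should return "fORTRAN 90!", with the digits, blank and exclamation mark left unchanged.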

SELF-TEST EXERCISES 5.1

1 What is the difference between a logical operator and a relational operator?

2 What are the values of the following expressions?

(a) 1>2
(b) (1+3) .GE. 4
(c) (1+3)<=4
(d) (0.1+0.3) .LE. 0.4
(e) 2>1 .AND. 3<4
(f) 3>2 .AND. (1+2)<3 .OR. 4<=3
(g) 3>2 .OR. (1+2)<3 .AND. 4<=3
(h) 3>2 .AND. (1+2)<3 .EQV. 4<=3

3 What is the purpose of the block IF construct?

4 What is the advantage of a block IF construct over a logical IF statement?

5 What are the rules for collating characters?

6 What are the values of the following expressions?

(a) "Me" < "You"

(b) "Me" < "ME"

(d) "Me" < "Me?"

(e) LLT("Me","Me?")

Source: T. M. R. Ellis, Ivor R. Philips and Thomas M. Lahey, Fortran 90 Programming, Addison-Wesley (1994), pages 162-167.