
Essential data handling


3.5 Handling CHARACTER data

Having used CHARACTER constants in some of our PRINT statements it is now appropriate to consider how we may declare CHARACTER variables and manipulate CHARACTER data within a program. First, however, we must emphasize that characters and numbers are stored very differently in any computer.

As we have already seen, REAL and INTEGER variables can hold a wide range of numbers in a single variable. We must now introduce the concept of a numeric storage unit, which is that part of the memory of the computer in which a single REAL or INTEGER number can be stored. On most modern computers a numeric storage unit will consist of a contiguous area of memory capable of storing 16, 32, 48 or 64 bits, or binary digits. A 32-bit numeric storage unit is capable of storing integers in the range from about -2 × 10^9 to +2 × 10^9, or real numbers in the range -10^38 to +10^38 to an accuracy of about seven significant digits.
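The limits on a particular system can be inspected with Fortran's numeric inquiry intrinsic functions HUGE and PRECISION. The following is only a sketch: the program name is illustrative, and the values printed depend on the compiler and computer being used.

```fortran
PROGRAM storage_limits
   IMPLICIT NONE
   INTEGER :: i
   REAL :: x

   ! Largest values that default INTEGER and REAL variables can hold
   PRINT *, "Largest default integer: ", HUGE(i)
   PRINT *, "Largest default real:    ", HUGE(x)

   ! Approximate number of significant decimal digits in a default REAL
   PRINT *, "Decimal precision of a default real: ", PRECISION(x)
END PROGRAM storage_limits
```

The arguments i and x are used only to indicate the type being enquired about, so they need not be given values first.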

Characters, on the other hand, are stored in character storage units, typically occupying 8 or 16 bits, each of which can hold exactly one character in a coded form. A character variable consists of a sequence of one or more consecutive character storage units. There is no assumption about the relationship, if any, between numeric and character storage units, although, in practice, most computers will use the same physical memory devices for both types so that, for example, four 8-bit character storage units may be kept together in what would otherwise be a single 32-bit numeric storage unit.

Programs in the Fortran language are written using characters taken from the Fortran Character Set, which consists of the 26 letters of the Latin alphabet, the ten decimal digits, the underscore character and 21 additional special characters. These 58 characters are shown in Figure 3.11. Note that lower case letters are treated as identical to upper case letters when they appear in Fortran keywords or identifiers, although they are, of course, treated as different in data or in a character string.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

0 1 2 3 4 5 6 7 8 9

_

• = + - * / ( ) , . ' : ! " % & ; < > ? $

(where • represents the space, or blank, character)

Figure 3.11 The Fortran Character Set.

However, any particular implementation will almost certainly have codes for other characters and these may be used as part of a character constant, may be stored in character variables, may be input or output, and may appear in comments, although such a program may not then work on a different computer.

The processor may, indeed, support several different families of characters, as we shall see in Chapter 14. For the present, however, we shall only concern ourselves with the default character set, which is that set of characters normally available on the computer system being used without any special action on the part of the user.

A character variable is declared in a very similar manner to that used for integer and real numbers, with the important difference that it is necessary to specify how many characters the variable is to be capable of storing. The declaration statement can take a number of similar forms, of which the fundamental one is as follows:

CHARACTER (LEN=length) :: name1, name2, ...

This declares one or more CHARACTER variables, each of which has a length of length. This means that each of the variables declared will hold exactly length characters.

There are two additional ways of writing this statement:

CHARACTER (length) :: name1, name2, ...

CHARACTER*length :: name1, name2, ...

Although both of these are slightly shorter, we recommend, for the sake of greater clarity, that you use the full form of the declaration statement in your programs, as we shall do in this book.

If no length specification is provided, then the length is taken to be one.

Of course it is frequently the case that not all the character variables in a program are required to have the same length, and it is permitted to attach a length specification directly to the variable names in any of the above forms of declaration:

CHARACTER (LEN=length) :: name1, name2*len_2, name3*len_3, ...


In this example the variable name1 is of length length, as are any other variables in the list without a specific length specification. name2, however, has a length of len_2 while name3 has a length of len_3. However, it is clearer, and less prone to error, to write separate declarations for character variables of different lengths, and we strongly recommend that you should adopt this approach.

The length specification may be either a positive integer constant or an integer constant expression; in the latter case it must be enclosed in parentheses if it is attached to a variable name. Thus the following three sets of declarations have an identical effect:

(1) CHARACTER(LEN=6) :: a,b,c

(2) CHARACTER(LEN=12-6) :: a,b,c

(3) CHARACTER :: a*6,b*(8-2),c*(2*3)
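One way to confirm that these forms are equivalent is to ask for the length of each variable with the intrinsic function LEN. The sketch below is illustrative only: the names a, b and c follow the declarations above, but each is declared with a different form so that all three can appear in one program, and single is a hypothetical extra variable added to show the default length.

```fortran
PROGRAM declaration_forms
   IMPLICIT NONE
   CHARACTER(LEN=6) :: a          ! full form, constant length
   CHARACTER(LEN=12-6) :: b       ! length given by a constant expression
   CHARACTER :: c*(2*3)           ! length attached to the variable name
   CHARACTER :: single            ! no length specification

   ! LEN returns the declared length of its argument
   PRINT *, LEN(a), LEN(b), LEN(c), LEN(single)
END PROGRAM declaration_forms
```

The first three values printed should be 6, and the last should be 1.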

The fact that character variables always hold a specified number of characters leads to a number of potential problems when carrying out assignment or input. For example, what will be stored in the three variables string_1, string_2 and string_3 by the following program?
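The listing itself is missing from this text. A reconstruction consistent with the description that follows might read as below; the program name and the exact five-character constant (taken here to be "Final") are assumptions.

```fortran
PROGRAM character_example
   IMPLICIT NONE
   CHARACTER(LEN=3) :: string_1
   CHARACTER(LEN=4) :: string_2, string_3

   string_1 = "End"        ! length 3 into a variable of length 3
   string_2 = string_1     ! length 3 into a variable of length 4
   string_3 = "Final"      ! length 5 into a variable of length 4 (assumed constant)
END PROGRAM character_example
```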

Here we have three character variables declared, two of length four, and one (string_1) of length three. The first assignment statement assigns the character constant End to string_1. We can readily see that the value to be assigned (the constant) has a length of three and so it exactly occupies the three storage units which constitute the variable string_1, and all is well.

The next assignment statement is, however, more of a problem. string_1 has a length of 3 and contains the three characters End; string_2, however, has a length of 4, so what will be stored in the four storage units?

The answer is that if a character string has a shorter length than the length of the variable to which it is to be assigned then it is extended to the right with blank (or space) characters until it is the correct length. In this case, therefore, the contents of string_1 will have a single blank character added after the letter d, thus making a length of four, before being assigned to string_2.

The third assignment statement poses the opposite problem. Here the character constant to be assigned has a length of 5, whereas the variable, string_3, only has a length of 4. In this case the string is truncated from the right to the correct length (4) before assignment.

At the end of this program, therefore, the three variables string_1, string_2 and string_3 contain the character strings End, End• and Fina, respectively, where • represents a blank, or space, character.

The importance of this extension and truncation makes it desirable that we restate these rules more formally:

• When assigning a character string to a character variable whose length is not the same as that of the string, the string stored in the variable is extended on the right with blanks, or truncated from the right, so as to exactly fill the character variable to which it is being assigned.

A similar situation can arise during the input of character data by a READ statement if the number of characters which form the input data is different from the length of the variable into which they are being read. Before discussing this in detail, however, we must examine the way in which character data is input and output by list-directed input/output statements.

The form of any character data to be read by a list-directed READ statement is normally the same as that of a character constant. In other words it must be delimited by either quotation marks or by apostrophes. There are some exceptions to this rule, however, in order to cater for common situations where the need for the apostrophes or quotes would be annoying. The delimiting characters are not required if all of the following conditions are met:

(1) the character data does not contain any blanks, any commas or any slashes (that is, it does not contain any of the value separators discussed earlier);

(2) the character data is all contained within a single record or line;

(3) the first non-blank character is not a quotation mark or an apostrophe, since this would be taken as a delimiting character;

(4) the leading characters are not numeric followed by an asterisk, since this would be confused with the multiple data item form (n*c).

In this case the character constant is terminated by any of the value separators which will terminate a numeric data item (blank, comma, slash or end of record), and it may be repeated by means of a multiple data item of the form n*c.

If the character data which is read by a list-directed READ statement is too long or too short for the variable concerned then it is truncated or extended on the right in exactly the same way as for assignment.
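These rules can be tried without any keyboard input by reading from a character variable instead (an internal file, a facility not covered in this section). The sketch below is illustrative: the program name, the variable names and the contents of line are all assumptions chosen for the example.

```fortran
PROGRAM read_rules
   IMPLICIT NONE
   ! A hypothetical line of input data, held in a variable so that the
   ! example needs no keyboard input
   CHARACTER(LEN=20) :: line = "'New York' Rome"
   CHARACTER(LEN=6) :: city_1, city_2

   ! The first item contains a blank, so it must be delimited;
   ! the second contains no value separators, so it need not be
   READ (line, *) city_1, city_2

   ! Brackets make the truncation and the blank padding visible
   PRINT *, "[", city_1, "] [", city_2, "]"
END PROGRAM read_rules
```

After the READ statement city_1 should hold New Yo (truncated on the right) and city_2 should hold Rome followed by two blanks.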

The output situation is rather simpler, and a list-directed PRINT statement will output exactly what is stored in a character variable or constant, including any trailing blanks, without any delimiting apostrophes or quotation marks.

Thus we could modify our earlier program to print the values of the three variables as follows:
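The modified listing is not reproduced in this text. A reconstruction consistent with the description of the earlier program might be as follows; the program name and the exact five-character constant (taken here to be "Final") are assumptions. Note that many compilers begin each line of list-directed output with a blank.

```fortran
PROGRAM character_example
   IMPLICIT NONE
   CHARACTER(LEN=3) :: string_1
   CHARACTER(LEN=4) :: string_2, string_3

   string_1 = "End"
   string_2 = string_1
   string_3 = "Final"      ! assumed five-character constant

   ! Character items are printed exactly as stored, with no delimiters
   PRINT *, string_1, string_2, string_3
END PROGRAM character_example
```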


The result of running this program would be the following line of text:

EndEnd Fina

The ability to assign a character literal constant, or the string stored in a character variable, or to input and output character data, does not in itself take us very far. Just as we can write arithmetic expressions, therefore, so we can also create character expressions. The major difference between character expressions and the other types of expressions, however, is that there are very few things we can actually do with strings of characters!

One thing that we can do, though, is combine two strings to form a third, composite, string. This process is called concatenation and is carried out by means of the concatenation operator, consisting of two consecutive slashes:

char = "Fred"//"die"

The composite string will, of course, have a length equal to the sum of the lengths of the two strings which were concatenated to form it, and the variable char will contain the string Freddie, as long as it has a length of at least 7.
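The effect of the length of the receiving variable can be seen in a short sketch. The program and variable names here are illustrative assumptions, and the brackets are printed only to make the stored lengths visible.

```fortran
PROGRAM concatenation
   IMPLICIT NONE
   CHARACTER(LEN=7) :: full
   CHARACTER(LEN=5) :: short

   full = "Fred"//"die"     ! result has length 4 + 3 = 7, an exact fit
   short = "Fred"//"die"    ! result is truncated on the right to 5 characters

   PRINT *, "[", full, "]"
   PRINT *, "[", short, "]"
END PROGRAM concatenation
```

Here full holds Freddie, while short holds only Fredd because the composite string is truncated on assignment.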

This is the only operator provided in Fortran for use with character strings; Fortran does, however, include one important additional capability, namely the identification of substrings. This is achieved by following the character variable name or character constant by two integer expressions separated by a colon and enclosed in parentheses. The two integer values represent the positions in the character variable or constant of the first and last characters of the substring. Either value (or both) may be omitted, in which case the first or last character position is assumed, as appropriate.

Thus the substring "rhubarb"(2:4) specifies a substring consisting of the three characters hub taken from positions 2 to 4 of the character constant. In a similar way alpha(5:7) represents a three character substring of the value of the character variable alpha, while beta(4:) represents a substring starting at the fourth character of the value of beta and continuing to the last character, and gamma(:6) represents a substring consisting of the first six characters of the value of gamma.
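These forms can be tried out in a short sketch. For brevity a single variable, alpha, stands in for alpha, beta and gamma; its value and the program name are assumptions made for the example.

```fortran
PROGRAM substrings
   IMPLICIT NONE
   CHARACTER(LEN=8) :: alpha = "abcdefgh"   ! assumed value

   PRINT *, "rhubarb"(2:4)   ! positions 2 to 4 of a constant: hub
   PRINT *, alpha(5:7)       ! positions 5 to 7: efg
   PRINT *, alpha(4:)        ! position 4 to the last character: defgh
   PRINT *, alpha(:6)        ! first character to position 6: abcdef
END PROGRAM substrings
```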

It is also permitted to assign a value to a substring without altering the rest of the variable. Thus the following program fragment will result in the variable ch having the value Alpine••, where, as before, • represents a space:

PROGRAM substring
   IMPLICIT NONE
   CHARACTER (LEN=8) :: ch
   ch = "Alphabet"
   ch(4:) = "ine"

It is instructive to examine this in detail.

The substring ch(4:) is the substring from character 4 to the end of ch, a total of five characters. The character constant "ine" only has a length of 3 so it is extended by adding two blank characters before being assigned to ch(4:). The assignment means that the old substring value ("habet") is replaced by the new value ("ine••"), leaving the rest of ch unchanged. The final result, therefore, is that ch contains "Alpine••".
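A complete program built around the same two assignments makes the trailing blanks visible. This is a sketch only: the program name and the bracketed PRINT statement are additions made for illustration.

```fortran
PROGRAM substring_assignment
   IMPLICIT NONE
   CHARACTER(LEN=8) :: ch

   ch = "Alphabet"
   ch(4:) = "ine"            ! "ine" is extended to five characters first

   ! Brackets show the two trailing blanks stored in ch
   PRINT *, "[", ch, "]"
END PROGRAM substring_assignment
```

The closing bracket should appear two spaces after the final e of Alpine.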


Problem

Write a program which asks the user for her title, first name and last name, and then prints a welcome message using both the full name and first name.


Analysis

This program is simply an exercise in simple character manipulation. However, there are some slight difficulties in combining the title, first and last names in a form which will avoid multiple spaces within the composite name. For example, if variables with a length of 12 characters were chosen, then the name Kathy would be followed by seven spaces.

In Chapter 2 we pointed out that many of the detailed aspects of programs can often be carried out in procedures which can be written later, or can be written by someone else, or which may already exist somewhere else. The Fortran language contains a large number of special procedures, known as intrinsic procedures, which provide a great many useful additional features. We shall examine this topic in some detail in Chapter 4, but for the present we shall simply note that there are several intrinsic procedures whose purpose is to assist in the manipulation of character strings. A list of all the intrinsic procedures in Fortran 90 will be found in Appendix A.

The most useful intrinsic procedure, for our present purpose, is TRIM, which removes any trailing blanks from the character string provided as its argument. There would still be a difficulty if the user types one or more blanks before the name, but we shall assume that this does not happen and ignore the problem for the present, although there is another intrinsic procedure that could be used to deal with it.
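The effect of TRIM, and of leading blanks, can be seen in a short sketch. The text does not name the other intrinsic procedure; ADJUSTL, a standard Fortran 90 intrinsic which moves leading blanks to the end of a string, is used here as one possibility. The program name and the value assigned are assumptions.

```fortran
PROGRAM trim_example
   IMPLICIT NONE
   CHARACTER(LEN=12) :: first_name

   first_name = "  Kathy"    ! two leading blanks, as if typed by the user

   ! TRIM removes trailing blanks only; the leading blanks remain
   PRINT *, "[", TRIM(first_name), "]"

   ! ADJUSTL shifts the text left, moving the leading blanks to the end,
   ! so that TRIM can then remove them
   PRINT *, "[", TRIM(ADJUSTL(first_name)), "]"
END PROGRAM trim_example
```

The first line printed keeps the two blanks before Kathy; the second has none on either side.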

Armed with this intrinsic procedure we can develop our structure plan:

1 Read title, first name and last name

2 Concatenate the three strings to form the full name, using TRIM to remove trailing blanks from the title and first name

3 Print a welcome message using the full name, and a second message using the first name (with TRIM)


Solution

PROGRAM welcome
   IMPLICIT NONE

   ! This program manipulates character strings to produce
   ! a properly formatted welcome message

   ! Variable declarations
   CHARACTER (LEN=20) :: title,first_name,last_name
   CHARACTER (LEN=40) :: full_name

   ! Ask for name, etc
   PRINT *,"Please give your full name in the form requested"
   PRINT *,"Title (Mr./Mrs./Ms./Professor/etc): "
   READ *,title
   PRINT *,"First name: "
   READ *,first_name
   PRINT *,"Last name: "
   READ *,last_name

   ! Create full name
   full_name = TRIM(title)//" "//TRIM(first_name)//" "//last_name

   ! Print messages
   PRINT *,"Welcome ",full_name
   PRINT *,"May I call you ",TRIM(first_name),"?"

END PROGRAM welcome

Notice that TRIM has been used in the second PRINT statement to ensure that the question mark at the end of the question comes immediately after the name, and not separated from it by several spaces.

SELF-TEST EXERCISES 3.2

1 What is the difference between the Fortran Character Set and the default character set?

2 What is the most obvious difference between the declaration of an integer or real variable and the declaration of a character variable?

3 Write declaration statements for six character variables, of which four are to contain character strings of up to 20 characters, one is to contain only a single character, and one is to contain the month of the year.

4 Write a single declaration statement for the same variables as in Question 3.

5 What will be printed by the following program?

PROGRAM test3_2_5
   IMPLICIT NONE
   CHARACTER (LEN=16) :: a,b,c,d
   a = "A kindly giant"
   b = "A small man"
   c = b(:8)//"step"
   d = "for a"//b(8:)
   b = " "//d(:4)//b(9:11)//a(3:6)
   a = a(:2)//a(10:15)//"leap"
   PRINT *,c(:13),d
   PRINT *,TRIM(a(:12)),b
END PROGRAM test3_2_5