The char Type
The C char type stores small integers. It is usually 8 bits.
char variables are guaranteed to be able to hold the integers 0..+127.
char variables are mostly used to store characters encoded in ASCII.
Only store characters in char variables in COMP1911.
Even if a numeric variable is only used for the values 0..9, use the type int for the variable.
Andrew Taylor COMP1911 Computing 1A
ASCII Encoding
ASCII (American Standard Code for Information Interchange) specifies an encoding of 128 characters as the integers 0..127.
The characters encoded include upper and lower case (English) letters, digits, common punctuation symbols (as you might find on a keyboard), plus a series of non-printable control characters (including newline and tab).
Manipulating Characters
Does this mean you will have to remember 100+ character codes!?
Luckily no! We use character literals instead!
For example:
char a = 'a';         // ASCII code 97
char A = 'A';         // ASCII code 65
char space = ' ';     // ASCII code 32
char plus = '+';      // ASCII code 43
char newline = '\n';  // ASCII code 10
Style Note
Always use character literals in your code! Even if you are really proud of having memorised the ASCII Table!
Manipulating Characters
NOTE
The ASCII codes for the digits (48–57), the upper case letters (65–90) and lower case letters (97–122) are in sequence.
Knowing this allows us to do some neat things:
int a = 'a';
int b = a + 1;            // this is possible due to
int c = 'a' + 2;          // the underlying numeric type
int B = b - ('a' - 'A');
Manipulating Characters
We can also test various properties of characters:
// check for lower case
if (c >= 'a' && c <= 'z') {
    ...
}
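Combining the lower case test with the subtraction ('a' - 'A') from the previous slide gives a way to convert a letter to upper case. A minimal sketch, assuming ASCII (or any encoding where 'a'..'z' and 'A'..'Z' are each in sequence); the name my_to_upper is hypothetical, and the library function toupper from ctype.h does this job portably:

```c
// Hypothetical helper (not a library function): converts a lower case
// letter to upper case using the fact that 'a'..'z' and 'A'..'Z' are
// each in sequence. Any other character is returned unchanged.
int my_to_upper(int c) {
    if (c >= 'a' && c <= 'z') {
        return c - ('a' - 'A');
    }
    return c;
}
```

For example, my_to_upper('b') gives 'B', while my_to_upper('7') and my_to_upper('Z') are returned unchanged.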
Problem: Convert a digit character to the integer it represents, e.g., '0' → 0, '7' → 7, etc.
We use the fact that the digit codes are in order:
// check if c is a digit
if (c >= '0' && c <= '9') {
    val = c - '0'; // why does this work?
}
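Packaged as a function, the same idea might look like the sketch below. The name digit_value and the choice of returning -1 for a non-digit are assumptions made for illustration. It relies only on '0'..'9' having consecutive codes (48..57 in ASCII), so, for example, '7' - '0' is 55 - 48, which is 7.

```c
// Hypothetical helper: returns the integer value of a digit character
// ('0' -> 0, ..., '9' -> 9), or -1 if c is not a digit.
int digit_value(int c) {
    if (c >= '0' && c <= '9') {
        return c - '0'; // distance from '0' is the digit's value
    }
    return -1; // not a digit
}
```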
Printing and Reading Characters
C provides library functions for reading and writing characters.
The getchar function
This function reads and returns one input character. Note that, unlike scanf, it has no arguments; its return value is collected by assigning it to a variable.
The putchar function
This function takes a single int argument and prints it as a character. Here is an example:
int c;
printf("Please enter a character: ");
c = getchar();
putchar(c);
Reading Characters
Consider the following code:
int c1, c2;
printf("Please enter first character:\n");
c1 = getchar();
printf("Please enter second character:\n");
c2 = getchar();
printf("First: %c\nSecond: %c\n", c1, c2);
What is the output? It turns out that the newline entered by pressing Enter after the first character is read by the second getchar.
Reading Characters
How can we fix the program?
int c1, c2;
printf("Please enter first character:\n");
c1 = getchar();
getchar(); // reads and discards a character
printf("Please enter second character:\n");
c2 = getchar();
printf("First: %c\nSecond: %c\n", c1, c2);
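Discarding exactly one character only helps if the user types a single character before pressing Enter. A more forgiving variant, sketched here under the assumption that discarding the rest of the line is what we want, reads and throws away everything up to and including the newline. The name skip_rest_of_line is hypothetical; it takes a FILE * only so that it can be tested, and skip_rest_of_line(stdin) reads the same input getchar would. It also stops at EOF, the end-of-input value described under End of Input.

```c
#include <stdio.h>

// Hypothetical helper: reads and discards characters from `in` up to and
// including the next '\n', or until there is no more input (EOF).
// getc(in) behaves like getchar() when in is stdin.
void skip_rest_of_line(FILE *in) {
    int c = getc(in);
    while (c != '\n' && c != EOF) {
        c = getc(in);
    }
}
```

In the program above, the discarding getchar(); line could then be replaced by skip_rest_of_line(stdin);.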
Printing characters
The conversion specifier for characters is %c. Using it, we can pass variables of type char to printf for output.
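Because a char is stored as a small integer, the same value can be printed as a character with %c or as its numeric code with %d. A small sketch of this, with a hypothetical function describe_char that writes into a caller-supplied buffer using snprintf (same format specifiers as printf) so the result is easy to check:

```c
#include <stdio.h>

// Hypothetical helper: formats c as "'<char>' has code <number>",
// printing the character with %c and its numeric code with %d.
void describe_char(char c, char *buf, size_t size) {
    snprintf(buf, size, "'%c' has code %d", c, c);
}
```

On an ASCII system, describe_char('A', buf, sizeof buf) stores the text 'A' has code 65; printf("'%c' has code %d\n", c, c) would print the same thing directly.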
End of Input
An input function such as scanf or getchar can fail because no input is available.
This can occur, for example, if input is coming from a file and the end of the file is reached.
On UNIX-like systems such as Linux and OS X, typing Ctrl+D on a terminal signals to the operating system that there is no more input from the terminal. Windows has no exact equivalent, but some Windows programs interpret Ctrl+Z similarly.
getchar returns a special non-ASCII value to indicate that no input was available.
This non-ASCII value is #defined as EOF in stdio.h.
On most systems EOF == -1. Note that -1 isn't an ASCII value (0..127), which is why the result of getchar is stored in an int rather than a char.
There is no end-of-file character on Linux or other modern operating systems.
Reading Characters to End of Input
Programming pattern for reading characters to the end of input:
int ch;
ch = getchar();
while (ch != EOF) {
    printf("you entered the character: '%c' which has ASCII code %d\n", ch, ch);
    ch = getchar();
}
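The same read-to-end-of-input pattern can do something other than print, for example count the characters read. A sketch with a hypothetical function count_chars; it takes a FILE * so it can be tested, and getc(in) behaves like getchar() when in is stdin. Note that newline characters are counted too.

```c
#include <stdio.h>

// Hypothetical helper: counts the characters read from `in` until EOF.
// ch is an int, not a char, so EOF can be told apart from every
// valid character.
int count_chars(FILE *in) {
    int count = 0;
    int ch = getc(in);
    while (ch != EOF) {
        count = count + 1;
        ch = getc(in);
    }
    return count;
}
```

A program would call it as count_chars(stdin).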
For comparison the programming pattern for reading integers to end of input:
int num;
// scanf returns the number of items read
while (scanf("%d", &num) == 1) {
    printf("you entered the number: %d\n", num);
}
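As with characters, the integer pattern can accumulate a result instead of printing. A sketch with a hypothetical function sum_ints, using fscanf (which behaves like scanf when given stdin) so it can be tested. One consequence of testing scanf's return value worth knowing: the loop also stops early at text that is not a number, because scanf then returns 0 rather than 1.

```c
#include <stdio.h>

// Hypothetical helper: sums the integers read from `in` until
// end of input, or until something that is not an integer.
int sum_ints(FILE *in) {
    int num;
    int sum = 0;
    while (fscanf(in, "%d", &num) == 1) {
        sum = sum + num;
    }
    return sum;
}
```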
Strings
A string is a sequence of characters.
C uses a special ASCII character ’\0’ to mark the end of strings.
This is convenient because programs don't have to track the length of the string.
Definition
A C string is a null-terminated character array.
Consider the following:
// this is incorrect: there is no room for '\0', so it will be discarded
char hello[5] = {'h', 'e', 'l', 'l', 'o', '\0'};
// this is OK
char hello[6] = {'h', 'e', 'l', 'l', 'o', '\0'};
// this is better
char hello[] = {'h', 'e', 'l', 'l', 'o', '\0'};
String Length
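The length of a C string is not stored anywhere; it is found by counting characters up to the terminating '\0'. A minimal sketch of that idea with a hypothetical function string_length (the library function strlen from string.h does the same job):

```c
// Hypothetical helper: counts the characters in s before the
// terminating '\0'. The '\0' itself is not counted.
int string_length(char s[]) {
    int i = 0;
    while (s[i] != '\0') {
        i = i + 1;
    }
    return i;
}
```

So string_length("hello") is 5, even though the array holding "hello" needs 6 elements.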
Useful C Library Functions for Characters
The C library includes some useful functions which operate on characters.
Several of the more useful ones are listed below.
Note that you need to #include <ctype.h> to use them.
#include <ctype.h>

int toupper(int c); // convert c to upper case
int tolower(int c); // convert c to lower case
int isalpha(int c); // test if c is a letter
int isdigit(int c); // test if c is a digit
int islower(int c); // test if c is a lower case letter
int isupper(int c); // test if c is an upper case letter
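As a sketch of how these functions combine with the null-terminator guard, the hypothetical function string_to_upper below converts a string to upper case in place. The cast to unsigned char is a common defensive idiom, since the ctype.h functions expect a value representable as unsigned char (or EOF); for plain ASCII text it makes no difference.

```c
#include <ctype.h>

// Hypothetical helper: converts every lower case letter in s to upper
// case, in place. Other characters (digits, punctuation) are unchanged.
void string_to_upper(char s[]) {
    int i = 0;
    while (s[i] != '\0') {
        s[i] = toupper((unsigned char) s[i]);
        i = i + 1;
    }
}
```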
Strings
Because working with strings is so common, C provides some convenient syntax.
Instead of writing:
char hello[] = {'h', 'e', 'l', 'l', 'o', '\0'};
You can write:
char hello[] = "hello";
Note that hello will have 6 elements: the compiler automatically appends '\0' when an array is initialised with a string literal.
Again, remember to allow space for it if sizing the array manually.
Reading an Entire Input Line
The function fgets reads an entire line:
#define MAX_LINE_LENGTH 1024
...
char line[MAX_LINE_LENGTH];
fgets(line, MAX_LINE_LENGTH, stdin);
fputs(line, stdout); // equivalent to printf("%s", line)
Reading Lines to End of Input
Programming pattern for reading lines to end of input:
// fgets returns NULL if it can't read any characters
while (fgets(line, MAX_LINE_LENGTH, stdin) != NULL) {
    printf("you entered the line: %s", line);
}
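One detail the pattern hides: fgets keeps the '\n' at the end of each line (when the line fits in the array), which is why the printf above has no \n of its own. A sketch with a hypothetical function longest_line that removes the newline before measuring each line; it takes a FILE * so it can be tested with stdin replaced by a file. Lines longer than MAX_LINE_LENGTH - 1 characters would be read in pieces, which this sketch does not handle.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE_LENGTH 1024

// Hypothetical helper: returns the length of the longest line read
// from `in`, not counting the trailing '\n'.
int longest_line(FILE *in) {
    char line[MAX_LINE_LENGTH];
    int longest = 0;
    while (fgets(line, MAX_LINE_LENGTH, in) != NULL) {
        int len = strlen(line);
        // remove the '\n' that fgets keeps, if there is one
        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';
            len = len - 1;
        }
        if (len > longest) {
            longest = len;
        }
    }
    return longest;
}
```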
String Manipulation
The header file string.h provides some useful string functions:
// string length (not including '\0')
size_t strlen(const char *s);
// string copy
char *strcpy(char *dest, const char *src);
char *strncpy(char *dest, const char *src, size_t n);
// string concatenation/append
char *strcat(char *dest, const char *src);
char *strncat(char *dest, const char *src, size_t n);
// string compare
int strcmp(const char *s1, const char *s2);
int strncmp(const char *s1, const char *s2, size_t n);
// case-insensitive compare (POSIX, not standard C; declared in <strings.h>)
int strcasecmp(const char *s1, const char *s2);
int strncasecmp(const char *s1, const char *s2, size_t n);
// character search
char *strchr(const char *s, int c);
char *strrchr(const char *s, int c);
String Manipulation
#include <string.h>
#include <strings.h> // for strcasecmp (POSIX)
...
char str1[100] = "Hello World!";
char str2[100];

strncpy(str2, str1, 100);          // copy str1 to str2
if (strcmp(str1, str2) == 0) {     // case-sensitive compare
    printf("Strings match!\n");
}
// append str1 to str2, leaving room for '\0'
strncat(str2, str1, 100 - (strlen(str2) + 1));
if (strcasecmp(str1, str2) != 0) { // case-insensitive compare
    printf("Strings do not match!\n");
}
printf("%zu\n", strlen(str2));     // string length (strlen returns size_t)
Note that strlen does not count the null character!
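The n-limited functions are easiest to use safely when the limit is computed from the space left in the destination array, as in the strncat call above. A sketch with a hypothetical function join_names that builds "first last" in an array of a given size. One caveat it handles explicitly: strncpy does not add '\0' if the source is at least n characters long, so the sketch sets the last element itself.

```c
#include <string.h>

// Hypothetical helper: stores first, a space, then last in result,
// which has room for `size` characters including '\0'. Text that
// does not fit is cut off rather than overflowing the array.
void join_names(char result[], int size, char first[], char last[]) {
    strncpy(result, first, size - 1);
    result[size - 1] = '\0'; // strncpy may not terminate the string
    strncat(result, " ", size - 1 - strlen(result));
    strncat(result, last, size - 1 - strlen(result));
}
```

With a 20-element array, join_names gives "Andrew Taylor"; with an 8-element array the result is cut short to "Andrew " (7 characters plus '\0').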
String Manipulation
Remember
You can find out about what else is available in string.h by running man string.
When working with strings we use the null character as a guard:
char str1[100] = "Hello World!";
char str2[100];
int i;

// the following code copies str1 into str2
i = 0;                      // start at index 0
while (str1[i] != '\0') {   // stop on null
    str2[i] = str1[i];      // copy individual characters
    i = i + 1;              // increment index
}
str2[i] = '\0';             // MUST set this for str2!
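The same loop packaged as a reusable function might look like this sketch; copy_string is a hypothetical name, and, like strcpy, it assumes dest has room for every character of src plus the '\0'.

```c
// Hypothetical helper: copies src into dest, using '\0' as the guard.
// dest must have room for all of src's characters plus '\0'.
void copy_string(char dest[], char src[]) {
    int i = 0;
    while (src[i] != '\0') {
        dest[i] = src[i]; // copy individual characters
        i = i + 1;
    }
    dest[i] = '\0';       // MUST terminate dest
}
```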
Strings
In summary, strings:
- are null-terminated character arrays
- can be initialised with string literals
- can be read and printed with scanf/printf, using %s
- benefit from the string manipulation functions in string.h
- cannot be copied via assignment (=), since they are arrays
Careful
The main error encountered when working with strings is
mishandling the terminating null character, e.g., forgetting to set it! Check this first if your strings are behaving strangely.
Arrays of Strings
Sometimes, instead of manipulating each individual cell, as for matrices, we need to manipulate whole arrays. This is generally the case when working with arrays of strings!
Consider:
char names[3][20] = {"Mark", "Luke", "John"};
// why does this work?
printf("%s %s %s\n", names[0], names[1], names[2]);
Array of arrays
If we take this view (array of arrays!) of 2D arrays, it makes sense why using only the first index gives us a whole array!
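Because names[i] is a whole row, and so a string, it can be handed to any string function. A sketch with a hypothetical function find_name that searches the array with strcmp; the row length of 20 matches the names example and is an assumption of this sketch.

```c
#include <string.h>

// Hypothetical helper: returns the index of name in the first n rows
// of names, or -1 if it is not there. Each names[i] is a string.
int find_name(char names[][20], int n, char name[]) {
    int i = 0;
    while (i < n) {
        if (strcmp(names[i], name) == 0) {
            return i;
        }
        i = i + 1;
    }
    return -1;
}
```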
Command-line Arguments
What are command-line arguments? Arguments that are supplied to a program when it is run.
We have seen them already, for example:
% diff -i file1.txt file2.txt
% gedit prog.c
Here, -i, file1.txt and file2.txt are command-line arguments to diff and prog.c is a command-line argument to gedit.
Command-line arguments are automatically supplied to our C programs, by the operating system, via the arguments of the main function (argc and argv)!
int main(int argc, char *argv[]) {
    ...
}
Command-line Arguments
Arguments to main
argc stores the number of command-line arguments
argv stores the command-line arguments as strings
int main(int argc, char *argv[]) {
    int i;
    for (i = 0; i < argc; i = i + 1) { // print arguments
        printf("Argument %d is: %s\n", i + 1, argv[i]);
    }
    ...
}
NB
The first argument is always the program name! This means that argc is always at least 1 and argv contains at least one value.
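As a sketch of using argc and argv for more than printing, the hypothetical function count_options below counts the arguments that start with '-', such as the -i in the diff example. It starts at index 1 to skip the program name. Writing it as a separate function means it can be checked with a hand-built argv array; main would call it as count_options(argc, argv).

```c
// Hypothetical helper: counts command-line arguments that look like
// options (start with '-'). argv[0] is the program name, so skip it.
int count_options(int argc, char *argv[]) {
    int count = 0;
    int i;
    for (i = 1; i < argc; i = i + 1) {
        if (argv[i][0] == '-') {
            count = count + 1;
        }
    }
    return count;
}
```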
Command-line Arguments
By default, command-line arguments are space delimited. We can use quotes if arguments include spaces.
% ./prog1 nospace one space
% ./prog2 nospace "one space"
In the above, prog1 sees three command-line arguments, while prog2 sees only two. What about the following?
% ./prog3 *.c
% ./prog4 10 < in > out
Sometimes argv is typed as: char **argv.
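Whichever way argv is typed, each argv[i] is a string, so a numeric argument such as the 10 passed to ./prog4 must be converted before doing arithmetic with it. A hedged sketch using the char **argv spelling and the standard library function atoi from stdlib.h; sum_args is a hypothetical name, and there is no error checking (atoi returns 0 when the text does not start with a number).

```c
#include <stdlib.h>

// Hypothetical helper: adds up the command-line arguments after the
// program name, converting each string to an int with atoi.
int sum_args(int argc, char **argv) {
    int sum = 0;
    int i;
    for (i = 1; i < argc; i = i + 1) {
        sum = sum + atoi(argv[i]);
    }
    return sum;
}
```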