ASCII Encoding. The char Type. Manipulating Characters.

Academic year: 2021

The char Type

The C char type stores small integers. It is usually 8 bits.

char variables are guaranteed to be able to hold the integers 0..127.

char variables are mostly used to store characters encoded in ASCII.

Only store characters in char variables in COMP1911.

Even if a numeric variable is only used for the values 0..9, use the type int for the variable.

Andrew Taylor COMP1911 Computing 1A

ASCII Encoding

ASCII (American Standard Code for Information Interchange) specifies an encoding of 128 characters as the integers 0..127.

The characters encoded include upper and lower case (English) letters, digits, common punctuation symbols (as you might find on the keyboard), plus a series of non-printable control characters (including newline and tab).

Manipulating Characters

Does this mean you will have to remember 100+ character codes!?

Luckily no! We use character literals instead!

For example:

    char a = 'a';        // ASCII code 97
    char A = 'A';        // ASCII code 65
    char space = ' ';    // ASCII code 32
    char plus = '+';     // ASCII code 43
    char newline = '\n'; // ASCII code 10

Style Note

Always use character literals in your code! Even if you are really proud of having memorised the ASCII Table!

Manipulating Characters

NOTE

The ASCII codes for the digits (48–57), the upper case letters (65–90) and lower case letters (97–122) are in sequence.

Knowing this allows us to do some neat things:

    int a = 'a';
    int b = a + 1;           // this is possible due to
    int c = 'a' + 2;         // the underlying numeric type
    int B = b - ('a' - 'A');
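The arithmetic above can be packaged into a small helper. This is only a sketch: the name to_upper_ascii is made up for this example, and it assumes the character set is ASCII.

```c
/* Hypothetical helper: convert an ASCII lower case letter to upper case.
   Any other character is returned unchanged. */
int to_upper_ascii(int c) {
    if (c >= 'a' && c <= 'z') {
        return c - ('a' - 'A');   /* 'a' - 'A' is 32 in ASCII */
    }
    return c;
}
```

For example, to_upper_ascii('b') gives 'B', while to_upper_ascii('+') returns '+' unchanged.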

Manipulating Characters

We can also test various properties of characters:

    // check for lower case
    if (c >= 'a' && c <= 'z') {
        ...
    }

Problem: Convert a digit character to the integer it represents, e.g., '0' → 0, '7' → 7, etc.

We use the fact that the digit codes are in order:

    // check c is a digit
    if (c >= '0' && c <= '9') {
        val = c - '0'; // why does this work?
    }
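The digit test and conversion can be wrapped up as a pair of functions. This is a minimal sketch; the names digit_value and digit_char are invented for illustration, as is the choice of -1 to signal "not a digit".

```c
/* Hypothetical helper: return the integer a digit character represents,
   or -1 if c is not a digit. */
int digit_value(int c) {
    if (c >= '0' && c <= '9') {
        return c - '0';   /* distance from '0' in the code table */
    }
    return -1;
}

/* Hypothetical helper: the reverse direction, for n in 0..9. */
int digit_char(int n) {
    return '0' + n;
}
```

It works because the codes for '0' to '9' are consecutive, so subtracting the code for '0' leaves the distance from '0', which is exactly the digit's value.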

Printing and Reading Characters

C provides library functions for reading and writing characters.

The getchar function

This function reads and returns one input character. Note that, unlike scanf, it has no arguments; its return value is collected by assigning it to a variable.

The putchar function

This function takes a single int argument and prints the character with that code. Here is an example:

    int c;

    printf("Please enter a character: ");
    c = getchar();
    putchar(c);

Reading Characters

Consider the following code:

    int c1, c2;

    printf("Please enter first character:\n");
    c1 = getchar();
    printf("Please enter second character:\n");
    c2 = getchar();
    printf("First: %c\nSecond: %c\n", c1, c2);

What is the output? It turns out that the newline entered by pressing Enter after the first character is read by the second getchar.

Reading Characters

How can we fix the program?

    int c1, c2;

    printf("Please enter first character:\n");
    c1 = getchar();
    getchar(); // reads and discards a character
    printf("Please enter second character:\n");
    c2 = getchar();
    printf("First: %c\nSecond: %c\n", c1, c2);
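A single extra getchar() discards exactly one character, so the fix above relies on the user typing one character and then Enter. A slightly more forgiving sketch discards everything up to the end of the line. The name skip_rest_of_line is hypothetical, and the function takes a FILE * (an assumption made here so the same code works with stdin or a file).

```c
#include <stdio.h>

/* Hypothetical helper: read and discard characters up to and including
   the next newline. Returns the number of characters discarded. */
int skip_rest_of_line(FILE *in) {
    int count = 0;
    int c = getc(in);
    while (c != EOF && c != '\n') {
        count = count + 1;
        c = getc(in);
    }
    if (c == '\n') {
        count = count + 1;   /* the newline itself was also consumed */
    }
    return count;
}
```

Calling skip_rest_of_line(stdin) after c1 = getchar() would take the place of the bare getchar() call.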

Printing characters

The conversion specifier for characters is %c. Using it we can supply variables of char type to printf for output.

End of Input

An input function such as scanf or getchar can fail because no input is available.

This can occur, for example, if input is coming from a file and the end of the file is reached.

On UNIX-like systems such as Linux and OSX, typing Ctrl + D on a terminal signals to the operating system that there is no more input from the terminal. Windows has no equivalent, but some Windows programs interpret Ctrl + Z similarly.

getchar returns a special non-ASCII value to indicate that no input was available.

This non-ASCII value is #defined as EOF in stdio.h.

On most systems EOF == -1. Note that -1 isn't an ASCII value (0..127). This is why the result of getchar is stored in an int rather than a char.

There is no end-of-file character on Linux or other modern operating systems.

Reading Characters to End of Input

Programming pattern for reading characters to the end of input:

    int ch;

    ch = getchar();
    while (ch != EOF) {
        printf("you entered the character: '%c' which has ASCII code %d\n", ch, ch);
        ch = getchar();
    }

For comparison, the programming pattern for reading integers to end of input:

    int num;

    // scanf returns the number of items read
    while (scanf("%d", &num) == 1) {
        printf("you entered the number: %d\n", num);
    }
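Both patterns can be turned into functions that do something with the input rather than just echo it. The sketch below is illustrative: count_lines and sum_integers are made-up names, and they read from a FILE * using getc and fscanf (the file-based relatives of getchar and scanf) so they can be used on stdin or on a file.

```c
#include <stdio.h>

/* Count the newline characters read before end of input. */
int count_lines(FILE *in) {
    int lines = 0;
    int ch = getc(in);   /* int, not char, so EOF can be told apart */
    while (ch != EOF) {
        if (ch == '\n') {
            lines = lines + 1;
        }
        ch = getc(in);
    }
    return lines;
}

/* Add up integers until fscanf fails to read one. */
int sum_integers(FILE *in) {
    int num;
    int sum = 0;
    while (fscanf(in, "%d", &num) == 1) {
        sum = sum + num;
    }
    return sum;
}
```

Either function could be called as count_lines(stdin) or sum_integers(stdin) from main.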

Strings

A string is a sequence of characters.

C uses a special ASCII character '\0' to mark the end of strings.

This is convenient because programs don't have to track the length of the string.

Definition

A C string is a null-terminated character array.

Consider the following:

    // this is incorrect: 6 initialisers do not fit in 5 elements,
    // so '\0' will be discarded (compilers typically warn)
    char hello[5] = {'h', 'e', 'l', 'l', 'o', '\0'};

    // this is OK
    char hello[6] = {'h', 'e', 'l', 'l', 'o', '\0'};

    // this is better
    char hello[] = {'h', 'e', 'l', 'l', 'o', '\0'};

String Length
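Because a C string ends at the first '\0', its length can be found by counting characters until that terminator is reached. A minimal sketch follows; string_length is a made-up name, and the library equivalent is strlen from string.h.

```c
/* Hypothetical re-implementation of strlen, for illustration only.
   The terminating '\0' is not counted. */
int string_length(const char s[]) {
    int i = 0;
    while (s[i] != '\0') {
        i = i + 1;
    }
    return i;
}
```

For instance, string_length("hello") is 5 even though the array holding "hello" has 6 elements.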

Useful C Library Functions for Characters

The C library includes some useful functions which operate on characters.

Several of the more useful ones are listed below.

Note that you need to #include <ctype.h> to use them.

    #include <ctype.h>

    int toupper(int c); // convert c to upper case
    int tolower(int c); // convert c to lower case

    int isalpha(int c); // test if c is a letter
    int isdigit(int c); // test if c is a digit

    int islower(int c); // test if c is a lower case letter
    int isupper(int c); // test if c is an upper case letter
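As a rough illustration of how these functions are used, the sketch below walks over a string one character at a time. The names count_letters and make_upper are invented for this example. The cast to unsigned char is a common precaution, since the ctype.h functions expect a value that fits in an unsigned char (or EOF).

```c
#include <ctype.h>

/* Count the letters in a string. */
int count_letters(const char s[]) {
    int count = 0;
    int i = 0;
    while (s[i] != '\0') {
        if (isalpha((unsigned char)s[i])) {
            count = count + 1;
        }
        i = i + 1;
    }
    return count;
}

/* Convert every lower case letter in a string to upper case, in place. */
void make_upper(char s[]) {
    int i = 0;
    while (s[i] != '\0') {
        s[i] = toupper((unsigned char)s[i]);
        i = i + 1;
    }
}
```

Characters that are not lower case letters, such as digits and spaces, are left unchanged by toupper.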

Strings

Because working with strings is so common, C provides some convenient syntax.

Instead of writing:

    char hello[] = {'h', 'e', 'l', 'l', 'o', '\0'};

You can write:

    char hello[] = "hello";

Note that hello will have 6 elements. The compiler automatically appends '\0' when strings are initialised with string literals.

Again, remember to allow space for it if sizing the array manually.
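The difference between the size of the array and the length of the string can be checked directly. In this small sketch the two function names are hypothetical; sizeof and strlen are standard.

```c
#include <string.h>

/* "hello" has 5 visible characters plus the terminating '\0'. */
static char hello[] = "hello";

/* Number of elements in the array, including the '\0'. */
size_t hello_array_size(void) {
    return sizeof hello;
}

/* Length of the string, not including the '\0'. */
size_t hello_string_length(void) {
    return strlen(hello);
}
```

Here hello_array_size() is 6 and hello_string_length() is 5.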

Reading an Entire Input Line

The function fgets reads an entire line (at most MAX_LINE_LENGTH - 1 characters, including the newline). You might use it as follows:

    #define MAX_LINE_LENGTH 1024
    ...
    char line[MAX_LINE_LENGTH];

    fgets(line, MAX_LINE_LENGTH, stdin);
    fputs(line, stdout); // equivalent to printf("%s", line)

Reading Lines to End of Input

Programming pattern for reading lines to end of input:

    // fgets returns NULL if it can't read any characters
    while (fgets(line, MAX_LINE_LENGTH, stdin) != NULL) {
        printf("you entered the line: %s", line);
    }
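When the line fits in the array, fgets stores the newline at the end of it, which is why the printf above needs no \n of its own. If the newline is unwanted, it can be removed. A minimal sketch, with strip_newline as a hypothetical helper name:

```c
#include <string.h>

/* Hypothetical helper: remove a trailing newline, if there is one. */
void strip_newline(char line[]) {
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n') {
        line[len - 1] = '\0';   /* overwrite the newline with the terminator */
    }
}
```

Calling strip_newline(line) straight after a successful fgets leaves just the text the user typed.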

String Manipulation

The header file string.h provides some useful string functions:

    // string length (not including '\0')
    size_t strlen(const char *s);

    // string copy
    char *strcpy(char *dest, const char *src);
    char *strncpy(char *dest, const char *src, size_t n);

    // string concatenation/append
    char *strcat(char *dest, const char *src);
    char *strncat(char *dest, const char *src, size_t n);

    // string compare
    int strcmp(const char *s1, const char *s2);
    int strncmp(const char *s1, const char *s2, size_t n);

    // case-insensitive compare (POSIX; declared in strings.h)
    int strcasecmp(const char *s1, const char *s2);
    int strncasecmp(const char *s1, const char *s2, size_t n);

    // character search
    char *strchr(const char *s, int c);
    char *strrchr(const char *s, int c);

String Manipulation

    #include <string.h>
    #include <strings.h> // strcasecmp (POSIX)
    ...
    char str1[100] = "Hello World!";
    char str2[100];

    strncpy(str2, str1, 100);        // copy str1 to str2
    if (strcmp(str1, str2) == 0) {   // case-sensitive compare
        printf("Strings match!\n");
    }
    // append str1 to str2
    strncat(str2, str1, 100 - (strlen(str2) + 1));
    if (strcasecmp(str1, str2)) {    // case-insensitive compare
        printf("Strings do not match!\n");
    }
    printf("%zu\n", strlen(str2));   // string length (size_t needs %zu)

Note that strlen does not count the null character!
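One caveat with strncpy: if the source string has n or more characters, strncpy copies n characters and does not add a '\0'. A common defensive sketch is to copy one character fewer than the array holds and set the terminator by hand. The name copy_string is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: copy src into dest (an array of dest_size chars),
   truncating if necessary, and always null-terminate the result. */
void copy_string(char dest[], size_t dest_size, const char src[]) {
    if (dest_size == 0) {
        return;   /* no room even for the '\0' */
    }
    strncpy(dest, src, dest_size - 1);
    dest[dest_size - 1] = '\0';
}
```

Copying "Hello" into a 4-element array this way leaves the string "Hel", which is truncated but still properly terminated.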

String Manipulation

Remember

You can find out about what else is available in string.h by running man string.

When working with strings we use the null character as a guard:

    char str1[100] = "Hello World!";
    char str2[100];
    int i;

    // the following code copies str1 into str2
    i = 0;                      // start at index 0
    while (str1[i] != '\0') {   // stop on null
        str2[i] = str1[i];      // copy individual characters
        i = i + 1;              // increment index
    }
    str2[i] = '\0';             // MUST set this for str2!

Strings

In summary, strings:

are null-terminated character arrays
can be initialised with string literals
can be manipulated by scanf/printf, use %s
benefit from the string manipulation functions in string.h
since they are arrays they cannot be copied via assignment (=)

Careful

The main error encountered when working with strings is mishandling the terminating null character, e.g., forgetting to set it! Check this first if your strings are behaving strangely.

Arrays of Strings

Sometimes, instead of manipulating each individual cell, as for matrices, we need to manipulate whole arrays. This is generally the case when working with arrays of strings!

Consider:

    char names[3][20] = {"Mark", "Luke", "John"};

    // why does this work?
    printf("%s %s %s\n", names[0], names[1], names[2]);

Array of arrays

If we take this view (array of arrays!) of 2D arrays, it makes sense why using only the first index gives us a whole array!
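Since each row of such an array is itself a string, a row can be handed to any string function. The sketch below finds the longest name; longest_name and MAX_NAME are names chosen for this example only.

```c
#include <string.h>

#define MAX_NAME 20

/* Return the index of the longest string among the first n rows.
   If several rows tie, the earliest one wins. */
int longest_name(char names[][MAX_NAME], int n) {
    int best = 0;
    int i;
    for (i = 1; i < n; i = i + 1) {
        /* names[i] is a whole row, i.e. one string */
        if (strlen(names[i]) > strlen(names[best])) {
            best = i;
        }
    }
    return best;
}
```

The second dimension has to appear in the parameter type so the compiler knows how long each row is.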

Command-line Arguments

What are command-line arguments? Arguments that are supplied to a program when it is run.

We have seen them already, for example:

% diff -i file1.txt file2.txt

% gedit prog.c

Here, -i, file1.txt and file2.txt are command-line arguments to diff and prog.c is a command-line argument to gedit.

Command-line arguments are automatically supplied to our C programs, by the operating system, via the arguments of the main function (argc and argv)!

    int main(int argc, char *argv[]) {
        ...

Command-line Arguments

Arguments to main

argc stores the number of command-line arguments
argv stores the command-line arguments as strings

    int main(int argc, char *argv[]) {
        int i;

        for (i = 0; i < argc; i = i + 1) { // print arguments
            printf("Argument %d is: %s\n", i + 1, argv[i]);
        }
        ...

NB

The first argument is always the program name! This means that argc is always at least 1 and argv contains at least one value.
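Because argv[0] is the program name, code that processes the user's arguments usually starts at index 1. As a sketch, the function below adds up numeric arguments; sum_arguments is a made-up name, and atoi (from stdlib.h) is used for brevity even though it cannot report errors.

```c
#include <stdlib.h>

/* Add up the command-line arguments, skipping argv[0] (the program name).
   atoi returns 0 for arguments that do not start with a number. */
int sum_arguments(int argc, char *argv[]) {
    int sum = 0;
    int i;
    for (i = 1; i < argc; i = i + 1) {
        sum = sum + atoi(argv[i]);
    }
    return sum;
}
```

A main function would pass its own argc and argv straight through, as in sum_arguments(argc, argv).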

Command-line Arguments

By default, command-line arguments are space delimited. We can use quotes if arguments include spaces.

% ./prog1 nospace one space

% ./prog2 nospace "one space"

In the above, prog1 sees three command-line arguments (not counting the program name), while prog2 sees only two. What about the following?

% ./prog3 *.c

% ./prog4 10 < in > out

Sometimes argv is typed as: char **argv.
