Amendment 1 to ISO C90 defines functions to classify wide characters. Although the original ISO C90 standard already defined the type wchar_t, no functions operating on it were defined.
The design of the classification functions for wide characters is more general than that of the single-byte functions. It allows extensions to the set of available classifications, beyond those that are always available. The POSIX standard specifies how extensions can be made, and this is already implemented in the GNU C Library implementation of the localedef program.
The character class functions are normally implemented with bitsets, with a bitset per character. For a given character, the appropriate bitset is read from a table and a test is performed as to whether a certain bit is set. Which bit is tested for is determined by the class.
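The table lookup described above can be sketched as follows. This is only an illustration, not the layout any particular C library uses: the names CLS_UPPER, class_table, init_class_table, and in_class are hypothetical, the table covers only the 128 ASCII codes, and an ASCII-compatible execution character set is assumed.

```c
/* Hypothetical class bits; a real C library chooses its own layout. */
#define CLS_UPPER 0x01u
#define CLS_LOWER 0x02u
#define CLS_DIGIT 0x04u
#define CLS_SPACE 0x08u

/* One bitset per character code, here only for the 128 ASCII codes. */
static unsigned char class_table[128];

/* Fill the table once.  The range tests assume an ASCII-compatible
   execution character set.  */
void
init_class_table (void)
{
  for (int c = 0; c < 128; ++c)
    {
      unsigned char bits = 0;
      if (c >= 'A' && c <= 'Z')
        bits |= CLS_UPPER;
      if (c >= 'a' && c <= 'z')
        bits |= CLS_LOWER;
      if (c >= '0' && c <= '9')
        bits |= CLS_DIGIT;
      if (c == ' ' || (c >= '\t' && c <= '\r'))
        bits |= CLS_SPACE;
      class_table[c] = bits;
    }
}

/* Read the bitset for C from the table and test the bit selected by
   CLASS_BIT.  Codes outside the table belong to no class.  */
int
in_class (int c, unsigned int class_bit)
{
  if (c < 0 || c >= 128)
    return 0;
  return (class_table[c] & class_bit) != 0;
}
```

After one call to init_class_table, every classification test is a single table read and a bit test, which is why the class alone determines which bit is examined.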
For the wide-character classification functions this is made visible. There is a type for the classification value, a function to retrieve this value for a given class, and a function to test whether a given character is in this class, using the classification value. On top of this the normal character classification functions as used for char objects can be defined.
Data type
wctype_t
The wctype_t type can hold a value which represents a character class. The only defined way to generate such a value is by using the wctype function.
This type is defined in ‘wctype.h’.
Function
wctype_t wctype (const char *property)
The wctype function returns a value representing a class of wide characters that is identified by the string property. Besides some standard properties, each locale can define its own. If no property with the given name is known for the locale currently selected for the LC_CTYPE category, the function returns zero. The properties known in every locale are
Chapter 4: Character Handling 83
"alnum" "alpha" "cntrl" "digit" "graph" "lower" "print" "punct" "space" "upper" "xdigit"
This function is declared in ‘wctype.h’.
To test the membership of a character in one of the nonstandard classes, the ISO C standard defines a completely new function.
Function
int iswctype (wint_t wc, wctype_t desc)
This function returns a nonzero value if wc is in the character class specified by desc. desc must previously have been returned by a successful call to wctype.
This function is declared in ‘wctype.h’.
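As a minimal sketch of how the two functions work together, the following helper looks up a class once and then tests each character of a wide string against it. The name count_in_class and its convention of returning -1 for an unknown class are assumptions made for this illustration only.

```c
#include <wchar.h>
#include <wctype.h>

/* Count the characters of the wide string S that belong to the class
   named PROPERTY.  Returns -1 if the current locale does not know the
   class, so that a zero descriptor is never passed to iswctype.  */
long
count_in_class (const wchar_t *s, const char *property)
{
  wctype_t desc = wctype (property);
  long n = 0;

  if (desc == (wctype_t) 0)
    return -1;

  for (; *s != L'\0'; ++s)
    if (iswctype ((wint_t) *s, desc))
      ++n;
  return n;
}
```

Calling wctype once outside the loop avoids repeating the name lookup for every character.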
To make it easier to use the commonly used classification functions, they are defined in the C library. There is no need to use wctype if the property string is one of the known character classes. In some situations it is desirable to construct the property strings, and then it is important that wctype can also handle the standard classes.
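One situation where the property strings come from data rather than from the source text is a table-driven classifier. The sketch below, under the assumption that the standard class names are kept in an array, reports every standard class a character belongs to as a bit mask. The names class_names and class_mask are hypothetical.

```c
#include <stddef.h>
#include <wctype.h>

/* Standard class names; the array index is the bit position used
   in the result of class_mask.  */
static const char *const class_names[] = {
  "alnum", "alpha", "cntrl", "digit", "graph", "lower",
  "print", "punct", "space", "upper", "xdigit"
};
#define N_CLASSES (sizeof class_names / sizeof class_names[0])

/* Return a mask with bit I set if WC is in class_names[I].  */
unsigned int
class_mask (wint_t wc)
{
  unsigned int mask = 0;

  for (size_t i = 0; i < N_CLASSES; ++i)
    {
      wctype_t desc = wctype (class_names[i]);
      if (desc != (wctype_t) 0 && iswctype (wc, desc))
        mask |= 1u << i;
    }
  return mask;
}
```

Because wctype accepts the standard names as well, the same loop would also work if locale-specific class names were appended to the array.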
Function
int iswalnum (wint_t wc)
This function returns a nonzero value if wc is an alphanumeric character (a letter or number). In other words, if either iswalpha or iswdigit is true of a character, then iswalnum is also true.
This function can be implemented using:
iswctype (wc, wctype ("alnum"))
It is declared in ‘wctype.h’.
Function
int iswalpha (wint_t wc)
Returns true if wc is an alphabetic character (a letter). If iswlower or iswupper is true of a character, then iswalpha is also true.
In some locales, there may be additional characters for which iswalpha is true: letters that are neither uppercase nor lowercase. But in the standard "C" locale, there are no such additional characters.
This function can be implemented using:
iswctype (wc, wctype ("alpha"))
It is declared in ‘wctype.h’.
Function
int iswcntrl (wint_t wc)
Returns true if wc is a control character; that is, a character that is not a printing character.
This function can be implemented using:
iswctype (wc, wctype ("cntrl"))
Function
int iswdigit (wint_t wc)
Returns true if wc is a digit (e.g., ‘0’ through ‘9’). Please note that this function returns a nonzero value not only for decimal digits, but for all kinds of digits. A consequence is that code like the following will not work unconditionally for wide characters:
n = 0;
while (iswdigit (*wc))
  {
    n *= 10;
    n += *wc++ - L'0';
  }
This function can be implemented using:
iswctype (wc, wctype ("digit"))
It is declared in ‘wctype.h’.
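One way to avoid the problem in the loop shown for iswdigit is to test for the decimal digits explicitly instead of relying on the classification function. The following is a sketch under two assumptions: the wide-character codes of L'0' through L'9' are contiguous, and overflow of the result does not need to be detected. The name parse_decimal is hypothetical.

```c
#include <wchar.h>

/* Convert the leading decimal digits of WC to a number.  Only the
   characters L'0' through L'9' are accepted, so digits from other
   scripts end the conversion instead of producing a wrong value.
   No overflow checking is done.  */
unsigned long
parse_decimal (const wchar_t *wc)
{
  unsigned long n = 0;

  while (*wc >= L'0' && *wc <= L'9')
    {
      n *= 10;
      n += (unsigned long) (*wc++ - L'0');
    }
  return n;
}
```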
Function
int iswgraph (wint_t wc)
Returns true if wc is a graphic character; that is, a character that has a glyph associated with it. The white-space characters are not considered graphic.
This function can be implemented using:
iswctype (wc, wctype ("graph"))
It is declared in ‘wctype.h’.
Function
int iswlower (wint_t wc)
Returns true if wc is a lowercase letter. The letter need not be from the Latin alphabet; any representable alphabet is valid.
This function can be implemented using:
iswctype (wc, wctype ("lower"))
It is declared in ‘wctype.h’.
Function
int iswprint (wint_t wc)
Returns true if wc is a printing character. Printing characters include all the graphic characters, plus the space (‘ ’) character.
This function can be implemented using:
iswctype (wc, wctype ("print"))
It is declared in ‘wctype.h’.
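A typical use of iswprint is to make a string safe for display. The sketch below replaces every non-printing character with a question mark; the function name mask_nonprinting and the choice of replacement character are assumptions for this example.

```c
#include <wchar.h>
#include <wctype.h>

/* Replace each character of S that is not a printing character in
   the current locale with L'?'.  The string is modified in place.  */
void
mask_nonprinting (wchar_t *s)
{
  for (; *s != L'\0'; ++s)
    if (!iswprint ((wint_t) *s))
      *s = L'?';
}
```

The space character is left alone because it is a printing character, while a tab or newline is replaced.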
Function
int iswpunct (wint_t wc)
Returns true if wc is a punctuation character. This means any printing character that is not alphanumeric or a space character.
This function can be implemented using:
iswctype (wc, wctype ("punct"))
It is declared in ‘wctype.h’.
Function
int iswspace (wint_t wc)
Returns true if wc is a white-space character. In the standard "C" locale, iswspace returns true for only the standard white-space characters:
L' '    space
L'\f'   formfeed
L'\n'   newline
L'\r'   carriage return
L'\t'   horizontal tab
L'\v'   vertical tab
This function can be implemented using:
iswctype (wc, wctype ("space"))
It is declared in ‘wctype.h’.
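As an illustration of iswspace in use, the following sketch counts the words in a wide string, where a word is taken to be any run of characters that are not white space. The name count_words and this definition of a word are assumptions made for the example.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Count the runs of non-white-space characters in S.  */
size_t
count_words (const wchar_t *s)
{
  size_t words = 0;
  int in_word = 0;

  for (; *s != L'\0'; ++s)
    {
      if (iswspace ((wint_t) *s))
        in_word = 0;
      else if (!in_word)
        {
          /* First character of a new word.  */
          in_word = 1;
          ++words;
        }
    }
  return words;
}
```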
Function
int iswupper (wint_t wc)
Returns true if wc is an uppercase letter. The letter need not be from the Latin alphabet; any representable alphabet is valid.
This function can be implemented using:
iswctype (wc, wctype ("upper"))
It is declared in ‘wctype.h’.
Function
int iswxdigit (wint_t wc)
Returns true if wc is a hexadecimal digit. Hexadecimal digits include the normal decimal digits ‘0’ through ‘9’ and the letters ‘A’ through ‘F’ and ‘a’ through ‘f’.
This function can be implemented using:
iswctype (wc, wctype ("xdigit"))
It is declared in ‘wctype.h’.
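A common companion to iswxdigit is a conversion from a hexadecimal digit to its numeric value. The sketch below assumes that the codes of L'0' through L'9', L'a' through L'f', and L'A' through L'F' are each contiguous, as they are in ASCII-compatible encodings. The name hex_value is hypothetical.

```c
#include <wchar.h>
#include <wctype.h>

/* Return the value 0..15 of the hexadecimal digit WC, or -1 if WC is
   not a hexadecimal digit.  Assumes contiguous codes within each of
   the three digit ranges.  */
int
hex_value (wint_t wc)
{
  if (!iswxdigit (wc))
    return -1;
  if (wc >= L'0' && wc <= L'9')
    return (int) (wc - L'0');
  if (wc >= L'a' && wc <= L'f')
    return (int) (wc - L'a') + 10;
  if (wc >= L'A' && wc <= L'F')
    return (int) (wc - L'A') + 10;
  return -1;
}
```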
The GNU C Library also provides a function that is not defined in the ISO C standard but that is available as a version for single-byte characters as well.
Function
int iswblank (wint_t wc)
Returns true if wc is a blank character; that is, a space or a tab. This function is a GNU extension. It is declared in ‘wchar.h’.