
In some locales, the conventions for lexicographic ordering differ from the strict numeric ordering of character codes. For example, in Spanish most glyphs with diacritical marks such as accents are not considered distinct letters for the purposes of collation. On the other hand, the two-character sequence ‘ll’ is treated as a single letter that is collated immediately after ‘l’.

You can use the functions strcoll and strxfrm (declared in the header file ‘string.h’) and wcscoll and wcsxfrm (declared in the header file ‘wchar.h’) to compare strings using a collation ordering appropriate for the current locale. The locale used by these functions can be specified by setting the locale for the LC_COLLATE category (see Chapter 7 [Locales and Internationalization], page 181).

In the standard C locale, the collation sequence for strcoll is the same as that for strcmp. Similarly, wcscoll and wcscmp are the same in this situation.
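As a small illustration of this point, the following sketch selects the standard C locale for collation and checks that strcoll and strcmp order two strings the same way. The names sign and same_order_in_c_locale are invented for this example and are not part of the library.

```c
#include <locale.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1.  Only the sign of the
   value returned by strcoll or strcmp is meaningful.
   (Helper invented for this example.) */
static int
sign (int x)
{
  return (x > 0) - (x < 0);
}

/* Return nonzero if strcoll and strcmp agree on the order of S1
   and S2 once the "C" locale is selected for LC_COLLATE.
   Note that setlocale changes the locale for the whole program. */
int
same_order_in_c_locale (const char *s1, const char *s2)
{
  if (setlocale (LC_COLLATE, "C") == NULL)
    return 0;
  return sign (strcoll (s1, s2)) == sign (strcmp (s1, s2));
}
```

To collate according to the user's environment instead, a program would normally call setlocale (LC_COLLATE, "") once at startup; in that case the two functions are no longer guaranteed to agree.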

Effectively, these functions work by applying a mapping that transforms the characters in a string to a byte sequence representing the string’s position in the collating sequence of the current locale. Comparing two such byte sequences in a simple fashion is equivalent to comparing the strings with the locale’s collating sequence.

The functions strcoll and wcscoll perform this translation implicitly, in order to do one comparison. By contrast, strxfrm and wcsxfrm perform the mapping explicitly. If you are making multiple comparisons using the same string or set of strings, it is likely to be more efficient to use strxfrm or wcsxfrm to transform all the strings just once, and subsequently compare the transformed strings with strcmp or wcscmp.
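The relationship between the implicit and the explicit approach can be sketched as follows. This is only an illustration: the function name compare_via_xfrm and the fixed buffer size XFRM_BUF_SIZE are assumptions made for the example, and a real program would size its buffers from the value strxfrm returns.

```c
#include <string.h>

#define XFRM_BUF_SIZE 256   /* arbitrary limit chosen for this sketch */

/* Compare S1 and S2 by transforming both explicitly and then
   calling strcmp.  Store the comparison result in *RESULT and
   return 1, or return 0 if a transformed string does not fit in
   the fixed-size buffers.  The sign of *RESULT matches the sign
   of strcoll (s1, s2) in the same locale. */
int
compare_via_xfrm (const char *s1, const char *s2, int *result)
{
  char t1[XFRM_BUF_SIZE];
  char t2[XFRM_BUF_SIZE];

  /* A return value >= the buffer size means the output was cut off. */
  if (strxfrm (t1, s1, sizeof t1) >= sizeof t1
      || strxfrm (t2, s2, sizeof t2) >= sizeof t2)
    return 0;

  *result = strcmp (t1, t2);
  return 1;
}
```

Doing both transformations for a single comparison, as here, gains nothing over strcoll; the benefit appears only when the transformed strings are kept and compared many times.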

Function

int strcoll (const char *s1, const char *s2)

The strcoll function is similar to strcmp but uses the collating sequence of the current locale for collation (the LC_COLLATE locale).

Function

int wcscoll (const wchar_t *ws1, const wchar_t *ws2)

The wcscoll function is similar to wcscmp but uses the collating sequence of the current locale for collation (the LC_COLLATE locale).

Here is an example of sorting an array of strings, using strcoll to compare them. The actual sort algorithm is not written here; it comes from qsort (see Section 12.3 [Array Sort Function], page 344). The job of the code shown here is to say how to compare the strings while sorting them. (Later on in this section, we will show a way to do this more efficiently using strxfrm.)

    #include <stdlib.h>
    #include <string.h>

    /* This is the comparison function used with qsort. */

    int
    compare_elements (const void *v1, const void *v2)
    {
      char *const *p1 = v1;
      char *const *p2 = v2;

      return strcoll (*p1, *p2);
    }

    /* This is the entry point: the function to sort strings using
       the locale's collating sequence. */

    void
    sort_strings (char **array, int nstrings)
    {
      /* Sort array by comparing the strings. */
      qsort (array, nstrings, sizeof (char *), compare_elements);
    }

Function

size_t strxfrm (char *restrict to, const char *restrict from, size_t size)

The function strxfrm transforms the string from using the collation transformation determined by the locale currently selected for collation, and stores the transformed string in the array to. Up to size characters (including a terminating null character) are stored.

The behavior is undefined if the strings to and from overlap (see Section 5.4 [Copying and Concatenation], page 93).

The return value is the length of the entire transformed string. This value is not affected by the value of size, but if it is greater than or equal to size, it means that the transformed string did not entirely fit in the array to. In this case, only as much of the string as actually fits was stored. To get the whole transformed string, call strxfrm again with a bigger output array.

The transformed string may be longer than the original string, and it may also be shorter.

If size is zero, no characters are stored in to. In this case, strxfrm simply returns the number of characters that would be the length of the transformed string. This is useful for determining what size the allocated array should be; the returned length does not count the terminating null character, so allocate one character more. It does not matter what to is if size is zero; to may even be a null pointer.
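The size-zero query described above lends itself to a two-call pattern: ask for the length first, then allocate exactly that much and transform. The following is a minimal sketch of that pattern; the function name transform_string is invented for this example, and plain malloc is used so that the fragment is self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated buffer holding the transformed form of
   FROM, or a null pointer if malloc fails.  The caller is
   responsible for freeing the result.
   (The function name is invented for this example.) */
char *
transform_string (const char *from)
{
  /* With a size of zero nothing is stored, so a null pointer is
     acceptable as the destination. */
  size_t length = strxfrm (NULL, from, 0);

  /* +1 for the terminating null character. */
  char *to = malloc (length + 1);

  if (to != NULL)
    strxfrm (to, from, length + 1);
  return to;
}
```

Compared with guessing a buffer size and retrying, this always calls strxfrm twice but never allocates more than is needed.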

Function

size_t wcsxfrm (wchar_t *restrict wto, const wchar_t *wfrom, size_t size)

The function wcsxfrm transforms the wide-character string wfrom using the collation transformation determined by the locale currently selected for collation, and stores the transformed string in the array wto. Up to size wide characters (including a terminating null character) are stored.

The behavior is undefined if the strings wto and wfrom overlap (see Section 5.4 [Copying and Concatenation], page 93).

The return value is the length of the entire transformed wide-character string. This value is not affected by the value of size, but if it is greater than or equal to size, it means that the transformed wide-character string did not entirely fit in the array wto. In this case, only as much of the wide-character string as actually fits was stored. To get the whole transformed wide-character string, call wcsxfrm again with a bigger output array.

The transformed wide-character string may be longer than the original wide-character string, and it may also be shorter.

If size is zero, no characters are stored in wto. In this case, wcsxfrm simply returns the number of wide characters that would be the length of the transformed wide-character string. This is useful for determining what size the allocated array should be (remember to multiply by sizeof (wchar_t)). It does not matter what wto is if size is zero; wto may even be a null pointer.
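For the wide-character case, the point that is easy to get wrong is the allocation size, because wcsxfrm counts wide characters while malloc counts bytes. A minimal sketch, assuming plain malloc and an invented function name transform_wide_string:

```c
#include <stdlib.h>
#include <wchar.h>

/* Return a newly allocated buffer holding the transformed form of
   WFROM, or a null pointer if malloc fails.  The caller is
   responsible for freeing the result.
   (The function name is invented for this example.) */
wchar_t *
transform_wide_string (const wchar_t *wfrom)
{
  /* LENGTH is a count of wide characters, not of bytes. */
  size_t length = wcsxfrm (NULL, wfrom, 0);

  /* +1 for the terminating null wide character; multiply by
     sizeof (wchar_t) to convert the count to bytes. */
  wchar_t *wto = malloc ((length + 1) * sizeof (wchar_t));

  if (wto != NULL)
    wcsxfrm (wto, wfrom, length + 1);
  return wto;
}
```

The third argument of wcsxfrm stays in units of wide characters; only the malloc argument is scaled.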

Here is an example of how you can use strxfrm when you plan to do many comparisons. It does the same thing as the sort_strings example shown earlier, but much faster, because it has to transform each string only once, no matter how many times it is compared with other strings. Even the time needed to allocate and free storage is much less than the time we save, when there are many strings.

    #include <stdlib.h>
    #include <string.h>

    /* Each element pairs an input string with its transformed form. */
    struct sorter { char *input; char *transformed; };

    /* This is the comparison function used with qsort
       to sort an array of struct sorter. */

    int
    compare_elements (const void *v1, const void *v2)
    {
      const struct sorter *p1 = v1;
      const struct sorter *p2 = v2;

      return strcmp (p1->transformed, p2->transformed);
    }

    /* This is the entry point: the function to sort strings using
       the locale's collating sequence.  xmalloc and xrealloc are
       assumed to be error-checking wrappers around malloc and
       realloc that never return a null pointer. */

    void
    sort_strings_fast (char **array, int nstrings)
    {
      struct sorter temp_array[nstrings];
      int i;

      /* Set up temp_array.  Each element contains
         one input string and its transformed string. */
      for (i = 0; i < nstrings; i++)
        {
          size_t length = strlen (array[i]) * 2;
          char *transformed;
          size_t transformed_length;

          temp_array[i].input = array[i];

          /* First make a buffer that you guess is big enough. */
          transformed = (char *) xmalloc (length);

          /* Transform array[i]. */
          transformed_length = strxfrm (transformed, array[i], length);

          /* If the buffer was not large enough, resize it
             and try again. */
          if (transformed_length >= length)
            {
              /* Allocate the needed space.  +1 for terminating
                 NUL character. */
              transformed = (char *) xrealloc (transformed,
                                               transformed_length + 1);

              /* The return value is not interesting because we know
                 how long the transformed string is. */
              (void) strxfrm (transformed, array[i],
                              transformed_length + 1);
            }

          temp_array[i].transformed = transformed;
        }

      /* Sort temp_array by comparing transformed strings. */
      qsort (temp_array, nstrings,
             sizeof (struct sorter), compare_elements);

      /* Put the elements back in the permanent array
         in their sorted order. */
      for (i = 0; i < nstrings; i++)
        array[i] = temp_array[i].input;

      /* Free the strings we allocated. */
      for (i = 0; i < nstrings; i++)
        free (temp_array[i].transformed);
    }

The interesting part of this code for the wide-character version would look like this:

    void
    sort_strings_fast (wchar_t **array, int nstrings)
    {
      ...
          /* Transform array[i]. */
          transformed_length = wcsxfrm (transformed, array[i], length);

          /* If the buffer was not large enough, resize it
             and try again. */
          if (transformed_length >= length)
            {
              /* Allocate the needed space.  +1 for terminating
                 NUL character. */
              transformed = (wchar_t *) xrealloc (transformed,
                                                  (transformed_length + 1)
                                                  * sizeof (wchar_t));

              /* The return value is not interesting because we know
                 how long the transformed string is. */
              (void) wcsxfrm (transformed, array[i],
                              transformed_length + 1);
            }
      ...

Note the additional multiplication by sizeof (wchar_t) in the xrealloc call.

Compatibility Note: The string collation functions are a new feature of ISO C90. Older C dialects have no equivalent feature. The wide-character versions were introduced in Amendment 1 to ISO C90.