• No results found

String Functions

In document IBM SPSS Modeler 15 User s Guide (Page 151-156)

In CLEM, you can perform the following operations with strings: Compare strings

Create strings Access characters

In CLEM, a string is any sequence of characters between matching double quotation marks ("string quotes"). Characters (CHAR) can be any single alphanumeric character. They are declared in CLEM expressions using single backquotes in the form of`<character>`, such as`z`,`A`, or`2`. Characters that are out-of-bounds or negative indices to a string will result in undefined behavior. Note. Comparisons between strings that do and do not use SQL pushback may generate different results where trailing spaces exist.

Function Result Description

allbutfirst(N, STRING) String Returns a string, which isSTRINGwith the

firstNcharacters removed.

allbutlast(N, STRING) String Returns a string, which isSTRINGwith the last characters removed.

alphabefore(STRING1, STRING2) Boolean

Used to check the alphabetical ordering of strings. Returns true ifSTRING1precedes

STRING2.

endstring(LENGTH, STRING) String

Extracts the lastNcharacters from the specified string. If the string length is less than or equal to the specified length, then it is unchanged.

hasendstring(STRING, SUBSTRING) Integer This function is the same as

isendstring(SUBSTRING, STRING). hasmidstring(STRING, SUBSTRING) Integer

This function is the same as

ismidstring(SUBSTRING, STRING)

(embedded substring).

hasstartstring(STRING, SUBSTRING) Integer This function is the same as

isstartstring(SUBSTRING, STRING).

hassubstring(STRING, N, SUBSTRING) Integer

This function is the same as

issubstring(SUBSTRING, N, STRING), whereNdefaults to 1.

count_substring(STRING, SUBSTRING) Integer

Returns the number of times the specified substring occurs within the string. For example,

count_substring("foooo.txt", "oo")returns 3.

hassubstring(STRING, SUBSTRING) Integer

This function is the same as

issubstring(SUBSTRING, 1, STRING), whereNdefaults to 1.

isalphacode(CHAR) Boolean

Returns a value of true ifCHARis a character in the specified string (often afield name) whose character code is a letter. Otherwise, this function returns a value of 0. For example,isalphacode(produce_num(1)).

isendstring(SUBSTRING, STRING) Integer

If the stringSTRINGends with the substring SUBSTRING, then this function returns the integer subscript ofSUBSTRINGinSTRING. Otherwise, this function returns a value of 0.

islowercode(CHAR) Boolean

Returns a value of true ifCHARis a lowercase letter character for the specified string (often afield name). Otherwise, this function returns a value of 0. For example, bothislowercode(``)and

islowercode(country_name(2))are valid expressions.

143 CLEM Language Reference

Function Result Description

ismidstring(SUBSTRING, STRING) Integer

IfSUBSTRINGis a substring ofSTRING

but does not start on thefirst character of STRINGor end on the last, then this function returns the subscript at which the substring starts. Otherwise, this function returns a value of 0.

isnumbercode(CHAR) Boolean

Returns a value of true ifCHARfor the specified string (often afield name) is a character whose character code is a digit. Otherwise, this function returns a value of 0. For example,isnumbercode(product_id(2)).

isstartstring(SUBSTRING, STRING) Integer

If the stringSTRINGstarts with the substring SUBSTRING, then this function returns the subscript 1. Otherwise, this function returns a value of 0.

issubstring(SUBSTRING, N, STRING) Integer

Searches the stringSTRING, starting from itsNth character, for a substring equal to the stringSUBSTRING. If found, this function returns the integer subscript at which the matching substring begins. Otherwise, this function returns a value of 0. IfNis not given, this function defaults to 1.

issubstring(SUBSTRING, STRING) Integer

Searches the stringSTRING, starting from itsNth character, for a substring equal to the stringSUBSTRING. If found, this function returns the integer subscript at which the matching substring begins. Otherwise, this function returns a value of 0. IfNis not given, this function defaults to 1.

issubstring_count(SUBSTRING, N, STRING): Integer

Returns the index of theNth occurrence of SUBSTRINGwithin the specifiedSTRING. If there are fewer thanNoccurrences of SUBSTRING, 0 is returned.

issubstring_lim(SUBSTRING, N, STARTLIM,

ENDLIM, STRING) Integer

This function is the same asissubstring, but the match is constrained to start on or before the subscriptSTARTLIMand to end on or before the subscriptENDLIM.

TheSTARTLIMorENDLIMconstraints

may be disabled by supplying a value of false for either argument—for example,

issubstring_lim(SUBSTRING, N, false, false, STRING)is the same asissubstring.

isuppercode(CHAR) Boolean

Returns a value of true ifCHARis an uppercase letter character. Otherwise, this function returns a value of 0. For example, bothisuppercode(``)and

isuppercode(country_name(2))are valid expressions.

last(CHAR) String Returns the last character(which must be at least one character long).CHARofSTRING

length(STRING) Integer

Returns the length of the string

STRING—that is, the number of characters in it.

Function Result Description

locchar(CHAR, N, STRING) Integer

Used to identify the location of characters in symbolicfields. The function searches the stringSTRINGfor the characterCHAR, starting the search at theNth character ofSTRING. This function returns a value indicating the location (starting atN) where the character is found. If the character is not found, this function returns a value of 0. If the function has an invalid offset(N)(for example, an offset that is beyond the length of the string), this function returns$null$. For example,locchar(`n`, 2, web_page)

searches thefield calledweb_pagefor the`n`

character beginning at the second character in thefield value.

Note: Be sure to use single backquotes to encapsulate the specified character.

locchar_back(CHAR, N, STRING) Integer

Similar tolocchar, except that the search is performed backward starting from theNth character. For example,locchar_back(`n`, 9, web_page)searches thefieldweb_page starting from the ninth character and moving backward toward the start of the string. If the function has an invalid offset (for example, an offset that is beyond the length of the string), this function returns$null$. Ideally, you should uselocchar_backin conjunction with the functionlength(<field>)

to dynamically use the length of the current value of thefield. For example,

locchar_back(`n`, (length(web_page)), web_page). lowertoupper(CHAR) lowertoupper (STRING) CHARor String

Input can be either a string or character, which is used in this function to return a new item of the same type, with any lowercase characters converted to their uppercase equivalents. For example,

lowertoupper(`a`),lowertoupper(“My string”), andlowertoupper(field_name(2))

are all valid expressions.

matches Boolean

Returns true if a string matches a specified pattern. The pattern must be a string literal; it cannot be afield name containing a pattern. A question mark (?) can be included in the pattern to match exactly one character; an asterisk (*) matches zero or more characters. To match a literal question mark or asterisk (rather than using these as wildcards), a backslash (\) can be used as an escape character.

replace(SUBSTRING, NEWSUBSTRING,

STRING) String

Within the specifiedSTRING, replace all instances ofSUBSTRINGwith

NEWSUBSTRING.

replicate(COUNT, STRING) String Returns a string that consists of the original string copied the specified number of times.

145 CLEM Language Reference

Function Result Description

stripchar(CHAR,STRING) String

Enables you to remove specified characters from a string orfield. You can use this function, for example, to remove extra symbols, such as currency notations, from data to achieve a simple number or name. For example, using the syntaxstripchar(`$`, 'Cost')returns a newfield with the dollar sign removed from all values.

Note: Be sure to use single backquotes to encapsulate the specified character.

skipchar(CHAR, N, STRING) Integer

Searches the stringSTRINGfor any character other thanCHAR, starting at the Nth character. This function returns an integer substring indicating the point at which one is found or 0 if every character from theNth onward is aCHAR. If the function has an invalid offset (for example, an offset that is beyond the length of the string), this function returns$null$.

loccharis often used in conjunction with the

skipcharfunctions to determine the value ofN(the point at which to start searching the string). For example,skipchar(`s`, (locchar(`s`, 1, "MyString")), "MyString").

skipchar_back(CHAR, N, STRING) Integer

Similar toskipchar, except that the search is performedbackward, starting from theNth character.

startstring(LENGTH, STRING) String

Extracts thefirstNcharacters from the specified string. If the string length is less than or equal to the specified length, then it is unchanged.

strmember(CHAR, STRING) Integer

Equivalent tolocchar(CHAR, 1, STRING). It returns an integer substring indicating the point at whichCHARfirst occurs, or 0. If the function has an invalid offset (for example, an offset that is beyond the length of the string), this function returns$null$.

subscrs(N, STRING) CHAR

Returns theNth characterCHARof the input stringSTRING. This function can also be written in a shorthand form asSTRING(N). For example,lowertoupper(“name”(1))is a valid expression.

substring(N, LEN, STRING) String

Returns a stringSUBSTRING, which consists of theLENcharacters of the stringSTRING, starting from the character at subscriptN.

substring_between(N1, N2, STRING) String

Returns the substring ofSTRING, which begins at subscriptN1and ends at subscript N2.

trim(STRING) String Removes leading and trailing white spacecharacters from the speci

fied string.

trim_start(STRING) String Removes leading white space charactersfrom the speci

fied string.

trimend(STRING) String Removes trailing white space characters from the specified string.

Function Result Description

unicode_char(NUM) CHAR Returns the character with Unicode valueNUM.

unicode_value(CHAR) NUM Returns the Unicode value ofCHAR

uppertolower(CHAR) uppertolower (STRING)

CHARor

String

Input can be either a string or character and is used in this function to return a new item of the same type with any uppercase characters converted to their lowercase equivalents. Note: Remember to specify strings with double quotes and characters with single backquotes. Simplefield names should be specified without quotes.

SoundEx Functions

SoundEx is a method used tofind strings when the sound is known but the precise spelling is not. Developed in 1918, it searches out words with similar sounds based on phonetic assumptions about how certain letters are pronounced. It can be used to search names in a database, for example, where spellings and pronunciations for similar names may vary. The basic SoundEx algorithm is documented in a number of sources and, despite known limitations (for example, leading letter combinations such asphandfwill not match even though they sound the same), is supported in some form by most databases.

Function Result Description

soundex(STRING) Integer Returns the four-character SoundEx code forthe speci

fiedSTRING.

soundex_difference(STRING1, STRING2) Integer

Returns an integer between 0 and 4 that indicates the number of characters that are the same in the SoundEx encoding for the two strings, where 0 indicates no similarity and 4 indicates strong similarity or identical strings.

In document IBM SPSS Modeler 15 User s Guide (Page 151-156)