
Working with strings and regular expressions

In document Kotlin in Action v12 MEAP (Page 70-74)

3 Defining and calling functions

Section 7.4 will describe when it’s possible to destructure an expression and assign it to several variables.

The to function is an extension function. You can create a pair of any elements, which means it’s an extension to a generic receiver: you can write 1 to "one", "one" to 1, list to list.size, and so on.
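As a minimal sketch of the point above (the variable names are illustrative and not from the text), to can be called on values of any type, and the resulting Pair can be destructured into two variables:

```kotlin
fun main() {
    // `to` is an infix extension function with a generic receiver,
    // so any two values can form a Pair.
    val numberToName: Pair<Int, String> = 1 to "one"
    val nameToNumber: Pair<String, Int> = "one" to 1

    val list = listOf("a", "b", "c")
    val listToSize: Pair<List<String>, Int> = list to list.size

    // A pair can be destructured into two variables.
    val (number, name) = numberToName

    println(numberToName)       // (1, one)
    println(nameToNumber)       // (one, 1)
    println(listToSize)         // ([a, b, c], 3)
    println("$number = $name")  // 1 = one
}
```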

Let’s look at the declaration of the mapOf function:

    fun <K, V> mapOf(vararg values: Pair<K, V>): Map<K, V>

Like listOf, mapOf accepts a variable number of arguments, but this time they should be pairs of keys and values.

Even though the creation of a new map may look like a special construct in Kotlin, it’s a regular function with a concise syntax. Next, let’s discuss how extensions simplify dealing with strings and regular expressions.

3.5 Working with strings and regular expressions

Kotlin strings are exactly the same things as Java strings. You can pass a string created in Kotlin code to any Java method, and you can also use any Kotlin standard library methods on strings that you receive from Java code. No conversion is involved, and no additional wrapper objects are created.

Kotlin makes working with standard Java strings more enjoyable by providing a bunch of useful extension functions. Also, it hides some confusing methods, adding extensions that are clearer. As our first example of the API differences, let’s look at how Kotlin handles splitting strings.

3.5.1 Splitting strings

You’re probably familiar with the split method on String. Everyone uses it, but sometimes people complain about it on Stack Overflow (see footnote 8): "The split() method in Java doesn’t work on a dot." It’s a common trap to write "12.345-6.A".split(".") and to expect an array [12, 345-6, A] as a result. But Java’s split method returns an empty array! That happens because it takes a regular expression as a parameter, and it splits a string into several strings according to the expression. Here, the dot (.) is a regular expression that denotes any character.

Footnote 8 http://stackoverflow.com.

Kotlin hides the confusing method and provides as replacements several overloaded extensions named split that have different arguments. The one that takes a regular expression requires a value of Regex type, not String. This ensures that it’s always clear whether a string passed to a method is interpreted as plain text or a regular expression.
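To make the distinction concrete, here is a small sketch (the variable names are illustrative) that runs the "dot" example from above through Kotlin’s split. A String argument is treated as plain text, whereas a Regex argument is treated as a pattern, so the dot has to be escaped there:

```kotlin
fun main() {
    val input = "12.345-6.A"

    // A String delimiter is plain text, so "." means a literal dot.
    val byPlainDot: List<String> = input.split(".")
    println(byPlainDot)         // [12, 345-6, A]

    // A Regex argument is a pattern; the dot must be escaped
    // to match a literal dot rather than any character.
    val byEscapedDot: List<String> = input.split(Regex("\\."))
    println(byEscapedDot)       // [12, 345-6, A]
}
```

Note that both overloads return a List<String> rather than an array.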

Here’s how you’d split the string by either a dot or a dash:

    >>> println("12.345-6.A".split("\\.|-".toRegex()))   // Creates a regular expression explicitly
    [12, 345, 6, A]

Kotlin uses exactly the same regular-expression syntax as in Java. The pattern here matches a dot (we escaped it to indicate that we mean a literal character, not a wildcard) or a dash. The APIs for working with regular expressions are also similar to the standard Java library APIs, but they’re more idiomatic. For instance, in Kotlin you use an extension function toRegex to convert a string into a regular expression.

But for such a simple case, you don’t need to use regular expressions. The other overload of the split extension function in Kotlin takes an arbitrary number of delimiters as plain-text strings:

    >>> println("12.345-6.A".split(".", "-"))   // Specifies several delimiters
    [12, 345, 6, A]

Note that you can specify character arguments instead, and write "12.345-6.A".split('.', '-'), which will lead to the same result. This method hides the similar Java method that can take only one character as a delimiter.

3.5.2 Regular expressions and triple-quoted strings

Let’s look at another example with two different implementations: the first one will use extensions, and the second will work with regular expressions. Your task will be to parse a file’s full path name into its components: a directory, a filename, and an extension. The Kotlin standard library contains functions to get the substring before (or after) the first (or the last) occurrence of the given delimiter. Here’s how you can use them to solve this task (also see figure 3.4):

    fun parsePath(path: String) {
        val directory = path.substringBeforeLast("/")
        val fullName = path.substringAfterLast("/")
        val fileName = fullName.substringBeforeLast(".")
        val extension = fullName.substringAfterLast(".")

        println("Dir: $directory, name: $fileName, ext: $extension")
    }

    >>> parsePath("/Users/yole/kotlin-book/chapter.adoc")
    Dir: /Users/yole/kotlin-book, name: chapter, ext: adoc

Figure 3.4 Splitting a path into a directory, a filename, and a file extension by using the substringBeforeLast and substringAfterLast functions

The substring before the last slash symbol of the filepath is the path to an enclosing directory, the substring after the last dot is a file extension, and the filename goes between them.
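The sketch below illustrates the "first or last occurrence" distinction mentioned earlier. The path and the extensionOf helper are hypothetical and exist only for this example. It also shows that these functions return the original string when the delimiter is missing, unless you pass a different fallback as the second argument:

```kotlin
// Hypothetical helper: returns the extension, or "" if the name has no dot.
fun extensionOf(fileName: String): String =
    fileName.substringAfterLast(".", missingDelimiterValue = "")

fun main() {
    val fullName = "/Users/yole/kotlin-book/chapter.tar.gz".substringAfterLast("/")
    println(fullName)                            // chapter.tar.gz

    // First occurrence of the delimiter.
    println(fullName.substringBefore("."))       // chapter
    println(fullName.substringAfter("."))        // tar.gz

    // Last occurrence of the delimiter.
    println(fullName.substringBeforeLast("."))   // chapter.tar
    println(fullName.substringAfterLast("."))    // gz

    // With no delimiter present, the whole string is returned by default.
    println("README".substringAfterLast("."))    // README
    println(extensionOf("README").isEmpty())     // true
}
```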

As you can see, Kotlin makes it easier to parse strings without resorting to regular expressions, which are powerful but also sometimes hard to understand after they’ve been written. If you do want to use regular expressions, the Kotlin standard library can help. Here’s how the same task can be done using regular expressions:

    fun parsePathRegexp(path: String) {
        val regex = """(.+)/(.+)\.(.+)""".toRegex()
        val matchResult = regex.matchEntire(path)
        if (matchResult != null) {
            val (directory, filename, extension) = matchResult.destructured
            println("Dir: $directory, name: $filename, ext: $extension")
        }
    }

In this example, the regular expression is written in a triple-quoted string. In such a string, you don’t need to escape any characters, including the backslash, so you can encode the dot symbol with \. rather than \\. as you’d write in an ordinary string literal (see figure 3.5).

Figure 3.5 The regular expression for splitting a path into a directory, a filename, and a file extension

This regular expression divides a path into three groups separated by a slash and a dot. The pattern . matches any character, and matching is greedy, so the first group (.+) contains the substring before the last slash. This substring includes all the previous slashes, because they match the pattern "any character". Similarly, the second group contains the substring before the last dot, and the third group contains all the rest.

Now let’s discuss the implementation of the parsePathRegexp function in the previous example. You create a regular expression and match it against an input path. If the match is successful (the result isn’t null), you assign the value of its destructured property to the corresponding variables. This is the same syntax used when you assigned a pair to two variables; section 7.4 will cover the details.
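As a variation on the same idea, the following sketch returns the parsed components instead of printing them, and makes the null case explicit. The PathParts type and the parsePathOrNull name are assumptions introduced for illustration; only toRegex, matchEntire, and destructured come from the text:

```kotlin
// Hypothetical holder type, introduced only for this sketch.
data class PathParts(val directory: String, val fileName: String, val extension: String)

fun parsePathOrNull(path: String): PathParts? {
    val regex = """(.+)/(.+)\.(.+)""".toRegex()
    // matchEntire returns null unless the whole input matches the pattern.
    val match = regex.matchEntire(path) ?: return null
    // destructured exposes the three capture groups in order.
    val (directory, fileName, extension) = match.destructured
    return PathParts(directory, fileName, extension)
}

fun main() {
    println(parsePathOrNull("/Users/yole/kotlin-book/chapter.adoc"))
    // PathParts(directory=/Users/yole/kotlin-book, fileName=chapter, extension=adoc)

    println(parsePathOrNull("no-separators"))  // null
}
```

Returning a nullable value leaves it to the caller to decide how to handle input that doesn’t look like a path.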

3.5.3 Multiline triple-quoted strings

The purpose of triple-quoted strings is not only to avoid escaping characters. Such a string literal can contain any characters, including line breaks. That gives you an easy way to embed in your programs text containing line breaks. As an example, let’s draw some ASCII art:

    val kotlinLogo = """| //
                       .|//
                       .|/ \"""

    >>> println(kotlinLogo.trimMargin("."))
    | //
    |//
    |/ \

The multiline string contains all the characters between the triple quotes, including indents used to format the code. If you want a better representation of such a string, you can trim the indentation (in other words, the left margin). To do that, you add a prefix to the string content, marking the end of the margin, and then call trimMargin() to delete the leading whitespace and the prefix in each line. The previous example uses the dot as such a prefix.

The triple-quoted string contains line breaks, but you can’t use special characters like \n. On the other hand, you don’t have to escape \, so the Windows-style path "C:\\Users\\yole\\kotlin-book" can be written as """C:\Users\yole\kotlin-book""".

You can also use string templates in multiline strings. Because multiline strings don’t support escape sequences, you have to use an embedded expression if you need to use a literal dollar sign in the contents of your string. It looks like this: val price = """${'$'}99.9""".

One of the areas where multiline strings can be useful in your programs (besides games that use ASCII art) is tests. In tests, it’s fairly common to execute an operation that produces multiline text (for example, a web page fragment) and to compare the result with the expected output. Multiline strings give you a perfect solution for including the expected output as part of your test. There’s no need for clumsy escaping or loading the text from external files: just put in some quotation marks and place the expected HTML or other output between them. And for better formatting, use the aforementioned trimMargin function, which is another example of an extension function.
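A minimal sketch of the testing use case might look like the following. The renderList function is a hypothetical stand-in for the code under test, and check is used in place of a real test framework. When trimMargin is called without arguments, it uses "|" as the margin prefix:

```kotlin
// Hypothetical function under test, used only for illustration.
fun renderList(items: List<String>): String =
    items.joinToString(separator = "\n", prefix = "<ul>\n", postfix = "\n</ul>") {
        "  <li>$it</li>"
    }

fun main() {
    // The blank first and last lines are dropped by trimMargin, and the
    // leading whitespace plus the "|" prefix are removed from every other line.
    val expected = """
        |<ul>
        |  <li>one</li>
        |  <li>two</li>
        |</ul>
    """.trimMargin()

    check(renderList(listOf("one", "two")) == expected)

    // A literal dollar sign needs an embedded expression in a triple-quoted string.
    val price = """${'$'}99.9"""
    println(price)  // $99.9
}
```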

Now that you can see how Kotlin gives you better APIs for the libraries you use, let’s turn our attention back to your code. You’ll see some new uses for extension functions, and we’ll also discuss a new concept: local functions.
