3 Built-In Datatypes

3.5 Bytes and Byte Strings

A byte is an exact integer between 0 and 255, inclusive. The byte? predicate recognizes numbers that represent bytes.

Examples:

> (byte? 0)

#t

> (byte? 256)

#f
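As a rough sketch of how byte? might be used to validate data before storing it in a byte string, the following module checks several values and clamps out-of-range integers. The helper clamp->byte is hypothetical, introduced only for illustration; it is not part of Racket.

```racket
#lang racket/base

;; clamp->byte is a hypothetical helper, not a Racket built-in.
;; It forces an exact integer into the byte range 0..255.
(define (clamp->byte n)
  (cond [(byte? n) n]   ; already a valid byte
        [(< n 0) 0]     ; too small: clamp to the minimum
        [else 255]))    ; too large: clamp to the maximum

;; byte? accepts any value and returns #f for anything that is
;; not an exact integer between 0 and 255.
(define samples (list 0 255 256 -1 1.0 #\a))
(define results (map byte? samples))

results
(clamp->byte 300)
(clamp->byte -5)
```

Only the first two samples are bytes: 1.0 is an inexact number and #\a is a character, so byte? rejects both.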

A byte string is similar to a string (see §3.4 “Strings (Unicode)”), but its content is a sequence of bytes instead of characters. Byte strings can be used in applications that process pure ASCII instead of Unicode text. The printed form of a byte string supports such uses in particular, because a byte string prints like the ASCII decoding of the byte string, but prefixed with a #. Unprintable ASCII characters or non-ASCII bytes in the byte string are written with octal notation.

§1.3.7 “Reading Strings” in The Racket Reference documents the fine points of the syntax of byte strings.

Examples:

> #"Apple"

#"Apple"

> (bytes-ref #"Apple" 0)

65

> (make-bytes 3 65)

#"AAA"

> (define b (make-bytes 2 0))

> b

#"\0\0"

> (bytes-set! b 0 1)

> (bytes-set! b 1 255)

> b

#"\1\377"
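The examples above mutate a byte string created with make-bytes. A minimal sketch of a few related operations follows, assuming the code runs as a module, where literal byte strings are immutable and must be copied before they can be changed. The variable names are illustrative only.

```racket
#lang racket/base

;; A literal byte string in a module is immutable; bytes-copy
;; produces a mutable byte string with the same content.
(define literal #"Apple")
(define copy (bytes-copy literal))

;; Replace the first byte with the ASCII code of #\a (97).
(bytes-set! copy 0 (char->integer #\a))

;; bytes-append builds a new byte string from its arguments.
(define joined (bytes-append copy #" pie"))

(immutable? literal)
(immutable? copy)
copy
(bytes-length joined)
(bytes->list #"AB")   ; the bytes as a list of exact integers
```

Calling bytes-set! directly on the literal would raise a contract error, which is why the sketch works on a copy.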

The display form of a byte string writes its raw bytes to the current output port (see §8 “Input and Output”). Technically, display of a normal (i.e., character) string prints the UTF-8 encoding of the string to the current output port, since output is ultimately defined in terms of bytes; display of a byte string, however, writes the raw bytes with no encoding.

Along the same lines, when this documentation shows output, it technically shows the UTF-8-decoded form of the output.

Examples:

> (display #"Apple")

Apple

> (display "\316\273")  ; same as "Î»"

Î»

> (display #"\316\273") ; UTF-8 encoding of λ

λ
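To make the distinction between characters and bytes concrete, the following sketch compares the length of a one-character string with the length of its UTF-8 encoding, and captures what display and write send to an output port. It assumes with-output-to-string from racket/port; the variable names are illustrative.

```racket
#lang racket/base
(require racket/port) ; provides with-output-to-string

(define lam "\u03BB")                       ; the one-character string "λ"
(define encoded (string->bytes/utf-8 lam))  ; its UTF-8 encoding

(string-length lam)     ; counts characters
(bytes-length encoded)  ; counts bytes
(equal? encoded #"\316\273")

;; Capture the output of display and write as strings.
(define shown
  (with-output-to-string (lambda () (display #"Apple"))))
(define written
  (with-output-to-string (lambda () (write #"Apple"))))

shown
written
```

Here display emits only the raw bytes, while write emits the printed form with the # prefix and double quotes.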

For explicitly converting between strings and byte strings, Racket supports three kinds of encodings directly: UTF-8, Latin-1, and the current locale’s encoding. General facilities for byte-to-byte conversions (especially to and from UTF-8) fill the gap to support arbitrary string encodings.

Examples:

> (bytes->string/utf-8 #"\316\273")

"λ"

> (bytes->string/latin-1 #"\316\273")

"Î»"

> (parameterize ([current-locale "C"])  ; C locale supports ASCII,
    (bytes->string/locale #"\316\273")) ; only, so...

bytes->string/locale: byte string is not a valid encoding for the current locale
  byte string: #"\316\273"

> (let ([cvt (bytes-open-converter "cp1253" ; Greek code page
                                   "UTF-8")]
        [dest (make-bytes 2)])
    (bytes-convert cvt #"\353" 0 1 dest)
    (bytes-close-converter cvt)
    (bytes->string/utf-8 dest))

"λ"

§4.5 “Byte Strings” in The Racket Reference provides more on byte strings and byte-string procedures.
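The conversions described in this section also run in the other direction. As a minimal sketch, the module below encodes one string as both UTF-8 and Latin-1, checks that each round trip restores it, and uses the optional error-character argument of bytes->string/utf-8 to tolerate invalid input. The sample string and variable names are assumptions chosen for illustration.

```racket
#lang racket/base

(define s "caf\u00E9")                     ; "café", four characters
(define utf8   (string->bytes/utf-8 s))    ; é takes two bytes in UTF-8
(define latin1 (string->bytes/latin-1 s))  ; é takes one byte in Latin-1

(bytes-length utf8)
(bytes-length latin1)

;; Decoding with the matching decoder restores the original string.
(equal? s (bytes->string/utf-8 utf8))
(equal? s (bytes->string/latin-1 latin1))

;; The optional second argument is a character that is substituted
;; for bytes that are not part of a valid UTF-8 encoding.
(define repaired (bytes->string/utf-8 #"A\377" #\?))
repaired
```

Without the error character, decoding #"A\377" as UTF-8 raises an exception, because the byte 255 never appears in valid UTF-8.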

3.6 Symbols

A symbol is an atomic value that prints like an identifier preceded with '. An expression that starts with ' and continues with an identifier produces a symbol value.

Examples:

> 'a

'a

> (symbol? 'a)

#t

For any sequence of characters, exactly one corresponding symbol is interned; calling the string->symbol procedure, or reading a syntactic identifier, produces an interned symbol.

Since interned symbols can be cheaply compared with eq? (and thus eqv? or equal?), they serve as convenient values to use for tags and enumerations.

Symbols are case-sensitive. By using a #ci prefix or in other ways, the reader can be made to case-fold character sequences to arrive at a symbol, but the reader preserves case by default.

Examples:

> (eq? 'a 'a)

#t

> (eq? 'a (string->symbol "a"))

#t

> (eq? 'a 'b)

#f

> (eq? 'a 'A)

#f

> #ci'A

'a
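Because interned symbols compare cheaply, they work well as enumeration tags. The following sketch dispatches on a symbol with case; the function color->hex and its set of colors are hypothetical, introduced only to illustrate the idea.

```racket
#lang racket/base

;; color->hex is a hypothetical function for illustration.
;; case compares its argument against each quoted datum with equal?,
;; which for interned symbols agrees with eq?.
(define (color->hex color)
  (case color
    [(red)   "#ff0000"]
    [(green) "#00ff00"]
    [(blue)  "#0000ff"]
    [else (error 'color->hex "unknown color: ~s" color)]))

(color->hex 'green)

;; A symbol built at run time is the same interned value.
(color->hex (string->symbol "blue"))
```

Since string->symbol returns the interned symbol, tags read from external input can be matched against the same clauses as literal tags.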

Any string (i.e., any character sequence) can be supplied to string->symbol to obtain the corresponding symbol. For reader input, any character can appear directly in an identifier, except for whitespace and the following special characters:

( ) [ ] { } " , ' ` ; # | \

Actually, # is disallowed only at the beginning of a symbol, and then only if not followed by %; otherwise, # is allowed, too. Also, . by itself is not a symbol.

Whitespace or special characters can be included in an identifier by quoting them with | or \. These quoting mechanisms are used in the printed form of identifiers that contain special characters or that might otherwise look like numbers.

Examples:

> (string->symbol "one, two")

'|one, two|

> (string->symbol "6")

'|6|

§1.3.2 “Reading Symbols” in The Racket Reference documents the fine points of the syntax of symbols.

The write function prints a symbol without a ' prefix. The display form of a symbol is the same as the corresponding string.

Examples:

> (write 'Apple)

Apple

> (display 'Apple)

Apple

> (write '|6|)

|6|

> (display '|6|)

6
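The quoting and printing rules can also be observed without printing to a port. In the sketch below, format with ~s produces the write form and ~a produces the display form; the variable name sym is illustrative.

```racket
#lang racket/base

(define sym (string->symbol "one, two"))

;; symbol->string recovers the characters without any quoting.
(symbol->string sym)

;; Both quoting mechanisms read as the same interned symbols.
(eq? sym '|one, two|)
(eq? 'a\ b (string->symbol "a b"))

;; ~s formats like write; ~a formats like display.
(format "~s" sym)
(format "~a" sym)
(format "~s" '|6|)
(format "~a" '|6|)
```

The write form keeps the | quoting so that the text can be read back as the same symbol, while the display form drops it.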

The gensym and string->uninterned-symbol procedures generate fresh uninterned symbols that are not equal (according to eq?) to any previously interned or uninterned symbol. Uninterned symbols are useful as fresh tags that cannot be confused with any other value.

Examples:

> (define s (gensym))

> s

'g42

> (eq? s 'g42)

#f

> (eq? 'a (string->uninterned-symbol "a"))

#f
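One possible use of a fresh tag is as a sentinel that no stored value can collide with. The sketch below is an illustration under assumptions: missing and has-key? are hypothetical names, and the built-in hash-has-key? would normally serve the same purpose.

```racket
#lang racket/base

;; Two uninterned symbols with the same characters are still distinct.
(define u1 (string->uninterned-symbol "tag"))
(define u2 (string->uninterned-symbol "tag"))

(eq? u1 u2)
(eq? u1 'tag)
(symbol-interned? 'tag)
(symbol-interned? u1)

;; missing is a hypothetical sentinel: a fresh symbol that cannot be
;; eq? to any value already stored in a table.
(define missing (gensym 'missing))

;; has-key? is a hypothetical helper that works even when the table
;; maps a key to #f.
(define (has-key? table key)
  (not (eq? (hash-ref table key missing) missing)))

(define table (hash 'a #f))
(has-key? table 'a)
(has-key? table 'b)
```

Using #f as the default would misreport the key 'a as absent; the fresh symbol avoids that ambiguity.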

3.7 Keywords

A keyword value is similar to a symbol (see §3.6 “Symbols”), but its printed form is prefixed with #:.

§1.3.15 “Reading Keywords” in The Racket Reference documents the fine points of the syntax of keywords.

Examples:

> (eq? '#:apple (string->keyword "apple"))

#t

More precisely, a keyword is analogous to an identifier; in the same way that an identifier can be quoted to produce a symbol, a keyword can be quoted to produce a value. The same term “keyword” is used in both cases, but we sometimes use keyword value to refer more specifically to the result of a quote-keyword expression or of string->keyword. An unquoted keyword is not an expression, just as an unquoted identifier does not produce a symbol:

Examples:

> not-a-symbol-expression

not-a-symbol-expression: undefined;
 cannot reference an identifier before its definition
  in module: top-level

> #:not-a-keyword-expression

eval:2:0: #%datum: keyword misused as an expression
  at: #:not-a-keyword-expression

Despite their similarities, keywords are used in a different way than identifiers or symbols.

Keywords are intended for use (unquoted) as special markers in argument lists and in certain syntactic forms. For run-time flags and enumerations, use symbols instead of keywords. The example below illustrates the distinct roles of keywords and symbols.

Examples:

> (define dir (find-system-path 'temp-dir)) ; not '#:temp-dir

> (with-output-to-file (build-path dir "stuff.txt")
    (lambda () (printf "example\n"))
    ; optional #:mode argument can be 'text or 'binary
    #:mode 'text
    ; optional #:exists argument can be 'replace, 'truncate, ...
    #:exists 'replace)
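The same division of labor applies when defining a function: keywords mark the arguments, and symbols carry the run-time choice. The function greet below is hypothetical, a minimal sketch of that pattern rather than anything from the Racket libraries.

```racket
#lang racket/base

;; greet is a hypothetical function for illustration.
;; #:greeting and #:style are unquoted keywords marking optional
;; arguments; the style itself is a symbol such as 'plain or 'loud.
(define (greet name #:greeting [greeting "Hello"] #:style [style 'plain])
  (define text (string-append greeting ", " name))
  (case style
    [(plain) text]
    [(loud)  (string-append (string-upcase text) "!")]
    [else (error 'greet "unknown style: ~s" style)]))

(greet "Ada")
(greet "Ada" #:greeting "Hi")
(greet "Ada" #:style 'loud)

;; A quoted keyword is an ordinary value.
(keyword? '#:style)
(keyword->string '#:style)
```

Keyword arguments can appear in any order at the call site, which is why they suit optional settings such as #:mode and #:exists in with-output-to-file.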

In document The Racket Guide. Version Matthew Flatt, Robert Bruce Findler, and PLT. February 1, 2022 (Page 50-55)