
In document AINN ICT4101 FS (Page 51-57)

2.15 The SWI-Prolog syntax

2.15.1 ISO Syntax Support

This section lists various extensions w.r.t. the ISO Prolog syntax.

Processor Character Set

The processor character set specifies the class of each character used for parsing Prolog source text. Character classification is fixed to Unicode. See also section 2.18.

Nested comments

SWI-Prolog allows for nesting /* ... */ comments. Where the ISO standard accepts /* ... /* ... */ as a comment, SWI-Prolog will search for a terminating */. This is useful if some code with /* ... */ comment statements in it should be commented out. This modification also avoids unintended commenting in the example below, where the closing */ of the first comment has been forgotten.[19]

/* comment
code

/* second comment */
code

[19] Recent copies of GCC give a style warning if /* is encountered in a comment, which suggests that this problem has been recognised more widely.
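The following minimal Python model illustrates the nesting rule. It is a comment stripper that counts /* and */ pairs. It is only a sketch. The function name strip_nested_comments is hypothetical, and the model ignores quoted atoms, strings and % line comments. Reporting an unterminated comment as a SyntaxError is also an assumption of the model, not documented SWI-Prolog behaviour.

```python
def strip_nested_comments(text: str) -> str:
    """Remove /* ... */ comments, letting them nest.

    Simplified model: quoted atoms, strings and % comments are not tracked.
    """
    out = []
    depth = 0  # current /* ... */ nesting level
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:  # only keep text outside comments
                out.append(text[i])
            i += 1
    if depth > 0:
        # Assumption of this model: report a missing */ as an error.
        raise SyntaxError("unterminated /* comment")
    return "".join(out)


print(strip_nested_comments("a /* x /* y */ z */ b"))  # prints "a  b"
```

Applied to the example above, the nesting counter never returns to zero. The forgotten */ therefore surfaces as an error instead of silently swallowing the first line of code.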

Character Escape Syntax

Within quoted atoms (using single quotes: '<atom>') special characters are represented using escape sequences. An escape sequence is led in by the backslash (\) character. The list of escape sequences is compatible with the ISO standard but contains some extensions, and the interpretation of numerically specified characters is slightly more flexible to improve compatibility. Undefined escape characters raise a syntax_error exception.[20]

\a

Alert character. Normally the ASCII character 7 (beep).

\b

Backspace character.

\c

No output. All input characters up to but not including the first non-layout character are skipped. This allows for the specification of pretty-looking long lines. Not supported by ISO. Example:

format('This is a long line that looks better if it was \c
       split across multiple physical lines in the input')

\<NEWLINE>

When in ISO mode (see the Prolog flag iso), only skip this sequence. In native mode, white space that follows the newline is skipped as well and a warning is printed, indicating that this construct is deprecated and advising to use \c. We advise using \c or putting the layout before the \, as shown below. Using \c is supported by various other Prolog implementations and will remain supported by SWI-Prolog. The style shown below is the most compatible solution.[21]

format('This is a long line that looks better if it was \
split across multiple physical lines in the input')

instead of

format('This is a long line that looks better if it was\
       split across multiple physical lines in the input')

[20] Up to SWI-Prolog 6.1.9, undefined escape characters were copied verbatim, i.e., removing the backslash.

[21] Future versions will interpret \<NEWLINE>

\e

Escape character (ASCII 27). Not ISO, but widely supported.

\f

Form-feed character.

\n

Next-line character.

\r

Carriage-return only (i.e., go back to the start of the line).

\s

Space character. Intended to allow writing 0'\s to get the character code of the space character. Not ISO.

\t

Horizontal tab character.

\v

Vertical tab character (ASCII 11).

\xXX..\

Hexadecimal specification of a character. The closing \ is obligatory according to the ISO standard, but optional in SWI-Prolog to enhance compatibility with the older Edinburgh standard. The code \xa\3 emits the character 10 (hexadecimal 'a') followed by '3'. Characters specified this way are interpreted as Unicode characters. See also \u.

\uXXXX

Unicode character specification where the character is specified using exactly 4 hexadecimal digits. This is an extension to the ISO standard, fixing two problems. First, where \x defines a numeric character code, it doesn't specify the character set in which the character should be interpreted. Second, it does not need the idiosyncratic closing \ of the ISO Prolog syntax.

\UXXXXXXXX

Same as \uXXXX, but using 8 digits to cover the whole Unicode set.

\40

Octal character specification. The rules and remarks for hexadecimal specifications apply to octal specifications as well.

\\

Escapes the backslash itself. Thus, '\\' is an atom consisting of a single \.

\'

Single quote. Note that '\'' and '''' both describe the atom with a single ', i.e., '\'' == '''' is true.

\"

Double quote.

\`

Back quote.
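The escape rules listed above can be summarised as a small decoder. The Python function below is a hypothetical model, not SWI-Prolog's reader. decode_escapes takes the text between the quotes of a quoted atom. It treats the closing \ of \x and octal escapes as optional and raises SyntaxError for undefined escapes. The iso parameter stands in for the Prolog flag iso. The deprecation warning printed in native mode and the doubled-quote form '' are not modelled.

```python
SIMPLE = {
    "a": "\a", "b": "\b", "e": "\x1b", "f": "\f", "n": "\n", "r": "\r",
    "s": " ", "t": "\t", "v": "\v",
    "\\": "\\", "'": "'", '"': '"', "`": "`",
}
HEX = "0123456789abcdefABCDEF"
OCT = "01234567"
LAYOUT = " \t\n\r\f\v"


def decode_escapes(body: str, iso: bool = False) -> str:
    """Decode escape sequences in the text between the quotes of an atom."""
    out = []
    i, n = 0, len(body)
    while i < n:
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        if i + 1 == n:
            raise SyntaxError("backslash at end of text")
        esc = body[i + 1]
        i += 2
        if esc in SIMPLE:
            out.append(SIMPLE[esc])
        elif esc == "c":
            # \c: skip all layout up to the next non-layout character.
            while i < n and body[i] in LAYOUT:
                i += 1
        elif esc == "\n":
            # \<NEWLINE>: ISO mode skips only the newline itself; native
            # mode also skips the white space that follows it.
            if not iso:
                while i < n and body[i] in " \t":
                    i += 1
        elif esc == "x" or esc in OCT:
            digits, base = (HEX, 16) if esc == "x" else (OCT, 8)
            start = i if esc == "x" else i - 1  # first octal digit is esc
            j = start
            while j < n and body[j] in digits:
                j += 1
            if j == start:
                raise SyntaxError("\\x needs at least one hexadecimal digit")
            out.append(chr(int(body[start:j], base)))
            if j < n and body[j] == "\\":  # optional closing backslash
                j += 1
            i = j
        elif esc in "uU":
            width = 4 if esc == "u" else 8
            code = body[i:i + width]
            if len(code) != width or any(c not in HEX for c in code):
                raise SyntaxError(f"\\{esc} needs exactly {width} hex digits")
            out.append(chr(int(code, 16)))
            i += width
        else:
            raise SyntaxError(f"undefined escape sequence \\{esc}")
    return "".join(out)
```

In this model, \xa\3 yields character 10 followed by '3', while \xa3 without the closing backslash yields the single character U+00A3. The sequence \40 decodes as octal, i.e. a space (32). This is the interpretation that conflicts with writef/2, which reads \40 as decimal 40.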

Character escaping is only available if current_prolog_flag(character_escapes, true) is active (default). See current_prolog_flag/2. Character escapes conflict with writef/2 in two ways: \40 is interpreted as decimal 40 by writef/2, but as octal 40 (decimal 32) by read. Also, the writef/2 sequence \l is illegal. It is advised to use the more widely supported format/[2,3] predicate instead. If you insist upon using writef/2, either switch character_escapes to false, or use a double \\, as in writef('\\l').

Syntax for non-decimal numbers

SWI-Prolog implements both Edinburgh and ISO representations for non-decimal numbers. According to Edinburgh syntax, such numbers are written as <radix>'<number>, where <radix> is a number between 2 and 36. ISO defines binary, octal and hexadecimal numbers using 0[bxo]<number>. For example, A is 0b100 \/ 0xf00 is a valid expression. Such numbers are always unsigned.

Using digit groups in large integers

SWI-Prolog supports splitting long integers into digit groups. Digit groups can be separated with the sequence <underscore>, <optional white space>. If the <radix> is 10 or lower, they may also be separated with exactly one space. The following all express the integer 1 million:

1_000_000
1 000 000
1_000_/*more*/000

Integers can be printed using this notation with format/2, using the ~I format specifier. For example:

?- format('~I', [1000000]).
1_000_000
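The three integer notations above can be modelled with a short Python parser: Edinburgh <radix>'<number>, the ISO 0b/0o/0x prefixes, and digit groups. This is a sketch under stated assumptions. read_integer and format_grouped are hypothetical names, and the input is a single unsigned token. Comments inside digit groups (as in 1_000_/*more*/000) are not handled. Tokens such as 0'c are rejected because radix 0 is outside 2..36.

```python
import re

ISO_PREFIXES = {"0b": 2, "0o": 8, "0x": 16}


def _join_groups(digits: str, radix: int) -> str:
    """Drop digit-group separators, rejecting empty groups."""
    # "_" followed by optional white space always separates groups;
    # for radix <= 10 a single space is accepted as well.
    separator = r"_\s*| " if radix <= 10 else r"_\s*"
    groups = re.split(separator, digits)
    if not all(re.fullmatch(r"[0-9A-Za-z]+", g) for g in groups):
        raise ValueError(f"malformed digit groups in {digits!r}")
    return "".join(groups)


def read_integer(token: str) -> int:
    """Parse an unsigned integer token in ISO, Edinburgh or grouped form."""
    if token[:2] in ISO_PREFIXES:
        radix, digits = ISO_PREFIXES[token[:2]], token[2:]
    else:
        m = re.fullmatch(r"([0-9]+)'(.+)", token, re.DOTALL)
        if m:
            radix, digits = int(m.group(1)), m.group(2)
            if not 2 <= radix <= 36:
                raise ValueError(f"radix {radix} is outside 2..36")
        else:
            radix, digits = 10, token
    # int() rejects digits that are invalid for the radix.
    return int(_join_groups(digits, radix), radix)


def format_grouped(n: int) -> str:
    """Group decimal digits by three with "_", like format/2's ~I."""
    return f"{n:_d}"
```

Python's own _ format option happens to produce the same grouping as ~I for decimal integers, so format_grouped simply delegates to it.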

The current syntax has been proposed by Ulrich Neumerkel on the SWI-Prolog mailing list.

NaN and Infinity floats and their syntax

SWI-Prolog supports reading and printing ‘special’ floating point values according to "Proposal for Prolog Standard core update wrt floating point arithmetic" by Joachim Schimpf, as available in ECLiPSe Prolog. In particular,

• Infinity is printed as 1.0Inf or -1.0Inf. Any sequence matching the regular expression [+-]?\d+[.]\d+Inf is mapped to plus or minus infinity.

• NaN (Not a Number) is printed as 1.xxxNaN, where 1.xxx is the float after replacing the exponent by ‘1’. Such numbers are read, resulting in the same NaN. The NaN constant can also be produced using the function nan/0, e.g.,

?- A is nan.
A = 1.5NaN.

Note that, compliant with the ISO standard, SWI-Prolog arithmetic (see section 4.27) never returns one of the above values but instead raises an exception, e.g.,

?- A is 1/0.

ERROR: //2: Arithmetic: evaluation error: ‘zero_divisor’

There is one exception to this rule. For compatibility the functions inf/0 and nan/0 return 1.0Inf and the default system NaN. The ability to create, read and write such values is primarily provided to exchange data with languages that can represent the full range of IEEE doubles.
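The textual forms 1.0Inf and 1.5NaN follow from the IEEE 754 bit layout of doubles. The Python sketch below uses hypothetical function names. It assumes 64-bit IEEE doubles and assumes that NaN payloads survive struct packing on the host platform. It shows why the default quiet NaN prints as 1.5NaN: keeping its fraction bits but substituting the exponent of 1.0 yields 1.5.

```python
import math
import re
import struct

EXP_MASK = 0x7FF << 52          # exponent field of an IEEE 754 double
ONE_EXP = 0x3FF << 52           # exponent field of 1.0
FRAC_MASK = (1 << 52) - 1       # fraction (mantissa) field
SPECIAL = re.compile(r"([+-]?)([0-9]+\.[0-9]+)(Inf|NaN)")


def bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(b: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def write_special(x: float) -> str:
    """Print a float, using the 1.0Inf / 1.xxxNaN forms for special values."""
    if math.isinf(x):
        return "1.0Inf" if x > 0 else "-1.0Inf"
    if math.isnan(x):
        # Keep sign and fraction bits, substitute the exponent of 1.0.
        return repr(from_bits((bits(x) & ~EXP_MASK) | ONE_EXP)) + "NaN"
    return repr(x)


def read_special(text: str) -> float:
    """Read the Inf/NaN syntax back into a float."""
    m = SPECIAL.fullmatch(text)
    if m is None:
        raise ValueError(f"not a special float: {text!r}")
    sign, number, kind = m.groups()
    if kind == "Inf":
        return -math.inf if sign == "-" else math.inf
    fraction = bits(float(number)) & FRAC_MASK
    if fraction == 0:
        # All-ones exponent with a zero fraction would be infinity, not NaN.
        raise ValueError("a NaN needs a non-zero fraction, e.g. 1.5NaN")
    sign_bit = 1 << 63 if sign == "-" else 0
    return from_bits(sign_bit | EXP_MASK | fraction)
```

Reading and then writing a NaN reproduces the same text, which mirrors the round-trip property described above.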

Force only underscore to introduce a variable

According to the ISO standard and most Prolog systems, identifiers that start with an uppercase letter or an underscore are variables. In the past, Prolog by BIM provided an alternative syntax, where only the underscore (_) introduces a variable. As of SWI-Prolog 7.3.27, SWI-Prolog supports this alternative syntax, controlled by the Prolog flag var_prefix. Like the character_escapes flag, this flag is maintained per module, where the default is false, supporting standard syntax.

Having only the underscore introduce a variable is particularly useful if code contains identifiers for case-sensitive external languages. Examples are the RDF library, where code frequently specifies property and class names,[22] and the R interface for specifying functions or variables that start with an uppercase character. Lexical databases, where part of the terms start with an uppercase letter, are another category where this option improves the readability of the code.
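The effect of var_prefix on classification reduces to a short rule. In this Python sketch, is_variable is a hypothetical helper, and its var_prefix argument merely mirrors the Prolog flag. It classifies an identifier token that the tokenizer has already recognised.

```python
def is_variable(name: str, var_prefix: bool = False) -> bool:
    """Decide whether an identifier token denotes a variable.

    var_prefix mirrors the Prolog flag of the same name (default false).
    """
    if name.startswith("_"):
        return True               # an underscore always starts a variable
    if var_prefix:
        return False              # only "_" introduces variables
    return name[:1].isupper()     # standard syntax: uppercase start, too
```

With the flag enabled, an RDF class name such as Person stays an atom, while _Person remains a variable.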

Unicode Prolog source

The ISO standard specifies the Prolog syntax in ASCII characters. As SWI-Prolog supports Unicode in source files, we must extend the syntax. This section describes the implications for the source files, while writing international source files is described in section 3.1.3.

The SWI-Prolog Unicode character classification is based on version 6.0.0 of the Unicode standard. Please note that char_type/2 and friends, intended to be used with all text except Prolog source code, are based on the C library locale-based classification routines.

• Quoted atoms and strings

Any character of any script can be used in quoted atoms and strings. The escape sequences \uXXXX and \UXXXXXXXX (see section 2.15.1) were introduced to specify Unicode code points in ASCII files.

• Atoms and Variables

We handle them in one item as they are closely related. The Unicode standard defines a syntax for identifiers in computer languages.[23] In this syntax identifiers start with ID_Start followed by a sequence of ID_Continue codes. Such sequences are handled as a single token in SWI-Prolog. The token is a variable iff it starts with an uppercase character or an underscore (_). Otherwise it is an atom. Note that many languages do not have the notion of character case. In such languages variables must be written as _name.

• White space

All characters marked as separators (Z*) in the Unicode tables are handled as layout characters.

• Control and unassigned characters

Control and unassigned (C*) characters produce a syntax error if encountered outside quoted atoms/strings and outside comments.

• Other characters

The first 128 characters follow the ISO Prolog standard. Unicode symbol and punctuation characters (general category S* and P*) act as glueing symbol characters (i.e., just like ==: an unquoted sequence of symbol characters is combined into an atom).

Other characters (this is mainly No: a numeric character of other type) are currently handled as ‘solo’.
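The character rules above can be approximated in Python with the unicodedata module. This is only a model. Python's str.isidentifier uses the XID_Start/XID_Continue properties of the Unicode version bundled with the interpreter. SWI-Prolog, by contrast, uses Unicode 6.0.0 ID_Start/ID_Continue. The names char_class and identifier_kind are hypothetical.

```python
import unicodedata


def char_class(c: str) -> str:
    """Classify a non-ASCII character outside quotes and comments."""
    if ord(c) < 128:
        raise ValueError("the first 128 characters follow ISO Prolog rules")
    if c.isidentifier():
        return "id_start"            # approximates ID_Start
    if ("a" + c).isidentifier():
        return "id_continue"         # approximates ID_Continue
    category = unicodedata.category(c)
    if category[0] == "Z":
        return "layout"              # separators are white space
    if category[0] == "C":
        return "illegal"             # control/unassigned: syntax error
    if category[0] in "SP":
        return "symbol"              # glues with other symbol chars
    return "solo"                    # mainly No


def identifier_kind(token: str) -> str:
    """Variable iff the token starts with an uppercase letter or '_'."""
    if not token.isidentifier():
        raise ValueError(f"{token!r} is not an identifier")
    first = token[0]
    if first == "_" or unicodedata.category(first) == "Lu":
        return "variable"
    return "atom"
```

A token in a caseless script, such as a Japanese word, is classified as an atom unless it is prefixed with an underscore. This matches the requirement to write such variables as _name.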

Singleton variable checking

A singleton variable is a variable that appears only one time in a clause. It can always be replaced by _, the anonymous variable. In some cases, however, people prefer to give the variable a name. As mistyping a variable is a common mistake, Prolog systems generally give a warning (controlled by style_check/1) if a variable is used only once. The system can be informed that a variable is meant to appear once by starting it with an underscore, e.g., _Name. Please note that any variable, except plain _, shares with variables of the same name. The term t(_X, _X) is equivalent to t(X, X), which is different from t(_, _).

As Unicode requires variables to start with an underscore in many languages, this schema needs to be extended.[24] First we define the two classes of named variables.

• Named singleton variables

Named singletons start with a double underscore (__) or a single underscore followed by an uppercase letter, e.g., __var or _Var.

• Normal variables

All other variables are ‘normal’ variables. Note that this makes _var a normal variable.[25]
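The two classes, combined with the reporting rule in the next paragraph, can be expressed as a small check. In this hypothetical Python sketch, a clause is represented simply as the list of its variable names in order of occurrence. singleton_warnings returns two lists: the normal variables used once, and the named singletons used more than once.

```python
from collections import Counter


def variable_class(name: str) -> str:
    if name == "_":
        return "anonymous"
    if name.startswith("__") or (name.startswith("_") and name[1:2].isupper()):
        return "named_singleton"     # e.g. __var, _Var
    return "normal"                  # includes _var


def singleton_warnings(occurrences: list[str]) -> tuple[list[str], list[str]]:
    """Return (normal variables used once, named singletons used more than once)."""
    counts = Counter(occurrences)    # same name means same variable
    used_once, reused = [], []
    for name, count in counts.items():
        kind = variable_class(name)
        if kind == "normal" and count == 1:
            used_once.append(name)
        elif kind == "named_singleton" and count > 1:
            reused.append(name)
    return used_once, reused
```

Plain _ is never reported because each occurrence is a fresh anonymous variable. _X used twice is reported because, like X, it names one shared variable.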

Any normal variable appearing exactly once in the clause and any named singleton variables appearing more than once are reported. Below are some examples with warnings in the right column. Singleton messages can be suppressed using the style_check/1 directive.

[23] http://www.unicode.org/reports/tr31/

[24] After a proposal by Richard O'Keefe.
