SQL Language Elements - SQL-99 Complete Really

The SQL Standard has numerous rules for such basic issues as what makes a legal name and how to put together SQL syntax. The starting point for these rules is knowing what the basic scalar language elements are. The SQL basic scalar language elements are defined in the set of <SQL language character>s.

<SQL language character>

According to the SQL Standard, the syntactic element <SQL language character> defines "the terminal symbols of the SQL language and the elements of strings." In other words, you'll use

<SQL language character>s to write SQL syntax or <token>s. <SQL language character>s are case insensitive; that is, uppercase and lowercase simple Latin letters are interchangeable so that, to a SQL parser, these three words are exactly alike:

SELECT select Select

The set of <SQL language character>s contains:

• The uppercase simple Latin letters A to Z.

• The lowercase simple Latin letters a to z.

• The digits 0 to 9.

• The set of <SQL special character>s.

<SQL special character>

The set of <SQL special character>s is part of the set of <SQL language character>s and contains:break

The space character ( The left parenthesis ) The right parenthesis

" The double quote mark ' The single quote mark

% The percent sign

& The ampersand

* The asterisk or multiplication sign / The solidus or division sign + The plus sign

- The minus sign or dash

, The comma

. The period

Page 35

: The colon

; The semicolon

< The less than operator

> The greater than operator

? The question mark [ The left bracket ] The right bracket _ The underscore

| The vertical bar

= The equals operator { The left brace } The right brace

^ The circumflex

<token>

A <token> is either a <literal>, a <keyword>, an <identifier>, or an <SQL special character> or symbol — that is, a <token> is a group of characters that is recognized as a single unit by a SQL parser. For example, there are a total of 7 <token>s (SELECT, a, +, 5, FROM, t, and ;) in the following SQL statement.

SELECT a+5 FROM t;

In SQL, <token>s are grouped into two types: <nondelimiter token>s and <delimiter token>s.

The difference between them lies in the fact that, while any <token> may be followed by a

<separator>, a <nondelimiter token> must be followed either by a <separator> or a <delimiter token>.

A <nondelimiter token> is an <unsigned numeric literal>, a <national character string literal>, a

<bit string literal>, a <hex string literal>, a <keyword>, or a <regular identifier>. A <delimiter token> is a <character string literal>, a <date literal>, a <time literal>, a <timestamp literal>, an

<interval literal>, a <delimited identifier>, an <SQL special character>, or one of these symbols:break

<> The not equals operator

>= The greater than or equals operator

<= The less than or equals operator

|| The concatenation operator

??( The left trigraph

??) The right trigraph -> The right arrow

=> The keyword parameter tag

Page 36

For example, the <keyword> <token> SELECT may be followed either by a <separator> (usually a space) or by an <SQL special character>. Thus, both of the following are examples of legal SQL syntax.

SELECT column_1

is legal syntax because a space separates the <token> SELECT from the <token> column_1.

SELECT*

is legal syntax because, although no space separates the <token> SELECT from the <token> *, the asterisk is identified as a separate <token> because it is an <SQL special character>.

A <token> may not include any <separator>s unless it is a <character string literal>, a <bit string literal>, a <hex string literal>, a <timestamp literal>, an <interval literal>, or a <delimited identifier>.

<separator>

[Obscure Rule] applies to this entire section.

Your SQL parser must know where one <token> ends and another begins. To do so, it recognizes white space, a newline character, a simple comment, and a bracketed comment as <separator>s.

White space is usually just one or more spaces, but it can also consist of any consecutive sequence of these Unicode characters:break

U+0009 Horizontal Tab

U+000A Line Feed

U+000B Vertical Tabulation

U+000C Form Feed

U+000D Carriage Return

U+0020 Space

U+00A0 No-Break Space

U+2000 En Quad

U+2001 Em Quad

U+2002 En Space

U+2003 Em Space

U+2004 Three-Per-Em Space U+2005 Four-Per-Em Space U+2006 Six-Per-Em Space U+2007 Figure Space U+2008 Punctuation Space U+2009 Thin Space

U+200A Hair Space

U+200B Zero Width Space

Page 37

U+200C Zero Width Non-Joiner U+200D Zero Width Joiner U+200E Left-To-Right Mark U+200F Right-To-Left Mark U+3000 Ideographic Space U+2028 Line Separator U+2029 Paragraph Separator

U+FEFF Zero Width No-Break Space

[NON-PORTABLE] A newline character marks the end of a line. It is non-standard because the SQL Standard requires implementors to define which white space character(s) will be recognized as end-of-line indicators by their parsers. [OCELOT Implementation] The OCELOT DBMS that comes with this book recognizes carriage returns and line feeds as new-line characters.

A simple comment begins with two or more consecutive dashes, contains any number of

characters (including spaces and more dashes), and ends with a newline character. For example, these two SQL statements are both followed by a simple comment.

SELECT a+5 FROM t; -- this is a simple comment SELECT a+5 FROM t; --- this is a simple comment too

A bracketed comment is a C-style comment. It begins with /*, ends with */, and contains any number of characters including zero or more <separator>s. For example, this SQL statement is followed by a bracketed comment.

SELECT a+5 FROM t; /* this is a bracketed comment that contains a carriage return */

If you want to restrict your code to Core SQL, don't use bracketed comments.

<literal>

A <literal> is a <token> that represents a non-null data value. SQL values are normally

atomic — they cannot be subdivided — and are either non-null values or the null value. The null value isn't represented by a <literal>. Instead, the <keyword> NULL is used whenever it's

necessary to indicate that the null value is represented.

[NON-PORTABLE] The logical representation of the null value is non-standard because the SQL Standard requires implementors to define that character used to display the null value.

[OCELOT Implementation] The OCELOT DBMS that comes with this book displays a question mark to represent the null value.

In SQL, a <literal> is either a signed <numeric literal> (for example, +52.6), an unsigned

<numeric literal> (for example, 15), or a general literal. An unsigned literal is thus either an unsigned <numeric literal> or a general literal. A general literal is one of the following:

• A <bit string literal>, for example, B'1011'

• A <hex string literal>, for example, X'4A'

• A <binary string literal>, for example, X'44AF'break

Page 38

• A <character string literal>, for example, 'hello'

• A <national character string literal>, for example, N'hello'

• A <date literal>, for example, DATE '1997-07-15'

• A <time literal>, for example,

TIME '19:30:20' TIME '19:30:20.05' TIME '19:30:20+03:00'

• A <timestamp literal>, for example,

TIMESTAMP '1997-07-15 19:30:20' TIMESTAMP '1997-07-15 19:30:20.05'

TIMESTAMP '1997-07-15 19:30:20.05-10:30'

• A <year-month literal>, for example,

INTERVAL '20' YEAR INTERVAL '10' MONTH

INTERVAL '20-10' YEAR TO MONTH

• A <day-time literal>, for example,

INTERVAL -'20' DAY INTERVAL '-10' HOUR INTERVAL '15' MINUTE INTERVAL '10' SECOND

INTERVAL '20 10:15:10' DAY TO SECOND

• A <boolean literal>, either TRUE, FALSE, or UNKNOWN.

<keyword>

A <keyword> is a word that has a special meaning for the SQL parser. There are two types of SQL <keyword>s: reserved <keyword>s and non-reserved <keyword>s. Reserved <keyword>s may not be used as <regular identifier>s. Non-reserved <keyword>s are not so restricted, but it's probably not a good idea to use them as <regular identifier>s anyway.break

Page 39

A <keyword> is case insensitive because all its characters are part of the set of <SQL language character>s. That is, uppercase and lowercase letters within a <keyword> are interchangeable; for example, these three <keyword>s are exactly alike to a SQL parser:break

SELECT select Select

The Set of Reserved <keyword>s

ABSOLUTE CASCADED CURRENT_ROLE DO

ACTION CASE CURRENT_TIME DOMAIN

ADD CAST CURRENT_TIMESTAMP DOUBLE

ADMIN CATALOG CURRENT_USER DROP

AFTER CHAR CURSOR DYNAMIC

AGGREGATE CHARACTER CYCLE EACH

ALIAS CHECK DATA ELSE

ALL CLASS DATALINK ELSEIF

ALLOCATE CLOB DATE END

ALTER CLOSE DAY END-EXEC

AND COLLATE DEALLOCATE EQUALS

ANY COLLATION DEC ESCAPE

ARE COLUMN DECIMAL EVERY

ARRAY COMMIT DECLARE EXCEPT

AS COMPLETION DEFAULT EXCEPTION

ASC CONDITION DEFERRABLE EXEC

ASSERTION CONNECT DEFERRED EXECUTE

AT CONNECTION DELETE EXIT

AUTHORIZATION CONSTRAINT DEPTH EXPAND

BEFORE COSTRAINTS DEREF EXPANDING

BEGIN CONSTRUCTOR DESC EXTERNAL

BINARY CONTAINS DESCRIBE FALSE

BIT CONTINUE DESCRIPTOR FETCH

BLOB CORRESPONDING DESTROY FIRST

BOOLEAN CREATE DESTRUCTOR FLOAT

BOTH CROSS DETERMINISTIC FOR

BREADTH CUBE DICTIONARY FOREIGN

BY CURRENT DIAGNOSTICS FOUND

CALL CURRENT_DATE DISCONNECT FROM

CASCADE CURRENT_PATH DISTINCT FREE

Page 40

FULL LANGUAGE OF REF

FUNCTION LARGE OFF REFERENCES

GENERAL LAST OLD REFERENCING

GET LATERAL ON RELATIVE

GLOBAL LEADING ONLY REPEAT

GO LEAVE OPEN RESIGNAL

GOTO LEFT OPERATION RESTRICT

GRANT LESS OPTION RESULT

GROUP LEVEL OR RETURN

GROUPING LIKE ORDER RETURNS

HANDLER LIMIT ORDINALITY REVOKE

HAVING LOCAL OUT RIGHT

HASH LOCALTIME OUTER ROLE

HOST LOCALTIMESTAMP OUTPUT ROLLBACK

HOUR LOCATOR PAD ROLLUP

IDENTITY LOOP PARAMETER ROUTINE

IF MATCH PARAMETERS ROW

IGNORE MEETS PARTIAL ROWS

IMMEDIATE MINUTE PATH SAVEPOINT

IN MODIFIES PERIOD SCHEMA

INDICATOR MODIFY POSTFIX SCROLL

INITIALIZE MODULE PRECEDES SEARCH

INITIALLY MONTH PRECISION SECOND

INNER NAMES PREFIX SECTION

INOUT NATIONAL PREORDER SELECT

INPUT NATURAL PREPARE SEQUENCE

INSERT NCHAR PRESERVE SESSION

INT NCLOB PRIMARY SESSION_USER

INTEGER NEW PRIOR SET

INTERSECT NEXT PRIVILEGES SETS

INTERVAL NO PROCEDURE SIGNAL

INTO NONE PUBLIC SIZE

IS NORMALIZE READ SMALLINT

ISOLATION NOT READS SOME

ITERATE NULL REAL SPACE

JOIN NUMERIC RECURSIVE SPECIFIC

KEY OBJECT REDO SPECIFICTYPE

Page 41

SQL THAN UNDER VARIABLE

SQLEXCEPTION THEN UNDO VARYING

SQLSTATE TIME UNION VIEW

SQLWARNING TIMESTAMP UNIQUE WHEN

START TIMEZONE_HOUR UNKNOWN WHENEVER

STATE TIMEZONE_MINUTE UNTIL WHERE

STATIC TO UPDATE WHILE

STRUCURE TRAILING USAGE WITH

SUCCEEDS TRANSACTION USER WITHOUT

SYSTEM_USER TRANSLATION USING WORK

TABLE TREAT VALUE WRITE

TEMPORARY TRIGGER VALUES YEAR

TERMINATE TRUE VARCHAR ZONE

Note: SQL-92 and SQL3 both added a considerable number of words to the set of SQL reserved <keyword>s. The Standard acknowledges this and — as an aid to users — suggests that you include either a digit or an underline character in your <regular identifier>s and avoid names that begin with CURRENT_, SESSION_, SYSTEM_, or TIMEZONE_ and those that end with _LENGTH to avoid conflicts with reserved <keyword>s added in future revisions.

The Set of Non-Reserved <keyword>s

ABS CHARACTER_LENGTH CONDITION_NUMBER

ADA CHARACTER_SET_CATALOG CONNECTION_NAME ASENSITIVE CHARACTER_SET_NAME CONSTRAINT_CATALOG ASSIGNMENT CHARACTER_SET_SCHEMA CONSTRAINT_NAME

ASYMMETRIC CHECKED CONSTRAINT_SCHEMA

ATOMIC CLASS_ORIGIN CONTAINS

AVG COALESCE CONTROL

BETWEEN COBOL CONVERT

BIT_LENGTH COLLATION_CATALOG COUNT

BITVAR COLLATION_NAME CURSOR_NAME

BLOCKED COLLATION_SCHEMA DATETIME_INTERVAL_CODE

C COLUMN_NAME DATETIME_INTERVAL_PRECISION

CARDINALITY COMMAND_FUNCTION DB

CATALOG_NAME COMMAND_FUNCTION_CODE DISPATCH

CHAIN COMMITTED DLCOMMENT

CHAR_LENGTH CONCATENATE DLFILESIZE

Page 42

DLFILESIZEEXACT NULLABLE SERVER_NAME

DLLINKTYPE NUMBER SIMPLE

DLURLCOMPLETE NULLIF SOURCE

DLURLPATH OCTET_LENGTH SPECIFIC_NAME

DLURLPATHONLY OPTIONS SIMILAR

DLURLSCHEMA OVERLAPS STRUCTURE

DLURLSERVER OVERLAY SUBLIST

DLVALUE OVERRIDING SUBSTRING

DYNAMIC_FUNCTION PASCAL SUM

DYNAMIC_FUNCTION_CODE PARAMETER_MODE STYLE

EXISTING PARAMETER_ORDINAL_POSITION SUBCLASS_ORIGIN

EXISTS PARAMETER_SPECIFIC_CATALOG SYMMETRIC

EXTRACT PARAMETER_SPECIFIC_NAME SYSTEM

FILE PARAMETER_SPECIFIC_SCHEMA TABLE_NAME

FINAL PERMISSION TRANSACTIONS_COMMITTED

FORTRAN PLI TRANSACTIONS_ROLLED_BACK

GENERATED POSITION TRANSACTION_ACTIVE

HOLD RECOVERY TRANSFORM

INFIX REPEATABLE TRANSLATE

INSENSITIVE RESTORE TRIGGER_CATALOG

INSTANTIABLE RETURNED_LENGTH TRIGGER_SCHEMA

INTEGRITY RETURNED_OCTET_LENGTH TRIGGER_NAME

KEY_MEMBER RETURNED_SQLSTATE TRIM

KEY_TYPE ROUTINE_CATALOG TYPE

LENGTH ROUTINE_NAME UNCOMMITTED

LINK ROUTINE_SCHEMA UNLINK

LOWER ROW_COUNT UNNAMED

MAX ROW_TYPE_CATALOG UPPER

MIN ROW_TYPE_SCHEMA USER_DEFINED_TYPE_CATALOG

MESSAGE_LENGTH ROW_TYPE_NAME USER_DEFINED_TYPE_NAME

MESSAGE_OCTET_LENGTH SCALE USER_DEFINED_TYPE_SCHEMA

MESSAGE_TEXT SCHEMA_NAME YES

METHOD SELECTIVE

MOD SELF

MORE SENSITIVE

MUMPS SERIALIZABLE

NAME

The SQL Standard allows implementations to define more reserved words for their own DBMSs.

Here are some words that are reserved in some dialect of one of the major vendorscontinue

Page 43

(e.g., Oracle, Sybase, Microsoft). You may be able to use these words as <regular identifier>s, but if you do so, you will lose portability.

ABORT DICTIONARY NUMBER ROWNUN

ACCEPT DIGITS NUMBER_BASE ROWTYPE

ANALYZE DISPLACEMENT OFF RUN

ARCHIVELOG DISPOSE OID SEPARATE

ARRAY ELEMENT OLD_TABLE SEQUENCE

ASSIGN ENTRY OPERATOR SQLCA

ASYNCH EXCEPTION_INIT OPERATORS SQLCODE

ATTRIBUTES FACTOR OTHERS SQLERRM

AUDIT FORM PACKAGE SQLWARNING

BACKUP FREELISTS PARTITION STATEMENT

BINARY_INTEGER GENERIC PCTFREE STDDEV

BODY IDENTIFIED PENDANT SUBTYPE

CACHE IGNORE POSITIVE SYMBOL

CHAR_BASE INCLUDE PRAGMA TABAUTH

CLUSTER INDEX PREORDERED TABLES

CLUSTERS INDEXES PRIVATE TASK

COLAUTH INFILE PROTECTED TERM

COLUMNS INSTEAD RAISE TEST

COMPRESS INSTANCE RANGE THERE

CONSTANT LIMITED RAW TUPLE

CRASH LIST RECORD USE

CURVAL MAXEEXTENTS RELEASE VARCHAR2

DATA_BASE MINUS REM VARIANCE

DATABASE MLSLABEL RENAME VIEWS

DBA MODE REPLACE VIRTUAL

DEBUGOFF NEW RESOURCE VISIBLE

DEBUGON NEW_TABEL REUSE WAIT

DEFINITION NEXTVAL REVERSE XOR

DELAY NOCOMPRESS ROWID

DELTA NONE ROWLABEL

<identifier>

An <identifier> (a <token> that names an SQL Object) is a character string, up to 128 characters long, from one Character set. Within a CREATE SCHEMA statement, an <identifier> that doesn't include an explicit <Schema name> names an Object that belongs to the Schema you're creating.

In any other SQL statement, an <identifier> that doesn't include an explicit <Schema name>

names an Object that belongs to the Schema named in the SCHEMA clause (or if there is no SCHEMA clause, in the AUTHORIZATION clause) of the MODULE statement that

definescontinue

Page 44

the Module you're running. SQL recognizes three types of <identifier>s: the <regular identifier>, the <SQL language identifier>, and the <delimited identifier>.

<regular identifier>

The required syntax for a <regular identifier> is:

<regular identifier> ::=

Object name

A <regular identifier> is a character string, up to 128 characters long, that consists only of letters, digits, and underscore characters. It must begin with a letter.

[Obscure Rule] We usually think of a ''letter" as one of the simple Latin letters, but in

fact — depending on the Character set being used — a "letter" can also be an accented character, a character in a non-Latin alphabet, or a syllable or ideograph; i.e., it can be any character with the Unicode alphabetic property or ideographic property. The "letter" that begins a <regular identifier> may not have the Unicode combining property; the letters following it may, with the proviso that these characters are not legal anywhere in a <regular identifier>:

U+06DD Arabic End of Ayah

U+06DE Arabic Start of Rub El Hizb U+20DD Combining Enclosing Circle U+20DE Combining Enclosing Square U+20DF Combining Enclosing Diamond

U+20E0 Combining Enclosing Circle Backslash

Depending on the Character set in use, you may also use these characters in a <regular identifier>, as long as they're not used as the <identifier>'s first character:break

U+00B7 Middle Dot

U+02D0 Modifier Letter Triangular Colon U+20D1 Modifier Letter Half Triangular Colon

U+0640 Arabic Tatweel

U+0E46 Thai Character Maiyamok

U+0EC6 Lao Ko La

U+3005 Ideographic Iteration Mark U+3031 to

U+3035 inclusive

Variations of Vertical Kana Repeat Mark

U+309B to U+309E inclusive

Variations of Combining Katakana-Hiragana Sound Mark and Hiragana Iteration Mark

U+30FC to U+30FE inclusive

Variations of Katakana-Hiragana Prolonged Sound Mark and Katakana Iteration Mark

U+FF70 Halfwidth Katakana-Hiragana Prolonged Sound Mark U+FF9E Halfwidth Katakana Voiced Sound Mark

Page 45

U+FF9F Halfwidth Katakana Semi-voiced Sound Mark U+200C Zero Width Non-Joiner

U+200D Zero Width Joiner

U+200E Left-To-Right Mark

U+200F Right-To-Left Mark

U+202A Left-To-Right Embedding U+202B Right-To-Left Embedding U+202C Pop Directional Formatting U+202D Left-To-Right Override U+202E Right-To-Left Override U+206A Inhibit Symmetric Swapping U+206B Activate Symmetric Swapping U+206C Inhibit Arabic Form Shaping

U+206C Inhibit Arabic Form Shaping U+206D Activate Arabic Form Shaping U+206E National Digit Shapes

U+206F Nominal Digit Shapes U+FEFF Zero-Width No-Break Space

U+203F Undertie

U+2040 Character Tie

U+FE33 Presentation Form for Vertical Low Line U+FE34 Presentation Form for Vertical Wavy Low Line

U+FE4D Dashed Low Line

U+FE4E Centreline Low Line

U+FE4F Wavy Low Line

U+FF3F Fullwidth Low Line

A <regular identifier> is case insensitive. That is, uppercase and lowercase letters within a

<regular identifier> are interchangeable; for example, these three <regular identifier>s are exactly alike to a SQL parser:

SAMS_TABLE sams_table Sams_Table

SQL doesn't allow a reserved <keyword> to be used as a <regular identifier>. When comparing a

<regular identifier> and a reserved <keyword> to check for equality, your DBMS will replace the lowercase letters in each with their uppercase equivalents and assume that both belong to the SQL_TEXT Character set. In fact, your DBMS will replace all lowercase letters in a <regular identifier> with their uppercase equivalents prior to any comparison and prior to storing the

<identifier> either in a Catalog's INFORMATION_SCHEMA or a diagnostics area.break

Page 46

Here are some examples of <regular identifier>s:

TABLE_1 a <regular identifier>

OCELOT_COMPUTER_SERVICES another <regular identifier>

DATE_ a <regular identifier> that looks like a reserved <keyword>

MÜLLER_DATEI a <regular identifier> that doesn't exclusively use simple Latin letters

If you want to restrict your code to Core SQL, make sure your <regular identifier>s are no more than 18 characters long.

<SQL language identifier>

The required syntax for a <SQL language identifier> is:

<SQL language identifier> ::=

Object name

An <SQL language identifier> is a <regular identifier> that consists only of simple Latin letters, digits, and underscore characters. It must begin with a simple Latin letter. Here are two examples of <SQL language identifier>s:

TABLE_1 BOB_SCHEMA

<delimited identifier>

The required syntax for a <delimited identifier> is:

<delimited identifier> ::=

"Object name"

A <delimited identifier> is a character string, up to 128 characters long, surrounded by a pair of double quote marks. (The delimiting double quotes aren't part of the <identifier>, so they're not

In document SQL-99 Complete Really (Page 63-83)