The SQL Standard has numerous rules for such basic issues as what makes a legal name and how to put together SQL syntax. The starting point for these rules is knowing what the basic scalar language elements are. The SQL basic scalar language elements are defined in the set of <SQL language character>s.
<SQL language character>
According to the SQL Standard, the syntactic element <SQL language character> defines "the terminal symbols of the SQL language and the elements of strings." In other words, you'll use
<SQL language character>s to write SQL syntax or <token>s. <SQL language character>s are case insensitive; that is, uppercase and lowercase simple Latin letters are interchangeable so that, to a SQL parser, these three words are exactly alike:
SELECT select Select
The set of <SQL language character>s contains:
• The uppercase simple Latin letters A to Z.
• The lowercase simple Latin letters a to z.
• The digits 0 to 9.
• The set of <SQL special character>s.
<SQL special character>
The set of <SQL special character>s is part of the set of <SQL language character>s and contains:break
The space character ( The left parenthesis ) The right parenthesis
" The double quote mark ' The single quote mark
% The percent sign
% The percent sign
& The ampersand
* The asterisk or multiplication sign / The solidus or division sign + The plus sign
- The minus sign or dash
, The comma
. The period
Page 35
: The colon
; The semicolon
< The less than operator
> The greater than operator
? The question mark [ The left bracket ] The right bracket _ The underscore
| The vertical bar
= The equals operator { The left brace } The right brace
^ The circumflex
<token>
A <token> is either a <literal>, a <keyword>, an <identifier>, or an <SQL special character> or symbol — that is, a <token> is a group of characters that is recognized as a single unit by a SQL parser. For example, there are a total of 7 <token>s (SELECT, a, +, 5, FROM, t, and ;) in the following SQL statement.
SELECT a+5 FROM t;
In SQL, <token>s are grouped into two types: <nondelimiter token>s and <delimiter token>s.
The difference between them lies in the fact that, while any <token> may be followed by a
<separator>, a <nondelimiter token> must be followed either by a <separator> or a <delimiter token>.
A <nondelimiter token> is an <unsigned numeric literal>, a <national character string literal>, a
<bit string literal>, a <hex string literal>, a <keyword>, or a <regular identifier>. A <delimiter token> is a <character string literal>, a <date literal>, a <time literal>, a <timestamp literal>, an
<interval literal>, a <delimited identifier>, an <SQL special character>, or one of these symbols:break
<> The not equals operator
>= The greater than or equals operator
<= The less than or equals operator
|| The concatenation operator
??( The left trigraph
??) The right trigraph -> The right arrow
=> The keyword parameter tag
Page 36
For example, the <keyword> <token> SELECT may be followed either by a <separator> (usually a space) or by an <SQL special character>. Thus, both of the following are examples of legal SQL syntax.
SELECT column_1
is legal syntax because a space separates the <token> SELECT from the <token> column_1.
SELECT*
is legal syntax because, although no space separates the <token> SELECT from the <token> *, the asterisk is identified as a separate <token> because it is an <SQL special character>.
A <token> may not include any <separator>s unless it is a <character string literal>, a <bit string literal>, a <hex string literal>, a <timestamp literal>, an <interval literal>, or a <delimited identifier>.
<separator>
[Obscure Rule] applies to this entire section.
Your SQL parser must know where one <token> ends and another begins. To do so, it recognizes white space, a newline character, a simple comment, and a bracketed comment as <separator>s.
White space is usually just one or more spaces, but it can also consist of any consecutive sequence of these Unicode characters:break
U+0009 Horizontal Tab
U+000A Line Feed
U+000B Vertical Tabulation
U+000C Form Feed
U+000D Carriage Return
U+0020 Space
U+00A0 No-Break Space
U+2000 En Quad
U+2001 Em Quad
U+2002 En Space
U+2002 En Space
U+2003 Em Space
U+2004 Three-Per-Em Space U+2005 Four-Per-Em Space U+2006 Six-Per-Em Space U+2007 Figure Space U+2008 Punctuation Space U+2009 Thin Space
U+200A Hair Space
U+200B Zero Width Space
Page 37
U+200C Zero Width Non-Joiner U+200D Zero Width Joiner U+200E Left-To-Right Mark U+200F Right-To-Left Mark U+3000 Ideographic Space U+2028 Line Separator U+2029 Paragraph Separator
U+FEFF Zero Width No-Break Space
[NON-PORTABLE] A newline character marks the end of a line. It is non-standard because the SQL Standard requires implementors to define which white space character(s) will be recognized as end-of-line indicators by their parsers. [OCELOT Implementation] The OCELOT DBMS that comes with this book recognizes carriage returns and line feeds as new-line characters.
A simple comment begins with two or more consecutive dashes, contains any number of
characters (including spaces and more dashes), and ends with a newline character. For example, these two SQL statements are both followed by a simple comment.
SELECT a+5 FROM t; -- this is a simple comment SELECT a+5 FROM t; --- this is a simple comment too
A bracketed comment is a C-style comment. It begins with /*, ends with */, and contains any number of characters including zero or more <separator>s. For example, this SQL statement is followed by a bracketed comment.
SELECT a+5 FROM t; /* this is a bracketed comment that contains a carriage return */
If you want to restrict your code to Core SQL, don't use bracketed comments.
<literal>
A <literal> is a <token> that represents a non-null data value. SQL values are normally
atomic — they cannot be subdivided — and are either non-null values or the null value. The null value isn't represented by a <literal>. Instead, the <keyword> NULL is used whenever it's
necessary to indicate that the null value is represented.
[NON-PORTABLE] The logical representation of the null value is non-standard because the SQL Standard requires implementors to define that character used to display the null value.
[OCELOT Implementation] The OCELOT DBMS that comes with this book displays a question mark to represent the null value.
In SQL, a <literal> is either a signed <numeric literal> (for example, +52.6), an unsigned
<numeric literal> (for example, 15), or a general literal. An unsigned literal is thus either an unsigned <numeric literal> or a general literal. A general literal is one of the following:
• A <bit string literal>, for example, B'1011'
• A <hex string literal>, for example, X'4A'
• A <binary string literal>, for example, X'44AF'break
Page 38
• A <character string literal>, for example, 'hello'
• A <national character string literal>, for example, N'hello'
• A <date literal>, for example, DATE '1997-07-15'
• A <time literal>, for example,
TIME '19:30:20' TIME '19:30:20.05' TIME '19:30:20+03:00'
• A <timestamp literal>, for example,
TIMESTAMP '1997-07-15 19:30:20' TIMESTAMP '1997-07-15 19:30:20.05'
TIMESTAMP '1997-07-15 19:30:20.05-10:30'
• A <year-month literal>, for example,
INTERVAL '20' YEAR INTERVAL '10' MONTH
INTERVAL '20-10' YEAR TO MONTH
• A <day-time literal>, for example,
INTERVAL -'20' DAY INTERVAL '-10' HOUR INTERVAL '15' MINUTE INTERVAL '10' SECOND
INTERVAL '20 10:15:10' DAY TO SECOND
• A <boolean literal>, either TRUE, FALSE, or UNKNOWN.
<keyword>
A <keyword> is a word that has a special meaning for the SQL parser. There are two types of SQL <keyword>s: reserved <keyword>s and non-reserved <keyword>s. Reserved <keyword>s may not be used as <regular identifier>s. Non-reserved <keyword>s are not so restricted, but it's probably not a good idea to use them as <regular identifier>s anyway.break
Page 39
A <keyword> is case insensitive because all its characters are part of the set of <SQL language character>s. That is, uppercase and lowercase letters within a <keyword> are interchangeable; for example, these three <keyword>s are exactly alike to a SQL parser:break
SELECT select Select
The Set of Reserved <keyword>s
ABSOLUTE CASCADED CURRENT_ROLE DO
ACTION CASE CURRENT_TIME DOMAIN
ADD CAST CURRENT_TIMESTAMP DOUBLE
ADD CAST CURRENT_TIMESTAMP DOUBLE
ADMIN CATALOG CURRENT_USER DROP
AFTER CHAR CURSOR DYNAMIC
AGGREGATE CHARACTER CYCLE EACH
ALIAS CHECK DATA ELSE
ALL CLASS DATALINK ELSEIF
ALLOCATE CLOB DATE END
ALTER CLOSE DAY END-EXEC
AND COLLATE DEALLOCATE EQUALS
ANY COLLATION DEC ESCAPE
ARE COLUMN DECIMAL EVERY
ARRAY COMMIT DECLARE EXCEPT
AS COMPLETION DEFAULT EXCEPTION
ASC CONDITION DEFERRABLE EXEC
ASSERTION CONNECT DEFERRED EXECUTE
AT CONNECTION DELETE EXIT
AUTHORIZATION CONSTRAINT DEPTH EXPAND
BEFORE COSTRAINTS DEREF EXPANDING
BEGIN CONSTRUCTOR DESC EXTERNAL
BINARY CONTAINS DESCRIBE FALSE
BIT CONTINUE DESCRIPTOR FETCH
BLOB CORRESPONDING DESTROY FIRST
BOOLEAN CREATE DESTRUCTOR FLOAT
BOTH CROSS DETERMINISTIC FOR
BREADTH CUBE DICTIONARY FOREIGN
BY CURRENT DIAGNOSTICS FOUND
CALL CURRENT_DATE DISCONNECT FROM
CASCADE CURRENT_PATH DISTINCT FREE
Page 40
FULL LANGUAGE OF REF
FUNCTION LARGE OFF REFERENCES
GENERAL LAST OLD REFERENCING
GET LATERAL ON RELATIVE
GLOBAL LEADING ONLY REPEAT
GO LEAVE OPEN RESIGNAL
GOTO LEFT OPERATION RESTRICT
GRANT LESS OPTION RESULT
GROUP LEVEL OR RETURN
GROUPING LIKE ORDER RETURNS
HANDLER LIMIT ORDINALITY REVOKE
HAVING LOCAL OUT RIGHT
HASH LOCALTIME OUTER ROLE
HOST LOCALTIMESTAMP OUTPUT ROLLBACK
HOUR LOCATOR PAD ROLLUP
IDENTITY LOOP PARAMETER ROUTINE
IF MATCH PARAMETERS ROW
IGNORE MEETS PARTIAL ROWS
IMMEDIATE MINUTE PATH SAVEPOINT
IN MODIFIES PERIOD SCHEMA
INDICATOR MODIFY POSTFIX SCROLL
INITIALIZE MODULE PRECEDES SEARCH
INITIALLY MONTH PRECISION SECOND
INNER NAMES PREFIX SECTION
INOUT NATIONAL PREORDER SELECT
INOUT NATIONAL PREORDER SELECT
INPUT NATURAL PREPARE SEQUENCE
INSERT NCHAR PRESERVE SESSION
INT NCLOB PRIMARY SESSION_USER
INTEGER NEW PRIOR SET
INTERSECT NEXT PRIVILEGES SETS
INTERVAL NO PROCEDURE SIGNAL
INTO NONE PUBLIC SIZE
IS NORMALIZE READ SMALLINT
ISOLATION NOT READS SOME
ITERATE NULL REAL SPACE
JOIN NUMERIC RECURSIVE SPECIFIC
KEY OBJECT REDO SPECIFICTYPE
Page 41
SQL THAN UNDER VARIABLE
SQLEXCEPTION THEN UNDO VARYING
SQLSTATE TIME UNION VIEW
SQLWARNING TIMESTAMP UNIQUE WHEN
START TIMEZONE_HOUR UNKNOWN WHENEVER
STATE TIMEZONE_MINUTE UNTIL WHERE
STATIC TO UPDATE WHILE
STRUCURE TRAILING USAGE WITH
SUCCEEDS TRANSACTION USER WITHOUT
SYSTEM_USER TRANSLATION USING WORK
TABLE TREAT VALUE WRITE
TEMPORARY TRIGGER VALUES YEAR
TERMINATE TRUE VARCHAR ZONE
TERMINATE TRUE VARCHAR ZONE
Note: SQL-92 and SQL3 both added a considerable number of words to the set of SQL reserved <keyword>s. The Standard acknowledges this and — as an aid to users — suggests that you include either a digit or an underline character in your <regular identifier>s and avoid names that begin with CURRENT_, SESSION_, SYSTEM_, or TIMEZONE_ and those that end with _LENGTH to avoid conflicts with reserved <keyword>s added in future revisions.
The Set of Non-Reserved <keyword>s
ABS CHARACTER_LENGTH CONDITION_NUMBER
ADA CHARACTER_SET_CATALOG CONNECTION_NAME ASENSITIVE CHARACTER_SET_NAME CONSTRAINT_CATALOG ASSIGNMENT CHARACTER_SET_SCHEMA CONSTRAINT_NAME
ASYMMETRIC CHECKED CONSTRAINT_SCHEMA
ATOMIC CLASS_ORIGIN CONTAINS
AVG COALESCE CONTROL
BETWEEN COBOL CONVERT
BIT_LENGTH COLLATION_CATALOG COUNT
BITVAR COLLATION_NAME CURSOR_NAME
BLOCKED COLLATION_SCHEMA DATETIME_INTERVAL_CODE
C COLUMN_NAME DATETIME_INTERVAL_PRECISION
CARDINALITY COMMAND_FUNCTION DB
CATALOG_NAME COMMAND_FUNCTION_CODE DISPATCH
CHAIN COMMITTED DLCOMMENT
CHAR_LENGTH CONCATENATE DLFILESIZE
Page 42
DLFILESIZEEXACT NULLABLE SERVER_NAME
DLLINKTYPE NUMBER SIMPLE
DLLINKTYPE NUMBER SIMPLE
DLURLCOMPLETE NULLIF SOURCE
DLURLPATH OCTET_LENGTH SPECIFIC_NAME
DLURLPATHONLY OPTIONS SIMILAR
DLURLSCHEMA OVERLAPS STRUCTURE
DLURLSERVER OVERLAY SUBLIST
DLVALUE OVERRIDING SUBSTRING
DYNAMIC_FUNCTION PASCAL SUM
DYNAMIC_FUNCTION_CODE PARAMETER_MODE STYLE
EXISTING PARAMETER_ORDINAL_POSITION SUBCLASS_ORIGIN
EXISTS PARAMETER_SPECIFIC_CATALOG SYMMETRIC
EXTRACT PARAMETER_SPECIFIC_NAME SYSTEM
FILE PARAMETER_SPECIFIC_SCHEMA TABLE_NAME
FINAL PERMISSION TRANSACTIONS_COMMITTED
FORTRAN PLI TRANSACTIONS_ROLLED_BACK
GENERATED POSITION TRANSACTION_ACTIVE
HOLD RECOVERY TRANSFORM
INFIX REPEATABLE TRANSLATE
INSENSITIVE RESTORE TRIGGER_CATALOG
INSTANTIABLE RETURNED_LENGTH TRIGGER_SCHEMA
INTEGRITY RETURNED_OCTET_LENGTH TRIGGER_NAME
KEY_MEMBER RETURNED_SQLSTATE TRIM
KEY_TYPE ROUTINE_CATALOG TYPE
LENGTH ROUTINE_NAME UNCOMMITTED
LINK ROUTINE_SCHEMA UNLINK
LOWER ROW_COUNT UNNAMED
MAX ROW_TYPE_CATALOG UPPER
MIN ROW_TYPE_SCHEMA USER_DEFINED_TYPE_CATALOG
MESSAGE_LENGTH ROW_TYPE_NAME USER_DEFINED_TYPE_NAME
MESSAGE_LENGTH ROW_TYPE_NAME USER_DEFINED_TYPE_NAME
MESSAGE_OCTET_LENGTH SCALE USER_DEFINED_TYPE_SCHEMA
MESSAGE_TEXT SCHEMA_NAME YES
METHOD SELECTIVE
MOD SELF
MORE SENSITIVE
MUMPS SERIALIZABLE
NAME
The SQL Standard allows implementations to define more reserved words for their own DBMSs.
Here are some words that are reserved in some dialect of one of the major vendorscontinue
Page 43
(e.g., Oracle, Sybase, Microsoft). You may be able to use these words as <regular identifier>s, but if you do so, you will lose portability.
ABORT DICTIONARY NUMBER ROWNUN
ACCEPT DIGITS NUMBER_BASE ROWTYPE
ANALYZE DISPLACEMENT OFF RUN
ARCHIVELOG DISPOSE OID SEPARATE
ARRAY ELEMENT OLD_TABLE SEQUENCE
ASSIGN ENTRY OPERATOR SQLCA
ASYNCH EXCEPTION_INIT OPERATORS SQLCODE
ATTRIBUTES FACTOR OTHERS SQLERRM
AUDIT FORM PACKAGE SQLWARNING
BACKUP FREELISTS PARTITION STATEMENT
BINARY_INTEGER GENERIC PCTFREE STDDEV
BODY IDENTIFIED PENDANT SUBTYPE
CACHE IGNORE POSITIVE SYMBOL
CHAR_BASE INCLUDE PRAGMA TABAUTH
CHAR_BASE INCLUDE PRAGMA TABAUTH
CLUSTER INDEX PREORDERED TABLES
CLUSTERS INDEXES PRIVATE TASK
COLAUTH INFILE PROTECTED TERM
COLUMNS INSTEAD RAISE TEST
COMPRESS INSTANCE RANGE THERE
CONSTANT LIMITED RAW TUPLE
CRASH LIST RECORD USE
CURVAL MAXEEXTENTS RELEASE VARCHAR2
DATA_BASE MINUS REM VARIANCE
DATABASE MLSLABEL RENAME VIEWS
DBA MODE REPLACE VIRTUAL
DEBUGOFF NEW RESOURCE VISIBLE
DEBUGON NEW_TABEL REUSE WAIT
DEFINITION NEXTVAL REVERSE XOR
DELAY NOCOMPRESS ROWID
DELTA NONE ROWLABEL
<identifier>
An <identifier> (a <token> that names an SQL Object) is a character string, up to 128 characters long, from one Character set. Within a CREATE SCHEMA statement, an <identifier> that doesn't include an explicit <Schema name> names an Object that belongs to the Schema you're creating.
In any other SQL statement, an <identifier> that doesn't include an explicit <Schema name>
names an Object that belongs to the Schema named in the SCHEMA clause (or if there is no SCHEMA clause, in the AUTHORIZATION clause) of the MODULE statement that
definescontinue
Page 44
the Module you're running. SQL recognizes three types of <identifier>s: the <regular identifier>, the <SQL language identifier>, and the <delimited identifier>.
<regular identifier>
The required syntax for a <regular identifier> is:
<regular identifier> ::=
Object name
A <regular identifier> is a character string, up to 128 characters long, that consists only of letters, digits, and underscore characters. It must begin with a letter.
[Obscure Rule] We usually think of a ''letter" as one of the simple Latin letters, but in
fact — depending on the Character set being used — a "letter" can also be an accented character, a character in a non-Latin alphabet, or a syllable or ideograph; i.e., it can be any character with the Unicode alphabetic property or ideographic property. The "letter" that begins a <regular identifier> may not have the Unicode combining property; the letters following it may, with the proviso that these characters are not legal anywhere in a <regular identifier>:
U+06DD Arabic End of Ayah
U+06DE Arabic Start of Rub El Hizb U+20DD Combining Enclosing Circle U+20DE Combining Enclosing Square U+20DF Combining Enclosing Diamond
U+20E0 Combining Enclosing Circle Backslash
Depending on the Character set in use, you may also use these characters in a <regular identifier>, as long as they're not used as the <identifier>'s first character:break
U+00B7 Middle Dot
U+02D0 Modifier Letter Triangular Colon U+20D1 Modifier Letter Half Triangular Colon
U+0640 Arabic Tatweel
U+0E46 Thai Character Maiyamok
U+0EC6 Lao Ko La
U+0EC6 Lao Ko La
U+3005 Ideographic Iteration Mark U+3031 to
U+3035 inclusive
Variations of Vertical Kana Repeat Mark
U+309B to U+309E inclusive
Variations of Combining Katakana-Hiragana Sound Mark and Hiragana Iteration Mark
U+30FC to U+30FE inclusive
Variations of Katakana-Hiragana Prolonged Sound Mark and Katakana Iteration Mark
U+FF70 Halfwidth Katakana-Hiragana Prolonged Sound Mark U+FF9E Halfwidth Katakana Voiced Sound Mark
Page 45
U+FF9F Halfwidth Katakana Semi-voiced Sound Mark U+200C Zero Width Non-Joiner
U+200D Zero Width Joiner
U+200E Left-To-Right Mark
U+200F Right-To-Left Mark
U+202A Left-To-Right Embedding U+202B Right-To-Left Embedding U+202C Pop Directional Formatting U+202D Left-To-Right Override U+202E Right-To-Left Override U+206A Inhibit Symmetric Swapping U+206B Activate Symmetric Swapping U+206C Inhibit Arabic Form Shaping
U+206C Inhibit Arabic Form Shaping U+206D Activate Arabic Form Shaping U+206E National Digit Shapes
U+206F Nominal Digit Shapes U+FEFF Zero-Width No-Break Space
U+203F Undertie
U+2040 Character Tie
U+FE33 Presentation Form for Vertical Low Line U+FE34 Presentation Form for Vertical Wavy Low Line
U+FE4D Dashed Low Line
U+FE4E Centreline Low Line
U+FE4F Wavy Low Line
U+FF3F Fullwidth Low Line
A <regular identifier> is case insensitive. That is, uppercase and lowercase letters within a
<regular identifier> are interchangeable; for example, these three <regular identifier>s are exactly alike to a SQL parser:
SAMS_TABLE sams_table Sams_Table
SQL doesn't allow a reserved <keyword> to be used as a <regular identifier>. When comparing a
<regular identifier> and a reserved <keyword> to check for equality, your DBMS will replace the lowercase letters in each with their uppercase equivalents and assume that both belong to the SQL_TEXT Character set. In fact, your DBMS will replace all lowercase letters in a <regular identifier> with their uppercase equivalents prior to any comparison and prior to storing the
<identifier> either in a Catalog's INFORMATION_SCHEMA or a diagnostics area.break
Page 46
Here are some examples of <regular identifier>s:
TABLE_1 a <regular identifier>
TABLE_1 a <regular identifier>
OCELOT_COMPUTER_SERVICES another <regular identifier>
DATE_ a <regular identifier> that looks like a reserved <keyword>
MÜLLER_DATEI a <regular identifier> that doesn't exclusively use simple Latin letters
If you want to restrict your code to Core SQL, make sure your <regular identifier>s are no more than 18 characters long.
<SQL language identifier>
The required syntax for a <SQL language identifier> is:
<SQL language identifier> ::=
Object name
An <SQL language identifier> is a <regular identifier> that consists only of simple Latin letters, digits, and underscore characters. It must begin with a simple Latin letter. Here are two examples of <SQL language identifier>s:
TABLE_1 BOB_SCHEMA
<delimited identifier>
The required syntax for a <delimited identifier> is:
<delimited identifier> ::=
"Object name"
A <delimited identifier> is a character string, up to 128 characters long, surrounded by a pair of double quote marks. (The delimiting double quotes aren't part of the <identifier>, so they're not
A <delimited identifier> is a character string, up to 128 characters long, surrounded by a pair of double quote marks. (The delimiting double quotes aren't part of the <identifier>, so they're not