SQL User’s Guide
Part number: DC38124-01-0902-01
Last modified: October 2004
Copyright © 1989–2004 Sybase, Inc. Portions copyright © 2001–2004 iAnywhere Solutions, Inc. All rights reserved.
No part of this publication may be reproduced, transmitted, or translated in any form or by any means, electronic, mechanical, manual, optical, or otherwise, without the prior written permission of iAnywhere Solutions, Inc. iAnywhere Solutions, Inc. is a subsidiary of Sybase, Inc.
Sybase, SYBASE (logo), AccelaTrade, ADA Workbench, Adaptable Windowing Environment, Adaptive Component Architecture, Adaptive Server, Adaptive Server Anywhere, Adaptive Server Enterprise, Adaptive Server Enterprise Monitor, Adaptive Server Enterprise Replication, Adaptive Server Everywhere, Adaptive Server IQ, Adaptive Warehouse, AnswerBase, Anywhere Studio, Application Manager, AppModeler, APT Workbench, APT-Build, APT-Edit, APT-Execute, APT-Library, APT-Translator, ASEP, AvantGo, AvantGo Application Alerts, AvantGo Mobile Delivery, AvantGo Mobile Document Viewer, AvantGo Mobile Inspection, AvantGo Mobile Marketing Channel, AvantGo Mobile Pharma, AvantGo Mobile Sales, AvantGo Pylon, AvantGo Pylon Application Server, AvantGo Pylon Conduit, AvantGo Pylon PIM Server, AvantGo Pylon Pro, Backup Server, BayCam, Bit-Wise, BizTracker, Certified PowerBuilder Developer, Certified SYBASE Professional, Certified SYBASE Professional Logo, ClearConnect, Client Services, Client-Library, CodeBank, Column Design, ComponentPack, Connection Manager, Convoy/DM, Copernicus, CSP, Data Pipeline, Data Workbench, DataArchitect, Database Analyzer, DataExpress, DataServer, DataWindow, DB-Library, dbQueue, Developers Workbench, Direct Connect Anywhere, DirectConnect, Distribution Director, Dynamic Mobility Model, Dynamo, e-ADK, E-Anywhere, e-Biz Integrator, EC Gateway, ECMAP, ECRTP, eFulfillment Accelerator, Electronic Case Management, Embedded SQL, EMS, Enterprise Application Studio, Enterprise Client/Server, Enterprise Connect, Enterprise Data Studio, Enterprise Manager, Enterprise Portal (logo), Enterprise SQL Server Manager, Enterprise Work Architecture, Enterprise Work Designer, Enterprise Work Modeler, eProcurement Accelerator, eremote, Everything Works Better When Everything Works Together, EWA, E-Whatever, Financial Fusion, Financial Fusion (and design), Financial Fusion Server, Formula One, Fusion Powered e-Finance, Fusion Powered Financial Destinations, Fusion Powered STP, Gateway Manager, GeoPoint, GlobalFIX, iAnywhere, iAnywhere Solutions, ImpactNow, Industry Warehouse Studio, InfoMaker, Information Anywhere, Information Everywhere, InformationConnect, InstaHelp, Intelligent Self-Care, InternetBuilder, iremote, iScript, Jaguar CTS, jConnect for JDBC,
KnowledgeBase, Logical Memory Manager, Mail Anywhere Studio, MainframeConnect, Maintenance Express, Manage Anywhere Studio, MAP, M-Business Channel, M-Business Network, M-Business Server, MDI Access Server, MDI Database Gateway, media.splash, Message Anywhere Server, MetaWorks, MethodSet, ML Query, MobiCATS, My AvantGo, My AvantGo Media Channel, My AvantGo Mobile Marketing, MySupport, Net-Gateway, Net-Library, New Era of Networks, Next Generation Learning, Next Generation Learning Studio, O DEVICE, OASiS, OASiS logo, ObjectConnect, ObjectCycle, OmniConnect, OmniSQL Access Module, OmniSQL Toolkit, Open Biz, Open Business Interchange, Open Client, Open Client/Server, Open Client/Server Interfaces, Open ClientConnect, Open Gateway, Open Server, Open ServerConnect, Open Solutions, Optima++, Orchestration Studio, Partnerships that Work, PB-Gen, PC APT Execute, PC DB-Net, PC Net Library, PhysicalArchitect, Pocket PowerBuilder, PocketBuilder, Power Through Knowledge, power.stop, Power++, PowerAMC, PowerBuilder, PowerBuilder Foundation Class Library, PowerDesigner, PowerDimensions, PowerDynamo, Powering the New Economy, PowerJ, PowerScript, PowerSite, PowerSocket, Powersoft, Powersoft Portfolio, Powersoft Professional, PowerStage, PowerStudio, PowerTips, PowerWare Desktop, PowerWare Enterprise, ProcessAnalyst, QAnywhere, Rapport, Relational Beans, RepConnector, Replication Agent, Replication Driver, Replication Server, Replication Server Manager, Replication Toolkit, Report Workbench, Report-Execute, Resource Manager, RW-DisplayLib, RW-Library, S.W.I.F.T. Message Format Libraries, SAFE, SAFE/PRO, SDF, Secure SQL Server, Secure SQL Toolset, Security Guardian, SKILS, smart.partners, smart.parts, smart.script, SQL Advantage, SQL Anywhere, SQL Anywhere Studio, SQL Code Checker, SQL Debug, SQL Edit, SQL Edit/TPU, SQL Everywhere, SQL Modeler, SQL Remote, SQL Server, SQL Server Manager, SQL Server SNMP SubAgent, SQL Server/CFT, SQL Server/DBM, SQL SMART, SQL Station, SQL Toolset, SQLJ, Stage III Engineering, Startup.Com, STEP, SupportNow, Sybase Central, Sybase Client/Server Interfaces, Sybase Development Framework, Sybase Financial Server, Sybase Gateways, Sybase Learning Connection, Sybase MPP, Sybase SQL Desktop, Sybase SQL Lifecycle, Sybase SQL Workgroup, Sybase Synergy Program, Sybase User Workbench, Sybase Virtual Server Architecture, SybaseWare, Syber Financial, SyberAssist, SybMD, SyBooks, System 10, System 11, System XI (logo), SystemTools, Tabular Data Stream, The Enterprise Client/Server Company, The Extensible Software Platform, The Future Is Wide Open, The Learning Connection, The Model For Client/Server Solutions, The Online Information Center, The Power of One, TotalFix, TradeForce, Transact-SQL, Translation Toolkit, Turning Imagination Into Reality, UltraLite, UltraLite.NET, UNIBOM, Unilib, Uninull, Unisep, Unistring, URK Runtime Kit for UniCode, Versacore, Viewer, VisualWriter, VQL, Warehouse Control Center, Warehouse Studio, Warehouse WORKS, WarehouseArchitect, Watcom, Watcom SQL, Watcom SQL Server, Web Deployment Kit, Web.PB, Web.SQL, WebSights, WebViewer, WorkGroup SQL Server, XA-Library, XA-Server, and XP Server are trademarks of Sybase, Inc. or its subsidiaries.
All other trademarks are property of their respective owners.
About This Manual ix
SQL Anywhere Studio documentation . . . . x
Documentation conventions . . . . xiii
The Adaptive Server Anywhere sample database . . . . xv
Finding out more and providing feedback . . . . xvi
I Designing and Creating Databases 1 1 Designing Your Database 3 Introduction . . . . 4
Database design concepts . . . . 5
The design process . . . . 12
Designing the database table properties . . . . 27
2 Working with Database Objects 29 Introduction . . . . 30
Working with databases . . . . 31
Working with tables . . . . 40
Working with views . . . . 54
Working with indexes . . . . 62
Working with temporary tables . . . . 76
Copying database objects in Sybase Central . . . . 77
3 Ensuring Data Integrity 79 Data integrity overview . . . . 80
Using column defaults . . . . 83
Using table and column constraints . . . . 88
Using domains . . . . 92
Enforcing entity and referential integrity . . . . 95
Integrity rules in the system tables . . . . 100
4 Using Transactions and Isolation Levels 101 Introduction to transactions . . . . 102
Isolation levels and consistency . . . . 106
Transaction blocking and deadlock . . . . 113
Choosing isolation levels . . . . 116
Isolation level tutorials . . . . 120
How locking works . . . . 135
Particular concurrency issues . . . . 149
Replication and concurrency . . . . 152
Summary . . . . 155
5 Monitoring and Improving Performance 157 Performance analysis tools . . . . 158
Top performance tips . . . . 165
Using the cache to improve performance . . . . 180
Using indexes to improve performance . . . . 186
Using keys to improve query performance . . . . 187
Sorting query results . . . . 189
Use of work tables in query processing . . . . 190
Monitoring database performance . . . . 192
Fragmentation . . . . 198
Profiling database procedures . . . . 202
II Querying and Modifying Data 211 6 Queries: Selecting Data from a Table 213 Query overview . . . . 214
The SELECT list: specifying columns . . . . 217
The FROM clause: specifying tables . . . . 224
The WHERE clause: specifying rows . . . . 225
7 Summarizing, Grouping, and Sorting Query Results 237 Summarizing query results using aggregate functions . . . . 238
The GROUP BY clause: organizing query results into groups 243 Understanding GROUP BY . . . . 244
The HAVING clause: selecting groups of data . . . . 248
The ORDER BY clause: sorting query results . . . . 250
Performing set operations on query results with UNION, IN- TERSECT, and EXCEPT . . . . 253
Standards and compatibility . . . . 260
8 Joins: Retrieving Data from Several Tables 263 Sample database schema . . . . 264
How joins work . . . . 265
Joins overview . . . . 266
Explicit join conditions (the ON phrase) . . . . 271
Cross joins . . . . 274
Inner and outer joins . . . . 276
Specialized joins . . . . 283
Natural joins . . . . 290
9 Common Table Expressions 307
About common table expressions . . . . 308
Typical applications of common table expressions . . . . 312
Recursive common table expressions . . . . 316
Parts explosion problems . . . . 319
Data type declarations in recursive common table expressions 322 Least distance problem . . . . 324
Using multiple recursive common table expressions . . . . . 327
10 Using OLAP 329 Understanding OLAP . . . . 330
Understanding subtotals . . . . 331
NULL values and subtotal rows . . . . 334
Using ROLLUP . . . . 335
Using CUBE . . . . 340
Using GROUPING SETS . . . . 343
Working with OLAP functions . . . . 346
11 Using Subqueries 355 Introduction to subqueries . . . . 356
Using subqueries in the WHERE clause . . . . 357
Subqueries in the HAVING clause . . . . 358
Subquery comparison test . . . . 360
Quantified comparison tests with ANY and ALL . . . . 361
Testing set membership with IN conditions . . . . 364
Existence test . . . . 366
Outer references . . . . 368
Subqueries and joins . . . . 369
Nested subqueries . . . . 371
How subqueries work . . . . 373
12 Adding, Changing, and Deleting Data 383 Data modification statements . . . . 384
Adding data using INSERT . . . . 385
Changing data using UPDATE . . . . 390
Changing data using INSERT . . . . 392
Deleting data using DELETE . . . . 393
13 Query Optimization and Execution 395 The role of the optimizer . . . . 396
How the optimizer works . . . . 397
Query execution algorithms . . . . 407
Physical data organization and access . . . . 423
Indexes . . . . 426
Semantic query transformations . . . . 435
Subquery and function caching . . . . 449
Reading access plans . . . . 451
III SQL Dialects and Compatibility 471 14 Transact-SQL Compatibility 473 An overview of Transact-SQL support . . . . 474
Adaptive Server architectures . . . . 477
Configuring databases for Transact-SQL compatibility . . . . 483
Writing compatible SQL statements . . . . 492
Transact-SQL procedure language overview . . . . 497
Automatic translation of stored procedures . . . . 500
Returning result sets from Transact-SQL procedures . . . . . 501
Variables in Transact-SQL procedures . . . . 502
Error handling in Transact-SQL procedures . . . . 503
15 Differences from Other SQL Dialects 507 Adaptive Server Anywhere SQL features . . . . 508
IV XML in the Database 511 16 Using XML in the Database 513 What is XML? . . . . 514
Storing XML documents in relational databases . . . . 515
Exporting relational data as XML . . . . 516
Importing XML documents as relational data . . . . 517
Obtaining query results as XML . . . . 524
Using SQL/XML to obtain query results as XML . . . . 544
V Remote Data and Bulk Operations 553 17 Importing and Exporting Data 555 Transferring data into and out of databases . . . . 556
Performance aspects of bulk operations . . . . 557
Data recovery issues for bulk operations . . . . 558
Importing data into databases . . . . 559
Exporting data from databases . . . . 569
Rebuilding databases . . . . 580
Using SQL command files . . . . 596
Adaptive Server Enterprise compatibility . . . . 599
18 Accessing Remote Data 601 Introduction . . . . 602
Basic concepts to access remote data . . . . 604
Working with remote servers . . . . 606
Working with external logins . . . . 612
Working with proxy tables . . . . 614
Joining remote tables . . . . 619
Joining tables from multiple local databases . . . . 621
Sending native statements to remote servers . . . . 622
Using remote procedure calls (RPCs) . . . . 623
Transaction management and remote data . . . . 626
Internal operations . . . . 628
Troubleshooting remote data access . . . . 632
19 Server Classes for Remote Data Access 635 Overview . . . . 636
JDBC-based server classes . . . . 637
ODBC-based server classes . . . . 640
VI Stored Procedures and Triggers 653 20 Using Procedures, Triggers, and Batches 655 Procedure and trigger overview . . . . 657
Benefits of procedures and triggers . . . . 658
Introduction to procedures . . . . 659
Introduction to user-defined functions . . . . 666
Introduction to triggers . . . . 670
Introduction to batches . . . . 677
Control statements . . . . 679
The structure of procedures and triggers . . . . 682
Returning results from procedures . . . . 686
Using cursors in procedures and triggers . . . . 692
Errors and warnings in procedures and triggers . . . . 695
Using the EXECUTE IMMEDIATE statement in procedures . 704 Transactions and savepoints in procedures and triggers . . . 706
Tips for writing procedures . . . . 707
Statements allowed in batches . . . . 709
Calling external libraries from procedures . . . . 710
21 Debugging Logic in the Database 719
Introduction to debugging in the database . . . . 720
Tutorial: Getting started with the debugger . . . . 722
Working with breakpoints . . . . 731
Working with variables . . . . 734
Working with connections . . . . 735
Index 737
Subject This book describes how to design and create databases; how to import, export, and modify data; how to retrieve data; and how to build stored procedures and triggers.
Audience This manual is for all users of Adaptive Server Anywhere.
Before you begin This manual assumes that you have an elementary familiarity with
database-management systems and Adaptive Server Anywhere in particular.
If you do not have such a familiarity, you should consider reading
Introducing SQL Anywhere Studio before reading this manual.
SQL Anywhere Studio documentation
This book is part of the SQL Anywhere documentation set. This section describes the books in the documentation set and how you can use them.
The SQL Anywhere Studio documentation
The SQL Anywhere Studio documentation is available in a variety of forms:
in an online form that combines all books in one large help file; as separate PDF files for each book; and as printed books that you can purchase. The documentation consists of the following books:
♦
Introducing SQL Anywhere StudioThis book provides an overview of the SQL Anywhere Studio database management and synchronization technologies. It includes tutorials to introduce you to each of the pieces that make up SQL Anywhere Studio.
♦
What’s New in SQL Anywhere StudioThis book is for users of previous versions of the software. It lists new features in this and previous releases of the product and describes upgrade procedures.
♦
Adaptive Server Anywhere Database Administration GuideThis book covers material related to running, managing, and configuring databases and database servers.
♦
Adaptive Server Anywhere SQL User’s GuideThis book describes how to design and create databases; how to import, export, and modify data; how to retrieve data; and how to build stored procedures and triggers.
♦
Adaptive Server Anywhere SQL Reference ManualThis book provides a complete reference for the SQL language used by Adaptive Server Anywhere. It also describes the Adaptive Server Anywhere system tables and procedures.
♦
Adaptive Server Anywhere Programming GuideThis book describes how to build and deploy database applications using the C, C++, and Java programming languages. Users of tools such as Visual Basic and
PowerBuilder can use the programming interfaces provided by those tools. It also describes the Adaptive Server Anywhere ADO.NET data provider.
♦
Adaptive Server Anywhere SNMP Extension Agent User’s GuideThis book describes how to configure the Adaptive Server Anywhere SNMP Extension Agent for use with SNMP management applications to manage Adaptive Server Anywhere databases.
♦
Adaptive Server Anywhere Error MessagesThis book provides a
complete listing of Adaptive Server Anywhere error messages together
with diagnostic information.
databases. Adaptive Server Anywhere 7.0 was awarded a TCSEC (Trusted Computer System Evaluation Criteria) C2 security rating from the U.S. Government. This book may be of interest to those who wish to run the current version of Adaptive Server Anywhere in a manner equivalent to the C2-certified environment.
♦
MobiLink Administration GuideThis book describes how to use the MobiLink data synchronization system for mobile computing, which enables sharing of data between a single Oracle, Sybase, Microsoft or IBM database and many Adaptive Server Anywhere or UltraLite databases.
♦
MobiLink ClientsThis book describes how to set up and synchronize Adaptive Server Anywhere and UltraLite remote databases.
♦
MobiLink Server-Initiated Synchronization User’s GuideThis book describes MobiLink server-initiated synchronization, a feature of MobiLink that allows you to initiate synchronization from the consolidated database.
♦
MobiLink TutorialsThis book provides several tutorials that walk you through how to set up and run MobiLink applications.
♦
QAnywhere User’s GuideThis manual describes MobiLink QAnywhere, a messaging platform that enables the development and deployment of messaging applications for mobile and wireless clients, as well as traditional desktop and laptop clients.
♦
iAnywhere Solutions ODBC DriversThis book describes how to set up ODBC drivers to access consolidated databases other than Adaptive Server Anywhere from the MobiLink synchronization server and from Adaptive Server Anywhere remote data access.
♦
SQL Remote User’s GuideThis book describes all aspects of the SQL Remote data replication system for mobile computing, which enables sharing of data between a single Adaptive Server Anywhere or Adaptive Server Enterprise database and many Adaptive Server
Anywhere databases using an indirect link such as e-mail or file transfer.
♦
SQL Anywhere Studio HelpThis book includes the context-sensitive help for Sybase Central, Interactive SQL, and other graphical tools. It is not included in the printed documentation set.
♦
UltraLite Database User’s GuideThis book is intended for all
UltraLite developers. It introduces the UltraLite database system and
provides information common to all UltraLite programming interfaces.
♦
UltraLite Interface GuidesA separate book is provided for each UltraLite programming interface. Some of these interfaces are provided as UltraLite components for rapid application development, and others are provided as static interfaces for C, C++, and Java development.
In addition to this documentation set, PowerDesigner and InfoMaker include their own online documentation.
Documentation formats SQL Anywhere Studio provides documentation in the following formats:
♦
Online documentationThe online documentation contains the complete SQL Anywhere Studio documentation, including both the books and the context-sensitive help for SQL Anywhere tools. The online documentation is updated with each maintenance release of the product, and is the most complete and up-to-date source of documentation.
To access the online documentation on Windows operating systems, choose Start
➤Programs
➤SQL Anywhere 9
➤Online Books. You can navigate the online documentation using the HTML Help table of contents, index, and search facility in the left pane, as well as using the links and menus in the right pane.
To access the online documentation on UNIX operating systems, see the HTML documentation under your SQL Anywhere installation.
♦
PDF booksThe SQL Anywhere books are provided as a set of PDF files, viewable with Adobe Acrobat Reader.
The PDF books are accessible from the online books, or from the Windows Start menu.
♦
Printed booksThe complete set of books is available from Sybase sales or from eShop, the Sybase online store at
http://eshop.sybase.com/eshop/documentation .
This section lists the typographic and graphical conventions used in this documentation.
Syntax conventions The following conventions are used in the SQL syntax descriptions:
♦
KeywordsAll SQL keywords appear in upper case, like the words ALTER TABLE in the following example:
ALTER TABLE [ owner.]table-name
♦
PlaceholdersItems that must be replaced with appropriate identifiers or expressions are shown like the words owner and table-name in the following example:
ALTER TABLE [ owner.]table-name
♦
Repeating itemsLists of repeating items are shown with an element of the list followed by an ellipsis (three dots), like column-constraint in the following example:
ADD column-definition [ column-constraint, . . . ]
One or more list elements are allowed. In this example, if more than one is specified, they must be separated by commas.
♦
Optional portionsOptional portions of a statement are enclosed by square brackets.
RELEASE SAVEPOINT [ savepoint-name ]
These square brackets indicate that the savepoint-name is optional. The square brackets should not be typed.
♦
OptionsWhen none or only one of a list of items can be chosen, vertical bars separate the items and the list is enclosed in square brackets.
[ ASC | DESC ]
For example, you can choose one of ASC, DESC, or neither. The square brackets should not be typed.
♦
AlternativesWhen precisely one of the options must be chosen, the alternatives are enclosed in curly braces and a bar is used to separate the options.
[ QUOTES { ON | OFF } ]
If the QUOTES option is used, one of ON or OFF must be provided. The
brackets and braces should not be typed.
Graphic icons The following icons are used in this documentation.
♦ A client application.
♦ A database server, such as Sybase Adaptive Server Anywhere.
♦ A database. In some high-level diagrams, the icon may be used to represent both the database and the database server that manages it.
♦ Replication or synchronization middleware. These assist in sharing data among databases. Examples are the MobiLink Synchronization Server and the SQL Remote Message Agent.
♦ A programming interface.
API
Many of the examples throughout the documentation use the Adaptive Server Anywhere sample database.
The sample database is held in a file named asademo.db , and is located in your SQL Anywhere directory.
The sample database represents a small company. It contains internal information about the company (employees, departments, and finances) as well as product information and sales information (sales orders, customers, and contacts). All information in the database is fictional.
The following figure shows the tables in the sample database and how they relate to each other.
fin_code
code char(2) <pk>
type char(10) description char(50) product
id integer <pk>
name char(15) description char(30) size char(18) color char(6) quantity integer unit_price numeric (15,2)
asademo.db
contact
id integer <pk>
last_name char(15) first_name char(15) title char(2) street char(30) city char(20) state char(2) zip char(5) phone char(10) fax char(10) customer
id integer <pk>
fname char(15) lname char(20) address char(35) city char(20) state char(2) zip char(10) phone char(20) company_name char(35)
fin_data
year char(4) <pk>
quarter char(2) <pk>
code char(2) <pk,fk>
amount numeric(9) sales_order
id integer <pk>
cust_id integer <fk>
order_date date fin_code_id char(2) <fk>
region char(7) sales_rep integer <fk>
sales_order_items
id integer <pk,fk>
line_id smallint <pk>
prod_id integer <fk>
quantity integer ship_date date
department
dept_id integer <pk>
dept_name char(40) dept_head_id integer <fk>
Employee
emp_id integer <pk>
manager_id integer
emp_fname char(20)
emp_lname char(20)
dept_id integer <fk>
street char(40)
city char(20)
state char(4)
zip_code char(9)
phone char(10)
status char(1)
ss_number char(11)
salary numeric(20,3)
start_date date
termination_date date
birth_date date
bene_health_ins char(1) bene_life_ins char(1) bene_day_care char(1)
sex char(1)
id = prod_id
id = cust_id
id = id emp_id = sales_rep
emp_id = dept_head_id code = fin_code_id
dept_id = dept_id
code = code
Finding out more and providing feedback
Finding out more Additional information and resources, including a code exchange, are available at the iAnywhere Developer Network at
http://www.ianywhere.com/developer/ .
If you have questions or need help, you can post messages to the iAnywhere Solutions newsgroups listed below.
When you write to one of these newsgroups, always provide detailed information about your problem, including the build number of your version of SQL Anywhere Studio. You can find this information by typing dbeng9 -v at a command prompt.
The newsgroups are located on the forums.sybase.com news server. The newsgroups include the following:
♦ sybase.public.sqlanywhere.general
♦ sybase.public.sqlanywhere.linux
♦ sybase.public.sqlanywhere.mobilink
♦ sybase.public.sqlanywhere.product_futures_discussion
♦ sybase.public.sqlanywhere.replication
♦ sybase.public.sqlanywhere.ultralite
♦ ianywhere.public.sqlanywhere.qanywhere
Newsgroup disclaimer
iAnywhere Solutions has no obligation to provide solutions, information or ideas on its newsgroups, nor is iAnywhere Solutions obliged to provide anything other than a systems operator to monitor the service and ensure its operation and availability.
iAnywhere Solutions Technical Advisors as well as other staff assist on the newsgroup service when they have time available. They offer their help on a volunteer basis and may not be available on a regular basis to provide solutions and information. Their ability to help is based on their workload.
Feedback We would like to receive your opinions, suggestions, and feedback on this documentation.
You can e-mail comments and suggestions to the SQL Anywhere
documentation team at [email protected]. Although we do not reply
to e-mails sent to that address, we read all suggestions with interest.
D ESIGNING AND C REATING
D ATABASES
This part describes key concepts and strategies for designing and building
databases. It covers issues of database design as well as the mechanics of
working with tables, views, and indexes. It also includes material on
referential integrity and transactions.
Designing Your Database
About this chapter This chapter introduces the basic concepts of relational database design and gives you step-by-step suggestions for designing your own databases. It uses the expedient technique known as conceptual data modeling, which focuses on entities and the relationships between them.
Contents
Topic: pageIntroduction 4
Database design concepts 5
The design process 12
Designing the database table properties 27
Introduction
While designing a database is not a difficult task for small and medium sized databases, it is an important one. Bad database design can lead to an
inefficient and possibly unreliable database system. Because client
applications are built to work on specific parts of a database, and rely on the database design, a bad design can be difficult to revise at a later date.
For more information, you may also wish to consult an introductory book
such as A Database Primer by C. J. Date. If you are interested in pursuing
database theory, C. J. Date’s An Introduction to Database Systems is an
excellent textbook on the subject.
Database design concepts
In designing a database, you plan what things you want to store information about, and what information you will keep about each one. You also determine how these things are related. In the common language of database design, what you are creating during this step is a conceptual database model.
Entities and relationships The distinguishable objects or things that you want to store information about are called entities. The associations between them are called relationships. In the language of database description, you can think of entities as nouns and relationships as verbs.
Conceptual models are useful because they make a clean distinction between the entities and relationships. These models hide the details involved in implementing a design in any particular database-management system. They allow you to focus on fundamental database structure. Hence, they also form a common language for the discussion of database design.
Entity-relationship diagrams
The main component of a conceptual database model is a diagram that shows the entities and relationships. This diagram is commonly called an entity-relationship diagram. In consequence, many people use the name entity-relationship modeling to refer to the task of creating a conceptual database model.
Conceptual database design is a top-down design method. There are now sophisticated tools such as Sybase PowerDesigner that help you pursue this method, or other approaches. This chapter is an introductory chapter only, but it does contain enough information for the design of straightforward databases.
Entities
An entity is the database equivalent of a noun. Distinguishable objects such as employees, order items, departments and products are all examples of entities. In a database, a table represents each entity. The entities that you build into your database arise from the activities for which you will be using the database, such as tracking sales calls and maintaining employee
information.
Attributes Each entity contains a number of attributes. Attributes are particular characteristics of the things that you would like to store. For example, in an employee entity, you might want to store an employee ID number, first and last names, an address, and other particular information that pertains to a particular employee. Attributes are also known as properties.
You depict an entity using a rectangular box. Inside, you list the attributes
associated with that entity.
Employee Employee Number First Name Last Name Address
An identifier is one or more attributes on which all the other attributes depend. It uniquely identifies an item in the entity. Underline the names of attributes that you wish to form part of an identifier.
In the Employee entity, above, the Employee Number uniquely identifies an employee. All the other attributes store information that pertains only to that one employee. For example, an employee number uniquely determines an employee’s name and address. Two employees might have the same name or the same address, but you can make sure that they don’t have the same employee number. Employee Number is underlined to show that it is an identifier.
It is good practice to create an identifier for each entity. As will be explained later, these identifiers become primary keys within your tables. Primary key values must be unique and cannot be null or undefined. They identify each row in a table uniquely and improve the performance of the database server.
Relationships
A relationship between entities is the database equivalent of a verb. An employee is a member of a department, or an office is located in a city.
Relationships in a database may appear as foreign key relationships between tables, or may appear as separate tables themselves. You will see examples of each in this chapter.
The relationships in the database are an encoding of rules or practices that govern the data in the entities. If each department has one department head, you can create a one-to-one relationship between departments and
employees to identify the department head.
Once a relationship is built into the structure of the database, there is no provision for exceptions. There is nowhere to put a second department head.
Duplicating the department entry would involve duplicating the department
ID, which is the identifier. Duplicate identifiers are not allowed.
Tip
Strict database structure can benefit you, because it can eliminate incon- sistencies, such as a department with two managers. On the other hand, you as the designer should make your design flexible enough to allow some expansion for unforeseen uses. Extending a well-designed database is usually not too difficult, but modifying the existing table structure can render an entire database and its client applications obsolete.
Cardinality of relationships
There are three kinds of relationships between tables. These correspond to the cardinality (number) of the entities involved in the relationship.
♦
One-to-one relationshipsYou depict a relationship by drawing a line between two entities. The line may have other markings on it such as the two little circles shown. Later sections explain the purpose of these marks. In the following diagram, one employee manages one department.
Department Employee
Management relationship
♦
One-to-many relationshipsThe fact that one item contained in Entity 1 can be associated with multiple entities in Entity 2 is denoted by the multiple lines forming the attachment to Entity 2. In the following diagram, one office can have many phones.
Office Telephones
Phone Location relationship
♦
Many-to-many relationshipsIn this case, draw multiple lines for the
connections to both entities. This means that one warehouse can hold many different parts, and one type of part can be stored at many warehouses.
Parts Warehouses
storage relationship
Roles You can describe each relationship with two roles. Roles are verbs or phrases that describe the relationship from each point of view. For example, a relationship between employees and departments might be described by the following two roles.
1. An employee is a member of a department.
2. A department contains an employee.
Employee Employee Number First name Last name Address
Department Department ID Department name contains
is a member of
Roles are very important because they afford you a convenient and effective
means of verifying your work.
Tip
Whether reading from left-to-right or from right-to-left, the following rule makes it easy to read these diagrams:Read the 1 name of the first entity, 2 role next to the first entity , 3 cardinality from the connection to the second entity , and 4 name of the second entity.
Mandatory elements The little circles just before the end of the line that denotes the relation serve an important purpose. A circle means that an element can exist in the one entity without a corresponding element in the other entity.
If a cross bar appears in place of the circle, that entity must contain at least one element for each element in the other entity. An example will clarify these statements.
Publisher ID Number Publisher Name
Author ID Number First Name Last Name is published by
writes Books
ID Number Title publishes
is written by
This diagram corresponds to the following four statements.
1. A publisher publishes zero or more books.
2. A book is published by exactly one publisher.
3. A book is written by one or more authors.
4. An author writes zero or more books.
Tip
Think of the little circle as the digit 0 and the cross bar as the number one.
The circle means at least zero . The cross bar means at least one .
Reflexive relationships Sometimes, a relationship will exist between entries in a single entity. In this
case, the relationship is said to be reflexive. Both ends of the relationship
attach to a single entity.
Employee Employee Number First Name Last Name Address
manages reports to
This diagram corresponds to the following two statements.
1. An employee reports to at most one other employee.
2. An employee manages zero or more employees.
Notice that in the case of this relation, it is essential that the relation be optional in both directions. Some employees are not managers. Similarly, at least one employee should head the organization and hence report to no one.
☞ Naturally, you would also like to specify that an employee cannot be his or her own manager. This restriction is a type of business rule . Business rules are discussed later as part of “The design process” on page 12.
Changing many-to-many relationships into entities
When you have attributes associated with a relationship , rather than an entity, you can change the relationship into an entity. This situation
sometimes arises with many-to-many relationships, when you have attributes that are particular to the relationship and so you cannot reasonably add them to either entity.
Suppose that your parts inventory is located at a number of different warehouses. You have drawn the following diagram.
Parts
Part Number
Description Warehouse
Warehouse ID Address
But you wish to record the quantity of each part stored at each location. This
attribute can only be associated with the relationship. Each quantity depends
on both the parts and the warehouse involved. To represent this situation,
you can redraw the diagram as follows:
Parts Part Number Description
stored at
Inventory
Quantity
Warehouse Warehouse ID Address contains
Notice the following details of the transformation:
1. Two new relations join the relation entity with each of the two original entities. They inherit their names from the two roles of the original relationship: stored at and contains , respectively.
2. Each entry in the Inventory entity demands one mandatory entry in the Parts entity and one mandatory entry in the Warehouse entity. These relationships are mandatory because a storage relationship only makes sense if it is associated with one particular part and one particular warehouse.
3. The new entity is dependent on both the Parts entity and on the Warehouse entity, meaning that the new entity is identified by the identifiers of both of these entities. In this new diagram, one identifier from the Parts entity and one identifier from the Warehouse entity uniquely identify an entry in the Inventory entity. The triangles that appear between the circles and the multiple lines that join the two new relationships to the new Inventory entity denote the dependencies.
Do not add either a Part Number or Warehouse ID attribute to the Inventory
entity. Each entry in the Inventory entity does depend on both a particular
part and a particular warehouse, but the triangles denote this dependence
more clearly.
The design process
There are five major steps in the design process.
♦ “Step 1: Identify entities and relationships” on page 12.
♦ “Step 2: Identify the required data” on page 15.
♦ “Step 3: Normalize the data” on page 17.
♦ “Step 4: Resolve the relationships” on page 22.
♦ “Step 5: Verify the design” on page 25.
☞ For more information about implementing the database design, see
“Working with Database Objects” on page 29.
Step 1:Identify entities and relationships
❖ To identify the entities in your design and their relationship to each other
1.
Define high-level activitiesIdentify the general activities for which you will use this database. For example, you may want to keep track of information about employees.
2.
Identify entitiesFor the list of activities, identify the subject areas you need to maintain information about. These subjects will become entities.
For example, hire employees , assign to a department , and determine a skill level.
3.
Identify relationshipsLook at the activities and determine what the relationships will be between the entities. For example, there is a relationship between parts and warehouses. Define two roles to describe each relationship.
4.
Break down the activitiesYou started out with high-level activities.
Now, examine these activities more carefully to see if some of them can be broken down into lower-level activities. For example, a high-level activity such as maintain employee information can be broken down into:
♦ Add new employees.
♦ Change existing employee information.
♦ Delete terminated employees.
5.
Identify business rulesLook at your business description and see what rules you follow. For example, one business rule might be that a
department has one and only one department head. These rules will be
built into the structure of the database.
Entity and relationship example
Example ACME Corporation is a small company with offices in five locations.
Currently, 75 employees work for ACME. The company is preparing for rapid growth and has identified nine departments, each with its own department head.
To help in its search for new employees, the personnel department has identified 68 skills that it believes the company will need in its future employee base. When an employee is hired, the employee’s level of expertise for each skill is identified.
Define high-level activities
Some of the high-level activities for ACME Corporation are:
♦ Hire employees.
♦ Terminate employees.
♦ Maintain personal employee information.
♦ Maintain information on skills required for the company.
♦ Maintain information on which employees have which skills.
♦ Maintain information on departments.
♦ Maintain information on offices.
Identify the entities and relationships
Identify the entities (subjects) and the relationships (roles) that connect them. Create a diagram based on the description and high-level activities.
Use boxes to show entities and lines to show relationships. Use the two roles to label each relationship. You should also identify those relationships that are one-to-many, one-to-one, and many-to-many using the appropriate annotation.
Following is a rough entity-relationship diagram. It will be refined
throughout the chapter.
Skill
Employee
Office
Department is acquired by
is capable of
is headed by
manages
contains is a member of
manages reports to works out of
contains
Break down the high-level activities
The following lower-level activities below are based on the high-level activities listed above:
♦ Add or delete an employee.
♦ Add or delete an office.
♦ List employees for a department.
♦ Add a skill to the skill list.
♦ Identify the skills of an employee.
♦ Identify an employee’s skill level for each skill.
♦ Identify all employees that have the same skill level for a particular skill.
♦ Change an employee’s skill level.
These lower-level activities can be used to identify if any new tables or relationships are needed.
Identify business rules Business rules often identify one-to-many, one-to-one, and many-to-many relationships.
The kind of business rules that may be relevant include the following:
♦ There are now five offices; expansion plans allow for a maximum of ten.
♦ Employees can change department or office.
♦ Each department has one department head.
♦ Each office has a maximum of three telephone numbers.
♦ Each telephone number has one or more extensions.
♦ When an employee is hired, the level of expertise in each of several skills is identified.
♦ Each employee can have from three to twenty skills.
♦ An employee may or may not be assigned to an office.
Step 2:Identify the required data
❖ To identify the required data
1. Identify supporting data.
2. List all the data you need to track.
3. Set up data for each entity.
4. List the available data for each entity. The data that describes an entity (subject) answers the questions who, what, where, when, and why.
5. List any data required for each relationship (verb).
6. List the data, if any, that applies to each relationship.
Identify supporting data The supporting data you identify will become the names of the attributes of the entity. For example, the data below might apply to the Employee entity, the Skill entity, and the Expert In relationship.
Employee Skill Expert In
Employee ID Skill ID Skill level
Employee first name Skill name Date skill was acquired Employee last name Description of skill
Employee department Employee office Employee address
If you make a diagram of this data, it will look something like this picture:
Employee Employee ID First name Last name Home address
Skill Skill ID Skill name Skill description is capable of
is acquired by
Observe that not all of the attributes you listed appear in this diagram. The missing items fall into two categories:
1. Some are contained implicitly in other relationships; for example, Employee department and Employee office are denoted by the relations to the Department and Office entities, respectively.
2. Others are not present because they are associated not with either of these entities, but rather the relationship between them. The above diagram is inadequate.
The first category of items will fall naturally into place when you draw the entire entity-relationship diagram.
You can add the second category by converting this many-to-many relationship into an entity.
Employee Employee ID First name Last name Home address
Skill Skill ID Skill name Skill description
is capable of is acquired by
Expert in Skill level Date acquired
The new entity depends on both the Employee and the Skill entities. It
borrows its identifiers from these entities because it depends on both of them.
Notes ♦ When you are identifying the supporting data, be sure to refer to the activities you identified earlier to see how you will access the data.
For example, you may need to list employees by first name in some situations and by last name in others. To accommodate this requirement, create a First Name attribute and a Last Name attribute, rather than a single attribute that contains both names. With the names separate, you can later create two indexes, one suited to each task.
♦ Choose consistent names. Consistency makes it easier to maintain your database and easier to read reports and output windows.
For example, if you choose to use an abbreviated name such as Emp_status for one attribute, you should not use a full name, such as Employee_ID, for another attribute. Instead, the names should be Emp_status and Emp_ID.
♦ At this stage, it is not crucial that the data be associated with the correct entity. You can use your intuition. In the next section, you’ll apply tests to check your judgment.
Step 3:Normalize the data
Normalization is a series of tests that eliminate redundancy in the data and make sure the data is associated with the correct entity or relationship. There are five tests. This section presents the first three of them. These three tests are the most important and so the most frequently used.
Why normalize?
The goals of normalization are to remove redundancy and to improve consistency. For example, if you store a customer’s address in multiple locations, it is difficult to update all copies correctly when they move.
☞ For more information about the normalization tests, see a book on database design.
Normal forms There are several tests for data normalization. When your data passes the first test, it is considered to be in first normal form. When it passes the second test, it is in second normal form, and when it passes the third test, it is in third normal form.
❖ To normalize data in a database 1. List the data.
♦ Identify at least one key for each entity. Each entity must have an
identifier.
♦ Identify keys for relationships. The keys for a relationship are the keys from the two entities that it joins.
♦ Check for calculated data in your supporting data list. Calculated data is not normally stored in a relational database.
2. Put data in first normal form.
♦ If an attribute can have several different values for the same entry, remove these repeated values.
♦ Create one or more entities or relationships with the data that you remove.
3. Put data in second normal form.
♦ Identify entities and relationships with more than one key.
♦ Remove data that depends on only one part of the key.
♦ Create one or more entities and relationships with the data that you remove.
4. Put data in third normal form.
♦ Remove data that depends on other data in the entity or relationship, not on the key.
♦ Create one or more entities and relationships with the data that you remove.
Data and identifiers Before you begin to normalize (test your design), simply list the data and identify a unique identifier each table. The identifier can be made up of one piece of data (attribute) or several (a compound identifier).
The identifier is the set of attributes that uniquely identifies each row in an entity. For example, the identifier for the Employee entity is the
Employee ID attribute. The identifier for the Works In relationship consists of the Office Code and Employee ID attributes.
You can make an identifier for each relationship in your database by taking the identifiers from each of the entities that it connects. In the following table, the attributes identified with an asterisk are the identifiers for the entity or relationship.
Entity or Relationship Attributes
Office *Office code
Office address Phone number
Works in *Office code
*Employee ID
Entity or Relationship Attributes
Department *Department ID
Department name
Heads *Department ID
*Employee ID
Member of *Department ID
*Employee ID
Skill *Skill ID
Skill name Skill description
Expert in *Skill ID
*Employee ID Skill level Date acquired
Employee *Employee ID
last name first name
Social security number Address
phone number date of birth
Putting data in first normal form
♦ To test for first normal form, look for attributes that can have repeating values.
♦ Remove attributes when multiple values can apply to a single item. Move these repeating attributes to a new entity.
In the entity below, Phone number can repeat—an office can have more than
one telephone number.
Department Employee Management Relationship
Remove the repeating attribute and make a new entity called Telephone. Set up a relationship between Office and Telephone.
Office Office code Office address
Telephone Phone number has
is located at
Putting data in second normal form
♦ Remove data that does not depend on the whole key.
♦ Look only at entities and relationships whose identifier is composed of more than one attribute. To test for second normal form, remove any data that does not depend on the whole identifier. Each attribute should depend on all of the attributes that comprise the identifier.
In this example, the identifier of the Employee and Department entity is
composed of two attributes. Some of the data does not depend on both
identifier attributes; for example, the department name depends on only one
of those attributes, Department ID, and Employee first name depends only
on Employee ID.
Employee and Department Employee ID
Department ID Employee first name Employee last name Department name
Move the identifier Department ID, which the other employee data does not depend on, to a entity of its own called Department. Also move any attributes that depend on it. Create a relationship between Employee and Department.
Employee
Employee ID Employee first name Employee last name
Department
Department ID Department name contains
works in
Putting data in third normal form
♦ Remove data that doesn’t depend directly on the key.
♦ To test for third normal form, remove any attributes that depend on other attributes, rather than directly on the identifier.
In this example, the Employee and Office entity contains some attributes that depend on its identifier, Employee ID. However, attributes such as Office location and Office phone depend on another attribute, Office code. They do not depend directly on the identifier, Employee ID.
Employee and Office Employee ID
Employee first name Employee last name Office code Office location Office phone
Remove Office code and those attributes that depend on it. Make another entity called Office. Then, create a relationship that connects Employee with Office.
Employee Employee ID Employee first name Employee last name
Office Office code Office location Office phone houses
works out of
Step 4:Resolve the relationships
When you finish the normalization process, your design is almost complete.
All you need to do is to generate the physical data model that corresponds to your conceptual data model. This process is also known as resolving the relationships, because a large portion of the task involves converting the relationships in the conceptual model into the corresponding tables and foreign-key relationships.
Whereas the conceptual model is largely independent of implementation details, the physical data model is tightly bound to the table structure and options available in a particular database application. In this case, that application is Adaptive Server Anywhere.
Resolving relationships that do not carry data
In order to implement relationships that do not carry data, you define foreign keys. A foreign key is a column or set of columns that contains primary key values from another table. The foreign key allows you to access data from more than one table at one time.
A database design tool such as the DataArchitect component of Sybase PowerDesigner can generate the physical data model for you. However, if you’re doing it yourself there are some basic rules that help you decide where to put the keys.
♦
One to manyAn one-to-many relationship always becomes an entity
and a foreign key relationship.
Employee Employee Number First name Last name Address
Department Department ID Department name contains
is a member of
Notice that entities become tables. Identifiers in entities become (at least part of) the primary key in a table. Attributes become columns. In a one-to-many relationship, the identifier in the one entity will appear as a new foreign key column in the many table.
Employee Employee ID <pk>
Department ID <fk>
First Name Last Name Address
Department Department ID <pk>
Department Name Department ID = Department ID
In this example, the Employee entity becomes an Employee table . Similarly, the Department entity becomes a Department table. A foreign key called Department ID appears in the Employee table.
♦
One to oneIn a one-to-one relationship, the foreign key can go into either table. If the relationship is mandatory on one side, but optional on the other, it should go on the optional side. In this example, put the foreign key (Vehicle ID) in the Truck table because a vehicle does not have to be a truck.
Vehicle
Vehicle ID Model Price
Truck
Weight rating is a type of
may be
The above entity-relationship model thus resolves the database base
structure, below.
Vehicle
Vehicle ID <pk>
Model Price
Truck
Vehicle ID <fk>
Weight rating Vehicle ID = Vehicle ID
♦
Many to manyIn a many-to-many relationship, a new table is created with two foreign keys. This arrangement is necessary to make the database efficient.
Parts
Part Number
Description Warehouse
Warehouse ID Address
The new Storage Location table relates the Parts and Warehouse tables.
Parts Part Number <pk>
Description
Warehouse Warehouse ID <pk>
Address Storage Location
Part Number <pk,fk>
Warehouse ID <pk,fk>
Warehouse ID = Warehouse ID Part Number = Part Number
Resolving relationships that carry data
Some of your relationships may carry data. This situation often occurs in many-to-many relationships.
Parts Part Number Description
stored at
Inventory
Quantity
Warehouse Warehouse ID Address contains
If this is the case, each entity resolves to a table. Each role becomes a
foreign key that points to another table.
Inventory Warehouse ID <pk,fk>
Part Number <pk,fk>
Quantity
The Inventory entity borrows its identifiers from the Parts and Warehouse tables, because it depends on both of them. Once resolved, these borrowed identifiers form the primary key of the Inventory table.
Tip
A conceptual data model simplifies the design process because it hides a lot of details. For example, a many-to-many relationship always generates an extra table and two foreign key references. In a conceptual data model, you can usually denote all of this structure with a single connection.
Step 5:Verify the design
Before you implement your design, you need to make sure that it supports your needs. Examine the activities you identified at the start of the design process and make sure you can access all of the data that the activities require.
♦ Can you find a path to get the information you need?
♦ Does the design meet your needs?
♦ Is all of the required data available?
If you can answer yes to all the questions above, you are ready to implement your design.
Final design Applying steps 1 through 3 to the database for the little company produces
the following entity-relationship diagram. This database is now in third
normal form.
Skill ID Number Skill name Skill description
Employee Employee ID First name Last name Home address
Office ID Number Office name Address
Department Department ID Department name is acquired by
is capable of
is headed by
manages
contains is a member of
manages reports to works out of
Expert In Skill Level Date Acquired
houses
The corresponding physical data model appears below.
Skill ID Number <pk>
Skill name Skill description
Employee Employee ID <pk>
ID Number <fk>
Emp_Employee ID <fk>
First name Last name Home address Office
ID Number <pk>
Office name Address
Department Department ID <pk>
Employee ID <fk>
Department name ID Number = ID Number
Employee ID = Employee ID
Employee ID = Employee ID
Employee ID = Employee ID ID Number = ID Number
Expert In ID Number <pk,fk>
Employee ID <pk,fk>
Skill Level Date Acquired
Department/Employee Department ID <pk,fk>
Employee ID <pk,fk>
Department ID = Department ID
Employee ID = Emp_Employee ID