Adaptive Server

(1)

SQL User’s Guide

Part number: DC38124-01-0902-01

Last modified: October 2004

(2)

No part of this publication may be reproduced, transmitted, or translated in any form or by any means, electronic, mechanical, manual, optical, or otherwise, without the prior written permission of iAnywhere Solutions, Inc. iAnywhere Solutions, Inc. is a subsidiary of Sybase, Inc.

Sybase, SYBASE (logo), AccelaTrade, ADA Workbench, Adaptable Windowing Environment, Adaptive Component Architecture, Adaptive Server, Adaptive Server Anywhere, Adaptive Server Enterprise, Adaptive Server Enterprise Monitor, Adaptive Server Enterprise Replication, Adaptive Server Everywhere, Adaptive Server IQ, Adaptive Warehouse, AnswerBase, Anywhere Studio, Application Manager, AppModeler, APT Workbench, APT-Build, APT-Edit, APT-Execute, APT-Library, APT-Translator, ASEP, AvantGo, AvantGo Application Alerts, AvantGo Mobile Delivery, AvantGo Mobile Document Viewer, AvantGo Mobile Inspection, AvantGo Mobile Marketing Channel, AvantGo Mobile Pharma, AvantGo Mobile Sales, AvantGo Pylon, AvantGo Pylon Application Server, AvantGo Pylon Conduit, AvantGo Pylon PIM Server, AvantGo Pylon Pro, Backup Server, BayCam, Bit-Wise, BizTracker, Certified PowerBuilder Developer, Certified SYBASE Professional, Certified SYBASE Professional Logo, ClearConnect, Client Services, Client-Library, CodeBank, Column Design, ComponentPack, Connection Manager, Convoy/DM, Copernicus, CSP, Data Pipeline, Data Workbench, DataArchitect, Database Analyzer, DataExpress, DataServer, DataWindow, DB-Library, dbQueue, Developers Workbench, Direct Connect Anywhere, DirectConnect, Distribution Director, Dynamic Mobility Model, Dynamo, e-ADK, E-Anywhere, e-Biz Integrator, EC Gateway, ECMAP, ECRTP, eFulfillment Accelerator, Electronic Case Management, Embedded SQL, EMS, Enterprise Application Studio, Enterprise Client/Server, Enterprise Connect, Enterprise Data Studio, Enterprise Manager, Enterprise Portal (logo), Enterprise SQL Server Manager, Enterprise Work Architecture, Enterprise Work Designer, Enterprise Work Modeler, eProcurement Accelerator, eremote, Everything Works Better When Everything Works Together, EWA, E-Whatever, Financial Fusion, Financial Fusion (and design), Financial Fusion Server, Formula One, Fusion Powered e-Finance, Fusion Powered Financial Destinations, Fusion Powered STP, Gateway Manager, GeoPoint, GlobalFIX, iAnywhere, iAnywhere Solutions, ImpactNow, Industry Warehouse Studio, InfoMaker, Information Anywhere, Information Everywhere, InformationConnect, InstaHelp, Intelligent Self-Care, InternetBuilder, iremote, iScript, Jaguar CTS, jConnect for JDBC,

KnowledgeBase, Logical Memory Manager, Mail Anywhere Studio, MainframeConnect, Maintenance Express, Manage Anywhere Studio, MAP, M-Business Channel, M-Business Network, M-Business Server, MDI Access Server, MDI Database Gateway, media.splash, Message Anywhere Server, MetaWorks, MethodSet, ML Query, MobiCATS, My AvantGo, My AvantGo Media Channel, My AvantGo Mobile Marketing, MySupport, Net-Gateway, Net-Library, New Era of Networks, Next Generation Learning, Next Generation Learning Studio, O DEVICE, OASiS, OASiS logo, ObjectConnect, ObjectCycle, OmniConnect, OmniSQL Access Module, OmniSQL Toolkit, Open Biz, Open Business Interchange, Open Client, Open Client/Server, Open Client/Server Interfaces, Open ClientConnect, Open Gateway, Open Server, Open ServerConnect, Open Solutions, Optima++, Orchestration Studio, Partnerships that Work, PB-Gen, PC APT Execute, PC DB-Net, PC Net Library, PhysicalArchitect, Pocket PowerBuilder, PocketBuilder, Power Through Knowledge, power.stop, Power++, PowerAMC, PowerBuilder, PowerBuilder Foundation Class Library, PowerDesigner, PowerDimensions, PowerDynamo, Powering the New Economy, PowerJ, PowerScript, PowerSite, PowerSocket, Powersoft, Powersoft Portfolio, Powersoft Professional, PowerStage, PowerStudio, PowerTips, PowerWare Desktop, PowerWare Enterprise, ProcessAnalyst, QAnywhere, Rapport, Relational Beans, RepConnector, Replication Agent, Replication Driver, Replication Server, Replication Server Manager, Replication Toolkit, Report Workbench, Report-Execute, Resource Manager, RW-DisplayLib, RW-Library, S.W.I.F.T. Message Format Libraries, SAFE, SAFE/PRO, SDF, Secure SQL Server, Secure SQL Toolset, Security Guardian, SKILS, smart.partners, smart.parts, smart.script, SQL Advantage, SQL Anywhere, SQL Anywhere Studio, SQL Code Checker, SQL Debug, SQL Edit, SQL Edit/TPU, SQL Everywhere, SQL Modeler, SQL Remote, SQL Server, SQL Server Manager, SQL Server SNMP SubAgent, SQL Server/CFT, SQL Server/DBM, SQL SMART, SQL Station, SQL Toolset, SQLJ, Stage III Engineering, Startup.Com, STEP, SupportNow, Sybase Central, Sybase Client/Server Interfaces, Sybase Development Framework, Sybase Financial Server, Sybase Gateways, Sybase Learning Connection, Sybase MPP, Sybase SQL Desktop, Sybase SQL Lifecycle, Sybase SQL Workgroup, Sybase Synergy Program, Sybase User Workbench, Sybase Virtual Server Architecture, SybaseWare, Syber Financial, SyberAssist, SybMD, SyBooks, System 10, System 11, System XI (logo), SystemTools, Tabular Data Stream, The Enterprise Client/Server Company, The Extensible Software Platform, The Future Is Wide Open, The Learning Connection, The Model For Client/Server Solutions, The Online Information Center, The Power of One, TotalFix, TradeForce, Transact-SQL, Translation Toolkit, Turning Imagination Into Reality, UltraLite, UltraLite.NET, UNIBOM, Unilib, Uninull, Unisep, Unistring, URK Runtime Kit for UniCode, Versacore, Viewer, VisualWriter, VQL, Warehouse Control Center, Warehouse Studio, Warehouse WORKS, WarehouseArchitect, Watcom, Watcom SQL, Watcom SQL Server, Web Deployment Kit, Web.PB, Web.SQL, WebSights, WebViewer, WorkGroup SQL Server, XA-Library, XA-Server, and XP Server are trademarks of Sybase, Inc. or its subsidiaries.

All other trademarks are property of their respective owners.

(3)

About This Manual ix

SQL Anywhere Studio documentation . . . . x

Documentation conventions . . . . xiii

The Adaptive Server Anywhere sample database . . . . xv

Finding out more and providing feedback . . . . xvi

I Designing and Creating Databases 1 1 Designing Your Database 3 Introduction . . . . 4

Database design concepts . . . . 5

The design process . . . . 12

Designing the database table properties . . . . 27

2 Working with Database Objects 29 Introduction . . . . 30

Working with databases . . . . 31

Working with tables . . . . 40

Working with views . . . . 54

Working with indexes . . . . 62

Working with temporary tables . . . . 76

Copying database objects in Sybase Central . . . . 77

3 Ensuring Data Integrity 79 Data integrity overview . . . . 80

Using column defaults . . . . 83

Using table and column constraints . . . . 88

Using domains . . . . 92

Enforcing entity and referential integrity . . . . 95

Integrity rules in the system tables . . . . 100

4 Using Transactions and Isolation Levels 101 Introduction to transactions . . . . 102

Isolation levels and consistency . . . . 106

Transaction blocking and deadlock . . . . 113

Choosing isolation levels . . . . 116

Isolation level tutorials . . . . 120

How locking works . . . . 135

(4)

Particular concurrency issues . . . . 149

Replication and concurrency . . . . 152

Summary . . . . 155

5 Monitoring and Improving Performance 157 Performance analysis tools . . . . 158

Top performance tips . . . . 165

Using the cache to improve performance . . . . 180

Using indexes to improve performance . . . . 186

Using keys to improve query performance . . . . 187

Sorting query results . . . . 189

Use of work tables in query processing . . . . 190

Monitoring database performance . . . . 192

Fragmentation . . . . 198

Profiling database procedures . . . . 202

II Querying and Modifying Data 211 6 Queries: Selecting Data from a Table 213 Query overview . . . . 214

The SELECT list: specifying columns . . . . 217

The FROM clause: specifying tables . . . . 224

The WHERE clause: specifying rows . . . . 225

7 Summarizing, Grouping, and Sorting Query Results 237 Summarizing query results using aggregate functions . . . . 238

The GROUP BY clause: organizing query results into groups 243 Understanding GROUP BY . . . . 244

The HAVING clause: selecting groups of data . . . . 248

The ORDER BY clause: sorting query results . . . . 250

Performing set operations on query results with UNION, IN- TERSECT, and EXCEPT . . . . 253

Standards and compatibility . . . . 260

8 Joins: Retrieving Data from Several Tables 263 Sample database schema . . . . 264

How joins work . . . . 265

Joins overview . . . . 266

Explicit join conditions (the ON phrase) . . . . 271

Cross joins . . . . 274

Inner and outer joins . . . . 276

Specialized joins . . . . 283

Natural joins . . . . 290

(5)

9 Common Table Expressions 307

About common table expressions . . . . 308

Typical applications of common table expressions . . . . 312

Recursive common table expressions . . . . 316

Parts explosion problems . . . . 319

Data type declarations in recursive common table expressions 322 Least distance problem . . . . 324

Using multiple recursive common table expressions . . . . . 327

10 Using OLAP 329 Understanding OLAP . . . . 330

Understanding subtotals . . . . 331

NULL values and subtotal rows . . . . 334

Using ROLLUP . . . . 335

Using CUBE . . . . 340

Using GROUPING SETS . . . . 343

Working with OLAP functions . . . . 346

11 Using Subqueries 355 Introduction to subqueries . . . . 356

Using subqueries in the WHERE clause . . . . 357

Subqueries in the HAVING clause . . . . 358

Subquery comparison test . . . . 360

Quantified comparison tests with ANY and ALL . . . . 361

Testing set membership with IN conditions . . . . 364

Existence test . . . . 366

Outer references . . . . 368

Subqueries and joins . . . . 369

Nested subqueries . . . . 371

How subqueries work . . . . 373

12 Adding, Changing, and Deleting Data 383 Data modification statements . . . . 384

Adding data using INSERT . . . . 385

Changing data using UPDATE . . . . 390

Changing data using INSERT . . . . 392

Deleting data using DELETE . . . . 393

13 Query Optimization and Execution 395 The role of the optimizer . . . . 396

How the optimizer works . . . . 397

Query execution algorithms . . . . 407

(6)

Physical data organization and access . . . . 423

Indexes . . . . 426

Semantic query transformations . . . . 435

Subquery and function caching . . . . 449

Reading access plans . . . . 451

III SQL Dialects and Compatibility 471 14 Transact-SQL Compatibility 473 An overview of Transact-SQL support . . . . 474

Adaptive Server architectures . . . . 477

Configuring databases for Transact-SQL compatibility . . . . 483

Writing compatible SQL statements . . . . 492

Transact-SQL procedure language overview . . . . 497

Automatic translation of stored procedures . . . . 500

Returning result sets from Transact-SQL procedures . . . . . 501

Variables in Transact-SQL procedures . . . . 502

Error handling in Transact-SQL procedures . . . . 503

15 Differences from Other SQL Dialects 507 Adaptive Server Anywhere SQL features . . . . 508

IV XML in the Database 511 16 Using XML in the Database 513 What is XML? . . . . 514

Storing XML documents in relational databases . . . . 515

Exporting relational data as XML . . . . 516

Importing XML documents as relational data . . . . 517

Obtaining query results as XML . . . . 524

Using SQL/XML to obtain query results as XML . . . . 544

V Remote Data and Bulk Operations 553 17 Importing and Exporting Data 555 Transferring data into and out of databases . . . . 556

Performance aspects of bulk operations . . . . 557

Data recovery issues for bulk operations . . . . 558

Importing data into databases . . . . 559

Exporting data from databases . . . . 569

Rebuilding databases . . . . 580

(7)

Using SQL command files . . . . 596

Adaptive Server Enterprise compatibility . . . . 599

18 Accessing Remote Data 601 Introduction . . . . 602

Basic concepts to access remote data . . . . 604

Working with remote servers . . . . 606

Working with external logins . . . . 612

Working with proxy tables . . . . 614

Joining remote tables . . . . 619

Joining tables from multiple local databases . . . . 621

Sending native statements to remote servers . . . . 622

Using remote procedure calls (RPCs) . . . . 623

Transaction management and remote data . . . . 626

Internal operations . . . . 628

Troubleshooting remote data access . . . . 632

19 Server Classes for Remote Data Access 635 Overview . . . . 636

JDBC-based server classes . . . . 637

ODBC-based server classes . . . . 640

VI Stored Procedures and Triggers 653 20 Using Procedures, Triggers, and Batches 655 Procedure and trigger overview . . . . 657

Benefits of procedures and triggers . . . . 658

Introduction to procedures . . . . 659

Introduction to user-defined functions . . . . 666

Introduction to triggers . . . . 670

Introduction to batches . . . . 677

Control statements . . . . 679

The structure of procedures and triggers . . . . 682

Returning results from procedures . . . . 686

Using cursors in procedures and triggers . . . . 692

Errors and warnings in procedures and triggers . . . . 695

Using the EXECUTE IMMEDIATE statement in procedures . 704 Transactions and savepoints in procedures and triggers . . . 706

Tips for writing procedures . . . . 707

Statements allowed in batches . . . . 709

Calling external libraries from procedures . . . . 710

(8)

21 Debugging Logic in the Database 719

Introduction to debugging in the database . . . . 720

Tutorial: Getting started with the debugger . . . . 722

Working with breakpoints . . . . 731

Working with variables . . . . 734

Working with connections . . . . 735

Index 737

(9)

Subject This book describes how to design and create databases; how to import, export, and modify data; how to retrieve data; and how to build stored procedures and triggers.

Audience This manual is for all users of Adaptive Server Anywhere.

Before you begin This manual assumes that you have an elementary familiarity with

database-management systems and Adaptive Server Anywhere in particular.

If you do not have such a familiarity, you should consider reading

Introducing SQL Anywhere Studio before reading this manual.

(10)

SQL Anywhere Studio documentation

This book is part of the SQL Anywhere documentation set. This section describes the books in the documentation set and how you can use them.

The SQL Anywhere Studio documentation

The SQL Anywhere Studio documentation is available in a variety of forms:

in an online form that combines all books in one large help file; as separate PDF files for each book; and as printed books that you can purchase. The documentation consists of the following books:

♦

Introducing SQL Anywhere Studio

This book provides an overview of the SQL Anywhere Studio database management and synchronization technologies. It includes tutorials to introduce you to each of the pieces that make up SQL Anywhere Studio.

♦

What’s New in SQL Anywhere Studio

This book is for users of previous versions of the software. It lists new features in this and previous releases of the product and describes upgrade procedures.

♦

Adaptive Server Anywhere Database Administration Guide

This book covers material related to running, managing, and configuring databases and database servers.

♦

Adaptive Server Anywhere SQL User’s Guide

This book describes how to design and create databases; how to import, export, and modify data; how to retrieve data; and how to build stored procedures and triggers.

♦

Adaptive Server Anywhere SQL Reference Manual

This book provides a complete reference for the SQL language used by Adaptive Server Anywhere. It also describes the Adaptive Server Anywhere system tables and procedures.

♦

Adaptive Server Anywhere Programming Guide

This book describes how to build and deploy database applications using the C, C++, and Java programming languages. Users of tools such as Visual Basic and

PowerBuilder can use the programming interfaces provided by those tools. It also describes the Adaptive Server Anywhere ADO.NET data provider.

♦

Adaptive Server Anywhere SNMP Extension Agent User’s Guide

This book describes how to configure the Adaptive Server Anywhere SNMP Extension Agent for use with SNMP management applications to manage Adaptive Server Anywhere databases.

♦

Adaptive Server Anywhere Error Messages

This book provides a

complete listing of Adaptive Server Anywhere error messages together

with diagnostic information.

(11)

databases. Adaptive Server Anywhere 7.0 was awarded a TCSEC (Trusted Computer System Evaluation Criteria) C2 security rating from the U.S. Government. This book may be of interest to those who wish to run the current version of Adaptive Server Anywhere in a manner equivalent to the C2-certified environment.

♦

MobiLink Administration Guide

This book describes how to use the MobiLink data synchronization system for mobile computing, which enables sharing of data between a single Oracle, Sybase, Microsoft or IBM database and many Adaptive Server Anywhere or UltraLite databases.

♦

MobiLink Clients

This book describes how to set up and synchronize Adaptive Server Anywhere and UltraLite remote databases.

♦

MobiLink Server-Initiated Synchronization User’s Guide

This book describes MobiLink server-initiated synchronization, a feature of MobiLink that allows you to initiate synchronization from the consolidated database.

♦

MobiLink Tutorials

This book provides several tutorials that walk you through how to set up and run MobiLink applications.

♦

QAnywhere User’s Guide

This manual describes MobiLink QAnywhere, a messaging platform that enables the development and deployment of messaging applications for mobile and wireless clients, as well as traditional desktop and laptop clients.

♦

iAnywhere Solutions ODBC Drivers

This book describes how to set up ODBC drivers to access consolidated databases other than Adaptive Server Anywhere from the MobiLink synchronization server and from Adaptive Server Anywhere remote data access.

♦

SQL Remote User’s Guide

This book describes all aspects of the SQL Remote data replication system for mobile computing, which enables sharing of data between a single Adaptive Server Anywhere or Adaptive Server Enterprise database and many Adaptive Server

Anywhere databases using an indirect link such as e-mail or file transfer.

♦

SQL Anywhere Studio Help

This book includes the context-sensitive help for Sybase Central, Interactive SQL, and other graphical tools. It is not included in the printed documentation set.

♦

UltraLite Database User’s Guide

This book is intended for all

UltraLite developers. It introduces the UltraLite database system and

provides information common to all UltraLite programming interfaces.

(12)

♦

UltraLite Interface Guides

A separate book is provided for each UltraLite programming interface. Some of these interfaces are provided as UltraLite components for rapid application development, and others are provided as static interfaces for C, C++, and Java development.

In addition to this documentation set, PowerDesigner and InfoMaker include their own online documentation.

Documentation formats SQL Anywhere Studio provides documentation in the following formats:

♦

Online documentation

The online documentation contains the complete SQL Anywhere Studio documentation, including both the books and the context-sensitive help for SQL Anywhere tools. The online documentation is updated with each maintenance release of the product, and is the most complete and up-to-date source of documentation.

To access the online documentation on Windows operating systems, choose Start

➤

Programs

➤

SQL Anywhere 9

➤

Online Books. You can navigate the online documentation using the HTML Help table of contents, index, and search facility in the left pane, as well as using the links and menus in the right pane.

To access the online documentation on UNIX operating systems, see the HTML documentation under your SQL Anywhere installation.

♦

^{PDF books}

The SQL Anywhere books are provided as a set of PDF files, viewable with Adobe Acrobat Reader.

The PDF books are accessible from the online books, or from the Windows Start menu.

♦

Printed books

The complete set of books is available from Sybase sales or from eShop, the Sybase online store at

http://eshop.sybase.com/eshop/documentation .

(13)

This section lists the typographic and graphical conventions used in this documentation.

Syntax conventions The following conventions are used in the SQL syntax descriptions:

♦

^Keywords

All SQL keywords appear in upper case, like the words ALTER TABLE in the following example:

ALTER TABLE [ owner.]table-name

♦

Placeholders

Items that must be replaced with appropriate identifiers or expressions are shown like the words owner and table-name in the following example:

ALTER TABLE [ owner.]table-name

♦

Repeating items

Lists of repeating items are shown with an element of the list followed by an ellipsis (three dots), like column-constraint in the following example:

ADD column-definition [ column-constraint, . . . ]

One or more list elements are allowed. In this example, if more than one is specified, they must be separated by commas.

♦

Optional portions

Optional portions of a statement are enclosed by square brackets.

RELEASE SAVEPOINT [ savepoint-name ]

These square brackets indicate that the savepoint-name is optional. The square brackets should not be typed.

♦

Options

When none or only one of a list of items can be chosen, vertical bars separate the items and the list is enclosed in square brackets.

[ ASC | DESC ]

For example, you can choose one of ASC, DESC, or neither. The square brackets should not be typed.

♦

Alternatives

When precisely one of the options must be chosen, the alternatives are enclosed in curly braces and a bar is used to separate the options.

[ QUOTES { ON | OFF } ]

If the QUOTES option is used, one of ON or OFF must be provided. The

brackets and braces should not be typed.

(14)

Graphic icons The following icons are used in this documentation.

♦ A client application.

♦ A database server, such as Sybase Adaptive Server Anywhere.

♦ A database. In some high-level diagrams, the icon may be used to represent both the database and the database server that manages it.

♦ Replication or synchronization middleware. These assist in sharing data among databases. Examples are the MobiLink Synchronization Server and the SQL Remote Message Agent.

♦ A programming interface.

API

(15)

Many of the examples throughout the documentation use the Adaptive Server Anywhere sample database.

The sample database is held in a file named asademo.db , and is located in your SQL Anywhere directory.

The sample database represents a small company. It contains internal information about the company (employees, departments, and finances) as well as product information and sales information (sales orders, customers, and contacts). All information in the database is fictional.

The following figure shows the tables in the sample database and how they relate to each other.

fin_code

code char(2) <pk>

type char(10) description char(50) product

id integer <pk>

name char(15) description char(30) size char(18) color char(6) quantity integer unit_price numeric (15,2)

asademo.db

contact

id integer <pk>

last_name char(15) first_name char(15) title char(2) street char(30) city char(20) state char(2) zip char(5) phone char(10) fax char(10) customer

id integer <pk>

fname char(15) lname char(20) address char(35) city char(20) state char(2) zip char(10) phone char(20) company_name char(35)

fin_data

year char(4) <pk>

quarter char(2) <pk>

code char(2) <pk,fk>

amount numeric(9) sales_order

id integer <pk>

cust_id integer <fk>

order_date date fin_code_id char(2) <fk>

region char(7) sales_rep integer <fk>

sales_order_items

id integer <pk,fk>

line_id smallint <pk>

prod_id integer <fk>

quantity integer ship_date date

department

dept_id integer <pk>

dept_name char(40) dept_head_id integer <fk>

Employee

emp_id integer <pk>

manager_id integer

emp_fname char(20)

emp_lname char(20)

dept_id integer <fk>

street char(40)

city char(20)

state char(4)

zip_code char(9)

phone char(10)

status char(1)

ss_number char(11)

salary numeric(20,3)

start_date date

termination_date date

birth_date date

bene_health_ins char(1) bene_life_ins char(1) bene_day_care char(1)

sex char(1)

id = prod_id

id = cust_id

id = id emp_id = sales_rep

emp_id = dept_head_id code = fin_code_id

dept_id = dept_id

code = code

(16)

Finding out more and providing feedback

Finding out more Additional information and resources, including a code exchange, are available at the iAnywhere Developer Network at

http://www.ianywhere.com/developer/ .

If you have questions or need help, you can post messages to the iAnywhere Solutions newsgroups listed below.

When you write to one of these newsgroups, always provide detailed information about your problem, including the build number of your version of SQL Anywhere Studio. You can find this information by typing dbeng9 -v at a command prompt.

The newsgroups are located on the forums.sybase.com news server. The newsgroups include the following:

♦ sybase.public.sqlanywhere.general

♦ sybase.public.sqlanywhere.linux

♦ sybase.public.sqlanywhere.mobilink

♦ sybase.public.sqlanywhere.product_futures_discussion

♦ sybase.public.sqlanywhere.replication

♦ sybase.public.sqlanywhere.ultralite

♦ ianywhere.public.sqlanywhere.qanywhere

Newsgroup disclaimer

iAnywhere Solutions has no obligation to provide solutions, information or ideas on its newsgroups, nor is iAnywhere Solutions obliged to provide anything other than a systems operator to monitor the service and ensure its operation and availability.

iAnywhere Solutions Technical Advisors as well as other staff assist on the newsgroup service when they have time available. They offer their help on a volunteer basis and may not be available on a regular basis to provide solutions and information. Their ability to help is based on their workload.

Feedback We would like to receive your opinions, suggestions, and feedback on this documentation.

You can e-mail comments and suggestions to the SQL Anywhere

documentation team at [email protected]. Although we do not reply

to e-mails sent to that address, we read all suggestions with interest.

(17)

(18)

(19)

D ESIGNING AND C REATING

D ATABASES

This part describes key concepts and strategies for designing and building

databases. It covers issues of database design as well as the mechanics of

working with tables, views, and indexes. It also includes material on

referential integrity and transactions.

(20)

(21)

Designing Your Database

About this chapter This chapter introduces the basic concepts of relational database design and gives you step-by-step suggestions for designing your own databases. It uses the expedient technique known as conceptual data modeling, which focuses on entities and the relationships between them.

Introduction 4

Database design concepts 5

The design process 12

Designing the database table properties 27

(22)

Introduction

While designing a database is not a difficult task for small and medium sized databases, it is an important one. Bad database design can lead to an

inefficient and possibly unreliable database system. Because client

applications are built to work on specific parts of a database, and rely on the database design, a bad design can be difficult to revise at a later date.

For more information, you may also wish to consult an introductory book

such as A Database Primer by C. J. Date. If you are interested in pursuing

database theory, C. J. Date’s An Introduction to Database Systems is an

excellent textbook on the subject.

(23)

Database design concepts

In designing a database, you plan what things you want to store information about, and what information you will keep about each one. You also determine how these things are related. In the common language of database design, what you are creating during this step is a conceptual database model.

Entities and relationships The distinguishable objects or things that you want to store information about are called entities. The associations between them are called relationships. In the language of database description, you can think of entities as nouns and relationships as verbs.

Conceptual models are useful because they make a clean distinction between the entities and relationships. These models hide the details involved in implementing a design in any particular database-management system. They allow you to focus on fundamental database structure. Hence, they also form a common language for the discussion of database design.

Entity-relationship diagrams

The main component of a conceptual database model is a diagram that shows the entities and relationships. This diagram is commonly called an entity-relationship diagram. In consequence, many people use the name entity-relationship modeling to refer to the task of creating a conceptual database model.

Conceptual database design is a top-down design method. There are now sophisticated tools such as Sybase PowerDesigner that help you pursue this method, or other approaches. This chapter is an introductory chapter only, but it does contain enough information for the design of straightforward databases.

Entities

An entity is the database equivalent of a noun. Distinguishable objects such as employees, order items, departments and products are all examples of entities. In a database, a table represents each entity. The entities that you build into your database arise from the activities for which you will be using the database, such as tracking sales calls and maintaining employee

information.

Attributes Each entity contains a number of attributes. Attributes are particular characteristics of the things that you would like to store. For example, in an employee entity, you might want to store an employee ID number, first and last names, an address, and other particular information that pertains to a particular employee. Attributes are also known as properties.

You depict an entity using a rectangular box. Inside, you list the attributes

(24)

associated with that entity.

Employee Employee Number First Name Last Name Address

An identifier is one or more attributes on which all the other attributes depend. It uniquely identifies an item in the entity. Underline the names of attributes that you wish to form part of an identifier.

In the Employee entity, above, the Employee Number uniquely identifies an employee. All the other attributes store information that pertains only to that one employee. For example, an employee number uniquely determines an employee’s name and address. Two employees might have the same name or the same address, but you can make sure that they don’t have the same employee number. Employee Number is underlined to show that it is an identifier.

It is good practice to create an identifier for each entity. As will be explained later, these identifiers become primary keys within your tables. Primary key values must be unique and cannot be null or undefined. They identify each row in a table uniquely and improve the performance of the database server.

Relationships

A relationship between entities is the database equivalent of a verb. An employee is a member of a department, or an office is located in a city.

Relationships in a database may appear as foreign key relationships between tables, or may appear as separate tables themselves. You will see examples of each in this chapter.

The relationships in the database are an encoding of rules or practices that govern the data in the entities. If each department has one department head, you can create a one-to-one relationship between departments and

employees to identify the department head.

Once a relationship is built into the structure of the database, there is no provision for exceptions. There is nowhere to put a second department head.

Duplicating the department entry would involve duplicating the department

ID, which is the identifier. Duplicate identifiers are not allowed.

(25)

Tip

Strict database structure can benefit you, because it can eliminate incon- sistencies, such as a department with two managers. On the other hand, you as the designer should make your design flexible enough to allow some expansion for unforeseen uses. Extending a well-designed database is usually not too difficult, but modifying the existing table structure can render an entire database and its client applications obsolete.

Cardinality of relationships

There are three kinds of relationships between tables. These correspond to the cardinality (number) of the entities involved in the relationship.

♦

One-to-one relationships

You depict a relationship by drawing a line between two entities. The line may have other markings on it such as the two little circles shown. Later sections explain the purpose of these marks. In the following diagram, one employee manages one department.

Department Employee

Management relationship

♦

One-to-many relationships

The fact that one item contained in Entity 1 can be associated with multiple entities in Entity 2 is denoted by the multiple lines forming the attachment to Entity 2. In the following diagram, one office can have many phones.

Office Telephones

Phone Location relationship

♦

Many-to-many relationships

In this case, draw multiple lines for the

(26)

connections to both entities. This means that one warehouse can hold many different parts, and one type of part can be stored at many warehouses.

Parts Warehouses

storage relationship

Roles You can describe each relationship with two roles. Roles are verbs or phrases that describe the relationship from each point of view. For example, a relationship between employees and departments might be described by the following two roles.

1. An employee is a member of a department.

2. A department contains an employee.

Employee Employee Number First name Last name Address

Department Department ID Department name contains

is a member of

Roles are very important because they afford you a convenient and effective

means of verifying your work.

(27)

Tip

Whether reading from left-to-right or from right-to-left, the following rule makes it easy to read these diagrams:Read the 1 name of the first entity, 2 role next to the first entity , 3 cardinality from the connection to the second entity , and 4 name of the second entity.

Mandatory elements The little circles just before the end of the line that denotes the relation serve an important purpose. A circle means that an element can exist in the one entity without a corresponding element in the other entity.

If a cross bar appears in place of the circle, that entity must contain at least one element for each element in the other entity. An example will clarify these statements.

Publisher ID Number Publisher Name

Author ID Number First Name Last Name is published by

writes Books

ID Number Title publishes

is written by

This diagram corresponds to the following four statements.

1. A publisher publishes zero or more books.

2. A book is published by exactly one publisher.

3. A book is written by one or more authors.

4. An author writes zero or more books.

Tip

Think of the little circle as the digit 0 and the cross bar as the number one.

The circle means at least zero . The cross bar means at least one .

Reflexive relationships Sometimes, a relationship will exist between entries in a single entity. In this

case, the relationship is said to be reflexive. Both ends of the relationship

attach to a single entity.

(28)

Employee Employee Number First Name Last Name Address

manages reports to

This diagram corresponds to the following two statements.

1. An employee reports to at most one other employee.

2. An employee manages zero or more employees.

Notice that in the case of this relation, it is essential that the relation be optional in both directions. Some employees are not managers. Similarly, at least one employee should head the organization and hence report to no one.

☞ Naturally, you would also like to specify that an employee cannot be his or her own manager. This restriction is a type of business rule . Business rules are discussed later as part of “The design process” on page 12.

Changing many-to-many relationships into entities

When you have attributes associated with a relationship , rather than an entity, you can change the relationship into an entity. This situation

sometimes arises with many-to-many relationships, when you have attributes that are particular to the relationship and so you cannot reasonably add them to either entity.

Suppose that your parts inventory is located at a number of different warehouses. You have drawn the following diagram.

Parts

Part Number

Description Warehouse

Warehouse ID Address

But you wish to record the quantity of each part stored at each location. This

attribute can only be associated with the relationship. Each quantity depends

on both the parts and the warehouse involved. To represent this situation,

you can redraw the diagram as follows:

(29)

Parts Part Number Description

stored at

Inventory

Quantity

Warehouse Warehouse ID Address contains

Notice the following details of the transformation:

1. Two new relations join the relation entity with each of the two original entities. They inherit their names from the two roles of the original relationship: stored at and contains , respectively.

2. Each entry in the Inventory entity demands one mandatory entry in the Parts entity and one mandatory entry in the Warehouse entity. These relationships are mandatory because a storage relationship only makes sense if it is associated with one particular part and one particular warehouse.

3. The new entity is dependent on both the Parts entity and on the Warehouse entity, meaning that the new entity is identified by the identifiers of both of these entities. In this new diagram, one identifier from the Parts entity and one identifier from the Warehouse entity uniquely identify an entry in the Inventory entity. The triangles that appear between the circles and the multiple lines that join the two new relationships to the new Inventory entity denote the dependencies.

Do not add either a Part Number or Warehouse ID attribute to the Inventory

entity. Each entry in the Inventory entity does depend on both a particular

part and a particular warehouse, but the triangles denote this dependence

more clearly.

(30)

The design process

There are five major steps in the design process.

♦ “Step 1: Identify entities and relationships” on page 12.

♦ “Step 2: Identify the required data” on page 15.

♦ “Step 3: Normalize the data” on page 17.

♦ “Step 4: Resolve the relationships” on page 22.

♦ “Step 5: Verify the design” on page 25.

☞ For more information about implementing the database design, see

“Working with Database Objects” on page 29.

Step 1:Identify entities and relationships

❖ To identify the entities in your design and their relationship to each other

1.

Define high-level activities

Identify the general activities for which you will use this database. For example, you may want to keep track of information about employees.

2.

Identify entities

For the list of activities, identify the subject areas you need to maintain information about. These subjects will become entities.

For example, hire employees , assign to a department , and determine a skill level.

3.

Identify relationships

Look at the activities and determine what the relationships will be between the entities. For example, there is a relationship between parts and warehouses. Define two roles to describe each relationship.

4.

Break down the activities

You started out with high-level activities.

Now, examine these activities more carefully to see if some of them can be broken down into lower-level activities. For example, a high-level activity such as maintain employee information can be broken down into:

♦ Add new employees.

♦ Change existing employee information.

♦ Delete terminated employees.

5.

Identify business rules

Look at your business description and see what rules you follow. For example, one business rule might be that a

department has one and only one department head. These rules will be

built into the structure of the database.

(31)

Entity and relationship example

Example ACME Corporation is a small company with offices in five locations.

Currently, 75 employees work for ACME. The company is preparing for rapid growth and has identified nine departments, each with its own department head.

To help in its search for new employees, the personnel department has identified 68 skills that it believes the company will need in its future employee base. When an employee is hired, the employee’s level of expertise for each skill is identified.

Define high-level activities

Some of the high-level activities for ACME Corporation are:

♦ Hire employees.

♦ Terminate employees.

♦ Maintain personal employee information.

♦ Maintain information on skills required for the company.

♦ Maintain information on which employees have which skills.

♦ Maintain information on departments.

♦ Maintain information on offices.

Identify the entities and relationships

Identify the entities (subjects) and the relationships (roles) that connect them. Create a diagram based on the description and high-level activities.

Use boxes to show entities and lines to show relationships. Use the two roles to label each relationship. You should also identify those relationships that are one-to-many, one-to-one, and many-to-many using the appropriate annotation.

Following is a rough entity-relationship diagram. It will be refined

throughout the chapter.

(32)

Skill

Employee

Office

Department is acquired by

is capable of

is headed by

manages

contains is a member of

manages reports to works out of

contains

Break down the high-level activities

The following lower-level activities below are based on the high-level activities listed above:

♦ Add or delete an employee.

♦ Add or delete an office.

♦ List employees for a department.

♦ Add a skill to the skill list.

♦ Identify the skills of an employee.

♦ Identify an employee’s skill level for each skill.

♦ Identify all employees that have the same skill level for a particular skill.

♦ Change an employee’s skill level.

These lower-level activities can be used to identify if any new tables or relationships are needed.

Identify business rules Business rules often identify one-to-many, one-to-one, and many-to-many relationships.

The kind of business rules that may be relevant include the following:

♦ There are now five offices; expansion plans allow for a maximum of ten.

♦ Employees can change department or office.

♦ Each department has one department head.

♦ Each office has a maximum of three telephone numbers.

(33)

♦ Each telephone number has one or more extensions.

♦ When an employee is hired, the level of expertise in each of several skills is identified.

♦ Each employee can have from three to twenty skills.

♦ An employee may or may not be assigned to an office.

Step 2:Identify the required data

❖ To identify the required data

1. Identify supporting data.

2. List all the data you need to track.

3. Set up data for each entity.

4. List the available data for each entity. The data that describes an entity (subject) answers the questions who, what, where, when, and why.

5. List any data required for each relationship (verb).

6. List the data, if any, that applies to each relationship.

Identify supporting data The supporting data you identify will become the names of the attributes of the entity. For example, the data below might apply to the Employee entity, the Skill entity, and the Expert In relationship.

Employee Skill Expert In

Employee ID Skill ID Skill level

Employee first name Skill name Date skill was acquired Employee last name Description of skill

Employee department Employee office Employee address

If you make a diagram of this data, it will look something like this picture:

(34)

Employee Employee ID First name Last name Home address

Skill Skill ID Skill name Skill description is capable of

is acquired by

Observe that not all of the attributes you listed appear in this diagram. The missing items fall into two categories:

1. Some are contained implicitly in other relationships; for example, Employee department and Employee office are denoted by the relations to the Department and Office entities, respectively.

2. Others are not present because they are associated not with either of these entities, but rather the relationship between them. The above diagram is inadequate.

The first category of items will fall naturally into place when you draw the entire entity-relationship diagram.

You can add the second category by converting this many-to-many relationship into an entity.

Skill Skill ID Skill name Skill description

is capable of is acquired by

Expert in Skill level Date acquired

The new entity depends on both the Employee and the Skill entities. It

borrows its identifiers from these entities because it depends on both of them.

(35)

Notes ♦ When you are identifying the supporting data, be sure to refer to the activities you identified earlier to see how you will access the data.

For example, you may need to list employees by first name in some situations and by last name in others. To accommodate this requirement, create a First Name attribute and a Last Name attribute, rather than a single attribute that contains both names. With the names separate, you can later create two indexes, one suited to each task.

♦ Choose consistent names. Consistency makes it easier to maintain your database and easier to read reports and output windows.

For example, if you choose to use an abbreviated name such as Emp_status for one attribute, you should not use a full name, such as Employee_ID, for another attribute. Instead, the names should be Emp_status and Emp_ID.

♦ At this stage, it is not crucial that the data be associated with the correct entity. You can use your intuition. In the next section, you’ll apply tests to check your judgment.

Step 3:Normalize the data

Normalization is a series of tests that eliminate redundancy in the data and make sure the data is associated with the correct entity or relationship. There are five tests. This section presents the first three of them. These three tests are the most important and so the most frequently used.

Why normalize?

The goals of normalization are to remove redundancy and to improve consistency. For example, if you store a customer’s address in multiple locations, it is difficult to update all copies correctly when they move.

☞ For more information about the normalization tests, see a book on database design.

Normal forms There are several tests for data normalization. When your data passes the first test, it is considered to be in first normal form. When it passes the second test, it is in second normal form, and when it passes the third test, it is in third normal form.

❖ To normalize data in a database 1. List the data.

♦ Identify at least one key for each entity. Each entity must have an

identifier.

(36)

♦ Identify keys for relationships. The keys for a relationship are the keys from the two entities that it joins.

♦ Check for calculated data in your supporting data list. Calculated data is not normally stored in a relational database.

2. Put data in first normal form.

♦ If an attribute can have several different values for the same entry, remove these repeated values.

♦ Create one or more entities or relationships with the data that you remove.

3. Put data in second normal form.

♦ Identify entities and relationships with more than one key.

♦ Remove data that depends on only one part of the key.

♦ Create one or more entities and relationships with the data that you remove.

4. Put data in third normal form.

♦ Remove data that depends on other data in the entity or relationship, not on the key.

♦ Create one or more entities and relationships with the data that you remove.

Data and identifiers Before you begin to normalize (test your design), simply list the data and identify a unique identifier each table. The identifier can be made up of one piece of data (attribute) or several (a compound identifier).

The identifier is the set of attributes that uniquely identifies each row in an entity. For example, the identifier for the Employee entity is the

Employee ID attribute. The identifier for the Works In relationship consists of the Office Code and Employee ID attributes.

You can make an identifier for each relationship in your database by taking the identifiers from each of the entities that it connects. In the following table, the attributes identified with an asterisk are the identifiers for the entity or relationship.

Entity or Relationship Attributes

Office *Office code

Office address Phone number

Works in *Office code

*Employee ID

(37)

Entity or Relationship Attributes

Department *Department ID

Department name

Heads *Department ID

*Employee ID

Member of *Department ID

*Employee ID

Skill *Skill ID

Skill name Skill description

Expert in *Skill ID

*Employee ID Skill level Date acquired

Employee *Employee ID

last name first name

Social security number Address

phone number date of birth

Putting data in first normal form

♦ To test for first normal form, look for attributes that can have repeating values.

♦ Remove attributes when multiple values can apply to a single item. Move these repeating attributes to a new entity.

In the entity below, Phone number can repeat—an office can have more than

one telephone number.

(38)

Department Employee Management Relationship

Remove the repeating attribute and make a new entity called Telephone. Set up a relationship between Office and Telephone.

Office Office code Office address

Telephone Phone number has

is located at

Putting data in second normal form

♦ Remove data that does not depend on the whole key.

♦ Look only at entities and relationships whose identifier is composed of more than one attribute. To test for second normal form, remove any data that does not depend on the whole identifier. Each attribute should depend on all of the attributes that comprise the identifier.

In this example, the identifier of the Employee and Department entity is

composed of two attributes. Some of the data does not depend on both

identifier attributes; for example, the department name depends on only one

of those attributes, Department ID, and Employee first name depends only

on Employee ID.

(39)

Employee and Department Employee ID

Department ID Employee first name Employee last name Department name

Move the identifier Department ID, which the other employee data does not depend on, to a entity of its own called Department. Also move any attributes that depend on it. Create a relationship between Employee and Department.

Employee

Employee ID Employee first name Employee last name

Department

Department ID Department name contains

works in

Putting data in third normal form

♦ Remove data that doesn’t depend directly on the key.

♦ To test for third normal form, remove any attributes that depend on other attributes, rather than directly on the identifier.

In this example, the Employee and Office entity contains some attributes that depend on its identifier, Employee ID. However, attributes such as Office location and Office phone depend on another attribute, Office code. They do not depend directly on the identifier, Employee ID.

Employee and Office Employee ID

Employee first name Employee last name Office code Office location Office phone

(40)

Remove Office code and those attributes that depend on it. Make another entity called Office. Then, create a relationship that connects Employee with Office.

Employee Employee ID Employee first name Employee last name

Office Office code Office location Office phone houses

works out of

Step 4:Resolve the relationships

When you finish the normalization process, your design is almost complete.

All you need to do is to generate the physical data model that corresponds to your conceptual data model. This process is also known as resolving the relationships, because a large portion of the task involves converting the relationships in the conceptual model into the corresponding tables and foreign-key relationships.

Whereas the conceptual model is largely independent of implementation details, the physical data model is tightly bound to the table structure and options available in a particular database application. In this case, that application is Adaptive Server Anywhere.

Resolving relationships that do not carry data

In order to implement relationships that do not carry data, you define foreign keys. A foreign key is a column or set of columns that contains primary key values from another table. The foreign key allows you to access data from more than one table at one time.

A database design tool such as the DataArchitect component of Sybase PowerDesigner can generate the physical data model for you. However, if you’re doing it yourself there are some basic rules that help you decide where to put the keys.

♦

One to many

An one-to-many relationship always becomes an entity

and a foreign key relationship.

(41)

Employee Employee Number First name Last name Address

Department Department ID Department name contains

is a member of

Notice that entities become tables. Identifiers in entities become (at least part of) the primary key in a table. Attributes become columns. In a one-to-many relationship, the identifier in the one entity will appear as a new foreign key column in the many table.

Employee Employee ID <pk>

Department ID <fk>

First Name Last Name Address

Department Department ID <pk>

Department Name Department ID = Department ID

In this example, the Employee entity becomes an Employee table . Similarly, the Department entity becomes a Department table. A foreign key called Department ID appears in the Employee table.

♦

^{One to one}

In a one-to-one relationship, the foreign key can go into either table. If the relationship is mandatory on one side, but optional on the other, it should go on the optional side. In this example, put the foreign key (Vehicle ID) in the Truck table because a vehicle does not have to be a truck.

Vehicle

Vehicle ID Model Price

Truck

Weight rating is a type of

may be

The above entity-relationship model thus resolves the database base

structure, below.

(42)

Vehicle

Vehicle ID <pk>

Model Price

Truck

Vehicle ID <fk>

Weight rating Vehicle ID = Vehicle ID

♦

Many to many

In a many-to-many relationship, a new table is created with two foreign keys. This arrangement is necessary to make the database efficient.

Parts

Part Number

Description Warehouse

Warehouse ID Address

The new Storage Location table relates the Parts and Warehouse tables.

Parts Part Number <pk>

Description

Warehouse Warehouse ID <pk>

Address Storage Location

Part Number <pk,fk>

Warehouse ID <pk,fk>

Warehouse ID = Warehouse ID Part Number = Part Number

Resolving relationships that carry data

Some of your relationships may carry data. This situation often occurs in many-to-many relationships.

Parts Part Number Description

stored at

Inventory

Quantity

Warehouse Warehouse ID Address contains

If this is the case, each entity resolves to a table. Each role becomes a

foreign key that points to another table.

(43)

Inventory Warehouse ID <pk,fk>

Part Number <pk,fk>

Quantity

The Inventory entity borrows its identifiers from the Parts and Warehouse tables, because it depends on both of them. Once resolved, these borrowed identifiers form the primary key of the Inventory table.

Tip

A conceptual data model simplifies the design process because it hides a lot of details. For example, a many-to-many relationship always generates an extra table and two foreign key references. In a conceptual data model, you can usually denote all of this structure with a single connection.

Step 5:Verify the design

Before you implement your design, you need to make sure that it supports your needs. Examine the activities you identified at the start of the design process and make sure you can access all of the data that the activities require.

♦ Can you find a path to get the information you need?

♦ Does the design meet your needs?

♦ Is all of the required data available?

If you can answer yes to all the questions above, you are ready to implement your design.

Final design Applying steps 1 through 3 to the database for the little company produces

the following entity-relationship diagram. This database is now in third

normal form.

(44)

Skill ID Number Skill name Skill description

Office ID Number Office name Address

Department Department ID Department name is acquired by

is capable of

is headed by

manages

contains is a member of

manages reports to works out of

Expert In Skill Level Date Acquired

houses

The corresponding physical data model appears below.

Skill ID Number <pk>

Skill name Skill description

Employee Employee ID <pk>

ID Number <fk>

Emp_Employee ID <fk>

First name Last name Home address Office

ID Number <pk>

Office name Address

Department Department ID <pk>

Employee ID <fk>

Department name ID Number = ID Number

Employee ID = Employee ID

Employee ID = Employee ID ID Number = ID Number

Expert In ID Number <pk,fk>

Employee ID <pk,fk>

Skill Level Date Acquired

Department/Employee Department ID <pk,fk>

Employee ID <pk,fk>

Department ID = Department ID

Employee ID = Emp_Employee ID