Creating and Managing Tables

=> SHOW search_path;

Dropping a Schema

Use the DROP SCHEMAcommand to drop (delete) a schema. For example:

=> DROP SCHEMA myschema;

By default, the schema must be empty before you can drop it. To drop a schema and all of its objects (tables, data, functions, and so on) use:

=> DROP SCHEMA myschema CASCADE;

System Schemas

The following system-level schemas exist in every database:

• pg_catalog contains the system catalog tables, built-in data types, functions, and operators. It is always part of the schema search path, even if it is not explicitly named in the search path.

• information_schema consists of a standardized set of views that contain information about the objects in the database. These views get system information from the system catalog tables in a standardized way.

• pg_toast stores large objects such as records that exceed the page size. This schema is used internally by the Greenplum Database system.

• pg_bitmapindex stores bitmap index objects such as lists of values. This schema is used internally by the Greenplum Database system.

• pg_aoseg stores append-optimized table objects. This schema is used internally by the Greenplum Database system.

• gp_toolkit is an administrative schema that contains external tables, views, and functions that you can access with SQL commands. All database users can access gp_toolkit to view and query the system log files and other system metrics.

Greenplum Database tables are similar to tables in any relational database, except that table rows are distributed across the different segments in the system. When you create a table, you specify the table’s distribution policy.

Creating a Table

The CREATE TABLE command creates a table and defines its structure. When you create a table, you define:

• The columns of the table and their associated data types. See “Choosing Column Data Types” on page 131.

• Any table or column constraints to limit the data that a column or table can contain. See “Setting Table and Column Constraints” on page 131.

Creating and Managing Tables 131

• The distribution policy of the table, which determines how Greenplum divides data is across the segments. See “Choosing the Table Distribution Policy” on page 132.

• The way the table is stored on disk. See “Choosing the Table Storage Model” on page 133.

• The table partitioning strategy for large tables. See “Partitioning Large Tables” on page 145.

Choosing Column Data Types

The data type of a column determines the types of data values the column can contain.

Choose the data type that uses the least possible space but can still accommodate your data and that best constrains the data. For example, use character data types for strings, date or timestamp data types for dates, and numeric data types for numbers.

There are no performance differences among the character data types CHAR, VARCHAR, and TEXT apart from the increased storage size when you use the blank-padded type.

In most situations, use TEXT or VARCHAR rather than CHAR.

Use the smallest numeric data type that will accomodate your numeric data and allow for future expansion. For example, using BIGINT for data that fits in INT or SMALLINT wastes storage space. If you expect that your data values will expand over time, consider that changing from a smaller datatype to a larger datatype after loading large amounts of data is costly. For example, if your current data values fit in a SMALLINT but it is likely that the values will expand, INT is the better long-term choice.

Use the same data types for columns that you plan to use in cross-table joins.

Cross-table joins usually use the primary key in one table and a foreign key in the other table. When the data types are different, the database must convert one of them so that the data values can be compared correctly, which adds unnecessary overhead.

Greenplum Database has a rich set of native data types available to users. See the Greenplum Database Reference Guide for information about the built-in data types.

Setting Table and Column Constraints

You can define constraints on columns and tables to restrict the data in your tables.

Greenplum Database support for constraints is the same as PostgreSQL with some limitations, including:

• CHECK constraints can refer only to the table on which they are defined.

• UNIQUE and PRIMARY KEY constraints must be compatible with their tableʼs distribution key and partitioning key, if any.

• FOREIGN KEY constraints are allowed, but not enforced.

• Constraints that you define on partitioned tables apply to the partitioned table as a whole. You cannot define constraints on the individual parts of the table.

Check Constraints

Check constraints allow you to specify that the value in a certain column must satisfy a Boolean (truth-value) expression. For example, to require positive product prices:

=> CREATE TABLE products

( product_no integer,

Creating and Managing Tables 132 name text,

price numeric CHECK (price > 0) );

Not-Null Constraints

Not-null constraints specify that a column must not assume the null value. A not-null constraint is always written as a column constraint. For example:

=> CREATE TABLE products

( product_no integer NOT NULL, name text NOT NULL,

price numeric );

Unique Constraints

Unique constraints ensure that the data contained in a column or a group of columns is unique with respect to all the rows in the table. The table must be hash-distributed (not DISTRIBUTED RANDOMLY), and the constraint columns must be the same as (or a superset of) the table’s distribution key columns. For example:

=> CREATE TABLE products

( product_no integer UNIQUE, name text,

price numeric)

DISTRIBUTED BY (product_no);

Primary Keys

A primary key constraint is a combination of a UNIQUE constraint and a NOT NULL constraint. The table must be hash-distributed (not DISTRIBUTED RANDOMLY), and the primary key columns must be the same as (or a superset of) the table’s distribution key columns. If a table has a primary key, this column (or group of columns) is chosen as the distribution key for the table by default. For example:

=> CREATE TABLE products

( product_no integer PRIMARY KEY, name text,

price numeric)

DISTRIBUTED BY (product_no);

Foreign Keys

Foreign keys are not supported. You can declare them, but referential integrity is not enforced.

Foreign key constraints specify that the values in a column or a group of columns must match the values appearing in some row of another table to maintain referential integrity between two related tables. Referential integrity checks cannot be enforced between the distributed table segments of a Greenplum database.

Choosing the Table Distribution Policy

All Greenplum Database tables are distributed. When you create or alter a table, you optionally specify DISTRIBUTED BY (hash distribution) or DISTRIBUTED RANDOMLY (round-robin distribution) to determine the table row distribution.

Consider the following points when deciding on a table distribution policy.

In document PRODUCT DOCUMENTATION. Greenplum Database. Version 4.3. Administrator Guide. Rev: A Pivotal Software, Inc. (Page 139-142)