Schema Changes in RDBMS - Schema Migrations 12.1 Schema Changes

Chapter 12. Schema Migrations 12.1 Schema Changes

12.2. Schema Changes in RDBMS

While developing with standard RDBMS technologies, we develop objects, their corresponding tables, and their relationships. Consider a simple object model and data model that has Customer, Order, and OrderItems. The ER model would look like Figure 12.1.

Figure 12.1. Data model of an e-commerce system

While this data model supports the current object model, life is good. The first time there is a change in the object model, such as introducing preferredShippingType on the Customer object, we have to change the object and change the database table, because without changing the table the application will be out of sync with the database. When we get errors like ORA-00942: table or view does not exist or ORA-00904: "PREFERRED_SHIPPING_TYPE": invalid identifier, we know we have this problem.

Typically, a database schema migration has been a project in itself. For deployment of the schema changes, database change scripts are developed, using diff techniques, for all the changes in the development database. This approach of creating migration scripts during the deployment/release time is error-prone and does not support agile development methods.

12.2.1. Migrations for Green Field Projects

Scripting the database schema changes during development is better, since we can store these schema changes along with the data migration scripts in the same script file. These script files should be

named with incrementing sequential numbers which reflect the database versions; for example, the first change to the database could have script file named as 001_Description_Of_Change.sql. Scripting changes this way allows for the database migrations to be run preserving the order of changes. Shown in Figure 12.2 is a folder of all the changes done to a database so far.

Figure 12.2. Sequence of migrations applied to a database

Now, suppose we need to change the OrderItem table to store the DiscountedPrice and the FullPrice of the item. This will need a change to the OrderItem table and will be change number 007 in our sequence of changes, as shown in Figure 12.3.

Figure 12.3. New change 007_DiscountedPrice.sql applied to the database We applied a new change to the database. This change’s script has the code for adding a new column, renaming the existing column, and migrating the data needed to make the new feature work. Shown below is the script contained in the change 007_DiscountedPrice.sql:

Click here to view code image

ALTER TABLE orderitem ADD discountedprice NUMBER(18,2) NULL; UPDATE orderitem SET discountedprice = price;

ALTER TABLE orderitem MODIFY discountedprice NOT NULL; ALTER TABLE orderitem RENAME COLUMN price TO fullprice; --//@UNDO

ALTER TABLE orderitem RENAME fullprice TO price; ALTER TABLE orderitem DROP COLUMN discountedprice;

The change script shows the schema changes to the database as well as the data migrations needed to be done. In the example shown, we are using DBDeploy [DBDeploy] as the framework to manage the changes to the database. DBDeploy maintains a table in the database, named ChangeLog, where all the changes made to the database are stored. In this table, Change_Number is what tells everyone which changes have been applied to the database. This Change_Number, which is the database version, is then used to find the corresponding numbered script in the folder and apply the changes

which have not been applied yet. When we write a script with the change number 007 and apply it to the database using DBDeploy, DBDeploy will check the ChangeLog and pick up all the scripts from the folder that have not yet been applied. Figure 12.4 is the screenshot of DBDeploy applying the change to the database.

Figure 12.4. DBDeploy upgrading the database with change number 007

The best way to integrate with the rest of the developers is to use your project’s version control repository to store all these change scripts, so that you can keep track of the version of the software and the database in the same place, eliminating possible mismatches between the database and the application. There are many other tools for such upgrades, including Liquibase [Liquibase], MyBatis Migrator [MyBatis Migrator], DBMaintain [DBMaintain].

12.2.2. Migrations in Legacy Projects

Not every project is a green field. How to implement migrations when an existing application is in production? We found that taking an existing database and extracting its structure into scripts, along with all the database code and any reference data, works as a baseline for the project. This baseline should not contain transactional data. Once the baseline is ready, further changes can be done using the migrations technique described above (Figure 12.5).

Figure 12.5. Use of baseline scripts with a legacy database

One of the main aspects of migrations should be maintaining backward compatibility of the database schema. In many enterprises there are multiple applications using the database; when we change the database for one application, this change should not break other applications. We can achieve backward compatibility by maintaining a transition phase for the change, as described in detail in Refactoring Databases [Ambler and Sadalage].

During a transition phase, the old schema and the new schema are maintained in parallel and are available for all the applications using the database. For this, we have to introduce scaffolding code, such as triggers, views, and virtual columns ensuring other applications can access the database schema and the data they require without any code changes.

Click here to view code image

ALTER TABLE customer ADD fullname VARCHAR2(60); UPDATE customer SET fullname = fname;

CREATE OR REPLACE TRIGGER SyncCustomerFullName BEFORE INSERT OR UPDATE

ON customer

REFERENCING OLD AS OLD NEW AS NEW FOR EACH ROW

BEGIN

IF :NEW.fname IS NULL THEN :NEW.fname := :NEW.fullname; END IF;

IF :NEW.fullname IS NULL THEN :NEW.fullname := :NEW.fname END IF;

END; /

--Drop Trigger and fname

--when all applications start using customer.fullname

In the example, we are trying to rename the customer.fname column to customer.fullname as we want to avoid any ambiguity of fname meaning either fullname or firstname. A direct rename of the fname column and changing the application code we are responsible for may just work, for our application—but will not for the other applications in the enterprise that are accessing the same database.

Using the transition phase technique, we introduce the new column fullname, copy the data over to fullname, but leave the old column fname around. We also introduce a BEFORE UPDATE trigger to synchronize data between the columns before they are committed to the database.

Now, when applications read data from the table, they will read either from fname or from

fullname but will always get the right data. We can drop the trigger and the fname column once all the applications have moved on to using the new fullname column.

It’s very hard to do schema migrations on large datasets in RDBMS, especially if we have to keep the database available to the applications, as large data movements and structural changes usually create locks on the database tables.

In document NoSQL Distilled.pdf (Page 114-118)