Without rewriting the book on database refactoring, perhaps it would be best to explain the concept of refactoring with an example. Suppose our FlixBuster data warehouse has been in production for some time and the data model includes a conformed product dimension that is keyed on prod- uct ID and is used by several fact tables. However, in the current develop- ment cycle we are integrating a new syndicated data source that contains rich information about product purchases but is at the product subcategory grain rather than the more detailed individual product grain. Because we will be building a new fact table that uses the grain of product subcategory, we decide to split the product dimension into two separate dimensions, Dim_ Product and Dim_ProductSubcategory. The new dimension can be used as a star schema for detailed product facts or linked directly to the more coarse- grained subcategory facts.
Table 6.1 Database Refactoring Categories Database Refactoring
Category Description Example(s)
Structural A change to the definition of one or
more tables or views
Moving a column from one table to another, or splitting a multipurpose column into separate columns
Data quality A change that improves the quality
of information
Making a column non-nullable or applying a common format to a column
Referential integrity A change that ensures that a
referenced row exists within another table and/or that a row that is no longer needed is removed appropriately
Adding a trigger to enable a cascad- ing delete between two entities, code that was formerly implemented outside the database
Architectural A change that improves the overall
manner in which external programs interact with a database
Replacing an existing Java database operation in a code library with a stored procedure in the database to make it available to non-Java applications
Method A change to a stored procedure, ETL
object, stored function, or trigger that improves its quality
Renaming a stored procedure to make it easier to understand
Non-refactoring transformation
A change to a database schema that changes its semantics
Adding a new column to an existing table
ptg6843605
Because the data warehouse has been in production for some time, and
Dim_Product contains historical data, we’ll need to be disciplined in how we split this dimension table to ensure that all existing BI applications continue working properly. That means that we’ll need to introduce the new dimen- sion table but establish a transition period during which the original table continues to include subcategory information. This transition period will allow us to carefully refactor the BI applications to use the new subcategory dimension wherever appropriate. We’ll use the Split Table refactoring for this task (Ambler and Sadalage 2006).
During the transition period we will have some intentional data duplica- tion. To avoid possible inconsistencies between these dimensions we will outfit each table with a trigger to synchronize data across tables in the event of an insertion, deletion, or update. We must create these triggers so that cycles between the two do not occur.
While we don’t want the transition period to linger too long, we also don’t want to deprecate the old schema too early and risk breaking any BI applica- tions. After careful consideration we’ve determined that February 1, 2013, is an appropriate date by which to complete the schema transition. To imple- ment this we’ll schedule the necessary ALTER TABLE statements to run on that date. This step in the refactoring automates the necessary housekeeping that we might otherwise forget to do. Figure 6.2 depicts the schema before, dur- ing, and after our transition period.
Dim_Product Dim_Product 1..* 1 1..* 1 Dim_ProductSubcategory Dim_ProductSubcategory Dim_Product -productID <<PK>> -productCategory Original Schema Transition Period Resulting Schema -productSubcategory -productName -productFeature -productBasePrice -productID <<PK>> -productID <<PK>>
TIME -subcategoryID <<PK>>-subcategoryName
-subcategoryCategory -subcategoryID <<PK>> -subcategoryName -subcategoryCategory SynchronizeWithProduct -productCategory -productSubcategory -productName -productName -productFeature -productFeature -productBasePrice -productBasePrice SynchronizeWithSubcategory
{ event = update | delete | insert, drop date = February 1, 2013 }
{ event = update | delete | insert, drop date = February 1, 2013 }
ptg6843605
REFACTORING 161
The code for this refactoring looks like this:
CREATE TABLE Dim_ProductSubcategory (
subcategoryID VARCHAR(15)NOT NULL, subcategoryName VARCHAR(20)NOT NULL, subcategoryCategory VARCHAR(15)NOT NULL, CONSTRAINT PKSubcategoryID
PRIMARY KEY (subcategoryID) );
-Trigger to keep all split tables in sync
CREATE OR REPLACE TRIGGER SynchronizeWithSubcategory BEFORE INSERT OR UPDATE
ON Dim_Product
REFERENCING OLD AS OLD NEW AS NEW FOR EACH ROW
DECLARE BEGIN IF updating THEN FindOrCreateSubcategory; END IF; IF inserting THEN CreateSubcategory; END IF; END; /
CREATE OR REPLACE TRIGGER SynchronizeWithProduct BEFORE UPDATE OF productSubcategory
ON Dim_ProductSubcategory
REFERENCING OLD AS OLD NEW AS NEW FOR EACH ROW
DECLARE BEGIN IF updating THEN FindAndUpdateAllProductsForSubcategory END IF; END; / -On February 1, 2013
ALTER TABLE Dim_Product DROP COLUMN productCategory; ALTER TABLE Dim_Product DROP COLUMN productSubcategory; DROP TRIGGER SynchronizeWithSubcategory
DROP TRIGGER SynchronizeWithProduct
But we aren’t going to write all that code at one time. Before we even start this refactoring, we’ll review the existing test suite to be sure the original schema is well covered by automated tests. We’ll take the time to add any new tests we wish we had, and we’ll ensure that all the tests are passing.
ptg6843605
Then, before we begin writing the code, we’ll write new tests to test the code we’re about to write.
For example, we’ll start by writing some structural tests to verify that the
Dim_ProductSubcategory table exists and contains the expected columns and constraints. These tests will fail until we write the CREATE TABLE query correctly. As we proceed in tiny steps, writing tests along the way, we’ll also continue to rerun the old tests to be sure we haven’t broken anything. By the time we’re finished, we will have a new table, new triggers, and a suite of new tests to validate our work.
Structural database refactorings typically follow a pattern that includes a transition period between the old schema and the new one. These buffers coupled with automated tests are essential ingredients of refactoring safety. Additionally, all of the database code and related artifacts should be man- aged in a version control system, making it easy to roll back to an earlier version if things go horribly wrong. We’ll discuss that in detail in Chapter 8, “Version Control for Data Warehousing.”
Many other refactoring situations and corresponding database refactorings are presented in Refactoring Databases (Ambler and Sadalage 2006), and future books may introduce even more. However, this example provides a glimpse of what database refactoring looks like and should convey the idea that refactoring is more than simply restructuring or improving database elements.