• No results found

Does our SCD work?

In document DBA 3: Creating a Data Warehouse (Page 98-103)

At this po int, we need to test o ur sub jo b to see if dimCustomer is receiving data. We don't need to run our entire jo b tho ugh, because we're really o nly interested in dimCustomer. Fortunately TOS lets us disable sub jo bs.

To do that, right click o n the t MysqlInput co mpo nent fo r the dim Mo vie jo b, then cho o se De act ivat e curre nt sub jo b.

The sub jo b fo r dim Mo vie is no w greyed-o ut and disabled.

No w run yo ur jo b by clicking o n the . As lo ng as yo u've typed everything co rrectly, yo ur jo b will run and yo u'll see the fo llo wing o utput:

OBSERVE:

Starting job ProcessDataWarehouse at 21:08 30/06/2009.

Job ProcessDataWarehouse ended at 21:08 30/06/2009. [exit code=0]

The jo b ran, but did it po pulate the dimCustomer table? Switch to a terminal, and log into your personal database. Run the fo llo wing co mmand against yo ur perso nal database:

CODE TO TYPE:

mysql> SELECT count(*) FROM dimCustomer; If yo ur jo b ran successfully, yo u will see this:

OBSERVE:

mysql> SELECT count(*) FROM dimCustomer; +---+

| count(*) | +---+ | 589 | +---+

1 row in set (0.00 sec)

To be sure everything wo rked, take a lo o k at so me ro ws. Run the fo llo wing co mmand against yo ur perso nal database:

CODE TO TYPE:

mysql> SELECT * FROM dimCustomer LIMIT 0, 10;

OBSERVE:

mysql> SELECT * FROM dimCustomer -> LIMIT 0, 10;

+---+---+---+---+--- ---+---+---+---+--- ---+---+---+---+---+--- ---+---+---+---+

| customer_key | customer_id | first_name | last_name | email | address | address2 | district | city | country | postal_code | phone | active | create_date | start_date | end_date | run_id |

+---+---+---+---+--- ---+---+---+---+--- ---+---+---+---+---+--- ---+---+---+---+

| 1 | 218 | VERA | MCCOY | VERA.MCCOY@sakilacustome r.org | 1168 Najafabad Parkway | | KABOL | Kabul | Afghanistan | 40301 | 886649065861 | 1 | 2004-03-19 00:0 0:00 | 2009-06-30 | 2099-01-01 | 81 |

| 2 | 441 | MARIO | CHEATHAM | MARIO.CHEATHAM@sakilacus tomer.org | 1924 Shimonoseki Drive | | BATNA | Batna | Algeria | 52625 | 406784385440 | 1 | 2004-10-07 00:0 0:00 | 2009-06-30 | 2099-01-01 | 81 |

| 3 | 69 | JUDY | GRAY | JUDY.GRAY@sakilacustomer .org | 1031 Daugavpils Parkway | | BCHAR | Bchar | Algeria | 59025 | 107137400143 | 1 | 2004-02-25 00:0 0:00 | 2009-06-30 | 2099-01-01 | 81 |

| 4 | 176 | JUNE | CARROLL | JUNE.CARROLL@sakilacusto mer.org | 757 Rustenburg Avenue | | SKIKDA | Skikda | Algeria | 89668 | 506134035434 | 1 | 2004-08-11 00:0 0:00 | 2009-06-30 | 2099-01-01 | 81 |

| 5 | 320 | ANTHONY | SCHWAB | ANTHONY.SCHWAB@sakilacus tomer.org | 1892 Nabereznyje Telny Lane | | TUTUILA | Tafuna | American Samoa | 28396 | 478229987054 | 1 | 2004-07-20 00:0 0:00 | 2009-06-30 | 2099-01-01 | 81 |

| 6 | 528 | CLAUDE | HERZOG | CLAUDE.HERZOG@sakilacust omer.org | 486 Ondo Parkway | | BENGUELA | Benguela | Angola | 35202 | 105882218332 | 1 | 2004-01-24 00:0 0:00 | 2009-06-30 | 2099-01-01 | 81 |

| 7 | 383 | MARTIN | BALES | MARTIN.BALES@sakilacusto mer.org | 368 Hunuco Boulevard | | NAMIBE | Namibe | Angola | 17165 | 106439158941 | 1 | 2004-05-31 00:0 0:00 | 2009-06-30 | 2099-01-01 | 81 |

| 8 | 381 | BOBBY | BOUDREAU | BOBBY.BOUDREAU@sakilacus tomer.org | 1368 Maracabo Boulevard | | | South Hi ll | Anguilla | 32716 | 934352415130 | 1 | 2004-08-29 00:0 0:00 | 2009-06-30 | 2099-01-01 | 81 |

| 9 | 359 | WILLIE | MARKHAM | WILLIE.MARKHAM@sakilacus tomer.org | 1623 Kingstown Drive | | BUENOS AIRES | Almirant e Brown | Argentina | 91299 | 296394569728 | 1 | 2004-08-13 00:0 0:00 | 2009-06-30 | 2099-01-01 | 81 |

| 10 | 560 | JORDAN | ARCHULETA | JORDAN.ARCHULETA@sakilac ustomer.org | 1229 Varanasi (Benares) Manor | | BUENOS AIRES | Avellane da | Argentina | 40195 | 817740355461 | 1 | 2004-01-15 00:0 0:00 | 2009-06-30 | 2099-01-01 | 81 | +---+---+---+---+--- ---+---+---+---+--- ---+---+---+---+---+--- ---+---+---+---+

10 rows in set (0.01 sec)

If yo u scro ll to the right, yo u might no tice so mething a bit strange. Check o ut the create_date and start_date (co pied belo w):

OBSERVE:

+---+---+---+---+---+ | active | create_date | start_date | end_date | run_id | +---+---+---+---+---+ | 1 | 2004-03-19 00:00:00 | 2009-06-30 | 2099-01-01 | 81 | | 1 | 2004-10-07 00:00:00 | 2009-06-30 | 2099-01-01 | 81 | | 1 | 2004-02-25 00:00:00 | 2009-06-30 | 2099-01-01 | 81 | | 1 | 2004-08-11 00:00:00 | 2009-06-30 | 2099-01-01 | 81 | | 1 | 2004-07-20 00:00:00 | 2009-06-30 | 2099-01-01 | 81 | | 1 | 2004-01-24 00:00:00 | 2009-06-30 | 2099-01-01 | 81 | | 1 | 2004-05-31 00:00:00 | 2009-06-30 | 2099-01-01 | 81 | | 1 | 2004-08-29 00:00:00 | 2009-06-30 | 2099-01-01 | 81 | | 1 | 2004-08-13 00:00:00 | 2009-06-30 | 2099-01-01 | 81 | | 1 | 2004-01-15 00:00:00 | 2009-06-30 | 2099-01-01 | 81 | +---+---+---+---+---+

The custo mer's reco rd was created back o n March 19 t h, 20 0 4 , but the ro w in the dimensio n has a start_date o f to day (20 0 9 -0 6 -30).

The start_date and end_date columns on this Type 2 SCD indicate that "t his ro w is valid and co rrect be t we e n t he dat e s o f J une 30 t h 20 0 9 and J anuary 1st , 20 9 9."

The custo mer with customer_id=218 existed on January 1st , 2007..., but that's not what the row tells us no w.

Yo u may recall that earlier in the lesso n, we set the pro perties fo r the SCD co mpo nent. Specifically, we renamed the start co lumn to st art _dat e , and set its creatio n to J o b st art t im e .

The pro blem here has to do with histo rical data. The very first time we setup o ur dimensio n, the first valid date fo r the ro w must be start_date, not today's date. Graphically, the time line after the initial load looks like this:

Fo rtunately, o ur so urce system co ntains the date co mmand we need: create_date.

To execute this co mmand, we co uld just switch to SQL mo de and run an update statement. This isn't an ideal so lutio n tho ugh, because we might fo rget to include this step the next time we have to relo ad the entire dimensio n.

TOS has a co mpo nent that will let us execute a single query: t MysqlRo w. Drag that co mpo nent to yo ur canvas, and name the new sub jo b On Init ial Lo ad Only. Set its co nnectio n to the data wareho use co nnectio n fro m the repo sito ry, and set its co mmand to this (make sure the quo tatio n marks aro und the query are co rrect!):

CODE TO TYPE:

UPDATE dimCustomer SET start_date = create_date;

Next, right-click o n t MysqlSCD, and link the On Co m po ne nt OK trigger to t MysqlRo w:

After o ur initial lo ad, we'll disable this sub jo b.

So , what if yo ur data so urce do esn't have a valid start_date? In that case, you'll have to figure out the earliest date where valid data is present in yo ur dimensio n, perhaps J anuary 1st , 20 0 0 . Generally, yo u'll use J o b St art t im e as yo ur start_date, but after the first load of your dimension, you'll have to update the start_date o f every ro w in yo ur table to J anuary 1st , 20 0 0 . Here's an example o f a query to update a dimensio n witho ut valid start date:

OBSERVE:

UPDATE dimCustomer SET start_date='2000-01-01';

Picking the earliest valid date fo r yo ur dimensio n is similar to picking a high end_date of January 1st , 2099. Befo re we rerun o ur jo b, we sho uld co mpletely clear o ur dimensio n. To do this, we'll use the SQL keywo rd TRUNCATE. Run these co mmands against yo ur perso nal database:

CODE TO TYPE:

mysql> SET FOREIGN_KEY_CHECKS = 0; mysql> TRUNCATE TABLE dimCustomer; mysql> SET FOREIGN_KEY_CHECKS = 1;

Note

We can't truncate dimCusto mer witho ut setting FOREIGN_KEY_CHECKS to 0 because there areseveral fact tables that have fo reign key co nstraints o n a co lumn in dimCusto mer.

In document DBA 3: Creating a Data Warehouse (Page 98-103)