• No results found

Viewing Data

In document DBA 3: Creating a Data Warehouse (Page 153-160)

In the first lesso n, we discussed the go als o f a data wareho use. We want to create: a separate system that wo n't interrupt business critical o peratio nal systems. a single po int o f access fo r all analytical queries.

a unified, co nsistent view o f underlying data (even data fro m external systems).

a straightfo rward way to analyze trends (to see the way sales co mpare fro m mo nth to mo nth).

Our current structure o f dim e nsio ns and f act s is co nsistent and straightfo rward, but we can do better. Co lumns like run_id are no t impo rtant to end users, and may even be co nfusing to them. Tables like etlRuns do n't matter to anyo ne except database administrato rs, so they sho uld be hidden fro m end users.

Ease o f use aside, views also pro vide o ther benefits:

So me info rmatio n in the data wareho use may be very sensitive, so views can be used to pro vide ro w level security.

Views can be used to pro vide co nsistency to the data wareho use, especially as underlying tables gro w in co mplexity and undergo changes.

Co lumns can be renamed to make things easier to understand. Fo r o ur views, we will:

name them with a co mmo n prefix - Fact _ fo r fact tables, and Dim e nsio n_ fo r dimensio n tables. o mit surro gate keys fro m fact tables (such as sales_key).

o mit start_date and end_date from Type-2 slowly changing dimensions. o mit run_id from all tables.

keep fo reign key co lumns (the _key columns) unchanged. keep keys fro m so urce systems (like customer_id) unchanged.

rename fact and dimensio n co lumns to mo re readable equivalents (fo r example, customer_count would beco me Customer Count).

Let's get started and create a view fo r o ur factCustomerCount table. In this view we will omit the customerCount_key and run_id. Switch to MySQL mode, and run this command against your personal database::

CODE TO TYPE:

CREATE VIEW Fact_CustomerCount AS

SELECT date_key, customer_key, store_key, customer_count as `Customer Count` FROM factCustomerCount;

OBSERVE:

mysql> CREATE VIEW Fact_CustomerCount -> AS

-> SELECT date_key, customer_key, store_key, customer_count as "Customer Count" -> FROM factCustomerCount;

Query OK, 0 rows affected (0.06 sec) mysql>

It all lo o ks go o d so far. To test o ut this query, let's check o ut the first ten ro ws. Run this co mmand against yo ur perso nal database:

CODE TO TYPE:

SELECT * from Fact_CustomerCount LIMIT 0, 10;

This is easier fo r end users to understand: OBSERVE:

mysql> SELECT * from Fact_CustomerCount -> LIMIT 0, 10;

+---+---+---+---+

| date_key | customer_key | store_key | Customer Count | +---+---+---+---+ | 20040319 | 590 | 1 | 1 | | 20041007 | 591 | 1 | 1 | | 20040811 | 593 | 1 | 1 | | 20040124 | 595 | 1 | 1 | | 20040531 | 596 | 1 | 1 | | 20040115 | 599 | 1 | 1 | | 20040205 | 600 | 1 | 1 | | 20040921 | 602 | 1 | 1 | | 20040427 | 604 | 1 | 1 | | 20041114 | 605 | 1 | 1 | +---+---+---+---+ 10 rows in set (0.08 sec)

Next, let's create a view fo r factSales. In this view, we will omit the sales_key and run_id columns. Run this co mmand against yo ur perso nal database:

CODE TO TYPE:

CREATE VIEW Fact_Sales AS

SELECT date_key, customer_key, movie_key, store_key, staff_key, sales_amount as `Sales Amount`

FROM factSales;

If yo u typed everything co rrectly, yo u'll see this familiar MySQL result: Query OK, 0 rows affected (0.13 sec). With a co uple o f f act s o ut o f the way, let's turn o ur attentio n to dim e nsio ns. Start with dimCustomer. Run this co mmand against yo ur perso nal database:

CODE TO TYPE:

CREATE VIEW Dimension_Customer AS

SELECT customer_key, customer_id, first_name as `First Name`, last_name as `Last Name`, Email, Address, address2 as "Address 2",

District, City, Country, postal_code as `Postal Code`, Phone, Active, create_date as `Create Date`

FROM dimCustomer;

Great! Let's mo ve o n to dimDate. Run this command against your personal database: CODE TO TYPE:

CREATE VIEW Dimension_Date AS

SELECT date_key, Date, Year, Quarter, Month, month_name as `Month Name`, Day, day_name as `Day of Week`, week as `Week In Year`,

is_weekend as `Is Weekend`, is_holiday as `Is Holiday` FROM dimDate;

The next view we'll create is fo r dimMovie. Run this command against your personal database: CODE TO TYPE:

CREATE VIEW Dimension_Movie AS

SELECT movie_key, film_id, Title, Description, release_year as `Release Year`, Language, original_language as `Original Language`,

rental_duration as `Rental Duration`, Length, Rating, special_features as `Special Features` FROM dimMovie;

The last view we'll create fo r no w is fo r the dimStore table. Run this command against your personal database: CODE TO TYPE:

CREATE VIEW Dimension_Store AS

SELECT store_key, store_id, Address, address2 as `Address 2`, District, City, Country, postal_code as `Postal Code`, Region,

manager_first_name as `Manger First Name`, manager_last_name as `Manager Last Name` FROM dimStore;

Great ! We're ready to get started with o ur queries!

Answering Questions

We'll use a standard template fo r querying the data wareho use. In the text belo w, blue signifies f act s, and re d

signifies dim e nsio ns. It isn't necessary to use all parts o f o ur template, especially if we aren't interested in limiting o ur query using a WHERE clause.

OBSERVE:

SELECT columns from dimension tables, SUM( fact columns )

FROM fact view

INNER JOIN dimension view 1 on (fact column = dimension column ) INNER JOIN dimension view 2 on (fact column = dimension column ) .

.

WHERE Limits to dimensions

limits to facts

GROUP BY dimension columns

ORDER BY dimension columns, fact columns LIMIT 0, 5 (optional "top 5" results)

In lesso n two we discussed questio ns that wo uld be po sed by management. We rewro te these questio ns so that they were in the fo rmat o f f act s and dim e nsio ns. No w let's try to answer so me o f them!

First up: Ho w m any ne w cust o m e rs did we add by quart e r?

To answer this questio n, we'll need data fro m the Fact _Cust o m e rCo unt and Dim e nsio n_Dat e tables. Let's write a query using the template we already created. Altho ugh o ur questio n do es no t specify a particular so rting o rder, we'll so rt by Quart e r. Run this co mmand against yo ur perso nal database:

CODE TO TYPE: SELECT dd.Quarter,

SUM( `Customer Count` ) as `Customer Count` FROM Fact_CustomerCount fc

INNER JOIN Dimension_Date dd on (fc.date_key = dd.date_key) GROUP BY dd.Quarter

ORDER BY dd.Quarter; MySQL will reply with yo ur answer:

OBSERVE:

mysql> SELECT dd.Quarter,

-> SUM( `Customer Count` ) as `Customer Count` -> FROM Fact_CustomerCount fc

-> INNER JOIN Dimension_Date dd on (fc.date_key = dd.date_key) -> GROUP BY dd.Quarter

-> ORDER BY dd.Quarter;

+---+---+

| Quarter | Customer Count | +---+---+ | Q1 | 158 | | Q2 | 140 | | Q3 | 143 | | Q4 | 148 | +---+---+ 4 rows in set (0.06 sec) mysql>

That's pretty slick! We didn't even have to figure o ut the specific quarter each custo mer registered.

Note

If yo u see an erro r that lo o ks like this: ERROR 130 5 (4 20 0 0 ): FUNCT ION ce rt jo sh.SUM do e s no t e xist , make sure yo u do no t have any spaces between SUM and (. SUM(column). This works in MySQL, but SUM (column) will return an error.

No w suppo se we want to extend this query, so it answers these questio ns: Ho w m any ne w cust o m e rs did we add

perso nal database: CODE TO TYPE:

SELECT dd.Quarter, dd.`Month Name`,

SUM( `Customer Count` ) as `Customer Count` FROM Fact_CustomerCount fc

INNER JOIN Dimension_Date dd on (fc.date_key = dd.date_key) GROUP BY dd.Quarter, dd.`Month Name`

ORDER BY dd.Quarter, dd.`Month Name`;

It lo o ks like the database gave us exactly what we asked fo r, even if it isn't exactly what we wanted: OBSERVE:

mysql> SELECT dd.Quarter, dd.`Month Name`,

-> SUM( `Customer Count` ) as `Customer Count` -> FROM Fact_CustomerCount fc

-> INNER JOIN Dimension_Date dd on (fc.date_key = dd.date_key) -> GROUP BY dd.Quarter, dd.`Month Name`

-> ORDER BY dd.Quarter, dd.`Month Name`;

+---+---+---+

| Quarter | Month Name | Customer Count | +---+---+---+ | Q1 | February | 47 | | Q1 | January | 54 | | Q1 | March | 57 | | Q2 | April | 44 | | Q2 | June | 51 | | Q2 | May | 45 | | Q3 | August | 51 | | Q3 | July | 46 | | Q3 | September | 46 | | Q4 | December | 49 | | Q4 | November | 51 | | Q4 | October | 48 | +---+---+---+ 12 rows in set (0.06 sec)

Let's try o rdering the mo nths by number instead o f by name, so January wo uld o ccur befo re February in o ur results. Run this co mmand against yo ur perso nal database:

CODE TO TYPE:

SELECT dd.Quarter, dd.`Month Name`,

SUM( `Customer Count` ) as `Customer Count` FROM Fact_CustomerCount fc

INNER JOIN Dimension_Date dd on (fc.date_key = dd.date_key) GROUP BY dd.Quarter, dd.`Month Name`

ORDER BY dd.Quarter, dd.`Month`; No w that's mo re like it!

OBSERVE:

mysql> SELECT dd.Quarter, dd.`Month Name`,

-> SUM( `Customer Count` ) as `Customer Count` -> FROM Fact_CustomerCount fc

-> INNER JOIN Dimension_Date dd on (fc.date_key = dd.date_key) -> GROUP BY dd.Quarter, dd.`Month Name`

-> ORDER BY dd.Quarter, dd.`Month`;

+---+---+---+

| Quarter | Month Name | Customer Count | +---+---+---+ | Q1 | January | 54 | | Q1 | February | 47 | | Q1 | March | 57 | | Q2 | April | 44 | | Q2 | May | 45 | | Q2 | June | 51 | | Q3 | July | 46 | | Q3 | August | 51 | | Q3 | September | 46 | | Q4 | October | 48 | | Q4 | November | 51 | | Q4 | December | 49 | +---+---+---+

Okay, let's mo ve o n to a new questio n: What was t he am o unt o f sale s re ve nue we e arne d, by st o re and by by m o nt h? To answer these questio ns, we'll need to use the Fact _Sale s, Dim e nsio n_St o re, and Dim e nsio n_Dat e

views. This time we will try an alternate ORDER BY syntax; we'll specify the co lumns by position instead o f by name. Fo r this query, 1 is the first co lumn, ds.Addre ss, and 2 is the seco nd co lumn, ds.`Mo nt h Nam e `. Run this co mmand against yo ur perso nal database:

CODE TO TYPE:

SELECT ds.Address, dd.`Month Name`, SUM( `Sales Amount` ) as `Sales Amount` FROM Fact_Sales fs

INNER JOIN Dimension_Store ds on (fs.store_key = ds.store_key) INNER JOIN Dimension_Date dd on (fs.date_key = dd.date_key ) GROUP BY 1, 2

ORDER BY ds.Address, dd.`Month`;

OBSERVE:

mysql> SELECT ds.Address, dd.`Month Name`, -> SUM( `Sales Amount` ) as `Sales Amount` -> FROM Fact_Sales fs

-> INNER JOIN Dimension_Store ds on (fs.store_key = ds.store_key) -> INNER JOIN Dimension_Date dd on (fs.date_key = dd.date_key ) -> GROUP BY ds.Address, dd.`Month Name`

-> ORDER BY 1, 2;

+---+---+---+ | Address | Month Name | Sales Amount | +---+---+---+ | 28 MySQL Boulevard | February | 270.09 | | 28 MySQL Boulevard | May | 2328.30 | | 28 MySQL Boulevard | June | 4829.30 | | 28 MySQL Boulevard | July | 13873.70 | | 28 MySQL Boulevard | August | 11910.70 | | 47 MySakila Drive | January | 999.99 | | 47 MySakila Drive | February | 238.11 | | 47 MySakila Drive | May | 2418.35 | | 47 MySakila Drive | June | 4640.01 | | 47 MySakila Drive | July | 14020.33 | | 47 MySakila Drive | August | 11740.45 | +---+---+---+ 11 rows in set (0.70 sec)

Note

The sakila database is a rando m set o f data. That's the reaso n there were no sales fo r "47 MySakilaDrive" in March. No w suppo se we want to find o ut the t o p f ive sale s, by sto re and by mo nth. Let's give it a try! Run this co mmand against yo ur perso nal database:

CODE TO TYPE:

SELECT ds.Address, dd.`Month Name`, SUM( `Sales Amount` ) as `Sales Amount` FROM Fact_Sales fs

INNER JOIN Dimension_Store ds on (fs.store_key = ds.store_key) INNER JOIN Dimension_Date dd on (fs.date_key = dd.date_key ) GROUP BY ds.Address, dd.`Month Name`

ORDER BY 1, 2 LIMIT 0, 5;

OBSERVE:

mysql> SELECT ds.Address, dd.`Month Name`, -> SUM( `Sales Amount` ) as `Sales Amount` -> FROM Fact_Sales fs

-> INNER JOIN Dimension_Store ds on (fs.store_key = ds.store_key) -> INNER JOIN Dimension_Date dd on (fs.date_key = dd.date_key ) -> GROUP BY ds.Address, dd.`Month Name`

-> ORDER BY 1, 2 -> LIMIT 0, 5;

+---+---+---+ | Address | Month Name | Sales Amount | +---+---+---+ | 28 MySQL Boulevard | August | 11910.70 | | 28 MySQL Boulevard | February | 270.09 | | 28 MySQL Boulevard | July | 13873.70 | | 28 MySQL Boulevard | June | 4829.30 | | 28 MySQL Boulevard | May | 2328.30 | +---+---+---+ 5 rows in set (0.39 sec)

We retrieved the to p five results, but we didn't o rder by Sale s Am o unt , and then in descending o rder fro m there. Run this co mmand against yo ur perso nal database:

CODE TO TYPE:

SELECT ds.Address, dd.`Month Name`, SUM( `Sales Amount` ) as `Sales Amount` FROM Fact_Sales fs

INNER JOIN Dimension_Store ds on (fs.store_key = ds.store_key) INNER JOIN Dimension_Date dd on (fs.date_key = dd.date_key ) GROUP BY ds.Address, dd.`Month Name`

ORDER BY 3 DESC, 1, 2 LIMIT 0, 5;

The results lo o k much better no w: OBSERVE:

mysql> SELECT ds.Address, dd.`Month Name`, -> SUM( `Sales Amount` ) as `Sales Amount` -> FROM Fact_Sales fs

-> INNER JOIN Dimension_Store ds on (fs.store_key = ds.store_key) -> INNER JOIN Dimension_Date dd on (fs.date_key = dd.date_key ) -> GROUP BY ds.Address, dd.`Month Name`

-> ORDER BY 3 DESC, 1, 2 -> LIMIT 0, 5;

+---+---+---+ | Address | Month Name | Sales Amount | +---+---+---+ | 47 MySakila Drive | July | 14020.33 | | 28 MySQL Boulevard | July | 13873.70 | | 28 MySQL Boulevard | August | 11910.70 | | 47 MySakila Drive | August | 11740.45 | | 28 MySQL Boulevard | June | 4829.30 | +---+---+---+ 5 rows in set (0.38 sec)

In document DBA 3: Creating a Data Warehouse (Page 153-160)