• No results found

Question type A: Average time stayed or took

In question type A, aggregate calculation (‘AVG’) was used to find average num- ber of days. Further calculation details are different for each vendor; however, the following queries are written in HiveQL syntax.

A.1: What is the average number of days that product type ‘A’ stays at the warehouse?

This question specifies two filter constraints. First, the product type has to be ‘A’. Second, products stayed at a warehouse. In order to filter for product type ‘A’,

the Product type table is filtered using the Category column. To filter products that stayed at a warehouse, the Location table is filtered using the Type field.

Here are the filters:

• s.out timestamp IS NOT NULL is done because I only want to consider a Stay record that is complete. A Stay record is complete when it has both a Time in and Time out timestamp. If s.out timestamp is NULL, then the product is still at that current location.

• l.type = ‘Warehouse’ is done because I am only interested in locations that are warehouses. In real life, there would be many different types of locations inside a warehouse (i.e., warehouse inspection, warehouse shelf, etc.) and those would all have to be combined to find the total time a product spent at a warehouse. For example, if a product spent 1 minute being unloaded, 5 minutes being inspected, 10 minutes being transported to the shelf, 2 days on the shelf, 10 minutes being transported to the shipping area, etc., then all of those would have to be combined to get the total time at the warehouse.

• pt.category = ‘A’ filters just for products that are in category ‘A’. Also, instead of filtering with the Category column, I can use UPC column to identify a specific product type as well.

Query 7.1 A.1 Query using Normalized Schema

1 SELECT AVG( (CAST( s . t i m e o u t AS INT ) − CAST( s . t i m e i n AS INT ) )

/ ( 2 4∗60∗60) ) AS AvgStayDuration

2 FROM v2 . p r o d u c t t y p e s pt

3 JOIN v2 . p r o d u c t s p ON ( pt . i d = p . p r o d u c t t y p e i d ) 4 JOIN v2 . s t a y s s ON ( s . p r o d u c t i d = p . i d )

5 JOIN v2 . l o c a t i o n s l on ( s . l o c a t i o n i d = l . i d ) 6 WHERE s . o u t t i m e s t a m p I S NOT NULL AND

7 pt . c a t e g o r y = 'A' AND 8 l . t y p e = ' Warehouse ' ;

Query 7.2 A.1 Query using De-Normalized Schema

1 SELECT AVG( (CAST( s . o u t t i m e s t a m p a s i n t ) − CAST( s . i n t i m e s t a m p a s

i n t ) ) / ( 2 4∗60∗60) ) AS AvgStayDuration

2 FROM v3 . s t a y s s

3 WHERE s . l o c a t i o n t y p e = ' Warehouse ' AND 4 s . o u t t i m e s t a m p I S NOT NULL AND

5 s . p r o d u c t t y p e c a t e g o r y = ' Category 1 ' ;

A.2: What is the average time to be sold for products of type ’A’ that were shipped from Warehouse 1 in January 2015?

The answer can be calculated by finding the intersection of the list of all product type ‘A’ that were at Warehouse A and the list of all product type ‘A’ that were sold during January 2015.

This question has some major criteria: • Product must be type ‘A’

• Product must have been at Warehouse 1

• Product must have shipped from Warehouse 1 during January of 2015 • Product must be sold

To solve this question, I need to find two Stay records for each product type ‘A’: • A Stay record that shows the product type ‘A’ was shipped from Warehouse 1. I need the time out from the Stay record, which is the time that the product type ‘A’ left the warehouse. The time out (ship date) must be during January 2015. In normalized schema, I implemented Date dim table to help this process. By using the Date dim table, we can easily filter by a year, a month, and a day. • A Stay record that shows the product type ‘A’ was sold. I assumed that any RFID readings at a checkout meant that the product was sold. I needed the time in from this Stay record.

I took the difference in the time out from the warehouse record and the time in from the checkout record, then I had the amount of time it took to sell product type ‘A’.

The upper part of the query found the warehouse record. I combined the base tables to be able to relate the products with the given Stay record.

Here are the filters:

• pt.category = ‘A’ filters product results to only include ‘A’.

• l.name = ‘Warehouse 1’ filters stay results to only include products that were at Warehouse 1.

• d.year = ‘2015’ AND d.month =‘1’ filters stay results to only include those that ended during January of 2015. Any stay record that has a time out during that timespan would indicate that the product type ‘A’ was shipped during January of 2015 since a stay ends when the product leaves that location.

The lower part of the query found the store checkout record. I combined the base tables to be able to relate the products with the given Stay record.

Here are the filters:

• l.type = ‘Store’ filters stay results to only include products that were at the store checkout.

• s.time out IS NOT NULL is done because I only wanted to consider a Stay record that was complete, which means sold.

The upper part of the query generates the list of product IDs with the time they shipped from ‘Warehouse 1’ during January 2015. The lower part of the query generates the list of product IDs with the time they were sold. By joining on those two lists, I can calculate the average number of days it took for product type ‘A’ to be shipped from Warehouse 1 in January 2015 to be sold.

Query 7.3 A.2 Query using Normalized Schema 1 WITH q1 AS ( 2 SELECT p . i d a s p i d 3 FROM v2 . p r o d u c t t y p e s pt 4 JOIN v2 . p r o d u c t s p ON ( pt . i d = p . p r o d u c t t y p e i d ) 5 JOIN v2 . s t a y s s ON ( p . i d = s . p r o d u c t i d ) 6 JOIN v2 . l o c a t i o n s l ON ( s . l o c a t i o n i d = l . i d ) 7 JOIN v2 . d a t e d i m d ON ( s . t i m e o u t d i m i d = d . i d ) 8 WHERE d . y e a r = '2015 ' 9 AND d . month = '1 ' 10 AND pt . c a t e g o r y = 'A' 11 AND l . name = ' Warehouse 1 '

12 )

13 SELECT AVG( (CAST( s . t i m e o u t a s i n t ) − CAST( s . t i m e i n a s i n t ) )

/ ( 2 4∗60∗60) ) AS AvgStayDuration

14 FROM v2 . l o c a t i o n s l

15 JOIN v2 . s t a y s s ON ( l . i d = s . l o c a t i o n i d ) 16 JOIN q1 ON ( s . p r o d u c t i d = q1 . p i d )

17 WHERE l . t y p e = ' Store ' 18 AND s . t i m e o u t I S NOT NULL ;

Query 7.4 A.2 Query using De-Normalized Schema 1 WITH q1 AS (

2 SELECT s . EPC 3 FROM v3 . s t a y s s

4 WHERE s . l o c a t i o n n a m e = ' Warehouse 1 ' AND 5 s . o u t t i m e s t a m p I S NOT NULL AND

6 s . p r o d u c t t y p e c a t e g o r y = ' Category 1 ' AND

7 s . o u t t i m e s t a m p >= UNIX TIMESTAMP('2015 −01 −01 ' , ' yyyy−MM−dd

' ) AND

8 s . o u t t i m e s t a m p < UNIX TIMESTAMP('2015 −02 −02 ' , ' yyyy−MM−dd ' ) 9 )

10 SELECT AVG( (CAST( s . o u t t i m e s t a m p a s i n t ) − CAST( s . i n t i m e s t a m p a s

11FROM v3 . s t a y s s JOIN 12 q1 ON s . e p c = q1 . e p c

13WHERE s . l o c a t i o n t y p e = ' Store ' AND 14 s . o u t t i m e s t a m p I S NOT NULL ;

Related documents