Location and Path Queries - RFID Big Data Warehousing and Analytics in Cloud Computing Environm

Question type B: Locations/Paths

In question type B, finding shipment path or interesting locations could be an- swered by sorting RFID tags in chronological order, counting the number of discarded items for each location, or counting the number of product type ‘B’ in stock.

B.1: What locations are in the shipment path for product EPC ‘XXXX’ ? What paths product EPC ‘XXXX’ took so far?

The question specifies one filter constraint, product EPC. In order to filter for product EPC, the Product table is filtered using the EPC column. Also, by sorting RFID tags in chronological order, users can track the physical locations where products have been stored, and then build a path by aggregating the locations in chronological order.

Here are the filters:

• p.epc = ‘DFFC7FFC-5CF8-4447-8B10-442199452636’ is done to filter one spe- cific item.

Query 7.5 B.1 Query using Normalized Schema 1 SELECT 2 l . name a s locationName , 3 l . t y p e a s l o c a t i o n T y p e , 4 s . t i m e i n a s a r r i v a l T i m e , 5 s . t i m e o u t a s l e a v i n g T i m e 6 FROM v2 . p r o d u c t t y p e s pt

7 JOIN v2 . p r o d u c t s p ON ( pt . i d = p . p r o d u c t t y p e i d ) 8 JOIN v2 . s t a y s s ON ( p . i d = s . p r o d u c t i d )

9 JOIN v2 . l o c a t i o n s l on ( s . l o c a t i o n i d = l . i d )

10WHERE p . e p c = 'DFFC7FFC−5CF8−4447−8B10−442199452636 ' 11 ORDER BY s . t i m e i n ;

Query 7.6 B.1 Query using De-Normalized Schema 1 SELECT s . l o c a t i o n n a m e , 2 s . l o c a t i o n l o n g i t u d e , 3 s . l o c a t i o n l a t i t u d e , 4 s . i n t i m e s t a m p , 5 s . o u t t i m e s t a m p 6 FROM v3 . s t a y s s 7 WHERE s . e p c = 'DFFC7FFC−5CF8−4447−8B10−442199452636 ' 8 ORDER BY s . i n t i m e s t a m p ;

B.2: Of all the product UPC ‘ABCD1234’ that were discarded because of ‘Reason 1’, what are the most common shared locations where the product type ‘B’ stayed?

Identifying the relationship between the locations and the number of discarded items will help businesses to predict possible issues from factories, warehouses, or stores. To find common locations for a product that was defective, the total number of products that were discarded because of ‘Reason 1’ that have a Stay record for each location need to be counted. For example, Location 1 has 1000 records, Location 2 has 200 records, and Location 3 has 5 records, then, I can assume there is a correlation between a product coming from Location 1 and the product discarded due to ‘Reason 1’.

This query filters using the dimension tables Product type and Defective code to look for product type ‘B’ that was discarded or returned under the code of ‘Reason 1’. The Defective code table has a list of possible reasons a product would be defective,

meaning the product is thrown away since it cannot be used or sold. If a product is returned by a customer, if a quality check finds that the product is broken, etc., then the product’s record would be updated to show that the product was defective and a reason will then be specified. By having a Defective code table, it allows users to filter based on why the product was discarded or returned. This could be helpful for business that want to find issues in their business, manufacturing, or transportation process.

Here are the filters:

• d.description = ‘Reason 1’ filters for products that were discarded because of ‘Reason 1’.

• pt.UPC = ‘ABCD1234’ filters for a Stay record that Universal Product Code is ‘ABCD1234’.

• GROUP BY l.name groups by the location name and allows the count of products that were discarded for the ‘Reason 1’ to be done for each location.

Finally, I used the query as a sub-query so I can list all locations in descending order (ORDER BY q1.num scrapped DESC), by the number of products that were discarded for ‘Reason 1’.

Query 7.7 B.2 Query using Normalized Schema 1 WITH q1 AS(

2 SELECT l . name a s l o c a t i o n n a m e , 3 COUNT(∗ ) as num scrapped

4 FROM v2 . p r o d u c t s p 5 JOIN v2 . p r o d u c t t y p e s pt ON ( p . p r o d u c t t y p e i d = pt . i d ) 6 JOIN v2 . d e f e c t i v e c o d e s d ON ( p . d e f e c t i v e c o d e i d = d . i d ) 7 JOIN v2 . s t a y s s ON ( p . i d = s . p r o d u c t i d ) 8 JOIN v2 . l o c a t i o n s l ON ( s . l o c a t i o n i d = l . i d ) 9 WHERE d . d e s c r i p t i o n = ' Reason 1 '

10 AND pt .UPC = 'ABCD1234 ' 11 GROUP BY l . name

12 )

13 SELECT ∗ 14 FROM q1

15 ORDER BY q1 . num scrapped DESC ;

Query 7.8 B.2 Query using De-Normalized Schema 1 WITH q1 AS (

2 SELECT s . l o c a t i o n n a m e ,

3 COUNT(∗ ) AS num scrapped 4 FROM v3 . s t a y s s

5 WHERE s . p r o d u c t t y p e u p c = 'ABCD1234 ' AND 6 s . d e f e c t i v e c o d e d e s c r i p t i o n = ' Reason 1 ' 7 GROUP BY s . l o c a t i o n n a m e

8 )

9 SELECT ∗ 10FROM q1

11 ORDER BY q1 . num scrapped DESC ;

B.3: Which warehouses have product type ‘B’ in stock and are the closest to Store ‘B’ ?

This type of question can be asked when a store needs more product type ‘B’ and needs to know which warehouse to get them from (i.e. which warehouse is closest).

This query filters for warehouses that have one or more product type ‘B’ in stock. It then calculates the distance between each of these warehouses and Store ‘B’ to find the closest warehouses.

Here are the filters:

• s.time out IS NULL is done because I only want to consider a Stay record that is incomplete, which means the products are still in the same location.

• l.type = ‘Warehouse’ filters Stay results to include any locations that are warehouses.

• HAVING COUNT(*) > 0 filter for locations that have at least 1 product type ‘B’.

The distance calculation function finds the distance between the two given sets of longitude and latitude. This distance does not represent the true distance traveled over roads between the two locations; it simply estimates the distance. The distance calculation function is missing in HiveQL, but it is possible to implement it in other relational database management systems. If this query is used as a sub-query and then ordered by the distance column, then I can sort the warehouses in order of the distance they are from the store.

Query 7.9 B.3 Query using Normalized Schema

1 SELECT l . name , l . l o n g i t u d e , l . l a t i t u d e , COUNT(∗ ) AS num instock 2 FROM v2 . l o c a t i o n s l 3 JOIN v2 . s t a y s s ON ( l . i d = s . l o c a t i o n i d ) 4 JOIN v2 . p r o d u c t s p ON ( s . p r o d u c t i d = p . i d ) 5 JOIN v2 . p r o d u c t t y p e s pt ON ( p . p r o d u c t t y p e i d = pt . i d ) 6 WHERE s . t i m e o u t I S NULL 7 AND l . t y p e = ' Warehouse ' 8 AND pt . c a t e g o r y = 'B' 9 GROUP BY l . name , l . l o n g i t u d e , l . l a t i t u d e 10 HAVING COUNT(∗ )>0 ;

Query 7.10 B.3 Query using De-Normalized Schema 1 SELECT s . l o c a t i o n n a m e ,

2 s . l o c a t i o n l o n g i t u d e , 3 s . l o c a t i o n l a t i t u d e , 4 COUNT(∗ ) AS num instock 5 FROM v3 . s t a y s s

6 WHERE s . o u t t i m e s t a m p I S NULL AND

7 s . l o c a t i o n t y p e = ' Warehouse ' AND 8 s . p r o d u c t t y p e c a t e g o r y = 'B'

9 GROUP BY s . l o c a t i o n n a m e , s . l o c a t i o n l o n g i t u d e , s .

l o c a t i o n l a t i t u d e

10 HAVING COUNT(∗ ) > 0 ;

In document RFID Big Data Warehousing and Analytics in Cloud Computing Environment (Page 67-72)