• No results found

Advanced SQL Queries for the EMF

N/A
N/A
Protected

Academic year: 2021

Share "Advanced SQL Queries for the EMF"

Copied!
21
0
0

Loading.... (view fulltext now)

Full text

(1)

Advanced SQL

Queries for the EMF

Susan McCusker, MARAMA

2

Online EMF Resources: SQL

 EMF User’s Guide

 https://www.cmascenter.org/emf/internal/guide.html  EMF SQL Reference Guide

 https://www.cmascenter.org/emf/internal/sql_basics.

(2)

3

Roadmap

 Previously: SQL snippets, query components,

syntax

Today:

 Review – Anatomy of a SQL SELECT query and EMF-specific syntax

 New – Query enhancements: data from more than one table – join clause and its variations

Goal: Build a query to summarize CAP emissions for a state by SCC with SCC descriptions

4

Where is SQL used in the EMF?

 Filtering

 Use where clause snippets to filter records; for example

poll = 'CO' ann_value > 300

 Row Filter when viewing and editing raw data  Data Value Filter in Advanced Dataset Search  Row Filter when exporting datasets and QA step

results

(3)

5

Where is SQL used in the EMF?

 SQL-based QA steps

 Write your own selectqueries to create reports and summaries of datasets

 Sum or average emissions by region, SCC, plant, etc.

 Include reference information like county names, lat/lon coordinates, or pollutant description

SQL in the EMF

6

Basic SQL Query Components

 Which table(s) should we get the data from?  Do we want to select all the data in the table

or only certain rows or columns?

 Example: select certain rows to get one state  Example: select certain columns if don't need all

the data from the original inventory

 How should the output be grouped and sorted?  Example: by SCC? by FIPS? by state? by

pollutant?

(4)

7

SQL Syntax

Basic query to extract/aggregate data from a single dataset table:

select columns (i.e., data fields) from table ($TABLE[1] e in EMF)

where filtering criteria used to pick rows

group by columns to aggregate over (e.g., sum) order by columns to use to sort results

SQL Review

8

SQL Syntax

Query to extract/aggregate data from multiple dataset tables:

select columns (i.e., datafields) from table ($TABLE[1] e in EMF) join table on matching criteria

where filtering criteria used to pick rows

group by columns to aggregate over (e.g., sum) order by columns to use to sort results

(5)

9

EMF-Specific Syntax

 SQL database stores data in tables

 Table names must be unique so the EMF uses

the dataset name and random ID to name tables

 Dataset name =

nonpt_2011NEIv2_NONPOINT_20141108_11nov20 14_v1.csv

 Underlying data table name =

emissions.ds_nonpt_2011NEIv2_NONPOINT_2014 1108_11nov2014_v1_csv_214226751

EMF-Specific Syntax

Basic query

Choose:

FIPS, scc, pollutant, annual emissions for NY (36)

selectregion_cd, scc, poll, ann_value from

Emissions.DS_nonpt_2011NEIv2_NONPOINT_20141108_11nov2014_v1_csv_ 214226751

whereregion_cd like '36%'

# records: 83,718

10

(6)

Basic query

Choose:

FIPS, scc, pollutant, annual emissions for NY (36)

selectregion_cd, scc, poll, ann_value from

Emissions.DS_nonpt_2011NEIv2_NONPOINT_20141108_11nov2014_v1_csv_ 214226751

whereregion_cd like '36%'

# records: 83,718

11

SQL Review

Poll #1

The highlighted part of this where clause can be used directly as a row filter when viewing or exporting data: where region_cd like '36%' • TRUE

• FALSE

(7)

13

EMF-Specific Syntax

 Instead of directly using the table name in the

from statement, use the special syntax $TABLE[1] e

 e is a single character table alias

 Can use the alias throughout the query instead of the table name

EMF-Specific Syntax

Make it generic with SQL-specific

syntax

selectregion_cd, scc, poll, ann_value

from$TABLE[1] e

whereregion_cd like '36%'

# records: 83,718

14

(8)

15

EMF-Specific Syntax

 $DATASET_TABLE["dataset name", 1] a  Refers to a different dataset

 Uses the default version of the data

Ex:$DATASET_TABLE["nonpt_2011NEIv2_NONPOINT

_20141108_11nov2014_v1.csv", 1] a

 Other EMF-specific options let you refer to

specific versions of datasets or output of QA steps (covered in reference guide)

EMF-Specific Syntax

Only interested in CAPs, not HAPs

FIPS, SCC, pollutant, annual CAP emissions for NY (state FIPS = 36)

selecte.region_cd, e.scc, e.poll, e.ann_value from$TABLE[1] e

wheree.region_cd like '36%' and substring(e.poll,1,1) not in ('1', '2', '3',

'4', '5', '6', '7', '8', '9')

# records: 20,187

16

(9)

Poll #2

We want to exclude HAPS & type the condition e.poll not in ('1', '2', '3', '4', '5', '6', '7', '8', '9'). a. This condition excludes HAPs

b. This condition results in an error

c. This condition does not exclude HAPs and does not result in an error

17

Summarize by FIPS, SCC, pollutant

Use the aggregate function sum for ann_value

select e.region_cd, e.scc, e.poll, sum(e.ann_value)

from $TABLE[1] e

where e.region_cd like '36%' and substring(e.poll,1,1)

not in ('1', '2', '3', '4', '5', '6', '7', '8', '9')

Error: need group by clause for select sum

18

(10)

Summarize by FIPS, SCC, pollutant

Use the aggregate function sum for ann_value

selecte.region_cd, e.scc, e.poll, sum(e.ann_value) from$TABLE[1] e

wheree.region_cd like '36%' and substring(e.poll,1,1) not in ('1', '2', '3', '4', '5',

'6', '7', '8', '9')

group by region_cd, scc, poll

# records: 20,187

19

SQL Review

Summarize by State,SCC, pollutant

Group together by 1st two digits of FIPS code:

selectsubstring(e.region_cd,1,2), e.scc, e.poll, sum(e.ann_value) from$TABLE[1] e

wheree.region_cd like '36%' and substring(e.poll,1,1) not in ('1', '2', '3', '4', '5',

'6', '7', '8', '9')

group by substring(e.region_cd,1,2), e.scc, e.poll

# records: 340

20

(11)

21

Multiple Tables

 Instead of extracting or aggregating data from

just one table, we can use join to combine data from multiple tables

Example reference table: scc

Query Enhancements: JOIN scc sector scc_description

2401001000 Nonpoint "Solvent Utilization;Surface Coating;Architectural Coatings;Total: All Solvent Types"

2610000500 Nonpoint "Waste Disposal, Treatment, and Recovery;Open Burning;All Categories;Land Clearing Debris (use 28-10-005-000 for Logging Debris Burning)"

Poll #3

The EMF-specific syntax $TABLE[1] e:

a. Refers to the data table for the dataset to which the QA step is attached

b. Is generic, i.e., it can be copied to QA steps for other datasets of the same type/with the same columns c. Assigns a single-character table alias "e" that can be

used in referring to columns from the dataset d. All of the above

(12)

23

JOIN Syntax

For a reference table: from $TABLE[1] e

left join reference.scc on e.scc = scc.scc

Query Enhancements: JOIN

After JOIN keyword is the name of the table to join

For a dataset table: from $TABLE[1] e left join

$DATASET_TABLE["dataset name", 1] a

 Refers to a different dataset  Uses the default version of the data

Less basic query – add a join

select … scc.scc_description … from $TABLE[1] e

left join reference.scc on e.scc = scc.scc

where

group by … scc.scc_description …

24

(13)

Summarize by State, SCC, pollutant –

with SCC descriptions

selectsubstring(region_cd,1,2) as FIPS_State, scc,

scc.scc_description, poll, sum(ann_value)

from$TABLE[1] e

left join reference.scc on e.scc = scc.scc

whereregion_cd like '36%' and substring(poll,1,1) not in ('1', '2',

'3', '4', '5', '6', '7', '8', '9')

group by substring(region_cd,1,2), scc,

scc.scc_description,poll

Failed to run…"scc" is ambiguous

25

Query Enhancements: JOIN

Summarize by State, SCC, pollutant –

with SCC descriptions

selectsubstring(e.region_cd,1,2) as FIPS_State, e.scc, scc.scc_description,

e.poll, sum(e.ann_value)

from$TABLE[1] e

left join reference.scc on e.scc = scc.scc

wheree.region_cd like '36%' and substring(e.poll,1,1) not in ('1', '2', '3', '4', '5',

'6', '7', '8', '9')

group by substring(e.region_cd,1,2), e.scc, scc.scc_description, e.poll

26

(14)

EMF Reference Table Examples

https://www.cmascenter.org/emf/inte rnal/sql_basics.html

27

EMF Reference Tables - Example

Summarize by State, SCC, pollutant –

with SCC level descriptions

selectsubstring(e.region_cd,1,2) as FIPS_State, e.scc, scc_codes.scc_l1,

e.poll, sum(e.ann_value)

from$TABLE[1] e

left join reference.scc_codes on e.scc = scc_codes.scc

wheree.region_cd like '36%' and substring(e.poll,1,1) not in ('1', '2', '3', '4', '5',

'6', '7', '8', '9')

group by substring(e.region_cd,1,2), e.scc, scc_codes.scc_l1, e.poll

28

(15)

29

JOIN Syntax: multiple

criteria,multiple joins

... from $TABLE e left join reference.fips on e.county = fips.county and e.state = fips.st left join

reference.scc on e.scc = e.scc

on clause defines

how the two tables relate to each other

 county and state names must match  Can have

 Different column names

 Multiple joins

Query Enhancements: JOIN

30

JOIN Options

(16)

31

LEFT JOIN Semantics

 To include all the records from

nonpt_2011NEIv2_NONPOINT_20141108_11nov2014_v1.csv

whether or not there’s a matching record in

reference.scc, we use a left join instead of just join

left join is also called left outer join

Query Enhancements: JOIN

32

LEFT JOIN Oil & Gas Example

(17)

33

LEFT JOIN Details

 Table order is important when using left

join ...

from $TABLE[1] e

left join reference.scc ...

 All the records in the “left” table ($TABLE[1]) are

returned in the output

 What if the order is reversed?

Query Enhancements: JOIN

Poll #4

What results would we expect from reversing the order of the left join in the previous query, i.e., from reference.scc

left join $TABLE[1] e

a. Only records common to both tables are returned b. All records in either table are returned

c. All records in the reference.scc table are returned d. All records in the $TABLE[1] e are returned

(18)

35

Dealing with NULL Values

NULL in SQL is a state (unknown) and not a

value: data value does not exist in the database

 Used when data is unknown  Unknown ≠ "value of zero"

NULL values can cause unexpected output

when combined with other values

 'Fish' || NULL || 'Chips' = NULL

NULL /0 = NULL

NULL Values

What happens if there's no match in the

other table?

Coalesce function

select e.scc, coalesce(scc.scc_description, 'An unspecified description') as scc_description,

e.poll, coalesce(sum(e.ann_value),0) as ann_emis from $TABLE[1] e left join reference.scc on

scc.scc = e.scc

group by e.scc, scc.scc_description, e.poll order by e.scc

36

(19)

37

COALESCE Function

coalesce(value1, value2, ...) function

returns the first value in the list that is not null

 Replace select scc.scc_descriptionwith select coalesce(scc.scc_description, 'An unspecified description')

 If there is no description for the SCC (i.e., scc.scc_description is NULL), 'An upspecified description' is returned.

NULL Values

38

Putting It All Together

selecte.scc,

coalesce(s.scc_description, 'Unspecified description') as scc_description, e.poll,

coalesce(p.descrptn, 'Unspecified description') as

pollutant_code_desc,

coalesce(sum(ann_value), 0) as ann_emis from $TABLE[1] e

left join reference.invtable p on p.cas = e.poll left join reference.scc s on e.scc = s.scc

group by e.scc, e.poll, p.descrptn, s.scc_description,

p.name

(20)

39

-- From EMF main window:

Manage  Datasets

-- Show Datasets of Type:

Select one from drop-down menu: ex, nonpt_2011NEIv2_NONPOINT_20141108_11nov2014_v1.csv *Use or search button or to narrow down results displayed*

-- Check box to select: ex, nonpt_2011NEIv2_NONPOINT_20141108_11nov2014_v1.csv

-- Click Edit Propertiesat bottom of the screen -- Click QAtab at top

-- Click Add Customat bottom of screen

-- Enter Name (ex, “test”, “temp”, or something descriptive for future reminder) -- In Program drop-down box choose SQL

-- In Arguments box type in or cut & paste SQL query -- Click OK

-- Back in Dataset Properties Editor, check the box next to the QA Step that you added and click Editat

bottom of screen

-- In Edit QA Step window, check box for Download result file to local machine?if want to export results

-- Click Run

-- Click View Resultsto see results in EMF

* If there are multiple files in a dataset type, make sure to keep track of which file your query is applied to. Query results are for that particular file only.

* Dumping run logs in status windows to trashcan before run starts will reduce clustering and make the log easier to read

Exporting to Google Earth

 Option to export data in

KMZ format used by Google Earth

 For QA step results that

include latitude and longitude coordinates

 Export from View QA Step

Results window

40

(21)

Exporting to Google Earth

41

QA Step Results: Mapping

Formatting Note: Quotes

42

"smart quotes" cause a syntax error in the EMF

References

Related documents

When the driver executes a query that contains a join, it processes the tables from left to right and uses an index on the second table’s join field (the DEPT field of the EMP

COPY %1 A: shift if not (%1)==() goto LOOP  {Beginning with the first 

• Joins from Left source to right source • Returns nulls on the right if join fails • RIGHT [OUTER] JOIN. • Joins from Right source to Left source • Return nulls on the left if

 Left join: include all qualified records from the left table in the join. condition even if they do not have matching records in the

He is on the editorial board of three journals and runs a variety of courses at his practice, including a comprehensive year course for those willing to get started in the field

Finally, we will discuss Bode-Fano limitations and show how they may be used to obtain a bound on the terminal behavior of a nonuniform transmission line..

This paper has presented automatic speech recognition and formant based analysis of Arabic vowels using a spectrogram technique [18].. For the formant based

As the file detection test is the most requested test by users (and magazines) and also due to some business requirements (our file detection test results are used/required by