
DATA QUALITY VISUALIZATION TOOLS ON ARCHIVED HISTORICAL FREEWAY TRAFFIC DATA

Jothan P. Samuelson, Maricopa Association of Governments

TRB NATMEC Conference June 6, 2012

(2)

Project Extents

286 detectors: 66% loops, 34% PADs (228 detectors in 2009: 58% loops, 42% PADs)

(3)

Pop Survey…

a. 95%
b. 90%
c. 80%
d. 70%
e. don’t know

(4)

How accurate does your data need to be?

How can we look at the quality of our data without having to measure it explicitly?

Sensor Type     Counts                Speed
Loops           6% mean error         7% mean error
PADs            34% mean error        26% mean error
Sample count    17 freeway stations   4 freeway stations

Source: MAG Accuracy Evaluation (2005-2006)

(5)

Data Quality Visualization Tools on Archived Historical Freeway Traffic Data

Jothan P. Samuelson | [email protected] | Maricopa Association of Governments | TRB NATMEC Conference, June 2012

Abstract:

Engineers and planners who obtain and use traffic data for transportation applications are concerned about its quality. While data quality issues are often understood by the analysts doing the hands-on work, the quality characteristics of the data are rarely communicated effectively beyond the analyst. This analysis is an exercise in measuring and reporting data quality using ten visualization charts that are associated and stored with the processed traffic data. It was conducted to support the performance measurement program at the Maricopa Association of Governments in Phoenix, Arizona, and investigates the traffic data from 228 continuous monitoring freeway detectors in the metropolitan region. The visualization charts communicate important information about the completeness, and even the accuracy, of the traffic data, and can be used qualitatively to evaluate its validity beyond the commonly used pass/fail criteria. The charts also help build an intuitive understanding of the underlying characteristics of the data that is considered and used as valid. This presentation describes the method by which the traffic data was processed and the creation and use of the visualization charts.

Experience gained in the resulting data quality assessment suggests that, wherever possible, visualization tools should be developed and used in the processing and quality control of all traffic data, and that these tools, along with other information on the quality control effort, should be stored as metadata with the processed traffic data.

Data Analysis Steps:

• Sort raw lane-specific freeway sensor data into a consistently formatted annual text data file for each detector station.
• Process the sorted data through a filtering, aggregation, and cleaning template.
• Copy the processed and aggregated results to a receptacle template with data quality visualization charts and performance measure summaries.
• Aggregate selected point level charts and data to corridor level files.
• Manually evaluate the data quality visualization charts at the point level in the corridor aggregation file.

Result:

• Processed and cleaned point level data in accessible format
• Visualization charts characterizing the quality of the data
• Summary point level performance measures
• Corridor level summary files for evaluation of point level data in relation to adjacent locations and corridor as a whole
• Utilization of processed data possible for a variety of applications where data can be rejected/accepted based on the application needs, the condition of adjacent detector data, and availability of the data as a whole

Conclusions:

• Pass/fail quality control filters are not enough. Visualization of the data is needed to filter bad data qualitatively.
• Visualization of the data is needed to make better use of “valid” data by communicating how much confidence can be placed in it.
• Analysts stand to benefit most from tools that build an intuitive sense of data quality.
• More experience and guidelines are needed for using visualization tools in corridor level validity checks.

(6)

Data Processing and QC Steps

Raw data sorting
• copy/paste data to Excel template with formulas for sorting
• save data from template to text file
• combine data to single annual text file for each station

Station level
• copy data to template with aggregation and QC formulas
• copy cleaned and aggregated data to receptacle template with QC panel and summary performance statistics

Corridor level
• combine point level visualization charts and performance measure statistics to tabs in corridor level file
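The station level QC formulas are not spelled out in the slides. As a rough illustration only, a pass/fail check on one 5-minute lane record might look like the sketch below. The flag names follow the legend of Chart 7 (speed, volume, occupancy, missing); the threshold values and the function names `qc_flags` and `is_valid` are hypothetical and are not taken from the MAG templates.

```python
from typing import Optional

# Hypothetical plausibility limits; the presentation does not state its thresholds.
MAX_SPEED_MPH = 100.0
MAX_VOLUME_PER_5MIN = 250      # vehicles per lane per 5-minute interval
MAX_OCCUPANCY_PCT = 95.0


def qc_flags(speed: Optional[float],
             volume: Optional[int],
             occupancy: Optional[float]) -> set:
    """Return the set of pass/fail flags raised by one 5-minute lane record."""
    # A record with any absent field is treated as missing rather than invalid.
    if speed is None or volume is None or occupancy is None:
        return {"missing"}
    flags = set()
    if not 0.0 <= speed <= MAX_SPEED_MPH:
        flags.add("speed")
    if not 0 <= volume <= MAX_VOLUME_PER_5MIN:
        flags.add("volume")
    if not 0.0 <= occupancy <= MAX_OCCUPANCY_PCT:
        flags.add("occupancy")
    return flags


def is_valid(speed, volume, occupancy) -> bool:
    """A record passes QC when it raises no flags at all."""
    return not qc_flags(speed, volume, occupancy)
```

Keeping the individual flags, rather than a single pass/fail bit, is what makes the later charts possible: the same flag sets can be counted by month, by hour of the week, or by hour of the day.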

(7)

Data Processing and QC Steps

Point Level Result File Example

Corridor Level Result File Example

Please contact presenter for more information and external example

(8)

Chart 1: Average Weekday Speed

[Two charts: "Annual Hourly Average Speeds - weekdays"; x-axis: hour of day, 12 AM to 12 AM; y-axis: Average Speed (mph), 0 to 80; series: All lanes, GP lanes, HOV lane]

Valuable Information:
• Overall speed trend
• Relationship between GP and HOV speed
• Peak congestion period

Above: detector 64, I-10 EB @ Broadway
Left: detector 59, I-10 EB @ 48th Street, ½ mile upstream of detector 64
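The hourly weekday profile behind a chart like this can be sketched as a simple group-and-average over the year's valid records. The version below is a minimal illustration, assuming an unweighted arithmetic mean of 5-minute speeds and Monday through Friday as weekdays; the slides do not say whether the original averages were volume-weighted, and `hourly_weekday_mean_speed` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime


def hourly_weekday_mean_speed(records):
    """records: iterable of (timestamp, speed_mph) pairs that already passed QC.

    Returns {hour_of_day: mean speed} using weekday (Mon-Fri) records only.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ts, speed in records:
        if ts.weekday() < 5:          # Monday=0 ... Friday=4
            totals[ts.hour] += speed
            counts[ts.hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(counts)}


sample = [
    (datetime(2009, 3, 2, 7, 0), 40.0),   # Monday
    (datetime(2009, 3, 2, 7, 5), 50.0),   # Monday
    (datetime(2009, 3, 3, 12, 0), 65.0),  # Tuesday
    (datetime(2009, 3, 7, 7, 0), 70.0),   # Saturday, excluded
]
profile = hourly_weekday_mean_speed(sample)
```

Running the same function once per lane group (all lanes, GP lanes, HOV lane) would give the three series plotted on the chart.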

(9)

Chart 2: Average Weekday Volume

[Two charts: "Annual Hourly Average Throughput Per Lane - weekdays"; x-axis: hour of day, 12 AM to 12 AM; y-axis: Average Volume Per Lane, 0 to 2000; series: All lanes, GP lanes, HOV lane]

Valuable Information:
• Overall volume per lane trend
• Relationship between GP and HOV volume
• Peak volume period

Above: detector 50, I-10 EB @ University Dr.
Left: detector 545, Loop 101 SB @ Elliot Rd

(10)

Chart 3: Average Weekday Occupancy

[Two charts: "Annual Hourly Average Occupancy Percent - weekdays"; x-axis: hour of day, 12 AM to 12 AM; y-axis: Average Occupancy percent, 0 to 12 on one chart and 0 to 30 on the other; series: all occ, gp occ, hov occ]

Valuable Information:
• Relationship between peak occupancy, volume, and speed

Above: detector 64, I-10 EB @ Broadway
Left: detector 393, I-10 EB @ Southern Ave, ½ mile downstream of detector 64

(11)

Chart 4: Average by-lane Distribution

[Two charts: "Annual Average by Lane - raw data with zero values and without"; categories: lane a to lane h; axes: Speed and Occupancy (10 to 80 and 10 to 70), ADT (5,000 to 30,000); series: ADT, Speed, Occupancy]

Valuable Information:
• Calibration error demonstrated by lane irregularity
• Lane-specific data errors

Above: detector 30, I-10 EB @ 42nd Ave.
Left: detector 263, Loop 101 SB, approaching Southern Ave

(12)

Chart 5: Annual Validity Distribution

[Two charts: "Distribution of Data Passing Quality Control Criteria by Date"; x-axis: Jan to Dec; y-axis: Percent of Data Rows Valid, 0% to 100%]

Valuable Information:
• The amount and season of data missing
• Missing vs. invalid data

Above: detector 35, I-10 EB @ 36th Ave.
Left: detector 85, I-10 EB @ Loop 101 SB
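To separate missing data from invalid data, each month's rows can be compared against the number of rows a detector should have produced. The sketch below assumes 5-minute records (288 expected rows per day, consistent with the 5-minute flags counted in Chart 7) and reports monthly rather than daily percentages for brevity; `monthly_validity` is a hypothetical helper, not the author's spreadsheet formula.

```python
import calendar
from collections import Counter
from datetime import datetime

ROWS_PER_DAY = 24 * 12   # expected 5-minute intervals per day


def monthly_validity(rows, year):
    """rows: iterable of (timestamp, is_valid) for rows present in the archive.

    Returns {month: (pct_valid, pct_invalid, pct_missing)}, each expressed
    as a percentage of the rows expected for that month.
    """
    present = Counter()
    valid = Counter()
    for ts, ok in rows:
        present[ts.month] += 1
        if ok:
            valid[ts.month] += 1

    result = {}
    for month in range(1, 13):
        days_in_month = calendar.monthrange(year, month)[1]
        expected = days_in_month * ROWS_PER_DAY
        pct_valid = 100.0 * valid[month] / expected
        pct_invalid = 100.0 * (present[month] - valid[month]) / expected
        # Whatever is neither valid nor invalid was never recorded.
        result[month] = (pct_valid, pct_invalid, 100.0 - pct_valid - pct_invalid)
    return result
```

A month with low validity and high missing share points to a communication or hardware outage, while a month with low validity and high invalid share points to a detector that was reporting implausible values.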

(13)

Chart 6: Weekly Validity Distribution

[Two charts: "Distribution of Data Passing Quality Control Criteria by Weekday"; x-axis: index 1 to 167; y-axis: Percent of Data Rows Valid, 0% to 100%]

Valuable Information:
• Percent valid on weekdays vs. weekends
• Overall percent of data valid

Above: detector 35, I-10 WB @ 36th Ave
Left: detector 396, I-10 SB @ Baseline Dr.
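The horizontal axis running to 167 suggests that validity is binned by hour of the week (7 × 24 = 168 bins), although the slides do not say so explicitly. Under that assumption, and assuming the week starts on Monday, a minimal sketch of the binning is shown below. Here the percentage is taken relative to the rows present in each bin, and both function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hour_of_week(ts):
    """Map a timestamp to 0..167, with Monday 00:00 as bin 0 (an assumption)."""
    return ts.weekday() * 24 + ts.hour


def validity_by_hour_of_week(rows):
    """rows: iterable of (timestamp, is_valid).

    Returns {hour_of_week: percent of rows in that bin that passed QC}.
    """
    present = defaultdict(int)
    valid = defaultdict(int)
    for ts, ok in rows:
        bin_index = hour_of_week(ts)
        present[bin_index] += 1
        if ok:
            valid[bin_index] += 1
    return {b: 100.0 * valid[b] / present[b] for b in sorted(present)}
```

Bins 0 to 119 then cover Monday through Friday and bins 120 to 167 cover the weekend, which makes a weekday versus weekend difference in validity visible at a glance.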

(14)

Chart 7: Count of Error Flags by Hour

[Two charts: "Count of Annual Quality Control Flags by Hour of Day - weekdays"; x-axis: hour of day, 12 AM to 12 AM; y-axis: Count of 5-min Error Flags per Hour, 0 to 3500 on one chart and 0 to 600 on the other; series: speed, volume, occupancy, difference, missing, all rows]

Valuable Information:
• Which types of errors exist
• Uniform error vs. time-specific error

Above: detector 141, I-10 WB @ Loop 202 Interchange
Left: detector 66, I-10 WB, South of Broadway Rd.
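Counting flags by hour of day is a straightforward tally once each 5-minute row carries its set of raised flags. The sketch below is illustrative only: it assumes flags are stored as a set of names per row and that weekdays are Monday through Friday, and `flags_by_hour` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime


def flags_by_hour(flagged_rows):
    """flagged_rows: iterable of (timestamp, flags), flags being a set of names.

    Returns {flag_name: Counter({hour_of_day: count})} for weekday rows only.
    """
    counts = {}
    for ts, flags in flagged_rows:
        if ts.weekday() >= 5:         # skip Saturday (5) and Sunday (6)
            continue
        for name in flags:
            counts.setdefault(name, Counter())[ts.hour] += 1
    return counts
```

A flag whose count is roughly flat across all 24 hours indicates a uniform problem such as an outage, while a flag concentrated in a few hours indicates a time-specific error, for example one that appears only overnight or only under congestion.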

(15)

Chart 8: Flow/Density Distribution

Valuable Information:
• Extent of peak period congestion
• Deviations from expected flow/density relationship

Above: detector 50, I-10 EB @ 48th St.
Left: detector 93, I-10 EB @ Buckeye Rd.
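The slides do not state how density was obtained for these scatter plots. One common approach, shown here purely as an assumption, is to derive it from the fundamental relationship flow = density × speed, after scaling each 5-minute count to an hourly flow rate. The function name `flow_and_density` is hypothetical.

```python
def flow_and_density(volume_5min, speed_mph, lanes=1):
    """Convert one 5-minute record to (flow in veh/h/lane, density in veh/mi/lane).

    Uses q = k * v, so k = q / v. Density is None when speed is zero,
    because the ratio is undefined there.
    """
    flow = volume_5min * 12 / lanes        # 12 five-minute intervals per hour
    density = flow / speed_mph if speed_mph > 0 else None
    return flow, density


print(flow_and_density(150, 60.0))   # one lane, 150 vehicles in 5 minutes at 60 mph
```

Strictly, the relationship holds for space-mean speed, whereas point detectors typically report time-mean speed, so the derived densities are approximate. Points that fall far from the expected curve are the deviations the chart is meant to expose.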

(16)

Chart 9: Speed/Density Distribution

Valuable Information:
• Extent of peak period congestion
• Deviations from expected speed/density relationship

Above: detector 261, Loop 101 SB @ Broadway Rd
Left: detector 255, Loop 101 SB @ Loop 202 Fwy

(17)

Chart 10: Speed/Flow Distribution

Valuable Information:
• Extent of peak period congestion
• Deviations from expected speed/flow relationship

Above: detector 50, I-10 EB @ 48th St.
Left: detector 264, Loop 101 NB @ Southern Ave.

(18)

Result Summary

• Processed and cleaned point level data in accessible format
• Visualization charts characterizing the quality of the data
• Summary point level performance measures
• Corridor level summary files for evaluation of point level data in relation to adjacent locations and corridor as a whole
• Utilization of processed data possible for a variety of applications where data can be rejected/accepted based on the application needs, the condition of adjacent detector data, and availability of the data as a whole

(19)

Conclusions

1. Pass/fail quality control filters are not enough. Visualization of the data is needed to filter bad data qualitatively.
2. Visualization of the data is needed to make better use of “valid” data by communicating how much confidence can be placed in it.
3. Analysts stand to benefit most from tools that build an intuitive sense of the data quality.
4. More experience and guidelines are needed for using visualization tools in corridor level validity checks.
