DATA QUALITY VISUALIZATION TOOLS
ON ARCHIVED HISTORICAL FREEWAY
TRAFFIC DATA
Jothan P. Samuelson, Maricopa Association of Governments
TRB NATMEC Conference June 6, 2012
Project Extents
286 detectors: 66% loops, 34% PADs (228 detectors in 2009: 58% loops, 42% PADs)
Pop Survey: How accurate does your data need to be?
a. 95%
b. 90%
c. 80%
d. 70%
e. don't know
How can we look at the quality of our data without having to measure it explicitly?
Sensor Type   Counts                Speed
Loops         6% mean error         7% mean error
PADs          34% mean error        26% mean error
Sample count  17 freeway stations   4 freeway stations
Source: MAG Accuracy Evaluation (2005-2006)
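The evaluation reports a "mean error" for each sensor type without giving the formula. A minimal Python sketch, assuming "mean error" means mean absolute percent error (MAPE) of detector values against manual ground-truth values; the function name and the sample numbers are hypothetical:

```python
from typing import Sequence


def mean_absolute_percent_error(detector: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute percent error of detector values against reference (ground-truth) values.

    Assumption: "mean error" is read here as MAPE; the evaluation's exact
    definition is not given in this document. Pairs with a zero reference are skipped.
    """
    if len(detector) != len(reference) or not detector:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = [(d, r) for d, r in zip(detector, reference) if r != 0]
    if not pairs:
        raise ValueError("all reference values are zero")
    return 100.0 * sum(abs(d - r) / r for d, r in pairs) / len(pairs)


# Hypothetical example: detector counts vs. manual counts for three intervals.
error = mean_absolute_percent_error([94, 210, 305], [100, 200, 300])
print(f"{error:.1f}% mean error")
```

Skipping zero references avoids division by zero; an alternative is to compare summed totals over the whole sample instead of averaging interval-level percentages.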
Data Quality Visualization Tools on Archived Historical Freeway Traffic Data
Jothan P. Samuelson | Maricopa Association of Governments | TRB NATMEC Conference, June 2012
Abstract:
Concern about the quality of traffic data exists among engineers and planners tasked with obtaining and using the data for various transportation applications. While analysts doing the hands-on work often understand data quality issues, the quality characteristics of the data are rarely communicated effectively beyond the analyst. This analysis is an exercise in measuring and reporting data quality using ten visualization charts that are associated and stored with the processed traffic data. The exercise was conducted to support the performance measurement program at the Maricopa Association of Governments in Phoenix, Arizona, and investigates traffic data from 228 continuous monitoring freeway detectors in the metropolitan region. The resulting visualization charts communicate important information about the completeness, and even the accuracy, of the traffic data, and can be used qualitatively to evaluate the validity of the traffic data beyond the commonly used pass/fail criteria. These charts also help analysts develop an intuitive understanding of the underlying characteristics of the data that are considered and used as valid. This presentation describes how the traffic data were processed and how the visualization charts were created and used.
Experience gained from the resulting data quality assessment suggests that, wherever possible, visualization tools be developed and used in the processing and quality control of all traffic data, and that these tools, along with other information on the quality control effort, be stored as metadata with the processed traffic data.
Data Analysis Steps:
1. Sort raw lane-specific freeway sensor data into a consistently formatted annual text data file for each detector station.
2. Process the sorted data through a filtering, aggregation, and cleaning template.
3. Copy the processed and aggregated results to a receptacle template with data quality visualization charts and performance measure summaries.
4. Aggregate selected point level charts and data into corridor level files.
5. Manually evaluate the point level data quality visualization charts in the corridor aggregation file.
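Step 1 can be sketched in Python as follows. This is a minimal illustration, not MAG's Excel-template workflow: it assumes raw rows arrive as dictionaries with hypothetical keys (`station`, `timestamp`, `lane`, `volume`, `speed`, `occupancy`) and returns one consistently formatted CSV text per station and year, sorted by time and then by lane.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

FIELDS = ["timestamp", "lane", "volume", "speed", "occupancy"]


def sort_to_station_year_files(raw_rows):
    """Group raw lane-level rows by (station, year); return consistently formatted CSV text per group.

    raw_rows: iterable of dicts with hypothetical keys 'station', 'timestamp'
    (ISO 8601 string), 'lane', 'volume', 'speed', 'occupancy'.
    """
    groups = defaultdict(list)
    for row in raw_rows:
        ts = datetime.fromisoformat(row["timestamp"])
        groups[(row["station"], ts.year)].append((ts, row["lane"], row))

    files = {}
    for key, items in groups.items():
        items.sort(key=lambda item: (item[0], item[1]))  # by time, then lane
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore",
                                lineterminator="\n")
        writer.writeheader()
        for _ts, _lane, row in items:
            writer.writerow(row)  # 'station' is dropped; it is implied by the file
        files[key] = buf.getvalue()
    return files
```

Each returned text could then be written to its own annual file, for example one named after the station and year (a hypothetical naming scheme).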
Result:
Processed and cleaned point level data in an accessible format
Visualization charts characterizing the quality of the data
Summary point level performance measures
Corridor level summary files for evaluating point level data in relation to adjacent locations and to the corridor as a whole
Processed data usable for a variety of applications, where data can be rejected or accepted based on the application's needs, the condition of adjacent detector data, and the availability of the data as a whole
Conclusions:
Pass/fail quality control filters are not enough. Visualization of the data is needed to qualitatively filter out bad data.
Visualization of the data is needed to make better use of “valid” data by communicating how much confidence can be placed in it.
Analysts stand to benefit most from tools that help them develop an intuitive sense of data quality.
More experience and guidelines are needed for using visualization tools in corridor level validity checks.
Data Processing and QC Steps
Raw data sorting
• copy/paste data into Excel template with formulas for sorting
• save data from template to text file
• combine data into a single annual text file for each station
Station level
• copy data to template with aggregation and QC formulas
• copy cleaned and aggregated data to receptacle template with QC panel and summary performance statistics
Corridor level
• combine point level visualization charts and performance measure statistics into tabs in the corridor level file
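The pass/fail QC formulas in the station level template are not listed in this presentation. The sketch below shows the general shape of such a check for one 5-minute lane record; the thresholds, the `LaneRecord` type, and `qc_flags` are all hypothetical. The flag names mirror the Chart 7 legend, except that the "difference" flag is left out because its rule is not described.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical pass/fail limits for illustration only; MAG's actual QC criteria are not stated here.
MAX_SPEED_MPH = 100.0
MAX_VOLUME_PER_5MIN = 250.0   # vehicles per lane per 5-minute interval
MAX_OCCUPANCY_PCT = 100.0


@dataclass
class LaneRecord:
    volume: Optional[float]     # vehicles in the 5-minute interval
    speed: Optional[float]      # mph
    occupancy: Optional[float]  # percent


def qc_flags(rec: LaneRecord) -> Set[str]:
    """Return the pass/fail flags raised by one 5-minute lane record; an empty set means the row passes."""
    if rec.volume is None or rec.speed is None or rec.occupancy is None:
        return {"missing"}
    flags = set()
    if not 0 <= rec.speed <= MAX_SPEED_MPH:
        flags.add("speed")
    if not 0 <= rec.volume <= MAX_VOLUME_PER_5MIN:
        flags.add("volume")
    if not 0 <= rec.occupancy <= MAX_OCCUPANCY_PCT:
        flags.add("occupancy")
    return flags
```

Returning a set lets one row raise several flags at once, which is what a count of flags by type and hour needs.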
Point Level Result File Example
Corridor Level Result File Example
Please contact the presenter for more information and an external example.
Chart 1: Average Weekday Speed
[Figure panel: Annual Hourly Average Speeds - weekdays. Y axis: average speed (mph), 0 to 80; x axis: hour of day, 12 AM to 12 AM; series: all lanes, GP lanes, HOV lane]
Valuable Information:
• overall speed trend
• relationship between GP and HOV speed
• peak congestion period
above: detector 64, I-10 EB @ Broadway
left: detector 59, I-10 EB @ 48th Street, ½ mile upstream of detector 64
[Figure panel: Annual Hourly Average Speeds - weekdays. Y axis: average speed (mph), 0 to 80; x axis: hour of day, 12 AM to 12 AM; series: all lanes, GP lanes, HOV lane]
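Chart 1 plots the annual average weekday speed by hour of day for all lanes, GP lanes, and the HOV lane. A minimal sketch of that aggregation, assuming records are dictionaries with hypothetical `timestamp`, `group`, and value keys. It uses a simple unweighted mean; a volume-weighted mean would be a reasonable alternative. Passing `"occupancy"` instead of `"speed"` gives the Chart 3 series in the same way.

```python
from collections import defaultdict
from datetime import datetime


def weekday_hourly_average(records, field):
    """Average `field` by (lane group, hour of day) over weekday records.

    records: iterable of dicts with hypothetical keys 'timestamp' (datetime),
    'group' (e.g. 'GP' or 'HOV'), and the numeric field (e.g. 'speed').
    Returns {(group, hour): mean}, including an 'All' group pooling every lane.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        ts = rec["timestamp"]
        if ts.weekday() >= 5:          # 5 = Saturday, 6 = Sunday
            continue
        value = rec[field]
        if value is None:              # skip missing/invalid values
            continue
        for group in (rec["group"], "All"):
            sums[(group, ts.hour)] += value
            counts[(group, ts.hour)] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical sample: two Monday readings and one Sunday reading (excluded).
sample = [
    {"timestamp": datetime(2009, 3, 2, 7, 15), "group": "GP", "speed": 40.0},
    {"timestamp": datetime(2009, 3, 2, 7, 20), "group": "HOV", "speed": 60.0},
    {"timestamp": datetime(2009, 3, 1, 7, 0), "group": "GP", "speed": 70.0},
]
averages = weekday_hourly_average(sample, "speed")
```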
Chart 2: Average Weekday Volume
[Figure panel: Annual Hourly Average Throughput Per Lane - weekdays. Y axis: average volume per lane, 0 to 2,000; x axis: hour of day, 12 AM to 12 AM; series: all lanes, GP lanes, HOV lane]
Valuable Information:
• overall volume per lane trend
• relationship between GP and HOV volume
• peak volume period
above: detector 50, I-10 EB @ University Dr.
left: detector 545, Loop 101 SB @ Elliot Rd
[Figure panel: Annual Hourly Average Throughput Per Lane - weekdays. Y axis: average volume per lane, 0 to 2,000; x axis: hour of day, 12 AM to 12 AM; series: all lanes, GP lanes, HOV lane]
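Throughput per lane needs one more step than speed: lane volumes are summed for each hour and divided by the number of lanes in the group before averaging across days. A sketch assuming 5-minute lane volumes and hypothetical lane counts per group; hours with missing intervals would bias the result low, so in practice they might be excluded or factored up.

```python
from collections import defaultdict
from datetime import datetime


def weekday_hourly_volume_per_lane(records, lanes_per_group):
    """Average weekday hourly throughput per lane, by (lane group, hour of day).

    records: iterable of dicts with hypothetical keys 'timestamp' (datetime),
      'group' (e.g. 'GP', 'HOV') and 'volume' (vehicles in one lane, one 5-minute interval).
    lanes_per_group: e.g. {'GP': 4, 'HOV': 1} (hypothetical lane counts).
    """
    # Step 1: total volume per (group, date, hour) across all lanes and 5-minute intervals.
    hourly_totals = defaultdict(float)
    for rec in records:
        ts = rec["timestamp"]
        if ts.weekday() >= 5 or rec["volume"] is None:
            continue
        hourly_totals[(rec["group"], ts.date(), ts.hour)] += rec["volume"]
    # Step 2: convert each day's hourly total to a per-lane value, then average over days.
    sums = defaultdict(float)
    days = defaultdict(int)
    for (group, _date, hour), total in hourly_totals.items():
        sums[(group, hour)] += total / lanes_per_group[group]
        days[(group, hour)] += 1
    return {key: sums[key] / days[key] for key in sums}


# Hypothetical sample: Monday and Tuesday, 8 AM hour, two GP lanes.
sample = [
    {"timestamp": datetime(2009, 3, 2, 8, 0), "group": "GP", "volume": 100},
    {"timestamp": datetime(2009, 3, 2, 8, 5), "group": "GP", "volume": 120},
    {"timestamp": datetime(2009, 3, 3, 8, 0), "group": "GP", "volume": 180},
]
per_lane = weekday_hourly_volume_per_lane(sample, {"GP": 2})
```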
Chart 3: Average Weekday Occupancy
Valuable Information:
• relationship between peak occupancy, volume, and speed
above: detector 64, I-10 EB @ Broadway
left: detector 393, I-10 EB @ Southern Ave, ½ mile downstream of detector 64
[Figure panels (two detectors): Annual Hourly Average Occupancy Percent - weekdays. Y axis: average occupancy percent (0 to 12 and 0 to 30); x axis: hour of day, 12 AM to 12 AM; series: all occ, GP occ, HOV occ]
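One way to make the Chart 3 comparison explicit is to report the hour at which each series peaks (or, for speed, bottoms out). A sketch assuming hourly averages have already been computed for one lane group (hypothetical input). Under recurring congestion one would typically expect peak occupancy to line up with the lowest speed; a detector where the three disagree sharply may deserve a closer look.

```python
from typing import Dict


def peak_hours(hourly_speed: Dict[int, float],
               hourly_volume: Dict[int, float],
               hourly_occupancy: Dict[int, float]) -> Dict[str, int]:
    """Report the hour of peak occupancy, peak volume, and lowest speed.

    Each argument maps hour of day (0-23) to an average value for one lane group.
    """
    return {
        "peak_occupancy_hour": max(hourly_occupancy, key=hourly_occupancy.get),
        "peak_volume_hour": max(hourly_volume, key=hourly_volume.get),
        "min_speed_hour": min(hourly_speed, key=hourly_speed.get),
    }
```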
Chart 4: Average by-lane Distribution
Valuable Information:
• calibration error demonstrated by irregularity across lanes
• lane-specific data errors
above: detector 30, I-10 EB @ 42nd Ave.
left: detector 263, Loop 101 SB, approaching Southern Ave
[Figure panels (two detectors): Annual Average by Lane - raw data with zero values and without. Lanes a through h; left axis: speed and occupancy; right axis: ADT (up to 30,000); series: ADT, speed, occupancy]
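Chart 4 compares annual averages by lane with and without zero values. A sketch of that comparison, plus a simple screen that lists lanes far from the median lane. The 25% tolerance and the input shape are hypothetical, and some lane-to-lane difference is normal (an HOV or outside lane carries different traffic), so the screen only points to lanes worth viewing; it does not reject them.

```python
from statistics import mean, median


def by_lane_averages(lane_values, include_zeros=True):
    """Average each lane's daily values, optionally excluding zero readings.

    lane_values: {lane: [daily values]}, e.g. daily volumes (hypothetical input).
    Lanes with no usable values are omitted from the result.
    """
    result = {}
    for lane, values in lane_values.items():
        used = [v for v in values if include_zeros or v != 0]
        if used:
            result[lane] = mean(used)
    return result


def irregular_lanes(lane_averages, tolerance=0.25):
    """Return lanes whose average deviates from the median lane by more than `tolerance` (a fraction).

    The 25% default is a hypothetical choice, not a MAG criterion.
    """
    mid = median(lane_averages.values())
    return sorted(lane for lane, avg in lane_averages.items()
                  if mid and abs(avg - mid) / mid > tolerance)
```

With zeros included, a lane that drops out for a few days can drag its own average down and shift the median, which can make a healthy lane look irregular; comparing both versions, as the chart does, helps separate dropouts from calibration problems.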
Chart 5: Annual Validity Distribution
[Figure panel: Distribution of Data Passing Quality Control Criteria by Date. Y axis: percent of data rows valid, 0% to 100%; x axis: date, January to December]
Valuable Information:
• the amount and season of missing data
• missing vs. invalid data
above: detector 35, I-10 EB @ 36th Ave.
left: detector 85, I-10 EB @ Loop 101 SB
[Figure panel: Distribution of Data Passing Quality Control Criteria by Date. Y axis: percent of data rows valid, 0% to 100%; x axis: date, January to December]
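Chart 5 separates missing data from invalid data. A sketch, assuming one station-level row per 5-minute interval (288 per day) and a hypothetical `(timestamp, passed_qc)` input; any expected row absent from the file counts as missing.

```python
from collections import Counter
from datetime import date, datetime, timedelta
from typing import Dict, Iterable, Tuple

ROWS_PER_DAY = 288  # 5-minute intervals in a day


def daily_validity(rows: Iterable[Tuple[datetime, bool]], year: int) -> Dict[date, Dict[str, float]]:
    """Per-date percent of expected 5-minute rows that are valid, invalid, or missing."""
    valid, invalid = Counter(), Counter()
    for ts, passed in rows:
        target = valid if passed else invalid
        target[ts.date()] += 1
    result = {}
    day = date(year, 1, 1)
    while day.year == year:
        v, i = valid[day], invalid[day]
        missing = max(ROWS_PER_DAY - v - i, 0)  # clamp in case of duplicate rows
        result[day] = {
            "valid_pct": 100.0 * v / ROWS_PER_DAY,
            "invalid_pct": 100.0 * i / ROWS_PER_DAY,
            "missing_pct": 100.0 * missing / ROWS_PER_DAY,
        }
        day += timedelta(days=1)
    return result
```

Plotting `valid_pct` by date gives the chart's main series, and the split between `invalid_pct` and `missing_pct` shows whether low days come from rejected rows or absent ones.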
Chart 6: Weekly Validity Distribution
Valuable Information:
• percent valid on weekdays vs. weekends
• overall percent of data valid
above: detector 35, I-10 WB @ 36th Ave
left: detector 396, I-10 SB @ Baseline Dr.
[Figure panels (two detectors): Distribution of Data Passing Quality Control Criteria by Weekday. Y axis: percent of data rows valid, 0% to 100%; x axis: hour of the week]
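Chart 6 places each row on an hour-of-week axis. A sketch that assumes hour 1 is Monday 12-1 AM (the chart's actual starting day is not stated) and measures validity only among rows that are present, so missing rows are not counted; the input shape is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def validity_by_hour_of_week(rows: Iterable[Tuple[datetime, bool]]) -> Dict[int, float]:
    """Percent of rows passing QC for each hour of the week.

    Hour 1 = Monday 12-1 AM ... hour 168 = Sunday 11 PM-12 AM (an assumed convention).
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for ts, ok in rows:
        hour_of_week = ts.weekday() * 24 + ts.hour + 1  # Monday is weekday 0
        total[hour_of_week] += 1
        passed[hour_of_week] += int(ok)
    return {h: 100.0 * passed[h] / total[h] for h in sorted(total)}
```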
Chart 7: Count of Error Flags by Hour
Valuable Information:
• which types of errors exist
• uniform error vs. time-specific error
above: detector 141, I-10 WB @ Loop 202 Interchange
left: detector 66, I-10 WB, south of Broadway Rd.
[Figure panels (two detectors): Count of Annual Quality Control Flags by Hour of Day - weekdays. Y axis: count of 5-min error flags per hour; x axis: hour of day, 12 AM to 12 AM; series: speed, volume, occupancy, difference, missing, all rows]
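A sketch of the tally behind Chart 7, assuming each 5-minute row already carries a set of flag names (hypothetical input). "All rows" is treated here as the total number of weekday rows in each hour, which is an interpretation of the legend. A flat profile across hours suggests uniform error, while a spike at particular hours points to time-specific error.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple


def flag_counts_by_hour(rows: Iterable[Tuple[datetime, Set[str]]]) -> Dict[int, Counter]:
    """Count weekday 5-minute QC flags by hour of day and flag type.

    rows: (timestamp, set of flag names) per 5-minute row, e.g. 'speed', 'volume',
    'occupancy', 'difference', 'missing'. 'all rows' counts every weekday row.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for ts, flags in rows:
        if ts.weekday() >= 5:  # weekdays only, as in the chart title
            continue
        counts[ts.hour]["all rows"] += 1
        for flag in flags:
            counts[ts.hour][flag] += 1
    return dict(counts)
```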
Chart 8: Flow/Density Distribution
Valuable Information:
• extent of peak period congestion
• deviations from the expected flow/density relationship
above: detector 50, I-10 EB @ 48th St.
left: detector 93, I-10 EB @ Buckeye Rd.
Chart 9: Speed/Density Distribution
Valuable Information:
• extent of peak period congestion
• deviations from the expected speed/density relationship
above: detector 261, Loop 101 SB @ Broadway Rd
left: detector 255, Loop 101 SB @ Loop 202 Fwy
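The chart does not say which expected speed/density relationship is used. As a simple stand-in, the sketch below uses the Greenshields linear model, v = vf(1 - k/kj), and lists points that fall far from it. The free-flow speed, jam density, and 15 mph tolerance are hypothetical and would normally be fitted or tuned per detector; the linear model is known to fit real freeway data only roughly.

```python
from typing import List, Tuple


def greenshields_speed(density: float, free_flow_speed: float = 65.0,
                       jam_density: float = 130.0) -> float:
    """Greenshields model: v = vf * (1 - k / kj), clipped at zero.

    vf (mph) and kj (veh/mi/lane) defaults are hypothetical placeholders.
    """
    return max(free_flow_speed * (1.0 - density / jam_density), 0.0)


def speed_density_outliers(points: List[Tuple[float, float]], tolerance_mph: float = 15.0,
                           **model: float) -> List[Tuple[float, float]]:
    """Return (density, speed) points whose speed departs from the model by more than tolerance_mph."""
    return [(k, v) for k, v in points
            if abs(v - greenshields_speed(k, **model)) > tolerance_mph]
```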
Chart 10: Speed/Flow Distribution
Valuable Information:
• extent of peak period congestion
• deviations from the expected speed/flow relationship
above: detector 50, I-10 EB @ 48th St.
left: detector 264, Loop 101 NB @ Southern Ave.
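On the speed/flow plane, one quick plausibility check is whether observed flows exceed a reasonable capacity. Using the Greenshields linear model as a stand-in (the source does not name a model), q = vf * k * (1 - k/kj) peaks at k = kj/2, giving capacity q_max = vf * kj / 4 at speed vf / 2. A sketch with hypothetical parameters (vf = 65 mph, kj = 130 veh/mi/lane, so q_max = 2,112.5 veh/h/lane) and a hypothetical 20% margin:

```python
from typing import List, Tuple


def greenshields_capacity(free_flow_speed: float = 65.0, jam_density: float = 130.0) -> float:
    """Maximum flow of the Greenshields model, q_max = vf * kj / 4 (veh/h/lane), reached at v = vf / 2."""
    return free_flow_speed * jam_density / 4.0


def implausible_flow_points(points: List[Tuple[float, float]], margin: float = 0.2,
                            **model: float) -> List[Tuple[float, float]]:
    """Return (flow, speed) points whose hourly flow rate exceeds model capacity by more than `margin`."""
    limit = greenshields_capacity(**model) * (1.0 + margin)
    return [(q, v) for q, v in points if q > limit]
```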
Result Summary
Processed and cleaned point level data in an accessible format
Visualization charts characterizing the quality of the data
Summary point level performance measures
Corridor level summary files for evaluating point level data in relation to adjacent locations and to the corridor as a whole
Processed data usable for a variety of applications, where data can be rejected or accepted based on the application's needs, the condition of adjacent detector data, and the availability of the data as a whole
Conclusions
1. Pass/fail quality control filters are not enough. Visualization of the data is needed to qualitatively filter out bad data.
2. Visualization of the data is needed to make better use of “valid” data by communicating how much confidence can be placed in it.
3. Analysts stand to benefit most from tools that help them develop an intuitive sense of data quality.
4. More experience and guidelines are needed for using visualization tools in corridor level validity checks.