Using Statistical data formats in visualization

Academic year: 2021

(1)

Using Statistical data formats in visualization

(2)

Background

(3)

Background

Focus is on visualization, but that is useless without data… and data is useless without an easy way to load it.

(4)
(5)
(6)

Background

Diagram labels: Data Providers, Loaded, Indicators Selected

(7)

Background

• Data loading demo – Start off on a bright note

– Download PC-Axis from SCB

– Load directly into Statistics eXplorer or Mdim eXplorer

– http://www.scb.se/Pages/ListWide____259087.aspx

– http://www.ssd.scb.se/databaser/makro/visavar.asp?yp=duwird&xu=c5587001&lang=1&langdb=1&Fromwhere=S&omradekod=BE&huvudtabell=BefolkningNy&innehall=Folkmangd&prodid=BE0101&deltabell=K2&fromSok=&preskat=O

(8)

Background

• To make our tool useful, it needs to:

– Support the most common formats

– Combine data from different sources

– Load data in an intuitive way

• It should be easy to understand WHY data is loaded in a specific way

(10)

Formats

• Generic Formats

– Excel

– txt

– CSV

• Statistics Formats

– PC-Axis

– SDMX
(11)

Generic Formats

• Users are guided to use our structure

• Simpler to have special additions like categorical data and groupings

• Proper error management and feedback go a long way

– Make sure the user knows what is wrong

• Limits the user to supported structures

• Their export format either needs specific support OR they need to edit their files

• Problematic to keep track of
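
The point about error feedback can be sketched in a few lines of Python. This is only an illustration under assumptions: the CSV layout (region name first, numerical indicators after it), the function name `validate_rows` and the sample values are all made up here and are not the eXplorer loader.

```python
import csv
import io


def validate_rows(text):
    """Return a list of human-readable problems found in a CSV text.

    Assumed layout (hypothetical): a header row, then one row per region
    with the region name first and numerical indicator values after it.
    """
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["The file is empty."]
    header = rows[0]
    for line_no, row in enumerate(rows[1:], start=2):
        # Report structural problems first, with the exact line number.
        if len(row) != len(header):
            problems.append(
                f"Line {line_no}: expected {len(header)} columns, found {len(row)}."
            )
            continue
        # Then report every cell that cannot be read as a number.
        for name, cell in zip(header[1:], row[1:]):
            try:
                float(cell)
            except ValueError:
                problems.append(
                    f"Line {line_no}, column '{name}': '{cell}' is not a number."
                )
    return problems


# Made-up sample values, for illustration only.
sample = (
    "region,population,area\n"
    "Norrköping,130000,1500\n"
    "Linköping,n/a,1400\n"
    "Motala,42000\n"
)
for problem in validate_rows(sample):
    print(problem)
```

Each message names the line and the column, so the user knows what to edit instead of only learning that the load failed.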

(12)

Excel: Categorical

Example

(13)

Excel: Categorical

Categorical Numerical

(14)

Excel: Categorical

Treemap

Numerical

(15)

Excel: Categorical

Color Map

Categorical Numerical

(16)

Statistics Formats

• Strictly structured

• Has identifiable properties that can be used by our tools

– Dimensions

– Values

– Time

(17)

Statistics Formats

• Exported data can be used directly in tools which support the format

• No need for editing or changing databases as long as they support proper export mechanisms

• Potentially much simpler to update and manage

(18)

Common issues - Notation

• Contents

– Spatial

• Countries, Regions…

• Extra important if the tool uses a map

• Identified in different ways depending on the publisher, language and data set.

– region, country, geo, cou, location etc.

• Usage of codes and/or names differs as well
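
A minimal sketch of how a loader might try to recognise the spatial dimension before falling back to asking the user. The alias list is taken from the examples on this slide; the function name and the matching rule are assumptions for illustration, not the eXplorer implementation.

```python
# Names that publishers use for the spatial dimension, as listed on the slide.
# A real tool would need a longer, language-aware list.
SPATIAL_ALIASES = {"region", "country", "geo", "cou", "location"}


def guess_spatial_dimension(dimension_names):
    """Return the first dimension whose name matches a known alias, else None.

    Returning None signals that the user has to be prompted.
    """
    for name in dimension_names:
        if name.strip().lower() in SPATIAL_ALIASES:
            return name
    return None


# An English-language data set: the spatial dimension is found.
print(guess_spatial_dimension(["Time", "GEO", "Gender"]))

# Finnish dimension names (year, region, gender): nothing matches,
# so the tool would have to ask.
print(guess_spatial_dimension(["vuosi", "alue", "sukupuoli"]))
```

The second call shows why a fixed list is not enough once language and publisher vary.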

(19)

Common issues - Notation

• Contents

– Spatial

• Need to prompt the user to identify the spatial dimension

PC-Axis prompt in Statistics eXplorer, reading a Finnish-language PC-Axis file

SDMX Load interface in Statistics eXplorer, Loading fields for both files, along with

(20)

Common issues - Notation

• Contents

– Spatial

• Problems do exist for other formats as well, but there are fewer options

Prompt when reading an Excel file with data on both sheets and columns, where they couldn’t be correctly identified.

(21)

Common issues - Notation

• Contents

– Time

• 2012-05-31

• 05-31-2012

• Q2-2012

• 2012-Q2

• January, February

• Etc.

Our tools currently do not interpret the time notation; they only assume the values can be sorted alphabetically.

There are plans to use proper date standards, but there are many localization issues.
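
The alphabetical-sort assumption can be checked against the notations listed above. The sketch below uses a hypothetical helper and made-up label lists; it only tests whether plain string sorting reproduces chronological order.

```python
def is_chronological_when_sorted(labels_in_time_order):
    """True if alphabetical sorting reproduces the given chronological order."""
    return sorted(labels_in_time_order) == list(labels_in_time_order)


# Each list is written in true chronological order.
notations = {
    "year-first dates": ["2011-12-31", "2012-01-15", "2012-05-31"],
    "month-first dates": ["12-31-2011", "01-15-2012", "05-31-2012"],
    "year-first quarters": ["2011-Q4", "2012-Q1", "2012-Q2"],
    "quarter-first": ["Q4-2011", "Q1-2012", "Q2-2012"],
    "month names": ["January", "February", "March"],
}

for name, labels in notations.items():
    print(f"{name}: {is_chronological_when_sorted(labels)}")
```

Only the notations that put the year first survive alphabetical sorting; the others need real date parsing, which is where the localization issues come in.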

(22)

Common issues - Notation

• Contents

– Dimensions

• Any number of value dimensions

– Gender: Men, Women

– Population: Age 0-14, Age 15-64, Age 65+

– Title and Description fields

(23)
(24)

Common issues - Notation - Example

• How the structure of PC-Axis is used in eXplorer:

– TITLE: Title of the file

– CONTENTS: Contents of the file

– STUB: dimensions

– HEADING: dimensions

– VALUES: Contains the content of dimensions

– DESCRIPTION: Description of the file
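
A minimal sketch of reading these keywords from PC-Axis style metadata. It assumes a heavily simplified input: one `KEYWORD="value";` statement per line and no semicolons or quotes inside values. Real PC-Axis files can break both assumptions and also carry a data section, so this is an illustration, not the reader used in eXplorer; the function name is hypothetical.

```python
import re

# KEYWORD="..."; or KEYWORD("sub")="...","...";
STATEMENT = re.compile(
    r'^(?P<key>[A-Z-]+)(?:\("(?P<sub>[^"]*)"\))?=(?P<value>.*);$'
)
QUOTED = re.compile(r'"([^"]*)"')


def parse_px_metadata(text):
    """Collect keyword values; sub-keyed keywords become nested dicts."""
    meta = {}
    for line in text.splitlines():
        match = STATEMENT.match(line.strip())
        if not match:
            continue  # skip anything this simplified pattern does not cover
        values = QUOTED.findall(match["value"])
        if match["sub"] is not None:
            meta.setdefault(match["key"], {})[match["sub"]] = values
        else:
            meta[match["key"]] = values
    return meta


sample = '''TITLE="Population numbers by gender";
CONTENTS="Population";
STUB="regions";
HEADING="gender","time";
VALUES("gender")="Men","Women";
VALUES("time")="2000","2001","2002";
'''

meta = parse_px_metadata(sample)
dimensions = meta["STUB"] + meta["HEADING"]
for dimension in dimensions:
    # Not every dimension has a VALUES entry in this short sample.
    print(dimension, meta["VALUES"].get(dimension, []))
```

STUB and HEADING together give the list of dimensions, and VALUES supplies the members of each one.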

(25)

Common issues - Notation - Example

• Example

– TITLE: “Population numbers by gender”

– CONTENTS: “Population”

– STUB: “regions”

– HEADING: “gender”, “time”

– VALUES(“gender”)=“Men”, “Women”

– VALUES(“time”)=“2000”, “2001”, “2002”…

– VALUES(“regions”)=“Norrköping”, “Linköping”…

Name of the indicators would be:

(26)

Common issues - Notation - Example

• Example from SCB

– TITLE: “Statistics focused on sick leave numbers by region, time and value”

– CONTENTS: “Statistics focused on sick leave”

– STUB: “regions”, “variables”

– HEADING: “time”, “indicators”

– VALUES(“variables”)=“Total”, “Men”, “Women”

– VALUES(“indicators”)=“Sick leave, days”, “Percentage who contributes to sick leave, per cent”

Name of the indicators would be:

(27)

Common issues - Notation - Example

• Leaves work for the user, who has to make sure their file has a structure that fits what we do.

• Being more flexible in the tool could help, but would make reading data more complex.

(28)

Common issues

• Usage of special characters

– ()

– ;

– “ ”

– ‘ ’

• All cases have to be correctly identified
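
Why these characters matter can be shown with a small sketch: a semicolon that ends a statement has to be told apart from one inside a quoted value. The splitter below is an assumption-laden illustration (straight double quotes only, no escaped quotes), and the sample title is made up.

```python
def split_statements(text):
    """Split on ';' only when it appears outside double quotes."""
    statements, current, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(ch)
    # Keep any trailing text that was not terminated by a semicolon.
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


# Hypothetical input with a semicolon and parentheses inside a quoted title.
raw = 'TITLE="Sick leave; by region (per cent)";STUB="regions","variables";'

print(raw.split(";"))          # naive split cuts the title in two
print(split_statements(raw))   # quote-aware split keeps it whole
```

The same care is needed for commas inside quoted values such as “Sick leave, days”.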

(29)

SDMX

• Our tools can read:

– SDMX-ML: XML based format

– It needs two files:

• DSD: Data structure definition

• Data

– Location/regional dimension has to be identified

• We use an Open Source project: flex-cb, previously developed by ECB.

(30)

SDMX

• OECD: DotStat integration

– eXplorer component viewer: Single view app.

– Integrated into the database

– Allows direct viewing of data in our graphs

Diagram labels: User selects data, Query URL, OECD web service, SDMX data

(31)

SDMX

• Testing with SCB and Eurostat

– Evaluating usage of SDMX

• For regular users?

• What kind of files are suitable?

– Usually very large files, for database communication

– Finding bugs

• No two SDMX implementations seem to be the same

• Both in our reader and the export functionality

(32)

SDMX

• Often completely irrelevant to the normal user

• Extremely powerful for technical users

(33)

Web services

• Best way of acquiring data for normal users

• Format is irrelevant, black-box approach

(34)

Web services

• Standards?

• World dataBank uses its own API and data format

(35)

Wrapping up

• Most common format is Excel

– Statisticians don’t want a black box format

– Harder to detect errors in files

• PC-Axis used by a certain group of people

– They are usually experienced with PC-Axis editing.

• SDMX is only used by technical experts

– Used for data export and webservices

– Quite heavily promoted

• From our point of view it’s hard to know the focus of it

(36)

Wrapping up

• Need more structure?

– Not at all! A flexible system will always be better

• Guidelines are important

– Usage of codes and structures

• Know your audience

– Make sure they have options on data structure, and that it is clear how to reach it.
