• No results found

Amounts and Proportions

N/A
N/A
Protected

Academic year: 2021

Share "Amounts and Proportions"

Copied!
34
0
0

Loading.... (view fulltext now)

Full text

(1)

Amounts and Proportions

Session 4

PMAP 8921: Data Visualization with R

Andrew Young School of Policy Studies

(2)

Plan for today

Reproducibility Amounts

Proportions

(3)

Reproducibility

(4)

Why am I making you learn R?

Pivot Tables do the same thing!

(5)

Why am I making you learn R?

 

More powerful

Free and open source

Reproducibility

(6)

Debt:GDP ratio

90%+ → −0.1% growth

Paul Ryan's 2013 House budget resolution

Austerity and Excel

(7)

Thomas Herndon From Paul Krugman, "The Excel Depression"

Austerity and Excel

(8)

Austerity and Excel

Debt:GDP ratio = 90%+ → 2.2% growth (!!)

(9)

Septin 2 Membrane- Associated Ring Finger (C3HC4) 1

2310009E13

Genes and Excel

20% of genetics papers between 2005–2015 (!!!)

(10)

General guidelines

Don't touch the raw data

If you do, explain what you did!

Use self-documenting, reproducible code

R Markdown!

Use open formats

Use .csv, not .xlsx

(11)

Airbnb, ggplot, and rmarkdown

 

The UK's reproducible analysis pipeline

R Markdown in real life

(12)

Amounts

(13)

Yay bar plots!

We are a lot better at visualizing

line lengths than angles and areas

(14)

Oh no bar plots!

(15)

Start at zero

The entire line length matters, so don't truncate it!

Always start at 0

(Or don't use bars)

(16)

Bar plots and summary statistics

#barbarplots

0:00 / 2:45

(17)

Bar plots and summary statistics

(18)

Show more data with strip plots

ggplot(animals,

aes(x = animal_type, y = weight,

color = animal_type)) +

geom_point(position = position_jitter(heigh size = 1) +

labs(x = NULL, y = "Weight") + guides(color = "none")

(19)

library(ggbeeswarm)

ggplot(animals, aes(x = animal_type, y = weight,

color = animal_type)) + geom_beeswarm(size = 1) +

# Or try this too:

# geom_quasirandom() +

labs(x = NULL, y = "Weight") + guides(color = "none")

Show more data with beeswarm plots

(20)

Combine boxplots with points

ggplot(animals, aes(x = animal_type, y = weight,

color = animal_type)) + geom_boxplot(width = 0.5) +

geom_point(position = position_jitter(heigh size = 1, alpha = 0.5) +

labs(x = NULL, y = "Weight") + guides(color = "none")

(21)

Combine violins with points

ggplot(animals, aes(x = animal_type, y = weight,

color = animal_type)) + geom_violin(width = 0.5) +

geom_point(position = position_jitter(heigh size = 1, alpha = 0.5) +

labs(x = NULL, y = "Weight") + guides(color = "none")

(22)

library(ggridges)

ggplot(animals, aes(x = weight,

y = animal_type,

fill = animal_type)) + geom_density_ridges() +

labs(x = "Weight", y = NULL) + guides(fill = "none")

Overlapping ridgeplots

(23)

General rules

Bar charts always start at zero

Don't use bars for summary statistics.

You throw away too much information.

The end of the bar is often all that matters

(24)

Lots of alternatives

We'll use a summarized version of the gapminder dataset as an example

library(gapminder)

gapminder_continents <- gapminder %>%

filter(year == 2007) %>% # Only look at 20 count(continent) %>% # Get a count of cont arrange(desc(n)) %>% # Sort descendingly # Make continent into an ordered factor

mutate(continent = fct_inorder(continent)) ggplot(gapminder_continents,

aes(x = continent, y = n, fill = conti geom_col() +

guides(fill = "none") +

labs(x = NULL, y = "Number of countries")

(25)

ggplot(gapminder_continents,

aes(x = continent, y = n, color = continent)) +

geom_pointrange(aes(ymin = 0, ymax = n)) + guides(color = "none") +

labs(x = NULL, y = "Number of countries")

Alternatives: Lollipop charts

Since the end of the bar is important, emphasize it the most

(26)

Alternatives: Waf e charts

Show the individual observations as squares

# This has to be installed in a special way--

# Run this in your console:

# devtools::install_github("hrbrmstr/waffle") library(waffle)

ggplot(gapminder_continents,

aes(x = continent, y = n, fill = continent)) +

geom_waffle(aes(values = n), # geom_waffle n_rows = 9, # It has lots of o flip = TRUE) +

labs(fill = NULL) +

coord_equal() + # Make all the squares squ theme_void() # Use a completely empty them

(27)

Alternatives: Heatmaps

If exact counts are less important,

try a heatmap with geom_tile()

(28)

Proportions

(29)

Why proportions?

Sometimes we want to compare values across a whole population instead of looking at raw counts

Only do this when it makes analytical sense!

COVID-19 amounts vs. proportions

(30)

Pie charts

Perceptual issues with angle and ll space

Only okay(ish) if there are a few easily distinguishable categories

(31)

Alternatives

Bar plots

Any of the alternatives to bar plots Treemaps and mosaic plots

(but these can still be really hard to interpret)

(32)

Treemaps with the

treemapify package Mosaic plots with the ggmosaic package

Treemaps and mosaic plots

(33)

Alternatives

Bar plots

Any of the alternatives to bar plots Treemaps and mosaic plots

(but these can still be really hard to interpret)

Specialized gures like parliament plots

(34)

Parliament plots

Parliament plots with the ggparliament package

References

Related documents

The case of the emergency financing in Belarus is the only case in our sample in which a country combined all three elements of the global financial safety net: The EFSD

– Target young market that uses 100 to 300 minutes Market Research All Options Theory Application Calculation... Pros

The huge losses resulting from the natural catastrophes in 2011 and the myriad of issues that can arise when insureds come to make claims under their insurance and reinsurance

Eintragung: 18/10/2001 Nizzaer Klassifikation: 35, 36, 37, 38, 39, 40, 41, 42 Verfahrensstand: Eingetragen Veröffentlichung der Eintragung Name des Inhabers: WWF-World Wide Fund

• More than one-third of freelancers report that demand for their services increased in the past year, and nearly half expect their income from freelancing to increase in

We will check our proposal by arguing that all genus zero A/2 model correlation functions will match those of the B/2-twisted mirror Landau-Ginzburg theory given above, using

The research utilizing case study at Kemiren Tribe Village of District Banyuwangi has shown how the local wisdom and custom can be used as labor intensive enterprise in