Becoming an effective data storyteller
Marrying data visualisation with a guided narrative
My Story PwC: IT Consultant ResearchTNS: Psychology degree Mental Health LTSB:
Analyst Insight rolesAimia: CommsData
Maths degree
But this is how I tell it PwC: IT Consultant ResearchTNS: Psychology degree Mental Health LTSB:
Analyst Insight rolesAimia: CommsData
Maths degree
We can't help but form stories, links and causes
If the story isn't there, we make it up
Exercise
Where
else do we come across
Newspapers Fiction Religion Traditions Games Film Television Brands Advertising Assemblies Theatre Conversation Sales pitches Presentations
What happens when we
Hold our
In normal life, we spin about
one-hundred daydreams per waking hour. But when absorbed in a good story…
we experience approximately zero daydreams per hour.
Hold our attention
“When we read dry, factual arguments, we read with our dukes up. We are critical and skeptical.
But when we are absorbed in a story we drop our intellectual guard.”
Reference: The Storytelling Animal, Jonathan Gottschall
Influence
Food shortages in Malawi are affecting more than three million children. In Zambia, severe rainfall deficits have resulted in a 42% drop in maize production from 2000. As a result, an estimated three million Zambians face hunger. Four million Angolans — one-third of the
population — have been forced to flee their homes. More than 11 million people in Ethiopia need immediate food assistance.
$1.14
per participant
Any money that you donate will go to Rokia, a seven-year-old girl who lives in Mali in Africa. Rokia is desperately poor and faces a threat of severe hunger, even starvation. Her life will be changed for the better as a result of your
financial gift. With your support, and the support of other caring sponsors, Save the Children will work with Rokia’s family and other members of the community to help feed and educate her, and provide her with basic medical care.
Rokia's story: $2.38 donated per participant
Statistics: $1.14 donated per participant
“63% could remember stories, but only 5% could remember a single statistic”
Reference: Made to Stick, Chip Heath and Dan Heath
Neuroscience shows that
stories activate many more parts of the brain than facts and figures alone
Emotional reaction also strengthens memory,
triggering the release of dopamine
Hold our attention
Influence
Remember
Grocery
Christmas Number 1 2018
But how is this relevant for analysis?
The Dangers Of
Letting Data Speak For Itself
Ignaz Semmelweis
Semmelweis’s data met 3 key criteria
Cold data,
And he failed to visualize
his data
Without a compelling
narrative and supporting
visualisation, data can easily fail to land an important
Why? Because
analysts are hired to do analysis
Morning
Afternoon
Story
- Where and why we use stories
- What is story
- How to create stories
Visualisation
- Illustrating our story
Once upon a time…
We can tell when a story is good or bad.
But what is it that
Exercise What is a story?
What makes a good story?
There was no structure.
Just a series of events without any kind of drama.
A story refers to a sequence of events. It can be thought of as the raw material out of which a narrative is crafted
Plot is the cause-and-effect sequence of events in a story
Narrative connects events
and makes meaning.
It is a representation or specific manifestation
of the story
A Christmas Carol
Charles Dickens, 1843
1901 Scrooge, or, Marley's Ghost
1910 A Christmas Carol
1935 Scrooge
1938 A Christmas Carol
1949 The Christmas Carol
1951 Scrooge
1953 It's Never Too Late
1954 A Christmas Carol
1956 The Stingiest Man in Town
1969 A Christmas Carol
1970 Scrooge
1971 A Christmas Carol Animated
1978 A Christmas Carol
1979 An American Christmas Carol
1983 Mickey's Christmas Carol
1984 A Christmas Carol
1986 A Christmas Carol
1988 Scrooged
1992 The Muppet Christmas Carol
1998 Ebenezer
1999 A Christmas Carol
2000 A Christmas Carol
2001 Christmas Carol: The Movie
2005 Chasing Christmas
2009 A Christmas Carol
2012 A Christmas Carol
As data storytellers, our aim is to create an audience-appropriate narrative
A perspective on the data with a rich vein of WHY
The subject of our analysis lends
itself to story
Source: Gottschall, author of The Storytelling Animal
“Stories are almost uniformly about humans facing
problems and trying to overcome
Overcoming adversity
The Story Arc
Beginning Middle End
Exposition
Rising Action
Climax
Falling Action
“The narrative arc is not about recovering what the crisis took away; it’s about the protagonist growing into a better version of
themselves that they didn’t realize was possible before.”
Exercise
Map the plot of your favourite film
to check for the narrative arc
Reflect
How not to do it
The anxious parade of knowledge
Our aim is to go beyond the ‘raw material’
4 steps to your data story
Curiosity Narrative Arc Personalise Restraint
Curiosity part 1
Curiosity part 2
Your stakeholders will be searching for the root cause
[Chart: Annual staff turnover, 2010 to 2018, axis 0 to 80]
Bad news can be well received, if you can identify the root cause to be acted upon
This informs the proposed resolution
Curiosity part 3
Curiosity
reveals a new story
Strengthen your content
Add supporting quotes and market research
Editorial thinking
This curiosity and search for a strong story is akin to the role of a journalist or photojournalist
The US
Mexico border
How would you photograph it?
Zoomed out, the big picture
A Story
World Press
Photo of the Year
Winner 2019
Subjectivity
And the power to manipulate the truth
Do not mislead your audience
Cleanse & Structure: Populate missing values, treat outliers, dates, trim
Explore & Familiarise: Descriptive statistics (min, max, avg, count)
Transform & Enhance: Group continuous into ordinal, combine/split fields
Consolidate with supporting data: Think about other angles, and data to supplement the alternative story
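The preparation steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The records, field names, tenure bands and the choice of the mean as a fill value are all hypothetical, made up for the example rather than taken from this deck.

```python
from statistics import mean

# Hypothetical raw records: field names and values are invented for illustration.
raw = [
    {"dept": " Legal ", "tenure_years": 2.0},
    {"dept": "Finance", "tenure_years": None},   # missing value
    {"dept": "legal", "tenure_years": 11.5},
    {"dept": "Finance ", "tenure_years": 6.0},
]

# Cleanse & Structure: trim and standardise text, populate missing values.
known = [r["tenure_years"] for r in raw if r["tenure_years"] is not None]
fill = mean(known)  # one simple choice; a median or an explicit flag may suit better
clean = [
    {
        "dept": r["dept"].strip().title(),
        "tenure_years": fill if r["tenure_years"] is None else r["tenure_years"],
    }
    for r in raw
]

# Explore & Familiarise: descriptive statistics.
tenures = [r["tenure_years"] for r in clean]
summary = {
    "min": min(tenures),
    "max": max(tenures),
    "avg": mean(tenures),
    "count": len(tenures),
}

# Transform & Enhance: group a continuous field into ordinal bands.
def tenure_band(years):
    """Map tenure in years to a hypothetical ordinal band."""
    if years < 3:
        return "New"
    if years < 10:
        return "Established"
    return "Long-serving"

for r in clean:
    r["tenure_band"] = tenure_band(r["tenure_years"])

print(summary)
print([(r["dept"], r["tenure_band"]) for r in clean])
```

Filling a missing value with the mean is only one option. It is worth saying in the story which treatment was used, so the audience is not misled.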
All this curiosity takes time and effort
Curiosity Narrative Arc Personalise Restraint
Pixar’s 22 rules of storytelling Rule #3
Shift from:
Identify the
essence of your story.
Storyboard
Remember the 3 act structure
Introduce the problem
Start with the why
The hook
Their frame of reference
Wide angle context
Beginning Middle End
Exposition
Rising Action
Climax
Falling Action
Address hypotheses
Bait: raise anticipated questions
Conflict and tension are integral
Possibly reframe the problem
Pivotal discovery
Beginning Middle End
Exposition
Rising Action
Climax
Falling Action
Resolution, restore order
End in a better position
Feel inevitable, no surprises
Actionable
Beginning Middle End
Exposition
Rising Action
Climax
Falling Action
Exercise: create an imaginary narrative arc from the following analytics brief
We have a gender pay gap of 30%
Is our rate of promotion faster for men than for women?
The essence of my story
The gender pay gap of 30% isn’t driven by an imbalance in promotions, as suspected
It’s due to 2 recruitment issues:
1. Women are coming in on lower pay bands for doing the same job as men
2. Recruitment into senior roles is heavily skewed towards men
Beginning
Recently published gender pay gap is 30%
No historical tracking data
Industry benchmark is 24%
Pay gap has been brought up in development conversations
Widely held hypothesis: women are being held back from promotion
Middle
There is no gender difference in the rate of promotion
Even if split by grade, and tenure
However there are nuances by department
So looked at other potential drivers
Found women being recruited into lower pay bands for the same role
At all levels
Also recruitment into senior roles skewed towards men
End
Therefore can share positive news on gender balanced promotion
But 2 aspects of recruitment to address
Curiosity Narrative Arc Personalise Restraint
“If we fill our stories with caricatures and cardboard cutouts, they're sure to fall flat”
Source: 33 Ways to Write Stronger Characters by Kristen Kieffer
Humanise
“If you’re a man coming in as a senior engineer you’re going to get £42k, but if you’re a woman
you will only receive £35k for doing exactly the same job”
Personal relevance
If they have an objective to increase staff retention: “22% of our female staff left the business last year, vs 14% of male
staff. Could our pay gap be contributing to this?”
Frame it in their language to increase relatability
Retention, attrition or churn
Make it tangible
The Dangers Of
Letting Data Speak For Itself
Ignaz Semmelweis
Curiosity Narrative Arc Personalise Restraint
“The secret of being a
bore is to tell everything”
Remove the friction
Tangents to the conclusion
Embellishments
And get to the point
Pixar’s 22 rules of storytelling Rule #5
“Simplify. Focus. Combine characters. Hop over detours. You’ll feel like you’re losing valuable stuff but it sets you free.”
Leave time for this important step
“I was going to write a shorter letter, but I didn't have time so I
wrote you a long one instead” Mark Twain
Now that we have
a well-crafted story
Illustrators breathe life into the
Essential for communicating data to those with a visual
learning preference
AUDITORY
A visual
representation is far more
convincing than a table of data
Doesn’t the
software do
the visualizing
for us?
Will we be
creating fancy
infographics?
Let’s consider what we want
our audience to do
Exercise:
Sketch as many graphs as
you can to represent this data
Region     Store   Online
EMEA       £7m     £3m
Asia-PAC   £2m     £2m
AMER       £5m     £6m
So, which is the best chart?
We need to be specific with the message we intend to communicate
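To make the point concrete, here is a small sketch showing how the exercise table supports at least two different messages, each of which would call for a different chart. The figures are the ones in the exercise. The two questions asked of them are illustrative choices, not the only possible ones.

```python
# Sales in £m by region and channel, from the exercise table.
sales = {
    "EMEA": {"Store": 7, "Online": 3},
    "Asia-PAC": {"Store": 2, "Online": 2},
    "AMER": {"Store": 5, "Online": 6},
}

# Message 1: which region sells the most overall?
totals = {region: sum(channels.values()) for region, channels in sales.items()}
largest_region = max(totals, key=totals.get)

# Message 2: in which regions is online the bigger channel?
online_share = {
    region: channels["Online"] / totals[region]
    for region, channels in sales.items()
}
online_led = [region for region, share in online_share.items() if share > 0.5]

print(totals, largest_region)
print(online_led)
```

A chart of regional totals serves the first message. A chart of channel share within each region serves the second.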
Headlines need to tell the story
They should be ‘active’, not titles
What is your interpretation?
“The campaign generated income of £2.7k”
Is this a good or bad outcome?
Metrics should be framed
- Is it better or worse than target?
- How does the picture compare to other campaigns / other marketing?
- Is the picture improving over time?
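The three framing questions can be turned into three small calculations. This is a sketch only. The £2.7k income comes from the headline example, while the target, the other campaigns and the previous result are hypothetical numbers invented so the code has something to compare against.

```python
# Income for the campaign, from the headline example (in £k).
income = 2.7

# Hypothetical reference points, invented purely for illustration.
target = 2.0
other_campaigns = [1.9, 2.4, 3.1]
previous_run = 2.2

def pct_diff(value, reference):
    """Percentage difference of value relative to reference."""
    return (value - reference) / reference * 100

vs_target = pct_diff(income, target)            # better or worse than target?
vs_previous = pct_diff(income, previous_run)    # improving over time?
beaten = sum(1 for other in other_campaigns if income > other)  # vs other campaigns

print(f"£{income}k is {vs_target:+.0f}% against target")
print(f"It beat {beaten} of {len(other_campaigns)} other campaigns")
print(f"It is {vs_previous:+.0f}% on the previous run")
```

Any one of these comparisons gives the audience a way to judge whether £2.7k is good or bad.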
Headlines enable us to marry the story and the visualisation
Headline Construct Declutter Extract
4 steps to data visualisation
Our visualisations also need to be informed by how
Psychology
After 20
seconds we will sketch the clock
You have 1 minute to sketch the clock
How did you do?
Why do
we get the 4 wrong?
We have mental models (schemas) to conserve our
We are prone to inattentional blindness
We only see what we’re attending to
And we see
Unless I present the image in a more intentional way
Without
direction, we
experience the same image in different ways
[Chart: bar graph with several series per category, axis 0% to 30%; category labels not recoverable]
Even with direction, we struggle to interpret complicated information
How complicated is too complicated?
Together, a bat and ball cost £1.10. The bat costs £1 more than the ball.
How much does the ball cost?
We avoid cognitive effort
The Bat & Ball puzzle illustrates that many people find cognitive effort at least mildly unpleasant and avoid it as much as possible
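The answer that usually comes to mind first is 10p, but it fails the second condition. A short check in Python shows why, working in pence so that the arithmetic is exact. The function and variable names are just for this sketch.

```python
# Work in pence so the arithmetic is exact.
TOTAL = 110        # bat + ball = £1.10
DIFFERENCE = 100   # bat - ball = £1.00

def satisfies_puzzle(ball):
    """True if a ball price (in pence) meets both conditions."""
    bat = ball + DIFFERENCE
    return bat + ball == TOTAL

intuitive_answer = 10                        # the fast, intuitive response
correct_answer = (TOTAL - DIFFERENCE) // 2   # solve ball + (ball + 100) = 110

print(satisfies_puzzle(intuitive_answer))
print(satisfies_puzzle(correct_answer), correct_answer)
```

With a 10p ball the bat would cost £1.10 and the pair £1.20. The ball costs 5p and the bat £1.05.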
We default to fast and intuitive thinking
System 1 is fast, intuitive and emotional. Pre-conscious
System 2 is slower, more
We are constrained in our capacity to think
Sensory Memory, Working Memory, Long-Term Memory
[Diagram labels: Incoming information, Attention, Rehearsal, Encoding, Retrieval, Forgotten, Forgotten]
How can we support working memory with the interpretation of this data?
Annual staff turnover
Year   Bristol   London
2010   13        11
2011   15        20
2012   13        16
2013   12        32
2014   16        32
2015   19        26
2016   17        41
2017   22        38
2018   23        48
A picture is worth a thousand words
[Chart: Annual staff turnover, Bristol and London, 2010 to 2018, axis 0 to 60]
‘Chunking up’
information supports our capacity to think
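One possible reading of ‘chunking up’, sketched here with the staff turnover figures from the table: reduce eighteen separate numbers to three per site. The choice of start, end and change as the chunks is an assumption for this example, not a rule from the deck.

```python
# Annual staff turnover from the table above.
# Assumption: the nine rows run consecutively from 2010 to 2018.
years = list(range(2010, 2019))
turnover = {
    "Bristol": [13, 15, 13, 12, 16, 19, 17, 22, 23],
    "London": [11, 20, 16, 32, 32, 26, 41, 38, 48],
}

def chunk(values):
    """Reduce a series to three figures: start, end and change."""
    return {
        "start": values[0],
        "end": values[-1],
        "change": values[-1] - values[0],
    }

summary = {site: chunk(values) for site, values in turnover.items()}

for site, s in summary.items():
    print(
        f"{site}: {s['start']} in {years[0]} to {s['end']} in {years[-1]} "
        f"({s['change']:+d})"
    )
```

Six figures are far easier to hold in working memory than eighteen, and they already suggest the headline: London has risen much faster than Bristol.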
In summary, our cognitive
capacity is limited
- We are not supposed to notice every detail
- Schemas
- Inattentional blindness
- We notice different things, influenced by our different experiences
- System 1 is prone to dominate with fast intuitive thinking
What is your
interpretation of this graph?
System 1 is drawn to the differential heights
…marginally NSW is recruiting more nurses
But what if our intention was to exaggerate the story?
COGNITIVE EASE instils trust in
the message being communicated
Something that is easy, cognitively speaking, feels familiar, true,
good and effortless.
“When you are in a state of cognitive ease, you are probably in a good mood, [you] like what you see, believe what you hear, trust your intuitions and feel that the current situation is comfortably familiar.”
[Diagram: customer value (High, Low) by tenure (Established, New); segments: Low-value Established, Low-value New, High-value Established, High-value New]
[Diagram: customer value (Low-value, High-value) by tenure (New, Established); segments: High-value New, High-value Established, Low-value New, Low-value Established]
Do simple charts undermine the
complexity of our work?
There are hundreds of chart types out there, with software providers often priding themselves on their huge selection
Decision fatigue: Too much choice can be paralyzing,
stressful, or result in the wrong choice being made
Choosing the right chart requires an understanding of data types
Nominal: Discrete items, in a single category, that don’t relate to one another. E.g. regions, departments, gender
Ordinal: Items that have an intrinsic order, but do not correspond to quantitative values. E.g. rankings, high/medium/low value
Interval: Ordered, equal intervals of quantitative values. E.g. ranges of values, time (even months), profit
For nominal and ordinal data, bar graphs are effective as they emphasise the distinct nature of the categories
[Chart: Sales by Region (£m), bars for regions A to G, axis £0 to £18]
Long categorical labels do not work with vertical bars
[Chart: vertical bars, axis 0 to 90, labelled Business Development, Talent and Culture, Legal, Client Management, Marketing and…, Technology and Product, Finance]
The labels take up valuable chart space, but are still difficult to read
Use horizontal bars instead
But people search for meaning
[Chart: horizontal bars, axis 0 to 100, in the order Business Development, Talent and Culture, Legal, Client Management, Marketing and Communications, Technology and Product, Finance]
Give meaning to the order
Sort by magnitude
[Chart: horizontal bars, axis 0 to 100, in the order Legal, Finance, Talent and Culture, Business Development, Marketing and Communications, Client Management, Technology and Product]
But don’t re-order ordinal data
The meaning is already there
[Chart: horizontal bars, axis 0 to 100, ratings re-ordered as E, A+, A, D, Unrated, C, B]
[Chart: the same bars in their intrinsic order: Unrated, E, D, C, B, A, A+]
Ordinal data usually presents better vertically
[Chart: vertical bars, axis 0 to 90, in the order A+, A, B, C, D, E, Unrated]
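The two ordering rules can be sketched side by side. The department and rating names echo the charts, but every value below is hypothetical, since the charts’ underlying numbers are not given here.

```python
# Hypothetical figures by department (nominal data): values are invented.
headcount = {
    "Business Development": 40,
    "Legal": 85,
    "Finance": 70,
    "Technology and Product": 15,
}

# Nominal: no intrinsic order, so sort by magnitude to give the order meaning.
nominal_order = sorted(headcount, key=headcount.get, reverse=True)

# Hypothetical counts by rating (ordinal data): values are invented.
ratings = {"C": 60, "A+": 20, "Unrated": 35, "A": 30, "E": 10, "B": 55, "D": 25}

# Ordinal: keep the intrinsic order, whatever the magnitudes are.
RATING_SCALE = ["A+", "A", "B", "C", "D", "E", "Unrated"]
ordinal_order = sorted(ratings, key=RATING_SCALE.index)

print(nominal_order)
print(ordinal_order)
```

The only difference is the sort key: the value itself for nominal data, and the position on the scale for ordinal data.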
Bar graphs with multiple series cannot be interpreted effectively when crammed into a single graph
[Chart: bar graph with several series per category, axis 0% to 30%; category labels not recoverable]
Small multiples can be an effective alternative
[Charts: four panels, Client One, Client Two, Client Three and Client Four, each with axis 0% to 25%; category labels not recoverable]
To compare the distribution of 2 bars a
back-to-back graph might be a useful
A ‘dumb-bell plot’ draws attention to the difference
between 2 values for this nominal data set
The height of bars is meaningful; it
encodes the data
[Chart: Sales (£m) for A to G, axis running from £11 to £17]
Never chop the height of bars
It distorts the meaning
[Chart: Sales (£m) for A to G, axis running from £0 to £18]
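The distortion can be put in numbers. In this sketch the two sales figures are hypothetical, chosen to sit inside the £11m to £17m range of the chopped chart. The code compares how tall one bar looks relative to the other when the axis starts at £0 and when it starts at £11.

```python
# Hypothetical sales figures in £m, invented for illustration.
sales = {"A": 12.0, "B": 16.0}

def drawn_height(value, axis_start):
    """Height of a bar as drawn when the axis starts at axis_start."""
    return value - axis_start

def apparent_ratio(axis_start):
    """How many times taller bar B looks than bar A."""
    return drawn_height(sales["B"], axis_start) / drawn_height(sales["A"], axis_start)

honest = apparent_ratio(0)     # axis starts at £0
chopped = apparent_ratio(11)   # axis starts at £11

print(f"Axis from £0:  B looks {honest:.2f}x the height of A")
print(f"Axis from £11: B looks {chopped:.2f}x the height of A")
```

B is a third larger than A, yet on the chopped axis it is drawn five times as tall.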
Is it ok to adjust the axis for line graphs?
Are line graphs an option for nominal data?
We are drawn to the shape of the distribution, which is arbitrary for nominal data
A star plot is a back-to-back line graph
[Chart: star plot, axis to 100%; category labels not recoverable]
Anecdotally, it works for the evaluation of sports players
There is appeal in the CIRCLE
Softer than bar/line graphs
Emphasises completeness
We struggle to interpret the relative size of values encoded within 2D areas
Which category contributes the most to sales?
[Chart: pie chart, Sales by Category, segments A to G]
Heights of bars are much easier to
compare
The same data
[Chart: bar graph, Sales by Category, A to G, axis 0% to 18%]
What other
reasons are there to be cautious?
But… pies do emphasise that the components are parts of a whole
Pie: obvious the components sum to 100%
Bar: not obvious the components sum to 100%. Solution: make it clear in the title
[Charts: pie chart and bar graph of Sales by Category, A to G, bar axis 0% to 18%]
Avoid the pie chart
The only exception
Maximum 3 segments
It’s important to represent ‘the
whole’
Each segment differs considerably
A donut chart is just pie with a hole
Unless your intention is to illustrate a
diverse and
Interval data can be represented by a bar or line graph
[Chart: Revenue and Units Sold by Month; axes Units Sold (1,000’s) and Revenue (£m)]
What about dual axes?
[Charts: Revenue and Units Sold by Month, with Units Sold and Revenue (£m) on separate axes, shown at two different axis scales]
An alternative solution to compare two related sets of values
[Chart: Revenue and Units by Month indexed to January, Jan to Dec, axis 0% to 300%]
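Indexing means dividing every value in a series by its January value, so both series start at 100% and share one axis. A minimal sketch follows. The monthly figures are hypothetical, as the chart’s underlying data is not available here.

```python
# Hypothetical monthly figures, invented for illustration.
months = ["Jan", "Feb", "Mar", "Apr"]
revenue_m = [2.0, 2.5, 3.0, 5.0]      # £m
units_sold = [400, 600, 500, 1000]

def index_to_first(series):
    """Express each value as a percentage of the first value."""
    base = series[0]
    return [round(value / base * 100) for value in series]

revenue_index = index_to_first(revenue_m)
units_index = index_to_first(units_sold)

for month, r, u in zip(months, revenue_index, units_index):
    print(f"{month}: revenue {r}%, units {u}%")
```

Because both series are now in the same unit, they can share a single axis and the relative growth of each is directly comparable.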
Variations of the column and line graphs cover the majority of charts used for analysis
Scatter plots
enable correlations to be identified
Waterfalls are effective for
Sometimes a graph just isn’t necessary
18% 2018 staff
turnover rate
Each element of a visualisation should be considered for redundancy
“Chart-junk refers to all visual elements in charts and graphs that are not necessary to comprehend the information represented on the graph, or that distract the viewer from this information.”
Decluttering to bring joy
Product design
'Ten principles' of good design:
Good design is innovative, useful, and aesthetic. Good design makes a product easily understood. Good design is unobtrusive, honest, durable, thorough, and concerned with the environment.
Even for a simple set of data, the Excel default is full of ‘chartjunk’
[Chart: Response rate by campaign, Test and Control, campaigns A to D, axis 0.0% to 100.0%]
[Chart: Response rate by campaign, Test and Control, campaigns A to D, axis 0% to 100%]
Excel makes it easy to add
[Chart: Response rate by campaign, Test and Control, campaigns A to D, axis 0.0% to 100.0%]
What message does the default convey?
The border adds to the chartjunk
The labels are quite dominant