• No results found

Exercises for Data Visualisation for Analysis in Scholarly Research

N/A
N/A
Protected

Academic year: 2021

Share "Exercises for Data Visualisation for Analysis in Scholarly Research"

Copied!
8
0
0

Loading.... (view fulltext now)

Full text

(1)

Exercise  1:  exploring  network  visualisations  

1. In  your  browser,  go  to  http://bit.ly/11qqXuj   2. Scroll  down  the  page  to  the  network  graph.    

3. Take  a  few  minutes  to  explore  the  visualisation:  try  holding  the  cursor  over   items,  clicking,  dragging,  etc.  

4. You  can  see  the  same  data  represented  in  other  graphs  below.  

5. Discuss  with  your  neighbour:  does  interacting  with  the  network  graph  give  you   more  or  less  information  than  the  other  visualisations?  Does  it  open  up  new   questions?    

Time:  approx  5  minutes    

Exercise  2:  comparing  N-­gram  tools  

NB:  in  both  tools,  copyright  affects  the  availability  of  20th  century  books.  Transcription   errors  may  also  affect  results,  particularly  for  older  books  (e.g.  the  'long  s'  vs  f)  

1. Think  of  two  words  or  phrases  you’d  like  to  compare  over  time  (e.g.  Burma,   Burmah).  

2. Open  two  browser  windows  

3. In  one,  go  to  http://books.google.com/ngrams     4. In  the  other,  go  to  http://bookworm.culturomics.org    

5. Enter  your  words  or  phrases  in  each  and  compare  the  results  

6. Discuss  with  your  neighbour:  what  differences  did  you  find,  and  what  might  have   caused  them?  

Google  Ngram  tips:  http://books.google.com/ngrams/info  

Background  information  on  Bookworm:  http://bookworm.culturomics.org/OL.html     Bookworm  tips:  click  the  'cog'  icon  next  to  the  'i'  to  change  the  time  period  or  click  the   underlined  words  next  to  the  search  term  to  change  which  books  are  searched  (e.g.   subject,  language,  country,  gender  of  author).  [Image  on  slide]  

(2)

 

Exercise  3:  scholarly  visualisations  

In  pairs,  you  will  be  asked  to  pair  with  your  neighbour  and  explore  and  discuss  one  of   the  following  visualisations:  

University  of  Richmond,  “Visualizing  Emancipation”   http://www.americanpast.org/emancipation/  

Further  information:  http://dsl.richmond.edu/emancipation/about/  

http://dirt.terrypbrock.com/2012/04/visualizing-­‐emancipation-­‐examining-­‐its-­‐process-­‐ through-­‐digital-­‐tools/    

Stanford  "Mapping  the  Republic  of  Letters"  

http://www.stanford.edu/group/toolingup/rplviz/rplviz.swf      

Further  information:  http://openglam.org/2012/03/21/mapping-­‐the-­‐republic-­‐of-­‐ letters/,  http://danbri.org/words/2010/11/22/603,  

http://republicofletters.stanford.edu/tools/     GAPVis  

http://gap.alexandriaarchive.org/gapvis/index.html#index  

Further  information:  http://googleancientplaces.wordpress.com/     Digital  Harlem  ::  Everyday  Life  1915-­1930  

http://www.acl.arts.usyd.edu.au/harlem/  

Further  information:  http://digitalharlemblog.wordpress.com/,   http://writinghistory.trincoll.edu/evidence/robertson-­‐2012-­‐spring/     Instructions  

1. In  your  browser,  go  to  suggested  site  

2. Take  a  few  minutes  to  explore  the  visualisation    

3. Discuss  with  your  neighbour:  what  do  you  think  is  being  presented  here?  What   stories  or  trends  can  you  start  to  see?    Does  it  work  better  at  one  scale  over   another?  Do  you  find  it  more  effective  at  aggregate  or  detail  level?  Does  it  present   an  argument  or  provide  a  space  to  develop  and  explore  one?  If  it  was  designed  to   present  an  argument  or  investigate  a  particular  question,  what  do  you  think  that   was?  What  have  you  learned  from  visualisation  that  you  might  not  have  learned   from  looking  at  the  data  or  reading  text  about  it?  

4. Report  back  to  the  group:  summarise  the  site's  purpose,  visualisation  formats   and  data  types  in  a  sentence,  then  share  the  most  interesting  parts  of  your   discussion  

(3)

Exercise  4:  cleaning  data  in  Refine  

In  this  exercise,  we  will  be  cleaning  an  existing  dataset  to  prepare  it  for  visualisation  as  a   chart  and  map  in  the  next  exercise.  We  want  to  investigate  different  disciplines  within   this  data  later,  so  we  will  review  the  data,  remove  rows  that  are  missing  certain  values   (in  real  life  we  would  try  to  do  some  research  to  fill  in  the  gaps  before  resorting  to  this!),   and  make  the  remaining  values  more  consistent.  Don't  worry  if  you  make  a  mistake,  you   can  click  'Undo/Redo'  and  go  back  as  many  steps  in  your  history  as  you  need.  

1. Start  the  Refine  application.  (If  there  isn't  a  shortcut  for  it  on  the  computer  

desktop,  go  to  the  Start  menu  and  search  for  Refine).  Double-­‐click  the  .exe  file  (on   Windows)  to  start  the  application.  It  will  open  a  page  in  your  browser.  Tip:  if  it   opens  in  Internet  Explorer,  copy  the  page  address  from  the  URL/location  bar  and   paste  it  into  Chrome  or  Firefox.  

2. Create  a  new  project.  [See  slide  'Exercise  4:  data  cleaning  in  Refine']  

a. On  the  left-­‐hand  side  of  the  browser  page,  click  on  'Create  project'.  Under   'Get  data  from'  select  'This  Computer'.  Browse  to  the  file  Inspiring  Women   through  History_April2013.xlsx,  load  and  select  'Next'  

b. If  the  preview  rows  look  ok,  click  'Create  Project'.    

c. Spend  some  time  looking  over  the  data.  You  will  notice  that  not  every  field   is  been  filled  in  for  each  row,  and  that  the  level  of  detail  varies  within  a   column  (e.g.  locations).  

3. Start  'cleaning'  the  data:  remove  rows  where  'Discipline'  has  been  left  blank.     a. Click  the  down  arrow  in  the  Discipline  column;  select  Facet;  Text  Facet  

[see  slide]  

b. Refine  should  open  a  new  block  on  the  left-­‐hand  side  of  the  screen,  listing   all  the  different  disciplines  in  the  dataset.  

c. Scroll  to  the  bottom  of  the  box  and  click  on  '(blank)'.  

d. This  should  change  the  rows  on  the  right-­‐hand  side  so  only  rows  without  a   discipline  are  listed.  

e. To  remove  these  rows  from  the  dataset,  click  the  down  arrow  next  to  'All'   (start  of  the  row);  select  Edit  rows;  Remove  all  matching  rows.    

f. This  should  remove  23  rows.  

4. Let  Refine  find  close  matches  in  column  values.  

a. Click  'reset'  in  the  blue  bar  of  the  left-­‐hand  Discipline  box  (this  resets  the   previously  selected  rows  on  the  right-­‐hand  side).  

b. Click  'Cluster'  on  the  left-­‐hand  Discipline  box.  

c. A  new  block  should  pop-­‐up,  showing  'Painter'  and  'Painter.',  which  Refine   has  detected  as  being  close  matches.  

d. Tick  the  'merge'  box  and  then  click  'Merge  selected  and  close'.   e. Congratulations!  You  have  merged  some  messy  records!    

(4)

5. Manually  review  and  simplify  the  number  of  disciplines  listed.  In  this  exercise  we   will  manually  reduce  the  number  of  Disciplines  in  the  dataset  by  merging  them  so   that  it's  easier  for  them  to  be  displayed  in  a  simple  visualisation.    

a. Click  on  'count'  in  the  line  'Sort  by  name  count'.   b. The  most  common  disciplines  should  be  at  the  top.  

c. Review  the  disciplines.  There  is  one  row  each  for  Biologist,  Botanist,   Chemist,  Natural  Historian  and  Inventor.  We  will  merge  each  of  them  with   'Scientist'.  

d. Hover  your  cursor  over  'Inventor'  so  that  'edit'  appears  as  an  option.  Click   'edit'.  

e. In  the  box  that  appears,  type  'Scientist'  and  click  'Apply'.  

f. Repeat  for  the  rows  for  Biologist,  Botanist,  Chemist  and  Natural  Historian.   g. There  should  now  be  13  rows  for  Scientist.  

h. Edit  any  other  Disciplines  as  you  wish.  You  can  review  the  values  by  

clicking  on  the  Discipline  name  and  scrolling  through  the  rows  to  work  out   which  values  to  simplify.  For  example,  Poet  might  become  Writer,  Surgeon   and  Doctor  might  both  become  Medicine.  

6. When  you're  done  tidying  the  values  for  Discipline,  click  the  'x'  next  to  Discipline   in  the  left-­‐hand  side  box  to  'remove  this  facet'.  

7. Repeat  the  steps  above  to  remove  rows  where  'Birth_place'  is  missing.  

8. If  you  wish,  repeat  step  4  (Let  Refine  find  close  values)  on  the  Birth  place  field  to   tidy  up  uncertain  values  like  'London?'  

9. Optional:  remove  numeric  rows  with  missing  values.  Because  we  need  a  year  to   build  a  timeline,  we'll  remove  rows  without  a  currently  known  birth  year.  

a. Scroll  sideways  through  the  columns  until  'Birth_year'  is  on  the  screen.   b. Click  the  down  arrow  next  to  'Birth_year';  select  Facet;  Numeric  facet.   c. A  'Birth_year'  block  should  appear  on  the  left-­‐hand  side  

d. Untick  the  'numeric'  box.  

e. Only  rows  without  a  number  in  'Birth_year'  should  be  shown  now.  

f. Scroll  back  to  the  left  until  'All'  is  showing.  To  remove  these  rows  from  the   dataset,  click  the  down  arrow  next  to  'All'  (start  of  the  row);  select  Edit   rows;  Remove  all  matching  rows.    

g. Close  the  'Birth_year'  box  on  the  left-­‐hand  side  by  clicking  the  'x'.   10.To  export  your  data  (ready  for  use  in  another  application),  click  on  'Export'  on  

the  right-­‐hand  side,  select  an  option  like  Excel  or  comma-­‐separated  value  and   save  the  file.  

Congratulations,  you  have  cleaned  data  with  Refine!  We  now  have  data  we  can  use  to   make  a  nice  timeline.  You  can  use  the  same  principles  to  clean  or  reduce  the  complexity   of  huge  datasets.  You  can  find  out  more  at  http://freeyourmetadata.org/cleanup/  or   https://github.com/OpenRefine/OpenRefine/wiki/Documentation-­‐For-­‐Users    

(5)

1. In  your  browser,  go  to  http://nlp.stanford.edu:8080/corenlp/process    

2. Find  a  short  paragraph  of  text  (e.g.  from  a  news  site  or  digitised  text)  to  paste   into  the  box  

3. How  many  of  the  things  (concepts,  people,  places,  events,  references  to  time  or   dates,  etc)  you  recognise  did  it  pick  up?  Is  any  of  the  other  information  presented   useful?  

Time:  approx  5  minutes    

Exercise  6:  create  a  pie  chart  using  Google  Fusion  Tables  

1. Go  to  https://drive.google.com/  and  log  into  Google  (if  you  aren't  already)   2. Go  to  http://bit.ly/Xw0zNJ  (or  

https://www.google.com/fusiontables/DataSource?dsrcid=implicit&redirectPat h=data&usp=apps_start&hl=en)  to  access  Fusion  from  your  account.  

3. You  should  see  a  screen  'Import  new  table'  and  'From  this  computer'.  

4. Click  'Choose  file',  find  the  file  you  exported  from  Refine  on  your  computer  and   upload  it.  

5. Leave  the  default  options  and  click  'Next'  until  the  screen  asks  you  to  choose  a   table  name,  'attribute  data  to',  enter  a  description,  etc.  Fill  in  the  values  as   appropriate  and  click  'Finish'.  

6. Check  that  you  are  in  'Classic'  rather  than  'New'  mode  in  Google  Drive.  If  the   menu  items  along  the  page  say  'File',  'View',  'Edit',  'Visualize',  'Merge',  'Labs',  you   are  in  Classic  mode.  If  you  are  in  New  mode,  click  on  the  'Help'  menu  and  select   'Back  to  Classic  look'.  

7. From  the  Visualize  menu,  choose  'Pie'.  [See  slide  'Pre-­‐aggregation  view  of  a  pie   chart']  

8. Click  'options'  then  'Aggregate'.  

9. In  the  'Aggregated  by'  section,  select  'Discipline',  and  click  'Apply'.  [Slide:  'Change   options,  aggregate  by  Discipline']  

10.Set  the  Entity  to  Discipline  and  the  value  to  Count.   11.You  should  have  a  pie  chart  of  your  data!  

 

Time:  approx  5  minutes    

(6)

 

Exercise  7:  geocoding  data  and  creating  a  map  using  Google  Fusion  Tables  

Google  Fusion  Tables  can  geocode  data  directly  from  the  table,  but  it  sometimes  needs   some  help.  Fusion  will  have  recognised  some  columns  as  containing  location  data,  but  it   will  not  know  much  about  those  locations.  The  best  way  to  see  how  it  copes  with  a   dataset  is  to  geocode  it  and  look  over  the  resulting  map  to  check  that  records  have   ended  up  in  the  right  place.    

Before  you  begin,  remove  any  aggregated  fields  or  filters  set  in  the  last  exercise:  click   'options',  then  'Clear  filter'  in  the  Filter  tab  view  and  'Clear  aggregation'  on  the  

'Aggregate'  tab  view.  Change  into  'New'  from  'Classic'  mode  by  clicking  'Switch  to  new   look'  on  the  top  right-­‐hand  corner  of  the  browser  window.  If  you  can't  see  that  option,   let  me  know!  

Geocode  your  data  

1. From  the  Visualize  menu,  choose  'Table'   2. From  the  File  menu,  choose  'Geocode'  

3. Change  the  value  for  'Location  column'  to  'Combined_birthplace_location'  and   click  'Begin  geocoding'.  (The  Combined_birthplace_location  field  hopefully   contains  enough  information  to  stop  London,  England,  being  confused  with  

London,  Ontario.)  

4. Wait  a  bit...  

5. Congratulations,  you  have  geocoded  some  data!   Create  a  map  from  your  geocoded  data  

6. Click  the  '+'  at  the  end  of  the  row  that  starts  with  the  File  option  

7. Click  the  down  arrow  next  to  the  new  maps  name  and  change  the  value  for  'Select   location'  to  the  field  Combined_birthplace_location.  

8. Congratulations!  You've  created  a  map!  

If  you  have  extra  time  or  curiosity,  try  changing  the  options  under  'Change  info  window   layout...'  and  'Change  map  styles...'.  

Time:  approx  10  minutes    

(7)

Exercise  8:  Choose  your  own  adventure  

Options:  

• explore  and  analyse  more  visualisations  

• try  making  different  visualisations  with  the  current  dataset   • try  creating  visualisations  with  your  own  data  

Exploring  and  analysing  more  visualisations  

There  are  links  to  visualisation  blogs  and  other  specialist  sites  on  the  Resources  post  at   http://bit.ly/UJwgEz  (i.e.  http://www.miaridge.com/resources-­‐for-­‐data-­‐visualisation-­‐ for-­‐analysis-­‐in-­‐scholarly-­‐research/)  

For  each  visualisation,  you  might  like  to  consider:  what  sources  have  they  used?  Do  they   explain  how  they've  prepared  them?  What  effect  have  their  choices  of  visualisation   formats  and  tools  had?  What  data  or  queries  are  prioritised,  and  which  are  more  difficult   or  impossible?  

If  you  have  a  particular  type  of  data,  process,  format  or  audience  in  mind,  ask  for   suggestions  for  sites  to  look  at!  

Making  more  visualisations  

ManyEyes  is  a  useful  tool  for  learning,  but  as  it  uses  Java  it  can  be  tricky  in  classroom   situations.  You  can  create  visualisations  from  data  other  people  have  uploaded  without   signing  up  at  ManyEyes:  http://www-­‐

958.ibm.com/software/analytics/manyeyes/datasets  

You  can  also  try  visualising  the  data  from  the  British  Library  Pin-­‐a-­‐tale  project,  available   in  Google  Docs  at  http://bit.ly/WT1Ai5    

Try  a  visualisation  and  evaluate  the  results.  Is  more  cleaning  or  transformation  needed?   You  may  need  to  iterate  with  different  versions  of  your  data  after  cleaning  or  enhancing   it.  

If  you  have  your  own  dataset,  review  the  'planning'  slides.  What  do  you  want  to  learn  or   express  about  your  data?  

The  ManyEyes  site  provides  some  guidance  on  the  best  visualisation  types  for  different   types  of  data:  http://www-­‐

958.ibm.com/software/analytics/manyeyes/page/Visualization_Options.html  which   can  also  be  useful  for  planning  visualisations  in  Google  Fusion.  

More  data  cleaning  and  linking  (reconciling)  to  other  data  

You  may  want  to  go  back  to  Refine  and  do  more  cleaning,  or  try  reconciling  the  data  to   pull  in  more  related  information.    

(8)

For  example,  you  could  find  rows  with  more  than  one  Spouse  or  Place  lived  listed,  and   separate  the  values  into  different  columns.  You  could  experiment  with  different  levels  of   location  values  -­‐  does  it  matter  whether  England  or  United  Kingdom  is  listed?  You  can   find  out  more  about  cleaning  inconsistent  values  at  

https://github.com/OpenRefine/OpenRefine/wiki/Clustering  and   https://github.com/OpenRefine/OpenRefine/wiki/Clustering-­‐In-­‐Depth   There  are  some  instructions  for  reconciling  data  at  

http://freeyourmetadata.org/reconciliation/    

References

Related documents

Competitive Intelligence and Benchmarking Cost Modeling and Cost Forecasting Country Entry Strategy Consumer Survey & Feedback Reports Market Research & Business

The programme was occasioned by the realisation that buried archaeological remains related to the siege of Leith, which took place in 1560, may survive in these localities.. 160

Troubleshooting FIO 2 Module Problems 4-28 Troubleshooting Temperature Module Problems 4-30 Troubleshooting CO 2 Module Problems 4-32 Troubleshooting Blood Analysis Module

The critical defect length leading to thermal runaway is determined as a function of the current decay time constant s dump , RRR of the SC cable copper matrix, RRR of the bus

This involved significant work navigating DFID systems in order to get PERL off the ground as an adaptive programme – including conversations with procurement staff to ensure the

The Spaniards initial goals for the Illinois country were very similar to those of the British across the river Spain needed to exercise m ilitary control over

7 A resort is considered as major if its attendance reaches over 1 million skier visits per winter season... Most of the industry is concentrated around the resorts that generate

The thesis statement in Lucy’s subsequent scene with Elinor, Edward, and Marianne (Jory, Sense and Sensibility 49-51) is the fact that Lucy is becoming increasingly involved in