
7 Environmental Citizen Science biological recording projects

7.2 Basic global statistics of citizen science platforms and data preparation

7.4.1 Relationship between projects

To compare the three platforms on a grid-by-grid basis, using the number of contributors to each project as the comparison metric, two methods were used: scatterplot diagrams and Spearman’s rank correlation test. The test is run on pairs of projects; it gives ρ as a measure of the correlation between the two ranked collections, together with the statistical significance of the result. The scatterplot method provides a visual impression of the correlation grid by grid. To design the scatterplots, the contributor ratio (Cr) was used. Visual analysis of scatterplots can lead to misleading interpretations because of overlapping points, but it is very helpful for finding outliers. The Spearman’s rank test does not need this ratio, since the method ranks the values for every project and compares the ranks without considering their relative magnitudes.

The spatial distributions of volunteer productivity were significantly correlated between datasets: GiGL and iSpot: ρ (1778) = 0.235, p < 0.001; GiGL and iRecord: ρ (1778) = 0.271, p < 0.001; iSpot and iRecord: ρ (1069) = 0.333, p < 0.001.
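As an illustration of the rank-based comparison described above, the following is a minimal sketch of Spearman's ρ computed over the grid units shared by two datasets. The grid identifiers, the counts and the function names are hypothetical and are not taken from the study data; the significance test (the p value) is not reproduced here.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical contributor counts per grid unit (illustrative values only).
platform_a = {"G1": 120, "G2": 95, "G3": 40, "G4": 12, "G5": 30}
platform_b = {"G1": 35, "G3": 50, "G4": 20, "G5": 9}

# Only grid units present in both datasets can be compared.
shared = sorted(set(platform_a) & set(platform_b))
a_counts = [platform_a[g] for g in shared]
b_counts = [platform_b[g] for g in shared]
rho = spearman_rho(a_counts, b_counts)

# Dividing every count by a positive constant leaves the ranks,
# and therefore rho, unchanged.
rho_scaled = spearman_rho([v / sum(a_counts) for v in a_counts], b_counts)
```

Because only ranks enter the calculation, rescaling one dataset by a positive constant gives the same coefficient, which is consistent with the remark that the test does not consider relative magnitudes.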

The correlation coefficients are not particularly high, but the results suggest that the same grid cells are broadly more or less productive across all datasets in terms of the number of records per volunteer. Although the total number of records per grid cell summed across datasets was not uniform across the study area, the spatial distribution of records was significantly correlated between datasets (Spearman’s rank correlation): GiGL and iSpot: ρ (1778) = 0.373, p < 0.001; GiGL and iRecord: ρ (1778) = 0.402, p < 0.001; iSpot and iRecord: ρ (1069) = 0.544, p < 0.001. The total number of volunteers recording in each grid cell was also not uniform. The number of records per grid cell was highly correlated with the number of volunteers, ρ (1801) = 0.795, p < 0.001. The number of contributors per grid cell was mildly correlated among the three platforms: GiGL and iSpot: ρ = 0.435, p < 0.001; iSpot and iRecord: ρ = 0.431, p < 0.001; iRecord and GiGL: ρ = 0.422, p < 0.001. Given the mild results of the Spearman’s rank test, the scatterplot diagrams do not give a strong, specific picture of the spatial correlation between organisations. But, as explained in the previous chapters, while scatterplots are not very effective for correlations over a large number of elements, they are very useful for detecting and characterising outliers.

The scatterplot diagram of GiGL and iSpot Cr in Figure 7-7 shows how important the London Wetland Centre⁸³ and the River Thames are for both projects. Amongst the five main outliers for both projects, two grids include the London Wetland Centre. One of them (TQ2277) is the top performing grid unit for GiGL and the third for iSpot. Almost all the other outliers have diverging values, with relatively high Cr values for GiGL accompanied by low Cr values for iSpot. The second highest Cr value for GiGL (TQ5280), in a grid including the Rainham Marshes in the borough of Havering, a wet area close to the Thames, is even located where there are no iSpot observations at all. Among the grids with significant Cr for iSpot but relatively low values for GiGL, we have another grid covering the London Wetland Centre (TQ2276), as well as St James's Park and Westminster Abbey in Westminster. The grid unit including the River Thames and the back part of Westminster Abbey (TQ3079) is also among the outliers with a significant contribution from iSpot. These neighbouring grids, each highly significant in one project or the other, show how relying on one platform only can give a partial picture of a phenomenon. This picture is enriched by the aggregation of platforms and the geostatistical analysis.

Another outlier, on closer inspection, does not seem well suited to environmental recording. It is the grid TQ3281, which covers the Barbican Centre, parts of St Paul's Cathedral's gardens and a highly urbanised area of the City of London up to Leadenhall Market. Looking at both the statistics and the original geospatial data, this grid has 384 observations, mainly coming from the iSpot dataset⁸⁴. The iSpot dataset contributed with 56, but several of them are misplaced, since their locational descriptions refer to places in Kent⁸⁵, South Gloucestershire⁸⁶, Devon⁸⁷, and other areas of London outside this grid unit. We can conclude that the geolocalisation of these observations is wrong and that they do not relate to this grid unit. This shows how much the motivation to collect, and ultimately to manipulate, geographic information matters when that information is not the main aim of the data collection.

In Figure 7-8 we have the scatterplot for the relation between iRecord and iSpot. Apart from the most evident iSpot outliers mentioned above, we can focus here on the five grid units where the contributor ratios of the two projects are not too dissimilar, that is, where iSpot Cr > 0.5 and iRecord Cr > 0.5.

⁸³ http://www.wwt.org.uk/wetland-centres/london/

⁸⁴ Judging from some of the comments, some observations of LNHS 2010 appear to have been misplaced, especially ones from Kent. Some observations have comments about the surrounding environment and report a woodland which does not exist in this part of London.

⁸⁵ 16 observations with location_name set as Somerfield Rd, Maidstone

⁸⁶ 1 observation with location_name set as M4, Bradley Stoke


Figure 7-8: iRecord and iSpot contributor ratio scatterplot

Those areas include parts of the royal parks (Regent's Park and Hyde Park) and surrounding areas, Hampstead Heath, and the Stoke Newington area around Abney Park Cemetery, where observations are quite widespread and not confined to the open spaces.
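The selection rule applied to the iRecord and iSpot scatterplot (iSpot Cr > 0.5 and iRecord Cr > 0.5) can be sketched as a simple filter over the grid units shared by the two datasets. The grid identifiers, the ratio values and the function name below are hypothetical; only the 0.5 threshold comes from the text.

```python
# Hypothetical contributor ratios per grid unit (illustrative values only).
ispot_cr = {"G1": 0.62, "G2": 0.15, "G3": 0.71, "G4": 0.90}
irecord_cr = {"G1": 0.80, "G2": 0.66, "G3": 0.55, "G4": 0.20}

def both_above(cr_a, cr_b, threshold=0.5):
    """Return the grid units whose ratio exceeds the threshold in both datasets."""
    common = set(cr_a) & set(cr_b)
    return sorted(g for g in common if cr_a[g] > threshold and cr_b[g] > threshold)

selected = both_above(ispot_cr, irecord_cr)
```

With these made-up values only the units that are high in both datasets are retained; units that are high in one dataset and low in the other remain candidates for the outlier inspection instead.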

Another noticeable outlier covers a part of the Lea Valley where the Walthamstow Marshes meet Springfield Park: here, most of the observations are along the River Lea. The only grid south of the River Thames here overlaps with Sydenham Hill Wood and Dulwich Wood (TQ3472). iRecord has a particular spatial aggregation of grids covering the southernmost section of the Borough of Camden (TQ2882, TQ2883), which includes part of Regent's Park, but also the grid unit TQ2982, which includes UCL's main quad. Here an individual who contributed to the creation of some iRecord spatial patterns in terms of volunteer productivity in Orpington Borough also made a noticeable contribution.

In Figure 7-9, we have the scatterplot diagram that compares the contributor ratios of GiGL and iRecord. The mild correlation between the two data collections is visible here, while the outliers have already been described in the sections above. A grid unit not mentioned earlier covers the area of Islington, including all of King's Cross station and the mostly residential and office areas in the Pentonville area. Here several GiGL observations were made in everyday environments such as walls and pavements, together with some recordings erroneously geolocated in this grid but actually recorded in a small nature reserve behind King's Cross Station⁸⁸.

The analysis of the data at the individual cell level helped us to stress some interesting outcomes. We noted that 9 of the 10 cells with the highest number of records and all the 10 cells with the highest number of volunteers contained open water (‘bluespace’), suggesting that such sites are particularly popular with recorders.

⁸⁸ Camley Street Natural Park (http://www.wildlondon.org.uk/reserves/camley-street-natural-park) more correctly falls inside the grid unit TQ2983


7.5 Spatial statistics

This section is devoted to the identification of environmental spaces that are statistically and spatially relevant. Grid units that, individually or in spatial aggregations, have values in stark contrast with those of surrounding grids were identified using the total number of contributors to all citizen science projects.

The relation between projects shows that the three platforms we used differ so much in number of observations and number of volunteers that simply adding up the contributors and then studying their distribution will equate, most of the time, to analysing the distribution of GiGL volunteers, as shown in Figure 7-10 below.

Figure 7-10: Proportion of contributors amongst Citizen Science organisations

It is noteworthy that the green colour associated with GiGL overshadows all other organisations across the vast majority of Greater London. While summing up all contributors can therefore be seen as creating a bias that favours GiGL, we must keep in mind that GiGL data is a collection of 176 surveys run by several heterogeneous organisations, and that iRecord is a more coordinated family of 29 surveys built around technological affordances.

Since we want to characterise the overall phenomenon of citizen science as a cultural practice that helps us to detect environmental spaces, we will sum up all observations collected by our three organisations, bearing in mind that in doing so we are summing up 176 GiGL surveys, 29 iRecord surveys and iSpot. A study comparable to the other case studies analysed in this work might be run considering all 206 surveys individually.


The map of the total number of volunteers per grid unit is shown in Figure 7-11. Some grid units clearly exceed the values registered in neighbouring grid units, and some spatial trends are also evident.

Figure 7-11: Total contributors to citizen science biological recordings per BNG grid unit

We ran several tests to find the most appropriate methodological arrangement to stress the evident spatial clusters that can be intuitively seen in Figure 7-11. In terms of the size of the analytical window, the most effective distance range proved to be 10 km. The most significant Moran’s I global spatial autocorrelation index, calculated over the distribution of total contributors for every grid unit using a linear decay of influence⁸⁹ (I = 0.034380, z = 17.756409, p < 0.0001), reveals that there is far less than a 1% probability that the autocorrelation has been generated by a random distribution. While previously the contributor ratio was functional to the detection of correlations, here the number of contributors is preferred, since every individual contributing is considered regardless of the project to which they contributed⁹⁰.
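To make the global index more concrete, the following is a minimal sketch of Moran's I with inverse-distance weights limited to a distance band. It is not the ArcGIS implementation used in the study: the function names, the toy 4 x 4 grid and the 1 km band are assumptions made for illustration, and the z-score and p value are not computed.

```python
from math import hypot

def inverse_distance_weights(points, band):
    """w[i][j] = 1/d for distinct points no further apart than `band`, else 0.

    Assumes no two points share the same coordinates.
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
                if d <= band:
                    w[i][j] = 1.0 / d
    return w

def morans_i(values, w):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)

# Hypothetical 4 x 4 block of 1 km grid units, identified by centroid (x, y) in km.
centroids = [(x, y) for y in range(4) for x in range(4)]
weights = inverse_distance_weights(centroids, band=1.0)

clustered = [10 if x < 2 else 0 for x, y in centroids]  # high values side by side
alternating = [(x + y) % 2 for x, y in centroids]       # checkerboard pattern

i_clustered = morans_i(clustered, weights)      # positive: similar values are neighbours
i_alternating = morans_i(alternating, weights)  # negative: neighbours always differ
```

In this sketch the `band` argument plays the role of the analytical window; widening it brings more distant grid units into each neighbourhood, with smaller weights.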

7.5.1 Spatial trends: clusters and outliers

To detect spatially statistically significant grid units, such as spatial clusters and outliers, this study used Moran’s I local indicator of spatial association (LISA; Anselin 1995) as implemented in the ArcGIS 10.2.2 spatial statistics tools. When analysing the distribution of the number of contributors, the computation highlighted 120 grid units that constitute, in clusters or individually, a statistically relevant spatial unit. In Figure 7-12 they are plotted on the Greater London map, against all green and main blue spaces⁹¹.

⁸⁹ Inverse distance

⁹⁰ The sum of contributor ratios from the three data holders provided even better indicators of autocorrelation, but that number was considered a false positive for the reasons explained above (we have more than 200 surveys and not just three communities).

The grids with high values, in clusters or in isolation, define 44 areas of different sizes. The five largest are all in areas along the rivers Thames or Lea. Several other clusters and hotspots are in areas which include parks, but several large green areas do not emerge as significant high-value clusters. There are also three grid units that are markedly poorer in contributions to citizen science than their neighbouring areas. In the following section, we identify and characterise some of the areas found through the spatial statistical analysis.
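The distinction between clusters and outliers can be illustrated with a minimal sketch of the local Moran statistic using row-standardised binary weights. The neighbour lists, the values and the function name are hypothetical, and the significance test that a full LISA analysis applies to each unit is omitted, so this only shows how the quadrant labels arise.

```python
def local_morans(values, neighbours):
    """Return (local_i, label) per unit using row-standardised binary weights.

    Labels: HH / LL are clusters of similar values; HL is a high value
    among low neighbours and LH a low value among high neighbours (outliers).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n  # variance of the values
    results = []
    for i in range(n):
        # Spatial lag: mean deviation of the neighbouring units.
        lag = sum(z[j] for j in neighbours[i]) / len(neighbours[i])
        local_i = z[i] * lag / m2
        label = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((local_i, label))
    return results

# Hypothetical row of eight grid units; each unit neighbours the adjacent ones.
contributors = [9, 10, 9, 1, 2, 1, 2, 14]
adjacent = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(contributors)]
            for i in range(len(contributors))}

lisa = local_morans(contributors, adjacent)
labels = [label for _, label in lisa]
```

In this toy row the first units form a high-value cluster, the middle units a low-value cluster, and the last unit is a high value isolated among low neighbours, which gives it a negative local statistic.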

Figure 7-12: Citizen Science clusters and hotspots

7.5.2 Visual inspection: sites of environmental

In document Leveraging the value of crowdsourced geographic information to detect cultural ecosystem services (Page 190-197)