The next step, after downloading all the data, was to compile them into one table. Because some of the data were only available as PDF files and were not organized in the same way (some reports sort the data by continent, some by HDI and some by country name), it took a long time to put everything together. I decided to organize the data in alphabetical order, as in the GIS world shapefile, so that my table would be easier to use in ArcGIS. I had to delete the countries for which I did not have all the information, such as Cuba, Serbia and Montenegro, Bosnia and Herzegovina, Zimbabwe, etc. I finally ended up with the following table for 149 different countries:
Country | Pop. (millions) | Total Freshwater Withdrawal (m3/pers/y) | Per Capita Withdrawal (m3/pers/y) | Domestic Use (m3/pers/y) | % of population without sustainable access to improved water source | HDI | GDP (billions) | GDP per capita (billions) | Infant mortality rate | Life expectancy at birth
Outliers
The only trend with an R² high enough to be worth studying was the first one (total freshwater withdrawal vs. population). To define the outliers, I used the method described in Helsel and Hirsch (p. 246): “leverage is a measure of an ‘outlier’ in the x direction.” A high-leverage point is one where $h_i > 3p/n$, where p is the number of coefficients in the model and n is the number of data points used. The idea is to check how far an individual point deviates from the regression line in the x and y directions. For deviation in the x direction, the statistic $h_i$ is computed as:
$$h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{SS_x}$$
where $SS_x$ is the sum of squares of x, that is, the sum of $(x_i - \bar{x})^2$ over all data points. For deviations in the y direction we use the standardized residual $e_{si}$. It is the actual residual $e_i$ (the observed y minus the estimated y) divided by its standard error. The estimated y can be calculated using the trendline equation; alternatively, the residual can be read from the residuals output of the regression analysis. Then
$$e_{si} = \frac{e_i}{s\sqrt{1 - h_i}}$$
where s is the standard error of estimate of the regression equation. Helsel and Hirsch describe an extreme outlier as one for which $|e_{si}| > 3$, but in order to remove only the most extreme outliers I decided to use $|e_{si}| > 6$ instead. The only one that I found was the United States of America. This means that, regarding the total freshwater withdrawal vs. the population of a country, the only country that seems to be significantly distant from the rest of the data is the USA. Indeed, Americans seem to use a lot more water per capita than the rest of the world. This may be confirmed in the following part, which takes only the domestic use into account.
Figure 6 ‐ Map of the world: % of population without sustainable access to improved water source
Figure 7 ‐ Map of Africa: % of population without sustainable access to improved water source
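The outlier screening described in the Outliers part (leverage in the x direction, standardized residual in the y direction) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the spreadsheet procedure actually used for the report: it assumes a straight-line fit (so p = 2), and the function names `leverage_and_standardized_residuals` and `flag_points` are hypothetical.

```python
import math


def leverage_and_standardized_residuals(x, y):
    """Return (h, e_s) for a simple linear regression of y on x.

    h[i]   = 1/n + (x_i - x_bar)^2 / SS_x        (leverage)
    e_s[i] = e_i / (s * sqrt(1 - h_i))           (standardized residual)
    Assumes a straight-line model, i.e. p = 2 coefficients, and data
    that are not perfectly collinear (otherwise s = 0).
    """
    n = len(x)
    p = 2  # assumption: intercept + slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least-squares trendline and its residuals
    slope = s_xy / ss_x
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Standard error of estimate of the regression
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - p))

    h = [1.0 / n + (xi - x_bar) ** 2 / ss_x for xi in x]
    e_s = [e / (s * math.sqrt(1.0 - hi)) for e, hi in zip(residuals, h)]
    return h, e_s


def flag_points(x, y, residual_cutoff=3.0):
    """Return indices of high-leverage points and of outliers in y.

    High leverage: h_i > 3p/n. Outlier: |e_si| > residual_cutoff
    (3 in Helsel and Hirsch; the report uses the stricter value 6).
    """
    n = len(x)
    p = 2  # same straight-line assumption as above
    h, e_s = leverage_and_standardized_residuals(x, y)
    high_leverage = [i for i, hi in enumerate(h) if hi > 3.0 * p / n]
    outliers = [i for i, e in enumerate(e_s) if abs(e) > residual_cutoff]
    return high_leverage, outliers
```

With the population column as `x` and the total freshwater withdrawal column as `y`, calling `flag_points(x, y, residual_cutoff=6.0)` would apply the stricter cutoff chosen above. The two checks are kept separate on purpose: a point can be far from the others in x without lying far from the regression line, and vice versa.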