
Regression Analyses

The correlation matrix suggests that, at a national scale, the relationship between happiness and PM2.5 is similar to the relationship between happiness and PM10. This prompted a further assessment of both relationships using regression techniques.
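A correlation matrix of this kind can be sketched in plain Python. The sketch below assumes hypothetical per-location values for happiness, mean PM2.5 and mean PM10 (the numbers are illustrative, not the study's data) and computes the Pearson coefficient for each pair of variables.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every pair of named columns."""
    matrix = {(name, name): 1.0 for name in columns}
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        # The matrix is symmetric, so store both orderings of the pair.
        matrix[(a, b)] = matrix[(b, a)] = r
    return matrix


# Hypothetical values for six locations (illustrative only).
data = {
    "happiness": [7.4, 7.6, 7.5, 7.3, 7.7, 7.5],
    "pm25": [10.2, 9.8, 9.1, 9.9, 10.4, 8.8],
    "pm10": [15.9, 15.0, 14.2, 15.5, 16.1, 13.9],
}

matrix = correlation_matrix(data)
print(f"happiness vs PM2.5: {matrix[('happiness', 'pm25')]:.3f}")
print(f"happiness vs PM10:  {matrix[('happiness', 'pm10')]:.3f}")
```

Comparing the two printed coefficients is the numerical counterpart of reading the two cells side by side in a correlation matrix.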

 

The regression analysis shows that the R-squared values and the gradients of the two regressions are relatively close (see the summary statistics table below).
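The two quantities being compared, the gradient and R-squared, both come from an ordinary least-squares fit of happiness on a pollutant measure. A minimal sketch, assuming a simple linear model with one predictor and hypothetical per-location means (the helper name `linear_fit` and all values are illustrative, not the study's code or data):

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + gradient * x.

    Returns (gradient, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((b - (intercept + gradient * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r_squared = 1 - ss_res / ss_tot
    return gradient, intercept, r_squared


# Hypothetical per-location means (illustrative only, not the study's data).
happiness = [7.4, 7.6, 7.5, 7.3, 7.7, 7.5]
pm25 = [10.2, 9.8, 9.1, 9.9, 10.4, 8.8]
pm10 = [15.9, 15.0, 14.2, 15.5, 16.1, 13.9]

# Fit the same model once per pollutant so the gradients and R-squared values can be compared.
for name, pollutant in (("PM2.5", pm25), ("PM10", pm10)):
    gradient, intercept, r2 = linear_fit(pollutant, happiness)
    print(f"{name}: gradient={gradient:.4f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```

Note that the gradients are in happiness units per unit of pollutant concentration, so they are only directly comparable when PM2.5 and PM10 are measured in the same units (for example, µg/m³).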

 

We might therefore expect PM2.5 and PM10 air pollution to have a similar effect on happiness. However, the low R-squared values show that air pollution accounts for only a small proportion of the variation in happiness, so the relationship here is weak. This evidence may therefore not be sufficient to draw a firm conclusion.
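The reading of a low R-squared can be made concrete with its definition. For a simple linear regression of happiness $y_i$ on a pollutant measure $x_i$ with fitted values $\hat{y}_i$, R-squared is the share of the variance in happiness accounted for by the fitted line. The value 0.04 in the last line is purely illustrative and is not taken from the summary statistics table.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad \hat{y}_i = \beta_0 + \beta_1 x_i

R^2 = r_{xy}^2 \quad \text{(one predictor with an intercept)}

R^2 = 0.04 \;\Rightarrow\; |r_{xy}| = 0.2, \qquad 1 - R^2 = 0.96
```

In other words, a regression with a clearly non-zero gradient $\beta_1$ can still leave almost all of the variation in happiness unexplained, which is why similar gradients for PM2.5 and PM10 do not on their own establish a strong link.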

A slideshow showing the summary statistics and regression plot for mean PM2.5 and mean PM10 across all of England from 2011 to 2021.

A slideshow showing the summary statistics and regression plots for mean PM2.5 and mean PM10 across the North and South of England from 2011 to 2021.

At first glance, the plot shows an overall increase in happiness levels nationwide between 2011 and 2018. By 2019, the data is clustered at lower happiness levels than in previous years, and in 2020 the cluster is lower again. This suggests a decline in happiness in England between 2019 and 2020; the further fall in 2020 would be consistent with the onset of the Covid-19 pandemic. In 2021 the data clusters around higher happiness levels, which implies a recovery from the 2019-2020 decline.

 

Some tentative observations can be made about differences between the North and South of England. For 2019-2021, the cluster of data representing the South (green) sits higher overall than that of the North, in contrast to the relative parity of the two regions in, for instance, 2015. This may suggest that happiness fell further, and recovered less, in the North than in the South over this period. However, the distribution of data is very similar between the two regions. While the spread of the South's data appears greater, this may simply reflect the larger number of locations in the dataset that belong to the South, since a larger sample is more likely to include extreme values.
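One way to make these regional comparisons less dependent on visual impression is to summarise each region and year separately, reporting the number of locations alongside the mean and standard deviation so that differences in group size are visible. This is a minimal sketch, assuming hypothetical records of the form (region, year, happiness score); the helper name `summarise` and every value are illustrative, not the study's data.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (region, year, mean happiness score for one location).
records = [
    ("North", 2015, 7.50), ("North", 2015, 7.40), ("North", 2015, 7.60),
    ("South", 2015, 7.50), ("South", 2015, 7.40), ("South", 2015, 7.60), ("South", 2015, 7.50),
    ("North", 2020, 7.10), ("North", 2020, 7.20), ("North", 2020, 7.00),
    ("South", 2020, 7.30), ("South", 2020, 7.40), ("South", 2020, 7.20), ("South", 2020, 7.50),
]


def summarise(rows):
    """Group happiness scores by (region, year) and return n, mean and SD per group."""
    groups = defaultdict(list)
    for region, year, score in rows:
        groups[(region, year)].append(score)
    summary = {}
    for key, scores in sorted(groups.items()):
        # stdev needs at least two values; report None for single-location groups.
        sd = stdev(scores) if len(scores) > 1 else None
        summary[key] = {"n": len(scores), "mean": mean(scores), "sd": sd}
    return summary


summary = summarise(records)
for (region, year), stats in summary.items():
    sd = "n/a" if stats["sd"] is None else f"{stats['sd']:.3f}"
    print(f"{region} {year}: n={stats['n']}, mean={stats['mean']:.3f}, sd={sd}")

# Gap between regional means in each year (South minus North).
for year in sorted({year for _, year, _ in records}):
    gap = summary[("South", year)]["mean"] - summary[("North", year)]["mean"]
    print(f"{year}: South - North = {gap:+.3f}")
```

The sample standard deviation does not systematically grow with the number of locations, whereas the visible range of a scatter plot tends to, so comparing the SD column with the `n` column is a simple check on whether the South's wider apparent spread is mostly a sample-size effect.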
