Data Translation: Python-QGIS

This part of the project was by far the most difficult. However, it was necessary in order to preserve a fine-scale analysis of the relationship between air pollution and happiness at a local level.

Why?

At this point we had three datasets with no missing data. However, the pollution data still included values for Wales, Scotland and Northern Ireland and was organised by coordinate rather than by Local Authority (as the happiness data was). At first, we considered using external libraries such as pyBNG or pyproj to translate the coordinates into the WGS84 system, which OpenStreetMap uses. However, pyBNG kept raising errors and the pyproj projections were not very accurate. When searching for software that could use the British National Grid (BNG) coordinate reference system (CRS), we came across QGIS, a free and open-source application which is widely used and well documented. This meant our data could be loaded directly using the x and y coordinates provided.

Data translation was needed to give the pollution and happiness data the same geographic attribute. The PM10 and PM2.5 data were much finer, modelled across 1x1 km squares on the British National Grid, while the happiness data was assigned an Area Code corresponding to a Local Authority boundary. We therefore aggregated the pollution data to the Local Authority level by averaging the 1x1 km squares within each Local Authority boundary, moving the modelled data from a highly precise, localised level to a much broader regional one. This provided pollution means for all ONS Local Authority boundaries. While this limited the precision of the analysis, it still lies within the scope of our project aim: to assess the regional variations in the relationship between air pollution and happiness.
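To illustrate the aggregation described above, here is a minimal Python sketch of averaging grid-square values up to an area level. It is not the QGIS workflow we used; the area codes, values and the function name summarise_by_area are invented for illustration. The minimum and maximum are kept alongside the mean to show the within-area detail that averaging hides.

```python
from collections import defaultdict
from statistics import mean

# Invented sample: (area_code, modelled PM2.5) for individual 1x1 km squares.
# The area codes and values are placeholders, not project data.
grid_squares = [
    ("E99000001", 9.0),
    ("E99000001", 11.0),
    ("E99000002", 8.0),
    ("E99000002", 8.5),
    ("E99000002", 9.0),
]


def summarise_by_area(records):
    """Group grid-square values by area code; return mean, min and max for each."""
    grouped = defaultdict(list)
    for area_code, value in records:
        grouped[area_code].append(value)
    return {
        code: {"mean": mean(values), "min": min(values), "max": max(values)}
        for code, values in grouped.items()
    }


area_summary = summarise_by_area(grid_squares)
for code, stats in sorted(area_summary.items()):
    print(code, stats)
```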

Step by Step

1. Added Local Authority Shapefile to QGIS

A shapefile (a file format which stores geographic features and their attributes) from the ONS Open Geography Portal was added to QGIS. It contained the administrative boundaries for Local Authority Districts in Great Britain as of December 2021 at full resolution, clipped to the coastline (available here). This layer, referred to below as LAD21, had latitudes, longitudes and area codes as attributes, including an “Area Code” column. Upon researching what these codes meant, we found that all codes starting with the letter “E” denote England.
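The England-only filter implied by the “E” prefix can be sketched in a few lines of Python. The codes below are invented placeholders, and the function name is_england is our own for this example; the only rule taken from the text is that codes starting with “E” denote England.

```python
def is_england(area_code: str) -> bool:
    """Return True when an area code starts with 'E' (England)."""
    return area_code.strip().upper().startswith("E")


# Placeholder codes for illustration only, not real Local Authority codes.
codes = ["E99000001", "W99000001", "S99000001", "e99000002", "N99000001"]

england_only = [code for code in codes if is_england(code)]
print(england_only)
```

Normalising whitespace and case before the check is a defensive choice in case codes arrive from a CSV export with stray spaces or lowercase letters.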

2. Added Cleaned PM2.5 Data to QGIS

The PM2.5 data was imported into QGIS as a CSV file, containing UK Grid Codes and coordinates as geographic attributes. The coordinates for each data point are the centroid of a 1 km by 1 km grid square. For each coordinate point we used a 4-segment buffer with a square end cap at a distance of 500 metres from the coordinate.
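The geometry behind this step can be sketched in plain Python: a square reaching 500 metres from a centroid in each direction rebuilds the 1 km by 1 km cell. This is an illustration of the idea rather than the QGIS buffer tool itself; the centroid coordinates and the function names square_cell and ring_area are invented for the example.

```python
def square_cell(easting, northing, half_width=500.0):
    """Return the four corners of a square centred on a BNG centroid (metres)."""
    return [
        (easting - half_width, northing - half_width),
        (easting + half_width, northing - half_width),
        (easting + half_width, northing + half_width),
        (easting - half_width, northing + half_width),
    ]


def ring_area(ring):
    """Area of a simple polygon via the shoelace formula (square metres)."""
    total = 0.0
    for i, (x1, y1) in enumerate(ring):
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical centroid of one grid square, in BNG eastings and northings.
cell = square_cell(400500.0, 300500.0)
cell_area = ring_area(cell)
print(cell, cell_area)
```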

3. Completed a Spatial Join

A spatial join was needed to link the coordinates in the pollution data to the Local Authority boundaries present in the happiness data and the LAD21 layer. By selecting the LAD21 layer, opening the Processing Toolbox and running 'Join attributes by location (summary)', each point in the PM2.5 CSV file was matched by location to the boundaries in the LAD21 shapefile. Now the geographic attribute (where the data is assigned to) is consistent across all the data; the process could be repeated with the PM10 values and the results visualised. The output was saved as a joined layer. GIS tutorials helped us to 'reproject on the fly', understand coordinate reference systems, and ultimately perform the spatial join.
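Conceptually, a join by location with a summary can be sketched as a point-in-polygon test followed by an average per boundary. The Python below is a simplified stand-in, not what QGIS runs internally: it uses a basic ray-casting test, rectangular boundaries and invented coordinates and values, and the names point_in_polygon and join_summary are our own.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    for i, (x1, y1) in enumerate(ring):
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_summary(areas, points):
    """Return ({area_code: mean value}, [points matching no area])."""
    values = {code: [] for code in areas}
    unmatched = []
    for x, y, value in points:
        for code, ring in areas.items():
            if point_in_polygon(x, y, ring):
                values[code].append(value)
                break
        else:
            unmatched.append((x, y, value))
    means = {code: sum(v) / len(v) for code, v in values.items() if v}
    return means, unmatched


# Invented boundaries (metres) and pollution points: (x, y, PM2.5).
areas = {
    "E99000001": [(0, 0), (3000, 0), (3000, 3000), (0, 3000)],
    "E99000002": [(3000, 0), (6000, 0), (6000, 3000), (3000, 3000)],
}
points = [(500, 500, 10.0), (1500, 2500, 12.0), (4500, 1500, 8.0), (9500, 500, 20.0)]

means, unmatched = join_summary(areas, points)
print(means, unmatched)
```

Points that fall outside every boundary are returned separately, which mirrors why data outside the boundary layer drops out of the joined result.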

4. Cross-Referencing 

We cross-referenced by merging layers and checking whether there were any areas with no matching data. This identified 8 locations: Boston, Oadby and Wigston, North Northamptonshire, West Northamptonshire, Gravesham, Adur, Isles of Scilly and Richmondshire. We now had a total of 300 locations for which we could compare data.
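The same kind of check can be expressed as a set comparison of area codes. This is a sketch of the idea, not the layer merge performed in QGIS; the codes and the function name cross_reference are invented for illustration.

```python
def cross_reference(happiness_codes, pollution_codes):
    """Compare two collections of area codes and report matches and gaps."""
    happiness = set(happiness_codes)
    pollution = set(pollution_codes)
    return {
        "matched": sorted(happiness & pollution),
        "only_in_happiness": sorted(happiness - pollution),
        "only_in_pollution": sorted(pollution - happiness),
    }


# Placeholder codes, not real Local Authority codes.
happiness_codes = ["E99000001", "E99000002", "E99000003"]
pollution_codes = ["E99000002", "E99000003", "E99000004"]

report = cross_reference(happiness_codes, pollution_codes)
print(report)
```

Only the codes under "matched" would be carried forward for comparison; the other two lists flag areas to investigate or exclude.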

Now that the data covered only England, we were able to produce visualisations using graduated symbology, with the classes set automatically by QGIS. This was repeated for each year, producing visually comparable maps of happiness and pollution data from 2011 to 2021.
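As a rough illustration of what an automatic classification does, the sketch below splits values into equal-width classes. Equal interval is only one of the modes QGIS offers, and it is an assumption here: the mode actually applied to our maps is not recorded above. The values and function names are invented for the example.

```python
def equal_interval_breaks(values, n_classes=5):
    """Upper bounds of n_classes equal-width classes spanning the data range."""
    lowest, highest = min(values), max(values)
    step = (highest - lowest) / n_classes
    return [lowest + step * i for i in range(1, n_classes + 1)]


def classify(value, breaks):
    """Index of the first class whose upper bound is at least the value."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    return len(breaks) - 1  # guard against floating-point overshoot


# Invented Local Authority PM2.5 means, for illustration only.
pm25_means = [6.0, 8.0, 10.0, 12.0, 16.0]

breaks = equal_interval_breaks(pm25_means, n_classes=5)
classes = [classify(value, breaks) for value in pm25_means]
print(breaks, classes)
```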
