I’m Ian Calvert, a data scientist here at Digital Science. Recently, we released GRID, a free database of metadata for over 50,000 institutes from around the world, and I’d like to show how you can use it to discover new things from existing datasets. For this example, I’m going to explore how university pay varies around the UK.

The Times Higher Education (THE) is a weekly magazine that reports news and issues on higher education. They published a very interesting, and quite detailed, dataset about pay in universities in the UK (Times Higher Education survey of pay). The article does a great job looking at trends and some of the interesting outliers in the data, but I’m more interested in seeing what else we can tease out when we link it up with other sources.

First of all, let’s have a look at the original data, specifically the salary distributions for academic and manual staff. Here we’ll be looking at the universities for which there is data on both academic and manual staff salaries as well as regional salary data (which unfortunately excludes Northern Ireland). The only other exclusion is the London Business School, because its average academic pay is £182k, almost three times that of any other university.

Even after removing the London Business School outlier, the range here is still quite large, with some institutes offering salaries almost double those of others. It is likely that multiple factors account for this, but it would be interesting to see whether geography plays a major part in the wage distribution.

Is there a North-South divide in university salaries?

If there’s a North-South divide, we’d expect to see the pay tend to increase as we look further south, which we could plot easily if we knew the precise location of each institute.

Thankfully, the hard work of locating all of these institutes has already been done in GRID. All we need to do here is to align our data with the GRID database to enrich it with detailed geographical information.

The THE data comes with Higher Education Statistics Agency (HESA) codes, which are unique identifiers for UK institutions. We have added these institutional identifiers to GRID, which makes this task a simple lookup. Without unambiguous identifiers, this would have required many hours of manual work, but instead it only took a moment.
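To make the lookup concrete, here is a minimal sketch of joining pay rows to GRID records on the HESA code. The field names (`hesa_code`, `grid_id`, `lat`, `lng`) and every value in the sample rows are placeholders I have made up for illustration; they are not the real column names or records from the THE survey or from GRID.

```python
# Placeholder rows: names, HESA codes, GRID IDs and figures are invented
# for illustration and do not come from the THE survey or from GRID.
pay_rows = [
    {"hesa_code": "9001", "institute": "Example University A", "academic_pay": 48000},
    {"hesa_code": "9002", "institute": "Example University B", "academic_pay": 52000},
    {"hesa_code": "9003", "institute": "Example University C", "academic_pay": 45000},
]
grid_rows = [
    {"grid_id": "grid.example.1", "hesa_code": "9001", "lat": 51.52, "lng": -0.13},
    {"grid_id": "grid.example.2", "hesa_code": "9002", "lat": 53.47, "lng": -2.24},
]


def enrich_with_grid(pay_rows, grid_rows):
    """Attach GRID location fields to each pay row by matching on HESA code."""
    # Index the GRID records once so each pay row is a single dictionary lookup.
    grid_by_hesa = {row["hesa_code"]: row for row in grid_rows}
    matched, unmatched = [], []
    for row in pay_rows:
        grid = grid_by_hesa.get(row["hesa_code"])
        if grid is None:
            unmatched.append(row)  # keep these visible rather than dropping them silently
        else:
            matched.append(
                {**row, "grid_id": grid["grid_id"], "lat": grid["lat"], "lng": grid["lng"]}
            )
    return matched, unmatched


matched, unmatched = enrich_with_grid(pay_rows, grid_rows)
```

Returning the unmatched rows separately makes it easy to check how many institutes failed to line up before going any further.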

Now, let’s compare the pay data to the latitude of each institute. Each institute is a point on this graph, arranged from south to north as you move from left to right. Hover over a point to see which institute it represents.

There is a clear spike between 51° and 52°, which unsurprisingly corresponds to London (51°30’26.0″N 0°07’39.0″W). Most areas outside London look broadly similar, with a couple of outliers in the academic staff salaries. Some institutes, like The Royal Northern College of Music (53°28’07.0″N 2°14’11.0″W) in Manchester and University of Warwick (52°22’48.3″N 1°33’43.0″W) in Coventry, offer relatively high pay, while Norwich University of the Arts (52°37’49.1″N 1°17’47.8″E) pays more modestly.
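The coordinates quoted above are written in degrees, minutes and seconds, whereas a latitude axis needs decimal degrees. As a rough sketch of that conversion (the function name `dms_to_decimal` is my own, and the pattern only handles strings formatted like the ones in this post):

```python
import re

# Matches strings such as 51°30’26.0″N; straight quotes are accepted as well.
_DMS_PATTERN = re.compile(r"""\s*(\d+)°(\d+)[’'](\d+(?:\.\d+)?)[″"]([NSEW])\s*""")


def dms_to_decimal(dms):
    """Convert a degrees/minutes/seconds string to signed decimal degrees."""
    match = _DMS_PATTERN.fullmatch(dms)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {dms!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and West are negative by convention.
    return -value if hemisphere in "SW" else value


london_lat = dms_to_decimal("51°30’26.0″N")
london_lng = dms_to_decimal("0°07’39.0″W")
```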

There doesn’t seem to be any clear trend from the South to the North.

If there’s a big difference between the pay within London and pay outside of London, it seems reasonable to wonder if that trend continues to spread out beyond the M25. This brings us to our second question:

Is university pay higher closer to London?

Let’s change our graph to look at how the pay relates to the distance to London. For a central point, we’ll follow convention and use Charing Cross.
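One common way to get the distance between two lat/long points is the haversine (great-circle) formula, sketched below. This is an illustration rather than a record of exactly how the graph was produced: the Earth radius is the usual mean-radius approximation, and for Charing Cross I have reused the London coordinates quoted earlier in this post, converted to decimal degrees.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

# Assumption: the London coordinates quoted earlier (51°30’26.0″N 0°07’39.0″W)
# stand in for Charing Cross.
CHARING_CROSS = (51.507222, -0.1275)


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lng2 - lng1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def distance_to_london_km(lat, lng):
    """Distance from a point to the assumed Charing Cross coordinates."""
    return haversine_km(lat, lng, *CHARING_CROSS)


# University of Warwick, using the coordinates quoted earlier in the post.
warwick_km = distance_to_london_km(52.380083, -1.561944)
```

For distances within the UK, the error from treating the Earth as a sphere is small compared with the spread of the data, so nothing more elaborate should be needed for a plot like this.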

Now this is looking clearer. Wages in and near London look higher, but they level off quite quickly as you move further away.

Pay seems higher very close to London.

However, average London wages are generally higher too. It’s possible the differences in university salary are explained by the general differences in pay around the country, which brings us to our third question:

Is university pay in line with regional averages?

To answer this, we’ll need some data on regional averages first. The Office for National Statistics (ONS) produces the Annual Survey of Hours and Earnings (ASHE), which gives highly detailed breakdowns of the distributions of pay around the UK. The data is broken down into Nomenclature of Territorial Units for Statistics (NUTS) regions, which is fantastic. NUTS regions are a well-defined set of areas in the EU with roughly similar population sizes (you can see the hierarchy of UK regions here, from the top-level NUTS1 regions to the more detailed NUTS3 regions), and they have unambiguous codes assigned to them. Having codes rather than names to identify areas helps with issues like Newport referring to either the city in Wales or the town in England. Since the ONS (and many other statistical bodies in the EU) release data with NUTS codes, we’ve worked out which region each city is in and added them all to GRID. Again, all the time-consuming work here has been done for us, and all we have to do is look things up between tables to do our next piece of analysis.
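The Newport problem is easy to see in a small sketch. The codes below (`UKX01`, `UKX02`) are placeholders in the style of NUTS3 codes, not the real codes for either place; the point is only that a name can map to several places while a code maps to exactly one.

```python
# Placeholder NUTS3-style codes, not the real codes for either Newport.
places = [
    {"name": "Newport", "country": "Wales", "nuts3": "UKX01"},
    {"name": "Newport", "country": "England", "nuts3": "UKX02"},
]

by_name = {}
by_code = {}
for place in places:
    # A name can collect several places...
    by_name.setdefault(place["name"], []).append(place)
    # ...whereas each code identifies exactly one.
    by_code[place["nuts3"]] = place

ambiguous_names = [name for name, matches in by_name.items() if len(matches) > 1]
```

Looking up `by_name["Newport"]` returns two candidates that someone would have to resolve by hand, while `by_code["UKX01"]` returns a single record.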

Let’s look at the difference between the NUTS3 regional median and the average university pay. These are absolute differences, so £0 represents being paid exactly the median wage for your area, £10k would be ten thousand pounds over the regional median and £-10k would be ten thousand pounds below the regional median.
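As a rough sketch of that calculation, with invented figures and placeholder region codes (none of these numbers come from the THE or ASHE data):

```python
# Invented figures and placeholder NUTS3-style codes, for illustration only.
median_by_nuts3 = {"UKX01": 30000, "UKX02": 24000}
institutes = [
    {"name": "Example University A", "nuts3": "UKX01", "average_pay": 45000},
    {"name": "Example University B", "nuts3": "UKX02", "average_pay": 14000},
    {"name": "Example University C", "nuts3": "UKX99", "average_pay": 40000},
]


def pay_vs_regional_median(institutes, median_by_nuts3):
    """Return (name, difference) pairs, where difference = average pay - regional median."""
    results = []
    for inst in institutes:
        median = median_by_nuts3.get(inst["nuts3"])
        if median is None:
            continue  # no regional figure for this area, so it cannot be compared
        results.append((inst["name"], inst["average_pay"] - median))
    return results


differences = pay_vs_regional_median(institutes, median_by_nuts3)
```

With these made-up numbers, the first institute comes out £15k above its regional median, the second £10k below, and the third is skipped because its region has no median on record.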

This graph shows an interesting feature of the dataset. Comparing academic pay with the median pay of all workers in the same area shows that academics in London are typically worse off, relative to their neighbours, than their counterparts elsewhere: London academics earn between £5K and £15K more than the regional median, whereas outside London they could expect to earn £15K to £25K more. There is a similar picture for those classed as manual workers, who earn £10K to £20K less than the regional median in London, and up to £10K less outside the capital.

Final Remarks

My exploration of the dataset has shown the following:

  1. University pay doesn’t appear to increase as you go further south.
  2. There’s a clear difference for universities in or close to London, where pay is higher.
  3. Although university pay is higher in London, it’s actually lower relative to the local median.

All this has been done without having to search Google for details on universities or trawl through university contact pages to find locations and addresses. We’ve used GRID as a source of precise lat/long locations, and as a link to join two separate datasets.

Now hopefully you’re left with some lingering questions about what would happen if we used something else instead of median salaries. If there’s a dataset linked to NUTS regions, then it’ll be easy for you to give it a try!

You could try linking the university data up with measures of disposable household income using the data here, for example.

There’s a wealth of data linked to NUTS regions.

HESA codes and NUTS regions aren’t the only identifiers in GRID. We also have identifiers from UKPRN, Crossref Open Funder Registry (formerly Fundref), ISNI, GeoNames and UCAS. With these, there will be even more datasets you can link together.

Hopefully I’ve shown you that GRID can be used both to enrich your current data and to link multiple data sources together. Why not give it a go? Let us know how you get on at @grid_ac!