Open Data companion

Last updated: 08 July 2020

Governments and other organisations often make open data available through Web service Application Programming Interfaces or APIs. The World Bank, UK Police, and Transport for London are just a few well-known examples. This document details the steps required to request data from these different Web service APIs using R.

Several R packages¹ have been developed as clients for Web service APIs. These don’t assume any knowledge of API endpoints, HTTP requests, or data formats like XML and JSON. These are really convenient but sometimes you want to break into the ‘black box’ of APIs because you want to learn more or because there isn’t an API wrapper package available.

A quick introduction to APIs

APIs or Application Programming Interfaces are a set of rules that allow one software application to interact with another either in the same location or over a network. Inputs and outputs will vary between APIs but the process is the same: a ‘request’ that follows certain programmatic rules is submitted and a ‘response’ containing content in an expected format is returned.

There are many types of API including library-based (e.g. leafletJS) and class-based (e.g. Java) but one of the most common are Web service APIs. A client (browser) submits a Hypertext Transfer Protocol (HTTP) request to a server and the server returns a response to the client. The response contains status information about the request and may also contain the requested content.

The parameters of an HTTP request are typically contained in the URL. For example, to return a map of Manchester using the Google Maps Static API we would submit the following request:

https://maps.googleapis.com/maps/api/staticmap?center=Manchester,England&zoom=13&size=600x300&maptype=roadmap

The request contains:

a URL to the API endpoint (https://maps.googleapis.com/maps/api/staticmap?) and;
a query containing the parameters of the request (center=Manchester,England&zoom=13&size=600x300&maptype=roadmap). In this case, we have specified the location, zoom level, size and type of map.

Web service APIs use two key HTTP verbs to enable data requests: GET and POST. A GET request is submitted within the URL with each parameter separated by an ampersand (&). A POST request is submitted in the message body which is separate from the URL. The advantage of using POST over GET requests is that there are no character limits and the request is more secure because it is not stored in the browser’s cache.

There are several types of Web service APIs (e.g. XML-RPC, JSON-RPC and SOAP) but the most popular is Representational State Transfer or REST. RESTful APIs can return output as XML, JSON, CSV and several other data formats.

Each API has documentation and specifications which determine how data can be transferred. Unfortunately, the specifications tend to be different and the documentation can be hard to follow.

An example API request

Querying a Web service API typically involves the following steps:

submit the request
check for any server error
parse the response
convert to a data frame

In the following example we will submit a request for police reported crime data from the UK Police API. The API uses both HTTP GET and POST requests and provides content in JSON data format.

The two key R packages for submitting HTTP requests to Web service APIs and parsing the content of the response are httr and jsonlite. Let’s load them into our R session. The tidyverse package is also loaded because it contains a suite of useful functions.

library(tidyverse) ; library(httr) ; library(jsonlite)

We would like to retrieve street level crimes within a mile radius of a specific location so we need to use https://data.police.uk/api/crimes-street/all-crime? as our API endpoint (see API documentation). Rather than retrieving ‘all-crime’ lets narrow our request to retrieve only reports of burglary. This will change our path to: https://data.police.uk/api/crimes-street/burglary?

path <- "https://data.police.uk/api/crimes-street/burglary?"

Next we need to build our API request and submit it. We will use the GET function from the httr package. First we supply the path to the API endpoint and provide search parameters in the form of a list to the query argument. There are three parameters available to us:

lat = latitude
lng = longitude
date = and optional date in YYYY-MM format

request <- GET(url = path,
               query = list(
                 lat = 53.421813,
                 lng = -2.330251,
                 date = "2018-05")
               )

Let’s check if the API returned an error. If the request fails the API will return a non-200 status code.

request$status_code

Next we parse the content returned from the server as text using the content function.

response <- content(request, as = "text", encoding = "UTF-8")

Then we’ll parse the JSON content and and convert it to a data frame.

df <- fromJSON(response, flatten = TRUE) %>%
  data.frame()

Finally, we might strip out some of the variables and rename the remaining.

df <- select(df,
             month, category,
             location = location.street.name,
             long = location.longitude,
             lat = location.latitude)

Burglaries within 1m radius of specified location
month	category	location	long	lat
2018-05	burglary	On or near Washway Road	-2.328411	53.422967
2018-05	burglary	On or near Glenthorn Grove	-2.323696	53.418287
2018-05	burglary	On or near Waterside	-2.317188	53.427410
2018-05	burglary	On or near Leicester Road	-2.322199	53.427361

That’s it. We’ve submitted a request to the Police UK API and parsed the response into a data frame ready for use in R.

Setup

The following API requests all rely on the tidyverse, httr, and jsonlite R packages. Make sure that you have installed them.

install.packages("tidyverse", "httr", "jsonlite")

Food Standards Agency

The Food Standards Agency provide food hygiene rating data for the United Kingdom.

Example: fast food outlets in Trafford

HTTP verb: GET
API endpoint URL: http://api.ratings.food.gov.uk/Establishments
Selected parameters: name, address, longitude, latitude, businessTypeId, ratingKey, localAuthorityId
Headers: “x-api-version”, 2
Data format(s): JSON, XML
Documentation: http://api.ratings.food.gov.uk/help

# load the necessary R packages
library(tidyverse) ; library(httr) ; library(jsonlite)

# submit the request
path <- "http://api.ratings.food.gov.uk/Establishments"
request <- GET(url = path,
             query = list(
               localAuthorityId = 188,
               BusinessTypeId = 7844,
               pageNumber = 1,
               pageSize = 5000),
             add_headers("x-api-version" = "2"))

# check for any server error
# request$status_code

# parse the response and convert to a data frame
response <- content(request, as = "text", encoding = "UTF-8") %>%
  fromJSON(flatten = TRUE) %>%
  pluck("establishments") %>%
  as_tibble()

# tidy the data
df <- response %>%
  mutate_all(funs(replace(., . == '', NA))) %>%
  select(name = BusinessName,
         type = BusinessType,
         address1 = AddressLine1,
         address2 = AddressLine2,
         address3 = AddressLine3,
         address4 = AddressLine4,
         postcode = PostCode,
         long = geocode.longitude,
         lat = geocode.latitude) %>%
  unite(address, address1, address2, address3, address4, remove = TRUE, sep = ", ") %>%
  mutate(address = str_replace_all(address, "NA,", ""),
         address = str_replace_all(address, ", NA", ""),
         long = as.numeric(long),
         lat = as.numeric(lat))

Fast food outlets in Trafford
name	type	address	postcode	long	lat
Aminah Tandoori	Takeaway/sandwich shop	2 Peter Street, Altrincham	WA14 2DS	-2.352011	53.38352
Barburrito	Takeaway/sandwich shop	2 The Orient, Trafford Park, Manchester	M17 8EH	-2.349020	53.46619
Bei Jing	Takeaway/sandwich shop	79 Great Stone Road, Stretford, Manchester	M32 8GR	-2.286284	53.45302
Canadian Charcoal Pit	Takeaway/sandwich shop	6 Church Street, Altrincham, Cheshire	WA14 4DW	-2.349928	53.38981
Canadian Charcoal Pit	Takeaway/sandwich shop	143 Barton Road, Stretford, Manchester	M32 8DN	-2.314259	53.44645
Chinese Kitchen	Takeaway/sandwich shop	12 Deansgate Lane, Timperley, Altrincham	WA15 6SB	-2.340111	53.39759

Nomis

Nomis provide labour market, benefit and census data for the United Kingdom.

Example: Claimant count in Trafford for the last 13 months

HTTP verb: GET and POST
API endpoint URL: https://www.nomisweb.co.uk/api/v01/dataset/
Selected parameters: date, geography, gender, age, measure and measures
Headers: NA
Data format(s): JSON, CSV, xls
Documentation: https://www.nomisweb.co.uk/api/v01/help
R package: nomisr

# load the necessary packages
library(tidyverse) ; library(httr) ; library(jsonlite)

# retrieve the name, id and available parameters for all Nomis datasets with 'claimant' as a keyword
datasets <- fromJSON("https://www.nomisweb.co.uk/api/v01/dataset/def.sdmx.json", flatten = TRUE) %>%
  map("keyfamilies") %>%
  map_df(bind_rows) %>%
  unnest(components.dimension) %>%
  select(id, name = name.value, description = description.value, parameter = conceptref) %>%
  filter(stringr::str_detect(name, regex('^.*?\\b(claimant(s)*)\\b.*?\\bage\\b.*?$', ignore_case = T))) %>%
  distinct(id, name, parameter)

# alternatively try the following link in your browser: https://www.nomisweb.co.uk/api/v01/dataset/def.htm?search=*claimant*

# retrieve the codelists for all the available parameters for the 'NM_162_1' dataset
parameters <- c("https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1/gender/def.sdmx.json",
           "https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1/age/def.sdmx.json",
           "https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1/measure/def.sdmx.json",
           "https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1/measures/def.sdmx.json") %>%
  map_df(~fromJSON(., flatten = TRUE) %>%
           as.data.frame() %>%
           unnest() %>%
           select(parameter = structure.codelists.codelist.name.value,
                  description = description.value, value))

# submit the request
# note that .data.json is appended to the path because we want data in JSON format
path <- "https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1.data.json?"
request <- GET(url = path,
             query = list(
               date = "latestMINUS12-latest",
               geography = "E08000009",
               gender = 0,
               age = 0,
               measure = 1,
               measures = 20100))

# check for any server error
# request$status_code

# parse the response and convert to a data frame
response <- content(request, as = "text", encoding = "UTF-8") %>%
  fromJSON(flatten = TRUE) %>%
  pluck("obs") %>%
  as_tibble()

# tidy the data
df <- response %>%
  mutate(date = as.Date(paste0(time.value, '-01'), format = '%Y-%m-%d')) %>%
  select(date,
         area_name = geography.description,
         area_code = geography.geogcode,
         measure = measure.description,
         n = obs_value.value)

Count of claimants in Trafford
date	area_name	area_code	measure	n
2020-05-01	Trafford	E08000009	Claimant count	7785
2020-04-01	Trafford	E08000009	Claimant count	6590
2020-03-01	Trafford	E08000009	Claimant count	3615
2020-02-01	Trafford	E08000009	Claimant count	3600
2020-01-01	Trafford	E08000009	Claimant count	3415
2019-12-01	Trafford	E08000009	Claimant count	3380
2019-11-01	Trafford	E08000009	Claimant count	3510
2019-10-01	Trafford	E08000009	Claimant count	3620
2019-09-01	Trafford	E08000009	Claimant count	3470
2019-08-01	Trafford	E08000009	Claimant count	3505
2019-07-01	Trafford	E08000009	Claimant count	3455
2019-06-01	Trafford	E08000009	Claimant count	3440
2019-05-01	Trafford	E08000009	Claimant count	3385

Stat-Xplore

Stat-Xplore provides access to a wide variety of statistics published by the Department of Work and Pensions including Universal Credit, Carers Allowance and Disability Living Allowance.

Example: Housing benefit claimants in Trafford’s LSOAs in August 2018.

HTTP verb: POST
API endpoint URL: https://stat-xplore.dwp.gov.uk/webapi/rest/v1/table
Selected parameters: database, measures, dimensions
Headers: API key
Data format(s): JSON
Documentation: Stat-Xplore : Open Data API
R package: stat-xplore-R

You will need an API key to access the Stat-Xplore API. Just create a free account and copy the API key from your Account information. For more details visit: https://stat-xplore.dwp.gov.uk/webapi/online-help/Open-Data-API.html

# load the necessary R packages
library(tidyverse) ; library(httr) ; library(jsonlite)

# add your API key
api_key <- ""

# identify the API endpoint
path <- "https://stat-xplore.dwp.gov.uk/webapi/rest/v1/table"

# build your query
query <- list(database = unbox("str:database:hb_new"),
              measures = "str:count:hb_new:V_F_HB_NEW",
              dimensions = c("str:field:hb_new:V_F_HB_NEW:COA_CODE",
                             "str:field:hb_new:F_HB_NEW_DATE:NEW_DATE_NAME") %>% matrix(),
              recodes = list(
                `str:field:hb_new:V_F_HB_NEW:COA_CODE` = list(
                  map = as.list(paste0("str:value:hb_new:V_F_HB_NEW:COA_CODE:V_C_MASTERGEOG11_LSOA_TO_MSOA:E0", seq(1006074, 1006211, 1)))),
                `str:field:hb_new:F_HB_NEW_DATE:NEW_DATE_NAME` = list(
                  map = list("str:value:hb_new:F_HB_NEW_DATE:NEW_DATE_NAME:C_HB_NEW_DATE:201808"))
              )) %>% toJSON()

# submit the API request
request <- POST(
  url = path,
  body = query,
  config = add_headers(APIKey = api_key),
  encode = "json")

# check for any server error
# request$status_code

# parse the response
response <- fromJSON(content(request, as = "text"), flatten = TRUE)

# extract list items and convert to a dataframe
dimnames <- response$fields$items %>% map(~.$labels %>% unlist)
values <- response$cubes[[1]]$values
dimnames(values) <- dimnames
df <- as.data.frame.table(values, stringsAsFactors = FALSE) %>%
  as_tibble() %>%
  set_names(c(response$fields$label,"value"))

Housing Benefit claimants at LSOA level, August 2018
National - Regional - LA - OAs	Month	value
Trafford 025A	201808 (Aug-18)	137
Trafford 025B	201808 (Aug-18)	36
Trafford 025C	201808 (Aug-18)	26
Trafford 027A	201808 (Aug-18)	77
Trafford 024A	201808 (Aug-18)	72
Trafford 025D	201808 (Aug-18)	36

UK Police

The data.police.uk website provides incidents of police recorded crime and anti-social behaviour in England, Wales and Northern Ireland.

Example: Robberies within the borough of Trafford

HTTP verb: GET or POST. However, use POST for large / high resolution polygons because of a 4094 character limit with GET requests.
API endpoint URL: https://data.police.uk/api/crimes-street/all-crime?
Selected parameters: poly, date
Headers: NA
Data format(s): JSON
Documentation: https/://data.police.uk/docs/
R package: ukpolice

# load the necessary R packages
library(tidyverse) ; library(httr) ; library(jsonlite) ; library(sf)

# download a vector boundary of Trafford
bdy <- st_read("https://opendata.arcgis.com/datasets/fab4feab211c4899b602ecfbfbc420a3_3.geojson", quiet = TRUE) %>%
  filter(lad17nm == "Trafford")

# extract the coordinates and format for inclusion in the API request parameter
coords <- bdy %>%
  st_coordinates() %>%
  as.data.frame() %>%
  select(X, Y) %>%
  unite(coords, Y, X, sep = ',') %>%
  mutate(coords = sub("$", ":", coords)) %>%
  .[["coords"]] %>%
  paste(collapse = "") %>%
  str_sub(., 1, str_length(.)-1)

# sumbit the API request
path <- "https://data.police.uk/api/crimes-street/robbery"
request <- POST(url = path,
                query = list(poly = coords, date = "2018-04"))

# check for any server error
# request$status_code

# parse the response and convert to a data frame
response <- content(request, as = "text", encoding = "UTF-8") %>%
  fromJSON(flatten = TRUE) %>%
  as_tibble()

# convert to a data frame
df <- data.frame(
  month = response$month,
  category = response$category,
  location = response$location.street.name,
  long = as.numeric(as.character(response$location.longitude)),
  lat = as.numeric(as.character(response$location.latitude)),
  stringsAsFactors = FALSE
)

Robberies in Trafford
month	category	location	long	lat
2018-04	robbery	On or near New William Close	-2.424867	53.41983
2018-04	robbery	On or near Parking Area	-2.352642	53.38522
2018-04	robbery	On or near Crescent Road	-2.343565	53.37457
2018-04	robbery	On or near Padbury Close	-2.396574	53.45163
2018-04	robbery	On or near Stretford Road	-2.346899	53.44621
2018-04	robbery	On or near Petrol Station	-2.352497	53.46765

Examples include eurostat, fingertipsR, and WHO ↩︎