Open Data companion

Last updated: 08 July 2020

Governments and other organisations often make open data available through Web service Application Programming Interfaces or APIs. The World Bank, UK Police, and Transport for London are just a few well-known examples. This document details the steps required to request data from these different Web service APIs using R.

Several R packages1 have been developed as clients for Web service APIs. These don’t assume any knowledge of API endpoints, HTTP requests, or data formats like XML and JSON. These are really convenient but sometimes you want to break into the ‘black box’ of APIs because you want to learn more or because there isn’t an API wrapper package available.

A quick introduction to APIs

APIs or Application Programming Interfaces are a set of rules that allow one software application to interact with another either in the same location or over a network. Inputs and outputs will vary between APIs but the process is the same: a ‘request’ that follows certain programmatic rules is submitted and a ‘response’ containing content in an expected format is returned.

There are many types of API including library-based (e.g. leafletJS) and class-based (e.g. Java) but one of the most common are Web service APIs. A client (browser) submits a Hypertext Transfer Protocol (HTTP) request to a server and the server returns a response to the client. The response contains status information about the request and may also contain the requested content.

The parameters of an HTTP request are typically contained in the URL. For example, to return a map of Manchester using the Google Maps Static API we would submit the following request:

https://maps.googleapis.com/maps/api/staticmap?center=Manchester,England&zoom=13&size=600x300&maptype=roadmap

The request contains:

  1. a URL to the API endpoint (https://maps.googleapis.com/maps/api/staticmap?) and;
  2. a query containing the parameters of the request (center=Manchester,England&zoom=13&size=600x300&maptype=roadmap). In this case, we have specified the location, zoom level, size and type of map.

Web service APIs use two key HTTP verbs to enable data requests: GET and POST. A GET request is submitted within the URL with each parameter separated by an ampersand (&). A POST request is submitted in the message body which is separate from the URL. The advantage of using POST over GET requests is that there are no character limits and the request is more secure because it is not stored in the browser’s cache.

There are several types of Web service APIs (e.g. XML-RPC, JSON-RPC and SOAP) but the most popular is Representational State Transfer or REST. RESTful APIs can return output as XML, JSON, CSV and several other data formats.

Each API has documentation and specifications which determine how data can be transferred. Unfortunately, the specifications tend to be different and the documentation can be hard to follow.

An example API request

Querying a Web service API typically involves the following steps:

  1. submit the request
  2. check for any server error
  3. parse the response
  4. convert to a data frame

In the following example we will submit a request for police reported crime data from the UK Police API. The API uses both HTTP GET and POST requests and provides content in JSON data format.

The two key R packages for submitting HTTP requests to Web service APIs and parsing the content of the response are httr and jsonlite. Let’s load them into our R session. The tidyverse package is also loaded because it contains a suite of useful functions.

library(tidyverse) ; library(httr) ; library(jsonlite)

We would like to retrieve street level crimes within a mile radius of a specific location so we need to use https://data.police.uk/api/crimes-street/all-crime? as our API endpoint (see API documentation). Rather than retrieving ‘all-crime’ lets narrow our request to retrieve only reports of burglary. This will change our path to: https://data.police.uk/api/crimes-street/burglary?

path <- "https://data.police.uk/api/crimes-street/burglary?"

Next we need to build our API request and submit it. We will use the GET function from the httr package. First we supply the path to the API endpoint and provide search parameters in the form of a list to the query argument. There are three parameters available to us:

  • lat = latitude
  • lng = longitude
  • date = and optional date in YYYY-MM format
request <- GET(url = path,
               query = list(
                 lat = 53.421813,
                 lng = -2.330251,
                 date = "2018-05")
               )

Let’s check if the API returned an error. If the request fails the API will return a non-200 status code.

request$status_code

Next we parse the content returned from the server as text using the content function.

response <- content(request, as = "text", encoding = "UTF-8")

Then we’ll parse the JSON content and and convert it to a data frame.

df <- fromJSON(response, flatten = TRUE) %>%
  data.frame()

Finally, we might strip out some of the variables and rename the remaining.

df <- select(df,
             month, category,
             location = location.street.name,
             long = location.longitude,
             lat = location.latitude)
Burglaries within 1m radius of specified location
month category location long lat
2018-05 burglary On or near Washway Road -2.328411 53.422967
2018-05 burglary On or near Glenthorn Grove -2.323696 53.418287
2018-05 burglary On or near Waterside -2.317188 53.427410
2018-05 burglary On or near Leicester Road -2.322199 53.427361

That’s it. We’ve submitted a request to the Police UK API and parsed the response into a data frame ready for use in R.


Setup

The following API requests all rely on the tidyverse, httr, and jsonlite R packages. Make sure that you have installed them.

install.packages("tidyverse", "httr", "jsonlite")


Food Standards Agency

The Food Standards Agency provide food hygiene rating data for the United Kingdom.

Example: fast food outlets in Trafford

  • HTTP verb: GET
  • API endpoint URL: http://api.ratings.food.gov.uk/Establishments
  • Selected parameters: name, address, longitude, latitude, businessTypeId, ratingKey, localAuthorityId
  • Headers: “x-api-version”, 2
  • Data format(s): JSON, XML
  • Documentation: http://api.ratings.food.gov.uk/help
# load the necessary R packages
library(tidyverse) ; library(httr) ; library(jsonlite)

# submit the request
path <- "http://api.ratings.food.gov.uk/Establishments"
request <- GET(url = path,
             query = list(
               localAuthorityId = 188,
               BusinessTypeId = 7844,
               pageNumber = 1,
               pageSize = 5000),
             add_headers("x-api-version" = "2"))

# check for any server error
# request$status_code

# parse the response and convert to a data frame
response <- content(request, as = "text", encoding = "UTF-8") %>%
  fromJSON(flatten = TRUE) %>%
  pluck("establishments") %>%
  as_tibble()

# tidy the data
df <- response %>%
  mutate_all(funs(replace(., . == '', NA))) %>%
  select(name = BusinessName,
         type = BusinessType,
         address1 = AddressLine1,
         address2 = AddressLine2,
         address3 = AddressLine3,
         address4 = AddressLine4,
         postcode = PostCode,
         long = geocode.longitude,
         lat = geocode.latitude) %>%
  unite(address, address1, address2, address3, address4, remove = TRUE, sep = ", ") %>%
  mutate(address = str_replace_all(address, "NA,", ""),
         address = str_replace_all(address, ", NA", ""),
         long = as.numeric(long),
         lat = as.numeric(lat))
Fast food outlets in Trafford
name type address postcode long lat
Aminah Tandoori Takeaway/sandwich shop 2 Peter Street, Altrincham WA14 2DS -2.352011 53.38352
Barburrito Takeaway/sandwich shop 2 The Orient, Trafford Park, Manchester M17 8EH -2.349020 53.46619
Bei Jing Takeaway/sandwich shop 79 Great Stone Road, Stretford, Manchester M32 8GR -2.286284 53.45302
Canadian Charcoal Pit Takeaway/sandwich shop 6 Church Street, Altrincham, Cheshire WA14 4DW -2.349928 53.38981
Canadian Charcoal Pit Takeaway/sandwich shop 143 Barton Road, Stretford, Manchester M32 8DN -2.314259 53.44645
Chinese Kitchen Takeaway/sandwich shop 12 Deansgate Lane, Timperley, Altrincham WA15 6SB -2.340111 53.39759


Nomis

Nomis provide labour market, benefit and census data for the United Kingdom.

Example: Claimant count in Trafford for the last 13 months

  • HTTP verb: GET and POST
  • API endpoint URL: https://www.nomisweb.co.uk/api/v01/dataset/
  • Selected parameters: date, geography, gender, age, measure and measures
  • Headers: NA
  • Data format(s): JSON, CSV, xls
  • Documentation: https://www.nomisweb.co.uk/api/v01/help
  • R package: nomisr
# load the necessary packages
library(tidyverse) ; library(httr) ; library(jsonlite)

# retrieve the name, id and available parameters for all Nomis datasets with 'claimant' as a keyword
datasets <- fromJSON("https://www.nomisweb.co.uk/api/v01/dataset/def.sdmx.json", flatten = TRUE) %>%
  map("keyfamilies") %>%
  map_df(bind_rows) %>%
  unnest(components.dimension) %>%
  select(id, name = name.value, description = description.value, parameter = conceptref) %>%
  filter(stringr::str_detect(name, regex('^.*?\\b(claimant(s)*)\\b.*?\\bage\\b.*?$', ignore_case = T))) %>%
  distinct(id, name, parameter)

# alternatively try the following link in your browser: https://www.nomisweb.co.uk/api/v01/dataset/def.htm?search=*claimant*

# retrieve the codelists for all the available parameters for the 'NM_162_1' dataset
parameters <- c("https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1/gender/def.sdmx.json",
           "https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1/age/def.sdmx.json",
           "https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1/measure/def.sdmx.json",
           "https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1/measures/def.sdmx.json") %>%
  map_df(~fromJSON(., flatten = TRUE) %>%
           as.data.frame() %>%
           unnest() %>%
           select(parameter = structure.codelists.codelist.name.value,
                  description = description.value, value))

# submit the request
# note that .data.json is appended to the path because we want data in JSON format
path <- "https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1.data.json?"
request <- GET(url = path,
             query = list(
               date = "latestMINUS12-latest",
               geography = "E08000009",
               gender = 0,
               age = 0,
               measure = 1,
               measures = 20100))

# check for any server error
# request$status_code

# parse the response and convert to a data frame
response <- content(request, as = "text", encoding = "UTF-8") %>%
  fromJSON(flatten = TRUE) %>%
  pluck("obs") %>%
  as_tibble()

# tidy the data
df <- response %>%
  mutate(date = as.Date(paste0(time.value, '-01'), format = '%Y-%m-%d')) %>%
  select(date,
         area_name = geography.description,
         area_code = geography.geogcode,
         measure = measure.description,
         n = obs_value.value)
Count of claimants in Trafford
date area_name area_code measure n
2020-05-01 Trafford E08000009 Claimant count 7785
2020-04-01 Trafford E08000009 Claimant count 6590
2020-03-01 Trafford E08000009 Claimant count 3615
2020-02-01 Trafford E08000009 Claimant count 3600
2020-01-01 Trafford E08000009 Claimant count 3415
2019-12-01 Trafford E08000009 Claimant count 3380
2019-11-01 Trafford E08000009 Claimant count 3510
2019-10-01 Trafford E08000009 Claimant count 3620
2019-09-01 Trafford E08000009 Claimant count 3470
2019-08-01 Trafford E08000009 Claimant count 3505
2019-07-01 Trafford E08000009 Claimant count 3455
2019-06-01 Trafford E08000009 Claimant count 3440
2019-05-01 Trafford E08000009 Claimant count 3385


Stat-Xplore

Stat-Xplore provides access to a wide variety of statistics published by the Department of Work and Pensions including Universal Credit, Carers Allowance and Disability Living Allowance.

Example: Housing benefit claimants in Trafford’s LSOAs in August 2018.

You will need an API key to access the Stat-Xplore API. Just create a free account and copy the API key from your Account information. For more details visit: https://stat-xplore.dwp.gov.uk/webapi/online-help/Open-Data-API.html

# load the necessary R packages
library(tidyverse) ; library(httr) ; library(jsonlite)

# add your API key
api_key <- ""

# identify the API endpoint
path <- "https://stat-xplore.dwp.gov.uk/webapi/rest/v1/table"

# build your query
query <- list(database = unbox("str:database:hb_new"),
              measures = "str:count:hb_new:V_F_HB_NEW",
              dimensions = c("str:field:hb_new:V_F_HB_NEW:COA_CODE",
                             "str:field:hb_new:F_HB_NEW_DATE:NEW_DATE_NAME") %>% matrix(),
              recodes = list(
                `str:field:hb_new:V_F_HB_NEW:COA_CODE` = list(
                  map = as.list(paste0("str:value:hb_new:V_F_HB_NEW:COA_CODE:V_C_MASTERGEOG11_LSOA_TO_MSOA:E0", seq(1006074, 1006211, 1)))),
                `str:field:hb_new:F_HB_NEW_DATE:NEW_DATE_NAME` = list(
                  map = list("str:value:hb_new:F_HB_NEW_DATE:NEW_DATE_NAME:C_HB_NEW_DATE:201808"))
              )) %>% toJSON()

# submit the API request
request <- POST(
  url = path,
  body = query,
  config = add_headers(APIKey = api_key),
  encode = "json")

# check for any server error
# request$status_code

# parse the response
response <- fromJSON(content(request, as = "text"), flatten = TRUE)

# extract list items and convert to a dataframe
dimnames <- response$fields$items %>% map(~.$labels %>% unlist)
values <- response$cubes[[1]]$values
dimnames(values) <- dimnames
df <- as.data.frame.table(values, stringsAsFactors = FALSE) %>%
  as_tibble() %>%
  set_names(c(response$fields$label,"value"))
Housing Benefit claimants at LSOA level, August 2018
National - Regional - LA - OAs Month value
Trafford 025A 201808 (Aug-18) 137
Trafford 025B 201808 (Aug-18) 36
Trafford 025C 201808 (Aug-18) 26
Trafford 027A 201808 (Aug-18) 77
Trafford 024A 201808 (Aug-18) 72
Trafford 025D 201808 (Aug-18) 36


UK Police

The data.police.uk website provides incidents of police recorded crime and anti-social behaviour in England, Wales and Northern Ireland.

Example: Robberies within the borough of Trafford

  • HTTP verb: GET or POST. However, use POST for large / high resolution polygons because of a 4094 character limit with GET requests.
  • API endpoint URL: https://data.police.uk/api/crimes-street/all-crime?
  • Selected parameters: poly, date
  • Headers: NA
  • Data format(s): JSON
  • Documentation: https/://data.police.uk/docs/
  • R package: ukpolice
# load the necessary R packages
library(tidyverse) ; library(httr) ; library(jsonlite) ; library(sf)

# download a vector boundary of Trafford
bdy <- st_read("https://opendata.arcgis.com/datasets/fab4feab211c4899b602ecfbfbc420a3_3.geojson", quiet = TRUE) %>%
  filter(lad17nm == "Trafford")

# extract the coordinates and format for inclusion in the API request parameter
coords <- bdy %>%
  st_coordinates() %>%
  as.data.frame() %>%
  select(X, Y) %>%
  unite(coords, Y, X, sep = ',') %>%
  mutate(coords = sub("$", ":", coords)) %>%
  .[["coords"]] %>%
  paste(collapse = "") %>%
  str_sub(., 1, str_length(.)-1)

# sumbit the API request
path <- "https://data.police.uk/api/crimes-street/robbery"
request <- POST(url = path,
                query = list(poly = coords, date = "2018-04"))

# check for any server error
# request$status_code

# parse the response and convert to a data frame
response <- content(request, as = "text", encoding = "UTF-8") %>%
  fromJSON(flatten = TRUE) %>%
  as_tibble()

# convert to a data frame
df <- data.frame(
  month = response$month,
  category = response$category,
  location = response$location.street.name,
  long = as.numeric(as.character(response$location.longitude)),
  lat = as.numeric(as.character(response$location.latitude)),
  stringsAsFactors = FALSE
)
Robberies in Trafford
month category location long lat
2018-04 robbery On or near New William Close -2.424867 53.41983
2018-04 robbery On or near Parking Area -2.352642 53.38522
2018-04 robbery On or near Crescent Road -2.343565 53.37457
2018-04 robbery On or near Padbury Close -2.396574 53.45163
2018-04 robbery On or near Stretford Road -2.346899 53.44621
2018-04 robbery On or near Petrol Station -2.352497 53.46765



  1. Examples include eurostat, fingertipsR, and WHO   ↩︎