Open Data companion

Last updated: 13 July 2018

Governments and other organisations often make open data available through Web service Application Programming Interfaces or APIs. The World Bank, UK Police, and Transport for London are just a few well-known examples. This document details the steps required to request data from these different Web service APIs using R.

Several R packages1 have been developed as clients for Web service APIs. These don’t assume any knowledge of API endpoints, HTTP requests, or data formats like XML and JSON. These are really convenient but sometimes you want to break into the ‘black box’ of APIs because you want to learn more or because there isn’t an API wrapper package available.

A quick introduction to APIs

APIs or Application Programming Interfaces are a set of rules that allow one software application to interact with another either in the same location or over a network. Inputs and outputs will vary between APIs but the process is the same: a ‘request’ that follows certain programmatic rules is submitted and a ‘response’ containing content in an expected format is returned.

There are many types of API including library-based (e.g. leafletJS) and class-based (e.g. Java) but one of the most common are Web service APIs. A client (browser) submits a Hypertext Transfer Protocol (HTTP) request to a server and the server returns a response to the client. The response contains status information about the request and may also contain the requested content.

The parameters of an HTTP request are typically contained in the URL. For example, to return a map of Manchester using the Google Maps Static API we would submit the following request:

https://maps.googleapis.com/maps/api/staticmap?center=Manchester,England&zoom=13&size=600x300&maptype=roadmap

The request contains:

  1. a URL to the API endpoint (https://maps.googleapis.com/maps/api/staticmap?) and;
  2. a query containing the parameters of the request (center=Manchester,England&zoom=13&size=600x300&maptype=roadmap). In this case, we have specified the location, zoom level, size and type of map.

Web service APIs use two key HTTP verbs to enable data requests: GET and POST. A GET request is submitted within the URL with each parameter separated by an ampersand (&). A POST request is submitted in the message body which is separate from the URL. The advantage of using POST over GET requests is that there are no character limits and the request is more secure because it is not stored in the browser’s cache.

There are several types of Web service APIs (e.g. XML-RPC, JSON-RPC and SOAP) but the most popular is Representational State Transfer or REST. RESTful APIs can return output as XML, JSON, CSV and several other data formats.

Each API has documentation and specifications which determine how data can be transferred. Unfortunately, the specifications tend to be different and the documentation can be hard to follow.

An example API request

Querying a Web service API typically involves the following steps:

  1. submit the request
  2. check for any server error
  3. parse the response
  4. convert to a data frame

In the following example we will submit a request for police reported crime data from the UK Police API. The API uses both HTTP GET and POST requests and provides content in JSON data format.

The two key R packages for submitting HTTP requests to Web service APIs and parsing the content of the response are httr and jsonlite. Let’s load them into our R session. The tidyverse package is also loaded because it contains a suite of useful functions.

library(tidyverse) ; library(httr) ; library(jsonlite)

We would like to retrieve street level crimes within a mile radius of a specific location so we need to use https://data.police.uk/api/crimes-street/all-crime? as our API endpoint (see API documentation). Rather than retrieving ‘all-crime’ lets narrow our request to retrieve only reports of burglary. This will change our path to: https://data.police.uk/api/crimes-street/burglary?

path <- "https://data.police.uk/api/crimes-street/burglary?"

Next we need to build our API request and submit it. We will use the GET function from the httr package. First we supply the path to the API endpoint and provide search parameters in the form of a list to the query argument. There are three parameters available to us:

  • lat = latitude
  • lng = longitude
  • date = and optional date in YYYY-MM format
request <- GET(url = path, 
               query = list(
                 lat = 53.421813,
                 lng = -2.330251,
                 date = "2018-05")
               )

Let’s check if the API returned an error. If the request fails the API will return a non-200 status code.

request$status_code

Next we parse the content returned from the server as text using the content function.

response <- content(request, as = "text", encoding = "UTF-8")

Then we’ll parse the JSON content and and convert it to a data frame.

df <- fromJSON(response, flatten = TRUE) %>% 
  data.frame()

Finally, we might strip out some of the variables and rename the remaining.

df <- select(df,
             month, category, 
             location = location.street.name,
             long = location.longitude,
             lat = location.latitude)
Burglaries within 1m radius of specified location
month category location long lat
2018-05 burglary On or near Priory Road -2.308988 53.427603
2018-05 burglary On or near Beaufort Avenue -2.314616 53.415363
2018-05 burglary On or near Trinity Avenue -2.307520 53.420236
2018-05 burglary On or near Hyde Grove -2.323764 53.421289

That’s it. We’ve submitted a request to the Police UK API and parsed the response into a data frame ready for use in R.


Setup

The following API requests all rely on the tidyverse, httr, and jsonlite R packages. Make sure that you have installed them.

install.packages("tidyverse", "httr", "jsonlite")


Food Standards Agency

The Food Standards Agency provide food hygiene rating data for the United Kingdom.

Example: fast food outlets in Trafford

  • HTTP verb: GET
  • API endpoint URL: http://api.ratings.food.gov.uk/Establishments
  • Selected parameters: name, address, longitude, latitude, businessTypeId, ratingKey, localAuthorityId
  • Headers: “x-api-version”, 2
  • Data format(s): JSON, XML
  • Documentation: http://api.ratings.food.gov.uk/help
# load the necessary R packages
library(tidyverse) ; library(httr) ; library(jsonlite)

# submit the request
path <- "http://api.ratings.food.gov.uk/Establishments"
request <- GET(url = path, 
             query = list(
               localAuthorityId = 188,
               BusinessTypeId = 7844,
               pageNumber = 1,
               pageSize = 5000),
             add_headers("x-api-version" = "2"))

# check for any server error
# request$status_code

# parse the response and convert to a data frame
response <- content(request, as = "text", encoding = "UTF-8") %>% 
  fromJSON(flatten = TRUE) %>% 
  pluck("establishments") %>% 
  as_tibble()

# tidy the data
df <- response %>% 
  mutate_all(funs(replace(., . == '', NA))) %>% 
  select(name = BusinessName,
         type = BusinessType,
         address1 = AddressLine1,
         address2 = AddressLine2,
         address3 = AddressLine3,
         address4 = AddressLine4,
         postcode = PostCode,
         long = geocode.longitude,
         lat = geocode.latitude) %>% 
  unite(address, address1, address2, address3, address4, remove = TRUE, sep = ", ") %>% 
  mutate(address = str_replace_all(address, "NA,", ""),
         address = str_replace_all(address, ", NA", ""),
         long = as.numeric(long),
         lat = as.numeric(lat))
Fast food outlets in Trafford
name type address postcode long lat
Amirah Balti Takeaway/sandwich shop 15 Station Road, Urmston, Manchester M41 9JG -2.352646 53.44749
Anet’s Deli Takeaway/sandwich shop 75 Cross Street, Sale M33 7HF -2.323360 53.42769
Archers Takeaway/sandwich shop 5 Navigation Road, Altrincham, Cheshire WA14 1LW -2.351075 53.39765
Barburrito Takeaway/sandwich shop 2 The Orient, Trafford Park, Manchester M17 8EH -2.349020 53.46619
Bei Jing Takeaway/sandwich shop 79 Great Stone Road, Stretford, Manchester M32 8GR -2.286284 53.45302
Canton Taste Chinese Takeaway/sandwich shop 116 Flixton Road, Urmston, Manchester M41 5AL -2.360467 53.44833


Nomis

Nomis provide labour market, benefit and census data for the United Kingdom.

Example: Claimant count in Trafford for the last 13 months

  • HTTP verb: GET and POST
  • API endpoint URL: https://www.nomisweb.co.uk/api/v01/dataset/
  • Selected parameters: date, geography, gender, age, measure and measures
  • Headers: NA
  • Data format(s): JSON, CSV, xls
  • Documentation: https://www.nomisweb.co.uk/api/v01/help
  • R package: nomisr
# load the necessary packages
library(tidyverse) ; library(httr) ; library(jsonlite)

# retrieve the name, id and available parameters for all Nomis datasets with 'claimant' as a keyword
datasets <- fromJSON("https://www.nomisweb.co.uk/api/v01/dataset/def.sdmx.json", flatten = TRUE) %>%
  map("keyfamilies") %>%
  map_df(bind_rows) %>% 
  unnest(components.dimension) %>% 
  select(id, name = name.value, description = description.value, parameter = conceptref) %>% 
  filter(stringr::str_detect(name, regex('^.*?\\b(claimant(s)*)\\b.*?\\bage\\b.*?$', ignore_case = T))) %>%
  distinct(id, name, parameter)

# alternatively try the following link in your browser: https://www.nomisweb.co.uk/api/v01/dataset/def.htm?search=*claimant*

# retrieve the codelists for all the available parameters for the 'NM_162_1' dataset
parameters <- c("https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1/gender/def.sdmx.json",
           "https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1/age/def.sdmx.json",
           "https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1/measure/def.sdmx.json",
           "https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1/measures/def.sdmx.json") %>%
  map_df(~fromJSON(., flatten = TRUE) %>% 
           as.data.frame() %>% 
           unnest() %>% 
           select(parameter = structure.codelists.codelist.name.value, 
                  description = description.value, value))

# submit the request
# note that .data.json is appended to the path because we want data in JSON format
path <- "https://www.nomisweb.co.uk/api/v01/dataset/NM_162_1.data.json?"
request <- GET(url = path, 
             query = list(
               date = "latestMINUS12-latest",
               geography = "E08000009",
               gender = 0,
               age = 0,
               measure = 1,
               measures = 20100))

# check for any server error
# request$status_code

# parse the response and convert to a data frame
response <- content(request, as = "text", encoding = "UTF-8") %>% 
  fromJSON(flatten = TRUE) %>% 
  pluck("obs") %>% 
  as_tibble()

# tidy the data
df <- response %>% 
  mutate(date = as.Date(paste0(time.value, '-01'), format = '%Y-%m-%d')) %>% 
  select(date,
         area_name = geography.description,
         area_code = geography.geogcode,
         measure = measure.description,
         n = obs_value.value)
Count of claimants in Trafford
date area_name area_code measure n
2018-05-01 Trafford E08000009 Claimant count 3015
2018-04-01 Trafford E08000009 Claimant count 3195
2018-03-01 Trafford E08000009 Claimant count 2975
2018-02-01 Trafford E08000009 Claimant count 2920
2018-01-01 Trafford E08000009 Claimant count 2675
2017-12-01 Trafford E08000009 Claimant count 2520
2017-11-01 Trafford E08000009 Claimant count 2460
2017-10-01 Trafford E08000009 Claimant count 2490
2017-09-01 Trafford E08000009 Claimant count 2405
2017-08-01 Trafford E08000009 Claimant count 2335
2017-07-01 Trafford E08000009 Claimant count 2300
2017-06-01 Trafford E08000009 Claimant count 2320
2017-05-01 Trafford E08000009 Claimant count 2375


UK Police

The data.police.uk website provides incidents of police recorded crime and anti-social behaviour in England, Wales and Northern Ireland.

Example: Robberies within the borough of Trafford

  • HTTP verb: GET or POST. However, use POST for large / high resolution polygons because of a 4094 character limit with GET requests.
  • API endpoint URL: https://data.police.uk/api/crimes-street/all-crime?
  • Selected parameters: poly, date
  • Headers: NA
  • Data format(s): JSON
  • Documentation: https/://data.police.uk/docs/
  • R package: ukpolice
# load the necessary R packages
library(tidyverse) ; library(httr) ; library(jsonlite) ; library(sf)

# download a vector boundary of Trafford
bdy <- st_read("https://opendata.arcgis.com/datasets/fab4feab211c4899b602ecfbfbc420a3_3.geojson", quiet = TRUE) %>% 
  filter(lad17nm == "Trafford")

# extract the coordinates and format for inclusion in the API request parameter
coords <- bdy %>% 
  st_coordinates() %>%
  as.data.frame() %>%
  select(X, Y) %>% 
  unite(coords, Y, X, sep = ',') %>% 
  mutate(coords = sub("$", ":", coords)) %>% 
  .[["coords"]] %>% 
  paste(collapse = "") %>% 
  str_sub(., 1, str_length(.)-1)

# sumbit the API request
path <- "https://data.police.uk/api/crimes-street/robbery"
request <- POST(url = path,
                query = list(poly = coords, date = "2018-04"))

# check for any server error
# request$status_code

# parse the response and convert to a data frame
response <- content(request, as = "text", encoding = "UTF-8") %>% 
  fromJSON(flatten = TRUE) %>% 
  as_tibble()

# convert to a data frame
df <- data.frame(
  month = response$month,
  category = response$category,
  location = response$location.street.name,
  long = as.numeric(as.character(response$location.longitude)),
  lat = as.numeric(as.character(response$location.latitude)),
  stringsAsFactors = FALSE
)
Robberies in Trafford
month category location long lat
2018-04 robbery On or near Parking Area -2.352642 53.38522
2018-04 robbery On or near School Road -2.319606 53.42471
2018-04 robbery On or near Crescent Road -2.343565 53.37457
2018-04 robbery On or near Padbury Close -2.396574 53.45163
2018-04 robbery On or near Cecil Road -2.345447 53.37487
2018-04 robbery On or near Cecil Road -2.345447 53.37487



  1. Examples include eurostat, fingertipsR, and WHO