Hands-on Exercise 1B: Choropleth Mapping with R

Author

Goh Si Hui

Published

November 16, 2023

Modified

November 18, 2023

1 Overview

In this hands-on exercise, I will be sharing how to plot functional and truthful choropleth maps using an R package tmap.

2 Getting Started

2.1 Packages

For this exercise, we will be using tmap, sf and tidyverse packages. The code chunk below loads tmap, sf and tidyverse packages into R environment.

pacman::p_load(tmap, sf, tidyverse)
Note

We will be using the readr, tidyr and dplyr packages which are part of the tidyverse package.

2.2 Data Acquisition

The datasets we are using are downloaded from the following sources:

3 Importing Data

3.1 Importing Geospatial Data

The code chunk below uses st_read() function of sf package to import the planning subzones, which is a polygon feature data frame.

mpsz <- st_read(dsn = "data/geospatial", layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `C:\sihuihui\ISSS624\Hands-on_Ex\Hands-on_Ex1\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21

Let’s take a look at the contents of mpsz using the following code chunk:

mpsz
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
First 10 features:
   OBJECTID SUBZONE_NO       SUBZONE_N SUBZONE_C CA_IND      PLN_AREA_N
1         1          1    MARINA SOUTH    MSSZ01      Y    MARINA SOUTH
2         2          1    PEARL'S HILL    OTSZ01      Y          OUTRAM
3         3          3       BOAT QUAY    SRSZ03      Y SINGAPORE RIVER
4         4          8  HENDERSON HILL    BMSZ08      N     BUKIT MERAH
5         5          3         REDHILL    BMSZ03      N     BUKIT MERAH
6         6          7  ALEXANDRA HILL    BMSZ07      N     BUKIT MERAH
7         7          9   BUKIT HO SWEE    BMSZ09      N     BUKIT MERAH
8         8          2     CLARKE QUAY    SRSZ02      Y SINGAPORE RIVER
9         9         13 PASIR PANJANG 1    QTSZ13      N      QUEENSTOWN
10       10          7       QUEENSWAY    QTSZ07      N      QUEENSTOWN
   PLN_AREA_C       REGION_N REGION_C          INC_CRC FMEL_UPD_D   X_ADDR
1          MS CENTRAL REGION       CR 5ED7EB253F99252E 2014-12-05 31595.84
2          OT CENTRAL REGION       CR 8C7149B9EB32EEFC 2014-12-05 28679.06
3          SR CENTRAL REGION       CR C35FEFF02B13E0E5 2014-12-05 29654.96
4          BM CENTRAL REGION       CR 3775D82C5DDBEFBD 2014-12-05 26782.83
5          BM CENTRAL REGION       CR 85D9ABEF0A40678F 2014-12-05 26201.96
6          BM CENTRAL REGION       CR 9D286521EF5E3B59 2014-12-05 25358.82
7          BM CENTRAL REGION       CR 7839A8577144EFE2 2014-12-05 27680.06
8          SR CENTRAL REGION       CR 48661DC0FBA09F7A 2014-12-05 29253.21
9          QT CENTRAL REGION       CR 1F721290C421BFAB 2014-12-05 22077.34
10         QT CENTRAL REGION       CR 3580D2AFFBEE914C 2014-12-05 24168.31
     Y_ADDR SHAPE_Leng SHAPE_Area                       geometry
1  29220.19   5267.381  1630379.3 MULTIPOLYGON (((31495.56 30...
2  29782.05   3506.107   559816.2 MULTIPOLYGON (((29092.28 30...
3  29974.66   1740.926   160807.5 MULTIPOLYGON (((29932.33 29...
4  29933.77   3313.625   595428.9 MULTIPOLYGON (((27131.28 30...
5  30005.70   2825.594   387429.4 MULTIPOLYGON (((26451.03 30...
6  29991.38   4428.913  1030378.8 MULTIPOLYGON (((25899.7 297...
7  30230.86   3275.312   551732.0 MULTIPOLYGON (((27746.95 30...
8  30222.86   2208.619   290184.7 MULTIPOLYGON (((29351.26 29...
9  29893.78   6571.323  1084792.3 MULTIPOLYGON (((20996.49 30...
10 30104.18   3454.239   631644.3 MULTIPOLYGON (((24472.11 29...

3.2 Importing Attribute Data

We will use read_csv()

popdata <- read_csv("data/aspatial/respopagesexfa2011to2020.csv")

Let’s take a look at the data imported in:

list(popdata)
[[1]]
# A tibble: 738,492 × 7
   PA         SZ                     AG     Sex     FA              Pop  Time
   <chr>      <chr>                  <chr>  <chr>   <chr>         <dbl> <dbl>
 1 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males   <= 60             0  2011
 2 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males   >60 to 80        10  2011
 3 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males   >80 to 100       30  2011
 4 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males   >100 to 120      80  2011
 5 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males   >120             20  2011
 6 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Males   Not Available     0  2011
 7 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Females <= 60             0  2011
 8 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Females >60 to 80        10  2011
 9 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Females >80 to 100       40  2011
10 Ang Mo Kio Ang Mo Kio Town Centre 0_to_4 Females >100 to 120      90  2011
# ℹ 738,482 more rows
Note

We can do some exploratory data analysis after importing the data. For example, plot charts to understand more about the population data.

4 Data Preparation

Before a thematic map can be prepared, we need to prepare a data table just for year 2020 values. The data table should include the following variables:

  • PA

  • SZ

  • YOUNG: age group 0 to 4 until age group 20 to 24

  • ECONOMY ACTIVE: age group 20 to 29 up till age group 60 to 64

  • AGED: age group 65 and above

  • TOTAL: all age group

  • DEPENDENCY: the ratio between YOUNG and AGED against ECONOMY ACTIVE group

4.1 Data Wrangling

We will be using the following functions from tidyverse package:

  • pivot_wider() of tidyr package, and

  • mutate(), filter(), group_by() and select() of dplyr package

popdata2020 <- popdata %>%
  filter(Time == 2020) %>%
  group_by(PA, SZ, AG) %>%
  summarise(`POP` = sum(`Pop`)) %>%
  ungroup() %>%
  pivot_wider(names_from = AG,
              values_from = POP) %>%
  mutate(YOUNG = rowSums(.[3:6])
         + rowSums(.[12])) %>%
mutate(`ECONOMY ACTIVE` = rowSums(.[7:11])+
         rowSums(.[13:15])) %>%
mutate(`AGED` = rowSums(.[16:21])) %>%
mutate(`TOTAL` = rowSums(.[3:21])) %>%
mutate(`DEPENDENCY` = (`YOUNG` + `AGED`)/ `ECONOMY ACTIVE`) %>%
select(`PA`,`SZ`, `YOUNG`,
        `ECONOMY ACTIVE`, `AGED`,
        `TOTAL`,`DEPENDENCY`)

4.2 Joining Attribute Data and Geospatial Data

We will convert the values in PA and SZ files to uppercase because the values of PA and SZ fields are made up of upper and lower case while the SUBZONE_N and PLN_AREA_N are in uppercase.

popdata2020 <- popdata2020 %>%
  mutate_at(.vars = vars(PA, SZ),
            .funs = funs(toupper)) %>%
  filter(`ECONOMY ACTIVE` > 0)

Then, we will use left_join() of dplyr to join the geographical data and trribute table using planning subzone name (SUBZON_N and SZ) as the common identifiers.

mpsz_pop2020 <- left_join(mpsz, popdata2020,
                          by = c("SUBZONE_N" = "SZ"))
write_rds(mpsz_pop2020, "data/rds/mpszpop2020.rds")

5 Choropleth Mapping Geospatial Data Using tmap

To prepare thematic map using tmap, we can:

  • Plot a thematic map quickly using qtm().

  • Plot highly customisable thematic map by using tmap elements.

5.1 Plotting a choropleth map quickly using qtm()

The code chunk below draws a cartographic standard choropleth map.

tmap_mode("plot")
qtm(mpsz_pop2020, fill = "DEPENDENCY")

5.2 Creating a choropleth map by using tmap’s elements

In the following sub-section, we will share with you tmap functions that used to plot these elements.

5.2.1 Drawing a base map

The basic building block of tmap is tm_shape() followed by one or more layer elemments such as tm_fill() and tm_polygons().

In the code chunk below, tm_shape() is used to define the input data (i.e mpsz_pop2020) and tm_polygons() is used to draw the planning subzone polygons.

 tm_shape(mpsz_pop2020) + 
   tm_polygons()

5.2.2 Drawing a choropleth map using tm_polygons()

To draw a choropleth map showing the geographical distribution of a selected variable by planning subzone, we can assign the target variable (e.g. Dependency) to tm_polygons().

tm_shape(mpsz_pop2020) +
  tm_polygons("DEPENDENCY")

5.2.3 Drawing a choropleth map using tm_fill() and tm_border()

tm_polygons is a wraper of tm_fill() and tm_border(). tm_fill() shades the polygons using the default colour scheme and tm_borders() adds the borders of the shapefile onto the choropleth map.

tm_shape(mpsz_pop2020) + 
  tm_fill("DEPENDENCY")

tm_shape(mpsz_pop2020) + 
  tm_fill("DEPENDENCY") + 
  tm_borders(lw = 0.1, alpha = 1)

5.2.4 Putting it all together

To draw a high quality cartographic choropleth map as shown in the figure below, tmap’s drawing elements should be used.

tm_shape(mpsz_pop2020) +    
  tm_fill("DEPENDENCY",
          style = "quantile",
          palette = "Blues",
          title = "Dependency ratio") +
  tm_layout(main.title = "Distribution of Dependency Ratio by planning subzone",
            main.title.position = "center",
            main.title.size = 1.2,             
            legend.height = 0.45,             
            legend.width = 0.35,             
            frame = TRUE) +    
  tm_borders(alpha = 0.5) +   
  tm_compass(type = "8star", size = 2) +    
  tm_scale_bar() +   
  tm_grid(alpha = 0.2) +   
  tm_credits("Source: Planning Subzone boundary from Urban Redevelopment Authority (URA) \n and Population data from Department of Statistics (DOS)",              
             position = c("left", "bottom")) 

5.3 Data Classification Methods of tmap

Most choropleth maps use some methods of data classification in order to take a large number of observations and group them into data ranges or classes.

tmap provides a total ten data classification methods, namely: fixed, sd, equal, pretty (default), quantile, kmeans, hclust, bclust, fisher, and jenks.

Note

For more information on data classification for choropleth maps, you can refer here.

To define a data classification method, the style argument of tm_fill() or tm_polygons() will be used.

5.3.1 Plotting choropleth maps with built-in classification methods

The code chunk below shows a jenks data classification that used 5 classes

tm_shape(mpsz_pop2020) + 
  tm_fill("DEPENDENCY", 
          n = 5, 
          style = "jenks") + 
  tm_borders(alpha = 0.5)

The code chunk below uses the equal data classification method.

tm_shape(mpsz_pop2020) + 
  tm_fill("DEPENDENCY", 
          n = 5,
          style = "equal") +
  tm_borders(alpha = 0.5)

5.3.2 Plotting choropleth map with custom break

We can also compute our own category breaks.

First, we will compute and display the descriptive statistics of DEPENDENCY field.

summary(mpsz_pop2020$DEPENDENCY)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
 0.0000  0.7113  0.7926  0.8561  0.8786 19.0000      92 

Using the above results, we set the break points at 0.60, 0.70, 0.80, 0.90, and 0 will be the minimum while 1 will be the maximum.

We plot the choropleth map with our customised breaks using the following code chunk:

tm_shape(mpsz_pop2020) +
  tm_fill("DEPENDENCY", 
          breaks = c(0.0,0.60,0.70,0.80, 0.90, 1.00)) + 
  tm_borders(alpha = 0.5)

5.4 Colour Scheme

tmap supports colour ramps either defined by the user or a set of predefined colour ramps from the RColorBrewer package.

5.4.1 Using ColourBrewer palette

tm_shape(mpsz_pop2020) + 
  tm_fill("DEPENDENCY", 
          n = 6,
          style = "quantile", 
          palette = "Blues") +
  tm_borders(alpha = 0.5)

To revese the colour shading, add a “-” prefix under “palette”.

tm_shape(mpsz_pop2020) + 
  tm_fill("DEPENDENCY", 
          n = 6, 
          style = "quantile",
          palette = "-Greens") + 
  tm_borders(alpha = 0.5)

5.5 Map Layouts

5.5.1 Map Legend

In tmap, several legend options are available to change the placement, format and appearance of the legend.

tm_shape(mpsz_pop2020) + 
  tm_fill("DEPENDENCY", 
          style = "jenks",
          palette = "Blues", 
          legend.hist = TRUE,
          legend.is.portrait = TRUE,
          legend.hist.z = 0.1) +
  tm_layout(main.title = "Distribution of Dependency Ratio by planning subzone \n (Jenks Classification)",
            main.title.position = "center",
            main.title.size = 1,
            legend.height = 0.45,
            legend.width = 0.35,
            legend.outside = FALSE,
            legend.position = c("right", "bottom"),
            frame = FALSE) + 
  tm_borders(alpha = 0.5)

5.5.2 Map Style

tmap allows a wide variety of layout settings to be changed. They can be called by using tmap_style().

The code chunk below shows the classic style is used.

tm_shape(mpsz_pop2020) + 
  tm_fill("DEPENDENCY", 
          style = "quantile",
          palette = "-Greens") + 
  tm_borders(alpha = 0.5) + 
  tmap_style("classic")

5.5.3 Cartographic Furniture

tmap also also provides arguments to draw other map furniture such as compass, scale bar and grid lines.

In the code chunk below, tm_compass(), tm_scale_bar() and tm_grid() are used to add compass, scale bar and grid lines onto the choropleth map.

tm_shape(mpsz_pop2020) + 
  tm_fill("DEPENDENCY", style = "quantile",
          palette = "Blues", 
          title = "No. of persons") +
  tm_layout(main.title = "Distribution of Dependency Ratio by planning subzone \n (Based on Quantile)",
            main.title.position = "center",
            main.title.size = 1.2,
            legend.height = 0.45,
            legend.width = 0.35,
            frame = TRUE) + 
  tm_borders(alpha = 0.5) + 
  tm_compass(type = "8star", size = 2) + 
  tm_scale_bar(width = 0.15) + 
  tm_grid(lwd = 0.1, alpha = 0.2) + 
  tm_credits("Source: Planning Subzone boundary from Urban Redevelopment Authority (URA) \n and Population data from Department of Statistics (DOS)",
             position = c("left", "bottom"))

To return to the previous map style, use the following code chunk:

tmap_style("white")

5.6 Drawing Multiple Small Choropleth Maps

In tmap, small multiple maps (also known as facet maps) can be plotted in three ways:

  • by assigning multiple values to at least one of the asthetic arguments,

  • by defining a group-by variable in tm_facets(), and

  • by creating multiple stand-alone maps with tmap_arrange().

5.6.1 By assigning multiple values to at least one of the aesthetic arguments

tm_shape(mpsz_pop2020)+
  tm_fill(c("YOUNG", "AGED"),
          style = "equal", 
          palette = "Blues")+
  tm_layout(legend.position = c("right", "bottom")) + 
  tm_borders(alpha = 0.5) + 
  tmap_style("white")

tm_shape(mpsz_pop2020)+
  tm_polygons(c("DEPENDENCY", "AGED"),
              style = c("equal", "quantile"),
              palette = list("Blues", "Greens")) + 
  tm_layout(legend.position = c("right", "bottom"))

5.6.2 By defining a group-by variable in tm_facets()

tm_shape(mpsz_pop2020) + 
  tm_fill("DEPENDENCY", 
          style = "quantile",
          palette = "Blues",
          thres.poly = 0) + 
  tm_facets(by="REGION_N", 
            free.coords = TRUE,
            drop.shapes = TRUE) + 
  tm_layout(legend.show = FALSE,
            title.position = c("center", "center"),
            title.size = 20) + 
  tm_borders(alpha = 0.5)

5.6.3 By creating multiple stand-alone maps with tmap_arrange()

youngmap <- tm_shape(mpsz_pop2020) + 
  tm_polygons("YOUNG", 
              style = "quantile",
              palette = "Blues")

agedmap <- tm_shape(mpsz_pop2020) + 
  tm_polygons("AGED", 
              style = "quantile",
              palette = "Blues")

tmap_arrange(youngmap, agedmap, asp = 1, ncol = 2)

5.7 Mapping Spatial Object Meeting a Selection Criterion

Instead of creating small multiple choropleth map, you can also use selection funtion to map spatial objects meeting the selection criterion.

tm_shape(mpsz_pop2020[mpsz_pop2020$REGION_N == "CENTRAL REGION", ]) +
  tm_fill("DEPENDENCY", 
          style = "quantile",
          palette = "Blues",
          legend.hist = TRUE,
          legend.is.portrait = TRUE,
          legend.hist.z = 0.1)+
  tm_layout(legend.outside = TRUE,
            legend.height = 0.45,
            legend.width = 5.0,
            legend.position = c("right", "bottom"), 
            frame = FALSE) + 
  tm_borders(alpha = 0.5)