::p_load(sf,spdep,tmap,tidyverse,knitr) pacman
Hands-on Exercise 2B: Global and Local Measures of Spatial Autocorrelations
1 Overview
In this hands-on exercise, we will learn how to compute Global and Local Measure of Spatial Autocorrelation (GLSA) using the spdep package.
2 The Analytical Question
In spatial policy, one of the main development objective of the local government and planners is to ensure equal distribution of development in the province. Our task in this study, hence, is to apply appropriate spatial statistical methods to discover if development are even distributed geographically. If the answer is No. Then, our next question will be “is there sign of spatial clustering?”. And, if the answer for this question is yes, then our next question will be “where are these clusters?”
In this case study, we are interested to examine the spatial pattern of a selected development indicator (i.e. GDP per capita) of Hunan Province, People’s Republic of China.
3 Getting Started
3.1 Packages
First, we will import the relevant packages that we will be using for this hands-on exercise.
We will be using the following packages:
sf: to import and handle geospatial data,
tidyverse: to handle and wrangle attribute data,
knitr: to generate tables for matrices,
spdep: to compute spatial weights, global and local spatial autocorrelation statistics; and
tmap: to prepare and plot cartographic quality chropleth map.
3.2 Importing Data
The datasets used in this hands-on exercise are:
Hunan county boundary layer:
a geospatial data set in ESRI shapefile formatHunan_2012.csv
: an aspatial data set in csv format. It contains selected Hunan’s local development indicators in 2012.
The datasets from this exercise were provided as part of the coursework and downloaded from the student learning portal.
3.2.1 Geospatial Data
First, we will use st_read()
of sf package to import Hunan county boundary layer
(a shapefile) into R.
<- st_read(dsn = "data/geospatial", layer = "Hunan") hunan
Reading layer `Hunan' from data source
`C:\sihuihui\ISSS624\Hands-on_Ex\Hands-on_Ex2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 88 features and 7 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 108.7831 ymin: 24.6342 xmax: 114.2544 ymax: 30.12812
Geodetic CRS: WGS 84
glimpse(hunan)
Rows: 88
Columns: 8
$ NAME_2 <chr> "Changde", "Changde", "Changde", "Changde", "Changde", "Cha…
$ ID_3 <int> 21098, 21100, 21101, 21102, 21103, 21104, 21109, 21110, 211…
$ NAME_3 <chr> "Anxiang", "Hanshou", "Jinshi", "Li", "Linli", "Shimen", "L…
$ ENGTYPE_3 <chr> "County", "County", "County City", "County", "County", "Cou…
$ Shape_Leng <dbl> 1.869074, 2.360691, 1.425620, 3.474325, 2.289506, 4.171918,…
$ Shape_Area <dbl> 0.10056190, 0.19978745, 0.05302413, 0.18908121, 0.11450357,…
$ County <chr> "Anxiang", "Hanshou", "Jinshi", "Li", "Linli", "Shimen", "L…
$ geometry <POLYGON [°]> POLYGON ((112.0625 29.75523..., POLYGON ((112.2288 …
From the output, we know that hunan
is a polygon sf dataframe with 88 features and 7 fields. It also uses a WGS84 geometric coordinates system.
3.2.2 Aspatial Data
We will import Hunan_2012.csv
into R using read_csv()
of readr package.
<- read_csv("data/aspatial/Hunan_2012.csv") hunan2012
glimpse(hunan2012)
Rows: 88
Columns: 29
$ County <chr> "Anhua", "Anren", "Anxiang", "Baojing", "Chaling", "Changn…
$ City <chr> "Yiyang", "Chenzhou", "Changde", "Hunan West", "Zhuzhou", …
$ avg_wage <dbl> 30544, 28058, 31935, 30843, 31251, 28518, 54540, 28597, 33…
$ deposite <dbl> 10967.0, 4598.9, 5517.2, 2250.0, 8241.4, 10860.0, 24332.0,…
$ FAI <dbl> 6831.7, 6386.1, 3541.0, 1005.4, 6508.4, 7920.0, 33624.0, 1…
$ Gov_Rev <dbl> 456.72, 220.57, 243.64, 192.59, 620.19, 769.86, 5350.00, 1…
$ Gov_Exp <dbl> 2703.0, 1454.7, 1779.5, 1379.1, 1947.0, 2631.6, 7885.5, 11…
$ GDP <dbl> 13225.0, 4941.2, 12482.0, 4087.9, 11585.0, 19886.0, 88009.…
$ GDPPC <dbl> 14567, 12761, 23667, 14563, 20078, 24418, 88656, 10132, 17…
$ GIO <dbl> 9276.90, 4189.20, 5108.90, 3623.50, 9157.70, 37392.00, 513…
$ Loan <dbl> 3954.90, 2555.30, 2806.90, 1253.70, 4287.40, 4242.80, 4053…
$ NIPCR <dbl> 3528.3, 3271.8, 7693.7, 4191.3, 3887.7, 9528.0, 17070.0, 3…
$ Bed <dbl> 2718, 970, 1931, 927, 1449, 3605, 3310, 582, 2170, 2179, 1…
$ Emp <dbl> 494.310, 290.820, 336.390, 195.170, 330.290, 548.610, 670.…
$ EmpR <dbl> 441.4, 255.4, 270.5, 145.6, 299.0, 415.1, 452.0, 127.6, 21…
$ EmpRT <dbl> 338.0, 99.4, 205.9, 116.4, 154.0, 273.7, 219.4, 94.4, 174.…
$ Pri_Stu <dbl> 54.175, 33.171, 19.584, 19.249, 33.906, 81.831, 59.151, 18…
$ Sec_Stu <dbl> 32.830, 17.505, 17.819, 11.831, 20.548, 44.485, 39.685, 7.…
$ Household <dbl> 290.4, 104.6, 148.1, 73.2, 148.7, 211.2, 300.3, 76.1, 139.…
$ Household_R <dbl> 234.5, 121.9, 135.4, 69.9, 139.4, 211.7, 248.4, 59.6, 110.…
$ NOIP <dbl> 101, 34, 53, 18, 106, 115, 214, 17, 55, 70, 44, 84, 74, 17…
$ Pop_R <dbl> 670.3, 243.2, 346.0, 184.1, 301.6, 448.2, 475.1, 189.6, 31…
$ RSCG <dbl> 5760.60, 2386.40, 3957.90, 768.04, 4009.50, 5220.40, 22604…
$ Pop_T <dbl> 910.8, 388.7, 528.3, 281.3, 578.4, 816.3, 998.6, 256.7, 45…
$ Agri <dbl> 4942.253, 2357.764, 4524.410, 1118.561, 3793.550, 6430.782…
$ Service <dbl> 5414.5, 3814.1, 14100.0, 541.8, 5444.0, 13074.6, 17726.6, …
$ Disp_Inc <dbl> 12373, 16072, 16610, 13455, 20461, 20868, 183252, 12379, 1…
$ RORP <dbl> 0.7359464, 0.6256753, 0.6549309, 0.6544614, 0.5214385, 0.5…
$ ROREmp <dbl> 0.8929619, 0.8782065, 0.8041262, 0.7460163, 0.9052651, 0.7…
3.3 Performing Relational Join
We will update the attribute table of hunan
’s spatial polygons dataframe with the attribute fields of hunan2012
dataframe using the left_join()
of dplyr package.
<- left_join(hunan, hunan2012,
hunan_joined by="County")
kable(head(hunan_joined))
NAME_2 | ID_3 | NAME_3 | ENGTYPE_3 | Shape_Leng | Shape_Area | County | City | avg_wage | deposite | FAI | Gov_Rev | Gov_Exp | GDP | GDPPC | GIO | Loan | NIPCR | Bed | Emp | EmpR | EmpRT | Pri_Stu | Sec_Stu | Household | Household_R | NOIP | Pop_R | RSCG | Pop_T | Agri | Service | Disp_Inc | RORP | ROREmp | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Changde | 21098 | Anxiang | County | 1.869074 | 0.1005619 | Anxiang | Changde | 31935 | 5517.2 | 3541.0 | 243.64 | 1779.5 | 12482.0 | 23667 | 5108.9 | 2806.9 | 7693.7 | 1931 | 336.39 | 270.5 | 205.9 | 19.584 | 17.819 | 148.1 | 135.4 | 53 | 346.0 | 3957.9 | 528.3 | 4524.41 | 14100 | 16610 | 0.6549309 | 0.8041262 | POLYGON ((112.0625 29.75523… |
Changde | 21100 | Hanshou | County | 2.360691 | 0.1997875 | Hanshou | Changde | 32265 | 7979.0 | 8665.0 | 386.13 | 2062.4 | 15788.0 | 20981 | 13491.0 | 4550.0 | 8269.9 | 2560 | 456.78 | 388.8 | 246.7 | 42.097 | 33.029 | 240.2 | 208.7 | 95 | 553.2 | 4460.5 | 804.6 | 6545.35 | 17727 | 18925 | 0.6875466 | 0.8511756 | POLYGON ((112.2288 29.11684… |
Changde | 21101 | Jinshi | County City | 1.425620 | 0.0530241 | Jinshi | Changde | 28692 | 4581.7 | 4777.0 | 373.31 | 1148.4 | 8706.9 | 34592 | 10935.0 | 2242.0 | 8169.9 | 848 | 122.78 | 82.1 | 61.7 | 8.723 | 7.592 | 81.9 | 43.7 | 77 | 92.4 | 3683.0 | 251.8 | 2562.46 | 7525 | 19498 | 0.3669579 | 0.6686757 | POLYGON ((111.8927 29.6013,… |
Changde | 21102 | Li | County | 3.474324 | 0.1890812 | Li | Changde | 32541 | 13487.0 | 16066.0 | 709.61 | 2459.5 | 20322.0 | 24473 | 18402.0 | 6748.0 | 8377.0 | 2038 | 513.44 | 426.8 | 227.1 | 38.975 | 33.938 | 268.5 | 256.0 | 96 | 539.7 | 7110.2 | 832.5 | 7562.34 | 53160 | 18985 | 0.6482883 | 0.8312558 | POLYGON ((111.3731 29.94649… |
Changde | 21103 | Linli | County | 2.289506 | 0.1145036 | Linli | Changde | 32667 | 564.1 | 7781.2 | 336.86 | 1538.7 | 10355.0 | 25554 | 8214.0 | 358.0 | 8143.1 | 1440 | 307.36 | 272.2 | 100.8 | 23.286 | 18.943 | 129.1 | 157.2 | 99 | 246.6 | 3604.9 | 409.3 | 3583.91 | 7031 | 18604 | 0.6024921 | 0.8856065 | POLYGON ((111.6324 29.76288… |
Changde | 21104 | Shimen | County | 4.171918 | 0.3719471 | Shimen | Changde | 33261 | 8334.4 | 10531.0 | 548.33 | 2178.8 | 16293.0 | 27137 | 17795.0 | 6026.5 | 6156.0 | 2502 | 392.05 | 329.6 | 193.8 | 29.245 | 26.104 | 190.6 | 184.7 | 122 | 399.2 | 6490.7 | 600.5 | 5266.51 | 6981 | 19275 | 0.6647794 | 0.8407091 | POLYGON ((110.8825 30.11675… |
As we intend to only show the distribution of Gross Domestic Product Per Capita (GDPPC), we can drop some of the columns that we will not be using by selecting the columns that we want using select()
.
<- hunan_joined %>%
hunans select(c(1:4, 7, 15))
kable(head(hunans))
NAME_2 | ID_3 | NAME_3 | ENGTYPE_3 | County | GDPPC | geometry |
---|---|---|---|---|---|---|
Changde | 21098 | Anxiang | County | Anxiang | 23667 | POLYGON ((112.0625 29.75523… |
Changde | 21100 | Hanshou | County | Hanshou | 20981 | POLYGON ((112.2288 29.11684… |
Changde | 21101 | Jinshi | County City | Jinshi | 34592 | POLYGON ((111.8927 29.6013,… |
Changde | 21102 | Li | County | Li | 24473 | POLYGON ((111.3731 29.94649… |
Changde | 21103 | Linli | County | Linli | 25554 | POLYGON ((111.6324 29.76288… |
Changde | 21104 | Shimen | County | Shimen | 27137 | POLYGON ((110.8825 30.11675… |
4 Visualising Regional Development Indicator
We will show the distribution of Gross Domestic Product per Capita (GDPPC) 2012 using qtm()
of tmap package using the following code chunk.
<- tm_shape(hunans) +
equal tm_fill("GDPPC",
n = 5,
style = "equal") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Equal interval classification")
<- tm_shape(hunans) +
quantile tm_fill("GDPPC",
n = 5,
style = "quantile") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Equal quantile classification")
tmap_arrange(equal,
quantile, asp=1,
ncol=2)
5 Global Spatial Autocorrelation
In this section, we will learn how to compute global spatial autocorrelation statistics and perform spatial complete randomness test for global spatial autocorrelation.
5.1 Computing Contiguity Spatial Weights
Before we can compute the global spatial autocorrelation statistics, we need to construct spatial weights of the study area. Spatial weights is used to define the neighbourhood relationships between the geographical units (i.e., county) in the study area.
To learn more about computing spatial weights, please refer to Hands-on Exercise 2a.
In the code chunk below, poly2nb()
of spdep package is used to compute contiguity weight matrices for the study area. This function builds a neighbours list based on regions with contiguous boundaries.
As mentioned in Hands-on Exercise 2a, poly2nb()
’s default argument for Queen
is queen=TRUE
, meaning that the function computes Queen contiguity by default. If we want to compute Rook contiguity, we need to set queen=FALSE
.
We use the following code chunk to compute Queen contiguity weight matrix.
<- poly2nb(hunans, queen = TRUE)
wm_q summary(wm_q)
Neighbour list object:
Number of regions: 88
Number of nonzero links: 448
Percentage nonzero weights: 5.785124
Average number of links: 5.090909
Link number distribution:
1 2 3 4 5 6 7 8 9 11
2 2 12 16 24 14 11 4 2 1
2 least connected regions:
30 65 with 1 link
1 most connected region:
85 with 11 links
The summary report above shows that there are 88 regions in Hunan. The most connected region has 11 neighbours and there are 2 regions with only 1 neighbours. On average, each region has 5.090909 neighbours.
5.2 Row-standardised Weight Matrix
Next, we need to assign weights to each neighbouring polygon. We will assign each neighbouring polygon with equal weight (style="W"
). This is accomplished by assigning the fraction 1/(total number of neighbours) to each neighbouring county then summing the weighted income values.
While assigning each neighbouring polygon with the same weight is most intuitive way to summarise the neighbours’ values, polygons which are situated along the edges of the map will base their lagged values on fewer polygons (due to the nature of their positions on the map). This could cause potential over- or under- estimation of the true nature of the spatial autocorrelation in the data.
For the purpose of this hands-on exercise, we will use the style="W"
option for simplicity sake.
The nb2listw()
function can take in the following styles:
B is the basic binary coding
W is row standardised (sums over all links to n)
C is globally standardised (sums over all links to n)
U is equal to C divided by the number of neighbours (sums over all links to unity)
S is the variance-stabilizing coding scheme proposed by Tiefelsdorf et al. 1999
minmax is based on Kelejian and Prucha (2010), and divides the weights by the minimum of the maximum row sums and maximum column sums of the input weights. It is similar to the C and U styles.
<- nb2listw(wm_q, style="W", zero.policy = TRUE)
rswm_q rswm_q
Characteristics of weights list object:
Neighbour list object:
Number of regions: 88
Number of nonzero links: 448
Percentage nonzero weights: 5.785124
Average number of links: 5.090909
Weights style: W
Weights constants summary:
n nn S0 S1 S2
W 88 7744 88 37.86334 365.9147
The zero.policy=TRUE
option allows for lists of non-neighbors. This should be used with caution since the user may not be aware of missing neighbors in their dataset however, a zero.policy = FALSE
would return an error.
5.3 Global Spatial Autocorrelation: Moran’s I
In this section, we will learn how to perform Moran’s I statistics testing using moran.test()
of spdep package.
Moran’s test for spatial autocorrelation using a spatial weights matrix in weights list form. The assumptions underlying the test are sensitive to the form of the graph of neighbour relationships and other factors, and results may be checked against those of moran.mc
permutations.
5.3.1 Moran’s I test
The code chunk below performs Moran’s I statistical testing using moran.test()
of spdep package.
moran.test(hunans$GDPPC,
listw = rswm_q,
zero.policy = TRUE,
alternative = "greater",
na.action = na.omit)
Moran I test under randomisation
data: hunans$GDPPC
weights: rswm_q
Moran I statistic standard deviate = 4.7351, p-value = 1.095e-06
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.300749970 -0.011494253 0.004348351
The Moran I test is a measure of spatial autocorrelation, which assesses whether the observed pattern in the spatial distribution of a variable is different from what would be expected under spatial randomness.
From the above outcome, we see that the Moran I statistic is 0.30075, and its standard deviation is 4.7351. The p-value is 1.095e-06, which is very small. Assuming that our chosen significance level is 0.05, we would reject the null hypothesis since p-value < 0.05, suggesting strong evidence against the null hypothesis of spatial randomness.
The alternative hypothesis is “greater”, indicating that we are testing if there is a positive spatial autocorrelation (i.e., similar values are close to each other). We can specify the alternative hypothesis using alternative =
. The default value is "greater"
, but it can be changed to "less"
or "two.sided"
.
Hence, the results suggests that there is a significant positive spatial autocorrelation in the variable GDPPC and the observed spatial pattern is not likely to have occurred by random chance.
5.3.2 Computing Monte Carlo Moran’s I
If we doubt that the assumptions of Moran’s I are true (normality and randomisation), we can use a Monte Carlo simulation. The purpose of the Monte Carlo simulation is to estimate the significance of the Moran I statistic through random permutations. We will: - Simulate Moran’s I n times under the assumption of no spatial pattern, - Assign all regions the mean value, - Calculate Moran’s I, - Compare the actual value of Moran’s I to randomly simulated distribution to obtain p-value (pseudo significance).
The code chunk below performs permutation test for Moran’s I statistic using moran.mc()
of spdep package. A total of 1000 simulation will be performed.
set.seed(1234)
= moran.mc(hunans$GDPPC,
bperm listw = rswm_q,
nsim = 999,
zero.policy = TRUE,
na.action = na.omit)
The simulation was run with 999 permutations (nsim = 999
) plus the observed statistic, making a total of 1000 simulations.
The Monte Carlo simulation supports the earlier findings from the Moran I test. The small p-value (0.001) indicates that the observe spatial pattern in the variable GDPPC is unlikely due to random chance and there is a strong evidence of positive spatial autocorrelation.
5.3.3 Visualising Monte Carlo Moran’s I
It is a good practice for us the examine the simulated Moran’s I test statistics in greater detail. This can be achieved by plotting the distribution of the statistical values as a histogram by using the code chunk below.
mean(bperm$res[1:999])
[1] -0.01504572
var(bperm$res[1:999])
[1] 0.004371574
summary(bperm$res[1:999])
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.18339 -0.06168 -0.02125 -0.01505 0.02611 0.27593
We will use hist() and abline() of R Graphics to plot the histogram.
hist(bperm$res,
freq=TRUE,
breaks = 20,
xlab = "Simulated Moran's I")
abline(v=0,
col="red")
The observed statistic from Monte Carlo Moran’s I Simulation is 0.300749970, which falls way to the right of the histogram distribution suggesting that GDPPC values are clustered (a positive Moran’s I value suggests clustering while a negative Moran’sI value suggests dispersion).
5.4 Global Spatial Autocorrelation: Geary’s C
In this section, we will learn how to perform Geary’s c statistics testing by using appropriate functions of spdep package.
5.4.1 Geary’s C test
The code chunk below performs Geary’s C test for spatial autocorrelation by using geary.test()
of spdep package.
geary.test(hunans$GDPPC, listw = rswm_q)
Geary C test under randomisation
data: hunans$GDPPC
weights: rswm_q
Geary C statistic standard deviate = 3.6108, p-value = 0.0001526
alternative hypothesis: Expectation greater than statistic
sample estimates:
Geary C statistic Expectation Variance
0.6907223 1.0000000 0.0073364
The p-value associated with the Geary C test is 0.0001526, which is very small and less than 0.05. This means that we can reject the null hypothesis of spatial randomness.
The alternative hypothesis’s default value is “Expectation greater than statistic”, indicating that we are testing whether the expected value of Geary’s C is greater than the observed statistic. This suggests positive spatial autocorrelation.
The Geary’s C test result suggests that there is significant positive spatial autocorrelation in the variable GDPPC in the hunans
dataset based on the specified spatial weights matrix. The observed spatial pattern is not likely to have occurred by random chance.
5.4.2 Computing Monte Carlo Geary’s C
The code chunk below performs permutation test for Geary’s C statistuc using geary.mc()
of spdep package.
set.seed(1234)
= geary.mc(hunans$GDPPC,
bperm_g listw = rswm_q,
nsim = 999)
bperm
Monte-Carlo simulation of Moran I
data: hunans$GDPPC
weights: rswm_q
number of simulations + 1: 1000
statistic = 0.30075, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
5.4.3 Visualising Monte Carlo Geary’s C
We will plot a histogram to reveal the distribution of the simulated values using the following code chunks.
mean(bperm_g$res[1:999])
[1] 1.004402
var(bperm_g$res[1:999])
[1] 0.007436493
summary(bperm_g$res[1:999])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.7142 0.9502 1.0052 1.0044 1.0595 1.2722
hist(bperm_g$res, freq = TRUE, breaks = 20, xlab="Simulated Geary C")
abline(v=1, col = "red")
6 Spatial Correlogram
Spatial correlograms are great to examine patterns of spatial autocorrelation in your data or model residuals. They show how correlated are pairs of spatial observations when you increase the distance (lag) between them - they are plots of some index of autocorrelation (Moran’s I or Geary’s c) against distance.Although correlograms are not as fundamental as variograms (a keystone concept of geostatistics), they are very useful as an exploratory and descriptive tool. For this purpose they actually provide richer information than variograms.
6.1 Compute Moran’s I Correlogram
We will use sp.correlogram()
of spdep package to compute a 6-lag spatial correlogram of GDPPC for Moran’s I (method = "I"
).We will then use the plot() function of base Graph to plot the output.
<- sp.correlogram(wm_q, hunans$GDPPC,
MI_corr order = 6, method = "I",
style = "W")
plot(MI_corr)
Plotting the output might not allow us to provide complete interpretation because not all autocorrelation values are statistically significant. Hence, it is important for us to examine the full analysis report by printing out the analysis results as in the code chunk below.
print(MI_corr)
Spatial correlogram for hunans$GDPPC
method: Moran's I
estimate expectation variance standard deviate Pr(I) two sided
1 (88) 0.3007500 -0.0114943 0.0043484 4.7351 2.189e-06 ***
2 (88) 0.2060084 -0.0114943 0.0020962 4.7505 2.029e-06 ***
3 (88) 0.0668273 -0.0114943 0.0014602 2.0496 0.040400 *
4 (88) 0.0299470 -0.0114943 0.0011717 1.2107 0.226015
5 (88) -0.1530471 -0.0114943 0.0012440 -4.0134 5.984e-05 ***
6 (88) -0.1187070 -0.0114943 0.0016791 -2.6164 0.008886 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
6.2 Compute Geary’s C Correlogram and Plot
We will use sp.correlogram()
of spdep package to compute a 6-lag spatial correlogram of GDPPC for Geary’s C (method = "C"
).We will then use the plot() function of base Graph to plot the output.
<- sp.correlogram(wm_q,
GC_corr $GDPPC,
hunansorder = 6,
method = "C",
style = "W")
plot(GC_corr)
Similar to the previous step, we will print out the analysis report by using the code chunk below.
print(GC_corr)
Spatial correlogram for hunans$GDPPC
method: Geary's C
estimate expectation variance standard deviate Pr(I) two sided
1 (88) 0.6907223 1.0000000 0.0073364 -3.6108 0.0003052 ***
2 (88) 0.7630197 1.0000000 0.0049126 -3.3811 0.0007220 ***
3 (88) 0.9397299 1.0000000 0.0049005 -0.8610 0.3892612
4 (88) 1.0098462 1.0000000 0.0039631 0.1564 0.8757128
5 (88) 1.2008204 1.0000000 0.0035568 3.3673 0.0007592 ***
6 (88) 1.0773386 1.0000000 0.0058042 1.0151 0.3100407
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
7 Cluster and Outlier Analysis
Local Indicators of Spatial Association (LISA) are statistics that evaluate the existence of clusters in the spatial arrangement of a given variable. For instance, if we are studying cancer rates among census tracts in a given city, local clusters in the rates mean that there are areas that have higher or lower rates than is to be expected by chance alone; that is, the values occurring are above or below those of a random distribution in space.
In this section, we will learn how to apply appropriate LISA, especially local Moran’I, to detect clusters and/or outliers from GDPPC of Hunan Province.
7.1 Computing Local Moran’s I
To compute local Moran’s I, the localmoran()
function of spdep will be used. localmoran()
computes li values, given a set of zi values and a listw object providing neighbouring weights information for the polygon associated with the zi values.
The code chunk below computes local Moran’s I of GDPPC at the county level.
<- order(hunans$County)
fips <- localmoran(hunans$GDPPC, rswm_q)
localMI head(localMI)
Ii E.Ii Var.Ii Z.Ii Pr(z != E(Ii))
1 -0.001468468 -2.815006e-05 4.723841e-04 -0.06626904 0.9471636
2 0.025878173 -6.061953e-04 1.016664e-02 0.26266425 0.7928094
3 -0.011987646 -5.366648e-03 1.133362e-01 -0.01966705 0.9843090
4 0.001022468 -2.404783e-07 5.105969e-06 0.45259801 0.6508382
5 0.014814881 -6.829362e-05 1.449949e-03 0.39085814 0.6959021
6 -0.038793829 -3.860263e-04 6.475559e-03 -0.47728835 0.6331568
localmoran()
function returns a matrix of values with the following columns:
li: the local Moran’s I statistics
E.li: the expectation of local moran statistic under the randomisation hypothesis
Var.li: the variance of local moran statistic under the randomisation hypothesis
Z.li: the standard deviation of local moran statistic
Pr(): the p-value of local moran statistic
We use printCoefmat()
to list the content of the local Moran matrix.
7.2 Mapping Local Moran’s I values and p-values
Before mapping the local Moran’s I map, we append the local Moran’s I dataframe (localMI
) onto the Hunan Spatial Polygon Data Frame (hunans
).
<- cbind(hunans,localMI) %>%
hunan.localMI rename(Pr.Ii = Pr.z....E.Ii..)
We then use choropleth mapping functions of tmap package to plot the local Moran’s I values using the following code chunk:
tm_shape(hunan.localMI) +
tm_fill(col= "Ii",
style = "pretty",
palette = "RdBu",
title = "Local Moran Statistics") +
tm_borders(alpha = 0.5)
The choropleth map shows that there are both positive and negative li values. Hence, we should consider the p-values for each of these values to determine their significance.
The following code chunk creates a choropleth map of Moran’s I p-values using tmap package.
tm_shape(hunan.localMI) +
tm_fill(col= "Pr.Ii",
breaks = c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette = "-Blues",
title = "Local Moran's I p-values") +
tm_borders(alpha = 0.5)
For effective interpretation, we plot both the local Moran’s I values map and its corresponding p-values map next to each other.
<- tm_shape(hunan.localMI) +
localMI.map tm_fill(col = "Ii",
style = "pretty",
title = "Local Moran Statistics") +
tm_borders(alpha = 0.5)
<- tm_shape(hunan.localMI) +
pvalue.map tm_fill(col = "Pr.Ii",
breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette="-Blues",
title = "local Moran's I p-values") +
tm_borders(alpha = 0.5)
tmap_arrange(localMI.map, pvalue.map, asp=1, ncol=2)
8 Creating a LISA Cluster Map
The LISA Cluster Map shows the significant locations color coded by type of spatial autocorrelation. Before we generate the LISA cluster map, we would need to plot the Moran Scatterplot.
###Plotting Moran Scatterplot The Moran scatterplot is an illustration of the relationship between the values of the chosen attribute at each location and the average value of the same attribute at neighboring locations.
<- moran.plot(hunans$GDPPC, rswm_q,
nci labels=as.character(hunans$County),
xlab="GDPPC 2012",
ylab="Spatially Lag GDPPC 2012")
Notice that the plot is split in 4 quadrants. The top right corner belongs to areas that have high GDPPC and are surrounded by other areas that have the average level of GDPPC.
8.1 Plotting Moran Scatterplot with Standardised Variable
We will use scale()
to center and scale the variable. Here centering is done by subtracting the mean (omitting NAs) the corresponding columns, and scaling is done by dividing the (centered) variable by their standard deviations.
$Z.GDPPC <- scale(hunans$GDPPC) %>%
hunansas.vector()
The as.vector()
added to the end if to make sure that the data type we get from this vector maps neatly onto our dataframe.
We now plot the Moran scatterplot with standardised variable using the following code chunk.
<- moran.plot(hunans$Z.GDPPC, rswm_q,
nci2 labels = as.character((hunans$County), xlab= "z-GDPPC 2012", ylab="Spatially Lag z-GDPPC 2012"))
8.2 Preparing LISA Map Classes
First we generate the quadrants of the LISA cluster map.
<-vector(mode="numeric", length=nrow(localMI)) quadrant
Next, we derive the spatially legged variable of interest (i.e., GDPPC) and centers the spatially lagged variable around its mean.
$lag_GDPPC <- lag.listw(rswm_q, hunans$GDPPC)
hunans
<- hunan$lag_GDPPC - mean(hunans$lag_GDPPC) DV
Then, we center the local Moran’s around the mean.
<- localMI[,1] - mean(localMI[,1]) LM_I
Next, we will set a statistical significance level for the local Moran.
<- 0.05 signif
We also define the low-low(1), low-high(2), high-low(3) and high-high(4) categories.
<0 & LM_I>0] <- 1
quadrant[DV >0 & LM_I<0] <- 2
quadrant[DV <0 & LM_I<0] <- 3
quadrant[DV >0 & LM_I>0] <- 4 quadrant[DV
Lastly,we place non-significant Moran in the category 0.
5]>signif] <- 0 quadrant[localMI[,
<- vector(mode="numeric",length=nrow(localMI))
quadrant $lag_GDPPC <- lag.listw(rswm_q, hunans$GDPPC)
hunans<- hunans$lag_GDPPC - mean(hunans$lag_GDPPC)
DV <- localMI[,1]
LM_I <- 0.05
signif <0 & LM_I>0] <- 1
quadrant[DV >0 & LM_I<0] <- 2
quadrant[DV <0 & LM_I<0] <- 3
quadrant[DV >0 & LM_I>0] <- 4
quadrant[DV 5]>signif] <- 0 quadrant[localMI[,
8.3 Plotting LISA Map
Now, we can build the LISA map by using the following code chunk.
$quadrant <- quadrant
hunan.localMI<- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
colors <- c("insignificant", "low-low", "low-high", "high-low", "high-high")
clusters
tm_shape(hunan.localMI) +
tm_fill(col = "quadrant",
style = "cat",
palette = colors[c(sort(unique(quadrant)))+1],
labels = clusters[c(sort(unique(quadrant)))+1],
popup.vars = c("")) +
tm_view(set.zoom.limits = c(11,17)) +
tm_borders(alpha=0.5)
For effective interpretation,we plot both the local Moran’s I values map and its corresponding p-values map next to each other.
<- qtm(hunans, "GDPPC")
gdppc
$quadrant <- quadrant
hunan.localMI<- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
colors <- c("insignificant", "low-low", "low-high", "high-low", "high-high")
clusters
<- tm_shape(hunan.localMI) +
LISAmap tm_fill(col = "quadrant",
style = "cat",
palette = colors[c(sort(unique(quadrant)))+1],
labels = clusters[c(sort(unique(quadrant)))+1],
popup.vars = c("")) +
tm_view(set.zoom.limits = c(11,17)) +
tm_borders(alpha=0.5)
tmap_arrange(gdppc, LISAmap,
asp=1, ncol=2)
We can also include the local Moran’s I map and p-value map as shown below for easy comparison.
<- qtm(hunans, "GDPPC")
gdppc
$quadrant <- quadrant
hunan.localMI<- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
colors <- c("insignificant", "low-low", "low-high", "high-low", "high-high")
clusters
<- tm_shape(hunan.localMI) +
LISAmap tm_fill(col = "quadrant",
style = "cat",
palette = colors[c(sort(unique(quadrant)))+1],
labels = clusters[c(sort(unique(quadrant)))+1],
popup.vars = c("")) +
tm_view(set.zoom.limits = c(11,17)) +
tm_borders(alpha=0.5)
<- tm_shape(hunan.localMI) +
localMI.map tm_fill(col = "Ii",
style = "pretty",
title = "Local Moran Statistics") +
tm_borders(alpha = 0.5)
<- tm_shape(hunan.localMI) +
pvalue.map tm_fill(col = "Pr.Ii",
breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette="-Blues",
title = "local Moran's I p-values") +
tm_borders(alpha = 0.5)
tmap_arrange(gdppc,LISAmap,localMI.map, pvalue.map)
9 Hot Spot and Cold Spot Area Analysis
Besides detecting cluster and outliers, localised spatial statistics can be also used to detect hot spot and/or cold spot areas.
The term ‘hot spot’ has been used generically across disciplines to describe a region or value that is higher relative to its surroundings (Lepers et al 2005, Aben et al 2012, Isobe et al 2015).
9.1 Getis and Ord’s G-Statistics
The Getis and Ord’s G-statistics (Getis and Ord, 1972; Ord and Getis, 1995) looks at neighbours within a defined proximity to identify where either high or low values clutser spatially. Here, statistically significant hot-spots are recognised as areas of high values where other areas within a neighbourhood range also share high values too.
The analysis consists of three steps:
- Deriving spatial weight matrix
- Computing Gi statistics
- Mapping Gi statistics
9.2 Deriving Distance-based Weight Matrix
First, we need to define a new set of neighbours. Whist the spatial autocorrelation considered units which shared borders, for Getis-Ord we are defining neighbours based on distance.
There are two type of distance-based proximity matrix, they are:
- fixed distance weight matrix; and
- adaptive distance weight matrix.
9.2.1 Deriving the centroid
We will need points to associate with each polygon before we can make our connectivity graph. We will need to run st_centroid()
and also use a mapping function. The mapping function applies a given function to each element of a vector and returns a vector of the same length.
To get the longtitude values, we map the st_centroid() function over the geometry column of hunan and access the longitude value through the double bracket notation [[]] and 1. This allows us to get only the longitude, which is the first value in each centroid.
<- map_dbl(hunans$geometry, ~st_centroid(.x)[[1]]) longitude
To get the latitude, we will use change the “1” in the double bracket notation to “2” since latitude is the second value in each centroid.
<- map_dbl(hunan$geometry, ~st_centroid(.x)[[2]]) latitude
Now that we have latitude and longitude, we use cbind() to put longitude and latitude into the same object.
<- cbind(longitude, latitude) coords
9.2.2 Determine the cut-off distance
First, we need to determine the upper limit for distance band using the steps below.
Return a matrix with the indices of points belonging to the set of the k nearest neighbours of each other using knearneigh() of spdep.
Convert the k-nearest neighbour object returned by knearneigh() into a neighbours list of class nb with a list of integer vectors containing neighbour region number ids using knn2nb().
Return the length of neighbour relationship edges using nbdists() of spdep. This function returns the Euclidean distances along the links in a list of the same form as the neighbours list. If longlat=TRUE, Great Circle distances are used.
Remove the list structure of the returned object using unlist().
<- knn2nb(knearneigh(coords))
k1 <- unlist(nbdists(k1, coords, longlat = TRUE))
k1dists summary(k1dists)
Min. 1st Qu. Median Mean 3rd Qu. Max.
24.79 32.57 38.01 39.07 44.52 61.79
The summary report shows that the largest first nearest neighbour distance is 61.79 km, so using this as the upper threshold gives certainty that all units will have at least one neighbour.
9.2.3 Computing the fixed distance weight matrix
We will now compute the distance weight matrix using dnearneigh()
and the following code chunk.
<- dnearneigh(coords, 0, 62, longlat = TRUE)
wm_d62 wm_d62
Neighbour list object:
Number of regions: 88
Number of nonzero links: 324
Percentage nonzero weights: 4.183884
Average number of links: 3.681818
From the above output, we know that there are 88 regions in Hunan and on average each region has 3.68 neighbours.
Next, nb2listw()
is used to convert the nb object into spatial weights object.
<- nb2listw(wm_d62, style = "B")
wm62_lw summary(wm62_lw)
Characteristics of weights list object:
Neighbour list object:
Number of regions: 88
Number of nonzero links: 324
Percentage nonzero weights: 4.183884
Average number of links: 3.681818
Link number distribution:
1 2 3 4 5 6
6 15 14 26 20 7
6 least connected regions:
6 15 30 32 56 65 with 1 link
7 most connected regions:
21 28 35 45 50 52 82 with 6 links
Weights style: B
Weights constants summary:
n nn S0 S1 S2
B 88 7744 324 648 5440
9.3 Computing adaptive distance weight matrix
One of the characteristics of fixed distance weight matrix is that more densely settled areas (usually the urban areas) tend to have more neighbours and the less densely settled areas (usually the rural counties) tend to have lesser neighbours. Having many neighbours smoothens the neighbour relationship across more neighbours.
We can control the numbers of neighbours directly using k-nearest neighbours, either accepting asymmetric neighbours or imposing symmetry as shown in the code chunk below.
<- knn2nb(knearneigh(coords, k=8))
knn8 knn8
Neighbour list object:
Number of regions: 88
Number of nonzero links: 704
Percentage nonzero weights: 9.090909
Average number of links: 8
Non-symmetric neighbours list
Next, nb2listw() is used to convert the nb object into spatial weights object.
<- nb2listw(knn8, style = 'B')
knn8_lw summary(knn8_lw)
Characteristics of weights list object:
Neighbour list object:
Number of regions: 88
Number of nonzero links: 704
Percentage nonzero weights: 9.090909
Average number of links: 8
Non-symmetric neighbours list
Link number distribution:
8
88
88 least connected regions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
88 most connected regions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 with 8 links
Weights style: B
Weights constants summary:
n nn S0 S1 S2
B 88 7744 704 1300 23014
10 Computing Gi Statistics
10.1 Gi statistics using fixed distance
<- order(hunans$County)
fips <- localG(hunans$GDPPC, wm62_lw)
gi.fixed gi.fixed
[1] 0.436075843 -0.265505650 -0.073033665 0.413017033 0.273070579
[6] -0.377510776 2.863898821 2.794350420 5.216125401 0.228236603
[11] 0.951035346 -0.536334231 0.176761556 1.195564020 -0.033020610
[16] 1.378081093 -0.585756761 -0.419680565 0.258805141 0.012056111
[21] -0.145716531 -0.027158687 -0.318615290 -0.748946051 -0.961700582
[26] -0.796851342 -1.033949773 -0.460979158 -0.885240161 -0.266671512
[31] -0.886168613 -0.855476971 -0.922143185 -1.162328599 0.735582222
[36] -0.003358489 -0.967459309 -1.259299080 -1.452256513 -1.540671121
[41] -1.395011407 -1.681505286 -1.314110709 -0.767944457 -0.192889342
[46] 2.720804542 1.809191360 -1.218469473 -0.511984469 -0.834546363
[51] -0.908179070 -1.541081516 -1.192199867 -1.075080164 -1.631075961
[56] -0.743472246 0.418842387 0.832943753 -0.710289083 -0.449718820
[61] -0.493238743 -1.083386776 0.042979051 0.008596093 0.136337469
[66] 2.203411744 2.690329952 4.453703219 -0.340842743 -0.129318589
[71] 0.737806634 -1.246912658 0.666667559 1.088613505 -0.985792573
[76] 1.233609606 -0.487196415 1.626174042 -1.060416797 0.425361422
[81] -0.837897118 -0.314565243 0.371456331 4.424392623 -0.109566928
[86] 1.364597995 -1.029658605 -0.718000620
attr(,"internals")
Gi E(Gi) V(Gi) Z(Gi) Pr(z != E(Gi))
[1,] 0.064192949 0.05747126 2.375922e-04 0.436075843 6.627817e-01
[2,] 0.042300020 0.04597701 1.917951e-04 -0.265505650 7.906200e-01
[3,] 0.044961480 0.04597701 1.933486e-04 -0.073033665 9.417793e-01
[4,] 0.039475779 0.03448276 1.461473e-04 0.413017033 6.795941e-01
[5,] 0.049767939 0.04597701 1.927263e-04 0.273070579 7.847990e-01
[6,] 0.008825335 0.01149425 4.998177e-05 -0.377510776 7.057941e-01
[7,] 0.050807266 0.02298851 9.435398e-05 2.863898821 4.184617e-03
[8,] 0.083966739 0.04597701 1.848292e-04 2.794350420 5.200409e-03
[9,] 0.115751554 0.04597701 1.789361e-04 5.216125401 1.827045e-07
[10,] 0.049115587 0.04597701 1.891013e-04 0.228236603 8.194623e-01
[11,] 0.045819180 0.03448276 1.420884e-04 0.951035346 3.415864e-01
[12,] 0.049183846 0.05747126 2.387633e-04 -0.536334231 5.917276e-01
[13,] 0.048429181 0.04597701 1.924532e-04 0.176761556 8.596957e-01
[14,] 0.034733752 0.02298851 9.651140e-05 1.195564020 2.318667e-01
[15,] 0.011262043 0.01149425 4.945294e-05 -0.033020610 9.736582e-01
[16,] 0.065131196 0.04597701 1.931870e-04 1.378081093 1.681783e-01
[17,] 0.027587075 0.03448276 1.385862e-04 -0.585756761 5.580390e-01
[18,] 0.029409313 0.03448276 1.461397e-04 -0.419680565 6.747188e-01
[19,] 0.061466754 0.05747126 2.383385e-04 0.258805141 7.957856e-01
[20,] 0.057656917 0.05747126 2.371303e-04 0.012056111 9.903808e-01
[21,] 0.066518379 0.06896552 2.820326e-04 -0.145716531 8.841452e-01
[22,] 0.045599896 0.04597701 1.928108e-04 -0.027158687 9.783332e-01
[23,] 0.030646753 0.03448276 1.449523e-04 -0.318615290 7.500183e-01
[24,] 0.035635552 0.04597701 1.906613e-04 -0.748946051 4.538897e-01
[25,] 0.032606647 0.04597701 1.932888e-04 -0.961700582 3.362000e-01
[26,] 0.035001352 0.04597701 1.897172e-04 -0.796851342 4.255374e-01
[27,] 0.012746354 0.02298851 9.812587e-05 -1.033949773 3.011596e-01
[28,] 0.061287917 0.06896552 2.773884e-04 -0.460979158 6.448136e-01
[29,] 0.014277403 0.02298851 9.683314e-05 -0.885240161 3.760271e-01
[30,] 0.009622875 0.01149425 4.924586e-05 -0.266671512 7.897221e-01
[31,] 0.014258398 0.02298851 9.705244e-05 -0.886168613 3.755267e-01
[32,] 0.005453443 0.01149425 4.986245e-05 -0.855476971 3.922871e-01
[33,] 0.043283712 0.05747126 2.367109e-04 -0.922143185 3.564539e-01
[34,] 0.020763514 0.03448276 1.393165e-04 -1.162328599 2.451020e-01
[35,] 0.081261843 0.06896552 2.794398e-04 0.735582222 4.619850e-01
[36,] 0.057419907 0.05747126 2.338437e-04 -0.003358489 9.973203e-01
[37,] 0.013497133 0.02298851 9.624821e-05 -0.967459309 3.333145e-01
[38,] 0.019289310 0.03448276 1.455643e-04 -1.259299080 2.079223e-01
[39,] 0.025996272 0.04597701 1.892938e-04 -1.452256513 1.464303e-01
[40,] 0.016092694 0.03448276 1.424776e-04 -1.540671121 1.233968e-01
[41,] 0.035952614 0.05747126 2.379439e-04 -1.395011407 1.630124e-01
[42,] 0.031690963 0.05747126 2.350604e-04 -1.681505286 9.266481e-02
[43,] 0.018750079 0.03448276 1.433314e-04 -1.314110709 1.888090e-01
[44,] 0.015449080 0.02298851 9.638666e-05 -0.767944457 4.425202e-01
[45,] 0.065760689 0.06896552 2.760533e-04 -0.192889342 8.470456e-01
[46,] 0.098966900 0.05747126 2.326002e-04 2.720804542 6.512325e-03
[47,] 0.085415780 0.05747126 2.385746e-04 1.809191360 7.042128e-02
[48,] 0.038816536 0.05747126 2.343951e-04 -1.218469473 2.230456e-01
[49,] 0.038931873 0.04597701 1.893501e-04 -0.511984469 6.086619e-01
[50,] 0.055098610 0.06896552 2.760948e-04 -0.834546363 4.039732e-01
[51,] 0.033405005 0.04597701 1.916312e-04 -0.908179070 3.637836e-01
[52,] 0.043040784 0.06896552 2.829941e-04 -1.541081516 1.232969e-01
[53,] 0.011297699 0.02298851 9.615920e-05 -1.192199867 2.331829e-01
[54,] 0.040968457 0.05747126 2.356318e-04 -1.075080164 2.823388e-01
[55,] 0.023629663 0.04597701 1.877170e-04 -1.631075961 1.028743e-01
[56,] 0.006281129 0.01149425 4.916619e-05 -0.743472246 4.571958e-01
[57,] 0.063918654 0.05747126 2.369553e-04 0.418842387 6.753313e-01
[58,] 0.070325003 0.05747126 2.381374e-04 0.832943753 4.048765e-01
[59,] 0.025947288 0.03448276 1.444058e-04 -0.710289083 4.775249e-01
[60,] 0.039752578 0.04597701 1.915656e-04 -0.449718820 6.529132e-01
[61,] 0.049934283 0.05747126 2.334965e-04 -0.493238743 6.218439e-01
[62,] 0.030964195 0.04597701 1.920248e-04 -1.083386776 2.786368e-01
[63,] 0.058129184 0.05747126 2.343319e-04 0.042979051 9.657182e-01
[64,] 0.046096514 0.04597701 1.932637e-04 0.008596093 9.931414e-01
[65,] 0.012459080 0.01149425 5.008051e-05 0.136337469 8.915545e-01
[66,] 0.091447733 0.05747126 2.377744e-04 2.203411744 2.756574e-02
[67,] 0.049575872 0.02298851 9.766513e-05 2.690329952 7.138140e-03
[68,] 0.107907212 0.04597701 1.933581e-04 4.453703219 8.440175e-06
[69,] 0.019616151 0.02298851 9.789454e-05 -0.340842743 7.332220e-01
[70,] 0.032923393 0.03448276 1.454032e-04 -0.129318589 8.971056e-01
[71,] 0.030317663 0.02298851 9.867859e-05 0.737806634 4.606320e-01
[72,] 0.019437582 0.03448276 1.455870e-04 -1.246912658 2.124295e-01
[73,] 0.055245460 0.04597701 1.932838e-04 0.666667559 5.049845e-01
[74,] 0.074278054 0.05747126 2.383538e-04 1.088613505 2.763244e-01
[75,] 0.013269580 0.02298851 9.719982e-05 -0.985792573 3.242349e-01
[76,] 0.049407829 0.03448276 1.463785e-04 1.233609606 2.173484e-01
[77,] 0.028605749 0.03448276 1.455139e-04 -0.487196415 6.261191e-01
[78,] 0.039087662 0.02298851 9.801040e-05 1.626174042 1.039126e-01
[79,] 0.031447120 0.04597701 1.877464e-04 -1.060416797 2.889550e-01
[80,] 0.064005294 0.05747126 2.359641e-04 0.425361422 6.705732e-01
[81,] 0.044606529 0.05747126 2.357330e-04 -0.837897118 4.020885e-01
[82,] 0.063700493 0.06896552 2.801427e-04 -0.314565243 7.530918e-01
[83,] 0.051142205 0.04597701 1.933560e-04 0.371456331 7.102977e-01
[84,] 0.102121112 0.04597701 1.610278e-04 4.424392623 9.671399e-06
[85,] 0.021901462 0.02298851 9.843172e-05 -0.109566928 9.127528e-01
[86,] 0.064931813 0.04597701 1.929430e-04 1.364597995 1.723794e-01
[87,] 0.031747344 0.04597701 1.909867e-04 -1.029658605 3.031703e-01
[88,] 0.015893319 0.02298851 9.765131e-05 -0.718000620 4.727569e-01
attr(,"cluster")
[1] Low Low High High High High High High High Low Low High Low Low Low
[16] High High High High Low High High Low Low High Low Low Low Low Low
[31] Low Low Low High Low Low Low Low Low Low High Low Low Low Low
[46] High High Low Low Low Low High Low Low Low Low Low High Low Low
[61] Low Low Low High High High Low High Low Low High Low High High Low
[76] High Low Low Low Low Low Low High High Low High Low Low
Levels: Low High
attr(,"gstari")
[1] FALSE
attr(,"call")
localG(x = hunans$GDPPC, listw = wm62_lw)
attr(,"class")
[1] "localG"
The output of localG()
is a vector of G or Gstar values, with attributes “gstari” set to TRUE or FALSE, “call” set to the function call, and class “localG”.
The Gi statistics is represented as a Z-score. Greater values represent a greater intensity of clustering and the direction (positive or negative) indicates high or low clusters.
Next, we will join the Gi values to their corresponding hunan sf data frame by using the code chunk below.
<- cbind(hunans, as.matrix(gi.fixed)) %>%
hunan.gi rename(gstat_fixed = as.matrix.gi.fixed.)
The code chunk above performs three tasks. 1. First, it convert the output vector (i.e. gi.fixed) into r matrix object by using as.matrix()
. 2. Next, cbind()
is used to join hunans
data frame and gi.fixed
matrix to produce a new SpatialPolygonDataFrame called hunan.gi
. Lastly, the field name of the gi values is renamed to gstat_fixed
by using rename()
.
10.2 Mapping Gi values with fixed distance weights
The code chunk below maps the Gi values derived using fixed distance weight matrix.
<- qtm(hunans, "GDPPC")
gdppc
<- tm_shape(hunan.gi) +
Gimap tm_fill(col = "gstat_fixed",
style= "pretty",
palette = "-RdBu",
title = "Local Gi") +
tm_borders(alpha = 0.5)
tmap_arrange(gdppc, Gimap, asp = 1, ncol = 2)
10.3 Gi statistics using adaptive distance
We will now compute the Gi values for GDPPC2012 using an adaptive distance weight matrix (i.e. knb_lw
).
<- order(hunans$County)
fips <- localG(hunans$GDPPC, knn8_lw)
gi.adaptive <- cbind(hunans, as.matrix(gi.adaptive)) %>%
hunan.gi rename(gstat_adaptive = as.matrix.gi.adaptive.)
10.4 Mapping Gi values with adaptive distance weights
To visualise the hot and cold spot areas, we will use tmap package to map the Gi values derived using adaptive distance weight matrix.
<- qtm(hunans, "GDPPC")
gdppc
<- tm_shape(hunan.gi) +
Gimap tm_fill(col = "gstat_adaptive",
style="pretty",
palette="-RdBu",
title = "Local Gi using adaptive distance weights") +
tm_borders(alpha = 0.5)
tmap_arrange(gdppc, Gimap, asp=1, ncol=2)