Have you ever wondered about how different or similar you are to your neighbours? Do you think any similarities are just a chance occurrence, or might there be a pattern to how people settle in neighbourhoods? Using spatial autocorrelation analysis with census data, questions like this can be answered with statistical significance attached the resulting observations.
Census data is collected every 5 years, in May, with 2016 the most recent, and the next one in 2021, and is collected and maintained by Stats Canada. [1]. This data gives regular insights into population estimation and distribution of income, housing types, eduction level, marital status, and a variety of other socio-cultural-economic variables, sampled from the Canadian public, for over 100 years. The questions were first asked in personal interview, then by paper questionaire, and is now possible by digital options, with the variety, type, and language tone of the questions fine-tuned over time in consultation with the public.
After getting our data, and performing some cleaning operations to make the data available for use, and choosing the first variable to analyze: Household Total Median Income (MdInc), we will be performing both a global and local Moran's I test to determine if any variation between each individual neighbouring polygons is dependent on the distance between them, and where they are all situated - is there any spatial autocorrelation?
Though census data contains important descriptive and quantitative statistics about the Canadian population over time, any patterns observed from the numbers will only be meaningful if appropriate statistical analysis is performed, to ensure the results are statistically significant. If we were interested in how variable the socioeconomic patterns in the Victoria Capital Regional District (CRD) are, one useful technique would be to apply Spatial Autocorrelation analysis to determine how different neighbours are from each other, which would be one way to determine if Tobler's law of geography, which states that closer things are generally more similar to each other than to things further away can be statistically proven.
This R Tutorial will describe both how to use the open-source code R statistical platform to perform a spatial autocorrelation analysis with one variable, and with a few changes, how it can be reproducible for multiple census variables, one-at-a-time, as well as produce maps and tables as outputs. One of the benefits of R is the potential for creating reproducible results. Note for Variable II: After each of the following code segments, a note will be added to identify the code segments needed to be changed as necessary for a different variable, with an example of the output available in clickable unfolding 'accordion' section as noted. Note that the code must be run through in its entirety for one variable, before changing to another variable, as unless some key variable names are changed, the same variable names get reused for a new variable.
Set the working directory to the location the data files are located, finding this by copying the computer path of the shape file location. This is how R will know where they are when the R code needs to run some data. This is also where your output tables, graphs and map image files will be saved.
dir <- "/Working/Data"
setwd(dir)
getwd()
In order to use R to do spatial autocorrelation analysis, the following packages are needed to run the code. Each package is a tool that performs different functions, and contains a bundle of open-source shareable code, documentation, and data. Links are provided to the package documentation if you'd like more information.
If any, or all, of these R packages have not been previously installed, the function install.packages("packageName") is used to install each separate package. This step only needs to be done once on a computer, though packages may need updating over time, as due to the nature of open source coding, fixes and new tools are regularly updates. If more than one package needs to be installed, this can be done with the concatenation function c() to combine all the packages into a list. Even though packages are already on your computer, they may need updating, and R may prompt with a message to restart R prior to the install - restarting may, or may not be necessary.
install.packages(c("plyr", "dplyr", "spdep", "GISTools", "raster", "maptools", "rgdal", "tmap", "BAMMtools", "shinyjs"))
After the R packages have been installed, their Libraries need to be called in order for the R code to access the package-specific functions. The libraries will need to be recalled for this code if the R console has been closed.
library("plyr")
library("dplyr")
library("spdep")
library("GISTools")
library("raster")
library("maptools")
library("rgdal")
library("tmap")
library("BAMMtools")
library("shinyjs")
Download the most recent census shape and data files from Statistics Canada (2016) [1], and save all the data files in the computer's working directory, as specified in section II.
Read in the data from the census tract boundaries shapefile, using the rgdal package - the "." is shorthand for the working directory accessed using dsn = . When using the layer = to identify the shapefile, don't include the .shp file extension. Save the data in an object named tracts, which is a spatial polygon dataframe.
tracts <- readOGR(dsn = ".", layer = "Vic_Census")
Type the name of the new data object variable tracts in the R console to produce results shown in the following console results table, including the type of data this object holds, and the number of polygon features, which is 78.
tracts
## class : SpatialPolygonsDataFrame
## features : 78
## extent : 3903547, 3974156, 1910542, 1962451 (xmin, xmax, ymin, ymax)
## crs : +proj=lcc +lat_1=49 +lat_2=77 +lat_0=63.390675 +lon_0=-91.86666666666666 +x_0=6200000 +y_0=3000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## variables : 2
## names : CTUID, GUID
## min values : 9350001.00, 9350001
## max values : 9350180.06, 9350180.06
The data from the 2016 census data .csv file is read into R, including the .csv file extension, and assigning the newly created data frame object to a variable named census.16. The metadata file, which is included with the data, will offer insight into the abbreviated names of the 102 different dataset variable columns, when choosing a variable to analyze for spatial autocorrelation.
census.16 <- read.csv("CensusTractData.csv")
The summary statistics of the attributes for each of the variable columns of the Census Tract Data are produced using the function Summary(census.16) and includes the descriptive statistics of minimum, median, mean, maximum, and 1st and 3rd Quartiles, as well as the the number of missing NA values in each column of data, which will be removed from further analysis in step IV.
Click for Summary of Census Tract Data
## GUID CUID Pop PopCh
## Min. :9350000 Min. : 0.0 Min. : 94 Min. :-60.200
## 1st Qu.:9350102 1st Qu.:101.5 1st Qu.: 3479 1st Qu.: 1.725
## Median :9350128 Median :128.0 Median : 4689 Median : 4.100
## Mean :9350110 Mean :109.6 Mean : 9430 Mean : 4.599
## 3rd Qu.:9350154 3rd Qu.:154.0 3rd Qu.: 6156 3rd Qu.: 7.275
## Max. :9350180 Max. :180.1 Max. :367770 Max. : 44.200
## NA's :1 NA's :1
## PopD Area Age0yrs Age15yrs
## Min. : 21.1 Min. : 0.220 Min. : 25.0 Min. : 60
## 1st Qu.: 574.3 1st Qu.: 1.350 1st Qu.: 416.2 1st Qu.: 2192
## Median :1974.3 Median : 2.490 Median : 552.5 Median : 3108
## Mean :2137.9 Mean : 17.624 Mean : 1237.2 Mean : 6198
## 3rd Qu.:2877.9 3rd Qu.: 4.975 3rd Qu.: 735.0 3rd Qu.: 4121
## Max. :7582.6 Max. :696.150 Max. :48255.0 Max. :241740
## NA's :1 NA's :1 NA's :1
## Age65yrs AvgAge Detach Apart
## Min. : 10.0 Min. :29.20 Min. : 25.0 Min. : 0.0
## 1st Qu.: 622.5 1st Qu.:40.60 1st Qu.: 482.5 1st Qu.: 0.0
## Median : 930.0 Median :43.85 Median : 835.0 Median : 5.0
## Mean : 1993.8 Mean :43.65 Mean : 1647.0 Mean : 266.9
## 3rd Qu.: 1418.8 3rd Qu.:46.58 3rd Qu.: 1130.0 3rd Qu.: 85.0
## Max. :77770.0 Max. :56.50 Max. :64235.0 Max. :10400.0
## NA's :1 NA's :1 NA's :1 NA's :1
## SemiD Row Flat ApartH
## Min. : 0.0 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 25.0 1st Qu.: 21.25 1st Qu.: 136.2 1st Qu.: 55.0
## Median : 52.5 Median : 97.50 Median : 302.5 Median : 312.5
## Mean : 154.2 Mean : 261.99 Mean : 655.9 Mean : 1135.6
## 3rd Qu.: 108.8 3rd Qu.: 215.00 3rd Qu.: 472.5 3rd Qu.: 860.0
## Max. :6015.0 Max. :10210.00 Max. :25575.0 Max. :44280.0
## NA's :1 NA's :1 NA's :1 NA's :1
## Attach Avghhld OtherA Wchild
## Min. : 0.000 Min. :1.500 Min. : 0.0 Min. : 15.0
## 1st Qu.: 0.000 1st Qu.:2.100 1st Qu.: 482.5 1st Qu.: 428.8
## Median : 5.000 Median :2.300 Median : 1122.5 Median : 575.0
## Mean : 7.821 Mean :2.312 Mean : 2214.8 Mean : 1244.0
## 3rd Qu.: 5.000 3rd Qu.:2.575 3rd Qu.: 1757.5 3rd Qu.: 767.5
## Max. :305.000 Max. :3.700 Max. :86385.0 Max. :48515.0
## NA's :1 NA's :1 NA's :1 NA's :1
## WTChild FamSize Married ComLaw
## Min. : 5.0 Min. :2.200 Min. : 0.0 Min. : 10.0
## 1st Qu.: 401.2 1st Qu.:2.600 1st Qu.: 606.2 1st Qu.: 125.0
## Median : 612.5 Median :2.700 Median : 865.0 Median : 210.0
## Mean : 1232.4 Mean :2.685 Mean : 1778.8 Mean : 446.1
## 3rd Qu.: 827.5 3rd Qu.:2.800 3rd Qu.: 1152.5 3rd Qu.: 293.8
## Max. :48045.0 Max. :3.400 Max. :69370.0 Max. :17390.0
## NA's :1 NA's :1 NA's :1 NA's :1
## LParentF LParentM CWTChild CWChild
## Min. : 5.0 Min. : 0.0 Min. : 5.0 Min. : 10.0
## 1st Qu.: 90.0 1st Qu.: 25.0 1st Qu.: 417.5 1st Qu.: 280.0
## Median : 137.5 Median : 35.0 Median : 640.0 Median : 455.0
## Mean : 296.6 Mean : 84.3 Mean : 1295.1 Mean : 928.9
## 3rd Qu.: 202.5 3rd Qu.: 60.0 3rd Qu.: 860.0 3rd Qu.: 577.5
## Max. :11565.0 Max. :3295.0 Max. :50520.0 Max. :36235.0
## NA's :1 NA's :1 NA's :1 NA's :1
## Child1 Child2 Child3 Lparent1
## Min. : 5.0 Min. : 5.0 Min. : 0.0 Min. : 5.00
## 1st Qu.: 140.0 1st Qu.: 116.2 1st Qu.: 35.0 1st Qu.: 76.25
## Median : 190.0 Median : 192.5 Median : 55.0 Median : 120.00
## Mean : 414.9 Mean : 394.5 Mean : 120.6 Mean : 256.41
## 3rd Qu.: 263.8 3rd Qu.: 253.8 3rd Qu.: 80.0 3rd Qu.: 186.25
## Max. :16170.0 Max. :15365.0 Max. :4700.0 Max. :9995.00
## NA's :1 NA's :1 NA's :1 NA's :1
## Lparent2 Lparent3 Single Separated
## Min. : 0 Min. : 0.00 Min. : 25.0 Min. : 0.00
## 1st Qu.: 30 1st Qu.: 6.25 1st Qu.: 742.5 1st Qu.: 56.25
## Median : 40 Median : 10.00 Median : 1085.0 Median : 90.00
## Mean : 100 Mean : 24.49 Mean : 2246.7 Mean : 222.88
## 3rd Qu.: 65 3rd Qu.: 15.00 3rd Qu.: 1572.5 3rd Qu.: 158.75
## Max. :3910 Max. :950.00 Max. :87610.0 Max. :8710.00
## NA's :1 NA's :1 NA's :1 NA's :1
## Divorced Widowed MdInc MDAftTaxInc
## Min. : 0.0 Min. : 0.0 Min. :12384 Min. :12192
## 1st Qu.: 196.2 1st Qu.: 135.0 1st Qu.:34056 1st Qu.:30667
## Median : 302.5 Median : 215.0 Median :37481 Median :33380
## Mean : 671.4 Mean : 476.2 Mean :37557 Mean :33388
## 3rd Qu.: 473.8 3rd Qu.: 328.8 3rd Qu.:42240 3rd Qu.:37328
## Max. :26195.0 Max. :18565.0 Max. :57557 Max. :48939
## NA's :1 NA's :1 NA's :4 NA's :4
## GovTrans EInc MInc LEnglish
## Min. : 720 Min. :11616 Min. :11200 Min. : 90
## 1st Qu.: 3447 1st Qu.:29162 1st Qu.:31456 1st Qu.: 3329
## Median : 5792 Median :31669 Median :34422 Median : 4602
## Mean : 5454 Mean :31562 Mean :34638 Mean : 9089
## 3rd Qu.: 7248 3rd Qu.:34280 3rd Qu.:38374 3rd Qu.: 5776
## Max. :12079 Max. :52395 Max. :53632 Max. :354470
## NA's :4 NA's :4 NA's :4 NA's :1
## LFrench LAboriginal LnAboriginal IAmericas
## Min. : 0.0 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 346.2 1st Qu.: 0.00 1st Qu.: 521.2 1st Qu.: 85.0
## Median : 457.5 Median : 0.00 Median : 802.5 Median : 122.5
## Mean : 928.8 Mean : 17.82 Mean : 1704.1 Mean : 247.7
## 3rd Qu.: 611.2 3rd Qu.: 10.00 3rd Qu.: 1182.5 3rd Qu.: 165.0
## Max. :36220.0 Max. :715.00 Max. :66470.0 Max. :9695.0
## NA's :1 NA's :1 NA's :1 NA's :1
## IEurope IAfrica IAsia IOceania
## Min. : 0.0 Min. : 0.00 Min. : 0.0 Min. : 0.00
## 1st Qu.: 260.0 1st Qu.: 20.00 1st Qu.: 107.5 1st Qu.: 10.00
## Median : 365.0 Median : 35.00 Median : 240.0 Median : 15.00
## Mean : 759.1 Mean : 72.50 Mean : 567.1 Mean : 33.01
## 3rd Qu.: 535.0 3rd Qu.: 53.75 3rd Qu.: 430.0 3rd Qu.: 25.00
## Max. :29660.0 Max. :2835.00 Max. :22115.0 Max. :1305.00
## NA's :1 NA's :1 NA's :1 NA's :1
## Ifirstgen I2ndgen I3rdgen EcoIm
## Min. : 0.0 Min. : 0 Min. : 85 Min. : 0.0
## 1st Qu.: 712.5 1st Qu.: 865 1st Qu.: 1632 1st Qu.: 161.2
## Median : 1032.5 Median : 1050 Median : 2498 Median : 245.0
## Mean : 1953.3 Mean : 2115 Mean : 5104 Mean : 499.7
## 3rd Qu.: 1305.0 3rd Qu.: 1390 3rd Qu.: 3124 3rd Qu.: 353.8
## Max. :76165.0 Max. :82485 Max. :199045 Max. :19515.0
## NA's :1 NA's :1 NA's :1 NA's :1
## FamIm RefIm IstNations Metis
## Min. : 0.0 Min. : 0.00 Min. : 10.0 Min. : 0.00
## 1st Qu.: 125.0 1st Qu.: 15.00 1st Qu.: 35.0 1st Qu.: 31.25
## Median : 170.0 Median : 30.00 Median : 97.5 Median : 80.00
## Mean : 356.1 Mean : 70.96 Mean : 254.9 Mean : 167.37
## 3rd Qu.: 233.8 3rd Qu.: 55.00 3rd Qu.: 176.2 3rd Qu.: 103.75
## Max. :13910.0 Max. :2770.00 Max. :9940.0 Max. :6530.00
## NA's :1 NA's :1 NA's :1 NA's :1
## Inuk Treaty Ntreaty SAbAncest
## Min. : 0.000 Min. : 0.00 Min. : 10 Min. : 0.0
## 1st Qu.: 0.000 1st Qu.: 26.25 1st Qu.: 3315 1st Qu.: 10.0
## Median : 0.000 Median : 62.50 Median : 4540 Median : 35.0
## Mean : 3.269 Mean : 191.54 Mean : 8979 Mean : 117.2
## 3rd Qu.: 0.000 3rd Qu.: 128.75 3rd Qu.: 5736 3rd Qu.: 65.0
## Max. :130.000 Max. :7490.00 Max. :350200 Max. :4560.0
## NA's :1 NA's :1 NA's :1 NA's :1
## MAbAncest SAsian Chinese Black
## Min. : 0.000 Min. : 0.0 Min. : 0.0 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 45.0 1st Qu.: 70.0 1st Qu.: 11.25
## Median : 0.000 Median : 82.5 Median : 155.0 Median : 35.00
## Mean : 1.859 Mean : 261.7 Mean : 418.8 Mean : 88.78
## 3rd Qu.: 0.000 3rd Qu.: 176.2 3rd Qu.: 290.0 3rd Qu.: 70.00
## Max. :80.000 Max. :10220.0 Max. :16345.0 Max. :3445.00
## NA's :1 NA's :1 NA's :1 NA's :1
## Fillipino LatinAmer Arab SEAsian
## Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 15.0 1st Qu.: 11.25 1st Qu.: 0.00 1st Qu.: 10.00
## Median : 60.0 Median : 25.00 Median : 10.00 Median : 22.50
## Mean : 155.8 Mean : 65.13 Mean : 37.56 Mean : 65.06
## 3rd Qu.: 120.0 3rd Qu.: 53.75 3rd Qu.: 28.75 3rd Qu.: 48.75
## Max. :6065.0 Max. :2570.00 Max. :1480.00 Max. :2540.00
## NA's :1 NA's :1 NA's :1 NA's :1
## WAsian Korean Japanese Vminority
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 15.00 1st Qu.: 0.00
## Median : 10.00 Median : 20.00 Median : 32.50 Median : 0.00
## Mean : 27.88 Mean : 53.91 Mean : 63.59 Mean : 11.35
## 3rd Qu.: 20.00 3rd Qu.: 43.75 3rd Qu.: 48.75 3rd Qu.: 10.00
## Max. :1095.00 Max. :2095.00 Max. :2485.00 Max. :470.00
## NA's :1 NA's :1 NA's :1 NA's :1
## MVisMin Owner Renter Band
## Min. : 0.00 Min. : 20.0 Min. : 0.0 Min. : 0.000
## 1st Qu.: 10.00 1st Qu.: 937.5 1st Qu.: 308.8 1st Qu.: 0.000
## Median : 20.00 Median : 1277.5 Median : 602.5 Median : 0.000
## Mean : 38.78 Mean : 2612.4 Mean : 1557.2 Mean : 2.821
## 3rd Qu.: 28.75 3rd Qu.: 1690.0 3rd Qu.: 1071.2 3rd Qu.: 0.000
## Max. :1505.00 Max. :101885.0 Max. :60745.0 Max. :105.000
## NA's :1 NA's :1 NA's :1 NA's :1
## NoDeg Secondary Psecondary Employed
## Min. : 25.0 Min. : 15.0 Min. : 25 Min. : 30
## 1st Qu.: 305.0 1st Qu.: 693.8 1st Qu.: 1841 1st Qu.: 1668
## Median : 460.0 Median : 1092.5 Median : 2412 Median : 2360
## Mean : 949.4 Mean : 2257.8 Mean : 4730 Mean : 4804
## 3rd Qu.: 613.8 3rd Qu.: 1481.2 3rd Qu.: 3091 3rd Qu.: 3131
## Max. :37030.0 Max. :88060.0 Max. :184455 Max. :187335
## NA's :1 NA's :1 NA's :1 NA's :1
## Unemp CarDrive CarPass Transit
## Min. : 10.0 Min. : 15 Min. : 0.00 Min. : 0.0
## 1st Qu.: 95.0 1st Qu.: 885 1st Qu.: 61.25 1st Qu.: 120.0
## Median : 135.0 Median : 1282 Median : 95.00 Median : 207.5
## Mean : 285.1 Mean : 2847 Mean : 210.00 Mean : 477.2
## 3rd Qu.: 197.5 3rd Qu.: 1770 3rd Qu.: 143.75 3rd Qu.: 345.0
## Max. :11125.0 Max. :111035 Max. :8180.00 Max. :18610.0
## NA's :1 NA's :1 NA's :1 NA's :1
## Walk Bike Other T15mins
## Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 20.0
## 1st Qu.: 60.0 1st Qu.: 60.0 1st Qu.: 30.00 1st Qu.: 396.2
## Median : 150.0 Median : 140.0 Median : 50.00 Median : 642.5
## Mean : 452.4 Mean : 288.3 Mean : 105.58 Mean : 1304.1
## 3rd Qu.: 278.8 3rd Qu.: 216.2 3rd Qu.: 73.75 3rd Qu.: 906.2
## Max. :17640.0 Max. :11245.0 Max. :4120.00 Max. :50855.0
## NA's :1 NA's :1 NA's :1 NA's :1
## T29mins T44mins T59mins Tover69mins
## Min. : 10.0 Min. : 0.0 Min. : 0.0 Min. : 0.00
## 1st Qu.: 607.5 1st Qu.: 222.5 1st Qu.: 55.0 1st Qu.: 31.25
## Median : 912.5 Median : 377.5 Median : 80.0 Median : 60.00
## Mean : 1803.5 Mean : 807.2 Mean : 272.2 Mean : 194.17
## 3rd Qu.: 1266.2 3rd Qu.: 477.5 3rd Qu.: 133.8 3rd Qu.: 108.75
## Max. :70320.0 Max. :31485.0 Max. :10620.0 Max. :7555.00
## NA's :1 NA's :1 NA's :1 NA's :1
## C5am C6am C7am C8am
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 15.0
## 1st Qu.: 45.0 1st Qu.: 161.2 1st Qu.: 385.0 1st Qu.: 385.0
## Median : 75.0 Median : 297.5 Median : 592.5 Median : 532.5
## Mean : 185.3 Mean : 708.3 Mean : 1199.6 Mean : 1082.6
## 3rd Qu.: 115.0 3rd Qu.: 467.5 3rd Qu.: 836.2 3rd Qu.: 697.5
## Max. :7220.0 Max. :27635.0 Max. :46790.0 Max. :42215.0
## NA's :1 NA's :1 NA's :1 NA's :1
## C9am C12pm
## Min. : 0.0 Min. : 0.0
## 1st Qu.: 201.2 1st Qu.: 201.2
## Median : 285.0 Median : 285.0
## Mean : 583.5 Mean : 621.5
## 3rd Qu.: 387.5 3rd Qu.: 442.5
## Max. :22740.0 Max. :24240.0
## NA's :1 NA's :1
To find the data variable names and data attributes in the boundary shape file, use the @data command.
tracts@data
Click for Census Tracts Boundary Data
## CTUID GUID
## 0 9350151.03 9350151
## 1 9350151.04 9350151
## 2 9350160.04 9350160
## 3 9350160.05 9350160
## 4 9350001.00 9350001
## 5 9350002.00 9350002
## 6 9350004.00 9350004
## 7 9350009.00 9350009
## 8 9350005.00 9350005
## 9 9350006.00 9350006
## 10 9350007.00 9350007
## 11 9350133.00 9350133
## 12 9350170.00 9350170
## 13 9350171.00 9350171
## 14 9350010.00 9350010
## 15 9350011.00 9350011
## 16 9350012.00 9350012
## 17 9350100.00 9350100
## 18 9350101.00 9350101
## 19 9350102.00 9350102
## 20 9350103.00 9350103
## 21 9350003.01 9350003
## 22 9350126.00 9350126
## 23 9350127.00 9350127
## 24 9350104.00 9350104
## 25 9350110.00 9350110
## 26 9350128.00 9350128
## 27 9350131.00 9350131
## 28 9350120.00 9350120
## 29 9350122.00 9350122
## 30 9350124.00 9350124
## 31 9350008.00 9350008
## 32 9350152.00 9350152
## 33 9350153.00 9350153
## 34 9350123.02 9350123
## 35 9350003.02 9350003
## 36 9350129.01 9350129
## 37 9350129.02 9350129
## 38 9350151.02 9350151
## 39 9350111.01 9350111
## 40 9350111.02 9350111
## 41 9350121.01 9350121
## 42 9350121.02 9350121
## 43 9350121.03 9350121
## 44 9350123.01 9350123
## 45 9350160.09 9350160
## 46 9350155.03 9350155
## 47 9350156.05 9350156
## 48 9350155.04 9350155
## 49 9350150.03 9350150
## 50 9350132.04 9350132
## 51 9350150.02 9350150
## 52 9350156.04 9350156
## 53 9350160.08 9350160
## 54 9350180.05 9350180
## 55 9350180.04 9350180
## 56 9350160.07 9350160
## 57 9350013.01 9350013
## 58 9350013.02 9350013
## 59 9350121.04 9350121
## 60 9350130.01 9350130
## 61 9350130.02 9350130
## 62 9350180.03 9350180
## 63 9350150.05 9350150
## 64 9350150.04 9350150
## 65 9350156.06 9350156
## 66 9350180.06 9350180
## 67 9350160.06 9350160
## 68 9350154.02 9350154
## 69 9350132.01 9350132
## 70 9350125.01 9350125
## 71 9350125.02 9350125
## 72 9350014.01 9350014
## 73 9350014.02 9350014
## 74 9350155.01 9350155
## 75 9350156.01 9350156
## 76 9350154.01 9350154
## 77 9350132.03 9350132
As long as both data files has the same number of rows, the common column, Geographic ID GUID is used to merge the census tract boundary shape file with the census data file. This will position the numerical census data spatially in the corresponding census tract polygon, in order to be able to map this variable in the next step. Assign this merged data to the variable name crd.data, which will be used in further code segments.
crd.data <- merge(tracts, census.16, by = "GUID")
To specify a column to use from the dataset, a $ dollar sign is placed between the data set name and the column name. In this tutorial example, the variable chosen is Median Income MdInc, which is already a normalized average value, and needs no further treatment before use in analysis.
Any missing ‘na’ values will need to be removed from the newly merged data, as it has no numerical value, and allows no calculations to be done. Summary() showed that MdInc has 4 rows with na values out of 78 polygon feature rows. The !is.na() is used to specify which data in crd.data is not na. Once the dataset has the rows containing a na value removed, the remaining data is reassigned back to the crd.data data frame. The crd.data object now has 74 polygon features to analyze.
crd.data <- crd.data[!is.na(crd.data$MdInc),]
Note for Variable II: The na values will also be needed to removed from each different variable. To choose a different column from the census data, replace the column name after the $ crd.data$NewColumnName. Also, in order to compare any numerical count census variable, like the Renter variable, between different polygons, the variable first needs to be ‘normalized’, by dividing each count of Renters in a polygon by the population of that polygon (e.g. $Renter / $Pop). A new column called $RenterNorm will be created with the % of population that is a renter in each polygon, and this new variable will now be used for the analysis. These % values can now be compared across different polygons.
A chloropleth classification map will be created using the tmap package, and in particular the tm_polygon() function is used to style the map and the legend, and add a variety of tools like a scale and a compass.
To select a colour palette to create the chloropleth map use tmaptools::palette_explorer(), after first removing the comment # from the code. The # hashtag symbol is used to make a line of code a ‘comment’, and the code will not be run without first removing the # from the beginning of the line. Leaving the # will ensure that this code won’t be run if not necessary, as the palette explorer window must be closed in order for R to continue processing the code. To generate the colour code needed for the map palette, check the Code Generator button for tmap layer function code, then copy/paste the code found under the colour palette into the tm_polygon code.
# tmaptools::palette_explorer()
The shape of the polygons comes from the merged crd.data file.
The polygons are classified by colour based on the values in the MdInc column (col="MdInc").
The title is used for the classification legend.
The classification style Jenks uses natural breaks to set up the best classification of variable values, based on the MdInc column data itself. n=6 specifies 6 different classification breaks.
A scale bar and compass can be added in specific positions, specified by the first value as RIGHT / LEFT and the second value as TOP/BOTTOM (note these positions must be capitalized).
The layout function adds a title to the map, in a specific postion, using the same terminology. To make room for north arrow and scale bar, place inner.margins around map and image boundary (bottom, left, top, right).
Note for Variable II: For the new variable, the crd.data will reflect the new data column as created in Step IV.7 after removing the na values from the new variable column. Change the following values tm_polygons(col = "NewColumnName", title= "New Title"). If comparing results from different census variables, each would need to use the same classification scheme and number of breaks.
map_CensVar <- tm_shape(crd.data) +
tm_polygons(col = "MdInc",
title = "Median 2015 $ Income",
style = "jenks",
palette = "viridis", n = 6) +
tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) +
tm_compass(position = c("RIGHT", "TOP")) +
tm_layout(title = "CRD Census Tracts Median $ 2015 Income", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))
To display the map in the R window, run the variable name map_CensVar. In order to save the map, the file object png is called using a filename to save, call the plot, then close the graphic device connection with dev.off() in order to return back to the R code console.
map_CensVar
Note for Variable II: Change the following values name of png file to be saved "newName.png".
# create png of Median Income
png("map_CensVar_MdInc.png")
map_CensVar
dev.off()
Interpreting the output of the MdInc Classification Map: The finished map shows the 2015 median household income range for the CRD census tracts. The darkest blue colours correspond to the lowest median income range of $12,384 - $12,704, and the yellow colour the highest range of $47,872 - $57,557. Some polygons of similar income ranges appear to grouped closer together in the $37,931-$42,261 and $42,262-$47,872 ranges. Looking at the autocorrelation statistics in further code segments will determine whether these groupings are significant. Note that the largest area to the south includes islands in the Juan de Fuca Strait, and is actually mostly water.
Interpreting Variable II Renter % of Population output: The finished map shows the 2016 range of the Renters % of Population in each polygon for the CRD census tracts. The darkest blue colours correspond to the lowest % of renters range of 1.8 % - 7.1 %, and the yellow colour the highest % of renters range of 35.9 % - 44.1 % . Some polygons of similar % of renters appear to grouped closer together in many of the ranges. Looking at the autocorrelation statistics in further code segments will determine whether these groupings are significant. Note again that the largest area to the south includes islands in the Juan de Fuca Strait, and is actually mostly water.
Neighbourhoods are defined by weighting schemes to determine which polygons with adjoining boundaries are considered neighbours. The Rook selects neighbouring polygons with shared boundaries in only 4 directions: top, bottom, and both sides, giving these neighbours a value of 1, and all other shared boundary neighbours a value of 0. The Queen Neighbour weighting scheme selects the same neighbours as the Rook scheme, as well as all the diagonally adjoining polygons, giving them all a value of 1.
Using the spdep package, a list of neighbours is built from polygon regions whose boundaries are continguous, using the function poly2nb().
The function nb2lines() stores spatial coordinates as line vectors, the data form that is needed to diagram the weights.
To define the neighbourhood as Rook, use queen - FALSE.
Note for Variable II: The crd.data will reflect the new data column as created in Step IV.6.ii) after removing the na values from the new variable column. The tm_layout value for title will need to be changed for the new variable.
Defining Queen Neighbours
crd.nb <- poly2nb(crd.data)
crd.net <- nb2lines(crd.nb, coords=coordinates(crd.data))
Map Queen Neighbours (default weight scheme - see figure 2a).
map_crd.net <- tm_shape(crd.data) + tm_borders(col='lightgrey') +
tm_shape(crd.net) + tm_lines(col='red') +
# add scale bar - first value is TOP, second value is BOTTOM
tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) +
# add compass
tm_compass(position = c("RIGHT", "TOP")) +
tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nQueen Neighbours", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))
Defining Rook Neighbours
crd.nb2 <- poly2nb(crd.data, queen = FALSE)
crd.net2 <- nb2lines(crd.nb2, coords=coordinates(crd.data))
Mapping Rook Neighbours (see figure 3a)
map_crd.net2 <- tm_shape(crd.data) + tm_borders(col='lightgrey') +
tm_shape(crd.net2) + tm_lines(col='blue', lwd = 2) +
# add scale bar - first value is TOP, second value is BOTTOM
tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) +
# add compass
tm_compass(position = c("RIGHT", "TOP")) +
tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nRook Neighbours", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))
Mapping Rook & Queen Neighbours (see figure 4a)
map_crd.net3 <- tm_shape(crd.data) + tm_borders(col='lightgrey') +
tm_shape(crd.net) + tm_lines(col='red', lwd = 2) +
tm_shape(crd.net2) + tm_lines(col='blue', lwd = 2) +
# add scale bar - first value is TOP, second value is BOTTOM
tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) +
# add compass
tm_compass(position = c("RIGHT", "TOP")) +
tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nRook (blue) & Queen (blue & red) Neighbours", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))
Create Neighbourhood map for Queen's weight.
map_crd.net
Create Neighbourhood map for Rook’s weight
map_crd.net2
Create Neighbourhood map for Rook’s weight plus Queens weight
map_crd.net3
Create a png image file for the Queen Neighbourhoods (crd.net), and the Rooks Neighbourhoods (crd.net2) weighting schemes.
Note for Variable II: change the file name to reflect the new variable when saving the file.
# create png of Median Income CRD net
png("map_crd.net_MdInc.png")
map_crd.net
dev.off()
# create png of Median Income CRD net 2
png("map_crd.net2_MdInc.png")
map_crd.net2
dev.off()
# create png of Median Income CRD net 3
png("map_crd.net3_MdInc.png")
map_crd.net3
dev.off()
Interpreting the output of Rook & Queen Neighbours for MdInc: As you can see when comparing the maps of both Rook and Queen Neighbour maps, the Queen Neighbour weighing scheme includes all the Rook neighbours plus the extra diagonal polygons. When you compare the output maps for the 2 different variable, they are identical - the polygon boundaries have not changed, just the individual polygon data statistics have.
# Defining Neighbourhoods
crd.nbR <- poly2nb(crd.dataR)
crd.netR <- nb2lines(crd.nbR, coords=coordinates(crd.dataR))
Click for Variable II: Queen Neighbours Map for Renter % of Population
crd.nbR2 <- poly2nb(crd.dataR, queen = FALSE)
crd.netR2 <- nb2lines(crd.nbR2, coords=coordinates(crd.dataR))
Click for Variable II: Rook Neighbours Map for Renter % of Population
Click for Variable II: Rook & Queen Neighbours Map for Renter % of Population
Using the spdep package, the function nb2listw uses the neighbours list previously created in step VI, for the Queen and Rook Neighbours weighting scheme (we will be using the result from the Queen Neighbour weighting scheme), and adds the weight to create a new variable crd.lw, which will be used to map the Lagged Mean in section VIII.
crd.lw <- nb2listw(crd.nb, zero.policy = TRUE, style = "W")
print.listw(crd.lw, zero.policy = TRUE)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 74
## Number of nonzero links: 424
## Percentage nonzero weights: 7.742878
## Average number of links: 5.72973
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 74 5476 74 29.46931 322.5924
Interpreting the output for MdInc Neighbourhood Weights Matrix: The output summary table shows the statistics for the weights list object, using only non-zero weight values. The weights list notes 424 links, with an average of 5.73 links for each polygon. The same weight matrix will be used regardless of the variable used.
Click for Variable II: Weight Matrix for Renter % of Population
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 74
## Number of nonzero links: 424
## Percentage nonzero weights: 7.742878
## Average number of links: 5.72973
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 74 5476 74 29.46931 322.5924
Interpreting Variable II Renters % of Population Neighbourhood Weights Matrix output: As in the Defining the Neighbourhoods section, all polygons, will have the same shape and number of neighbours regardless of variable attribute used to analyze, thus the results of the Neighbourhood Weights Matrix for both variables will be the same.
The lagged means calculates the sum of the value of the weighted mean of the median income value of each of the neighbouring polygons identified in steps VI and VII, and creates a new column $IncLagMeans with these values, using the function lag.listw() from the spdep package. The weighting for each identified neighbour is done by dividing the median income value by the number of neighbours identified in each neighbourhood weighting scheme, thus a Rook's 4 neighbours would be weighted 1/4 of their median income value, and a Queen's 8 neighbours would be weighted 1/8 of their median income value.
The lagged means map visualizes the sum of the weighted value of all the identified neighbours, after the neighbours receive their proportionate weights of their polygon's median income in the previous step. The Jenks classification scheme colors are based on the values of this new column, using the tmap package.
Note for Variable II: The column name needs to be changed to the different variable name. The title for the legend and tm_layout will also need to be changed to the new variable name.
# Calculate Lag Means
crd.data$IncLagMeans = lag.listw(crd.lw, crd.data$MdInc, zero.policy = TRUE)
map_LagMean <- tm_shape(crd.data) +
tm_polygons(col = "IncLagMeans",
title = "Median 2015 $ Income\nLagged Means",
style = "jenks",
palette = "viridis", n = 6) +
# add scale bar - first value is TOP, second value is BOTTOM
tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) +
# add compass
tm_compass(position = c("RIGHT", "TOP")) +
tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nLagged Means", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))
Note for Variable II: Change the file name to reflect the new variable when saving the file.
map_LagMean
# create png of map LagMean Median Income CRD net 2
png("map_LagMean_MdInc.png")
map_LagMean
dev.off()
Interpreting the output of Lagged Means Map for Md Inc: The largest area of same values is in the $40,203 - $42,617 range, with a 3 small isolated areas of yellow, the highest income range $42,617 - $45,824, all in areas of the most expensive housing units. The lagged means is the weighted average of the neighbouring polygons, not the polygons themselves, which is shown on the chloropleth map.
Interpreting Variable II Rental % of Population output: The lagged weighted means for Rental % of Population of the neighbours of the colored polygons show many areas of similar population proportion grouped together, which supports the high values of statistical significance for spatial autocorrelation.
Global Moran's I Tests to see if the spatial distribution of a group of polygons is any different from that of a random distribution, in order to test the autocorrelation of how different or similar the data in one polygon is to itself. This global test creates one statistical I value for every polygon as if they were all one polygon, and that all the data can be summarized by this one statistic. The Global Moran's I test uses the crd.lw variable created in the Neighbourhood Weights Matrix in section VII, to calculate the I statistic, the expected I statistic, the Variance, as well as the z value and p-value (which needs to be double to compliment the 95% confidence 2-tailed test).
A calculation is done to test first the random Global Moran's I value expected range of -0.8193252 to 1.0821054.
# moran's I test
mi <- moran.test(crd.data$MdInc, crd.lw, zero.policy = TRUE)
mi
## Moran I test under randomisation
##
## data: crd.data$MdInc
## weights: crd.lw
##
## Moran I statistic standard deviate = 2.2665, p-value = 0.01171
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.143170949 -0.013698630 0.004790316
moran.range <- function(lw) {
wmat <- listw2mat(lw)
return(range(eigen((wmat + t(wmat))/2)$values))
}
moran.range(crd.lw)
## Global Moran's I Range for MdInc
## [1] -0.8193252 1.0821054
In order to test the significance of this I value, the z value is calculated using the values of the Global Moran's I, the estimated value of Global Moran's I, and the Variable. The z value is Global Moran's I less the Estimated Global Moran's I, divided by the square root of the variance.
## calculate z value
mI <- mi$estimate[[1]]
eI <- mi$estimate[[2]]
var <- mi$estimate[[3]]
z <- (mI - eI) / sqrt(var)
These data variables are used to create a table of values.
# Add Stats data objects to data.frame ########
data.for.table1 = data.frame(mI, eI, var, z, )
data.for.table1
## mI eI var z
## 1 0.1431709 -0.01369863 0.004790316 2.266505
Interpreting the MdInc output for Global Moran's I: Deciding on a level of confidence of 95%, a z-value of 1.96 was chosen to determine if the Global Moran's I value is significantly greater than would be expected. As the calculated z-value was 2.266505 which is greater than 1.96, the Global Moran's I value for autocorrelation is significant, and this is confirmed by the 2-tailed p-value of 0.02342 which is less than 0.05 chance of error of making the wrong decision about the significance, so the distribution can be now be evaluated to determine if it is dipsersed or clustered by comparing the value of the measured Global Moran's mI value to the estimated Global Moran's eI value. As mIis > than eI, there is a positive clustered autocorrelation.
Click for Variable II: Global Moran's I Table for Renter % of Population
## Global Moran's I Range for Renter % of Population
## [1] -0.8193252 1.0821054
## mIR eIR varR z_R p-value
## 1 0.595416 -0.01369863 0.004916714 8.686834 2.2e-16
Interpreting Variable II Renters % of Population output for Global Moran's I: The z value of the Renter % of Population variable is a very high 8.686834, and the 2-tailed p-value is a very low 4.4e-16 chance of this being in error, and the mIR is less than the eIR, so the autocorrelation is negative and dispersed.
The table values for the Global Moran's I can be saved in a csv file. Note for Variable II: change the file name to reflect the new variable.
# write csv file with Global Moran's results for I, E, Var, Z, P
write.csv(data.for.table1, "moransGlobal_MdInc.csv", row.names = FALSE)
The Local Moran's i test is used for a local scale indicator of spatial association (Lisa), by calculating separate i values for every polygon in order to test how spatially autocorrelated each polygon is, if the distribution is clustered or dispersed, and to determine if there are any outliers. If the local moran's i value is positive, the polygon will have a similar value to its neighbours, and be considered clustered. If the local moran's i value is negative, the polygon will have a different value to its neighbours, and be considered dispersed.
The lisa test creates 5 different variables for local moran's i, the local moran's i, the Expected i, the Variance of i, the Z value, and a p value. These values are assigned to new columns and given new column names.
Note for Variable II: Change the new variable column name after the $, $NewColumnName.
# Lisa Test
lisa.test <- localmoran(crd.data$MdInc, crd.lw)
crd.data$Ii <- lisa.test[,1]
crd.data$E.Ii<- lisa.test[,2]
crd.data$Var.Ii<- lisa.test[,3]
crd.data$Z.Ii<- lisa.test[,4]
crd.data$P<- lisa.test[,5]
Calculating the statistical significance values of autocorrelation is done by targeting a z-value of 1.96 as the 95% significance level, where if the z-value is less than 1.96, the null hypothesis holds true that there is no difference between the measured local Moran's i value and the expected randomly distributed local Moran's i value. However if the z-value is greater than 1.96, then the alternate hypothesis holds true that with a 95% level of confidence, there is a significant spatial autocorrelation. Using a 2-tailed distribution, if the Ii value is less than the Ei value, then it would be a dispersed negative spatial autocorrelation distribution, and if the Ii value is greater than the Ei value, it would be a clustered positive spatial autocorrelation distribution. The p-value is the probability of making a wrong decision about the 95% significance level, and is calculated as 100% - 95% = 5%, so if the p-value is less than 0.05 it confirms the 95% confidence limit with only 5% or less chance of being wrong. The p-value calculated in the R lisa test is a one-tailed test, so the value would need to be doubled to be used as a reliable test for error.
# if z value is less than 95% significance level, value is FALSE
crd.data$Z.Ho <- crd.data$Z.Ii < 1.96
# if Ii < 0, value is negative
crd.data$Ii.Neg <- crd.data$Ii < 0
# P-value is one-tailed, need to multiply p-value *2 to correspond to 95% 2 -tailed if P *2>0.05 - higher chance of wrongly deciding significance
crd.data$Po <- (crd.data$P * 2) > 0.05
Add the data value variables from step X.3 to a data frame to create a new table of values for the Local Moran’s Ii. The GUID is included to make it easier to identify specific polygons of interest. The table is available to view by clicking the drop-down accordion button below.
# make a data frome from the Lisa Test Data
data.for.table2 = data.frame(crd.data$GUID, crd.data$Ii, crd.data$E.Ii, crd.data$Var.Ii, crd.data$Z.Ii, crd.data$GUID, crd.data$Ii.Neg, crd.data$Z.Ho, crd.data$P.Ho, crd.data$P)
Click for Table of Lisa Test Data for Median 2015 $ Income
## crd.data.GUID crd.data.Ii crd.data.E.Ii crd.data.Var.Ii crd.data.Z.Ii
## 1 9350151 0.2744151305 -0.01369863 0.14693811 0.751617311
## 2 9350151 0.0557309861 -0.01369863 0.22656398 0.145864407
## 3 9350160 -0.2349448907 -0.01369863 0.14693811 -0.577176596
## 4 9350160 0.3326591723 -0.01369863 0.22656398 0.727661741
## 5 9350001 0.1990156088 -0.01369863 0.17878846 0.503068148
## 6 9350002 -0.0338955771 -0.01369863 0.14693811 -0.052688823
## 7 9350004 0.0128136506 -0.01369863 0.17878846 0.062701416
## 8 9350009 0.7919282302 -0.01369863 0.17878846 1.905303636
## 9 9350005 -0.0464760893 -0.01369863 0.17878846 -0.077518533
## 10 9350006 0.4657065327 -0.01369863 0.12418786 1.360389058
## 11 9350007 0.0067879072 -0.01369863 0.10712518 0.062592570
## 12 9350133 0.2947025748 -0.01369863 0.09385420 1.006674841
## 13 9350170 -0.0923032394 -0.01369863 0.46544159 -0.115216696
## 14 9350171 -0.0285935895 -0.01369863 0.30618985 -0.026918070
## 15 9350010 0.4126788509 -0.01369863 0.09385420 1.391769798
## 16 9350011 -0.2187477999 -0.01369863 0.10712518 -0.626487255
## 17 9350012 0.4433240328 -0.01369863 0.12418786 1.296875124
## 18 9350100 1.2205874502 -0.01369863 0.30618985 2.230593514
## 19 9350101 0.4616826274 -0.01369863 0.17878846 1.124274380
## 20 9350102 0.2638695873 -0.01369863 0.12418786 0.787644346
## 21 9350103 2.1078481863 -0.01369863 0.22656398 4.457149340
## 22 9350003 -0.0124306146 -0.01369863 0.17878846 0.002998851
## 23 9350126 0.0980896658 -0.01369863 0.08323742 0.387469072
## 24 9350127 -0.0231942136 -0.01369863 0.22656398 -0.019949234
## 25 9350104 0.1047856106 -0.01369863 0.12418786 0.336218041
## 26 9350110 0.0246750827 -0.01369863 0.10712518 0.117243303
## 27 9350128 0.1900350829 -0.01369863 0.14693811 0.531490704
## 28 9350131 0.0019320024 -0.01369863 0.08323742 0.054177288
## 29 9350120 0.9284834191 -0.01369863 0.17878846 2.228255999
## 30 9350122 0.2032721811 -0.01369863 0.17878846 0.513134921
## 31 9350124 0.1259530137 -0.01369863 0.12418786 0.396283943
## 32 9350008 0.4535832667 -0.01369863 0.17878846 1.105119431
## 33 9350152 0.0760594107 -0.01369863 0.14693811 0.234156457
## 34 9350153 0.6622247407 -0.01369863 0.30618985 1.221524176
## 35 9350123 0.2429889826 -0.01369863 0.17878846 0.607064965
## 36 9350003 0.0356433844 -0.01369863 0.17878846 0.116693626
## 37 9350129 -0.0004154355 -0.01369863 0.17878846 0.031414691
## 38 9350129 -0.0193496543 -0.01369863 0.12418786 -0.016035688
## 39 9350151 0.0074249521 -0.01369863 0.17878846 0.049957170
## 40 9350111 0.1856654215 -0.01369863 0.22656398 0.418843150
## 41 9350111 0.0595949852 -0.01369863 0.12418786 0.207982392
## 42 9350121 -0.2359649540 -0.01369863 0.12418786 -0.630716351
## 43 9350121 -0.0579399062 -0.01369863 0.14693811 -0.115414512
## 44 9350121 0.1041228751 -0.01369863 0.30618985 0.212926233
## 45 9350123 0.1458425508 -0.01369863 0.14693811 0.416203354
## 46 9350160 -0.1574332175 -0.01369863 0.17878846 -0.339931605
## 47 9350155 0.0536021511 -0.01369863 0.22656398 0.141391946
## 48 9350132 0.6232149028 -0.01369863 0.14693811 1.661549369
## 49 9350150 0.5484882380 -0.01369863 0.10712518 1.717650982
## 50 9350156 -0.0440773555 -0.01369863 0.02923030 -0.177685867
## 51 9350160 -1.9458534186 -0.01369863 0.30618985 -3.491777155
## 52 9350180 -0.3484095342 -0.01369863 0.14693811 -0.873177696
## 53 9350160 -0.0252503561 -0.01369863 0.12418786 -0.032779876
## 54 9350013 1.0843288469 -0.01369863 0.14693811 2.864481232
## 55 9350013 0.2836125755 -0.01369863 0.10712518 0.908375690
## 56 9350121 0.0384246510 -0.01369863 0.22656398 0.109505596
## 57 9350130 0.0025467591 -0.01369863 0.14693811 0.042380189
## 58 9350130 -0.0346348845 -0.01369863 0.22656398 -0.043984894
## 59 9350180 -3.7243153785 -0.01369863 0.94319681 -3.820717593
## 60 9350150 0.0117550365 -0.01369863 0.09385420 0.083085168
## 61 9350150 0.2614551227 -0.01369863 0.22656398 0.578069434
## 62 9350156 0.1399328018 -0.01369863 0.46544159 0.225189161
## 63 9350180 0.4384667710 -0.01369863 0.46544159 0.662772886
## 64 9350160 -0.3612599654 -0.01369863 0.94319681 -0.357874121
## 65 9350154 0.2206921127 -0.01369863 0.14693811 0.611467288
## 66 9350132 0.4853407163 -0.01369863 0.14693811 1.301869827
## 67 9350125 0.3210323517 -0.01369863 0.22656398 0.703235000
## 68 9350125 -0.0802313849 -0.01369863 0.14693811 -0.173567448
## 69 9350014 0.3159098132 -0.01369863 0.12418786 0.935316835
## 70 9350014 0.6415313555 -0.01369863 0.12418786 1.859320199
## 71 9350155 0.4949972138 -0.01369863 0.17878846 1.203063216
## 72 9350156 -0.0146686830 -0.01369863 0.30618985 -0.001753073
## 73 9350154 0.2927142780 -0.01369863 0.09385420 1.000184696
## 74 9350132 1.0828675345 -0.01369863 0.30618985 1.981707007
## crd.data.GUID.1 crd.data.Ii.Neg crd.data.Z.Ho crd.data.P.Ho
## 1 9350151 FALSE TRUE TRUE
## 2 9350151 FALSE TRUE TRUE
## 3 9350160 TRUE TRUE TRUE
## 4 9350160 FALSE TRUE TRUE
## 5 9350001 FALSE TRUE TRUE
## 6 9350002 TRUE TRUE TRUE
## 7 9350004 FALSE TRUE TRUE
## 8 9350009 FALSE TRUE TRUE
## 9 9350005 TRUE TRUE TRUE
## 10 9350006 FALSE TRUE TRUE
## 11 9350007 FALSE TRUE TRUE
## 12 9350133 FALSE TRUE TRUE
## 13 9350170 TRUE TRUE TRUE
## 14 9350171 TRUE TRUE TRUE
## 15 9350010 FALSE TRUE TRUE
## 16 9350011 TRUE TRUE TRUE
## 17 9350012 FALSE TRUE TRUE
## 18 9350100 FALSE FALSE FALSE
## 19 9350101 FALSE TRUE TRUE
## 20 9350102 FALSE TRUE TRUE
## 21 9350103 FALSE FALSE FALSE
## 22 9350003 TRUE TRUE TRUE
## 23 9350126 FALSE TRUE TRUE
## 24 9350127 TRUE TRUE TRUE
## 25 9350104 FALSE TRUE TRUE
## 26 9350110 FALSE TRUE TRUE
## 27 9350128 FALSE TRUE TRUE
## 28 9350131 FALSE TRUE TRUE
## 29 9350120 FALSE FALSE FALSE
## 30 9350122 FALSE TRUE TRUE
## 31 9350124 FALSE TRUE TRUE
## 32 9350008 FALSE TRUE TRUE
## 33 9350152 FALSE TRUE TRUE
## 34 9350153 FALSE TRUE TRUE
## 35 9350123 FALSE TRUE TRUE
## 36 9350003 FALSE TRUE TRUE
## 37 9350129 TRUE TRUE TRUE
## 38 9350129 TRUE TRUE TRUE
## 39 9350151 FALSE TRUE TRUE
## 40 9350111 FALSE TRUE TRUE
## 41 9350111 FALSE TRUE TRUE
## 42 9350121 TRUE TRUE TRUE
## 43 9350121 TRUE TRUE TRUE
## 44 9350121 FALSE TRUE TRUE
## 45 9350123 FALSE TRUE TRUE
## 46 9350160 TRUE TRUE TRUE
## 47 9350155 FALSE TRUE TRUE
## 48 9350132 FALSE TRUE TRUE
## 49 9350150 FALSE TRUE TRUE
## 50 9350156 TRUE TRUE TRUE
## 51 9350160 TRUE TRUE TRUE
## 52 9350180 TRUE TRUE TRUE
## 53 9350160 TRUE TRUE TRUE
## 54 9350013 FALSE FALSE FALSE
## 55 9350013 FALSE TRUE TRUE
## 56 9350121 FALSE TRUE TRUE
## 57 9350130 FALSE TRUE TRUE
## 58 9350130 TRUE TRUE TRUE
## 59 9350180 TRUE TRUE TRUE
## 60 9350150 FALSE TRUE TRUE
## 61 9350150 FALSE TRUE TRUE
## 62 9350156 FALSE TRUE TRUE
## 63 9350180 FALSE TRUE TRUE
## 64 9350160 TRUE TRUE TRUE
## 65 9350154 FALSE TRUE TRUE
## 66 9350132 FALSE TRUE TRUE
## 67 9350125 FALSE TRUE TRUE
## 68 9350125 TRUE TRUE TRUE
## 69 9350014 FALSE TRUE TRUE
## 70 9350014 FALSE TRUE TRUE
## 71 9350155 FALSE TRUE TRUE
## 72 9350156 TRUE TRUE TRUE
## 73 9350154 FALSE TRUE TRUE
## 74 9350132 FALSE FALSE FALSE
## crd.data.P
## 1 2.261406e-01
## 2 4.420142e-01
## 3 7.180899e-01
## 4 2.334103e-01
## 5 3.074582e-01
## 6 5.210101e-01
## 7 4.750021e-01
## 8 2.837031e-02
## 9 5.308945e-01
## 10 8.685342e-02
## 11 4.750455e-01
## 12 1.570455e-01
## 13 5.458633e-01
## 14 5.107375e-01
## 15 8.199606e-02
## 16 7.345023e-01
## 17 9.733708e-02
## 18 1.285403e-02
## 19 1.304483e-01
## 20 2.154524e-01
## 21 4.152837e-06
## 22 4.988036e-01
## 23 3.492045e-01
## 24 5.079581e-01
## 25 3.683532e-01
## 26 4.533336e-01
## 27 2.975394e-01
## 28 4.783970e-01
## 29 1.293172e-02
## 30 3.039285e-01
## 31 3.459478e-01
## 32 1.345539e-01
## 33 4.074318e-01
## 34 1.109438e-01
## 35 2.719039e-01
## 36 4.535514e-01
## 37 4.874694e-01
## 38 5.063970e-01
## 39 4.800783e-01
## 40 3.376654e-01
## 41 4.176214e-01
## 42 7.358870e-01
## 43 5.459417e-01
## 44 4.156923e-01
## 45 3.386306e-01
## 46 6.330460e-01
## 47 4.437802e-01
## 48 4.830158e-02
## 49 4.293015e-02
## 50 5.705152e-01
## 51 9.997601e-01
## 52 8.087169e-01
## 53 5.130749e-01
## 54 2.088465e-03
## 55 1.818399e-01
## 56 4.564007e-01
## 57 4.830978e-01
## 58 5.175418e-01
## 59 9.999335e-01
## 60 4.668919e-01
## 61 2.816086e-01
## 62 4.109161e-01
## 63 2.537380e-01
## 64 6.397812e-01
## 65 2.704451e-01
## 66 9.648044e-02
## 67 2.409547e-01
## 68 5.688973e-01
## 69 1.748125e-01
## 70 3.149088e-02
## 71 1.144759e-01
## 72 5.006994e-01
## 73 1.586106e-01
## 74 2.375602e-02
Note for Variable II: change the file name to reflect the new variable when saving the table results to a csv file.
# write csv file with Lisa Test results for I, E, Var, Z, P
write.csv(data.for.table2, "LisaTest_MdInc.csv", row.names = FALSE)
Interpreting the output: If the value of Z is < 1.96, the value of the Zo null hypothesis will be TRUE, but if the value of Z is gt; 1.96, the Zo value will be FALSE, meaning that with 95% confidence there is a spatial autocorrelation with this particular polygon. If the P.Ho value is also false, the 2-tailed P value is less than 0.05 which helps confirms the Z-value statistic. To determine how the autocorrelation is distributed, if the Ii.Neg is TRUE, the distribution is negatively dispersed, but if the Ii.Ned is FALSE, the distribution is positively clustered.
The MdInc variable analysis had 5 polygons with 95% confidence levels and less than 5% chance of error that they were positively clustered autocorrelation distribution (the polygon GUID 9350100, 9350103, 9350120, 9350013, 9350132 are highlighted on the scatter plot in figure 6a, section XII).
Click for Table of Lisa Test Data for Renter % of Population
## crd.dataR.GUID crd.dataR.Ii crd.dataR.E.Ii crd.dataR.Var.Ii
## 1 9350151 0.233821626 -0.01369863 0.15045712
## 2 9350151 0.032943097 -0.01369863 0.23218304
## 3 9350160 0.549070589 -0.01369863 0.15045712
## 4 9350160 0.503692530 -0.01369863 0.23218304
## 5 9350001 0.060472308 -0.01369863 0.18314749
## 6 9350002 1.556366078 -0.01369863 0.15045712
## 7 9350004 3.391433333 -0.01369863 0.18314749
## 8 9350009 1.553256264 -0.01369863 0.18314749
## 9 9350005 1.140736503 -0.01369863 0.18314749
## 10 9350006 -0.071622535 -0.01369863 0.12710685
## 11 9350007 1.333661488 -0.01369863 0.10959416
## 12 9350133 0.360821928 -0.01369863 0.09597317
## 13 9350170 0.205149532 -0.01369863 0.47736081
## 14 9350171 0.316206463 -0.01369863 0.31390896
## 15 9350010 3.643380638 -0.01369863 0.09597317
## 16 9350011 1.379862981 -0.01369863 0.10959416
## 17 9350012 1.839610501 -0.01369863 0.12710685
## 18 9350100 0.359894929 -0.01369863 0.31390896
## 19 9350101 -0.045775803 -0.01369863 0.18314749
## 20 9350102 -0.018460853 -0.01369863 0.12710685
## 21 9350103 0.640176636 -0.01369863 0.23218304
## 22 9350003 1.897336263 -0.01369863 0.18314749
## 23 9350126 0.098397443 -0.01369863 0.08507638
## 24 9350127 -0.026062633 -0.01369863 0.23218304
## 25 9350104 0.147756323 -0.01369863 0.12710685
## 26 9350110 0.431022202 -0.01369863 0.10959416
## 27 9350128 0.356600204 -0.01369863 0.15045712
## 28 9350131 0.304032189 -0.01369863 0.08507638
## 29 9350120 0.577861864 -0.01369863 0.18314749
## 30 9350122 -0.168322406 -0.01369863 0.18314749
## 31 9350124 -0.001621923 -0.01369863 0.12710685
## 32 9350008 4.474925356 -0.01369863 0.18314749
## 33 9350152 0.064613726 -0.01369863 0.15045712
## 34 9350153 -0.904832954 -0.01369863 0.31390896
## 35 9350123 0.011019807 -0.01369863 0.18314749
## 36 9350003 3.129918431 -0.01369863 0.18314749
## 37 9350129 0.136728114 -0.01369863 0.18314749
## 38 9350129 0.199580888 -0.01369863 0.12710685
## 39 9350151 0.097877099 -0.01369863 0.18314749
## 40 9350111 0.529096109 -0.01369863 0.23218304
## 41 9350111 0.236191920 -0.01369863 0.12710685
## 42 9350121 0.226356428 -0.01369863 0.12710685
## 43 9350121 0.604421772 -0.01369863 0.15045712
## 44 9350121 0.442216620 -0.01369863 0.31390896
## 45 9350123 0.147383189 -0.01369863 0.15045712
## 46 9350160 0.713133316 -0.01369863 0.18314749
## 47 9350155 0.387150645 -0.01369863 0.23218304
## 48 9350132 0.612030700 -0.01369863 0.15045712
## 49 9350150 0.156866528 -0.01369863 0.10959416
## 50 9350156 0.234052890 -0.01369863 0.02964488
## 51 9350160 0.662880165 -0.01369863 0.31390896
## 52 9350180 0.630290771 -0.01369863 0.15045712
## 53 9350160 0.223824443 -0.01369863 0.12710685
## 54 9350013 1.838131703 -0.01369863 0.15045712
## 55 9350013 0.305859518 -0.01369863 0.10959416
## 56 9350121 0.387356307 -0.01369863 0.23218304
## 57 9350130 0.099487976 -0.01369863 0.15045712
## 58 9350130 0.046748357 -0.01369863 0.23218304
## 59 9350180 0.998762311 -0.01369863 0.96771634
## 60 9350150 0.043229883 -0.01369863 0.09597317
## 61 9350150 -0.252114651 -0.01369863 0.23218304
## 62 9350156 0.355471073 -0.01369863 0.47736081
## 63 9350180 0.645739004 -0.01369863 0.47736081
## 64 9350160 0.290854679 -0.01369863 0.96771634
## 65 9350154 0.299887489 -0.01369863 0.15045712
## 66 9350132 0.601525828 -0.01369863 0.15045712
## 67 9350125 0.811880946 -0.01369863 0.23218304
## 68 9350125 -0.037825907 -0.01369863 0.15045712
## 69 9350014 0.151580204 -0.01369863 0.12710685
## 70 9350014 0.262426181 -0.01369863 0.12710685
## 71 9350155 0.584495330 -0.01369863 0.18314749
## 72 9350156 0.157706771 -0.01369863 0.31390896
## 73 9350154 0.104039684 -0.01369863 0.09597317
## 74 9350132 0.768114071 -0.01369863 0.31390896
## crd.dataR.Z.Ii crd.dataR.GUID.1 crd.dataR.Ii.Neg crd.dataR.Z.Ho
## 1 0.63812297 9350151 FALSE TRUE
## 2 0.09679644 9350151 FALSE TRUE
## 3 1.45085485 9350160 FALSE TRUE
## 4 1.07375144 9350160 FALSE TRUE
## 5 0.17331386 9350001 FALSE TRUE
## 6 4.04772671 9350002 FALSE FALSE
## 7 7.95670883 9350004 FALSE FALSE
## 8 3.66147450 9350009 FALSE FALSE
## 9 2.69754721 9350005 FALSE FALSE
## 10 -0.16247006 9350006 TRUE TRUE
## 11 4.06995858 9350007 FALSE FALSE
## 12 1.20892885 9350133 FALSE TRUE
## 13 0.31675212 9350170 FALSE TRUE
## 14 0.58882625 9350171 FALSE TRUE
## 15 11.80482233 9350010 FALSE FALSE
## 16 4.20951901 9350011 FALSE FALSE
## 17 5.19832443 9350012 FALSE FALSE
## 18 0.66680297 9350100 FALSE TRUE
## 19 -0.07495414 9350101 TRUE TRUE
## 20 -0.01335750 9350102 TRUE TRUE
## 21 1.35699942 9350103 FALSE TRUE
## 22 4.46547986 9350003 FALSE FALSE
## 23 0.38431373 9350126 FALSE TRUE
## 24 -0.02565924 9350127 TRUE TRUE
## 25 0.45286305 9350104 FALSE TRUE
## 26 1.34336421 9350110 FALSE TRUE
## 27 0.95465395 9350128 FALSE TRUE
## 28 1.08931840 9350131 FALSE TRUE
## 29 1.38228846 9350120 FALSE TRUE
## 30 -0.36130652 9350122 TRUE TRUE
## 31 0.03387381 9350124 TRUE TRUE
## 32 10.48848459 9350008 FALSE FALSE
## 33 0.20189424 9350152 FALSE TRUE
## 34 -1.59052798 9350153 TRUE TRUE
## 35 0.05775911 9350123 FALSE TRUE
## 36 7.34563180 9350003 FALSE FALSE
## 37 0.35149939 9350129 FALSE TRUE
## 38 0.59822515 9350129 FALSE TRUE
## 39 0.26071694 9350151 FALSE TRUE
## 40 1.12647195 9350111 FALSE TRUE
## 41 0.70091499 9350111 FALSE TRUE
## 42 0.67332754 9350121 FALSE TRUE
## 43 1.59355372 9350121 FALSE TRUE
## 44 0.81373362 9350121 FALSE TRUE
## 45 0.41527918 9350123 FALSE TRUE
## 46 1.69837476 9350160 FALSE TRUE
## 47 0.83188991 9350155 FALSE TRUE
## 48 1.61317003 9350132 FALSE TRUE
## 49 0.51522464 9350150 FALSE TRUE
## 50 1.43893589 9350156 FALSE TRUE
## 51 1.20758171 9350160 FALSE TRUE
## 52 1.66024565 9350180 FALSE TRUE
## 53 0.66622560 9350160 FALSE TRUE
## 54 4.77413642 9350013 FALSE FALSE
## 55 0.96528642 9350013 FALSE TRUE
## 56 0.83231672 9350121 FALSE TRUE
## 57 0.29180227 9350130 FALSE TRUE
## 58 0.12544675 9350130 FALSE TRUE
## 59 1.02921058 9350180 FALSE TRUE
## 60 0.18376167 9350150 FALSE TRUE
## 61 -0.49478917 9350150 TRUE TRUE
## 62 0.53432153 9350156 FALSE TRUE
## 63 0.95444377 9350180 FALSE TRUE
## 64 0.30959168 9350160 FALSE TRUE
## 65 0.80844497 9350154 FALSE TRUE
## 66 1.58608780 9350132 FALSE TRUE
## 67 1.71334055 9350125 FALSE TRUE
## 68 -0.06220166 9350125 TRUE TRUE
## 69 0.46358861 9350014 FALSE TRUE
## 70 0.77449915 9350014 FALSE TRUE
## 71 1.39778875 9350155 FALSE TRUE
## 72 0.30593041 9350156 FALSE TRUE
## 73 0.38005189 9350154 FALSE TRUE
## 74 1.39540689 9350132 FALSE TRUE
## crd.dataR.P.Ho crd.dataR.P
## 1 TRUE 2.616968e-01
## 2 TRUE 4.614440e-01
## 3 TRUE 7.341014e-02
## 4 TRUE 1.414671e-01
## 5 TRUE 4.312024e-01
## 6 FALSE 2.585874e-05
## 7 FALSE 8.833791e-16
## 8 FALSE 1.253839e-04
## 9 FALSE 3.492619e-03
## 10 TRUE 5.645321e-01
## 11 FALSE 2.351075e-05
## 12 TRUE 1.133451e-01
## 13 TRUE 3.757159e-01
## 14 TRUE 2.779889e-01
## 15 FALSE 1.842843e-32
## 16 FALSE 1.279575e-05
## 17 FALSE 1.005465e-07
## 18 TRUE 2.524490e-01
## 19 TRUE 5.298744e-01
## 20 TRUE 5.053287e-01
## 21 TRUE 8.739070e-02
## 22 FALSE 3.994474e-06
## 23 TRUE 3.503730e-01
## 24 TRUE 5.102354e-01
## 25 TRUE 3.253237e-01
## 26 TRUE 8.957703e-02
## 27 TRUE 1.698764e-01
## 28 TRUE 1.380067e-01
## 29 TRUE 8.344157e-02
## 30 TRUE 6.410648e-01
## 31 TRUE 4.864889e-01
## 32 FALSE 4.879043e-26
## 33 TRUE 4.199997e-01
## 34 TRUE 9.441421e-01
## 35 TRUE 4.769703e-01
## 36 FALSE 1.023947e-13
## 37 TRUE 3.626069e-01
## 38 TRUE 2.748449e-01
## 39 TRUE 3.971554e-01
## 40 TRUE 1.299829e-01
## 41 TRUE 2.416780e-01
## 42 TRUE 2.503695e-01
## 43 TRUE 5.551801e-02
## 44 TRUE 2.078988e-01
## 45 TRUE 3.389688e-01
## 46 TRUE 4.471853e-02
## 47 TRUE 2.027355e-01
## 48 TRUE 5.335379e-02
## 49 TRUE 3.031980e-01
## 50 TRUE 7.508434e-02
## 51 TRUE 1.136041e-01
## 52 TRUE 4.843252e-02
## 53 TRUE 2.526335e-01
## 54 FALSE 9.024001e-07
## 55 TRUE 1.672007e-01
## 56 TRUE 2.026151e-01
## 57 TRUE 3.852189e-01
## 58 TRUE 4.500849e-01
## 59 TRUE 1.516904e-01
## 60 TRUE 4.271002e-01
## 61 TRUE 6.896255e-01
## 62 TRUE 2.965595e-01
## 63 TRUE 1.699295e-01
## 64 TRUE 3.784357e-01
## 65 TRUE 2.094172e-01
## 66 TRUE 5.635970e-02
## 67 TRUE 4.332495e-02
## 68 TRUE 5.247989e-01
## 69 TRUE 3.214713e-01
## 70 TRUE 2.193178e-01
## 71 TRUE 8.108826e-02
## 72 TRUE 3.798288e-01
## 73 TRUE 3.519534e-01
## 74 TRUE 8.144659e-02
Interpreting Variable II output: Renter & of population variable analysis had 12 polygons with 95% confidence levels and less than 5% chance of error that they were positively clustered autocorrelation distribution. Though only 10 are considered highly significant enough to be mapped as outliers in the scatter plot of figure 6a in section XII., with the two lowest significant z values of 2.6975 and 3.6614 not included (9350002, 9350004, 9350007, 9350010, 9350011, 9350012, 9350003, 9350008, 9350003, 9350013).
The Local Moran's i test is used to analyze the autocorrelation of each individual polygon, as opposed to the Global Moran's I which treats all polygons together as one.
Note for Variable II: Change the following values tm_polygons(col = "NewColumnName", title= "New Title") and tm_layout(title= "New Title"). For the new variable, the crd.data will reflect the new data column as created in Step IV.6 after removing the na values from the new variable column. When saving a file, the file name needs to be changed to reflect the different variable name.
# Map Lisa
map_LISA <- tm_shape(crd.data) +
tm_polygons(col = "Ii",
title = "Local Moran's i",
style = "jenks",
palette = "viridis", n = 6) +
# add scale bar - first value is TOP, second value is BOTTOM
tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) +
# add compass
tm_compass(position = c("RIGHT", "TOP")) +
tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nLocal Moran's i", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))
# create png of Map LISA MdInc
png("map_LISA_MdInc.png")
map_LISA
dev.off()
Interpreting the output: The Local Moran's i map shows how the values are distributed through the different polygons.
Interpreting Variable II output: The Local Moran's i map shows how the values are distributed through the different polygons, and interestingly seems to have some areas of similar groupings exactly opposite to the Median Income Local Moran's i map.
The Local Moran's i Scatter Plot plots each of the separate polygon Local Moran's i value against the weighted lagged mean value of that polygons neighbours. The plot is created using the census data from the MdInc variable, and the Queen's Neighbourhood Weighted value (crd.lw), ignoring the 0 values of polygons not considered as neighbours.
Note for Variable II: When running the code for the second variable, the new column name will need to replace the item crd.data$MdInc, the xlab="New Variable Label Name", the ylab = "New Variable Label Name" and main = "New VariableName title".
# Scatter Plot
moran.plot(crd.data$MdInc, crd.lw, zero.policy=NULL, spChk=NULL, labels=NULL, xlab="Median 2015 $ Income",
ylab="Spatially Lagged Mean Median 2015 $ Income", quiet=NULL, main ="Local Moran's i Plot for CRD Median 2015 $ Income")
Note for Variable II: Change the saved file name to reflect the new variable. When running the code for the second variable, the new column name will need to replace the item crd.data$MdInc, the xlab="New Variable Label Name", the ylab = "New Variable Label Name", and main = "New VariableName title".
# create png of Moran scatter plot
png("moran.plot_MdInc.png")
moran.plot(crd.data$MdInc, crd.lw, zero.policy=NULL, spChk=NULL, labels=NULL, xlab="Median 2015 $ Income",
ylab="Spatially Lagged Median 2015 $ Income", quiet=NULL, main ="Local Moran's i Plot for CRD Median 2015 $ Income")
dev.off()
Interpreting the output: The scatter plot of the Local Moran's i values are plotted against the weighted lagged means values of the neighbouring polygons. As the weighted lagged means is a value created by the Queen Neighbourhood weighting scheme, the value will depend on how the neighbours are chosen. The centre of the dashed grid lines represent the 0 value, from where the values are placed in relation to each other. The solid trend line from the lower lelft to the upper right signifies a positive autocorrelation, with 5 highly significant outlier values portrayed by the diamond shapes, and related to the 5 points of significance found in the table of section X.3
Click for Variable II: Local Moran's i Plot for Renter % of Population
Interpreting Variable II output: The Local Moran's plot of the lagged mean of renter % of population shows a positive autocorrelation, with 10 points with significantly higher z values, between 4.06 and 11.8 (see table of values in section X.5.ii).
Mapping data classifications with chloropleth maps gives a good vizualization of how groups of polygons are distributed, however, when statistical tests are applied to this data, spatial autocorrelation patterns can be stated with 95% confidence that they are statistically significant, and might warrant further study.
R code can be reused, and modified to accomodate available datasets, and work towards producing reproducible results for specific information needs. However, if more than 2 variables were to be used, an automated variable name replacement coding scheme would be useful, but for first learning these techniques, less complication results from having identifiable variable names to work with until more familiar with how the spatial autocorrelation analysis works in R. This could be the topic of another future R tutorial, if there was enough interest.
Census data not only gives a snapshot of a sample location's household attributes, but when 95% confidence statements can be made about the clustering or dispersal of similar household income levels, or patterns of rental housing, or any other household information that can assist regional and local planners, as well as different levels of government, use this data to allocate funding for specific needs of the population.
Spatial autocorrelation analysis can help make more sense of patterns of similarities and differences in the Capital Regional District, and due to the cyclical nature of census data collection every 5 years, could produce regular studies to look at trends and changes over time.
Click for The Full Code for One Variable
## ----Set_Working_Directory-----------------
## Set working directory
dir <- "/Working/Data"
setwd(dir)
getwd()
## ----Install_Libraries---------------------------------------------------
# remove has# to run code packages need to be installed
# install.packages(c("plyr", "dplyr", "spdep", "GISTools", "raster", "maptools", "rgdal", "tmap", "BAMMtools", "shinyjs"))
## ----Load_Libraries------------------------
library("plyr")
library("dplyr")
library("spdep")
library("GISTools")
library("raster")
library("maptools")
library("rgdal")
library("tmap")
library("BAMMtools")
library("shinyjs")
## ----Clean_Data_Read_shp-------------------------------------------------
tracts <- readOGR(dsn = ".", layer = "Vic_Census")
tracts
## ----Clean_Data_Read_csv-------------------------------------------------
census.16 <- read.csv("CensusTractData.csv")
## ----Summary---------------------------------
summary(census.16)
## ----Find_common_variable---------------------------------------------
tracts@data
## ----Merge_shp_csv-------------------------------------------------------
crd.data <- merge(tracts, census.16, by = "GUID")
## ----Remove_na-----------------------------------------------
crd.data <- crd.data[!is.na(crd.data$MdInc),]
## ----summary_crddata-------------------------
class(crd.data)
summary(crd.data)
## ----Colour_Palette------------------------------------------------------
# remove has# to run code
# tmaptools::palette_explorer()
## ----Make_Chloropleth_map------------------------------------------------
map_CensVar <- tm_shape(crd.data) +
tm_polygons(col = "MdInc",
title = "Median 2015 $ Income",
style = "jenks",
palette = "viridis", n = 6) +
tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) +
tm_compass(position = c("RIGHT", "TOP")) +
tm_layout(title = "CRD Census Tracts Median 2015 $ Income", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))
## ----Create_Map-----------------------------------------------------------
map_CensVar
# create png of Median Income
png("map_CensVar_MdInc.png")
map_CensVar
dev.off()
## ----Defining_Queen_Neighbourhoods---------------------------------------
crd.nb <- poly2nb(crd.data)
crd.net <- nb2lines(crd.nb, coords=coordinates(crd.data))
## ----Map_Queen_Neighbours-------------------------------------------------
map_crd.net <- tm_shape(crd.data) + tm_borders(col='lightgrey') +
tm_shape(crd.net) + tm_lines(col='red') +
# add scale bar - first value is TOP, second value is BOTTOM
tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) +
# add compass
tm_compass(position = c("RIGHT", "TOP")) +
tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nQueen Neighbours", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))
## ----Defining_Rook_Neighbourhoods----------------------------------------
crd.nb2 <- poly2nb(crd.data, queen = FALSE)
crd.net2 <- nb2lines(crd.nb2, coords=coordinates(crd.data))
## ----Mapping_Rook_Neighbours-----------------------------------------------
map_crd.net2 <- tm_shape(crd.data) + tm_borders(col='lightgrey') +
tm_shape(crd.net) + tm_lines(col='red', lwd = 2) +
tm_shape(crd.net2) + tm_lines(col='blue', lwd = 2) +
# add scale bar - first value is TOP, second value is BOTTOM
tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) +
# add compass
tm_compass(position = c("RIGHT", "TOP")) +
tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nRook (blue) & Queen (blue & red) Neighbours", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))
## ----create_Neighbourhood_map---------------------------------------------
map_crd.net
## ----Rook_Neighbour-------------------------------------------------------
map_crd.net2
## ----Map Neighbours--------------------------------------------------------
# create png of Median Income CRD net
png("map_crd.net_MdInc.png")
map_crd.net
dev.off()
# create png of Median Income CRD net 2
png("map_crd.net2_MdInc.png")
map_crd.net2
dev.off()
## ----Weight_Matrix-------------------------------------------------------
crd.lw <- nb2listw(crd.nb, zero.policy = TRUE, style = "W")
print.listw(crd.lw, zero.policy = TRUE)
## ----Calculate_Lag_Means-------------------------------------------------
# Calculate Lag Means
crd.data$IncLagMeans = lag.listw(crd.lw, crd.data$MdInc, zero.policy = TRUE)
map_LagMean <- tm_shape(crd.data) +
tm_polygons(col = "IncLagMeans",
title = "Median 2015 $ Income\nLagged Means",
style = "jenks",
palette = "viridis", n = 6) +
# add scale bar - first value is TOP, second value is BOTTOM
tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) +
# add compass
tm_compass(position = c("RIGHT", "TOP")) +
tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nLagged Means", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))
## ----LagMean_Output------------------------------------------------------
map_LagMean
# create png of map LagMean Median Income CRD net 2
png("map_LagMean_MdInc.png")
map_LagMean
dev.off()
## ----Global morans_I_test--------------------------------------------------
# moran's I test
mi <- moran.test(crd.data$MdInc, crd.lw, zero.policy = TRUE)
mi
moran.range <- function(lw) {
wmat <- listw2mat(lw)
return(range(eigen((wmat + t(wmat))/2)$values))
}
moran.range(crd.lw)
## ----Calculate_Z_value for Global Morans-----------------------------------
## calculate z value
mI <- mi$estimate[[1]]
eI <- mi$estimate[[2]]
var <- mi$estimate[[3]]
z <- (mI - eI) / sqrt(var)
## ----add_data_to_table_object for Global Morans---------------------------
# Add Stats data objects to data.frame ########
data.for.table1 = data.frame(mI, eI, var, z)
data.for.table1
## ----write_csv_Globalmorans----------------------------------------------
# write csv file with Global Moran's results for I, E, Var, Z, P
write.csv(data.for.table1, "moransGlobal_MdInc.csv", row.names = FALSE)
## ----Lisa_Test Local Moran's i--------------------------------------------
# Lisa Test
lisa.test <- localmoran(crd.data$MdInc, crd.lw)
crd.data$Ii <- lisa.test[,1]
crd.data$E.Ii<- lisa.test[,2]
crd.data$Var.Ii<- lisa.test[,3]
crd.data$Z.Ii<- lisa.test[,4]
crd.data$P<- lisa.test[,5]
## ----Calculate_significance_values Local Moran's i-----------------------
# if z value is less than 95% significance level, value is FALSE
crd.data$Z.Ho <- crd.data$Z.Ii < 1.96
# if Ii < 0, value is negative
crd.data$Ii.Neg <- crd.data$Ii < 0
# P-value is one-tailed, need to multiply p-value * 2 to correspond to 95% 2 -tailed if P *2>0.05 - higher chance of wrongly deciding significance
crd.data$Po <- (crd.data$P * 2) > 0.05
## ----data_frame_Localmorans----------------------------------------------
# make a data frome from the Lisa Test Data
data.for.table2 = data.frame(crd.data$Ii, crd.data$E.Ii, crd.data$Var.Ii, crd.data$Z.Ii, crd.data$Ii.Neg, crd.data$Z.Ho, crd.data$Po, crd.data$P)
## ----table2_Lisa_LocalMorans-----------------------------------------------
data.for.table2
## ----Write_csv_Localmorans-----------------------------------------------
# write csv file with Lisa Test results for I, E, Var, Z, P
write.csv(data.for.table2, "LisaTest_MdInc.csv", row.names = FALSE)
## ----Map_Lisa--------------------------------------------------------------
# Map Lisa
map_LISA <- tm_shape(crd.data) +
tm_polygons(col = "Ii",
title = "Local Moran's i",
style = "jenks",
palette = "viridis", n = 6) +
# add scale bar - first value is TOP, second value is BOTTOM
tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) +
# add compass
tm_compass(position = c("RIGHT", "TOP")) +
tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nLocal Moran's i", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))
## ----png_Localmorans-------------------------------------------------------
map_LISA
# create png of Map LISA MdInc
png("map_LISA_MdInc.png")
map_LISA
dev.off()
## ---Scatter_Plot_mdInc-----------------------------------------------------
# Scatter Plot
moran.plot(crd.data$MdInc, crd.lw, zero.policy=NULL, spChk=NULL, labels=NULL, xlab="Median 2015 $ Income",
ylab="Spatially Lagged Mean Median 2015 $ Income", quiet=NULL, main ="Local Moran's i Plot for CRD Median 2015 $ Income")
## ----png_Scatter_Plot_Localmorans----------------------------------------
# create png of Moran scatter plot
png("moran.plot_MdInc.png")
moran.plot(crd.data$MdInc, crd.lw, zero.policy=NULL, spChk=NULL, labels=NULL, xlab="Median 2015 $ Income",
ylab="Spatially Lagged Median 2015 $ Income", quiet=NULL, main ="Local Moran's i Plot for CRD Median 2015 $ Income")
dev.off()
Statistics Canada. (2017). Census Profile, 2016 Census Capital, Regional district, British Columbia. https://www12.statcan.gc.ca/ ↳
Wickham, Hadley.(2016). plyr v1.8.4. https://www.rdocumentation.org/packages/plyr/versions/1.8.4 ↳
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller. (2015). dplyr v.0.8.3. https://dplyr.tidyverse.org/ ↳
RStudio. (2019). dplyr v.0.7.0. https://github.com/rstudio/cheatsheets/blob/master/data-transformation.pdf ↳
Bivand, Roger, et.al. (2019). spdep v.1.1-3. https://cran.r-project.org/web/packages/spdep/index.html ↳
Bivand, Roger, et.al. (2019). Package 'spdep', v.1.1-3. https://cran.r-project.org/web/packages/spdep/spdep.pdf ↳
Brunsdon, Chris, & Hongyan Chen, (2015). Package 'GISTools', v..7-4. https://cran.r-project.org/web/packages/GISTools/GISTools.pdf ↳
Hijmans, Robert J., et.al. (2019). Package ‘raster’, 3.0-7. https://cran.r-project.org/web/packages/raster/raster.pdf ↳
Bivand, Roger, et.al. (2019). Package 'maptools', v.0.9-8. https://cran.r-project.org/web/packages/maptools/maptools.pdf ↳
Bivand, Roger, et.al. (2019). Package 'rgdal', v.1.4-6. https://cran.r-project.org/web/packages/rgdal/rgdal.pdf ↳
Tennekes, Martijn, et.al. (2019). Package 'tmap', v.2.3-1. https://cran.r-project.org/web/packages/tmap/tmap.pdf ↳
Rabosky, Dan, et.al. (2017). Package'BAMMtools', v.2.1.6. https://cran.r-project.org/web/packages/BAMMtools/BAMMtools.pdf ↳
Attali, Dean. (2016). shinyjs. https://deanattali.com/shinyjs/overview ↳
Built with R v.3.6.1 and R Markdown in RStudio v.1.1.463. Exported HTML edited in Sublime Text v.3.0-3143.