Analyzing Spatial Autocorrelation Using CRD 2016 Census Tracts Data

I. Introduction
II. Set Working Directory
III. Install Packages and Load Libraries
IV. Download, Read and Clean Data
V. Map the Data by Making a Chloropleth Map
VI. Defining the Neighbourhoods
VII. Creating a Neighbourhood Weights Matrix
VIII. Mapping the Lagged Means
IX. Global Moran's I Test
X. Local Moran's i Test (Lisa Test)
XI. Mapping the Local Moran's i
XII. Local Moran's i Scatter Plot
XIII. Summary
XIV. The Full Code
XV. References

I. Introduction

Have you ever wondered about how different or similar you are to your neighbours? Do you think any similarities are just a chance occurrence, or might there be a pattern to how people settle in neighbourhoods? Using spatial autocorrelation analysis with census data, can help answer questions like this with statistical significance attached to the results.

Census data is collected every 5 years, in May, with the most recent in 2016, and the next one in 2021, and is collected and maintained by Stats Canada. ^[1]. This data gives regular insights into population estimation and distribution of income, housing types, education level, marital status, and a variety of other socio-cultural-economic variables, sampled from the Canadian public, for over 100 years. The questions were first asked in personal interview, then by paper questionnaire, and is now possible by digital options, with the variety, type, and language tone of the questions fine-tuned over time in consultation with the public.

After gathering the census data necessary for completing this tutorial, and performing some data cleaning operations to make the data available for use, the first variable is chosen for analysis: Household Total Median Income (MdInc). Both a global and local Moran's I test will be performed to determine if any variation between each individual neighbouring polygons is dependent on the distance between them, and where they are all located - is there any spatial autocorrelation?

Though census data contains important descriptive and quantitative statistics about the Canadian population over time, any patterns observed from the numbers will only be meaningful if appropriate statistical analysis is performed, to ensure the results are statistically significant. If we were interested in how variable the socioeconomic patterns in the Victoria Capital Regional District (CRD) are, one useful technique would be to apply Spatial Autocorrelation analysis to determine how different neighbours are from each other, which would be one way to determine if Tobler's law of geography, which states that closer things are generally more similar to each other than to things further away, can be statistically provable.

This R Tutorial will describe both how to use the open-source code R statistical language environment to perform a spatial autocorrelation analysis with one variable, and with a few changes, how it can be reproducible for multiple census variables, one-at-a-time, as well as to produce maps and tables as outputs. Note for Variable II: After each of the following code segments, a note will be added to identify the code segments needed to be changed as necessary for a different variable, with an example of the output available in clickable unfolding 'accordion' section as noted. Note that the code must be run through in its entirety for one variable, before changing to another variable, as unless some key variable names are changed, the same variable names get reused for a new variable. The second variable that will be analyzed is rental households as a percentage of the population of each census tract.

ToC

II. Set Working Directory

Set the working directory to the location the data files are located, finding this by copying the computer path of the shape file location. This is how R will know where the files are when the R code needs to run some data. This is also where your output tables, graphs and map image files will be saved.

dir <- "/Working/Data"
setwd(dir)
getwd()

III. Install Packages and Load Libraries

In order to use R to do spatial autocorrelation analysis, the following packages are needed to run the code. Each package is a tool that performs different functions, and contains a bundle of open-source shareable code, documentation, and data. Links are provided to the package documentation for more information.

plyr is a tool to split, apply and combine data. ^[2]
dplyr is a tool to transform and manipulate data. ^[3]^, ^[4]
spdep uses spatial dependencies to create weighted matrix from contiguous polygons. ^[5]^, ^[6]
GISTools is a set of mapping and spatial data tools, especially for choloropleth maps and legends. ^[7]
raster is a tool to read, write, analyze and model gridded spatial data. ^[8]
maptools manipulate geographic data and spatial objects. ^[9]
rgdal transformations projections for the PROJ.4 library. ^[10]
tmap is a thematic map tool to visualize spatial data. ^[11]
BAMMtools is used for macroevolutionary analysis and visualization, and was suggested for use in this exercise, though I'm not very clear what role it plays. ^[12]
shinyjs allows javascript to be used along with the R code. ^[13]

If any, or all, of these R packages have not been previously installed, the function install.packages("packageName") is used to install each separate package. This step only needs to be done once on a computer, though packages may need updating over time, as due to the nature of open source coding, fixes and new tools have regular updates. If more than one package needs to be installed, this can be done with the concatenation function c() to combine all the packages into a list. Even though packages are already on your computer, they may need updating, and R may prompt with a message to restart R prior to the install - restarting may, or may not be necessary.

install.packages(c("plyr", "dplyr", "spdep", "GISTools", "raster", "maptools", "rgdal", "tmap", "BAMMtools", "shinyjs"))

After the R packages have been installed, their Libraries need to be called in order for the R code to access the package-specific functions. The libraries will need to be recalled with this code if the R console has been closed.

library("plyr")
library("dplyr")
library("spdep")
library("GISTools")
library("raster")
library("maptools")
library("rgdal")
library("tmap")
library("BAMMtools")
library("shinyjs")

ToC

IV. Download, Read and Clean Data

Download the most recent census shape and data files from Statistics Canada (2016) ^[1], and save all the data files in the computer's working directory, as specified in section II.

Read in the data from the census tract boundaries shapefile, using the rgdal package - the "." is shorthand for the working directory, accessed using dsn = , and when using layer = to identify the shapefile, don't include the .shp file extension. Save the data in an object named tracts, which is a spatial polygon dataframe.

tracts <- readOGR(dsn = ".", layer = "Vic_Census")

Type the name of the new data object variable tracts in the R console to produce results shown in the following console results table, including the type of data this object holds, and the number of polygon features, which is 78.

tracts

## class       : SpatialPolygonsDataFrame 
## features    : 78 
## extent      : 3903547, 3974156, 1910542, 1962451  (xmin, xmax, ymin, ymax)
## crs         : +proj=lcc +lat_1=49 +lat_2=77 +lat_0=63.390675 +lon_0=-91.86666666666666 +x_0=6200000 +y_0=3000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## variables   : 2
## names       :      CTUID,       GUID 
## min values  : 9350001.00,    9350001 
## max values  : 9350180.06, 9350180.06

The data from the 2016 census data .csv file is read into R, including the .csv file extension, and assigning the newly created data frame object to a variable named census.16. The metadata file, which is included with the data, will offer insight into the abbreviated names of the 102 different dataset variable columns, when choosing a variable to analyze for spatial autocorrelation.

census.16 <- read.csv("CensusTractData.csv")

The summary statistics of the attributes for each of the variable columns of the Census Tract Data are produced using the function Summary(census.16) and includes the descriptive statistics of minimum, median, mean, maximum, and 1st and 3rd Quartiles, as well as the the number of missing NA values in each column of data, which will be removed from further analysis in step IV.

Click for Summary of Census Tract Data

##       GUID              CUID            Pop             PopCh        
##  Min.   :9350000   Min.   :  0.0   Min.   :    94   Min.   :-60.200  
##  1st Qu.:9350102   1st Qu.:101.5   1st Qu.:  3479   1st Qu.:  1.725  
##  Median :9350128   Median :128.0   Median :  4689   Median :  4.100  
##  Mean   :9350110   Mean   :109.6   Mean   :  9430   Mean   :  4.599  
##  3rd Qu.:9350154   3rd Qu.:154.0   3rd Qu.:  6156   3rd Qu.:  7.275  
##  Max.   :9350180   Max.   :180.1   Max.   :367770   Max.   : 44.200  
##                                    NA's   :1        NA's   :1        
##       PopD             Area            Age0yrs           Age15yrs     
##  Min.   :  21.1   Min.   :  0.220   Min.   :   25.0   Min.   :    60  
##  1st Qu.: 574.3   1st Qu.:  1.350   1st Qu.:  416.2   1st Qu.:  2192  
##  Median :1974.3   Median :  2.490   Median :  552.5   Median :  3108  
##  Mean   :2137.9   Mean   : 17.624   Mean   : 1237.2   Mean   :  6198  
##  3rd Qu.:2877.9   3rd Qu.:  4.975   3rd Qu.:  735.0   3rd Qu.:  4121  
##  Max.   :7582.6   Max.   :696.150   Max.   :48255.0   Max.   :241740  
##  NA's   :1                          NA's   :1         NA's   :1       
##     Age65yrs           AvgAge          Detach            Apart        
##  Min.   :   10.0   Min.   :29.20   Min.   :   25.0   Min.   :    0.0  
##  1st Qu.:  622.5   1st Qu.:40.60   1st Qu.:  482.5   1st Qu.:    0.0  
##  Median :  930.0   Median :43.85   Median :  835.0   Median :    5.0  
##  Mean   : 1993.8   Mean   :43.65   Mean   : 1647.0   Mean   :  266.9  
##  3rd Qu.: 1418.8   3rd Qu.:46.58   3rd Qu.: 1130.0   3rd Qu.:   85.0  
##  Max.   :77770.0   Max.   :56.50   Max.   :64235.0   Max.   :10400.0  
##  NA's   :1         NA's   :1       NA's   :1         NA's   :1        
##      SemiD             Row                Flat             ApartH       
##  Min.   :   0.0   Min.   :    0.00   Min.   :    0.0   Min.   :    0.0  
##  1st Qu.:  25.0   1st Qu.:   21.25   1st Qu.:  136.2   1st Qu.:   55.0  
##  Median :  52.5   Median :   97.50   Median :  302.5   Median :  312.5  
##  Mean   : 154.2   Mean   :  261.99   Mean   :  655.9   Mean   : 1135.6  
##  3rd Qu.: 108.8   3rd Qu.:  215.00   3rd Qu.:  472.5   3rd Qu.:  860.0  
##  Max.   :6015.0   Max.   :10210.00   Max.   :25575.0   Max.   :44280.0  
##  NA's   :1        NA's   :1          NA's   :1         NA's   :1        
##      Attach           Avghhld          OtherA            Wchild       
##  Min.   :  0.000   Min.   :1.500   Min.   :    0.0   Min.   :   15.0  
##  1st Qu.:  0.000   1st Qu.:2.100   1st Qu.:  482.5   1st Qu.:  428.8  
##  Median :  5.000   Median :2.300   Median : 1122.5   Median :  575.0  
##  Mean   :  7.821   Mean   :2.312   Mean   : 2214.8   Mean   : 1244.0  
##  3rd Qu.:  5.000   3rd Qu.:2.575   3rd Qu.: 1757.5   3rd Qu.:  767.5  
##  Max.   :305.000   Max.   :3.700   Max.   :86385.0   Max.   :48515.0  
##  NA's   :1         NA's   :1       NA's   :1         NA's   :1        
##     WTChild           FamSize         Married            ComLaw       
##  Min.   :    5.0   Min.   :2.200   Min.   :    0.0   Min.   :   10.0  
##  1st Qu.:  401.2   1st Qu.:2.600   1st Qu.:  606.2   1st Qu.:  125.0  
##  Median :  612.5   Median :2.700   Median :  865.0   Median :  210.0  
##  Mean   : 1232.4   Mean   :2.685   Mean   : 1778.8   Mean   :  446.1  
##  3rd Qu.:  827.5   3rd Qu.:2.800   3rd Qu.: 1152.5   3rd Qu.:  293.8  
##  Max.   :48045.0   Max.   :3.400   Max.   :69370.0   Max.   :17390.0  
##  NA's   :1         NA's   :1       NA's   :1         NA's   :1        
##     LParentF          LParentM         CWTChild          CWChild       
##  Min.   :    5.0   Min.   :   0.0   Min.   :    5.0   Min.   :   10.0  
##  1st Qu.:   90.0   1st Qu.:  25.0   1st Qu.:  417.5   1st Qu.:  280.0  
##  Median :  137.5   Median :  35.0   Median :  640.0   Median :  455.0  
##  Mean   :  296.6   Mean   :  84.3   Mean   : 1295.1   Mean   :  928.9  
##  3rd Qu.:  202.5   3rd Qu.:  60.0   3rd Qu.:  860.0   3rd Qu.:  577.5  
##  Max.   :11565.0   Max.   :3295.0   Max.   :50520.0   Max.   :36235.0  
##  NA's   :1         NA's   :1        NA's   :1         NA's   :1        
##      Child1            Child2            Child3          Lparent1      
##  Min.   :    5.0   Min.   :    5.0   Min.   :   0.0   Min.   :   5.00  
##  1st Qu.:  140.0   1st Qu.:  116.2   1st Qu.:  35.0   1st Qu.:  76.25  
##  Median :  190.0   Median :  192.5   Median :  55.0   Median : 120.00  
##  Mean   :  414.9   Mean   :  394.5   Mean   : 120.6   Mean   : 256.41  
##  3rd Qu.:  263.8   3rd Qu.:  253.8   3rd Qu.:  80.0   3rd Qu.: 186.25  
##  Max.   :16170.0   Max.   :15365.0   Max.   :4700.0   Max.   :9995.00  
##  NA's   :1         NA's   :1         NA's   :1        NA's   :1        
##     Lparent2       Lparent3          Single          Separated      
##  Min.   :   0   Min.   :  0.00   Min.   :   25.0   Min.   :   0.00  
##  1st Qu.:  30   1st Qu.:  6.25   1st Qu.:  742.5   1st Qu.:  56.25  
##  Median :  40   Median : 10.00   Median : 1085.0   Median :  90.00  
##  Mean   : 100   Mean   : 24.49   Mean   : 2246.7   Mean   : 222.88  
##  3rd Qu.:  65   3rd Qu.: 15.00   3rd Qu.: 1572.5   3rd Qu.: 158.75  
##  Max.   :3910   Max.   :950.00   Max.   :87610.0   Max.   :8710.00  
##  NA's   :1      NA's   :1        NA's   :1         NA's   :1        
##     Divorced          Widowed            MdInc        MDAftTaxInc   
##  Min.   :    0.0   Min.   :    0.0   Min.   :12384   Min.   :12192  
##  1st Qu.:  196.2   1st Qu.:  135.0   1st Qu.:34056   1st Qu.:30667  
##  Median :  302.5   Median :  215.0   Median :37481   Median :33380  
##  Mean   :  671.4   Mean   :  476.2   Mean   :37557   Mean   :33388  
##  3rd Qu.:  473.8   3rd Qu.:  328.8   3rd Qu.:42240   3rd Qu.:37328  
##  Max.   :26195.0   Max.   :18565.0   Max.   :57557   Max.   :48939  
##  NA's   :1         NA's   :1         NA's   :4       NA's   :4      
##     GovTrans          EInc            MInc          LEnglish     
##  Min.   :  720   Min.   :11616   Min.   :11200   Min.   :    90  
##  1st Qu.: 3447   1st Qu.:29162   1st Qu.:31456   1st Qu.:  3329  
##  Median : 5792   Median :31669   Median :34422   Median :  4602  
##  Mean   : 5454   Mean   :31562   Mean   :34638   Mean   :  9089  
##  3rd Qu.: 7248   3rd Qu.:34280   3rd Qu.:38374   3rd Qu.:  5776  
##  Max.   :12079   Max.   :52395   Max.   :53632   Max.   :354470  
##  NA's   :4       NA's   :4       NA's   :4       NA's   :1       
##     LFrench         LAboriginal      LnAboriginal       IAmericas     
##  Min.   :    0.0   Min.   :  0.00   Min.   :    0.0   Min.   :   0.0  
##  1st Qu.:  346.2   1st Qu.:  0.00   1st Qu.:  521.2   1st Qu.:  85.0  
##  Median :  457.5   Median :  0.00   Median :  802.5   Median : 122.5  
##  Mean   :  928.8   Mean   : 17.82   Mean   : 1704.1   Mean   : 247.7  
##  3rd Qu.:  611.2   3rd Qu.: 10.00   3rd Qu.: 1182.5   3rd Qu.: 165.0  
##  Max.   :36220.0   Max.   :715.00   Max.   :66470.0   Max.   :9695.0  
##  NA's   :1         NA's   :1        NA's   :1         NA's   :1       
##     IEurope           IAfrica            IAsia            IOceania      
##  Min.   :    0.0   Min.   :   0.00   Min.   :    0.0   Min.   :   0.00  
##  1st Qu.:  260.0   1st Qu.:  20.00   1st Qu.:  107.5   1st Qu.:  10.00  
##  Median :  365.0   Median :  35.00   Median :  240.0   Median :  15.00  
##  Mean   :  759.1   Mean   :  72.50   Mean   :  567.1   Mean   :  33.01  
##  3rd Qu.:  535.0   3rd Qu.:  53.75   3rd Qu.:  430.0   3rd Qu.:  25.00  
##  Max.   :29660.0   Max.   :2835.00   Max.   :22115.0   Max.   :1305.00  
##  NA's   :1         NA's   :1         NA's   :1         NA's   :1        
##    Ifirstgen          I2ndgen         I3rdgen           EcoIm        
##  Min.   :    0.0   Min.   :    0   Min.   :    85   Min.   :    0.0  
##  1st Qu.:  712.5   1st Qu.:  865   1st Qu.:  1632   1st Qu.:  161.2  
##  Median : 1032.5   Median : 1050   Median :  2498   Median :  245.0  
##  Mean   : 1953.3   Mean   : 2115   Mean   :  5104   Mean   :  499.7  
##  3rd Qu.: 1305.0   3rd Qu.: 1390   3rd Qu.:  3124   3rd Qu.:  353.8  
##  Max.   :76165.0   Max.   :82485   Max.   :199045   Max.   :19515.0  
##  NA's   :1         NA's   :1       NA's   :1        NA's   :1        
##      FamIm             RefIm           IstNations         Metis        
##  Min.   :    0.0   Min.   :   0.00   Min.   :  10.0   Min.   :   0.00  
##  1st Qu.:  125.0   1st Qu.:  15.00   1st Qu.:  35.0   1st Qu.:  31.25  
##  Median :  170.0   Median :  30.00   Median :  97.5   Median :  80.00  
##  Mean   :  356.1   Mean   :  70.96   Mean   : 254.9   Mean   : 167.37  
##  3rd Qu.:  233.8   3rd Qu.:  55.00   3rd Qu.: 176.2   3rd Qu.: 103.75  
##  Max.   :13910.0   Max.   :2770.00   Max.   :9940.0   Max.   :6530.00  
##  NA's   :1         NA's   :1         NA's   :1        NA's   :1        
##       Inuk             Treaty           Ntreaty         SAbAncest     
##  Min.   :  0.000   Min.   :   0.00   Min.   :    10   Min.   :   0.0  
##  1st Qu.:  0.000   1st Qu.:  26.25   1st Qu.:  3315   1st Qu.:  10.0  
##  Median :  0.000   Median :  62.50   Median :  4540   Median :  35.0  
##  Mean   :  3.269   Mean   : 191.54   Mean   :  8979   Mean   : 117.2  
##  3rd Qu.:  0.000   3rd Qu.: 128.75   3rd Qu.:  5736   3rd Qu.:  65.0  
##  Max.   :130.000   Max.   :7490.00   Max.   :350200   Max.   :4560.0  
##  NA's   :1         NA's   :1         NA's   :1        NA's   :1       
##    MAbAncest          SAsian           Chinese            Black        
##  Min.   : 0.000   Min.   :    0.0   Min.   :    0.0   Min.   :   0.00  
##  1st Qu.: 0.000   1st Qu.:   45.0   1st Qu.:   70.0   1st Qu.:  11.25  
##  Median : 0.000   Median :   82.5   Median :  155.0   Median :  35.00  
##  Mean   : 1.859   Mean   :  261.7   Mean   :  418.8   Mean   :  88.78  
##  3rd Qu.: 0.000   3rd Qu.:  176.2   3rd Qu.:  290.0   3rd Qu.:  70.00  
##  Max.   :80.000   Max.   :10220.0   Max.   :16345.0   Max.   :3445.00  
##  NA's   :1        NA's   :1         NA's   :1         NA's   :1        
##    Fillipino        LatinAmer            Arab            SEAsian       
##  Min.   :   0.0   Min.   :   0.00   Min.   :   0.00   Min.   :   0.00  
##  1st Qu.:  15.0   1st Qu.:  11.25   1st Qu.:   0.00   1st Qu.:  10.00  
##  Median :  60.0   Median :  25.00   Median :  10.00   Median :  22.50  
##  Mean   : 155.8   Mean   :  65.13   Mean   :  37.56   Mean   :  65.06  
##  3rd Qu.: 120.0   3rd Qu.:  53.75   3rd Qu.:  28.75   3rd Qu.:  48.75  
##  Max.   :6065.0   Max.   :2570.00   Max.   :1480.00   Max.   :2540.00  
##  NA's   :1        NA's   :1         NA's   :1         NA's   :1        
##      WAsian            Korean           Japanese         Vminority     
##  Min.   :   0.00   Min.   :   0.00   Min.   :   0.00   Min.   :  0.00  
##  1st Qu.:   0.00   1st Qu.:   0.00   1st Qu.:  15.00   1st Qu.:  0.00  
##  Median :  10.00   Median :  20.00   Median :  32.50   Median :  0.00  
##  Mean   :  27.88   Mean   :  53.91   Mean   :  63.59   Mean   : 11.35  
##  3rd Qu.:  20.00   3rd Qu.:  43.75   3rd Qu.:  48.75   3rd Qu.: 10.00  
##  Max.   :1095.00   Max.   :2095.00   Max.   :2485.00   Max.   :470.00  
##  NA's   :1         NA's   :1         NA's   :1         NA's   :1       
##     MVisMin            Owner              Renter             Band        
##  Min.   :   0.00   Min.   :    20.0   Min.   :    0.0   Min.   :  0.000  
##  1st Qu.:  10.00   1st Qu.:   937.5   1st Qu.:  308.8   1st Qu.:  0.000  
##  Median :  20.00   Median :  1277.5   Median :  602.5   Median :  0.000  
##  Mean   :  38.78   Mean   :  2612.4   Mean   : 1557.2   Mean   :  2.821  
##  3rd Qu.:  28.75   3rd Qu.:  1690.0   3rd Qu.: 1071.2   3rd Qu.:  0.000  
##  Max.   :1505.00   Max.   :101885.0   Max.   :60745.0   Max.   :105.000  
##  NA's   :1         NA's   :1          NA's   :1         NA's   :1        
##      NoDeg           Secondary         Psecondary        Employed     
##  Min.   :   25.0   Min.   :   15.0   Min.   :    25   Min.   :    30  
##  1st Qu.:  305.0   1st Qu.:  693.8   1st Qu.:  1841   1st Qu.:  1668  
##  Median :  460.0   Median : 1092.5   Median :  2412   Median :  2360  
##  Mean   :  949.4   Mean   : 2257.8   Mean   :  4730   Mean   :  4804  
##  3rd Qu.:  613.8   3rd Qu.: 1481.2   3rd Qu.:  3091   3rd Qu.:  3131  
##  Max.   :37030.0   Max.   :88060.0   Max.   :184455   Max.   :187335  
##  NA's   :1         NA's   :1         NA's   :1        NA's   :1       
##      Unemp            CarDrive         CarPass           Transit       
##  Min.   :   10.0   Min.   :    15   Min.   :   0.00   Min.   :    0.0  
##  1st Qu.:   95.0   1st Qu.:   885   1st Qu.:  61.25   1st Qu.:  120.0  
##  Median :  135.0   Median :  1282   Median :  95.00   Median :  207.5  
##  Mean   :  285.1   Mean   :  2847   Mean   : 210.00   Mean   :  477.2  
##  3rd Qu.:  197.5   3rd Qu.:  1770   3rd Qu.: 143.75   3rd Qu.:  345.0  
##  Max.   :11125.0   Max.   :111035   Max.   :8180.00   Max.   :18610.0  
##  NA's   :1         NA's   :1        NA's   :1         NA's   :1        
##       Walk              Bike             Other            T15mins       
##  Min.   :    0.0   Min.   :    0.0   Min.   :   0.00   Min.   :   20.0  
##  1st Qu.:   60.0   1st Qu.:   60.0   1st Qu.:  30.00   1st Qu.:  396.2  
##  Median :  150.0   Median :  140.0   Median :  50.00   Median :  642.5  
##  Mean   :  452.4   Mean   :  288.3   Mean   : 105.58   Mean   : 1304.1  
##  3rd Qu.:  278.8   3rd Qu.:  216.2   3rd Qu.:  73.75   3rd Qu.:  906.2  
##  Max.   :17640.0   Max.   :11245.0   Max.   :4120.00   Max.   :50855.0  
##  NA's   :1         NA's   :1         NA's   :1         NA's   :1        
##     T29mins           T44mins           T59mins         Tover69mins     
##  Min.   :   10.0   Min.   :    0.0   Min.   :    0.0   Min.   :   0.00  
##  1st Qu.:  607.5   1st Qu.:  222.5   1st Qu.:   55.0   1st Qu.:  31.25  
##  Median :  912.5   Median :  377.5   Median :   80.0   Median :  60.00  
##  Mean   : 1803.5   Mean   :  807.2   Mean   :  272.2   Mean   : 194.17  
##  3rd Qu.: 1266.2   3rd Qu.:  477.5   3rd Qu.:  133.8   3rd Qu.: 108.75  
##  Max.   :70320.0   Max.   :31485.0   Max.   :10620.0   Max.   :7555.00  
##  NA's   :1         NA's   :1         NA's   :1         NA's   :1        
##       C5am             C6am              C7am              C8am        
##  Min.   :   0.0   Min.   :    0.0   Min.   :    0.0   Min.   :   15.0  
##  1st Qu.:  45.0   1st Qu.:  161.2   1st Qu.:  385.0   1st Qu.:  385.0  
##  Median :  75.0   Median :  297.5   Median :  592.5   Median :  532.5  
##  Mean   : 185.3   Mean   :  708.3   Mean   : 1199.6   Mean   : 1082.6  
##  3rd Qu.: 115.0   3rd Qu.:  467.5   3rd Qu.:  836.2   3rd Qu.:  697.5  
##  Max.   :7220.0   Max.   :27635.0   Max.   :46790.0   Max.   :42215.0  
##  NA's   :1        NA's   :1         NA's   :1         NA's   :1        
##       C9am             C12pm        
##  Min.   :    0.0   Min.   :    0.0  
##  1st Qu.:  201.2   1st Qu.:  201.2  
##  Median :  285.0   Median :  285.0  
##  Mean   :  583.5   Mean   :  621.5  
##  3rd Qu.:  387.5   3rd Qu.:  442.5  
##  Max.   :22740.0   Max.   :24240.0  
##  NA's   :1         NA's   :1

To find the data variable names and data attributes in the boundary shape file, use the @data command.

tracts@data

Click for Census Tracts Boundary Data

##         CTUID    GUID
## 0  9350151.03 9350151
## 1  9350151.04 9350151
## 2  9350160.04 9350160
## 3  9350160.05 9350160
## 4  9350001.00 9350001
## 5  9350002.00 9350002
## 6  9350004.00 9350004
## 7  9350009.00 9350009
## 8  9350005.00 9350005
## 9  9350006.00 9350006
## 10 9350007.00 9350007
## 11 9350133.00 9350133
## 12 9350170.00 9350170
## 13 9350171.00 9350171
## 14 9350010.00 9350010
## 15 9350011.00 9350011
## 16 9350012.00 9350012
## 17 9350100.00 9350100
## 18 9350101.00 9350101
## 19 9350102.00 9350102
## 20 9350103.00 9350103
## 21 9350003.01 9350003
## 22 9350126.00 9350126
## 23 9350127.00 9350127
## 24 9350104.00 9350104
## 25 9350110.00 9350110
## 26 9350128.00 9350128
## 27 9350131.00 9350131
## 28 9350120.00 9350120
## 29 9350122.00 9350122
## 30 9350124.00 9350124
## 31 9350008.00 9350008
## 32 9350152.00 9350152
## 33 9350153.00 9350153
## 34 9350123.02 9350123
## 35 9350003.02 9350003
## 36 9350129.01 9350129
## 37 9350129.02 9350129
## 38 9350151.02 9350151
## 39 9350111.01 9350111
## 40 9350111.02 9350111
## 41 9350121.01 9350121
## 42 9350121.02 9350121
## 43 9350121.03 9350121
## 44 9350123.01 9350123
## 45 9350160.09 9350160
## 46 9350155.03 9350155
## 47 9350156.05 9350156
## 48 9350155.04 9350155
## 49 9350150.03 9350150
## 50 9350132.04 9350132
## 51 9350150.02 9350150
## 52 9350156.04 9350156
## 53 9350160.08 9350160
## 54 9350180.05 9350180
## 55 9350180.04 9350180
## 56 9350160.07 9350160
## 57 9350013.01 9350013
## 58 9350013.02 9350013
## 59 9350121.04 9350121
## 60 9350130.01 9350130
## 61 9350130.02 9350130
## 62 9350180.03 9350180
## 63 9350150.05 9350150
## 64 9350150.04 9350150
## 65 9350156.06 9350156
## 66 9350180.06 9350180
## 67 9350160.06 9350160
## 68 9350154.02 9350154
## 69 9350132.01 9350132
## 70 9350125.01 9350125
## 71 9350125.02 9350125
## 72 9350014.01 9350014
## 73 9350014.02 9350014
## 74 9350155.01 9350155
## 75 9350156.01 9350156
## 76 9350154.01 9350154
## 77 9350132.03 9350132

As long as both data files has the same number of rows, the common column, Geographic ID GUID is used to merge the census tract boundary shape file with the census data file. This will position the numerical census data spatially in the corresponding census tract polygon, in order to be able to map this variable in the next step. Assign this merged data to the variable name crd.data, which will be used in further code segments.

crd.data <- merge(tracts, census.16, by = "GUID")

To specify a column to use from the dataset, a $ dollar sign is placed between the data set name and the column name. In this tutorial example, the variable chosen is Median Income MdInc, based on the 2015 total income, which is already a normalized average value, and needs no further treatment before use in analysis.

Any rows with missing ‘na’ values will need to be removed from analysis of the newly merged data, as it has no numerical value, and allows no calculations to be done. Summary() showed that MdInc has 4 rows with na values out of 78 polygon feature rows. The !is.na() is used to specify which data in crd.data is not na. Once the dataset has the rows containing a na value removed, the remaining data is reassigned back to the crd.data data frame. The crd.data object now has 74 polygon features to analyze.

crd.data <- crd.data[!is.na(crd.data$MdInc),]

Note for Variable II: The na values will also be needed to removed from each different variable. To choose a different column from the census data, replace the column name after the $ crd.data$NewColumnName. Also, in order to compare any numerical count census variable, like the Renter variable, between different polygons, the variable first needs to be ‘normalized’, by dividing each count of Renters in a polygon by the population of that polygon (e.g. $Renter / $Pop). A new column called $RenterNorm will be created with the % of population that is a renter in each polygon, and this new variable will now be used for the analysis. These % values can now be compared across different polygons.

ToC

V. Map the Data by Making a Chloropleth Map

A chloropleth classification map will be created using the tmap package, and in particular the tm_polygon() function is used to style the map and the legend, and add a variety of tools like a scale and a compass.

To select a colour palette to create the chloropleth map use tmaptools::palette_explorer(), after first removing the comment # from the code. The # hashtag symbol is used to make a line of code a ‘comment’, and the code will not be run without first removing the # from the beginning of the line. Leaving the # will ensure that this code won’t be run if not necessary, as the palette explorer window must be closed in order for R to continue processing the code. To generate the colour code needed for the map palette, check the Code Generator button for tmap layer function code, then copy/paste the code found under the colour palette into the tm_polygon code.

# tmaptools::palette_explorer()

The shape of the polygons comes from the merged crd.data file.
The polygons are classified by colour based on the values in the MdInc column (col="MdInc").
The title is used for the classification legend.
The classification style Jenks uses natural breaks to set up the best classification of variable values, based on the MdInc column data itself. n=6 specifies 6 different classification breaks.
A scale bar and compass can be added in specific positions, specified by the first value as RIGHT / LEFT and the second value as TOP/BOTTOM (note these positions must be capitalized).
The layout function adds a title to the map, in a specific postion, using the same terminology. To make room for north arrow and scale bar, place inner.margins around map and image boundary (bottom, left, top, right).

Note for Variable II: For the new variable, the crd.data will reflect the new data column as created in Step IV.7 after removing the na values from the new variable column. Change the following values tm_polygons(col = "NewColumnName", title= "New Title"). If comparing results from different census variables, each would need to use the same classification scheme and number of breaks.

map_CensVar <- tm_shape(crd.data) + 
  tm_polygons(col = "MdInc", 
              title = "Median 2015 $ Income", 
              style = "jenks", 
              palette = "viridis", n = 6) + 
  
  tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) + 
  
  tm_compass(position = c("RIGHT", "TOP")) + 
  
  tm_layout(title = "CRD Census Tracts Median $ 2015 Income", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))

To display the map in the R window, run the variable name map_CensVar. In order to save the map, the file object png is called using a filename to save, call the plot, then close the graphic device connection with dev.off() in order to return back to the R code console.

map_CensVar

$\label{fig:figs}Figure 1a: CRD Census Map of Median 2015 $ Income$

Figure 1a: CRD Census Map of Median 2015 $ Income

Note for Variable II: Change the following values name of png file to be saved "newName.png".

# create png of Median Income 
png("map_CensVar_MdInc.png")  
map_CensVar
dev.off()

Interpreting the output of the MdInc Classification Map: The finished map shows the 2015 median household income range for the CRD census tracts. The darkest blue colours correspond to the lowest median income range of $12,384 - $12,704, and the yellow colour the highest range of $47,872 - $57,557. Some polygons of similar income ranges appear to grouped closer together in the $37,931-$42,261 and $42,262-$47,872 ranges. Looking at the autocorrelation statistics in further code segments will determine whether these groupings are significant. Note that the largest area to the south includes islands in the Juan de Fuca Strait, and is actually mostly water.

Click for Variable II: Chloropleth Map for Renter % of Population

$\label{fig:figs}Figure 1b: CRD 2016 Census Map of Renter % of Population$

Figure 1b: CRD 2016 Census Map of Renter % of Population

Interpreting Variable II Renter % of Population output: The finished map shows the 2016 range of the Renters % of Population in each polygon for the CRD census tracts. The darkest blue colours correspond to the lowest % of renters range of 1.8 % - 7.1 %, and the yellow colour the highest % of renters range of 35.9 % - 44.1 % . Some polygons of similar % of renters appear to grouped closer together in many of the ranges. Looking at the autocorrelation statistics in further code segments will determine whether these groupings are significant. Note again that the largest area to the south includes islands in the Juan de Fuca Strait, and is actually mostly water.

ToC

VI. Defining the Neighbourhoods

Neighbourhoods are defined by weighting schemes to determine which polygons with adjoining boundaries are considered neighbours. The Rook scheme selects neighbouring polygons with shared boundaries in only 4 directions: top, bottom, and both sides, giving these neighbours a value of 1, and all other shared boundary neighbours a value of 0. The Queen Neighbour weighting scheme selects the same neighbours as the Rook scheme, as well as all the diagonally adjoining polygons, giving them all a value of 1.

Using the spdep package, a list of neighbours is built from polygon regions whose boundaries are continguous, using the function poly2nb().

The function nb2lines() stores spatial coordinates as line vectors, the data form that is needed to diagram the weights.

To define the neighbourhood as Rook, use queen - FALSE.

Note for Variable II: The crd.data will reflect the new data column as created in Step IV. after removing the na values from the new variable column. The tm_layout value for title will also need to be changed for the new variable.

Defining Queen Neighbours

crd.nb <- poly2nb(crd.data)
crd.net <- nb2lines(crd.nb, coords=coordinates(crd.data))

Map Queen Neighbours (default weight scheme - see figure 2a).

map_crd.net <- tm_shape(crd.data) + tm_borders(col='lightgrey') + 
  tm_shape(crd.net) + tm_lines(col='red') +
  # add scale bar - first value is TOP, second value is BOTTOM
  tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) + 
  # add compass
  tm_compass(position = c("RIGHT", "TOP")) + 
  tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nQueen Neighbours", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))

Defining Rook Neighbours

crd.nb2 <- poly2nb(crd.data, queen = FALSE)
crd.net2 <- nb2lines(crd.nb2, coords=coordinates(crd.data))

Mapping Rook Neighbours (see figure 3a)

map_crd.net2 <- tm_shape(crd.data) + tm_borders(col='lightgrey') + 
  tm_shape(crd.net2) + tm_lines(col='blue', lwd = 2) + 
  # add scale bar - first value is TOP, second value is BOTTOM
  tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) + 
  # add compass
  tm_compass(position = c("RIGHT", "TOP")) + 
  tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nRook Neighbours", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))

Mapping Rook & Queen Neighbours (see figure 4a)

map_crd.net3 <- tm_shape(crd.data) + tm_borders(col='lightgrey') + 
  tm_shape(crd.net) + tm_lines(col='red', lwd = 2) +
  tm_shape(crd.net2) + tm_lines(col='blue', lwd = 2) + 
  # add scale bar - first value is TOP, second value is BOTTOM
  tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) + 
  # add compass
  tm_compass(position = c("RIGHT", "TOP")) + 
  tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nRook (blue) & Queen (blue & red) Neighbours", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))

Create Neighbourhood map for Queen's weight.

map_crd.net

$\label{fig:figs}Figure 2a: CRD Census Tracts Median 2015 $ Income Map of Queen Neighbours$

Figure 2a: CRD Census Tracts Median 2015 $ Income Map of Queen Neighbours

Create Neighbourhood map for Rook’s weight

map_crd.net2

$\label{fig:figs}Figure 3a: CRD Census Tracts Median 2015 $ Income Map of Rook Neighbours$

Figure 3a: CRD Census Tracts Median 2015 $ Income Map of Rook Neighbours

Create Neighbourhood map for Rook’s weight plus Queens weight

map_crd.net3

$\label{fig:figs}Figure 4a: CRD Census Tracts Median 2015 $ Income Map of Rook & Queen Neighbours$

Figure 4a: CRD Census Tracts Median 2015 $ Income Map of Rook & Queen Neighbours

Create a png image file for the Queen Neighbourhoods (crd.net), and the Rooks Neighbourhoods (crd.net2) weighting schemes.

Note for Variable II: change the file name to reflect the new variable when saving the file.

# create png of Median Income CRD net
png("map_crd.net_MdInc.png")  
map_crd.net
dev.off()

# create png of Median Income CRD net 2
png("map_crd.net2_MdInc.png")  
map_crd.net2
dev.off()

# create png of Median Income CRD net 3
png("map_crd.net3_MdInc.png")  
map_crd.net3
dev.off()

Interpreting the output of Rook & Queen Neighbours for MdInc: As you can see when comparing the maps of both Rook and Queen Neighbour maps, the Queen Neighbour weighing scheme includes all the Rook neighbours plus the extra diagonal polygons. When you compare the output maps for the 2 different variable, they are identical - the polygon boundaries have not changed, just the individual polygon data statistics have.

# Defining Neighbourhoods
crd.nbR <- poly2nb(crd.dataR)
crd.netR <- nb2lines(crd.nbR, coords=coordinates(crd.dataR))

Click for Variable II: Queen Neighbours Map for Renter % of Population

$\label{fig:figs}Figure 2b: CRD 2016 Census Tracts Renter % of Population Map of Queen Neighbours$

Figure 2b: CRD 2016 Census Tracts Renter % of Population Map of Queen Neighbours

crd.nbR2 <- poly2nb(crd.dataR, queen = FALSE)
crd.netR2 <- nb2lines(crd.nbR2, coords=coordinates(crd.dataR))

Click for Variable II: Rook Neighbours Map for Renter % of Population

$\label{fig:figs}Figure 4b: CRD 2016 Census Tracts Renter % of Population Map of Rook Neighbours$

Figure 4b: CRD 2016 Census Tracts Renter % of Population Map of Rook Neighbours

Click for Variable II: Rook & Queen Neighbours Map for Renter % of Population

$\label{fig:figs}Figure 3c: CRD 2016 Census Tracts Renter % of Population Map of Rook & Queen Neighbours$

Figure 3c: CRD 2016 Census Tracts Renter % of Population Map of Rook & Queen Neighbours

ToC

VII. Creating a Neighbourhood Weights Matrix

Using the spdep package, the function nb2listw uses the neighbours list crd.nb previously created in step VI, for the Queen and Rook Neighbours weighting scheme (we will be using the result from the Queen Neighbour weighting scheme), and adds the weight to create a new variable crd.lw, which will be used to map the Lagged Mean in section VIII.

crd.lw <- nb2listw(crd.nb, zero.policy = TRUE, style = "W")
print.listw(crd.lw, zero.policy = TRUE)

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 74 
## Number of nonzero links: 424 
## Percentage nonzero weights: 7.742878 
## Average number of links: 5.72973 
## 
## Weights style: W 
## Weights constants summary:
##    n   nn S0       S1       S2
## W 74 5476 74 29.46931 322.5924

Interpreting the output for MdInc Neighbourhood Weights Matrix: The output summary table shows the statistics for the weights list object, using only non-zero weight values. The weights list notes 424 links, with an average of 5.73 links for each polygon. The same weight matrix will be used regardless of the variable used.

Click for Variable II: Weight Matrix for Renter % of Population

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 74 
## Number of nonzero links: 424 
## Percentage nonzero weights: 7.742878 
## Average number of links: 5.72973 
## 
## Weights style: W 
## Weights constants summary:
##    n   nn S0       S1       S2
## W 74 5476 74 29.46931 322.5924

Interpreting Variable II Renters % of Population Neighbourhood Weights Matrix output: As in the Defining the Neighbourhoods section, all polygons, will have the same shape and number of neighbours regardless of variable attribute used to analyze, thus the results of the Neighbourhood Weights Matrix for both variables will be the same.

ToC

VIII. Mapping the Lagged Means

The lagged means calculates the sum of the value of the weighted mean of the median income value of each of the neighbouring polygons identified in steps VI and VII, and creates a new column $IncLagMeans with these values, using the function lag.listw() from the spdep package. The weighting for each identified neighbour is done by dividing the median income value by the number of neighbours identified in each neighbourhood weighting scheme, thus a Rook's 4 neighbours would be weighted 1/4 of their median income value, and a Queen's 8 neighbours would be weighted 1/8 of their median income value.

The lagged means map visualizes the sum of the weighted value of all the identified neighbours, after the neighbours receive their proportionate weights of their polygon's median income in the previous step. The Jenks classification scheme colors are based on the values of this new column, using the tmap package.

Note for Variable II: The column name needs to be changed to the different variable name. The title for the legend and tm_layout will also need to be changed to the new variable name.

# Calculate Lag Means
crd.data$IncLagMeans = lag.listw(crd.lw, crd.data$MdInc, zero.policy = TRUE)

map_LagMean <- tm_shape(crd.data) + 
  tm_polygons(col = "IncLagMeans", 
              title = "Median 2015 $ Income\nLagged Means", 
              style = "jenks", 
              palette = "viridis", n = 6) +
  # add scale bar - first value is TOP, second value is BOTTOM
  tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) + 
  # add compass
  tm_compass(position = c("RIGHT", "TOP")) + 
  tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nLagged Means", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))

map_LagMean

$\label{fig:figs}Figure 4a: CRD Census Map of Median 2015 $ Income Lagged Means$

Figure 4a: CRD Census Map of Median 2015 $ Income Lagged Means

# create png of map LagMean Median Income CRD net 2
png("map_LagMean_MdInc.png")  
map_LagMean
dev.off()

Interpreting the output of Lagged Means Map for Md Inc: The largest area of same values is in the $40,203 - $42,617 range, with a 3 small isolated areas of yellow, the highest income range $42,617 - $45,824, all in areas of the most expensive housing units. The lagged means is the weighted average of the neighbouring polygons, not the polygons themselves, which is shown on the chloropleth map.

Click for Variable II: Lagged Means Map for Renter % of Population

$\label{fig:figs}Figure 4b: CRD Census Map of Renter % Population Lagged Means$

Figure 4b: CRD Census Map of Renter % Population Lagged Means

Interpreting Variable II Rental % of Population output: The lagged weighted means for Rental % of Population of the neighbours of the colored polygons show many areas of similar population proportion grouped together, which supports the high values of statistical significance for spatial autocorrelation.

ToC

IX. Global Moran's I Test

Global Moran's I Tests to see if the spatial distribution of a group of polygons is any different from that of a random distribution, in order to test the autocorrelation of how different or similar the data in one polygon is to itself. This global test creates one statistical I value for every polygon as if they were all one polygon, and all the data is summarized by this one statistic. The Global Moran's I test uses the crd.lw variable created in the Neighbourhood Weights Matrix in section VII, to calculate the I statistic, the expected I statistic, the Variance, as well as the z value and p-value (which needs to be double to compliment the 95% confidence 2-tailed test).

A calculation is done to test first the random Global Moran's I value expected range of -0.8193252 to 1.0821054.

# moran's I test
mi <- moran.test(crd.data$MdInc, crd.lw, zero.policy = TRUE)
mi

##  Moran I test under randomisation
## 
## data:  crd.data$MdInc  
## weights: crd.lw    
## 
## Moran I statistic standard deviate = 2.2665, p-value = 0.01171
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.143170949      -0.013698630       0.004790316

moran.range <- function(lw) {
  wmat <- listw2mat(lw)
  return(range(eigen((wmat + t(wmat))/2)$values))
}
moran.range(crd.lw)

## Global Moran's I Range for MdInc
## [1] -0.8193252  1.0821054

In order to test the significance of this I value, the z value is calculated using the values of the Global Moran's I, the estimated value of Global Moran's I, and the Variable. The z value is Global Moran's I less the Estimated Global Moran's I, divided by the square root of the variance.

## calculate z value
mI <- mi$estimate[[1]]
eI <- mi$estimate[[2]]
var <- mi$estimate[[3]]
z <- (mI - eI) / sqrt(var)

These data variables are used to create a table of values.

# Add Stats data objects to data.frame ########
data.for.table1 = data.frame(mI, eI, var, z, )
data.for.table1

##          mI          eI         var        z
## 1 0.1431709 -0.01369863 0.004790316 2.266505

Interpreting the MdInc output for Global Moran's I: Deciding on a level of confidence of 95%, a z-value of 1.96 was chosen to determine if the Global Moran's I value is significantly greater than would be expected. As the calculated z-value was 2.266505 which is greater than 1.96, the Global Moran's I value for autocorrelation is significant, and this is confirmed by the 2-tailed p-value of 0.02342 which is less than 0.05 chance of error of making the wrong decision about the significance, so the distribution can be now be evaluated to determine if it is dipsersed or clustered by comparing the value of the measured Global Moran's mI value to the estimated Global Moran's eI value. As mIis > than eI, there is a positive clustered autocorrelation.

Click for Variable II: Global Moran's I Table for Renter % of Population

## Global Moran's I Range for Renter % of Population
## [1] -0.8193252  1.0821054

##        mIR         eIR        varR      z_R   p-value
## 1 0.595416 -0.01369863 0.004916714 8.686834   2.2e-16

Interpreting Variable II Renters % of Population output for Global Moran's I: The z value of the Renter % of Population variable is a very high 8.686834, and the 2-tailed p-value is a very low 4.4e-16 chance of this being in error, and the mIR is less than the eIR, so the autocorrelation is negative and dispersed.

The table values for the Global Moran's I can be saved in a csv file. Note for Variable II: change the file name to reflect the new variable.

# write csv file with Global Moran's results for I, E, Var, Z, P
write.csv(data.for.table1, "moransGlobal_MdInc.csv", row.names = FALSE)

ToC

X. Local Moran's i Test (Lisa Test)

The Local Moran's i test is used for a local scale indicator of spatial association (Lisa), by calculating separate i values for every polygon in order to test how spatially autocorrelated each polygon is, if the distribution is clustered or dispersed, and to determine if there are any outliers. If the local moran's i value is positive, the polygon will have a similar value to its neighbours, and be considered clustered. If the local moran's i value is negative, the polygon will have a different value to its neighbours, and be considered dispersed.

The lisa test creates 5 different variables for local moran's i, the local moran's i, the Expected i, the Variance of i, the Z value, and a p value. These values are assigned to new columns and given new column names.

Note for Variable II: Change the new variable column name after the $, $NewColumnName.

# Lisa Test
lisa.test <- localmoran(crd.data$MdInc, crd.lw)

crd.data$Ii <- lisa.test[,1]
crd.data$E.Ii<- lisa.test[,2]
crd.data$Var.Ii<- lisa.test[,3]
crd.data$Z.Ii<- lisa.test[,4]
crd.data$P<- lisa.test[,5]

Calculating the statistical significance values of autocorrelation is done by targeting a z-value of 1.96 as the 95% significance level, where if the z-value is less than 1.96, the null hypothesis holds true that there is no difference between the measured local Moran's i value and the expected randomly distributed local Moran's i value. However if the z-value is greater than 1.96, then the alternate hypothesis holds true that with a 95% level of confidence, there is a significant spatial autocorrelation. Using a 2-tailed distribution, if the Ii value is less than the Ei value, then it would be a dispersed negative spatial autocorrelation distribution, and if the Ii value is greater than the Ei value, it would be a clustered positive spatial autocorrelation distribution. The p-value is the probability of making a wrong decision about the 95% significance level, and is calculated as 100% - 95% = 5%, so if the p-value is less than 0.05 it confirms the 95% confidence limit with only 5% or less chance of being wrong. The p-value calculated in the R lisa test is a one-tailed test, so the value would need to be doubled to be used as a reliable test for error.

# if z value is less than 95% significance level, value is FALSE
crd.data$Z.Ho <- crd.data$Z.Ii < 1.96
# if Ii < 0, value is negative
crd.data$Ii.Neg <- crd.data$Ii < 0
# P-value is one-tailed, need to multiply p-value *2 to correspond to 95% 2 -tailed if P *2>0.05 - higher chance of wrongly deciding significance
crd.data$Po <- (crd.data$P * 2) > 0.05

Add the data value variables from step X.3 to a data frame to create a new table of values for the Local Moran’s Ii. The GUID is included to make it easier to identify specific polygons of interest. The table is available to view by clicking the drop-down accordion button below.

# make a data frome from the Lisa Test Data
data.for.table2 = data.frame(crd.data$GUID, crd.data$Ii, crd.data$E.Ii, crd.data$Var.Ii, crd.data$Z.Ii, crd.data$GUID, crd.data$Ii.Neg, crd.data$Z.Ho, crd.data$P.Ho, crd.data$P)

Click for Table of Lisa Test Data for Median 2015 $ Income

##    crd.data.GUID   crd.data.Ii crd.data.E.Ii crd.data.Var.Ii crd.data.Z.Ii
## 1        9350151  0.2744151305   -0.01369863      0.14693811   0.751617311
## 2        9350151  0.0557309861   -0.01369863      0.22656398   0.145864407
## 3        9350160 -0.2349448907   -0.01369863      0.14693811  -0.577176596
## 4        9350160  0.3326591723   -0.01369863      0.22656398   0.727661741
## 5        9350001  0.1990156088   -0.01369863      0.17878846   0.503068148
## 6        9350002 -0.0338955771   -0.01369863      0.14693811  -0.052688823
## 7        9350004  0.0128136506   -0.01369863      0.17878846   0.062701416
## 8        9350009  0.7919282302   -0.01369863      0.17878846   1.905303636
## 9        9350005 -0.0464760893   -0.01369863      0.17878846  -0.077518533
## 10       9350006  0.4657065327   -0.01369863      0.12418786   1.360389058
## 11       9350007  0.0067879072   -0.01369863      0.10712518   0.062592570
## 12       9350133  0.2947025748   -0.01369863      0.09385420   1.006674841
## 13       9350170 -0.0923032394   -0.01369863      0.46544159  -0.115216696
## 14       9350171 -0.0285935895   -0.01369863      0.30618985  -0.026918070
## 15       9350010  0.4126788509   -0.01369863      0.09385420   1.391769798
## 16       9350011 -0.2187477999   -0.01369863      0.10712518  -0.626487255
## 17       9350012  0.4433240328   -0.01369863      0.12418786   1.296875124
## 18       9350100  1.2205874502   -0.01369863      0.30618985   2.230593514
## 19       9350101  0.4616826274   -0.01369863      0.17878846   1.124274380
## 20       9350102  0.2638695873   -0.01369863      0.12418786   0.787644346
## 21       9350103  2.1078481863   -0.01369863      0.22656398   4.457149340
## 22       9350003 -0.0124306146   -0.01369863      0.17878846   0.002998851
## 23       9350126  0.0980896658   -0.01369863      0.08323742   0.387469072
## 24       9350127 -0.0231942136   -0.01369863      0.22656398  -0.019949234
## 25       9350104  0.1047856106   -0.01369863      0.12418786   0.336218041
## 26       9350110  0.0246750827   -0.01369863      0.10712518   0.117243303
## 27       9350128  0.1900350829   -0.01369863      0.14693811   0.531490704
## 28       9350131  0.0019320024   -0.01369863      0.08323742   0.054177288
## 29       9350120  0.9284834191   -0.01369863      0.17878846   2.228255999
## 30       9350122  0.2032721811   -0.01369863      0.17878846   0.513134921
## 31       9350124  0.1259530137   -0.01369863      0.12418786   0.396283943
## 32       9350008  0.4535832667   -0.01369863      0.17878846   1.105119431
## 33       9350152  0.0760594107   -0.01369863      0.14693811   0.234156457
## 34       9350153  0.6622247407   -0.01369863      0.30618985   1.221524176
## 35       9350123  0.2429889826   -0.01369863      0.17878846   0.607064965
## 36       9350003  0.0356433844   -0.01369863      0.17878846   0.116693626
## 37       9350129 -0.0004154355   -0.01369863      0.17878846   0.031414691
## 38       9350129 -0.0193496543   -0.01369863      0.12418786  -0.016035688
## 39       9350151  0.0074249521   -0.01369863      0.17878846   0.049957170
## 40       9350111  0.1856654215   -0.01369863      0.22656398   0.418843150
## 41       9350111  0.0595949852   -0.01369863      0.12418786   0.207982392
## 42       9350121 -0.2359649540   -0.01369863      0.12418786  -0.630716351
## 43       9350121 -0.0579399062   -0.01369863      0.14693811  -0.115414512
## 44       9350121  0.1041228751   -0.01369863      0.30618985   0.212926233
## 45       9350123  0.1458425508   -0.01369863      0.14693811   0.416203354
## 46       9350160 -0.1574332175   -0.01369863      0.17878846  -0.339931605
## 47       9350155  0.0536021511   -0.01369863      0.22656398   0.141391946
## 48       9350132  0.6232149028   -0.01369863      0.14693811   1.661549369
## 49       9350150  0.5484882380   -0.01369863      0.10712518   1.717650982
## 50       9350156 -0.0440773555   -0.01369863      0.02923030  -0.177685867
## 51       9350160 -1.9458534186   -0.01369863      0.30618985  -3.491777155
## 52       9350180 -0.3484095342   -0.01369863      0.14693811  -0.873177696
## 53       9350160 -0.0252503561   -0.01369863      0.12418786  -0.032779876
## 54       9350013  1.0843288469   -0.01369863      0.14693811   2.864481232
## 55       9350013  0.2836125755   -0.01369863      0.10712518   0.908375690
## 56       9350121  0.0384246510   -0.01369863      0.22656398   0.109505596
## 57       9350130  0.0025467591   -0.01369863      0.14693811   0.042380189
## 58       9350130 -0.0346348845   -0.01369863      0.22656398  -0.043984894
## 59       9350180 -3.7243153785   -0.01369863      0.94319681  -3.820717593
## 60       9350150  0.0117550365   -0.01369863      0.09385420   0.083085168
## 61       9350150  0.2614551227   -0.01369863      0.22656398   0.578069434
## 62       9350156  0.1399328018   -0.01369863      0.46544159   0.225189161
## 63       9350180  0.4384667710   -0.01369863      0.46544159   0.662772886
## 64       9350160 -0.3612599654   -0.01369863      0.94319681  -0.357874121
## 65       9350154  0.2206921127   -0.01369863      0.14693811   0.611467288
## 66       9350132  0.4853407163   -0.01369863      0.14693811   1.301869827
## 67       9350125  0.3210323517   -0.01369863      0.22656398   0.703235000
## 68       9350125 -0.0802313849   -0.01369863      0.14693811  -0.173567448
## 69       9350014  0.3159098132   -0.01369863      0.12418786   0.935316835
## 70       9350014  0.6415313555   -0.01369863      0.12418786   1.859320199
## 71       9350155  0.4949972138   -0.01369863      0.17878846   1.203063216
## 72       9350156 -0.0146686830   -0.01369863      0.30618985  -0.001753073
## 73       9350154  0.2927142780   -0.01369863      0.09385420   1.000184696
## 74       9350132  1.0828675345   -0.01369863      0.30618985   1.981707007
##    crd.data.GUID.1 crd.data.Ii.Neg crd.data.Z.Ho crd.data.P.Ho
## 1          9350151           FALSE          TRUE          TRUE
## 2          9350151           FALSE          TRUE          TRUE
## 3          9350160            TRUE          TRUE          TRUE
## 4          9350160           FALSE          TRUE          TRUE
## 5          9350001           FALSE          TRUE          TRUE
## 6          9350002            TRUE          TRUE          TRUE
## 7          9350004           FALSE          TRUE          TRUE
## 8          9350009           FALSE          TRUE          TRUE
## 9          9350005            TRUE          TRUE          TRUE
## 10         9350006           FALSE          TRUE          TRUE
## 11         9350007           FALSE          TRUE          TRUE
## 12         9350133           FALSE          TRUE          TRUE
## 13         9350170            TRUE          TRUE          TRUE
## 14         9350171            TRUE          TRUE          TRUE
## 15         9350010           FALSE          TRUE          TRUE
## 16         9350011            TRUE          TRUE          TRUE
## 17         9350012           FALSE          TRUE          TRUE
## 18         9350100           FALSE         FALSE         FALSE
## 19         9350101           FALSE          TRUE          TRUE
## 20         9350102           FALSE          TRUE          TRUE
## 21         9350103           FALSE         FALSE         FALSE
## 22         9350003            TRUE          TRUE          TRUE
## 23         9350126           FALSE          TRUE          TRUE
## 24         9350127            TRUE          TRUE          TRUE
## 25         9350104           FALSE          TRUE          TRUE
## 26         9350110           FALSE          TRUE          TRUE
## 27         9350128           FALSE          TRUE          TRUE
## 28         9350131           FALSE          TRUE          TRUE
## 29         9350120           FALSE         FALSE         FALSE
## 30         9350122           FALSE          TRUE          TRUE
## 31         9350124           FALSE          TRUE          TRUE
## 32         9350008           FALSE          TRUE          TRUE
## 33         9350152           FALSE          TRUE          TRUE
## 34         9350153           FALSE          TRUE          TRUE
## 35         9350123           FALSE          TRUE          TRUE
## 36         9350003           FALSE          TRUE          TRUE
## 37         9350129            TRUE          TRUE          TRUE
## 38         9350129            TRUE          TRUE          TRUE
## 39         9350151           FALSE          TRUE          TRUE
## 40         9350111           FALSE          TRUE          TRUE
## 41         9350111           FALSE          TRUE          TRUE
## 42         9350121            TRUE          TRUE          TRUE
## 43         9350121            TRUE          TRUE          TRUE
## 44         9350121           FALSE          TRUE          TRUE
## 45         9350123           FALSE          TRUE          TRUE
## 46         9350160            TRUE          TRUE          TRUE
## 47         9350155           FALSE          TRUE          TRUE
## 48         9350132           FALSE          TRUE          TRUE
## 49         9350150           FALSE          TRUE          TRUE
## 50         9350156            TRUE          TRUE          TRUE
## 51         9350160            TRUE          TRUE          TRUE
## 52         9350180            TRUE          TRUE          TRUE
## 53         9350160            TRUE          TRUE          TRUE
## 54         9350013           FALSE         FALSE         FALSE
## 55         9350013           FALSE          TRUE          TRUE
## 56         9350121           FALSE          TRUE          TRUE
## 57         9350130           FALSE          TRUE          TRUE
## 58         9350130            TRUE          TRUE          TRUE
## 59         9350180            TRUE          TRUE          TRUE
## 60         9350150           FALSE          TRUE          TRUE
## 61         9350150           FALSE          TRUE          TRUE
## 62         9350156           FALSE          TRUE          TRUE
## 63         9350180           FALSE          TRUE          TRUE
## 64         9350160            TRUE          TRUE          TRUE
## 65         9350154           FALSE          TRUE          TRUE
## 66         9350132           FALSE          TRUE          TRUE
## 67         9350125           FALSE          TRUE          TRUE
## 68         9350125            TRUE          TRUE          TRUE
## 69         9350014           FALSE          TRUE          TRUE
## 70         9350014           FALSE          TRUE          TRUE
## 71         9350155           FALSE          TRUE          TRUE
## 72         9350156            TRUE          TRUE          TRUE
## 73         9350154           FALSE          TRUE          TRUE
## 74         9350132           FALSE         FALSE         FALSE
##      crd.data.P
## 1  2.261406e-01
## 2  4.420142e-01
## 3  7.180899e-01
## 4  2.334103e-01
## 5  3.074582e-01
## 6  5.210101e-01
## 7  4.750021e-01
## 8  2.837031e-02
## 9  5.308945e-01
## 10 8.685342e-02
## 11 4.750455e-01
## 12 1.570455e-01
## 13 5.458633e-01
## 14 5.107375e-01
## 15 8.199606e-02
## 16 7.345023e-01
## 17 9.733708e-02
## 18 1.285403e-02
## 19 1.304483e-01
## 20 2.154524e-01
## 21 4.152837e-06
## 22 4.988036e-01
## 23 3.492045e-01
## 24 5.079581e-01
## 25 3.683532e-01
## 26 4.533336e-01
## 27 2.975394e-01
## 28 4.783970e-01
## 29 1.293172e-02
## 30 3.039285e-01
## 31 3.459478e-01
## 32 1.345539e-01
## 33 4.074318e-01
## 34 1.109438e-01
## 35 2.719039e-01
## 36 4.535514e-01
## 37 4.874694e-01
## 38 5.063970e-01
## 39 4.800783e-01
## 40 3.376654e-01
## 41 4.176214e-01
## 42 7.358870e-01
## 43 5.459417e-01
## 44 4.156923e-01
## 45 3.386306e-01
## 46 6.330460e-01
## 47 4.437802e-01
## 48 4.830158e-02
## 49 4.293015e-02
## 50 5.705152e-01
## 51 9.997601e-01
## 52 8.087169e-01
## 53 5.130749e-01
## 54 2.088465e-03
## 55 1.818399e-01
## 56 4.564007e-01
## 57 4.830978e-01
## 58 5.175418e-01
## 59 9.999335e-01
## 60 4.668919e-01
## 61 2.816086e-01
## 62 4.109161e-01
## 63 2.537380e-01
## 64 6.397812e-01
## 65 2.704451e-01
## 66 9.648044e-02
## 67 2.409547e-01
## 68 5.688973e-01
## 69 1.748125e-01
## 70 3.149088e-02
## 71 1.144759e-01
## 72 5.006994e-01
## 73 1.586106e-01
## 74 2.375602e-02

Note for Variable II: change the file name to reflect the new variable when saving the table results to a csv file.

# write csv file with Lisa Test results for I, E, Var, Z, P
write.csv(data.for.table2, "LisaTest_MdInc.csv", row.names = FALSE)

Interpreting the output: If the value of Z is less than 1.96, the value of the Z.Ho null hypothesis will be TRUE, but if the value of Z is greater than 1.96, the Z.Ho value will be FALSE, meaning that with 95% confidence there is a spatial autocorrelation with this particular polygon. If the P.Ho value is also false, the 2-tailed P value is less than 0.05 which helps confirms the Z-value statistic. To determine how the autocorrelation is distributed, if the Ii.Neg is TRUE, the distribution is negatively dispersed, but if the Ii.Neg is FALSE, the distribution is positively clustered.

The MdInc variable analysis had 5 polygons of spatial autocorrelation, with 95% confidence levels and less than 5% chance of error, that have a positively clustered autocorrelation distribution (the polygon GUID 9350100, 9350103, 9350120, 9350013, 9350132 are highlighted on the scatter plot in figure 6a, section XII).

Click for Table of Lisa Test Data for Renter % of Population

##    crd.dataR.GUID crd.dataR.Ii crd.dataR.E.Ii crd.dataR.Var.Ii
## 1         9350151  0.233821626    -0.01369863       0.15045712
## 2         9350151  0.032943097    -0.01369863       0.23218304
## 3         9350160  0.549070589    -0.01369863       0.15045712
## 4         9350160  0.503692530    -0.01369863       0.23218304
## 5         9350001  0.060472308    -0.01369863       0.18314749
## 6         9350002  1.556366078    -0.01369863       0.15045712
## 7         9350004  3.391433333    -0.01369863       0.18314749
## 8         9350009  1.553256264    -0.01369863       0.18314749
## 9         9350005  1.140736503    -0.01369863       0.18314749
## 10        9350006 -0.071622535    -0.01369863       0.12710685
## 11        9350007  1.333661488    -0.01369863       0.10959416
## 12        9350133  0.360821928    -0.01369863       0.09597317
## 13        9350170  0.205149532    -0.01369863       0.47736081
## 14        9350171  0.316206463    -0.01369863       0.31390896
## 15        9350010  3.643380638    -0.01369863       0.09597317
## 16        9350011  1.379862981    -0.01369863       0.10959416
## 17        9350012  1.839610501    -0.01369863       0.12710685
## 18        9350100  0.359894929    -0.01369863       0.31390896
## 19        9350101 -0.045775803    -0.01369863       0.18314749
## 20        9350102 -0.018460853    -0.01369863       0.12710685
## 21        9350103  0.640176636    -0.01369863       0.23218304
## 22        9350003  1.897336263    -0.01369863       0.18314749
## 23        9350126  0.098397443    -0.01369863       0.08507638
## 24        9350127 -0.026062633    -0.01369863       0.23218304
## 25        9350104  0.147756323    -0.01369863       0.12710685
## 26        9350110  0.431022202    -0.01369863       0.10959416
## 27        9350128  0.356600204    -0.01369863       0.15045712
## 28        9350131  0.304032189    -0.01369863       0.08507638
## 29        9350120  0.577861864    -0.01369863       0.18314749
## 30        9350122 -0.168322406    -0.01369863       0.18314749
## 31        9350124 -0.001621923    -0.01369863       0.12710685
## 32        9350008  4.474925356    -0.01369863       0.18314749
## 33        9350152  0.064613726    -0.01369863       0.15045712
## 34        9350153 -0.904832954    -0.01369863       0.31390896
## 35        9350123  0.011019807    -0.01369863       0.18314749
## 36        9350003  3.129918431    -0.01369863       0.18314749
## 37        9350129  0.136728114    -0.01369863       0.18314749
## 38        9350129  0.199580888    -0.01369863       0.12710685
## 39        9350151  0.097877099    -0.01369863       0.18314749
## 40        9350111  0.529096109    -0.01369863       0.23218304
## 41        9350111  0.236191920    -0.01369863       0.12710685
## 42        9350121  0.226356428    -0.01369863       0.12710685
## 43        9350121  0.604421772    -0.01369863       0.15045712
## 44        9350121  0.442216620    -0.01369863       0.31390896
## 45        9350123  0.147383189    -0.01369863       0.15045712
## 46        9350160  0.713133316    -0.01369863       0.18314749
## 47        9350155  0.387150645    -0.01369863       0.23218304
## 48        9350132  0.612030700    -0.01369863       0.15045712
## 49        9350150  0.156866528    -0.01369863       0.10959416
## 50        9350156  0.234052890    -0.01369863       0.02964488
## 51        9350160  0.662880165    -0.01369863       0.31390896
## 52        9350180  0.630290771    -0.01369863       0.15045712
## 53        9350160  0.223824443    -0.01369863       0.12710685
## 54        9350013  1.838131703    -0.01369863       0.15045712
## 55        9350013  0.305859518    -0.01369863       0.10959416
## 56        9350121  0.387356307    -0.01369863       0.23218304
## 57        9350130  0.099487976    -0.01369863       0.15045712
## 58        9350130  0.046748357    -0.01369863       0.23218304
## 59        9350180  0.998762311    -0.01369863       0.96771634
## 60        9350150  0.043229883    -0.01369863       0.09597317
## 61        9350150 -0.252114651    -0.01369863       0.23218304
## 62        9350156  0.355471073    -0.01369863       0.47736081
## 63        9350180  0.645739004    -0.01369863       0.47736081
## 64        9350160  0.290854679    -0.01369863       0.96771634
## 65        9350154  0.299887489    -0.01369863       0.15045712
## 66        9350132  0.601525828    -0.01369863       0.15045712
## 67        9350125  0.811880946    -0.01369863       0.23218304
## 68        9350125 -0.037825907    -0.01369863       0.15045712
## 69        9350014  0.151580204    -0.01369863       0.12710685
## 70        9350014  0.262426181    -0.01369863       0.12710685
## 71        9350155  0.584495330    -0.01369863       0.18314749
## 72        9350156  0.157706771    -0.01369863       0.31390896
## 73        9350154  0.104039684    -0.01369863       0.09597317
## 74        9350132  0.768114071    -0.01369863       0.31390896
##    crd.dataR.Z.Ii crd.dataR.GUID.1 crd.dataR.Ii.Neg crd.dataR.Z.Ho
## 1      0.63812297          9350151            FALSE           TRUE
## 2      0.09679644          9350151            FALSE           TRUE
## 3      1.45085485          9350160            FALSE           TRUE
## 4      1.07375144          9350160            FALSE           TRUE
## 5      0.17331386          9350001            FALSE           TRUE
## 6      4.04772671          9350002            FALSE          FALSE
## 7      7.95670883          9350004            FALSE          FALSE
## 8      3.66147450          9350009            FALSE          FALSE
## 9      2.69754721          9350005            FALSE          FALSE
## 10    -0.16247006          9350006             TRUE           TRUE
## 11     4.06995858          9350007            FALSE          FALSE
## 12     1.20892885          9350133            FALSE           TRUE
## 13     0.31675212          9350170            FALSE           TRUE
## 14     0.58882625          9350171            FALSE           TRUE
## 15    11.80482233          9350010            FALSE          FALSE
## 16     4.20951901          9350011            FALSE          FALSE
## 17     5.19832443          9350012            FALSE          FALSE
## 18     0.66680297          9350100            FALSE           TRUE
## 19    -0.07495414          9350101             TRUE           TRUE
## 20    -0.01335750          9350102             TRUE           TRUE
## 21     1.35699942          9350103            FALSE           TRUE
## 22     4.46547986          9350003            FALSE          FALSE
## 23     0.38431373          9350126            FALSE           TRUE
## 24    -0.02565924          9350127             TRUE           TRUE
## 25     0.45286305          9350104            FALSE           TRUE
## 26     1.34336421          9350110            FALSE           TRUE
## 27     0.95465395          9350128            FALSE           TRUE
## 28     1.08931840          9350131            FALSE           TRUE
## 29     1.38228846          9350120            FALSE           TRUE
## 30    -0.36130652          9350122             TRUE           TRUE
## 31     0.03387381          9350124             TRUE           TRUE
## 32    10.48848459          9350008            FALSE          FALSE
## 33     0.20189424          9350152            FALSE           TRUE
## 34    -1.59052798          9350153             TRUE           TRUE
## 35     0.05775911          9350123            FALSE           TRUE
## 36     7.34563180          9350003            FALSE          FALSE
## 37     0.35149939          9350129            FALSE           TRUE
## 38     0.59822515          9350129            FALSE           TRUE
## 39     0.26071694          9350151            FALSE           TRUE
## 40     1.12647195          9350111            FALSE           TRUE
## 41     0.70091499          9350111            FALSE           TRUE
## 42     0.67332754          9350121            FALSE           TRUE
## 43     1.59355372          9350121            FALSE           TRUE
## 44     0.81373362          9350121            FALSE           TRUE
## 45     0.41527918          9350123            FALSE           TRUE
## 46     1.69837476          9350160            FALSE           TRUE
## 47     0.83188991          9350155            FALSE           TRUE
## 48     1.61317003          9350132            FALSE           TRUE
## 49     0.51522464          9350150            FALSE           TRUE
## 50     1.43893589          9350156            FALSE           TRUE
## 51     1.20758171          9350160            FALSE           TRUE
## 52     1.66024565          9350180            FALSE           TRUE
## 53     0.66622560          9350160            FALSE           TRUE
## 54     4.77413642          9350013            FALSE          FALSE
## 55     0.96528642          9350013            FALSE           TRUE
## 56     0.83231672          9350121            FALSE           TRUE
## 57     0.29180227          9350130            FALSE           TRUE
## 58     0.12544675          9350130            FALSE           TRUE
## 59     1.02921058          9350180            FALSE           TRUE
## 60     0.18376167          9350150            FALSE           TRUE
## 61    -0.49478917          9350150             TRUE           TRUE
## 62     0.53432153          9350156            FALSE           TRUE
## 63     0.95444377          9350180            FALSE           TRUE
## 64     0.30959168          9350160            FALSE           TRUE
## 65     0.80844497          9350154            FALSE           TRUE
## 66     1.58608780          9350132            FALSE           TRUE
## 67     1.71334055          9350125            FALSE           TRUE
## 68    -0.06220166          9350125             TRUE           TRUE
## 69     0.46358861          9350014            FALSE           TRUE
## 70     0.77449915          9350014            FALSE           TRUE
## 71     1.39778875          9350155            FALSE           TRUE
## 72     0.30593041          9350156            FALSE           TRUE
## 73     0.38005189          9350154            FALSE           TRUE
## 74     1.39540689          9350132            FALSE           TRUE
##    crd.dataR.P.Ho  crd.dataR.P
## 1            TRUE 2.616968e-01
## 2            TRUE 4.614440e-01
## 3            TRUE 7.341014e-02
## 4            TRUE 1.414671e-01
## 5            TRUE 4.312024e-01
## 6           FALSE 2.585874e-05
## 7           FALSE 8.833791e-16
## 8           FALSE 1.253839e-04
## 9           FALSE 3.492619e-03
## 10           TRUE 5.645321e-01
## 11          FALSE 2.351075e-05
## 12           TRUE 1.133451e-01
## 13           TRUE 3.757159e-01
## 14           TRUE 2.779889e-01
## 15          FALSE 1.842843e-32
## 16          FALSE 1.279575e-05
## 17          FALSE 1.005465e-07
## 18           TRUE 2.524490e-01
## 19           TRUE 5.298744e-01
## 20           TRUE 5.053287e-01
## 21           TRUE 8.739070e-02
## 22          FALSE 3.994474e-06
## 23           TRUE 3.503730e-01
## 24           TRUE 5.102354e-01
## 25           TRUE 3.253237e-01
## 26           TRUE 8.957703e-02
## 27           TRUE 1.698764e-01
## 28           TRUE 1.380067e-01
## 29           TRUE 8.344157e-02
## 30           TRUE 6.410648e-01
## 31           TRUE 4.864889e-01
## 32          FALSE 4.879043e-26
## 33           TRUE 4.199997e-01
## 34           TRUE 9.441421e-01
## 35           TRUE 4.769703e-01
## 36          FALSE 1.023947e-13
## 37           TRUE 3.626069e-01
## 38           TRUE 2.748449e-01
## 39           TRUE 3.971554e-01
## 40           TRUE 1.299829e-01
## 41           TRUE 2.416780e-01
## 42           TRUE 2.503695e-01
## 43           TRUE 5.551801e-02
## 44           TRUE 2.078988e-01
## 45           TRUE 3.389688e-01
## 46           TRUE 4.471853e-02
## 47           TRUE 2.027355e-01
## 48           TRUE 5.335379e-02
## 49           TRUE 3.031980e-01
## 50           TRUE 7.508434e-02
## 51           TRUE 1.136041e-01
## 52           TRUE 4.843252e-02
## 53           TRUE 2.526335e-01
## 54          FALSE 9.024001e-07
## 55           TRUE 1.672007e-01
## 56           TRUE 2.026151e-01
## 57           TRUE 3.852189e-01
## 58           TRUE 4.500849e-01
## 59           TRUE 1.516904e-01
## 60           TRUE 4.271002e-01
## 61           TRUE 6.896255e-01
## 62           TRUE 2.965595e-01
## 63           TRUE 1.699295e-01
## 64           TRUE 3.784357e-01
## 65           TRUE 2.094172e-01
## 66           TRUE 5.635970e-02
## 67           TRUE 4.332495e-02
## 68           TRUE 5.247989e-01
## 69           TRUE 3.214713e-01
## 70           TRUE 2.193178e-01
## 71           TRUE 8.108826e-02
## 72           TRUE 3.798288e-01
## 73           TRUE 3.519534e-01
## 74           TRUE 8.144659e-02

Interpreting Variable II output: Renter % of population variable analysis had 12 polygons with 95% confidence levels and less than 5% chance of error that they were positively clustered autocorrelation distribution. Though only 10 are considered highly significant enough to be mapped as outliers in the scatter plot of figure 6a in section XII., with the two lowest significant z values of 2.6975 and 3.6614 not included (9350002, 9350004, 9350007, 9350010, 9350011, 9350012, 9350003, 9350008, 9350003, 9350013).

ToC

XI. Mapping the Local Moran's i

The Local Moran's i test is used to analyze the autocorrelation of each individual polygon, as opposed to the Global Moran's I which treats all polygons together as one.

Note for Variable II: Change the following values tm_polygons(col = "NewColumnName", title= "New Title") and tm_layout(title= "New Title"). For the new variable, the crd.data will reflect the new data column as created in Step IV. after removing the na values from the new variable column. When saving a file, the file name needs to be changed to reflect the different variable name.

# Map Lisa
map_LISA <- tm_shape(crd.data) + 
  tm_polygons(col = "Ii", 
              title = "Local Moran's i", 
              style = "jenks", 
              palette = "viridis", n = 6)  + 
  # add scale bar - first value is TOP, second value is BOTTOM
  tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) + 
  # add compass
  tm_compass(position = c("RIGHT", "TOP")) + 
  tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nLocal Moran's i", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))

$\label{fig:figs}Figure 5a: CRD Census Map of Median 2015 $ Income Local Moran's i$

Figure 5a: CRD Census Map of Median 2015 $ Income Local Moran’s i

# create png of Map LISA MdInc
png("map_LISA_MdInc.png")  
map_LISA
dev.off()

Interpreting the output: The Local Moran's i map shows how the values are distributed through the different polygons.

Click to View Local Moran’s i Map for Renter % of Population

$\label{fig:figs}Figure 5b: CRD Census Map of Renter % Population Local Moran's i$

Figure 5b: CRD Census Map of Renter % Population Local Moran’s i

Interpreting Variable II output: The Local Moran's i map shows how the values are distributed through the different polygons, and interestingly seems to have some areas of similar groupings exactly opposite to the Median Income Local Moran's i map.

ToC

XII. Local Moran's i Scatter Plot

The Local Moran's i Scatter Plot plots each of the separate polygon Local Moran's i value against the weighted lagged mean value of that polygons neighbours. The plot is created using the census data from the MdInc variable, and the Queen's Neighbourhood Weighted value (crd.lw), ignoring the 0 values of polygons not considered as neighbours.

Note for Variable II: When running the code for the second variable, the new column name will need to replace the item crd.data$MdInc, the xlab="New Variable Label Name", the ylab = "New Variable Label Name" and main = "New VariableName title".

# Scatter Plot
moran.plot(crd.data$MdInc, crd.lw, zero.policy=NULL, spChk=NULL, labels=NULL, xlab="Median 2015 $ Income", 
           ylab="Spatially Lagged Mean Median 2015 $ Income", quiet=NULL, main ="Local Moran's i Plot for CRD Median 2015 $ Income")

$\label{fig:figs}Figure 6a: CRD Census Plot of Median 2015 $ Income Local Moran's i$

Figure 6a: CRD Census Plot of Median 2015 $ Income Local Moran’s i

Note for Variable II: Change the saved file name to reflect the new variable. When running the code for the second variable, the new column name will need to replace the item crd.data$MdInc, the xlab="New Variable Label Name", the ylab = "New Variable Label Name", and main = "New VariableName title".

# create png of Moran scatter plot
png("moran.plot_MdInc.png")  
moran.plot(crd.data$MdInc, crd.lw, zero.policy=NULL, spChk=NULL, labels=NULL, xlab="Median 2015 $ Income", 
           ylab="Spatially Lagged Median 2015 $ Income", quiet=NULL, main ="Local Moran's i Plot for CRD Median 2015 $ Income")
dev.off()

Interpreting the output: The scatter plot of the Local Moran's i values are plotted against the weighted lagged means values of the neighbouring polygons. As the weighted lagged means is a value created by the Queen Neighbourhood weighting scheme, the value will depend on how the neighbours are chosen. The centre of the dashed grid lines represent the 0 value, from where the values are placed in relation to each other. The solid trend line from the lower lelft to the upper right signifies a positive autocorrelation, with 5 highly significant outlier values portrayed by the diamond shapes, and related to the 5 points of significance found in the table of section X.3

Click for Variable II: Local Moran's i Plot for Renter % of Population

$\label{fig:figs}Figure 6b: CRD 2016 Census Plot of Renter % Population Local Moran's i$

Figure 6b: CRD 2016 Census Plot of Renter % Population Local Moran's i

Interpreting Variable II output: The Local Moran's plot of the lagged mean of renter % of population shows a positive autocorrelation, with 10 points with significantly higher z values, between 4.06 and 11.8 (see table of values in section X.).

ToC

XIII. Summary

Mapping data classifications with chloropleth maps provides a good vizualization of how groups of polygons are distributed, however, when statistical tests are applied to this data, spatial autocorrelation patterns can be stated with 95% confidence that they are statistically significant, and might warrant further study.

R code can be reused, and modified to accomodate available datasets, and work towards producing reproducible results for specific information needs. However, if more than 2 variables were to be used, an automated variable name replacement coding scheme would be useful, but for first learning these techniques, less complication results from having identifiable variable names to work with until more familiar with how the spatial autocorrelation analysis works in R. This could be the topic of another R tutorial, if there was enough future interest.

Census data not only gives a snapshot of a sample location's household attributes, but when 95% confidence statements can be made about the clustering or dispersal of similar household income levels, or patterns of rental housing, or any other household attributes, this information can assist regional and local planners, as well as different levels of government, use this data to allocate funding for specific needs of the population.

Spatial autocorrelation analysis can help make more sense of patterns of similarities and differences in the Capital Regional District, and due to the cyclical nature of census data collection every 5 years, could produce regular studies to look at trends and changes over time.

XIV. The Full Code

Click for The Full Code for One Variable

## ----Set_Working_Directory-----------------
## Set working directory
dir <- "/Working/Data"
setwd(dir)
getwd()


## ----Install_Libraries---------------------------------------------------
# remove has# to run code packages need to be installed
# install.packages(c("plyr", "dplyr", "spdep", "GISTools", "raster", "maptools", "rgdal", "tmap", "BAMMtools", "shinyjs"))


## ----Load_Libraries------------------------
library("plyr")
library("dplyr")
library("spdep")
library("GISTools")
library("raster")
library("maptools")
library("rgdal")
library("tmap")
library("BAMMtools")
library("shinyjs")


## ----Clean_Data_Read_shp-------------------------------------------------
tracts <- readOGR(dsn = ".", layer = "Vic_Census")
tracts


## ----Clean_Data_Read_csv-------------------------------------------------
census.16 <- read.csv("CensusTractData.csv")


## ----Summary---------------------------------
summary(census.16)


## ----Find_common_variable---------------------------------------------
tracts@data


## ----Merge_shp_csv-------------------------------------------------------
crd.data <- merge(tracts, census.16, by = "GUID")


## ----Remove_na-----------------------------------------------
crd.data <- crd.data[!is.na(crd.data$MdInc),]


## ----summary_crddata-------------------------
class(crd.data)
summary(crd.data)


## ----Colour_Palette------------------------------------------------------
# remove has# to run code
# tmaptools::palette_explorer()


## ----Make_Chloropleth_map------------------------------------------------
map_CensVar <- tm_shape(crd.data) + 
  tm_polygons(col = "MdInc", 
              title = "Median 2015 $ Income", 
              style = "jenks", 
              palette = "viridis", n = 6) + 
  
  tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) + 
  
  tm_compass(position = c("RIGHT", "TOP")) + 
  
  tm_layout(title = "CRD Census Tracts Median 2015 $ Income", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))


## ----Create_Map-----------------------------------------------------------
map_CensVar

# create png of Median Income 
png("map_CensVar_MdInc.png")  
map_CensVar
dev.off()


## ----Defining_Queen_Neighbourhoods---------------------------------------
crd.nb <- poly2nb(crd.data)
crd.net <- nb2lines(crd.nb, coords=coordinates(crd.data))


## ----Map_Queen_Neighbours-------------------------------------------------
map_crd.net <- tm_shape(crd.data) + tm_borders(col='lightgrey') + 
  tm_shape(crd.net) + tm_lines(col='red') +
  # add scale bar - first value is TOP, second value is BOTTOM
  tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) + 
  # add compass
  tm_compass(position = c("RIGHT", "TOP")) + 
  tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nQueen Neighbours", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))


## ----Defining_Rook_Neighbourhoods----------------------------------------
crd.nb2 <- poly2nb(crd.data, queen = FALSE)
crd.net2 <- nb2lines(crd.nb2, coords=coordinates(crd.data))


## ----Mapping_Rook_Neighbours-----------------------------------------------
map_crd.net2 <- tm_shape(crd.data) + tm_borders(col='lightgrey') + 
  tm_shape(crd.net) + tm_lines(col='red', lwd = 2) +
  tm_shape(crd.net2) + tm_lines(col='blue', lwd = 2) + 
  # add scale bar - first value is TOP, second value is BOTTOM
  tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) + 
  # add compass
  tm_compass(position = c("RIGHT", "TOP")) + 
  tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nRook (blue) & Queen (blue & red) Neighbours", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))


## ----create_Neighbourhood_map---------------------------------------------
map_crd.net


## ----Rook_Neighbour-------------------------------------------------------
map_crd.net2


## ----Map Neighbours--------------------------------------------------------
# create png of Median Income CRD net
png("map_crd.net_MdInc.png")  
map_crd.net
dev.off()


# create png of Median Income CRD net 2
png("map_crd.net2_MdInc.png")  
map_crd.net2
dev.off()


## ----Weight_Matrix-------------------------------------------------------
crd.lw <- nb2listw(crd.nb, zero.policy = TRUE, style = "W")
print.listw(crd.lw, zero.policy = TRUE)


## ----Calculate_Lag_Means-------------------------------------------------
# Calculate Lag Means
crd.data$IncLagMeans = lag.listw(crd.lw, crd.data$MdInc, zero.policy = TRUE)

map_LagMean <- tm_shape(crd.data) + 
  tm_polygons(col = "IncLagMeans", 
              title = "Median 2015 $ Income\nLagged Means", 
              style = "jenks", 
              palette = "viridis", n = 6) +
  # add scale bar - first value is TOP, second value is BOTTOM
  tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) + 
  # add compass
  tm_compass(position = c("RIGHT", "TOP")) + 
  tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nLagged Means", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))


## ----LagMean_Output------------------------------------------------------
map_LagMean

# create png of map LagMean Median Income CRD net 2
png("map_LagMean_MdInc.png")  
map_LagMean
dev.off()


## ----Global morans_I_test--------------------------------------------------
# moran's I test
mi <- moran.test(crd.data$MdInc, crd.lw, zero.policy = TRUE)
mi

moran.range <- function(lw) {
  wmat <- listw2mat(lw)
  return(range(eigen((wmat + t(wmat))/2)$values))
}
moran.range(crd.lw)


## ----Calculate_Z_value for Global Morans-----------------------------------
## calculate z value
mI <- mi$estimate[[1]]
eI <- mi$estimate[[2]]
var <- mi$estimate[[3]]
z <- (mI - eI) / sqrt(var)


## ----add_data_to_table_object for Global Morans---------------------------
# Add Stats data objects to data.frame ########
data.for.table1 = data.frame(mI, eI, var, z)
data.for.table1


## ----write_csv_Globalmorans----------------------------------------------
# write csv file with Global Moran's results for I, E, Var, Z, P
write.csv(data.for.table1, "moransGlobal_MdInc.csv", row.names = FALSE)


## ----Lisa_Test Local Moran's i--------------------------------------------
# Lisa Test
lisa.test <- localmoran(crd.data$MdInc, crd.lw)

crd.data$Ii <- lisa.test[,1]
crd.data$E.Ii<- lisa.test[,2]
crd.data$Var.Ii<- lisa.test[,3]
crd.data$Z.Ii<- lisa.test[,4]
crd.data$P<- lisa.test[,5]


## ----Calculate_significance_values  Local Moran's i-----------------------
# if z value is less than 95% significance level, value is FALSE
crd.data$Z.Ho <- crd.data$Z.Ii < 1.96
# if Ii < 0, value is negative
crd.data$Ii.Neg <- crd.data$Ii < 0
# P-value is one-tailed, need to multiply p-value * 2 to correspond to 95% 2 -tailed if P *2>0.05 - higher chance of wrongly deciding significance
crd.data$Po <- (crd.data$P * 2) > 0.05



## ----data_frame_Localmorans----------------------------------------------
# make a data frome from the Lisa Test Data
data.for.table2 = data.frame(crd.data$Ii, crd.data$E.Ii, crd.data$Var.Ii,  crd.data$Z.Ii, crd.data$Ii.Neg, crd.data$Z.Ho, crd.data$Po, crd.data$P)


## ----table2_Lisa_LocalMorans-----------------------------------------------
data.for.table2


## ----Write_csv_Localmorans-----------------------------------------------
# write csv file with Lisa Test results for I, E, Var, Z, P
write.csv(data.for.table2, "LisaTest_MdInc.csv", row.names = FALSE)


## ----Map_Lisa--------------------------------------------------------------
# Map Lisa
map_LISA <- tm_shape(crd.data) + 
  tm_polygons(col = "Ii", 
              title = "Local Moran's i", 
              style = "jenks", 
              palette = "viridis", n = 6)  + 
  # add scale bar - first value is TOP, second value is BOTTOM
  tm_scale_bar(width = 0.22, position = c("RIGHT", "BOTTOM")) + 
  # add compass
  tm_compass(position = c("RIGHT", "TOP")) + 
  tm_layout(title = "CRD Census Tracts Median 2015 $ Income\nLocal Moran's i", title.position = c("LEFT", "TOP"), inner.margins = c(.08, .03, .08, .03))


## ----png_Localmorans-------------------------------------------------------
map_LISA

# create png of Map LISA MdInc
png("map_LISA_MdInc.png")  
map_LISA
dev.off()

## ---Scatter_Plot_mdInc-----------------------------------------------------
# Scatter Plot
moran.plot(crd.data$MdInc, crd.lw, zero.policy=NULL, spChk=NULL, labels=NULL, xlab="Median 2015 $ Income", 
           ylab="Spatially Lagged Mean Median 2015 $ Income", quiet=NULL, main ="Local Moran's i Plot for CRD Median 2015 $ Income")


## ----png_Scatter_Plot_Localmorans----------------------------------------
# create png of Moran scatter plot
png("moran.plot_MdInc.png")  
moran.plot(crd.data$MdInc, crd.lw, zero.policy=NULL, spChk=NULL, labels=NULL, xlab="Median 2015 $ Income", 
           ylab="Spatially Lagged Median 2015 $ Income", quiet=NULL, main ="Local Moran's i Plot for CRD Median 2015 $ Income")
dev.off()

ToC

XV. References

Statistics Canada. (2017). Census Profile, 2016 Census Capital, Regional district, British Columbia. https://www12.statcan.gc.ca/ ↳
Wickham, Hadley.(2016). plyr v1.8.4. https://www.rdocumentation.org/packages/plyr/versions/1.8.4 ↳
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller. (2015). dplyr v.0.8.3. https://dplyr.tidyverse.org/ ↳
RStudio. (2019). dplyr v.0.7.0. https://github.com/rstudio/cheatsheets/blob/master/data-transformation.pdf ↳
Bivand, Roger, et.al. (2019). spdep v.1.1-3. https://cran.r-project.org/web/packages/spdep/index.html ↳
Bivand, Roger, et.al. (2019). Package 'spdep', v.1.1-3. https://cran.r-project.org/web/packages/spdep/spdep.pdf ↳
Brunsdon, Chris, & Hongyan Chen, (2015). Package 'GISTools', v..7-4. https://cran.r-project.org/web/packages/GISTools/GISTools.pdf ↳
Hijmans, Robert J., et.al. (2019). Package ‘raster’, 3.0-7. https://cran.r-project.org/web/packages/raster/raster.pdf ↳
Bivand, Roger, et.al. (2019). Package 'maptools', v.0.9-8. https://cran.r-project.org/web/packages/maptools/maptools.pdf ↳
Bivand, Roger, et.al. (2019). Package 'rgdal', v.1.4-6. https://cran.r-project.org/web/packages/rgdal/rgdal.pdf ↳
Tennekes, Martijn, et.al. (2019). Package 'tmap', v.2.3-1. https://cran.r-project.org/web/packages/tmap/tmap.pdf ↳
Rabosky, Dan, et.al. (2017). Package'BAMMtools', v.2.1.6. https://cran.r-project.org/web/packages/BAMMtools/BAMMtools.pdf ↳
Attali, Dean. (2016). shinyjs. https://deanattali.com/shinyjs/overview ↳

ToC

Built with R v.3.6.1 and R Markdown in RStudio v.1.1.463. Exported HTML edited in Sublime Text v.3.0-3143.

Analyzing Spatial Autocorrelation Using 2016 CRD Census Tracts Data: An R Tutorial

Wendy Anthony

2019/10/13

Table of Contents