PM2.5 Air Quality Analysis for Southern California

Introduction

Deliverable: Provide a scientific report as a web page using the following writing guidelines:

"You have been hired by an environmental consultant agency to assist with a project in describing air pollution problems in Southern California." "Your responsibility is to make recommendations on how to develop a continuous surface of particulate matter." "Conduct a study to evaluate different spatial interpolation methods, and then provide a report on your findings."

"Using the air pollution data on particulate matter, provide a critical analysis of four spatial interpolation methods: Thiessen polygons, inverse distance weighting, polynomial trends, and kriging, using the Southern California air basin as the study site."

Paragraph "explaining problems with air pollution in urban areas." cite 3. "conclude by explaining why we need to develop continuous surfaces of air pollution."

Paragraph "explaining geostatistics and how they can be used in the context of your study." Cite 3 to provide context (different from the above paragraph). "Describe that there are several methods, each having different parameters that can influence the outcome of the surface interpolation." "Explain the necessity of this analysis in which you will evaluate different methods and their parameters in order to provide recommendations about the appropriate method and its details."

Audience: scientific community without background to use these specific methods ... explain & demonstrate use; no unnecessary info; looking for facts & results, not opinions or bias

Tone: should attract audience; formal, active tone, informative; PAST TENSE - experiment already done; no personal 'I' 'you' pronouns - don't assume you know who reader is; simple language; use 3rd person - 1st too informal, 2nd no distance between writer & reader; be clear & direct & readable, concise; cite facts; don't generalize; purpose for writing paper/conducting study;

Context: general problem - accessible & interesting; why is study needed - what have other researchers done [cite]; justify why to do this study [cite background knowledge - credit & credibility]

ID Problem Statement / Research Objective: a clear statement - goal/objective is . . . describe how . . . purpose for experiment; hypothesis of end result; summary of achievement; propose an argument - support with evidence - Citations; create outline with key points not to miss

Intro Marking Matrix

Description of pollution in urban areas with citations [5]
Description of the use of spatial interpolation to address topic [5]
Research objective and describing the goal of the study [5]

WHAT IS PM2.5 . . . is fine particulate matter

"Therefore, the objective of this research is to....".

Goal: to evaluate four different interpolation methods and assess which is best suited to estimating PM2.5 fine particulate matter air pollution in the South Coast Air Basin.

Citation formatting

[Xie 2017].
(Xie, Semanjski, Gautama, Tsiligianni, Deligiannis, Rajan, Pasveer and Philips, 2017).
(Xie et al., 2017).

[Wong 2004].
(Wong, Yuan, Perlin, 2004).
(Wong, 2004).

[Tobler 1970]
(Tobler, 1970).
"I invoke the first law of geography: everything is related to everything else, but near things are more related than distant things."

[Li 2014]
(Li, Gong, Zhou, 2014)

[Li 2008]
(Li, Heap, 2008)

[Li 2016]
(Li, Zhou, Kalo, and Piltner, 2016)

Study Site: Southern California

Write a paragraph explaining the study site (area, average temperature, pollution levels, etc.). This paragraph should make the reader familiar with the study area.

Provide a corresponding study area map with the necessary map elements. This map should be created in R.

California is divided into 15 different air basins, with a total of 296 monitoring stations. The study area of Southern California . . . (see figure 1).

https://ww3.arb.ca.gov/desig/airbasins/60100-60114.pdf

Describe study site, why is research important at this site?

Citation formatting
[#StudySite1].
[#StudySite2].

Figure 1. Map of California Air Basin Areas and Air Monitoring Stations. The Air Basin Area polygons are outlined in light grey, the study area of the South Coast Air Basin in Southern California is filled in pink, and each Air Monitoring Station is marked with a red dot.

Data

Write a paragraph describing your datasets, including a sentence for each of: where and when they were collected, their format, their projections, and any other relevant metadata. Was any formatting necessary before analysis? Is it reproducible?

Dataset 1: The CSV file M25HR_PICKDATA_2016-4-30.csv contains tabular data from all 296 of California's Air Monitoring Station sites: 24 hours of hourly PM2.5 fine particulate matter measurements for April 30, 2016, in micrograms per cubic metre (µg/m³).

Dataset 2: The Active Air Monitoring Stations from 2002-2004, in the shapefile airmonitoringstations.shp, contains the point locations of all the air monitoring stations, with attributes for longitude and latitude, ADAMSITEID, AIRBASIN, AIRDISTRICT, SITENAME, SITEADDRESS, ZIPCODE, COUNTY, and X and Y coordinates. The datum for this file is NAD27.

California Air Resources Board. (2004). Air Monitoring Stations. [shp file].

airmonitoringstations.shp.xml

Dataset 3: The California Air Basin boundary polygons of all 15 Air Basins are included in the shapefile CaAirBasin.shp. The projection for this file is Transverse Mercator, with a GRS80 ellipsoid.

CaAirBasin.shp.xml

Methods

Describe each method used for spatial interpolation (in chronological order). Provide enough information so that someone else can replicate your methods. Again, do not make this software-specific. Describe how the methods work and include formulas where necessary, and provide details on the different parameters that you will be testing, including the parameter values that you will test. Feel free to use sub-headings to divide the methods.

Reproducible? Without being too specific to software (software can be mentioned at the end of the methods section)

Methods Marking Matrix

Study Site and data description [5]
Description of methods used [5]
Description of testing procedures [5]

Data Preparation

The three datasets were prepared for the analysis by aggregating the hourly PM2.5 values to daily means, subsetting the spatial data to the South Coast Air Basin, and transforming them to a common projection, before mapping the data points on the subset base map.

Dataset 1: The PM2.5 values were hourly for one day, April 30, 2016. To calculate the mean and maximum PM2.5 values, all hourly values for each Air Monitoring Station site were aggregated. These mean and maximum values were then merged with their respective monitoring stations on a common site column.
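The aggregation and merge described for Dataset 1 can be sketched as follows. This is a minimal illustration rather than the code used in the study: the site identifiers, station names, and PM2.5 values are hypothetical, and missing hourly readings are assumed to be represented as None.

```python
from collections import defaultdict
from statistics import mean


def daily_summary(hourly_rows):
    """Return {site: (mean, max)} from (site, value) hourly records.

    Records whose value is None (missing) are skipped.
    """
    by_site = defaultdict(list)
    for site, value in hourly_rows:
        if value is not None:
            by_site[site].append(value)
    return {site: (mean(vals), max(vals)) for site, vals in by_site.items()}


# Hypothetical hourly records: (site id, PM2.5 in micrograms per cubic metre)
rows = [(2001, 10.0), (2001, 14.0), (2001, None), (2002, 6.0), (2002, 8.0)]
summary = daily_summary(rows)

# Hypothetical station attributes keyed by the same site id
stations = {2001: {"name": "Site A"}, 2002: {"name": "Site B"}, 2003: {"name": "Site C"}}

# Merge on the common site key; stations without PM2.5 data are dropped
merged = {
    site: {**attrs, "pm25_mean": summary[site][0], "pm25_max": summary[site][1]}
    for site, attrs in stations.items()
    if site in summary
}
```

Dropping stations that have no PM2.5 summary plays the same role as omitting missing values before interpolation.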

Dataset 2: The active air monitoring stations were subset on the air basin column to only those in the South Coast Air Basin. The location coordinates for this subset were transformed to NAD83 UTM Zone 11N (EPSG:26911). Because the site column of the air monitoring stations was named differently from the matching column in the data to be merged, it was renamed to site before the PM2.5 mean and maximum values were merged with the station data. Any NA (missing) values were omitted, as calculations cannot be performed on missing values.

Dataset 3: The California Air Basin polygons were subset on the air basin column to only the South Coast Air Basin. The location coordinates for this subset were transformed to NAD83 UTM Zone 11N (EPSG:26911).

I. Thiessen Polygons Spatial Interpolation Method

A Dirichlet tessellated surface was created by covering the whole study area with common-sided polygons, each of which contains exactly one monitoring station and its mean PM2.5 value; every location within a polygon is closer to that station than to any other station. The tessellation process did not carry over either the projection information or the mean PM2.5 values, so the projection was assigned from the original mean data, and the mean PM2.5 attribute values were added back with a spatial join. The tessellated surface was then clipped to the boundary of the South Coast Air Basin, to include only the spatial and attribute data for this one air basin, and a final map of the Thiessen polygon spatial interpolation was created.
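The Thiessen rule is equivalent to assigning every location the value of its nearest station. The sketch below illustrates that rule only; it does not construct polygon geometry, and the station coordinates and values are hypothetical.

```python
import math


def thiessen_value(x, y, stations):
    """Return the value of the station nearest to (x, y).

    stations: list of (x, y, value) tuples in projected coordinates.
    """
    nearest = min(stations, key=lambda s: math.hypot(x - s[0], y - s[1]))
    return nearest[2]


# Hypothetical stations: (easting, northing, mean PM2.5)
stations = [(0.0, 0.0, 8.0), (10.0, 0.0, 12.0), (0.0, 10.0, 15.0)]

# Every location takes the value of the station whose polygon contains it
print(thiessen_value(2.0, 1.0, stations))
```

Evaluating this function over a fine grid reproduces the stepped, constant-within-polygon surface that Thiessen polygons create.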

II. Inverse Distance Weighting (IDW) Spatial Interpolation Method

A grid of prediction cells was created over the extent of the spatial data frame; the number of cells sets the cell size, which was treated as the neighbourhood size. This grid size was varied to determine the optimal neighbourhood size. The empty grid was assigned the same projection as the South Coast Air Basin monitoring stations. The power exponent (idp) was also varied to observe how the smoothing changed, with the aim of minimizing the root mean square error. A map of the predicted PM2.5 values was created from the chosen grid cell size and power exponent.

A leave-one-out validation routine was performed by removing one observed value at a time and comparing it to the value that the surface, interpolated from the remaining points, predicted at that location. A plot was made of the differences between the observed and predicted values.
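A leave-one-out validation of this kind can be sketched as follows. This is an illustrative outline under stated assumptions, not the study's implementation: the idw and loocv_rmse function names and the station values are hypothetical, and all remaining stations are used for every prediction.

```python
import math


def idw(x, y, stations, p):
    """Inverse distance weighted prediction at (x, y) with power exponent p."""
    num = den = 0.0
    for sx, sy, z in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return z  # a prediction at a station returns the observed value
        w = 1.0 / d ** p
        num += w * z
        den += w
    return num / den


def loocv_rmse(stations, p):
    """Root mean square error from leave-one-out cross-validation."""
    squared_errors = []
    for i, (sx, sy, z) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        squared_errors.append((idw(sx, sy, others, p) - z) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical stations: (easting, northing, mean PM2.5)
stations = [(0.0, 0.0, 8.0), (10.0, 0.0, 12.0), (0.0, 10.0, 15.0), (7.0, 6.0, 9.0)]
for p in (0.5, 1, 2, 5):
    print(p, loocv_rmse(stations, p))
```

Repeating the calculation for each candidate power exponent gives the error values that can be compared to select the exponent with the smallest error.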

The jackknife technique was used to create a 95% confidence interval around each of the predicted values at the unsampled points. A map of these confidence intervals was created.
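One common form of the jackknife computes pseudo-values by removing each station in turn. The sketch below assumes that form and assumes that 1.96 standard errors approximate the 95% half-width; both are illustrative assumptions, and the function names and station values are hypothetical.

```python
import math


def idw(x, y, stations, p):
    """Inverse distance weighted prediction at (x, y) with power exponent p."""
    num = den = 0.0
    for sx, sy, z in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return z
        w = 1.0 / d ** p
        num += w * z
        den += w
    return num / den


def jackknife_interval(x, y, stations, p, z_crit=1.96):
    """Return (jackknife estimate, interval half-width) at (x, y)."""
    n = len(stations)
    full = idw(x, y, stations, p)
    pseudo = []
    for i in range(n):
        reduced = stations[:i] + stations[i + 1:]
        # Pseudo-value: n times the full estimate minus (n - 1) times the reduced one
        pseudo.append(n * full - (n - 1) * idw(x, y, reduced, p))
    estimate = sum(pseudo) / n
    std_error = math.sqrt(sum((v - estimate) ** 2 for v in pseudo) / (n * (n - 1)))
    return estimate, z_crit * std_error


# Hypothetical stations: (easting, northing, mean PM2.5)
stations = [(0.0, 0.0, 8.0), (10.0, 0.0, 12.0), (0.0, 10.0, 15.0), (10.0, 10.0, 9.0)]
estimate, half_width = jackknife_interval(5.0, 5.0, stations, 2)
print(estimate, half_width)
```

Applying the function to every grid cell gives a surface of half-widths, which is the quantity a confidence interval map would display.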

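IDW Spatial Interpolation Formula
$$\hat{Z}_i = \frac{\sum_{j=1}^{n} z_j / d_{ij}^{p}}{\sum_{j=1}^{n} 1 / d_{ij}^{p}}$$
where \hat{Z}_i is the predicted value at location i, z_j is the observed value at station j, d_{ij} is the distance between location i and station j, p is the power exponent, and n is the number of stations used.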
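As a worked illustration of the IDW formula, consider three hypothetical stations with values of 8, 12 and 16 µg/m³ located 1, 2 and 4 km from the prediction location, with an assumed power exponent of p = 2. These numbers are invented for the example and are not taken from the study data.

```latex
\hat{Z}_i
  = \frac{8/1^{2} + 12/2^{2} + 16/4^{2}}{1/1^{2} + 1/2^{2} + 1/4^{2}}
  = \frac{8 + 3 + 1}{1 + 0.25 + 0.0625}
  = \frac{12}{1.3125}
  \approx 9.14
```

The prediction stays close to the value of the nearest station because its weight (1) is much larger than the weights of the more distant stations (0.25 and 0.0625).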

III. Polynomials Spatial Interpolation Method

To identify any trends, two different orders of the polynomial equation were fitted to the mean PM2.5 data points.
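For reference, first-order and second-order trend surfaces are commonly written as below. The symbols are assumptions introduced for illustration: x and y are the projected coordinates, and the coefficients b_0 to b_5 are estimated by least squares from the station values.

```latex
% First-order (planar) trend surface
\hat{Z}(x, y) = b_0 + b_1 x + b_2 y

% Second-order (quadratic) trend surface
\hat{Z}(x, y) = b_0 + b_1 x + b_2 y + b_3 x^{2} + b_4 y^{2} + b_5 x y
```

The first-order surface can only tilt, while the second-order surface can also curve, so it can represent a single ridge or basin across the study area.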

. . .

IV. Kriging Spatial Interpolation Method

A kriging spatial interpolation was applied with each of the two polynomial orders. A variogram was fitted to the data by varying the sill, nugget and range, as well as the model (spherical, exponential or Gaussian), to find the best fit. The kriging surface was clipped to the South Coast Air Basin before mapping the predicted values, the variance, and the 95% confidence interval.
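The three variogram models can be sketched as functions of the lag distance h. The parameterization below is one common convention and is an assumption here: psill is the partial sill (total sill minus nugget), rng is the range parameter, and other software may scale the exponential and Gaussian ranges differently.

```python
import math


def spherical(h, nugget, psill, rng):
    """Spherical model: reaches the total sill exactly at h = rng."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + psill
    r = h / rng
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, psill, rng):
    """Exponential model: approaches the total sill asymptotically."""
    if h == 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-h / rng))


def gaussian(h, nugget, psill, rng):
    """Gaussian model: smooth (parabolic) behaviour near the origin."""
    if h == 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-((h / rng) ** 2)))


# Hypothetical parameters: nugget 1, partial sill 4, range 10 (map units)
for h in (0.0, 2.5, 5.0, 10.0, 20.0):
    print(h, spherical(h, 1.0, 4.0, 10.0),
          exponential(h, 1.0, 4.0, 10.0),
          gaussian(h, 1.0, 4.0, 10.0))
```

Comparing each curve against the empirical semivariance at each lag is what fitting the variogram amounts to; the nugget sets the intercept, the sill sets the plateau, and the range sets how quickly the plateau is reached.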

Maps were created for the predicted PM2.5 value, the variance and the confidence interval for each of the different polynomial orders.

. . .

Kriging Spatial Interpolation Formula
$$\hat{Z}_i = w_1 z_1 + w_2 z_2 + w_3 z_3 + w_4 z_4 + \dots + w_n z_n = \sum_{j=1}^{n} w_j z_j$$
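Unlike IDW, the kriging weights are not set by distance alone but are solved from the fitted variogram. For ordinary kriging, which is shown here as an assumption for illustration (kriging with a polynomial trend adds further constraint terms), the weights satisfy the system below, where \gamma is the fitted variogram model, d_{jk} is the distance between stations j and k, d_{ik} is the distance between the prediction location i and station k, and \mu is a Lagrange multiplier.

```latex
\sum_{j=1}^{n} w_j \, \gamma(d_{jk}) + \mu = \gamma(d_{ik}), \qquad k = 1, \dots, n

\sum_{j=1}^{n} w_j = 1
```

The second condition forces the weights to sum to one, which keeps the prediction unbiased when the mean is unknown but constant.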

Results

Provide the results from the spatial interpolation methods. Provide all plots and maps, and interpret them so that the reader understands what they are showing. However, do not make any conclusions at this point about the appropriateness of the method for creating a PM2.5 surface. Organize the results in a way that is professional, well-organized, and easy to interpret.

in same chronological order as methods section; describe info for interpreting results, but don't yet talk about importance of results

Figures: supply all figures to explain methods results - use only what's necessary; describe each figure in a sentence - anything helpful e.g. graph axis, different parameters used for different results; don't repeat what figure shows

Results Marking Matrix

Results from each method [10]
Description of results [10]

Data Preparation

. . . (see figure 2).

Map of Southern California Air Basin Area and Air Monitoring Stations, with PM2.5 data.

Figure 2. Map of Southern California Air Basin Area and Air Monitoring Stations. All three datasets are visualized here, with a base map of the Southern California Air Basin Area, and each Air Monitoring Station is classified by its PM2.5 value, in micrograms per cubic metre (µg/m³). The PM2.5 classification ranges from a low value of 6-8 µg/m³ in light pink to a high value of 14-16 µg/m³ in dark red.

I. Thiessen Polygons

Thiessen Polygons are an example of a local, exact interpolation method.

The greater the number of grid cells, the less area is included in each of the grid cells, resulting in smaller neighbourhoods.

. . .

Figure 3. The study area before clipping the spatial extent to the South Coast Air Basin boundaries.

Figure 4. After clipping the spatial extent to the South Coast Air Basin boundaries, the Thiessen Polygons have been formed around the mean data points in the study area.

Figure 5. The Thiessen Polygons have been classified by PM2.5 value, with the smallest values in light pink and the highest values in dark red.

II. Inverse Distance Weighting (IDW)

IDW is an example of a local, deterministic interpolation method.

When a grid is used to cover the area of a spatial data frame, the greater the number of grid cells, the smaller the neighbourhood area within each grid cell.

The power exponent values define how the weight changes over distance.

Smaller neighbourhoods take less computer power to search for predicted values.

When different neighbourhood sizes are used, with the same power exponent value, the larger neighbourhood sizes will make the data appear more pixelated, and the smaller neighbourhood sizes will make the data appear smoother.

When the same neighbourhood sizes are used, the one with the lowest power exponent value will show less definition in areas of higher values.
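The effect of the power exponent can be illustrated by normalizing the IDW weights for a fixed set of distances. The distances and exponents below are hypothetical values chosen for the illustration.

```python
def idw_weights(distances, p):
    """Return IDW weights, normalized to sum to 1, for power exponent p."""
    raw = [1.0 / d ** p for d in distances]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical distances (km) from a prediction location to three stations
distances = [1.0, 2.0, 4.0]
for p in (0.001, 1, 7):
    print(p, [round(w, 3) for w in idw_weights(distances, p)])
```

With an exponent near zero the three stations receive almost equal weight, so the surface approaches the overall mean and shows little definition; with a large exponent nearly all of the weight goes to the nearest station, so the surface approaches the Thiessen polygon result.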

The 95% confidence interval is mapped as the range within which each estimated value is expected to lie. Lighter colours indicate more confidence, and darker colours indicate more uncertainty.

Figure 6. plot_IWD_Observed_Predicted_PM25_n5000_idp7.png. .

Figure 7. plot_IWD_Observed_Predicted_PM25_n5000_idp5.png. .

Figure 8. plot_IWD_Observed_Predicted_PM25_n5000_idp3.png. .

Figure 9. plot_IWD_Observed_Predicted_PM25_n5000_idp1.png. .

Figure 10. plot_IWD_Observed_Predicted_PM25_n5000_idp0.1.png. .

Figure 11. plot_IWD_Observed_Predicted_PM25_n5000_idp0.01.png. .

Figure 12. plot_IWD_Observed_Predicted_PM25_n5000_idp0.001.png. .

Figure 13. plot_IWD_Observed_Predicted_PM25_n5000_idp0.0001.png. .

Figure 14. . .

Table 1. Root Mean Square Error (RMSE). RMSE values are compared across the idp power exponents used in the different IDW trials. The aim is to obtain the smallest error.

III. Polynomials

Polynomial trend interpolation is an example of a global interpolation method.

. . .

Figure 15. 1st order Polynomial Interpolation Method Results. .

Figure 16. 2nd order Polynomial Interpolation Method Results. .

IV. Kriging

Kriging is a geostatistical interpolation method that weights the surrounding observations according to a fitted variogram.

. . .

Figure 17. . .

Discussion

Write a very brief paragraph on the general (very general) findings of the study.

Write a brief paragraph to explain both the benefits and drawbacks of the surfaces created by each interpolation method. Refer to your figures from the results section to help your discussion. You should provide a different paragraph for each method.

Write a paragraph explaining which method you selected to use for estimating a pollution surface dataset and your reason for selecting it. You need to be convincing and demonstrate that you have made an educated decision.

Describe your findings in the context of the existing literature. Using at least three peer-reviewed papers, explain how your decision aligns or conflicts with what others have done regarding spatial interpolation for creating PM2.5 or similar datasets.

importance of results; most important section - show reader objective achieved / research question answered; "The results from this research have shown that...."; 'a single, clear, message'; After statement, give specifics - how results contribute to message; connect to results; refer to figures & tables; one paragraph per result; explain results in relation to other researchers findings: consistent with other results? - CITE other result findings; if not, why different? [CITE]; any new knowledge provided?

Shortcomings: positive, constructive, explain what to avoid if others do this; recommend how to overcome issue

Final Summary Sentence: why was study performed? what was found? what was contributed? - Main message of results

Discussion Marking Matrix

General findings from study [5]
Findings related to each method [20]
Justification of most appropriate method for this specific application [5]
Describe findings in context of the literature [10]

Citation formatting
[#Discuss1].
[#Discuss2]
[#Discuss3]

Data Preparation

. . .

I. Thiessen Polygons

. . .

II. Inverse Distance Weighting (IDW)

. . .

III. Polynomials

Trends can depend on the time period in which the sample data are acquired, as well as on any spatial changes that may produce a location-related trend.

Higher-order polynomials capture more of the variability in local areas. The best polynomial order was chosen before performing the kriging interpolation.

. . .

IV. Kriging

Kriging interpolation may not be appropriate to use if there are local trends that influence the observed and expected results, unless those trends are modelled. Kriging on the polynomial trend removed the influence of any trends.

. . .

References

List of References Marking Matrix

List of references [5] - no websites; use peer-reviewed, credible references
Cite 3 (explaining problems with air pollution in urban areas) + Cite 3 (explaining geostatistics and how they can be used in the context of your study) in Intro
Cite 3 (Describe your findings in the context of the existing literature. Using at least three peer-reviewed papers, explain how your decision aligns or conflicts with what others have done regarding spatial interpolation for creating PM2.5 or similar datasets) in Discussion

Xie 2017. Xie, Xingzhe, Ivana Semanjski, Sidharta Gautama, Evaggelia Tsiligianni, Nikos Deligiannis, Raj Thilak Rajan, Frank Pasveer and Wilfried Philips. (2017). A Review of Urban Air Pollution Monitoring and Exposure Assessment Methods. ISPRS International Journal of Geo-Information, 6(12), 389. [pdf]. Retrieved 2019-10-17 from https://www.mdpi.com/2220-9964/6/12/389/pdf

Wong 2004. Wong, David W., Lester Yuan, Susan A. Perlin. (2004). Comparison of spatial interpolation methods for the estimation of air quality data. [pdf]. Retrieved 2019-10-26 from https://www.nature.com/articles/7500338

Tobler 1970. Tobler, W. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46(2): 234-240. [journal article]. Retrieved 2019-10-30 from https://www.jstor.org/stable/i207818

Li 2014. Li, Longxiang, Jianhua Gong, Jieping Zhou. (2014). Spatial Interpolation of Fine Particulate Matter Concentrations Using the Shortest Wind-Field Path Distance. [pdf]. Retrieved 2019-10-28 from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4010455/pdf/pone.0096111.pdf

Li 2008. Li, Jin, and Andrew D. Heap. (2008). A Review of Spatial Interpolation Methods for Environmental Scientists. [pdf]. Retrieved 2019-10-30 from https://www.researchgate.net/profile/Jin_Li32/publication/246546630_A_Review_of_Spatial_Interpolation_Methods_for_Environmental_Scientists

Li 2016. Li, Lixin, Xiaolu Zhou, Marc Kalo, and Reinhard Piltner. (2016). Spatiotemporal Interpolation Methods for the Application of Estimating Population Exposure to Fine Particulate Matter in the Contiguous U.S. and a Real-Time Web Application. [pdf]. Retrieved 2019-10-28 from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4997435/

#StudySite1. . (). . []. Retrieved 2019-10-26 from

#StudySite2. . (). . []. Retrieved 2019-10-26 from

#Discuss1. . (). . []. Retrieved 2019-10-26 from

#Discuss2. . (). . []. Retrieved 2019-10-26 from

#Discuss3. . (). . []. Retrieved 2019-10-26 from

Presentation Marking Matrix

Spelling, grammar, and overall presentation [5]
