Title: | Geographical Ecology and Conservation Knowledge Online |
---|---|
Description: | Includes a collection of geographical analysis functions aimed primarily at ecology and conservation science studies, allowing processing of both point and raster data. Future versions will integrate species threat datasets developed by the authors. |
Authors: | Vasco V. Branco [cre, aut] , Pedro Cardoso [aut] , Luís Correia [ctb] |
Maintainer: | Vasco V. Branco <[email protected]> |
License: | GPL-2 |
Version: | 1.0.0 |
Built: | 2024-11-17 04:56:13 UTC |
Source: | https://github.com/vascobranco/gecko |
Crop raster layers to minimum size possible and uniformize NA
values across layers.
clean(layers)
layers |
SpatRaster. As defined in package terra, see |
Excludes all marginal rows and columns that contain only NA values, and changes values to NA if they are NA in any of the layers.
SpatRaster. Same class as layers.
region = gecko.data("layers")
terra::plot(clean(region))
Create a confusion matrix for any multiclass set of predicted vs observed labels in a classification problem.
confusion.matrix(actual, predicted)
actual |
dataframe. Original labels. |
predicted |
dataframe. Predicted labels. |
data.frame. Predicted labels (rows) x Observed labels (cols).
x = c("FALSE", "TRUE", "FALSE", "TRUE", "TRUE")
y = c("TRUE", "TRUE", "TRUE", "TRUE", "TRUE")
confusion.matrix(x, y)
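The cross-tabulation behind a confusion matrix can be sketched in base R as follows. This is only an illustration, not the package's implementation: the helper name `sketch_confusion` is hypothetical, and the layout (predicted labels in rows, observed labels in columns) is taken from the Value description above.

```r
# Hypothetical helper: cross-tabulate predicted (rows) against observed (columns).
sketch_confusion <- function(actual, predicted) {
  # Share one set of levels so classes absent from one vector still appear.
  classes <- sort(unique(c(actual, predicted)))
  table(
    predicted = factor(predicted, levels = classes),
    actual    = factor(actual, levels = classes)
  )
}

x <- c("FALSE", "TRUE", "FALSE", "TRUE", "TRUE")
y <- c("TRUE", "TRUE", "TRUE", "TRUE", "TRUE")
cm <- sketch_confusion(x, y)
print(cm)
```

Fixing the factor levels up front keeps the table square even when, as here, one class is never predicted.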
Create a layer depicting eastness based on an elevation layer.
create.east(layers)
layers |
SpatRaster. A layer of elevation (a digital elevation model - DEM).
As defined in package terra, see |
Aspect can be calculated from elevation. However, it is a circular variable (0 = 360) and has to be converted to northness and eastness to be useful for modelling.
SpatRaster.
region = gecko.data("layers")
terra::plot(create.east(region[[3]]))
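The conversion of a circular aspect into eastness and northness can be illustrated with plain trigonometry. The sketch below assumes aspect is given in degrees clockwise from north and uses the common sine/cosine transform; whether the package computes it in exactly this way is an assumption.

```r
# Aspect in degrees, measured clockwise from north (an assumed convention).
aspect_deg <- c(0, 90, 180, 270, 360)
aspect_rad <- aspect_deg * pi / 180

eastness  <- sin(aspect_rad)  # +1 for east-facing, -1 for west-facing slopes
northness <- cos(aspect_rad)  # +1 for north-facing, -1 for south-facing slopes

# 0 and 360 degrees now map to the same pair of values.
print(round(cbind(aspect_deg, eastness, northness), 3))
```

After the transform, the two ends of the circular scale coincide, which is what makes the variables usable as linear predictors.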
Create a layer depicting latitude based on any other.
create.lat(layers)
layers |
SpatRaster. As defined in package terra, see |
Using latitude (and longitude) in models may help limit extrapolation of the predicted area far beyond known areas.
SpatRaster.
region = gecko.data("layers")
terra::plot(create.lat(region[[1]]))
Create a layer depicting longitude based on any other.
create.long(layers)
layers |
SpatRaster. As defined in package terra, see |
Using longitude (and latitude) in models may help limit extrapolation of the predicted area far beyond known areas.
SpatRaster.
region = gecko.data("layers")
terra::plot(create.long(region))
Create a layer depicting northness based on an elevation layer.
create.north(layers)
layers |
SpatRaster. A layer of elevation (a digital elevation model - DEM).
As defined in package terra, see |
Aspect can be calculated from elevation. However, it is a circular variable (0 = 360) and has to be converted to northness and eastness to be useful for modelling.
SpatRaster.
region = gecko.data("layers")
terra::plot(create.north(region[[3]]))
Creates a layer depicting distances to records using the minimum distance, the average distance, the distance to the minimum convex polygon, or a distance that takes a cost surface into account.
distance(longlat, layers, type = "minimum")
longlat |
matrix. Matrix of longitude and latitude or eastness and northness (two columns in this order) of species occurrence records. |
layers |
SpatRaster. As defined in package terra, see |
type |
character. Text string indicating whether the output should be the "minimum", "average" or "mcp" distance to all records. "mcp" means the distance to the minimum convex polygon encompassing all records. |
Using distance to records in models may help limit extrapolation of the predicted area far beyond known areas.
SpatRaster.
userpar <- par(no.readonly = TRUE)
region = gecko.data("layers")
alt = region[[3]]
localities = gecko.data("records")
par(mfrow=c(3,2))
terra::plot(alt)
points(localities)
terra::plot(distance(localities, alt))
terra::plot(distance(localities, alt, type = "average"))
par(userpar)
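To make the "minimum" and "average" options concrete, the sketch below computes both on a small grid in base R. The grid, the records and the use of planar Euclidean distance are all assumptions for illustration; the package works on SpatRaster objects and may measure distance differently (for example geodesically for longitude/latitude data).

```r
# Hypothetical 5 x 5 grid of cell centres and three occurrence records.
cells   <- as.matrix(expand.grid(x = 1:5, y = 1:5))
records <- rbind(c(1, 1), c(2, 4), c(5, 5))

# Euclidean distance from every cell (rows) to every record (columns).
d <- sqrt(outer(cells[, 1], records[, 1], "-")^2 +
          outer(cells[, 2], records[, 2], "-")^2)

min_dist <- apply(d, 1, min)  # analogous to type = "minimum"
avg_dist <- rowMeans(d)       # analogous to type = "average"

# Reshape into grids: rows index x, columns index y.
min_grid <- matrix(min_dist, nrow = 5)
avg_grid <- matrix(avg_dist, nrow = 5)
print(round(min_grid, 2))
```

The minimum distance is zero at each record, whereas the average distance is smallest near the centre of the record cloud, so the two layers constrain predictions in different ways.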
Load data included in the package. This includes: records, a matrix of longitude and latitude (two columns) occurrence records for Hogna maderiana (Walckenaer, 1837); range, a SpatRaster object, as defined by package terra, of the geographic range of Hogna maderiana (Walckenaer, 1837); layers, a SpatRaster object with layers representing the average annual temperature, total annual precipitation, altitude and landcover for Madeira Island (Fick & Hijmans 2017, Tuanmu & Jetz 2014); threat, a layer of mean fire occurrence in Madeira between 2006 and 2016; and worldborders, a simplified version of the vector of world country borders created by Victor Cazalis.
gecko.data(data = NULL)
data |
character. String of one of the data names mentioned in the description, e.g.: |
This function is inspired by palmerpanguins::path_to_file(), which in turn is based on readxl::readxl_example().
## Not run:
gecko.data()
gecko.data("range")
## End(Not run)
Read directory where GIS files are stored.
gecko.getDir()
Reads a txt file pointing to where the world GIS files are stored.
Set up the directory where GIS files are stored.
gecko.setDir(gisPath = NULL)
gisPath |
Path to the directory where the GIS files are stored. |
Writes a txt file in the red directory allowing the package to always access the world GIS files directory.
Download the latest version of WorldClim to your gecko work directory.
If you have not yet set up a work directory, it will be set up as if running gecko::gecko.setDir() with gisPath = NULL.
This is a large dataset that is prone to fail by timeout if downloaded
through R. Instead of using this function you can run gecko.setDir() (if you
haven't yet) and download the files at
https://geodata.ucdavis.edu/climate/worldclim/2_1/base/wc2.1_30s_bio.zip or
https://geodata.ucdavis.edu/climate/worldclim/2_1/base/wc2.1_10m_bio.zip.
Unzip their contents into the corresponding folder, "./worldclim/1 km" or
"./worldclim/10 km", inside the folder returned by gecko.getDir().
gecko.worldclim(res)
res |
character. Specifies the resolution of environmental data used. |
Reads a txt file pointing to where the world GIS files are stored.
## Not run:
gecko.worldclim("10 km")
## End(Not run)
Identifies and moves presence records to cells with environmental values.
move(longlat, layers, buffer = 0)
longlat |
matrix. Matrix of longitude and latitude or eastness and northness (two columns in this order) of species occurrence records. |
layers |
SpatRaster. As defined in package terra, see |
buffer |
numeric. Maximum distance in map units that a record will move. If 0 all |
Often records are in coastal or other areas for which no environmental data is available. This function moves such records to the closest cells with data so that no information is lost during modelling.
A matrix with new coordinate values.
region <- terra::rast(matrix(c(rep(NA,100), rep(1,100), rep(NA,100)), ncol = 15))
presences <- cbind(runif(100, 0, 0.55), runif(100, 0, 1))
terra::plot(region)
points(presences)
presences <- move(presences, region)
terra::plot(region)
points(presences)
Normalize a raster file according to one of three methods: 'standard', 'range' or 'rank'.
normalize(layer, method = "standard", filepath = NULL)
layer |
SpatRaster. Object with a single layer as defined by package terra. |
method |
character. Specifying |
filepath |
character. Optional, specifies a path to the output file. |
There are three options: "standard" standardizes data to mean = 0 and sd = 1; "range" rescales data to a range of 0 to 1; and "rank" also rescales data to a range of 0 to 1, but does so after ranking all points.
A raster layer.
## Not run:
region = gecko.data("layers")[[1]]
ranked_region = normalize(region, method = "rank")
## End(Not run)
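The three normalization methods can be sketched on a plain numeric vector standing in for raster cell values. This is a base R illustration under stated assumptions, not the package's code: in particular, the handling of ties (average ranks) and of NA cells (kept as NA) is assumed.

```r
values <- c(2, 4, 4, 10, NA)  # hypothetical cell values, one cell without data

# "standard": mean 0 and standard deviation 1.
standard <- (values - mean(values, na.rm = TRUE)) / sd(values, na.rm = TRUE)

# "range": rescale linearly to [0, 1].
rng    <- range(values, na.rm = TRUE)
ranged <- (values - rng[1]) / (rng[2] - rng[1])

# "rank": rank the values (ties share an average rank), then rescale to [0, 1].
r      <- rank(values, na.last = "keep")
ranked <- (r - min(r, na.rm = TRUE)) / (max(r, na.rm = TRUE) - min(r, na.rm = TRUE))

print(rbind(standard, ranged, ranked))
```

The rank method discards the spacing between values, so the extreme value 10 ends up no further from 4 than 4 is from 2, which can be useful for heavily skewed layers.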
This function generates pseudo-absences from an input data.frame containing latitude and longitude coordinates by using environmental data, and then uses both presences and pseudo-absences to train an SVM model used to flag possible outliers for a given species.
outliers.detect(longlat, training = NULL, hi_res = TRUE, crop = FALSE, threshold = 0.05, method = "all")
longlat |
data.frame. With two columns containing latitude and longitude, describing the locations of a species, which may contain outliers. |
training |
data.frame. With the same formatting as |
hi_res |
logical. Specifies if 1 km resolution environmental data should be used.
If |
crop |
logical. Indicates whether environmental data should be cropped to
an extent similar to what is given in |
threshold |
numeric. Value indicating the threshold for classifying
outliers in methods |
method |
A string specifying the outlier detection method. |
Environmental data used is WorldClim and requires a long download, see gecko::gecko.setDir().
This function is heavily based on the methods described in Liu et al. (2017).
There the authors describe SVM_pdSDM, a pseudo-SDM method similar to a
two-class presence-only SVM that is capable of using pseudo-absence points,
implemented with the ksvm function in the R package kernlab.
It is suggested that, for each set of "n" occurrence records, "2 * n" pseudo-absence points are generated.
While using it, keep in mind works highlighting its limitations, such as
Meynard et al. (2019). See the References section.
list if method = "all", containing whether or not a given point was classified as TRUE or FALSE along with the confusion matrix for the training data. If method = "geo" or method = "env", a data.frame is returned.
Liu, C., White, M. and Newell, G. (2017) ‘Detecting outliers in species distribution data’, Journal of Biogeography, 45(1), pp. 164–176. doi:10.1111/jbi.13122.
Meynard, C.N., Kaplan, D.M. and Leroy, B. (2019) ‘Detecting outliers in species distribution data: Some caveats and clarifications on a virtual species study’, Journal of Biogeography, 46(9), pp. 2141–2144. doi:10.1111/jbi.13626.
## Not run:
new_occurences = gecko.data("records")
old_occurences = data.frame(X = runif(10, -17.1, -17.05), Y = runif(10, 32.73, 32.76))
outliers.detect(new_occurences, old_occurences)
## End(Not run)
Draws plots of sites in geographical (longlat) and environmental (2-axis PCA) space.
outliers.visualize(longlat, layers)
longlat |
matrix. Matrix of longitude and latitude or eastness and northness (two columns in this order) of species occurrence records. |
layers |
SpatRaster. As defined in package terra, see |
Erroneous data sources or errors in transcriptions may introduce outliers that can be easily detected by looking at simple graphs of geographical or environmental space.
data.frame. Contains coordinate values and distance to the centroid in PCA space. Two plots are drawn for visual inspection. The environmental plot includes row numbers for easy identification of possible outliers.
localities = gecko.data("records")
region = gecko.data("layers")
outliers.visualize(localities, region[[1:3]])
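The idea of measuring each record's distance to the centroid in a 2-axis PCA of environmental space can be sketched with `stats::prcomp`. The environmental values below are simulated and the variable names are hypothetical; the package extracts real values from the given layers, and its exact scaling choices are not shown here.

```r
set.seed(3)
# Simulated environmental values at 30 records (hypothetical variables).
env <- cbind(temp = rnorm(30, 15, 2),
             prec = rnorm(30, 800, 100),
             alt  = rnorm(30, 500, 50))
env[30, ] <- c(30, 2000, 1500)  # one deliberately implausible record

pca    <- prcomp(env, scale. = TRUE)
scores <- pca$x[, 1:2]          # first two principal axes

# prcomp centres the data, so the centroid of the scores is the origin.
dist_to_centroid <- sqrt(rowSums(scores^2))
print(which.max(dist_to_centroid))
```

Records with a large distance to the centroid are candidates for closer inspection, not automatic removal.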
Calculate the performance of a model through a comparison
between predicted and observed labels. Available metrics are accuracy, F1 and TSS.
performance.metrics(actual, predicted, metric)
actual |
dataframe. Same formatting as |
predicted |
dataframe. Same formatting as |
metric |
character. String specifying the metric used, one of |
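The F-score or F-measure (F1) is: $F1 = 2 \cdot \dfrac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$, with $\mathrm{Precision} = \dfrac{TP}{TP + FP}$ and $\mathrm{Recall} = \dfrac{TP}{TP + FN}$, where $TP$, $FP$ and $FN$ are the numbers of true positives, false positives and false negatives.
Accuracy is: $\dfrac{TP + TN}{TP + TN + FP + FN}$, where $TN$ is the number of true negatives.
The Peirce's skill score (PSS), Bookmaker's Informedness (BM) or True Skill Statistic (TSS) is: $TSS = TPR + TNR - 1$,
with $TPR$ being the True Positive Rate, the rate of positives correctly labelled
as such, and $TNR$ the True Negative Rate, the rate of negatives correctly
labelled, such that: $TPR = \dfrac{TP}{TP + FN}$ and $TNR = \dfrac{TN}{TN + FP}$.
Keep in mind that the F1 score is not a robust metric in datasets with class imbalances.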
numeric.
PSS: Peirce, C. S. (1884). The numerical measure of the success of predictions. Science, 4, 453–454.
observed = c("FALSE", "TRUE", "FALSE", "TRUE", "TRUE")
predicted = c("TRUE", "TRUE", "TRUE", "TRUE", "TRUE")
performance.metrics(observed, predicted, "TSS")
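For reference, the three metrics can be computed by hand on the example vectors using their standard textbook definitions (F1 as the harmonic mean of precision and recall, accuracy as the fraction of correct labels, TSS as TPR + TNR - 1). This is a base R sketch that treats "TRUE" as the positive class; the package may scale or report its results differently.

```r
observed  <- c("FALSE", "TRUE", "FALSE", "TRUE", "TRUE")
predicted <- c("TRUE", "TRUE", "TRUE", "TRUE", "TRUE")

# Counts for the positive class "TRUE".
tp <- sum(predicted == "TRUE"  & observed == "TRUE")
fp <- sum(predicted == "TRUE"  & observed == "FALSE")
tn <- sum(predicted == "FALSE" & observed == "FALSE")
fn <- sum(predicted == "FALSE" & observed == "TRUE")

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)  # identical to the True Positive Rate
tnr       <- tn / (tn + fp)  # True Negative Rate

f1       <- 2 * precision * recall / (precision + recall)
accuracy <- (tp + tn) / (tp + tn + fp + fn)
tss      <- recall + tnr - 1

print(c(F1 = f1, accuracy = accuracy, TSS = tss))
```

A classifier that labels everything "TRUE" still obtains a fairly high F1 on this data while its TSS is zero, which illustrates the sensitivity of F1 to class imbalance.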
Reduce the number of layers by either performing a PCA on them or by eliminating highly correlated ones.
reduce(layers, method = "pca", n = NULL, thres = NULL)
layers |
SpatRaster. As defined in package terra, see |
method |
character. Either Principal Components Analysis ("pca", default) or Pearson's correlation ("cor"). |
n |
numeric. Number of layers to reduce to. |
thres |
numeric. Value for pairwise Pearson's correlation above which one of the layers (randomly selected) is eliminated. |
Using a large number of explanatory variables in models with few records may lead to overfitting. This function helps to avoid it as much as possible. If both n and thres are given, n has priority. If method is not recognized and layers come from read function, only landcover is reduced by using only the dominating landuse of each cell.
SpatRaster.
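The correlation-based option can be sketched on a data.frame whose columns stand in for layer values. The loop below repeatedly finds the most correlated pair above `thres` and drops one member at random, which follows the description of the thres argument; the simulated data and the exact stopping rule are assumptions and not the package's code.

```r
set.seed(1)
a <- rnorm(50)
vars <- data.frame(
  a = a,
  b = a + rnorm(50, sd = 0.01),  # nearly identical to a
  c = rnorm(50)                  # independent of a and b
)
thres <- 0.9

# Repeatedly drop one variable from the most correlated pair above thres.
repeat {
  cors <- abs(cor(vars))
  diag(cors) <- 0
  if (ncol(vars) < 2 || max(cors) <= thres) break
  pair <- which(cors == max(cors), arr.ind = TRUE)[1, ]
  vars <- vars[, -sample(pair, 1), drop = FALSE]  # random choice within the pair
}
print(names(vars))
```

Either a or b is removed depending on the random draw, while c is always retained because it is not strongly correlated with the others.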
Downloads SPECTRE segments according to a bounding box selection.
spectre.area(index, ext = c(-180, 180, -60, 90), normalize = FALSE, filepath = NULL)
index |
numeric. A vector of integers specifying the layers. Refer to the list. |
ext |
numeric or SpatExtent. A vector of |
normalize |
character or logical. Either a logical indicating whether data should be normalized
for the given interval or a character specifying a type of normalization. The type
defaults to "standard". Check |
filepath |
character. An optional user defined path for the final output. If |
SpatRaster.
## Not run:
regional_threats = spectre.area(3, terra::ext(-17.3,-16.6,32.6,32.9), normalize = FALSE)
terra::plot(regional_threats[[1]], main = "Human Density")
## End(Not run)
Generate in-text citations for a selection of SPECTRE layers.
spectre.citations(index)
index |
numeric. A vector of integers specifying the layers. Refer to the Details section. |
The current layers in SPECTRE are:
MINING_AREA. Mining density based on the number of known mining properties (pre-operational, operational, and closed) in a 50-cell radius (1x1 km cells).
HAZARD_POTENTIAL. Number of significant hazards (earthquakes, volcanoes, landslides, floods, drought, cyclones) potentially affecting cells based on hazard frequency data.
HUMAN_DENSITY. Continuous metric of population density.
BUILT_AREA. Percentage metric indicating the built-up presence.
ROAD_DENSITY. Continuous metric of road density.
FOOTPRINT_PERC. Percentage metric indicating anthropogenic impacts on the environment.
IMPACT_AREA. Classification of land into very low impact areas (1), low impact areas (2) and non-low impact areas (3).
MODIF_AREA. Continuous 0-1 metric that reflects the proportion of a landscape that has been modified.
HUMAN_BIOMES. Classification of land cover into different anthropogenic biomes of differing pressure such as dense settlements, villages and cropland.
FIRE_OCCUR. Continuous metric of mean fire occurrence between the years 2006 and 2016.
CROP_PERC_UNI. Percentage metric indicating the proportion of cropland in each cell.
CROP_PERC_IIASA. Percentage metric indicating the proportion of cropland in each cell.
LIVESTOCK_MASS. Estimated total amount of livestock wet biomass based on global livestock head counts.
FOREST_LOSS_PERC. Continuous -100 to 100 metric of forest tree cover loss between 2007 and 2017.
FOREST_TREND. Classification metric of 0 (no loss) or a discrete value from 1 to 17, representing loss (a stand-replacement disturbance or change from a forest to non-forest state) detected primarily in the years 2001-2019, respectively.
NPPCARBON_GRAM. Quantity of carbon needed to derive food and fiber products (HANPP).
NPPCARBON_PERC. HANPP as a percentage of local Net Primary Productivity.
LIGHT_MCDM2. Continuous simulated zenith radiance data.
FERTILIZER_LGHA. Continuous metric of kilograms of fertilizer used per hectare.
TEMP_TRENDS. Continuous metric of temperature trends, based on the linear regression coefficients of mean monthly temperature for the years of 1950 to 2019.
TEMP_SIGNIF. Continuous metric of temperature trend significance, the temperature trends divided by its standard error.
CLIM_EXTREME. Continuous metric calculated as whatever is the largest of the absolute of the trend coefficients of the months with the lowest or highest mean temperatures.
CLIM_VELOCITY. Continuous metric of the velocity of climate change, the ratio between TEMP_TRENDS and a local spatial gradient in mean temperature calculated as the slope of a plane fitted to the values of a 3x3 cell neighbourhood centered on each pixel.
ARIDITY_TREND. Continuous metric of aridity trends, based on the linear regression coefficients of aridity for the years of 1990 to 2019, i.e: MPET/(MPRE+1).
list. Contains two elements, both characters: the first a single
character containing the in-text citations, the second a character of
length x with the bibliographic citations.
sources = c(2,3)
out = spectre.citations(sources)
Downloads SPECTRE layer data according to a selection of points.
spectre.points(index, points)
index |
numeric. A vector of integers specifying the layers. Refer to the documentation of
|
points |
data.frame or matrix. Containing point data coordinates, organized in longitude, latitude (longlat). |
data.frame or matrix. Contains both the points given as well as their respective values for each layer specified.
## Not run:
localities = gecko.data("records")
local_threats = spectre.points(c(2,3), localities)
## End(Not run)
Download the raster template for SPECTRE layers to your gecko work directory.
If you have not yet set up a work directory, it will be set up as if running gecko::gecko.setDir() with gisPath = NULL.
This is a large dataset that is prone to fail by timeout if downloaded
through R. Instead of using this function you can run gecko.setDir() (if you
haven't yet) and download the file at
https://github.com/VascoBranco/spectre.content/raw/main/spectre.template.zip.
Unzip its contents to a folder "./spectretemplate" inside the folder returned by gecko.getDir().
spectre.template()
Reads a txt file pointing to where the world GIS files are stored.
## Not run:
spectre.template()
## End(Not run)
Transform a given raster object to the resolution, datum, projection and extent used in SPECTRE.
spectrify(layers, continuous = TRUE, filepath = NULL)
layers |
SpatRaster. A raster object that you would like to be SPECTRE compatible. |
continuous |
logical. Whether the data present in |
filepath |
character. Optional file path to where the final raster layer
should be saved, in the format "folder/file.tif". If |
SpatRaster.
## Not run:
# For the sake of demonstration we will transform our raster layer "range".
distribution = gecko.data("range")
standard_dist = spectrify(distribution)
terra::plot(standard_dist)
## End(Not run)
Split a dataset for model training while keeping class representativity.
splitDataset(data, proportion)
data |
data.frame. Containing some sort of classification data. The last column must contain the label data. |
proportion |
numeric. A value between 0 and 1 determining the proportion of the dataset split between training and testing. |
list. First element is the train data, second element is the test data.
# Binary label case
my_data = data.frame(X = runif(20), Y = runif(20), Z = runif(20), Label = c(rep("presence", 10), rep("outlier", 10)) )
splitDataset(my_data, 0.8)
# Multi label case
my_data = data.frame(X = runif(60), Y = runif(60), Z = runif(60), Label = c(rep("A", 20), rep("B", 30), rep("C", 10)) )
splitDataset(my_data, 0.8)
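Keeping class representativity amounts to sampling the training rows separately within each class (a stratified split). The base R sketch below shows that idea on a small data.frame; the rounding rule and the reading of `proportion` as the training share are assumptions, and this is not the package's implementation.

```r
set.seed(42)
my_data <- data.frame(
  X = runif(60),
  Label = c(rep("A", 20), rep("B", 30), rep("C", 10))
)
proportion <- 0.8  # assumed to be the share of rows used for training

# Sample the training rows separately within each class of the last column.
labels <- my_data[[ncol(my_data)]]
train_idx <- unlist(lapply(split(seq_len(nrow(my_data)), labels), function(rows) {
  rows[sample.int(length(rows), size = round(proportion * length(rows)))]
}))

train <- my_data[train_idx, ]
test  <- my_data[-train_idx, ]
print(table(train$Label))
```

Each class contributes 80% of its rows to the training set, so the class ratios in `train` and `test` match those of the full dataset.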
Return a set of descriptive statistics of the given layer, either a specific one (minimum, q1, median, q3, maximum, median absolute deviation (mad), mean, standard deviation (sd)) or all of them.
stats(layer, plot = FALSE)
layer |
SpatRaster. Raster object, as defined by package terra, with a single layer. |
plot |
logical. If TRUE, a histogram of raster values is drawn. |
data.frame. If plot is TRUE, also outputs a histogram of the layer.
region = gecko.data("layers")
stats(region[[1]])
Thinning of records with minimum distances either absolute or relative to the species range.
thin(longlat, distance = 0.01, relative = TRUE, runs = 100)
longlat |
matrix. Matrix of longitude and latitude or eastness and northness (two columns in this order) of species occurrence records. |
distance |
numeric. Distance either in relative terms (proportion of maximum distance between any two records) or in raster units. |
relative |
logical. If |
runs |
numeric. Number of runs |
Clumped distribution records due to ease of accessibility of sites, emphasis of sampling on certain areas in the past, etc. may bias species distribution models. The algorithm used here eliminates records closer than a given distance to any other record. The choice of records to eliminate is random, so a number of runs are made and the one keeping more of the original records is chosen.
A matrix of species occurrence records separated by at least the given distance.
userpar <- par(no.readonly = TRUE)
occ_points <- matrix(sample(100), ncol = 2)
par(mfrow=c(1,2))
graphics::plot(occ_points)
occ_points <- thin(occ_points, 0.1)
graphics::plot(occ_points)
par(userpar)
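The random thinning described above (remove records that are too close to another record, repeat over several runs, keep the run retaining the most records) can be sketched in base R. The helper `thin_once` is hypothetical and works with an absolute Euclidean distance only; the relative-distance option and the package's exact removal rule are not reproduced.

```r
# Hypothetical helper: one random thinning pass over a two-column matrix.
thin_once <- function(pts, min_dist) {
  repeat {
    d <- as.matrix(dist(pts))
    diag(d) <- Inf
    # Records with at least one neighbour closer than min_dist.
    too_close <- which(apply(d, 1, min) < min_dist)
    if (length(too_close) == 0) break
    drop <- too_close[sample.int(length(too_close), 1)]
    pts <- pts[-drop, , drop = FALSE]
  }
  pts
}

set.seed(7)
pts  <- cbind(runif(50), runif(50))
runs <- replicate(20, thin_once(pts, 0.1), simplify = FALSE)

# Keep the run that retains the largest number of records.
best <- runs[[which.max(vapply(runs, nrow, integer(1)))]]
print(nrow(best))
```

Because the record to drop is chosen at random, different runs retain different subsets; several runs make it more likely that a large subset is found.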