Apr 30, 2016

A couple of months ago, ESRI released a bridge library to connect ArcGIS and R. This library was developed to make it easier for R users to manage and process ArcGIS data and, at the same time, to make it easier for ArcGIS users to incorporate the power of R's analysis tools into their workflows.

This bridge library sounds quite promising, so I’ve written a brief tutorial to learn, and also to test, the capabilities of this bridge between two of the leading players in the fields of GIS and data analysis. Below I describe how to install the required library and how to create and execute a tool that combines Arc datasets and functionality from R packages to address species distribution modeling within the ArcGIS environment. Let’s start!


Install the R ArcGIS Bridge

First, go to https://r-arcgis.github.io/ and open the r-bridge-install repository. Click the ‘Download ZIP’ button to download the repository content as the ‘r-bridge-install-master.zip’ file and then unzip it in a local folder.

Start ArcMap (for ArcGIS 10.3.1 or later) as administrator and use the Catalog window to navigate to the folder where the unzipped file is located. You should see the R Integration Python Toolbox with four scripts in it. Double-click the ‘Install R bindings’ script and then just click OK to run it. This script downloads and installs the latest version of an R package called ‘arcgisbinding’, which provides classes and functions for importing, managing and exporting ArcGIS datasets in R. To check that the R ArcGIS bridge has been successfully installed, run the ‘R Version’ and ‘R Installation Details’ scripts.

If you open R or RStudio you will find that the ‘arcgisbinding’ package has been installed to your R library. Note that the R ArcGIS bridge can also be installed using ArcGIS Pro 1.1 (or later).
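
A quick way to confirm this from an R session is sketched below. It relies only on base R's requireNamespace, so it runs whether or not the bridge is present; the wording of the messages is my own.

```r
# Check whether the 'arcgisbinding' package is visible to this R installation.
# requireNamespace() returns TRUE or FALSE instead of raising an error.
bridge_installed <- requireNamespace("arcgisbinding", quietly = TRUE)

if (bridge_installed) {
  message("arcgisbinding ",
          as.character(utils::packageVersion("arcgisbinding")),
          " is installed")
} else {
  message("arcgisbinding was not found in: ",
          paste(.libPaths(), collapse = "; "))
}
```

If the package is reported as missing, check that ArcGIS and your R session point to the same R installation and library folder.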

The video below shows the step-by-step installation of the R ArcGIS bridge:


Also note that, alternatively, you can download the Windows binary zip file of the ‘arcgisbinding’ package and install it into R manually.

Create an R script

We are going to follow the templates in the ‘scripts’ folder of the r-sample-tools repository to create a script for species distribution modeling using functionality provided by the ‘arcgisbinding’ and ‘dismo’ R packages. First, we need to load the required packages and read the input and output parameters:

tool_exec <- function(in_params, out_params)
{
  if (!requireNamespace("dismo", quietly = TRUE))
    install.packages("dismo")
  if (!requireNamespace("raster", quietly = TRUE))
    install.packages("raster")
  require(dismo)
  require(raster)
  
  occurrence_dataset = in_params[[1]]
  continuous_rasters_folder = in_params[[2]]
  biome_raster = in_params[[3]]
  model = in_params[[4]]
  
  out_raster = out_params[[1]]
  out_table = out_params[[2]]
  out_shp = out_params[[3]]


Next, we are going to open the input (points) shapefile dataset (e.g., occurrence data of the studied species) using the arc.open function from the ‘arcgisbinding’ package, and convert the resulting dataset to a SpatialPointsDataFrame object using arc.select and arc.data2sp:

  d <- arc.open(occurrence_dataset)
  occurrence <- arc.data2sp(arc.select(d))


Then we’ll proceed to read the raster files. As arc.open does not open raster datasets yet (up to version 1.0.0.118 of ‘arcgisbinding’), I’m using the following workaround for this tutorial: a set of continuous raster files (e.g., temperature, precipitation) is read from a folder that contains all of them, while a categorical raster (e.g., biome) is read from a separate file.

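  # read the continuous raster files from a folder, skipping auxiliary (.aux) files;
  # a logical filter still works when the folder contains no .aux files
  rfiles1 <- list.files(path = continuous_rasters_folder, full.names = TRUE)
  rasters1 <- stack(rfiles1[!grepl(".aux", rfiles1, fixed = TRUE)])

  # read the categorical raster (biome) from a file, converting "/" to "\" in its path
  raster2 <- raster(gsub("/", "\\\\", biome_raster))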
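

One detail of filtering out the auxiliary files is worth illustrating. A negative index built with grep drops the matching files, but when nothing matches, grep returns an empty vector and the negative index then selects nothing at all. The sketch below uses made-up file names (not the tutorial's data) to compare that behaviour with a logical filter based on grepl:

```r
# Hypothetical file listings; the names are invented for illustration only.
with_aux    <- c("bio1.tif", "bio1.tif.aux.xml", "bio12.tif")
without_aux <- c("bio1.tif", "bio12.tif")

# Negative indexing with grep(): fine when at least one file matches
drop_by_index <- function(files) files[-grep(".aux", files, fixed = TRUE)]

# Logical filtering with grepl(): also fine when no file matches
drop_by_mask <- function(files) files[!grepl(".aux", files, fixed = TRUE)]

print(drop_by_index(with_aux))     # the two .tif files
print(drop_by_index(without_aux))  # empty: every file is lost
print(drop_by_mask(without_aux))   # both .tif files are kept
```

With fixed = TRUE the pattern is matched literally, so the dot in ".aux" is not treated as a regular-expression wildcard.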


Next we need to create a RasterStack object with all the continuous and categorical raster layers that are going to be used as predictors in the model, and then extract the values from the RasterStack at the (point) locations of the species occurrence:

  predictors <- stack(rasters1, raster2)
  presvals <- as.data.frame(extract(predictors, occurrence))


There are several models that can be implemented for species distribution modeling. For the ‘bioclim’ model, for example, we can use the following code:

  if(model == "bioclim"){
    fitmodel <- bioclim(subset(presvals, select = -c(biome)))
    p <- predict(predictors, fitmodel)
  }


For the implementation of the ‘domain’, ‘glm’ (generalized linear models) and ‘mahal’ (Mahalanobis) models, see the complete script at this link. Other modeling methods such as MaxEnt or random forests can also be easily implemented in this script.
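
To give a feel for what an envelope model such as ‘bioclim’ computes, here is a deliberately simplified base-R sketch. It is not the code used by ‘dismo’: the function name, the toy data and the exact scoring rule are my own assumptions, chosen only to illustrate the idea that a location scores higher the closer its values are to the median of the values observed at presence points, and that the least favourable variable decides the score.

```r
# Simplified envelope score in the spirit of 'bioclim' (NOT dismo's exact code).
# train: data frame of predictor values at presence points
# new:   data frame with the same columns for the locations to score
envelope_score <- function(train, new) {
  scores <- sapply(names(train), function(v) {
    # percentile rank (0..1) of each new value within the training values
    pct <- ecdf(train[[v]])(new[[v]])
    # fold the upper tail onto the lower one, so 0.9 counts like 0.1
    2 * pmin(pct, 1 - pct)
  })
  # keep a locations-by-variables matrix even when 'new' has a single row
  scores <- matrix(scores, nrow = nrow(new))
  # the least suitable variable decides the final score
  apply(scores, 1, min)
}

# Toy values, invented for illustration
train <- data.frame(temp = c(18, 20, 22, 24, 26),
                    prec = c(900, 1000, 1100, 1200, 1300))
new   <- data.frame(temp = c(22, 30),
                    prec = c(1100, 1000))

print(envelope_score(train, new))
```

The first toy location sits at the median of both variables and gets a high score, while the second gets a score of 0 because its temperature lies above every training value.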

Finally, we’ll use arc.write to export an output table and a shapefile with data resulting from the ‘extract’ step above. To export the predicted species distribution raster object, we’ll have to use writeRaster from the ‘raster’ package:

  if (!is.null(out_raster) && out_raster != "NA")
    writeRaster(p, out_raster)
  if (!is.null(out_table) && out_table != "NA")
    arc.write(out_table, presvals)
  if (!is.null(out_shp) && out_shp != "NA")
    arc.write(out_shp, presvals, coords = coordinates(occurrence), shape_info = arc.shapeinfo(d))
  return(out_params)
}


The script above shows the usage of some of the R ArcGIS bridge functions available for loading and managing datasets between R and ArcGIS, such as arc.open, arc.select, arc.data2sp, arc.shapeinfo and arc.write. For info about additional functionalities, please refer to the ‘arcgisbinding’ package documentation.
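
While developing a tool like this, it can help to exercise the control flow from plain R before wiring it into a toolbox. The sketch below is a stand-in, not the tutorial's script: it keeps the tool_exec signature and parameter order, replaces the modeling and writing steps with messages, and assumes that ordinary R lists are an adequate substitute for what ArcGIS passes in. All paths are hypothetical.

```r
# Stand-in for the tutorial's tool: same signature and parameter order,
# but the modelling and writing steps are replaced by messages (assumption).
tool_exec_sketch <- function(in_params, out_params) {
  model      <- in_params[[4]]
  out_raster <- out_params[[1]]
  out_table  <- out_params[[2]]

  # An output counts as requested when it is neither NULL nor the string "NA"
  wanted <- function(x) !is.null(x) && x != "NA"

  if (wanted(out_raster))
    message("would write raster for model '", model, "' to ", out_raster)
  if (wanted(out_table))
    message("would write table to ", out_table)

  return(out_params)
}

# Call it from plain R with ordinary lists; all paths here are hypothetical.
result <- tool_exec_sketch(
  in_params  = list("bradypus.shp", "rasters", "biome.tif", "bioclim"),
  out_params = list("prediction.tif", "NA", NULL)
)
```

Here only the raster message is printed, because the table output is given as "NA" and is therefore treated as not requested.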

Create an ArcGIS toolbox

Now let’s create a toolbox in ArcGIS for an interactive execution of our script. In the Catalog window, navigate to ‘Toolboxes’ - ‘My Toolboxes’ and right-click ‘My Toolboxes’. Go to ‘New’, click ‘Toolbox’ and provide a name for your new toolbox.

Then right-click the new toolbox, and go to ‘Add’ - ‘Script…’. In the ‘Add Script’ window, provide a name and a label, and select the script we created previously.

In the last ‘Add Script’ window, we are required to provide the properties of the input and output parameters of our script. In this case, we have four inputs and three outputs. Enter the ‘Display Name’ and the ‘Data Type’ properties of each parameter as shown in the picture below. Also be sure to define the ‘Direction’ property of each parameter (i.e., ‘Input’ or ‘Output’) appropriately. When done, click ‘Finish’.


Load the data and run the R ArcGIS tool

For this tutorial we are going to use sample data consisting of a set of eight continuous raster files (i.e., bioclimatic variables from the WorldClim database, including temperature and precipitation), one categorical raster file (corresponding to terrestrial biomes data from WWF), and a points shapefile containing occurrence records (presence) of three-toed sloths (Bradypus sp.), a tree-living mammal species found in South and Central America. This shapefile was generated from data installed with the ‘dismo’ package. The goal of this exercise will be to model the spatial distribution of Bradypus sp. using climate and biome data as predictors.

After loading the data in ArcMap, go to the previously created toolbox in the Catalog window, right-click the script and click ‘Open…’ (or just double-click it). In the open interface, select the occurrence dataset (i.e. the bradypus feature layer), the folder containing the continuous rasters, and the biome raster file. Then enter the model you want to run and provide the parameters for the output files (i.e., the predicted raster, and the table and shapefile outputs). Execute the script by clicking OK. You should find the expected output files in the paths you defined previously.
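
If the model parameter is entered as free text, a typo would leave no model branch executed and the script would fail only later, when the prediction object is missing. A small guard near the top of the script can fail early with a clearer message. The following is a minimal sketch; the helper name check_model is hypothetical, and the list of models is the one named in this post.

```r
# Models named in this tutorial; the helper name is hypothetical.
supported_models <- c("bioclim", "domain", "glm", "mahal")

check_model <- function(model) {
  # Normalise case and surrounding whitespace, then fail early if unknown
  model <- tolower(trimws(model))
  if (!model %in% supported_models) {
    stop("Unknown model '", model, "'. Choose one of: ",
         paste(supported_models, collapse = ", "))
  }
  model
}

print(check_model(" Bioclim "))
```

Inside the tool, the same check could be applied to the fourth input parameter before any data are read.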

That’s it! For more info, see the creation and execution of the R ArcGIS species distribution modeling tool in the video below:


Final remarks

The R ArcGIS bridge offers several interesting features that enable the integration between ArcGIS and the R language. First, package installation can be conducted from the R ArcGIS tool, eliminating additional work for users. A second attractive feature is the ability to produce several outputs at once, which improves the efficiency of these tools by allowing the creation of several tables, graphs and datasets through the same script. Additionally, debugging the script is made easier because errors are printed in the usual Arc tool execution window.

Perhaps the main drawback of the R ArcGIS bridge to date is the lack of native support for opening and writing raster datasets through arc.open and arc.write respectively, although there are workarounds such as the one shown above. This is a much-needed feature that I hope developers at ESRI will incorporate in future versions of the ‘arcgisbinding’ package. On the other hand, creating a graphical interface for an R script in QGIS is much simpler than in ArcGIS, but Arc toolboxes are not complicated to create and offer great flexibility.

To learn more about species distribution modeling and the models implemented in the script, I invite you to read the Species distribution modeling with R vignette from the ‘dismo’ package. You can find the complete R script for this post at this link and download the toolbox and the datasets used in this tutorial from this GitHub repository.

The ‘arcgisbinding’ package is still in beta release and some issues may be found, so let me know about your experience with the R ArcGIS bridge if you try this tutorial. Have fun!



