Oct 31, 2015

QGIS, a free and open-source cross-platform application, has become one of the leading GIS packages in recent years. Thanks to the work of an active group of developers, QGIS provides geoprocessing modules similar to the standard tools found in proprietary GIS, supports most vector and raster file formats, and provides an interface to databases such as PostgreSQL/PostGIS, SpatiaLite and MySQL.

One of the most compelling features of QGIS is its integration with other open-source GIS and statistical packages. Currently, QGIS supports SAGA, Orfeo Toolbox, GRASS GIS and R, which greatly expands QGIS’ core functionality. In this post I will focus on the integration between QGIS and R and explain how to configure the QGIS processing framework for executing an external R algorithm from the QGIS processing toolbox.

R processing configuration in QGIS

To run an R algorithm in QGIS, we first have to activate R in QGIS and indicate where the R binaries are located. In the QGIS interface, go to the ‘Processing’ - ‘Options…’ menu. In the ‘Processing options’ window (see screenshot below), go to ‘R scripts’ and mark the checkbox next to ‘Activate’. In the ‘R folder’ option, browse to and select the folder where R is installed on your computer. Also, note the ‘R Scripts folder’; this is the folder where your R scripts will be saved. Click OK to close the window.
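If you are not sure which folder to select, R itself can report where it is installed. The following is a minimal sketch to run in a plain R console outside QGIS. Whether QGIS expects this exact folder or a parent of it may depend on your platform and QGIS version, so treat the output as a starting point rather than the definitive setting:

```r
# Ask R where it is installed (base R only, no packages required).
r_home <- R.home()        # root folder of the R installation
r_bin  <- R.home("bin")   # folder that contains the R executables

cat("R installation folder:", r_home, "\n")
cat("R binaries folder:    ", r_bin, "\n")
```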

Now open the Processing Toolbox by clicking ‘Toolbox’ in the ‘Processing’ menu in QGIS. You should see the ‘Processing Toolbox’ panel open on the right of the QGIS interface. Notice the dropdown menu at the bottom of the Processing Toolbox indicating that you are seeing the ‘Simplified interface’. Switch to the ‘Advanced interface’.

How to run an R script in QGIS

In the next example I am going to show how to run an R script in QGIS that samples point locations within a multipolygon shapefile using stratified random sampling. The sample size is taken from a table (a csv file) that indicates the number of points to be created for each category of a thematic map (in this case, land cover type). The following screenshot shows the land cover map and the table loaded in QGIS using the Browser panel:

For this example we need the ‘sp’ R package. If this package is not yet installed in your R library, see this previous post for how to install it.
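A quick way to check whether the package is available is sketched below, to be run in a plain R console. The helper name `missing_packages` is hypothetical (it is not part of R or QGIS), and the sketch only reports what is missing rather than installing anything:

```r
# Return the names of the packages in `pkgs` that cannot be loaded.
# `missing_packages` is a hypothetical helper, not part of R or QGIS.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

to_install <- missing_packages("sp")
if (length(to_install) > 0) {
  message("Not available: ", paste(to_install, collapse = ", "),
          ". Try install.packages(\"sp\").")
} else {
  message("'sp' is available.")
}
```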

Now let’s create an R script in QGIS. In the Processing Toolbox, go to ‘R scripts’ - ‘Tools’ and click ‘Create new R script’. A new ‘Script editor’ window should open:

Copy and paste the following code in the Script editor:

##polyg=vector
##polyg_category_id=field polyg
##sizes_table=table
##table_category_id=field sizes_table
##sampling_size=field sizes_table
##output=output vector
library(sp)

# Sample the first category to initialise the output
i <- 1
category <- unique(polyg[[polyg_category_id]])[i]
categorymap <- polyg[polyg[[polyg_category_id]] == category,]
n <- sizes_table[which(sizes_table[[table_category_id]] == category), sampling_size]
spdf1 <- SpatialPointsDataFrame(spsample(categorymap, n, "random"), data = data.frame(category = rep(category, n)))

# Sample the remaining categories and append their points to the output
for (i in 2:length(unique(polyg[[polyg_category_id]]))){
  category <- unique(polyg[[polyg_category_id]])[i]
  categorymap <- polyg[polyg[[polyg_category_id]] == category,]
  n <- sizes_table[which(sizes_table[[table_category_id]] == category), sampling_size]
  spdf1 <- rbind(spdf1, SpatialPointsDataFrame(spsample(categorymap, n, "random"), 
                                               data = data.frame(category = rep(category, n))))
}
output = spdf1

The first six lines, which start with a double pound sign (##), tell QGIS the inputs and outputs of the algorithm and are used to build the graphical user interface. QGIS also uses this information to create the corresponding R variables, which can then be used as input for R commands.

In this case, we are telling QGIS that we are defining an input called ‘polyg’ of type ‘vector’ and another input called ‘sizes_table’ of type ‘table’ (in the first and third lines, respectively). ‘polyg_category_id’ and ‘table_category_id’ are fields from the attribute table of the vector map and from the csv table, respectively; they are the key fields used to match the two tables. In this example, the key field holds a unique identifier for each category (land cover type). ‘sampling_size’ is the field of the table that holds the number of points to create for each category, and ‘output’ is the resulting vector layer.
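To make the key-field matching concrete, here is a small sketch in plain R (no QGIS or ‘sp’ needed). The data values and the column names `lc_id` and `n_points` are made up for illustration. It assumes, as the script's use of `[[ ]]` suggests, that field parameters reach R as character strings holding column names:

```r
# Toy stand-ins for what QGIS would pass to the script (made-up values).
sizes_table <- data.frame(
  lc_id    = c(1, 2, 3),     # hypothetical key field: land cover category
  n_points = c(10, 25, 5)    # hypothetical field: points to sample
)
map_ids <- c(2, 2, 1, 3, 1)  # stand-in for polyg[[polyg_category_id]]

# Assumption: field parameters hold column names as character strings.
table_category_id <- "lc_id"
sampling_size     <- "n_points"

# Same lookup as in the script, repeated for every category in the map.
for (category in unique(map_ids)) {
  n <- sizes_table[which(sizes_table[[table_category_id]] == category), sampling_size]
  cat("category", category, "->", n, "points\n")
}
```

Each distinct category in the map is looked up once in the table, which is why the key field must identify exactly one table row per category.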

After defining the inputs, the ‘sp’ package is loaded and the sample point locations are created for the first category. The sampling points for the second to the last categories are then created and appended to the previously created set of sampling points.

You can run the algorithm by clicking the ‘Run algorithm’ button in the Script editor. You should see a new dialog with three tabs: ‘Parameters’, ‘Log’ and ‘Help’. In the ‘Parameters’ tab, select the vector map, the key field in the vector map, the table (e.g., a csv file), the key field in the table and the field in the table indicating the sampling size for each category. You can either save the output to a permanent file (e.g., a shapefile) or generate a temporary file (just leave the ‘output’ field blank). Click the ‘Run’ button to execute the algorithm.

The output sample locations should be displayed once the script has finished running:

To save the R script, click the ‘Save’ icon in the Script editor. The dialog that appears opens in the rscripts folder (e.g., C:\Users\Guest\.qgis2\processing\rscripts) and offers to save the file as a ‘Processing R script’. Provide a name for the file and click ‘Save’. The file will be saved with a ‘.rsx’ extension.

Once saved, the algorithm can also be run in batch mode. You should see your saved algorithm in the ‘Processing Toolbox’ within the ‘User R scripts’ group. Right-click your algorithm and select ‘Execute as batch process’. A ‘Batch Processing’ window opens, as shown below.

In the ‘Parameters’ tab, you will need to enter the corresponding inputs for the algorithm parameters for each process. When ready, click ‘Run’.

The video tutorial below shows how to perform each of the steps described above for configuring and executing an R script in QGIS:

Final remarks

Running an R algorithm from QGIS can help make data handling and visualization more efficient. This approach avoids having to write additional lines of code for importing the files and for exporting the results to be opened later in a GIS. Furthermore, the results can be visualized right after the execution of the script and examined using QGIS’ zoom and pan tools (which R is less suited for).

On the downside, it can be difficult to debug your code if an error occurs while the script is running in QGIS. It is therefore advisable to test your code in R first and move to QGIS once the code is bug-free.
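One way to make errors easier to track down is to check the inputs before any sampling is done. The sketch below is only an illustration in plain R: `check_sampling_inputs` is a hypothetical helper (not part of QGIS or ‘sp’), and the toy data and column names are made up. It verifies that each category in the map matches exactly one table row with a usable sample size, and otherwise stops with a readable message:

```r
# Hypothetical helper (not part of QGIS or sp): check that every category
# in the map has exactly one row, with a positive sample size, in the table.
check_sampling_inputs <- function(map_ids, sizes_table, id_field, size_field) {
  stopifnot(id_field %in% names(sizes_table), size_field %in% names(sizes_table))
  for (category in unique(map_ids)) {
    rows <- which(sizes_table[[id_field]] == category)
    if (length(rows) != 1) {
      stop("Category ", category, " matches ", length(rows),
           " rows in the table (expected exactly 1).")
    }
    n <- sizes_table[rows, size_field]
    if (!is.numeric(n) || is.na(n) || n < 1) {
      stop("Category ", category, " has an invalid sample size: ", n)
    }
  }
  invisible(TRUE)
}

# Toy data (made-up values): category 3 is missing from the table.
sizes <- data.frame(lc_id = c(1, 2), n_points = c(10, 25))
result <- tryCatch(
  check_sampling_inputs(c(1, 2, 3), sizes, "lc_id", "n_points"),
  error = function(e) conditionMessage(e)
)
print(result)
```

A check like this could be run in R on the same map and table before moving the script to QGIS, so that data problems surface with a clear message instead of a failed algorithm run.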

I hope you try analysing your spatial data with R scripts in QGIS soon. Let me know how it goes!
