Oct 22, 2016
When I took the courses of the Data Science specialization on Coursera, one of the methods I found most interesting was model ensembling, which aims to increase accuracy by combining the predictions of multiple models.
These days I’ve been processing some Landsat images for my dissertation research, so it has been the perfect time to test this technique and assess whether it can improve the results of the multi-category land-cover classification I have to conduct. In the paragraphs below I explain the steps I followed and some of the lessons I learned during my first ensembling experience.
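As a rough illustration of the idea, the sketch below combines the class labels predicted by several models through a simple majority vote. It is only one of many ways to build an ensemble and is not necessarily the approach used in the post; the function name `majority_vote` and the land-cover labels are made up for the example.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample.

    predictions_per_model: list of equally long label lists, one per model.
    Ties are resolved in favour of the label seen first for that sample.
    """
    combined = []
    # zip(*...) regroups the labels so that each tuple holds every
    # model's prediction for the same sample (e.g. the same pixel).
    for sample_labels in zip(*predictions_per_model):
        label, _count = Counter(sample_labels).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from three classifiers for four pixels.
model_a = ["forest", "water", "urban", "forest"]
model_b = ["forest", "urban", "urban", "crop"]
model_c = ["crop", "water", "urban", "forest"]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

Each pixel receives the label that most models agree on, so an error made by a single model is outvoted whenever the other two are correct.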
Sep 20, 2016
When conducting a supervised classification with machine learning algorithms such as RandomForests, one recommended practice is to work with a balanced classification dataset. However, this recommendation is sometimes overlooked, either because its relevance is not appreciated or because it is not clear how to balance the data.
The purpose of this post is, first, to examine some of the consequences of working with an imbalanced dataset, using an image classification example, and second, to test and suggest some techniques to solve this problem.
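One common way to balance a training set is random undersampling: every class is reduced to the size of the smallest one. The sketch below is a minimal illustration under that assumption and is not necessarily one of the techniques tested in the post; the function name `undersample`, the `seed` parameter and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def undersample(samples, labels, seed=0):
    """Return (sample, label) pairs with every class cut to the rarest class size."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # The rarest class sets the number of samples kept per class.
    n_min = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        # rng.sample draws n_min items without replacement.
        for sample in rng.sample(items, n_min):
            balanced.append((sample, label))

    rng.shuffle(balanced)  # avoid leaving the classes in ordered blocks
    return balanced


# Toy imbalanced dataset: 8 "forest" pixels, 2 "water" pixels.
pixels = list(range(10))
classes = ["forest"] * 8 + ["water"] * 2

balanced = undersample(pixels, classes)
print(Counter(label for _pixel, label in balanced))
```

Undersampling discards data from the larger classes, so it is best suited to cases where the majority classes have samples to spare; oversampling the rare classes is the usual alternative.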
Aug 7, 2016
As a professional working with spatial data, I’ve found that many of the principles and good practices proposed in Data Science can be incorporated into the GIScience and remote sensing fields to improve how we handle and analyze data. Previous posts in my blog, such as those covering the application of machine learning to image classification and the implementation of reproducible spatial analysis, were written with the intention of accelerating the adoption of Data Science practices in the profession.
As part of this continuing effort, and thanks to an invitation by Raul Jimenez, coordinator of the GeoDevelopers community, I recently gave a talk, in webinar format, about how Data Science can be applied to the analysis of spatial information. GeoDevelopers is a very active and friendly online community with more than 800 GIS developers sharing and creating content on geospatial app development, cloud services and data processing, among many other topics.
Jun 29, 2016
Reproducibility, the ability of an entire study to be replicated, is one of the core concepts in data science. Although preparing data analyses so they are reproducible is not a trivial task, it can bring many benefits and make a researcher’s life much easier: it can help to save time by allowing reuse of code and results from past studies or by allowing application of previously defined methodologies on new data.
Among the different tools that have been developed to help (data) scientists implement reproducible analyses, web-based notebooks are becoming increasingly popular. These are interactive computational environments where code snippets, explanatory text, graphics and media can be integrated. In this post I am going to focus on Jupyter Notebooks and particularly on how to use them to create reproducible reports that combine ArcPy- and R-based geospatial analyses. Below I explain how to install and configure Jupyter Notebook to work with ArcPy and R and then I provide a practical example.
May 28, 2016
One reason Twitter has become so popular is that it is a great way to get data for web applications and projects. Through its Search API, it is possible to find content by issuing a query to Twitter based on a supplied string. The results can then be parsed or displayed as preferred using ancillary tools.
The purpose of this tutorial is to show you how to create an interactive web app that retrieves geolocated tweets and shows them on a map. Sounds cool? To make building the app easier, we are going to use Shiny, a web application framework for R. Let’s take a look at how we can easily do that.
Apr 30, 2016
A couple of months ago, ESRI released a bridge library to connect ArcGIS and R. This library was developed to facilitate the management and processing of ArcGIS data for R users, while also making it easier for ArcGIS users to incorporate all the power of R analysis tools into their workflows.
This bridge library sounds quite promising, so I’ve written a brief tutorial to learn, and also to test, the capabilities of this bridge between two of the leading players in the fields of GIS and data analysis. Below I describe how to install the required library and how to create and execute a tool that combines Arc datasets and functionality from R packages to address species distribution modeling within the ArcGIS environment. Let’s start!
Nov 28, 2015
The goal of this post is to demonstrate the ability of R to classify multispectral imagery using the RandomForests algorithm. RandomForests are currently among the top-performing algorithms for data classification and regression. Although they can be difficult to interpret, RandomForests are widely popular because of their ability to classify large amounts of data with high accuracy.
In the sections below I show how to import a Landsat image into R and how to extract pixel data to train and fit a RandomForests model. I also explain how to speed up image classification through parallel processing. Finally, I demonstrate how to implement this R-based RandomForests algorithm for image classification in QGIS.
Oct 31, 2015
QGIS, a cross-platform, free and open-source application, has become one of the leading GIS packages on the market in recent years. Thanks to the work of an active group of developers, QGIS provides geoprocessing modules similar to the standard tools found in proprietary GIS, supports most vector and raster file formats and provides an interface to databases such as PostgreSQL/PostGIS, SpatiaLite and MySQL.
One of the most compelling features of QGIS is its integration with other open-source GIS and statistical packages. Currently, QGIS supports SAGA, Orfeo Toolbox, GRASS GIS and R, which greatly expands QGIS’ core functionality. In this post I will focus on the integration between QGIS and R and explain how to configure the QGIS processing framework for executing an external R algorithm from the QGIS processing toolbox.
Aug 11, 2015
An alternative way is to create the web map in the R environment using an R package called leaflet, developed by the guys from RStudio, which allows controlling and integrating Leaflet maps in R. In this post I show how to read a vector map in shapefile format and how to create a leaflet web map customizing the way the vector map is displayed. I will also show how to add a legend, a layers control and popups for displaying attribute data.
Aug 9, 2015
Hi there! Welcome to my blog!
I’ll be writing about several topics related to GIS, remote sensing and the application of programming languages to the processing, visualization and analysis of spatial data, especially using the R language.