Introduction to Species Distribution Models

By: Qing Zhao

January 24, 2018 Meetup Presentation

Check out Dr. Zhao’s presentation: http://www.r-gators.com/slides/zhao. This presentation was made using RMarkdown. The Markdown code is included below, however outputs from R (i.e. graphs) will only be displayed in the presentation.

Download the RMarkdown file and an R script for generating data to test these scripts at the GitHub repository: https://github.com/ufrmeetup/zhao. Note the RMarkdown script relies on some private data, so it will not run on it’s own, but can be run using the simulated data.

What are species distribution models (SDMs)?

“Species distribution models (SDMs) are numerical tools that combine observations of species occurrence or abundance with environmental estimates. They are used to gain ecological and evolutionary insights and to predict distributions across landscapes, sometimes requiring extrapolation in space and time.” – Elith & Leathwick 2009

Elith, J. and Leathwick, J.R., 2009. Species distribution models: ecological explanation and prediction across space and time. Annual review of ecology, evolution, and systematics, 40, pp.677-697.

Types of SDMs

Type of statistical models
- Frequentist
- Bayesian
- Machine learning

Types of SDMs, cont’d

Type of response variables
- Presence/absence (occurrence)
- Count (abundance)
Number of response variables
- One
  - single species
  - Uni-directional interspecific interactions (B -> A)
- Two or more: multiple species
  - Interspecific similarity
  - Undirectional interspecific interactions (A - B)
  - Bi-directional interspecific interactions (A -> B & B -> A)

Types of SDMs, cont’d

Temporal processes represented
- No: static
- Yes: dynamic
Spatial processes represented
- No
- Implicitly (spatial autocorrelation)
- Explicitly (dynamic)

Basic frequetist SDMs (that no one uses)

Occurrence: logistic regression \[y \sim Bernoulli(\pi)\] \[logit(\pi) = X\beta\]
Abundance: Poisson regression \[N \sim Poisson(\lambda)\] \[log(\lambda)=X\beta\]

Two directions of extension

	Machine learning (non-parametric)	Bayesian (parametric)
Non-linear relationship	Easy to detect	Hard to detect
Observation error	Hard to account	Easy to account

Today’s focus

Occurrence
Single species
Static, non-spatial

Two issues to consider

Non-linear relationships
Observation error

Two approaches for one goal

Multivariate adaptive regression spline (MARS)
- Detect non-linear relationship
Occupancy model
- Account for detection error

What is MARS

Developed by Friedman (Friedman, J.H., 1991. Multivariate adaptive regression splines. The annals of statistics, pp.1-67.)
Connect a bunch of straight segment to represent non-linear relationships

What are occupancy models

Developed by MacKenzie and colleagues (MacKenzie, D.I., Nichols, J.D., Lachman, G.B., Droege, S., Andrew Royle, J. and Langtimm, C.A., 2002. Estimating site occupancy rates when detection probabilities are less than one. Ecology, 83(8), pp.2248-2255.)
It is a hierarchical Bayesian model that includes an observation sub-model and a process sub-model

Observation:
\[y_{i,j} \sim Binomial(z_{i} \times p, J_{i})\] Process: \[z_{i} \sim Bernoulli(\pi_{i})\] \[logit(\pi_{i}) = x^T \beta\]

Case study

Lion beetle in Alberta, Canada
Relate lion beetle occurrence with temperature and forest cover
Predict lion beetle distribution

Study area and survey sites

Look at the data

# Slide found here http://www.r-gators.com/slides/zhao#15 
load('c:/A. UFL/rmeetup/rmeetup data.RData')
head(data, n=3)

Things can get complicated quickly

Distribution-environment relationship	Model
Linear, additive	Temperature
	Forest
	Temperature + Forest
Non-linear, additive	Temperature ^ 2
	Forest ^ 2
	Temperature ^ 2 + Forest
	Temperature + Forest ^ 2
	Temperature ^ 2 + Forest ^ 2

And more

Distribution-environment relationship	Model
Interaction	Temperature * Forest
	(Temperature ^ 2) * Forest
	Temperature * (Forest ^ 2)
	(Temperature ^ 2) * (Forest ^ 2)

Use MARS to make things easier

library(earth) # This is the library for MARS

# Slide found here http://www.r-gators.com/slides/zhao#18 
# Create "occupancy" data for MARS
data$y01 <- ifelse(rowSums(data[,1:4])==0, 0, 1)
head(data, n=3)

Use MARS to analyze the data

Slide for these chunks found here http://www.r-gators.com/slides/zhao#19

# Model one, only additive relationships are considered
fit1 <- earth(y01 ~ temperature + forest, data=data, 
              glm=list(family='binomial'), # for occurrence data
              degree=1, # for additive model
              pmethod='backward', nfold=20, keepxy=TRUE) # for k-fold cross validation

# Model two, consider interactions between covariates
fit2 <- earth(y01 ~ temperature + forest, data=data, 
              glm=list(family='binomial'),
              degree=2, # include level-one interactions
              pmethod='backward', nfold=20, keepxy=TRUE)

Model comparison

Look at the AUC values of cross-validation - slide found here http://www.r-gators.com/slides/zhao#20

auc <- cbind(fit1$cv.auc.tab[1:nfold,1], fit2$cv.auc.tab[1:nfold,1], 
boxplot(auc, axes=F, ylim=c(0,1), xlab='Model', ylab='AUC')
axis(side=1, at=1:2, labels=c('Temp + Forest','Temp x Forest'))
axis(side=2, at=seq(0,1,.5))
box()

The importance of covariates

# Look at slideshow http://www.r-gators.com/slides/zhao#21
evimp(fit2)

Look at the response curves

# Slide found here http://www.r-gators.com/slides/zhao#22
npred <- 100
forest.highlow <- c(0, 1)
ypred <- matrix(0, npred, 2)
for (i in 1:2) {
  xpred <- cbind(seq(-3, 3, length.out=npred), 
                 rep(forest.highlow[i], npred))
  ypred[,i] <- inv.logit(predict(object=fit2, newdata=xpred))
}
xseq <- seq(-3, 3, length.out=npred)
par(mfrow=c(1,1))
par(mar=c(5,5,1,1))
plot(ypred[,1] ~ xseq, type='n', ylim=c(0,1), 
     xlab='Temperature', ylab='Prob. of Occu.')
for (i in 1:2) {lines(ypred[,i] ~ xseq, col=2^i)}
legend('topright', bty='n', lty=1, col=2^c(1:2), 
       legend=c('low','high'), title='Forest')

# Look at presentation for figure http://www.r-gators.com/slides/zhao#23 

load('c:/A. UFL/rmeetup/ypred.RData')
npred <- 100
xseq <- seq(-3, 3, length.out=npred)
par(mfrow=c(1,1))
par(mar=c(4,4,1,1))
plot(ypred[,1] ~ xseq, type='n', ylim=c(0,1), 
     xlab='Temperature', ylab='Prob. of Occu.')
for (i in 1:2) {lines(ypred[,i] ~ xseq, col=2^i)}
legend('topright', bty='n', lty=1, col=2^c(1:2), 
       legend=c('low','high'), title='Forest')

What we learned from MARS

An interaction between temperature and forest is needed
A quandratic form of temperature is needed

Issues yet to solve

Observation error

Conduct occupancy modeling

library(unmarked) # This is the library for occupancy models

# Prepare data for "unmarked"
unmarked.data <- unmarkedFrameOccu(y=data[,1:4], 
                                siteCovs=data[,c('temperature','forest')], 
                                obsCovs=NULL)

Run “unmarked” to fit the MacKenzie et al. (2002) Occupancy Model

fit <- occu(~1 ~temperature*forest + I(temperature^2)*forest, 
            data=unmarked.data, knownOcc=which(rowSums(y)==1))

Look at the parameter estimates

# slide found here http://www.r-gators.com/slides/zhao#28
summary(fit)

Overall detection probability

# slide found here http://www.r-gators.com/slides/zhao#29
library(boot)
pobs <- inv.logit(0.007790955)
pobs.grand <- 1 - (1 - pobs) ^ 4
pobs
pobs.grand

Check the response curve

rm(list=ls())
library(boot)
load('c:/A. UFL/rmeetup/est.RData')
n <- 100
xseq <- seq(-3, 3, length.out=n)
pi0 <- inv.logit(est[1,1] + est[2,1] * xseq + est[4,1] * xseq ^ 2)
pi1 <- inv.logit(est[1,1] + est[2,1] * xseq + est[4,1] * xseq ^ 2 + 
                est[3,1] + est[5,1] * xseq + est[6,1] * xseq ^ 2)
par(mfrow=c(1,1))
par(mar=c(5,5,1,1))
plot(pi0 ~ xseq, type='l', ylim=c(0,1), col=2, 
     xlab='Temperature', ylab='Prob. of Occu.')
lines(pi1 ~ xseq, col=4)
legend('topleft', bty='n', lty=1, col=2^c(1:2), 
       legend=c('low','high'), title='Forest')

Predict the distribution of lion beetles

library(boot)

load('c:/A. UFL/rmeetup/est.RData')
load('c:/A. UFL/2. Alberta/data/original/EnvData.km2.RData')
pi <- inv.logit(est[1,1] + est[2,1] * cov$MAT + est[4,1] * cov$MAT ^ 2 + 
              est[3,1] * cov$forest + est[5,1] * cov$MAT * cov$forest + 
              est[6,1] * cov$MAT ^ 2 * cov$forest)
pi.scale <- round((pi - min(pi)) / (max(pi) - min(pi)) * 9) + 1

par(mfrow=c(1,1))
par(mar=c(0,0,1,0))
par(oma=c(1,1,1,1))

plot(cov$lat ~ cov$lon, pch=19, cex=.01, axes=F, 
     xlab='', ylab='', 
     col=rev(heat.colors(10))[pi.scale])

What did occupancy do for us?

Estimate detection probability
Explain beetle-environment relationships
Predict beetle distribution

R-Gators

Activity hub for R programming at the University of Florida

Introduction to Species Distribution Models

What are species distribution models (SDMs)?

Types of SDMs

Types of SDMs, cont’d

Types of SDMs, cont’d

Basic frequetist SDMs (that no one uses)

Two directions of extension

Today’s focus

Two issues to consider

Two approaches for one goal

What is MARS

What are occupancy models

Case study

Study area and survey sites

Look at the data

Things can get complicated quickly

And more

Use MARS to make things easier

Use MARS to analyze the data

Model comparison

The importance of covariates

Look at the response curves

What we learned from MARS

Issues yet to solve

Conduct occupancy modeling

Run “unmarked” to fit the MacKenzie et al. (2002) Occupancy Model

Look at the parameter estimates

Overall detection probability

Check the response curve

Predict the distribution of lion beetles

What did occupancy do for us?

Further reading

Further reading

Further reading

Further reading

Further reading

Thank you