Generalized Linear Model for Community Ecology

Pedro Peres-Neto, Concordia University
BIOS2 workshop, May 17 to 21, 2021
This document was put together for the first time for this workshop.
Let me know if you have suggestions or find any issues in the document.

Tentative schedule

Philosophy: We can’t cover everything in extreme detail. I’ve chosen a level that should be interesting enough and that covers many important aspects of GLMs applied to community ecology.

Note: I mostly use base R functions here so that participants without strong knowledge of certain packages (e.g., ggplot2, dplyr) can follow the code more easily.

Questions: Participants should feel free to ask questions either directly or in the Zoom chat. I’ve also set up a Google Doc where participants can post questions during the week when we are not connected. I’ll read them and try to answer each one or cover it during the workshop:

https://docs.google.com/document/d/17GQvGkBFs9MmLv6Yn473_Dr1t1Ps03VdHOHMhbZhBKk/edit


Simulating data

Simulating a single species

One way to develop good intuition underlying quantitative methods is to be able to simulate data according to certain desired characteristics. We can then apply methods (GLMs here) to see how well they retrieve the data characteristics.

Let’s start with a very simple GLM: logistic regression for a single species. Here, for simplicity, we consider one predictor. In many ecological simulations, this single predictor is treated as an “environmental gradient” that summarizes many environmental predictors. We can consider more gradients, and we will discuss that later in the workshop.

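set.seed(100) # so that we all have the same results
n.sites <- 100
X <- rnorm(n.sites) # environmental gradient (one value per site)
b0 <- 0.5 # intercept: sets the probability of presence at X = 0
prob.presence <- 1/(1+exp(-(b0+3*X))) # inverse logit; the slope is 3
plot(prob.presence ~ X)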

This model is pretty simple and its form is:

\[p=\frac{1}{1+e^{-(\beta_0+\beta_1X_1)}} = \frac{1}{1+e^{-(0.5+3X_1)}}\]
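As a quick check of the formula above, the hand-written inverse logit can be compared with base R's `plogis()`, which computes the same function, and the log-odds obtained with `qlogis()` should then be exactly linear in X. This is only a minimal sketch, assuming the same seed and parameter values as above; the names `b1`, `eta`, `p.manual` and `p.plogis` are introduced here for illustration and are not part of the workshop code.

```r
set.seed(100)
n.sites <- 100
X <- rnorm(n.sites)
b0 <- 0.5   # intercept used in the text
b1 <- 3     # slope used in the text (named here for illustration)

eta <- b0 + b1 * X                 # linear predictor (log-odds scale)
p.manual <- 1 / (1 + exp(-eta))    # formula written out by hand
p.plogis <- plogis(eta)            # base R inverse logit

# Both routes should give the same probabilities
print(all.equal(p.manual, p.plogis))

# Going back to the log-odds scale recovers the linear predictor
print(all.equal(qlogis(p.manual), eta))

# Probability of presence at X = 0, set by the intercept alone
print(plogis(b0))
```

The last line shows how b0 acts in this model: it sets the probability of presence at the centre of the gradient (about 0.62 for b0 = 0.5), while the slope controls how fast the probability changes along X.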

Now let’s generate presences and absences according to the logistic model expectation. Since this is a logistic model, we use rbinom, i.e., binomial trials (here, a single trial per site):

Distribution <- rbinom(n.sites,1,prob.presence)
View(cbind(prob.presence,Distribution))
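To build intuition for what `rbinom` does here, note that each site receives one Bernoulli trial with its own success probability. If the draw is repeated many times, the frequency of presences at each site should approach `prob.presence`. The sketch below illustrates this under the same simulation settings; `n.rep`, `draws` and `freq` are hypothetical names used only for this example, and `plogis()` is used as a shortcut for the inverse-logit formula.

```r
set.seed(100)
n.sites <- 100
X <- rnorm(n.sites)
prob.presence <- plogis(0.5 + 3 * X)   # same probabilities as 1/(1+exp(-(0.5+3*X)))

n.rep <- 2000   # number of simulated presence-absence vectors (arbitrary choice)

# Each column of 'draws' is one simulated distribution across the 100 sites
draws <- replicate(n.rep, rbinom(n.sites, 1, prob.presence))

# Frequency of presence per site across replicates
freq <- rowMeans(draws)

# Points should fall close to the 1:1 line
plot(freq ~ prob.presence,
     xlab = "Probability used in the simulation",
     ylab = "Observed frequency of presence")
abline(0, 1)

# Largest discrepancy between simulated frequency and true probability
print(max(abs(freq - prob.presence)))
```

A single vector such as `Distribution` is just one column of this matrix, which is why any one simulated data set can look noisier than the smooth probability curve.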

Let’s model the data using logistic regression:

model <- glm(Distribution ~ X,family=binomial(link=logit))
coefficients(model)
## (Intercept)           X 
##  0.09200133  3.66168492
View(cbind(prob.presence,Distribution,model$fitted.values))
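One way to judge how well the model retrieved the simulated parameters is to place the estimates next to the true values together with approximate 95% Wald intervals (estimate ± 1.96 standard errors). The sketch below also verifies that `model$fitted.values` is simply the inverse logit applied to the estimated linear predictor. It assumes the same simulation as above; `est`, `wald` and `fitted.manual` are names introduced for illustration, and the Wald interval is one possible choice among several.

```r
set.seed(100)
n.sites <- 100
X <- rnorm(n.sites)
prob.presence <- plogis(0.5 + 3 * X)
Distribution <- rbinom(n.sites, 1, prob.presence)
model <- glm(Distribution ~ X, family = binomial(link = "logit"))

# Coefficient table: Estimate, Std. Error, z value, Pr(>|z|)
est <- coef(summary(model))

# True values next to estimates and approximate 95% Wald intervals
wald <- cbind(true     = c(0.5, 3),
              estimate = est[, "Estimate"],
              lower    = est[, "Estimate"] - 1.96 * est[, "Std. Error"],
              upper    = est[, "Estimate"] + 1.96 * est[, "Std. Error"])
print(round(wald, 2))

# Fitted values are the inverse logit of the estimated linear predictor
fitted.manual <- plogis(coef(model)[1] + coef(model)[2] * X)
print(all.equal(unname(fitted.manual), unname(model$fitted.values)))
```

With only 100 sites, the estimates are not expected to match 0.5 and 3 exactly; the width of the intervals gives a sense of how much sampling variation to expect.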

Plotting the predicted versus the observed presence-absence values:

plot(model$fitted.values ~ Distribution)
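Because the observed values are only 0 or 1, an alternative view that some may find easier to read is to plot the data along the gradient and overlay both the true and the fitted probability curves. The following is a sketch of that idea using base graphics only; `X.new`, `p.fit` and `p.true` are hypothetical names, and the simulation is repeated so that the block runs on its own.

```r
set.seed(100)
n.sites <- 100
X <- rnorm(n.sites)
prob.presence <- plogis(0.5 + 3 * X)
Distribution <- rbinom(n.sites, 1, prob.presence)
model <- glm(Distribution ~ X, family = binomial(link = "logit"))

# Regular grid along the gradient for drawing smooth curves
X.new <- seq(min(X), max(X), length.out = 200)

# Fitted probabilities on the grid (type = "response" gives the probability scale)
p.fit <- predict(model, newdata = data.frame(X = X.new), type = "response")

# True probabilities used to simulate the data
p.true <- plogis(0.5 + 3 * X.new)

plot(Distribution ~ X, pch = 16, col = "grey50",
     ylab = "Presence (1) / absence (0)")
lines(X.new, p.true, lty = 2)        # true curve (dashed)
lines(X.new, p.fit, col = "red")     # fitted curve
legend("topleft", legend = c("true", "fitted"),
       lty = c(2, 1), col = c("black", "red"), bty = "n")
```

The closer the red curve is to the dashed one, the better the model recovered the simulated response.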

At this point, we won’t cover model diagnostics. The data were simulated according to the model and, as such, its assumptions hold well. Plus, this is a single-species model, and this workshop is about community data, i.e., multiple species :). This blog post explains how to check the assumptions of logistic regression:

http://www.sthda.com/english/articles/36-classification-methods-essentials/148-logistic-regression-assumptions-and-diagnostics-in-r/

Simulating a more realistic single species

Species don’t tend to respond linearly to environmental features: