Generalized Linear Model for Community Ecology

Pedro Peres-Neto, Concordia University

BIOS2 workshop, May 17 to 21, 2021

This document was put together for the first time for this workshop.

Let me know if you have suggestions or find any issues in the document.

**Tentative schedule**

Day 1:

Introduction to types of data and approaches using GLMs in community ecology.

Types of patterns in species distributions involving trait and environmental variation.

Simulating data as a path to understand GLMs in community ecology.

The simplest GLM: widely used bivariate correlations.

The challenges of statistical inference when linking different types of information from communities and species.

Understanding estimators and their properties in GLMs.

Day 2:

From bivariate correlations to a variety of more complex GLMs: the case of Binomial and Poisson.

The role of latent variables in specifying GLMs for community ecology.

The issues underlying autocorrelation in ecological data: the cases of spatial and phylogenetic autocorrelation.

Simple GLMM approaches (Generalized Linear Mixed Models).

Day 3:

More complex GLMM approaches.

Potential approaches for incorporating intraspecific data on traits.

Discussion with participants: your research interests, your questions or your data (or anything really).

**Philosophy:** We can’t cover everything in great detail. I’ve chosen a level that should be interesting and still cover many important aspects of GLMs applied to community ecology.

**Note:** I mostly apply here base functions so that participants without strong knowledge of certain packages (e.g., ggplot, dplyr) can follow the code more easily.

**Questions:** Participants should feel free to ask questions either directly or in the Zoom chat. I’ve also set up a Google Doc where participants can post questions during the week when we are not connected. I’ll read them and try to answer or otherwise cover each question:

https://docs.google.com/document/d/17GQvGkBFs9MmLv6Yn473_Dr1t1Ps03VdHOHMhbZhBKk/edit

Simulating a single species

One way to develop good intuition underlying quantitative methods is to be able to simulate data according to certain desired characteristics. We can then apply methods (GLMs here) to see how well they retrieve the data characteristics.

Let’s start with a very simple GLM: logistic regression for a single species. For simplicity, we consider one predictor. In many ecological simulations, this single predictor is treated as an “environmental gradient” that summarizes many environmental predictors. We can consider more gradients, and we will discuss that later in the workshop.

```
set.seed(100) # so that we all have the same results
n.sites <- 100
X <- rnorm(n.sites) # environmental gradient
b0 <- 0.5 # intercept: sets the probability of presence at X = 0
prob.presence <- 1 / (1 + exp(-(b0 + 3 * X))) # slope fixed at 3
plot(prob.presence ~ X)
```

This model is pretty simple and its form is:

\[p=\frac{1}{1+e^{-(\beta_0+\beta_1X_1)}} = \frac{1}{1+e^{-(0.5+3X_1)}}\]
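As a quick check of this formula (an addition to the original notes, not part of the workshop code), the sketch below uses base R's `plogis`, the inverse logit, to reproduce the same probabilities, and shows that the intercept sets the probability of presence at X = 0. The names `b1`, `eta`, `p.manual` and `p.plogis` are introduced here for illustration only.

```r
set.seed(100)
n.sites <- 100
X <- rnorm(n.sites)
b0 <- 0.5  # intercept
b1 <- 3    # slope (the value hard-coded in the simulation)

eta <- b0 + b1 * X               # linear predictor, on the log-odds scale
p.manual <- 1 / (1 + exp(-eta))  # the formula written out by hand
p.plogis <- plogis(eta)          # base R inverse logit

all.equal(p.manual, p.plogis)    # the two versions should agree
plogis(b0)                       # probability of presence at X = 0 (about 0.62)
all.equal(qlogis(p.manual), eta) # the logit maps p back to the linear predictor
```

Changing `b0` shifts the whole curve left or right along the gradient, while `b1` controls how steeply the probability changes with X.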

Now let’s generate presences and absences according to the logistic model expectation. Since this is a logistic model, we use `rbinom`, i.e., binomial trials:

`Distribution <- rbinom(n.sites,1,prob.presence)`

`View(cbind(prob.presence,Distribution))`
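To see what `rbinom` is doing here, a small sketch (not part of the original code) may help: each site gets one binomial trial with its own probability, so over many repeated simulations the frequency of presences at a site should approach that site's `prob.presence`. The names `n.rep`, `sims` and `freq.presence` are assumptions made for this illustration, and the number of repetitions is arbitrary.

```r
set.seed(100)
n.sites <- 100
X <- rnorm(n.sites)
prob.presence <- plogis(0.5 + 3 * X)  # same model as above

n.rep <- 2000  # number of repeated simulated distributions (arbitrary choice)
# each column is one simulated presence-absence vector across the sites
sims <- replicate(n.rep, rbinom(n.sites, 1, prob.presence))
freq.presence <- rowMeans(sims)  # frequency of presence per site

plot(freq.presence ~ prob.presence,
     xlab = "True probability of presence",
     ylab = "Frequency of presence across simulations")
abline(0, 1)  # points should fall close to the 1:1 line
```

A single `Distribution` vector is just one column of `sims`, which is why any one simulated data set only approximates the underlying probabilities.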

Let’s model the data using logistic regression:

```
model <- glm(Distribution ~ X,family=binomial(link=logit))
coefficients(model)
```

```
## (Intercept) X
## 0.09200133 3.66168492
```
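The estimates differ from the values used to simulate the data (0.5 and 3), which is expected from sampling variation with only 100 sites. One way to look at this, sketched here under the assumption that Wald intervals are adequate for the purpose, is to inspect the standard errors and confidence intervals, and then repeat the simulation with many more sites. The names `n.large`, `X.large`, `Y.large` and `model.large` are introduced for this example only.

```r
set.seed(100)
n.sites <- 100
X <- rnorm(n.sites)
prob.presence <- plogis(0.5 + 3 * X)
Distribution <- rbinom(n.sites, 1, prob.presence)
model <- glm(Distribution ~ X, family = binomial(link = "logit"))

# estimates, standard errors and Wald 95% confidence intervals
summary(model)$coefficients
confint.default(model)

# same simulation with many more sites (n.large is an arbitrary choice)
n.large <- 10000
X.large <- rnorm(n.large)
Y.large <- rbinom(n.large, 1, plogis(0.5 + 3 * X.large))
model.large <- glm(Y.large ~ X.large, family = binomial(link = "logit"))
coefficients(model.large)  # expected to be closer to 0.5 and 3
```

With more sites the standard errors shrink, so the estimates should move closer to the simulated parameter values.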

`View(cbind(prob.presence,Distribution,model$fitted.values))`

Plotting the predicted versus the observed presence-absence values:

`plot(model$fitted.values ~ Distribution)`
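Because the response is binary, that plot shows only two columns of points. An alternative view, sketched here as a suggestion rather than as part of the original material, overlays the fitted curve and the true curve on the raw data along the gradient, and summarizes the fitted values with a boxplot. The names `x.grid` and `p.hat` are assumptions made for this example.

```r
set.seed(100)
n.sites <- 100
X <- rnorm(n.sites)
prob.presence <- plogis(0.5 + 3 * X)
Distribution <- rbinom(n.sites, 1, prob.presence)
model <- glm(Distribution ~ X, family = binomial(link = "logit"))

# predicted probabilities over a regular grid of X values
x.grid <- seq(min(X), max(X), length.out = 200)
p.hat <- predict(model, newdata = data.frame(X = x.grid), type = "response")

plot(Distribution ~ X, ylab = "Presence (1) / absence (0)")
lines(x.grid, p.hat, lwd = 2)                     # fitted curve
lines(x.grid, plogis(0.5 + 3 * x.grid), lty = 2)  # true curve used to simulate
legend("topleft", legend = c("fitted", "true"),
       lty = c(1, 2), lwd = c(2, 1), bty = "n")

# fitted probabilities within observed absences and presences
boxplot(model$fitted.values ~ Distribution,
        xlab = "Observed", ylab = "Fitted probability")
```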

At this point, we won’t cover model diagnostics. Data were simulated according to the model and, as such, assumptions hold well. Plus, this is a single-species model; and this workshop is about community data, i.e., multiple species :). This is a good blog explaining how to check for assumptions of logistic regressions:

Simulating a more realistic single species

Species don’t tend to respond linearly to environmental features: