Tutorial 6: ANCOVA

February 17, 2026 Analysis of Covariance

CRITICAL: Regular practice with R and RStudio (the statistical software used in BIOL 322 and introduced during tutorial sessions) and consistent engagement with tutorial exercises are essential for developing strong skills in biostatistics.

EXERCISES: Tutorial reports are independent exercises and are not submitted for grading; however, students are strongly encouraged to complete them. Some tutorials include solutions at the end to support self-assessment and review. Other tutorials do not require model answers because they are procedural and can be easily self-assessed by verifying that the code runs correctly and produces the expected type of output.

Here we will use the overcompensation study involving the grazing experiment covered in class. Data come from Bergelson and Crawley (1992) - Effects of grazing on fruit production.

Download the Grazing data

Load the data:

regrowth <- read.csv("GrazingAncova.csv")
View(regrowth)

Even though we want to treat the variable Grazing as a factor, it’s stored as a character (string) variable. As such, R won’t automatically treat it as a categorical predictor in the model. Let’s first check the structure and types of the variables that the data contain.

str(regrowth)

As we can see, Grazing is a character. Therefore, we do need to manually convert it to a factor in this case:

regrowth$Grazing <- as.factor(regrowth$Grazing)
str(regrowth)

Now, Grazing has been converted into a factor.

Plotting data organized in groups (series)

One important aspect in biostatistics is to represent data in a way that depicts the research question you are tackling. In these data, we have two continuous variables (fruit production and root size) and one categorical predictor. The categorical predictor has two groups (treatments).

Let’s plot these data using ggplot and learn some more about this system for creating graphs.

We should start by loading the package. Make sure that it is installed (as seen in Tutorial 3 on factorial ANOVA):

library(ggplot2)

ggplots are built from a series of elements (in the form of functions) that are added to one another using +. We can save a plot into an object that can then be used later to add more functionality. We will start with the basic graph and then add elements. The function ggplot sets up the basic plot with the data to be used; the function geom_point is then added to state that it will be a scatter plot and to set the size and shape of the points; xlab and ylab set the axis labels.

grazing.plot <- ggplot(data=regrowth, aes(x=Root, y=Fruit, color=Grazing)) + geom_point(aes(shape=Grazing), size=2.5) + xlab("Root size") + ylab("Fruit production")
grazing.plot

The default ggplot theme uses a grey grid as background. I prefer a more classic look, but this is totally personal. Here is how to generate the graph in a more “classic” way: we add a theme to the initial plot saved in the object grazing.plot. We can either keep the same name for the new graph or create a new object. I will do the latter here so that we can see how the plot “evolves”:

grazing.plot.classicLook <- grazing.plot +  theme_classic()
grazing.plot.classicLook

We can also add regression lines for each group as follows. We can either use the classic theme (“look”) or the default.

grazing.plot.classicReg <- grazing.plot.classicLook + geom_smooth(method="lm",formula=y~x,se=FALSE)
grazing.plot.classicReg

Or we could also have used the default theme, which was saved in the object grazing.plot.

grazing.plot.default <- grazing.plot + geom_smooth(method="lm",formula=y~x,se=FALSE)
grazing.plot.default

From now on I’ll use the classic theme, but you can easily adapt the code to use the default theme. We may want to change the colors to our favourite ones:

grazing.plot.col <- grazing.plot.classicReg + scale_color_manual(values=c('blue','firebrick'))
grazing.plot.col

Finally, we may also want to plot the mean fruit production for each group; to do that, we also need the average root size per group (to position the points on the x-axis). We start by calculating these mean values:

mean.treatment.fruit <- aggregate(Fruit ~ Grazing,FUN=mean,data=regrowth)
mean.treatment.fruit
mean.treatment.root <- aggregate(Root ~ Grazing,FUN=mean,data=regrowth)
mean.treatment.root

And then plot them:

grazing.plot.col.mean <- grazing.plot.col + 
           geom_point(data = data.frame(x=mean.treatment.root[,2],
           y=mean.treatment.fruit[,2],Grazing=mean.treatment.root[,1]), aes(x=x, y=y),size=3,col="black")
grazing.plot.col.mean

And because we saved the plots using different names, we can now see their “evolution”:

grazing.plot
grazing.plot.classicLook
grazing.plot.classicReg
grazing.plot.default
grazing.plot.col
grazing.plot.col.mean

Analysis of Covariance (ANCOVA)

As discussed in class, the first statistical hypothesis that needs to be tested is whether the covariate (initial root size) can predict the response variable (fruit production). If fruit production is independent of initial size, then there is no need to adjust the values of fruit production and a simple ANOVA can be conducted.

anova(lm(Fruit ~ Root,data=regrowth))

The p-value is quite small, serving as strong evidence that fruit production can be predicted by root size.
The second relevant hypothesis is whether the groups (levels) of the factor (grazed and ungrazed) share a common slope (i.e., whether their slopes do not differ significantly). If the interaction between the categorical variable (Grazing) and the continuous variable (root size) is non-significant, then we can assume that the two treatments (grazed/ungrazed) share a common slope for fruit production on root size, which can then be used to predict adjusted means for both treatments:

anova(lm(Fruit ~ Root * Grazing,data=regrowth))

The p-value for the interaction term (Root × Grazing) is large; therefore, we do not reject the null hypothesis of equal slopes. This suggests that there is no strong statistical evidence that the relationship between fruit production and root size differs between grazed and ungrazed treatments.

In this case, it is reasonable to proceed with a reduced model that excludes the interaction term, while retaining both the covariate (root size) and the factor (grazing). This allows us to assess whether grazing affects fruit production after adjusting for initial root size (i.e., testing for differences in adjusted means under a common slope assumption).

We can now contrast two possible ANOVA results: one in which Root is entered first and then Grazing, and another in which Grazing is entered first and then Root:

anova(lm(Fruit ~ Root + Grazing,data=regrowth))
anova(lm(Fruit ~ Grazing + Root,data=regrowth))

Note how the sums of squares (Sum Sq), F-values, and p-values change when we change the order of terms in the model, as we discussed in class. This is the key feature of Type I (sequential) sums of squares: when predictors are not orthogonal, the “effect” of a term depends on what has already been entered in the model.

In this example, in both orders, the Grazing effect is significant; meaning that grazed and ungrazed treatments differ in fruit production after adjusting for initial root size (under the common-slope assumption). However, this need not always happen: when Grazing and root size are associated (i.e., not orthogonal), it is possible for Grazing to appear significant in one sequential order but not the other.

The issue is that Grazing and root size are not independent in these data (e.g., the grazed and ungrazed plants may occupy different ranges of root size), so sequential sums of squares can attribute shared variation differently depending on order. In such cases, we typically prefer Type II or Type III sums of squares (partial tests) rather than relying on the default Type I output from anova(). For Type III tests, we will use the package car:

library(car)

The flexible command in the package car is Anova, not anova as in standard R (i.e., with a capital A as the first letter). Before running the Type III Anova, we need to understand how the factor Grazing is coded in the model. When we include a categorical variable (e.g., Grazing) in a linear model, R cannot work directly with words such as “Grazed” and “Ungrazed”. The model is mathematical; it needs numbers.

Therefore, R internally converts factor levels into numerical variables. This conversion is called contrast coding. Different contrast codings do not change the biological data, but they do change how the model partitions variation and what hypothesis is being tested.

By default, R uses treatment contrasts (contr.treatment). For two levels (e.g., Grazed vs Ungrazed), this typically means:
- One group becomes the reference
- The other group is compared against it
- The intercept equals the mean of the reference group

Although this is perfectly fine for many purposes, Type III sums of squares are not designed around treatment contrasts. For this analysis, we need to set the appropriate contrasts as follows:

options(contrasts = c("contr.sum", "contr.poly"))

The first option, contr.sum, applies to unordered factors. This is the relevant case here (Grazed vs. Ungrazed). Sum-to-zero coding is typically recommended when using Type III sums of squares, because it aligns the hypothesis test with deviations from the grand mean rather than comparisons to an arbitrary reference level.

The second option, contr.poly, applies only to ordered factors. Ordered factors are categorical variables that have a natural ranking (order), such as low, medium, high drug doses or increasing temperature levels. In those cases, R uses polynomial contrasts to test linear, quadratic, and higher-order trends across the ordered levels.

Importantly, the two contrast options are never used simultaneously for the same variable: R applies the first option to unordered factors and the second option to ordered factors.
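To see what these codings look like numerically, you can print the contrast matrices directly in base R (a quick check only; nothing here modifies your data):

```r
# Treatment contrasts: reference level coded 0, other level coded 1
contr.treatment(2)

# Sum-to-zero contrasts: levels coded 1 and -1, so effects sum to zero
contr.sum(2)
```

With contr.sum, the intercept corresponds to the grand mean and the factor coefficient to a deviation from it, which is exactly what the Type III test assumes.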

Now, before assuming anything, we should verify how the variable Grazing is currently defined. By default, a factor created with factor() is unordered, but it is always good practice to check:

is.ordered(regrowth$Grazing)

This allows us to confirm how R is interpreting the variable before fitting the model; here, is.ordered(regrowth$Grazing) returns FALSE.

Type I sum of squares asks: “What does this variable explain given what came before it?” Type III sum of squares asks: “What does this variable explain after accounting for all other variables?”

As such, Type I ANOVA (sequential sums of squares) does not depend on contrast choice in the same way that Type III does. Type I depends on the order in which variables enter the model, not on contrast coding (for the overall F-tests of terms). This can be understood more intuitively as follows. In Type I ANOVA, the model proceeds step by step:
1. The first variable is entered and is given credit for all the variation it explains.
2. The second variable is then entered and is credited only for the additional variation it explains beyond the first.
3. The process continues sequentially for subsequent terms.
We saw the above in our Lecture 10 about sums of squares.

If two predictors share variation (i.e., are not orthogonal), the one entered first “absorbs” part of that shared variation. That is why the results depend on order.

In contrast, Type III ANOVA does not depend on the order of entry, but it does depend on how the categorical variables are coded. This is because Type III tests each term after accounting for all other terms in the model simultaneously. Since the hypothesis being tested depends on how the factor is numerically represented, contrast coding matters. That is why, when performing Type III tests, we set appropriate contrasts beforehand: options(contrasts = c("contr.sum", "contr.poly")).

Using contr.sum ensures that the factor is coded with sum-to-zero contrasts, which aligns the Type III test with deviations from the grand mean rather than comparisons to an arbitrary reference level.

The statistical theory underlying contrasts is quite rich and mathematically detailed. For our purposes, the key idea is simply this:

Type I depends on order. Type III depends on coding.
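A quick way to convince yourself of the “Type III depends on coding” part is to fit a model that includes an interaction and compare the Type III tables under the two codings. This is a hypothetical sketch with simulated data (the names y, x, and g are made up for illustration):

```r
library(car)  # for Anova()

# Simulate a factor, a covariate, and a response with an interaction
set.seed(2)
g <- factor(rep(c("A", "B"), each = 25))
x <- rnorm(50)
y <- 1 + 2 * x + (g == "B") * (3 + 1.5 * x) + rnorm(50)

# With treatment contrasts, Type III main effects are tested at the reference level
options(contrasts = c("contr.treatment", "contr.poly"))
Anova(lm(y ~ x * g), type = "III")

# With sum-to-zero contrasts, they are tested at the grand mean
options(contrasts = c("contr.sum", "contr.poly"))
Anova(lm(y ~ x * g), type = "III")
```

The interaction row is identical in both tables; only the main-effect rows change, which is why contr.sum should be set before running Type III tests.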

Understanding this distinction is often sufficient for correctly applying these methods in practice. Now let’s assume that you have an ordered factor at hand. Suppose we have a factor called temperature in the regrowth experiment that we want to treat as ordered. Then we can set it as below; note that this won’t run because regrowth$temperature doesn’t exist in the data, but it shows how it would be coded.

# this code is for demonstration only, as it won't run here - but you will use it later for the exercise
regrowth$temperature <- factor(regrowth$temperature,
                        levels = c("Low", "Medium", "High"),
                        ordered = TRUE)

How to check if it is ordered:

# this code is for demonstration only, as it won't run here - but you will use it later for the exercise
is.ordered(regrowth$temperature)

If it returns TRUE, then the factor is ordered.

Let’s get back to our original problem.

result.Grazing.first <- lm(Fruit ~ Grazing + Root,data=regrowth)
Anova(result.Grazing.first,type = "III")

Note that it gives the same results (F-values and P-values) as the reversed order of factor + covariate:

result.Root.first <- lm(Fruit ~ Root + Grazing,data=regrowth)
Anova(result.Root.first,type = "III")

Now you can see that the order of predictors in the model makes no difference.

Now that we know that Grazing significantly affects fruit production independently of initial root size, the next important question concerns the direction of change: is grazing still associated with greater fruit production (as in the initial analysis), or, once the covariate is controlled for, is fruit production greater in the absence of grazing?

To estimate the direction of change, we start by calculating a common mean for the covariate, i.e., initial root size.

com.root.size<-mean(regrowth$Root)
com.root.size

Now, build a regression object with the model:

lm.regrowth<-lm(Fruit ~ Grazing+Root,data=regrowth)
lm.regrowth

Prepare a data frame for prediction, i.e., we want to predict fruit production given a common mean (7.18115) for initial root size. The variables in the prediction data frame must have the same names as in the original model, otherwise the function predict won’t work:

data.predict<-data.frame(Grazing=c("Grazed","Ungrazed"),Root=c(7.18115,7.18115))
data.predict

Let’s assign row names to the prediction data frame first, and then predict:

rownames(data.predict) <- data.predict$Grazing
predicted.values <- predict(lm.regrowth, data.predict)
predicted.values

So, the predicted mean fruit production for Grazed is 41.35888 and for Ungrazed is 77.46212. Here are the original unadjusted (by initial root size) mean fruit production values for comparison (as you calculated earlier):

mean.treatment.fruit

So the initial unadjusted means had fruit production in the Grazed treatment larger than in the Ungrazed. BUT, by adjusting the mean values for growth potential (i.e., initial root size), the outcome is the reverse (i.e., Ungrazed larger than Grazed). And the difference in adjusted means is significant, as tested earlier with Anova(lm(Fruit~Root+Grazing,data=regrowth),type = "III").

Let’s add these adjusted mean values as points to our ggplot:

grazing.plot.col.mean + 
           geom_point(data = data.frame(x=c(7.18115,7.18115),
           y=predicted.values), aes(x=x, y=y),size=3,col="green")

Model adequacy steps

1) Assess normality; most data points fall on the line so we will assume normality.

plot(lm.regrowth,which=2)

To remove the caption ‘Normal Q-Q’ from the plot, set the caption to empty; this allows making appropriate legends in reports and manuscripts, where a top title is usually not desired.

plot(lm.regrowth,which=2,caption="")

2) Assess residual heteroscedasticity:

plot(lm.regrowth,which=3,caption="")

The line looks “flat”, which suggests no tendency for residual variation to increase with predicted values.

We can test this assumption more formally using the Breusch-Pagan test for heteroscedasticity, available in the package lmtest. Levene’s test, as used previously in ANOVA, is appropriate when there are only categorical factors. Here we also have a continuous covariate (initial root size), so we can’t use Levene’s test. You can think of the Breusch-Pagan test as testing the relationship between residuals and predicted values.
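The intuition behind Breusch-Pagan can be sketched by hand: regress the squared residuals on the fitted values and check whether there is any relationship. This is a simplified illustration with simulated data (bptest() itself uses a scaled version of this auxiliary regression):

```r
# Simulate a simple homoscedastic regression
set.seed(3)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)
m <- lm(y ~ x)

# Auxiliary regression: squared residuals on fitted values
aux <- lm(resid(m)^2 ~ fitted(m))
summary(aux)$coefficients   # a slope near zero suggests homoscedasticity
```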

install.packages("lmtest")
library(lmtest)

To run the Breusch-Pagan test:

bptest(lm.regrowth)

The P-value (0.4261) is reasonably large (greater than alpha=0.05), so the assumption of residual homoscedasticity holds for these data.

Selecting between Type I and Type III sums of squares - examples of when Type I may be preferred over Type III

It is important to understand that Type I sums of squares are not inferior to Type III. They simply answer a different question. Type I ANOVA is sequential. It asks how much variation a variable explains given what has already been entered into the model. Therefore, the interpretation depends on the order of entry, and that order should reflect the scientific logic of the study.

Consider a common ecological example (see a genomic example next) in which we model a biological response as a function of environmental variables and spatial structure:

Species_distribution = Environment + Space

In many ecological systems, environmental gradients are viewed as primary drivers of species distributions, whereas spatial structure may represent additional processes such as dispersal limitation or historical contingency. If we enter Environment first and then Space, Type I ANOVA attributes to Environment all the variation it explains. Space is then tested only for the additional variation it explains after environmental effects have already been accounted for. The question being asked is therefore: does spatial structure explain variation beyond what is already captured by environmental gradients?

If we reverse the order and enter Space first, then shared variation between space and environment is attributed to space, and environment is tested only for what remains. The data have not changed. What has changed is the question. Sequential sums of squares assign shared variation to whichever predictor enters first.

This is not a flaw. It is appropriate when there is a conceptual or hierarchical ordering among predictors. Type I is therefore well suited to situations in which we are building explanations in stages, reflecting hypothesized ecological processes.
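The order dependence described above is easy to reproduce with simulated data. In this hypothetical sketch, Env and Space are correlated predictors (all names are made up for illustration):

```r
# Two correlated predictors sharing variation in the response
set.seed(1)
Env   <- rnorm(100)
Space <- 0.7 * Env + rnorm(100, sd = 0.7)   # Space is correlated with Env
y     <- 2 * Env + 1 * Space + rnorm(100)

# Type I (sequential) sums of squares: the first term absorbs shared variation
anova(lm(y ~ Env + Space))
anova(lm(y ~ Space + Env))   # same data, different attribution
```

Compare the Sum Sq column for Env in the two tables: it changes with the order of entry, even though the data are identical.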

Type III, in contrast, removes this hierarchy. It tests each variable after accounting for all others simultaneously. That approach is useful when no predictor has conceptual priority, but it does not reflect a staged or hierarchical explanation.

So the choice between Type I and Type III is not about correctness. It is about which scientific question we wish to ask.

Now, consider a genomics example where we are modelling gene expression levels for a particular gene across individuals. Suppose we are interested in understanding variation in expression as a function of environmental exposure and genetic background:

Expression = Genotype + Environment

In many biological contexts, genotype represents a fundamental structural property of the organism, whereas environmental exposure (e.g., stress, diet, temperature) represents a condition acting on that genetic background. If we enter Genotype first and then Environment in a Type I ANOVA, the model attributes to Genotype all the variation in expression it explains. Environment is then tested only for the additional variation it explains beyond genotype.

The question being asked is therefore:

Does environmental exposure affect gene expression after accounting for genetic differences?

That is often a meaningful biological question. We may want to know whether an environmental factor modifies expression beyond what is expected from genetic variation alone.

If we reverse the order:

Expression = Environment + Genotype

then shared variation between genotype and environment (which can occur, for example, if certain genotypes are more common in certain environments) is attributed to Environment. Genotype is then tested only for what remains unexplained. The numerical results may change, not because the biology changed, but because the statistical question changed.

This illustrates why Type I sums of squares are appropriate when there is a logical hierarchy among predictors. If we consider genotype as primary and environment as acting upon it, then entering genotype first reflects that scientific reasoning. Type I allows the model to follow the conceptual structure of the biological hypothesis.

Type III, in contrast, would test genotype and environment simultaneously, without giving conceptual priority to either. That is useful when no variable is considered foundational, but it does not reflect a staged or hierarchical explanation of variation.

Exercise

Ordered factor in ANCOVA: testing linear trends across temperature levels

In this exercise, you will simulate a simple biological experiment. Suppose fruit production depends on initial root size (a continuous covariate) and temperature treatment. Temperature has three naturally ordered levels: Low, Medium, and High.

It may be a little challenging to solve this exercise if you don’t yet have some experience in using R to simulate data, even though the previous tutorials provided you with the code necessary to solve this problem; so, it’s fine if you take a peek at the solution below. This example will also allow you to understand how to code ordered factors.

Assume that fruit production increases with root size and also increases linearly from Low to Medium to High temperature.

Your tasks are:
1. Simulate the data.
2. Convert Temperature into an ordered factor.
3. Fit an ANCOVA model.
4. Test the effect of Temperature using Type III sums of squares.
5. Interpret the linear trend.

Answer:

Step 1 - Simulate simple data:

# Set a seed so the simulated data are reproducible
set.seed(1)

Temperature <- rep(c("Low","Medium","High"), each = 20)

Root <- rnorm(60, mean = 7.2, sd = 1)

# Simple linear increase across temperature levels
Fruit <- 30 + 6*Root +
         c(rep(0,20), rep(10,20), rep(20,20)) +
         rnorm(60, 0, 5)

datT <- data.frame(Fruit, Root, Temperature)
View(datT)
str(datT)

At this stage, Temperature is a character variable.

Step 2 - Convert Temperature to ordered factor:

datT$Temperature <- factor(datT$Temperature,
                           levels = c("Low","Medium","High"),
                           ordered = TRUE)

is.ordered(datT$Temperature)

If TRUE, it is correctly defined as ordered.

Step 3 - Set contrasts and fit the model:

options(contrasts = c("contr.sum","contr.poly"))

mT <- lm(Fruit ~ Root + Temperature, data = datT)
summary(mT)

Because Temperature is ordered, the model decomposes it into components:
- Temperature.L → linear trend
- Temperature.Q → quadratic trend

The linear term should be significant, reflecting a systematic increase from Low to Medium to High. The quadratic term should not be significant, because we simulated a straight (linear) increase. However, we are conducting multiple tests, so it could happen by chance that the quadratic trend appears significant.

Step 4 - Type III test:

library(car)
Anova(mT, type = "III")

Interpretation: Root should be significant. This confirms that fruit production increases with root size. Temperature should also be significant.

In the Type III output produced by Anova(), you will notice that the ordered factor appears as a single line labelled simply as Temperature. You do not see separate rows for Temperature.L and Temperature.Q. This is not a mistake, and it is not because the linear and quadratic components disappeared. It is because Anova() is testing the factor as a whole rather than testing each contrast separately.

When we define Temperature as an ordered factor, R represents it internally using polynomial contrasts. With three ordered levels, this generates two components: a linear component (L) and a quadratic component (Q). These are simply numerical variables that together represent the ordered factor in the regression model. When we run summary(lm()), R reports the coefficients for each of these components separately, because summary() focuses on individual parameters.
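You can inspect these contrast variables directly. contr.poly(3) prints the linear (.L) and quadratic (.Q) columns that R uses internally for a three-level ordered factor; the same matrix is returned by contrasts(datT$Temperature) once the factor is defined as ordered:

```r
# Polynomial contrast matrix for a 3-level ordered factor:
# column .L carries the linear trend, column .Q the quadratic trend
contr.poly(3)
```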

However, Type III ANOVA is not testing individual coefficients. It tests entire model terms. In this case, Temperature is a single term in the model, even though it is represented internally by two contrast variables. The Type III test therefore asks whether Temperature, as a factor, explains variation in Fruit after accounting for Root. Mathematically, it is testing whether both the linear and quadratic components are simultaneously equal to zero.

So the single F-test reported for Temperature in the Type III table is a joint test of both L and Q together. If that test is significant, it tells us that at least one of the components (linear or quadratic) contributes to explaining variation. To understand whether the pattern is primarily linear or curved, we then look at the individual coefficients in the summary(lm()) output.

This distinction reflects an important principle: regression summaries report parameter-level tests, whereas ANOVA tables report term-level tests.