# Tutorial 10: ANOVAs

**Analysis of variance (ANOVA) and important related statistical frameworks: Levene’s test of homogeneity of variances and the Tukey-Kramer tests between all pairs of means**

Week of November 14, 2022

(10th week of classes)

**How the tutorials work**

The DEADLINE for your report is always at the end of the tutorial. Problems for this report are spread out throughout this tutorial.

The REPORT INSTRUCTIONS (what you need to do to get the marks for this report) are found at the end of this tutorial.

After a while, students may be able to complete their reports without attending the synchronous lab/tutorial sessions. That said, we highly recommend that you attend the tutorials, as your TA is not responsible for assisting you with lab reports outside of the synchronous lab sections.


**Your TAs**

Section 101 (Wednesday): 10:15-13:00 - John Williams (j.p.w@outlook.com)

Section 102 (Wednesday): 13:15-16:00 - Hammed Akande (hammed.akande@mail.mcgill.ca)

Section 103 (Friday): 10:15-13:00 - Michael Paulauskas (michael.paulauskas@mail.mcgill.ca)

Section 104 (Friday): 13:15-16:00 - Alexandra Engler (alexandra.engler@hotmail.fr)

**Levene’s test of homogeneity of variances**

When conducting an Analysis of Variance (ANOVA), we assume that the samples from all groups were drawn from populations with the same variance. This is because the F-distribution used in ANOVA assumes that the variances within groups do not differ among groups (i.e., they can be assumed to be the same). As seen previously, the F-distribution can be understood as the sampling distribution of the ratio of two sample variances drawn from normally distributed populations with the same variance. ANOVA results are affected by differences in variances among groups in the same way as results based on the standard two-sample t-test are.
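The link between variance ratios and the F-distribution can be checked with a small simulation. This is only a sketch on made-up data (not the circadian data): the sample sizes `n1` and `n2`, the seed, and the number of replicates are arbitrary choices made for illustration.

```r
# Illustrative simulation: ratios of sample variances from two normal
# populations with the SAME variance should follow an F distribution.
set.seed(322)             # arbitrary seed, for reproducibility only
n1 <- 8; n2 <- 8          # hypothetical sample sizes
ratios <- replicate(10000, {
  x1 <- rnorm(n1, mean = 0, sd = 1)
  x2 <- rnorm(n2, mean = 0, sd = 1)
  var(x1) / var(x2)       # ratio of the two sample variances
})
# Compare simulated quantiles with the theoretical F quantiles
probs <- c(0.5, 0.9, 0.95)
simulated <- quantile(ratios, probs)
theoretical <- qf(probs, df1 = n1 - 1, df2 = n2 - 1)
round(rbind(simulated, theoretical), 2)
```

The two rows of the printed table should be close to each other; they would drift apart if the two populations were given different standard deviations.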

As such, before conducting an ANOVA, we first need to generate evidence that this assumption is met. So, the first assessment when conducting an ANOVA is to test for the equality (homogeneity) of variances (commonly referred to as homoscedasticity).

Here the $H_0$ is that the samples (groups) come from populations with the same variance. If this $H_0$ is rejected, we cannot use the standard ANOVA. To test this $H_0$, we will use Levene’s test. There are other tests to assess homoscedasticity, but this one is commonly used. Let’s proceed with Levene’s test and determine whether we should use ANOVA to analyse the circadian data seen in lecture 17.

The standard R installation (called base R) does not contain Levene’s test, and a special package called `car` needs to be installed. There are other packages that contain Levene’s test, but `car` is commonly used. To install it, simply run:

`install.packages("car",dependencies=TRUE)`

During the installation process, R may ask you `Do you want to install from sources the package which needs compilation? (Yes/no/cancel)`; if that happens (it doesn’t always), simply type `no` and press Enter.

You may also need to install the package `openxlsx` to run `car`: it seems that `car` depends on it, but it is not installed along with `car` even when setting `dependencies=TRUE`:

`install.packages("openxlsx")`

Now, load the package `car`:

`library(car)`

Download the circadian data file

Now load the data into R and inspect it:

```
circadian <- read.csv("chap15e1KneesWhoSayNight.csv")
View(circadian)
```

Now we can run the Levene’s test as follows:

`leveneTest(shift ~ factor(treatment), data=circadian)`
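As a rough sketch of what the test does under the hood: with its default setting (`center = median`), `leveneTest` in `car` runs a one-way ANOVA on the absolute deviations of each observation from its group median. The example below reproduces this by hand in base R on made-up data; the data frame `toyLevene` and its columns `group`, `y` and `absDev` are hypothetical names used only for illustration.

```r
# Hypothetical data: 3 groups of 7 observations, NOT the circadian data
set.seed(1)
toyLevene <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 7)),
  y = c(rnorm(7, sd = 1), rnorm(7, sd = 1), rnorm(7, sd = 3))
)
# Step 1: absolute deviation of each observation from its group median
groupMedian <- ave(toyLevene$y, toyLevene$group, FUN = median)
toyLevene$absDev <- abs(toyLevene$y - groupMedian)
# Step 2: one-way ANOVA on those absolute deviations
leveneByHand <- anova(lm(absDev ~ group, data = toyLevene))
leveneByHand
# With car loaded, leveneTest(y ~ group, data = toyLevene) should report
# the same F value and P-value as the table above.
```

A large F value here means that the typical spread around the group median differs among groups, which is evidence against homoscedasticity.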

**Problem 1:**

1a) Using a significance level (alpha) equal to 0.05, should we reject the null hypothesis of homoscedasticity or not?

1b) Can we apply ANOVA to analyse these data?

In your file identify problem 1:

`# Problem 1: write your answer.`

`# continue your answer`

**Graphical and table representation of group means**

Let’s produce a stripchart of the data to observe the differences among groups in a graphical format. The argument `pch=1` sets the data points to be drawn as open circles.

`stripchart(shift ~ treatment, data = circadian, vertical = TRUE,pch=1,col="firebrick",xlab="light treatment",ylab="shift in circadian rhythm (h)")`

The groups (treatments) appear in alphabetical order. As discussed in class, because the `eyes` treatment has the most negative shift in circadian rhythm (i.e., highest production of melatonin), one may prefer to display it in the third column of the graph so that the contrast among groups (treatments) is more obvious. This can be done by changing the order of the levels of the column `treatment` in the data:

`circadian$treatment <- factor(circadian$treatment, levels = c("control", "knee", "eyes"))`
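To see what the `levels` argument changes, here is a minimal sketch on a made-up vector (the objects `trt` and `trt2` are hypothetical and not part of the circadian data). Only the order in which the groups are listed and plotted changes; the observations themselves stay the same.

```r
# Hypothetical mini-example, NOT the circadian data
trt <- factor(c("knee", "eyes", "control", "eyes"))
levels(trt)     # default order is alphabetical: "control" "eyes" "knee"
# Rebuild the factor with an explicit level order
trt2 <- factor(trt, levels = c("control", "knee", "eyes"))
levels(trt2)    # now "control" "knee" "eyes"
table(trt2)     # same counts per group, listed in the new order
```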

Now plot the stripchart again:

`stripchart(shift ~ treatment, data = circadian, vertical = TRUE,pch=1,col="firebrick",xlab="light treatment",ylab="shift in circadian rhythm (h)")`

Let’s produce a table of descriptive statistics (mean and standard deviation) and sample size for each treatment. This table is commonly used to report data considering multiple groups:

```
meanShift <- tapply(circadian$shift, circadian$treatment, mean)
sdevShift <- tapply(circadian$shift, circadian$treatment, sd)
n <- tapply(circadian$shift, circadian$treatment, length)
data.frame(mean = meanShift, std.dev = sdevShift, n = n)
```

The function `data.frame` (used above) is extremely useful but won’t be covered in detail in BIOL322. It suffices to say that `data.frame` allows you to place different variables together into a common data structure. In the above case these variables were the mean, standard deviation and sample size (n).

**Problem 2:**

Create code to generate a similar table to the one above but that also includes the median of each group (just after the mean).

In your file identify problem 2:

`# Problem 2: enter your code here.`

`# continue your answer`

**Analysis of variance (ANOVA)**

To conduct an Analysis of Variance (ANOVA), we use the function `aov`, which stands for `analysis of variance`. The function `aov` analyzes a response variable (dependent variable) as a function (hence the symbol `~`) of a categorical predictor. Here we want to analyse the variation in shifts in circadian rhythm (response variable) as a function of light treatment (predictor). The function `anova` is then used to create the ANOVA table, including the F-value for the analysis and its associated P-value.

```
circadianANOVA <- aov(shift ~ treatment, data = circadian)
anova(circadianANOVA)
```
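To make the ANOVA table less of a black box, the sketch below computes the F-value and P-value by hand from the among-group and within-group sums of squares and compares them with the output of `anova`. It uses made-up data; the data frame `toyAnova` and all object names are hypothetical and chosen only for this illustration.

```r
# Hypothetical data: 3 groups of 5 observations, NOT the circadian data
set.seed(2)
toyAnova <- data.frame(
  group = factor(rep(c("g1", "g2", "g3"), each = 5)),
  y = rnorm(15, mean = rep(c(0, 0.5, 2), each = 5))
)
grandMean <- mean(toyAnova$y)
groupMean <- ave(toyAnova$y, toyAnova$group)  # group mean for each observation
k <- nlevels(toyAnova$group)                  # number of groups
N <- nrow(toyAnova)                           # total sample size
SSgroups <- sum((groupMean - grandMean)^2)    # among-group sum of squares
SSerror  <- sum((toyAnova$y - groupMean)^2)   # within-group sum of squares
MSgroups <- SSgroups / (k - 1)                # group mean square
MSerror  <- SSerror / (N - k)                 # error mean square
Fratio <- MSgroups / MSerror
Pvalue <- pf(Fratio, df1 = k - 1, df2 = N - k, lower.tail = FALSE)
c(F = Fratio, P = Pvalue)
toyTable <- anova(aov(y ~ group, data = toyAnova))
toyTable                                      # should agree with the values above
```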

**Problem 3:**

3a) Using a significance level (alpha) equal to 0.05, should we reject the null hypothesis or not?

3b) Provide an alpha value for which the null hypothesis would not be rejected.

In your file identify problem 3:

`# Problem 3: write your answer.`

`# continue your answer`

**Post-hoc test: Tukey-Kramer tests between all pairs of means**

`Post-hoc` means “performed after the event”, where the event here is the ANOVA. So, we conduct the ANOVA first, and then conduct the Tukey-Kramer `Post-hoc` test. `Post-hoc` is the jargon used in stats books!

Now we will conduct the Tukey-Kramer tests between all pairs of means to evaluate which `contrasts` (a contrast is any given difference between a pair of means) are significantly different from zero. The function `TukeyHSD` uses the object created by the function `aov` earlier to compare all possible mean differences (contrasts) between groups (treatments):

```
posthoc <- TukeyHSD(circadianANOVA, conf.level=0.95)
posthoc
```

The adjusted probability (`p adj`) is the P-value for each contrast, adjusted to account for the fact that 3 different statistical tests (one per pair of means) were performed on the same data.
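For the curious, the adjusted P-value of one contrast can be reproduced by hand from the studentized range distribution (function `ptukey` in base R). This is a sketch on made-up data, not the circadian data; the data frame `toyTukey` and all object names are hypothetical.

```r
# Hypothetical balanced data: 3 groups of 6 observations, NOT the circadian data
set.seed(3)
toyTukey <- data.frame(
  group = factor(rep(c("g1", "g2", "g3"), each = 6)),
  y = rnorm(18, mean = rep(c(0, 0.3, 1.5), each = 6))
)
fit <- aov(y ~ group, data = toyTukey)
tukey <- TukeyHSD(fit, conf.level = 0.95)$group   # matrix with one row per contrast

means <- tapply(toyTukey$y, toyTukey$group, mean)
n <- tapply(toyTukey$y, toyTukey$group, length)
MSerror <- sum(residuals(fit)^2) / df.residual(fit)  # error mean square
# Contrast g2 - g1: studentized range statistic, then its upper-tail probability
q <- abs(means["g2"] - means["g1"]) /
  sqrt((MSerror / 2) * (1 / n["g2"] + 1 / n["g1"]))
pAdjByHand <- ptukey(q, nmeans = 3, df = df.residual(fit), lower.tail = FALSE)
c(byHand = unname(pAdjByHand), TukeyHSD = unname(tukey["g2-g1", "p adj"]))
```

Because the reference distribution depends on the number of means being compared (`nmeans = 3`), the adjusted P-value is larger than the one an ordinary two-sample comparison of the same pair would give.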

**Problem 4:**

Using a significance level (alpha) equal to 0.05, which pairs of means should be considered significantly different? Provide an explanation of the results in the context of the research problem.

In your file identify problem 4:

`# Problem 4: write your answer.`

`# continue your answer`

Submit the RStudio file containing the report via Moodle.