Step-by-step tutorial: one-way ANOVA for a completely randomized design in R

R is a free, open-source statistical software environment; using it requires some knowledge of computer programming. It is available for the following operating systems:

  • Windows
  • Mac
  • Linux

Here, I present a step-by-step guide to the analysis of variance, commonly called ANOVA, in R. A screenshot of the R software is shown below:

R screenshot

To add a comment, put the pound sign (#) before the text. R ignores everything from the # to the end of the line, so comments are not executed as part of the program. They are used to explain the code, or to remind you why particular statements were used.
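The short sketch below shows both ways of placing a comment: on a line of its own, and after a statement on the same line. Only the assignment and the print call are executed. The variable name x is just an illustration.

```r
# This whole line is a comment and is ignored by R
x = 2 + 3   # a comment can also follow a statement on the same line
print(x)    # prints [1] 5
```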

Adding comments

Importing tables from Excel into R:

In R, tables can easily be imported from other programs such as Excel. Create the table in Excel, save it in .csv (comma-separated values) format, and then import the file into R. Suppose you saved the .csv file on the Desktop of drive C (Local Disk). You can import it with read.csv by giving the full file path. Inside an R string, each backslash in a Windows path must be doubled. In my case, the command is:

> read.csv("C:\\Users\\Usman\\Desktop\\test.csv")

Import data from CSV
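If you do not have the CSV file at hand, you can still try the import step. The sketch below writes a small table to a temporary file with write.csv and reads it back with read.csv.

Some details are assumptions:

- The column names Objects, Notes and Points are assumed, because the original file is not shown.
- The values are reconstructed from the testy output printed later in this tutorial.

R on Windows also accepts forward slashes in paths, for example "C:/Users/Usman/Desktop/test.csv". This avoids doubling every backslash.

```r
# Hypothetical example data: column names assumed to be Objects, Notes, Points;
# values reconstructed from the testy vector shown later in the tutorial
example = data.frame(
  Objects = c(223, 234, 332, 445, 343),
  Notes   = c(26, 56, 34, 23, 65),
  Points  = c(2, 546, 1000, 347, 20000)
)

# Write the table to a temporary .csv file without a row-name column
path = tempfile(fileext = ".csv")
write.csv(example, path, row.names = FALSE)

# Import it again, just as you would import test.csv from the Desktop
test1 = read.csv(path)
print(test1)
```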

You can also assign the imported data to a name. In my case, I have named it test1.

> test1 = read.csv("C:\\Users\\Usman\\Desktop\\test.csv")

Assigning a name to the data from the CSV file

After assigning the name, you can display the data at any time by typing test1.
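Once the data have a name, a few base R functions help confirm that the import worked before going further. The sketch below uses the text argument of read.csv to build the same hypothetical test1 table inline (column names assumed), and then inspects it.

```r
# Build test1 from inline CSV text instead of a file (hypothetical column names)
test1 = read.csv(text = "Objects,Notes,Points
223,26,2
234,56,546
332,34,1000
445,23,347
343,65,20000")

test1          # print the whole table
dim(test1)     # number of rows and columns: 5 and 3
names(test1)   # column (treatment) names
str(test1)     # type of each column
```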

Concatenating the data rows and generating the treatment factors:

Concatenate the rows of test1 (link them together in sequence) into a single vector, testy, as follows:

> testy = c(t(as.matrix(test1))) # response data

> testy

 [1]   223    26     2   234    56   546   332    34  1000   445    23   347
[13]   343    65 20000

Testy screenshot

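as.matrix converts the data frame test1 into a matrix, and t transposes it so that each row of the table becomes a column. c flattens a matrix column by column. Flattening the transposed matrix therefore reads the original table row by row: 223, 26, 2 (first row), then 234, 56, 546 (second row), and so on.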
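The t() step matters because of how R stores matrices. The sketch below compares flattening the matrix with and without transposing it.

- R stores a matrix column by column, so c() on its own reads down each column.
- Transposing first makes c() read across each row, which is the order testy has.

The test1 values are reconstructed from the printed testy output, and the column names are assumed.

```r
# Hypothetical reconstruction of test1 (column names assumed)
test1 = data.frame(
  Objects = c(223, 234, 332, 445, 343),
  Notes   = c(26, 56, 34, 23, 65),
  Points  = c(2, 546, 1000, 347, 20000)
)

m = as.matrix(test1)        # 5 x 3 numeric matrix
by_column = c(m)            # column order: all Objects, then Notes, then Points
testy = c(t(m))             # row order: 223, 26, 2, 234, 56, 546, ...

print(by_column[1:5])       # the Objects column: 223 234 332 445 343
print(testy[1:6])           # first two rows of the table: 223 26 2 234 56 546
```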

We have three treatment levels (Objects, Notes and Points), each with five observations. Now we assign variables for the treatment levels, the number of treatment levels and the number of observations per level:

> f = c("Objects", "Notes", "Points")   # treatment levels
> k = 3                    # number of treatment levels
> n = 5                    # number of observations per treatment level
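Instead of typing the labels and counts by hand, you can read them from test1 itself. That way they stay in step with the file if the table changes. This is a minimal sketch, assuming two things:

- test1 has one column per treatment level, as in this tutorial.
- The column headers are the level names (assumed here).

```r
# Hypothetical reconstruction of test1 (column names assumed)
test1 = data.frame(
  Objects = c(223, 234, 332, 445, 343),
  Notes   = c(26, 56, 34, 23, 65),
  Points  = c(2, 546, 1000, 347, 20000)
)

f = names(test1)   # treatment levels, taken from the column headers
k = ncol(test1)    # number of treatment levels (3)
n = nrow(test1)    # number of observations per treatment level (5)
```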

Now we use the gl function to create a factor that assigns a treatment level to each element of testy.

> testx = gl(k, 1, n*k, factor(f))   # matching treatments

> testx

 [1] Objects Notes   Points  Objects Notes   Points  Objects Notes   Points
[10] Objects Notes   Points  Objects Notes   Points
Levels: Objects Notes Points

testx screenshot

The gl function generates factors by specifying the pattern of their levels. Its arguments here are:

- k is the number of levels.
- 1 is the number of consecutive replications of each level, so each level is written once before moving on to the next.
- n*k is the length of the result.

The three treatment levels are therefore cycled five times. This gives a factor of length fifteen whose order matches the row-by-row order of testy.
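You can produce the same labels with rep(), which makes the repeating pattern explicit. The sketch below checks that both constructions give an identical factor. Passing labels = f directly, rather than factor(f), is a simplification assumed here.

```r
f = c("Objects", "Notes", "Points")   # treatment levels
k = 3                                  # number of treatment levels
n = 5                                  # observations per level

# gl(number of levels, consecutive repeats of each level, total length, labels)
testx = gl(k, 1, n * k, labels = f)

# Equivalent construction: cycle the labels n times, keeping f's order as the levels
testx_rep = factor(rep(f, times = n), levels = f)

identical(testx, testx_rep)   # TRUE
table(testx)                  # each level appears n = 5 times
```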

Running the ANOVA:

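Now we fit the ANOVA model with the aov function. The formula testy ~ testx models the response testy as a function of the treatment factor testx: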

> aov.test1 = aov(testy ~ testx)

> summary(aov.test1)

            Df    Sum Sq  Mean Sq F value Pr(>F)
testx        2  59013716 29506858   1.159  0.347
Residuals   12 305574720 25464560

ANOVA table
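For reference, you can run the whole analysis end to end from the values in testy, without the CSV file. The sketch below also pulls the F value and p-value out of the summary table, rather than reading them off the screen. It indexes the rows by position, so it does not depend on the exact row labels.

```r
# Response values copied from the testy output above (row by row)
testy = c(223, 26, 2, 234, 56, 546, 332, 34, 1000,
          445, 23, 347, 343, 65, 20000)

f = c("Objects", "Notes", "Points")
k = 3
n = 5
testx = gl(k, 1, n * k, labels = f)   # Objects, Notes, Points, Objects, ...

aov.test1 = aov(testy ~ testx)        # one-way ANOVA for a completely randomized design
tab = summary(aov.test1)[[1]]         # the ANOVA table as a data frame
print(tab)

F_value = tab[1, "F value"]           # about 1.159
p_value = tab[1, "Pr(>F)"]            # about 0.347
```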

Basic interpretation of the results:

The p-value (Pr(>F)) of 0.347 is greater than the 0.05 (5%) significance level. We therefore do not reject the null hypothesis (H0) that the three treatment means are equal. In other words, these data do not provide evidence for our theory that the treatments differ.
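You can write the decision rule above directly in R. As a cross-check, base R's oneway.test() with var.equal = TRUE performs the same F test as a one-way aov() fit. It should therefore report the same p-value. The threshold alpha = 0.05 matches the significance level used in the text.

```r
testy = c(223, 26, 2, 234, 56, 546, 332, 34, 1000,
          445, 23, 347, 343, 65, 20000)
testx = gl(3, 1, 15, labels = c("Objects", "Notes", "Points"))

alpha = 0.05   # 5% significance level

# Classic one-way ANOVA F test (equal variances assumed)
res = oneway.test(testy ~ testx, var.equal = TRUE)
p_value = res$p.value

# Compare the p-value with the significance level
if (p_value < alpha) {
  cat("p =", round(p_value, 3), "< 0.05: reject H0\n")
} else {
  cat("p =", round(p_value, 3), ">= 0.05: do not reject H0\n")
}
```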

You can ask questions in the comments.


Usman Zafar Paracha

Usman Zafar Paracha is an Assistant Professor of Pharmaceutics at Hajvery University, Lahore, Pakistan.