Step-by-step tutorial for an ANOVA test using a factorial design in R

R is an open-source statistics program; using it requires some knowledge of computer programming. It can be obtained from the following sources:

  • http://cran.r-project.org/bin/windows/base/ (Windows)
  • http://cran.r-project.org/bin/macosx/ (Mac)
  • http://cran.r-project.org/ (Linux)

Here, I present a step-by-step guide to the Analysis of Variance test, commonly called ANOVA, using a factorial design in R. A factorial design considers more than one factor, and the test subjects are assigned at random to every combination of the factors' treatment levels.

Figure: R software screenshot

[sociallocker]NOTE: To add a comment, put the pound sign (#) before the text. Comments are not executed as part of the program. They are used to give information or to remind you why a statement was used.

Figure: Adding comments
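As a small illustration of the note above, the sketch below shows a full-line comment and a trailing comment. The variable name x is only an example and does not appear elsewhere in this tutorial.

```r
# A full-line comment: R ignores everything after the pound sign.
x = 4 * 2   # a trailing comment; only "x = 4 * 2" is executed
x           # typing a name on its own displays its value
```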

Importing tables from Excel to R:

First, we have to know the structure of the table. Suppose you have three articles: “articlea”, “articleb” and “articlec”. You want 12 newspapers from each of two provinces, “Province1” and “Province2”, to rate your articles. In Province1, 4 newspapers are asked to rate “articlea”, another 4 are asked to rate “articleb” and another 4 are asked to rate “articlec”. Similarly, 12 newspapers are selected at random in Province2 for these 3 articles. You want to check whether the articles are equally popular and whether their popularity differs between the provinces, and you use ANOVA for this check.

In R, tables can easily be imported from other programs such as Excel. You can make the table in Excel, save the file in .csv format and import the data into R. Suppose you saved a .csv file on the Desktop of the C drive (Local Disk). You can import the data into R by giving the file path to read.csv. Use straight double quotes around the path. In my case, it is as follows:

> dt = read.csv("C:\\Users\\Usman\\Desktop\\dt.csv")

Note that I have also assigned the data to a name, dt. Typing dt in R displays the whole table.

> dt

  articlea articleb articlec
1       85       50       63
2       65       42       42
3       75       57       82
4       70       65       92
5       60       20       24
6       90       31       63
7       97       86       23
8       52       64       13

Figure: Importing data from CSV to R and assigning it a name

In this table, rows 1-4 hold the ratings from “Province1” and rows 5-8 hold the ratings from “Province2”. Each column holds the ratings of one article.
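If you do not have the .csv file at hand, the same table can be typed in directly. The sketch below builds dt with data.frame() from the values shown above and then writes it to a temporary file and reads it back, to mimic the import step. The temporary path and the names csv_path and dt2 are assumptions made for this illustration, not the author's file.

```r
# Ratings copied from the table above: rows 1-4 are Province1, rows 5-8 are Province2.
dt = data.frame(
  articlea = c(85, 65, 75, 70, 60, 90, 97, 52),
  articleb = c(50, 42, 57, 65, 20, 31, 86, 64),
  articlec = c(63, 42, 82, 92, 24, 63, 23, 13)
)

# Round trip through a temporary CSV file (hypothetical path).
csv_path = tempfile(fileext = ".csv")
write.csv(dt, csv_path, row.names = FALSE)   # row.names = FALSE avoids an extra index column
dt2 = read.csv(csv_path)

dim(dt2)          # number of rows and columns read back
all(dt2 == dt)    # TRUE when the values survived the round trip
```

Writing with row.names = FALSE matters here: without it, read.csv would return a fourth column holding the row numbers.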

Concatenation of data rows:

Now we will concatenate the data rows in the table “dt” into a single vector “ratings”, as follows:

> ratings = c(t(as.matrix(dt))) # response data

> ratings

[1] 85 50 63 65 42 42 75 57 82 70 65 92 60 20 24 90 31 63 97 86 23 52 64 13

Figure: Ratings concatenated into a single vector
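The transpose t() in the command above is what makes the vector run row by row. R stores a matrix column by column, so c() on the untransposed matrix would list all of articlea first. The sketch below compares the two orders; the names m, by_column and by_row are chosen only for this illustration.

```r
dt = data.frame(
  articlea = c(85, 65, 75, 70, 60, 90, 97, 52),
  articleb = c(50, 42, 57, 65, 20, 31, 86, 64),
  articlec = c(63, 42, 82, 92, 24, 63, 23, 13)
)

m = as.matrix(dt)
by_column = c(m)      # all 8 ratings of articlea, then articleb, then articlec
by_row    = c(t(m))   # row 1 (85 50 63), then row 2 (65 42 42), and so on

head(by_column, 6)
head(by_row, 6)       # same order as the "ratings" vector above
```

The row-by-row order is the one that the factor vectors built in the next steps assume.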

We have two factors, and their levels are written in R as follows:

> a1 = c("articlea", "articleb", "articlec")   # levels of the 1st factor (article)
> p2 = c("Province1", "Province2")             # levels of the 2nd factor (province)

The number of levels of each factor is stored as follows:

> k1 = length(a1)   # number of levels of the 1st factor
> k2 = length(p2)   # number of levels of the 2nd factor

As mentioned above, we have four observations for each combination of article and province.

> n = 4   # number of observations per treatment combination

Now we use the gl function to create a factor that labels each element of “ratings” with its level of the 1st factor (the article). Because “ratings” runs row by row, the article changes at every element:

> fl1 = gl(k1, 1, n*k1*k2, factor(a1))
> fl1

 [1] articlea articleb articlec articlea articleb articlec articlea articleb
 [9] articlec articlea articleb articlec articlea articleb articlec articlea
[17] articleb articlec articlea articleb articlec articlea articleb articlec
Levels: articlea articleb articlec

Figure: Articles presented in a sequence
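To see what gl does here, it may help to build the same labels with rep(). This is a sketch for comparison only: it passes the character vector a1 directly as labels instead of factor(a1), which should give the same levels, and fl1_rep is a name introduced for this example.

```r
a1 = c("articlea", "articleb", "articlec")
k1 = length(a1)   # 3 articles
k2 = 2            # 2 provinces
n  = 4            # observations per article and province

# gl(levels, run length, total length): each label appears once, then the pattern repeats.
fl1 = gl(k1, 1, n * k1 * k2, labels = a1)

# The same pattern written out with rep().
fl1_rep = factor(rep(a1, times = n * k2), levels = a1)

length(fl1)                                      # should equal n * k1 * k2
all(as.character(fl1) == as.character(fl1_rep))  # TRUE when both give the same labels
```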

In the same way, we use the gl function to create a factor that labels each element of “ratings” with its level of the 2nd factor (the province). The first n*k1 = 12 elements come from Province1 and the remaining 12 from Province2:

> fl2 = gl(k2, n*k1, n*k1*k2, factor(p2))
> fl2

 [1] Province1 Province1 Province1 Province1 Province1 Province1 Province1
 [8] Province1 Province1 Province1 Province1 Province1 Province2 Province2
[15] Province2 Province2 Province2 Province2 Province2 Province2 Province2
[22] Province2 Province2 Province2
Levels: Province1 Province2

Figure: Province presented in a sequence
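Before running the ANOVA, it can be worth checking that the two factor vectors line up with “ratings”. The sketch below is one possible check: it puts the three vectors into one data frame, counts the observations per cell, and computes the cell means. The names long and cell_means are assumptions made for this illustration.

```r
dt = data.frame(
  articlea = c(85, 65, 75, 70, 60, 90, 97, 52),
  articleb = c(50, 42, 57, 65, 20, 31, 86, 64),
  articlec = c(63, 42, 82, 92, 24, 63, 23, 13)
)
ratings = c(t(as.matrix(dt)))
fl1 = gl(3, 1, 24, labels = names(dt))
fl2 = gl(2, 12, 24, labels = c("Province1", "Province2"))

# One row per observation, with its article and province.
long = data.frame(ratings, article = fl1, province = fl2)
head(long, 4)

# A balanced design should show 4 observations in every cell.
table(long$article, long$province)

# Mean rating for each article within each province.
cell_means = tapply(long$ratings, list(long$article, long$province), mean)
cell_means

# Cross-check one cell against the original table: articlea, rows 1-4.
mean(dt$articlea[1:4])
```

If the labels were misaligned, the cross-check in the last line would disagree with the articlea and Province1 entry of cell_means.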

ANOVA analysis:

Now we apply the aov function. The formula ratings ~ fl1 * fl2 includes both main effects and their interaction:

> avrf = aov(ratings ~ fl1 * fl2)   # include interaction
> summary(avrf)

            Df Sum Sq Mean Sq F value Pr(>F)
fl1          2   2878  1439.0   3.388 0.0564 .
fl2          1   1134  1134.4   2.671 0.1196
fl1:fl2      2   1931   965.4   2.273 0.1318
Residuals   18   7645   424.7

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Figure: ANOVA table in factorial design
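To make the table less of a black box, the sketch below recomputes the F value and p-value for the article effect by hand. Each F value is the effect's mean square divided by the residual mean square, and the p-value is the upper tail of the F distribution with the matching degrees of freedom. The names tab, ms, f_article and p_article are introduced only for this example.

```r
dt = data.frame(
  articlea = c(85, 65, 75, 70, 60, 90, 97, 52),
  articleb = c(50, 42, 57, 65, 20, 31, 86, 64),
  articlec = c(63, 42, 82, 92, 24, 63, 23, 13)
)
ratings = c(t(as.matrix(dt)))
fl1 = gl(3, 1, 24, labels = names(dt))
fl2 = gl(2, 12, 24, labels = c("Province1", "Province2"))

avrf = aov(ratings ~ fl1 * fl2)
tab = summary(avrf)[[1]]    # the ANOVA table as a data frame

# Rows are in the order fl1, fl2, fl1:fl2, Residuals.
ms = tab[["Mean Sq"]]
f_article = ms[1] / ms[4]   # mean square of fl1 over residual mean square
p_article = pf(f_article, df1 = 2, df2 = 18, lower.tail = FALSE)

round(c(F = f_article, p = p_article), 4)
```

The rows are indexed by position because the row names in the summary table may carry padding spaces.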

Basic interpretation of the results:

At the 5% significance level, none of the three effects is statistically significant: the data do not give enough evidence that the articles differ in rating, that the provinces differ, or that the articles are rated differently in different provinces.

For the articles (fl1), the p-value of 0.0564 is slightly above 0.05. Strictly speaking, we do not reject the null hypothesis (H0) of equal mean ratings at the 5% level, although the difference is close to significant and would be significant at the 10% level (marked by ‘.’ in the table).

For the provinces (fl2) and the interaction (fl1:fl2), the p-values (Pr(>F)) of 0.1196 and 0.1318 are greater than the 0.05 (5%) significance level, so we do not reject the null hypothesis (H0) in either case. That is, we find no evidence that the mean rating differs between the provinces, and no evidence of an interaction between article and province. Note that failing to reject H0 does not prove that the ratings are equal.
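The decision rule itself is a simple comparison of each p-value with the chosen significance level. The sketch below applies it to the three p-values copied from the ANOVA table, first at the 5% level and then at the 10% level. The names p_values, decision_05 and decision_10 are assumptions made for this illustration.

```r
# p-values copied from the Pr(>F) column of the ANOVA table.
p_values = c(article = 0.0564, province = 0.1196, interaction = 0.1318)

# Reject H0 only when the p-value is below the significance level.
decision_05 = ifelse(p_values < 0.05, "reject H0", "do not reject H0")
decision_10 = ifelse(p_values < 0.10, "reject H0", "do not reject H0")

decision_05
decision_10
```

Only the article effect changes its decision between the two levels, which is why it is described as borderline.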

You can ask questions or give suggestions in the comments.[/sociallocker]

Usman Zafar Paracha

Usman Zafar Paracha is a sort of entrepreneur. He is the author of "Color Atlas of Statistics", and the owner of an Android game "Faily Rocket."
