Step-by-step tutorial for doing ANOVA test using Factorial Design in R software
R is an open source statistics program requiring knowledge of computer programming. It can be obtained from the following sources:
- http://cran.r-project.org/bin/windows/base/ (Windows)
- http://cran.r-project.org/bin/macosx/ (Mac)
- http://cran.r-project.org/ (Linux)
Here, I present a step-by-step guide to the Analysis of Variance test, commonly called ANOVA, using a factorial design in R. A factorial design considers more than one factor, and the test subjects are assigned at random to every combination of the factors' treatment levels.
R software screenshot is shown below:
[sociallocker]NOTE: To write a comment, put the pound sign (#) before the text. Comments are not part of the program; they are used to record information or to remind you why a statement was used.
Importing tables from Excel to R:
First, we have to know the structure of the table. Suppose you have three articles: “articlea”, “articleb” and “articlec”. You want 12 newspapers from each of two provinces, “Province1” and “Province2”, to rate your articles. In Province1, 4 newspapers are asked to rate “articlea”, another 4 are asked to rate “articleb” and another 4 are asked to rate “articlec”. Similarly, 12 randomly selected newspapers from Province2 rate the 3 articles, 4 newspapers per article. You want to check whether the articles are equally popular and whether their popularity differs between the provinces, and you use ANOVA for this check.
In R, tables can easily be imported from other programs such as Excel. You can make the table in Excel, save the file in .csv format and import the data into R. Suppose you saved a .csv file on the Desktop on the C drive (Local Disk). You can import the data into R by giving the file path. In my case, it is as follows:
> dt = read.csv("C:\\Users\\Usman\\Desktop\\dt.csv")
You can see that I have also named the data dt. This means that typing dt in R prints the whole table.
> dt
  articlea articleb articlec
1       85       50       63
2       65       42       42
3       75       57       82
4       70       65       92
5       60       20       24
6       90       31       63
7       97       86       23
8       52       64       13
This table shows the ratings of the articles from “Province1” in rows 1-4 and from “Province2” in rows 5-8.
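If you do not have the CSV file at hand, the same table can be typed directly into R. The sketch below rebuilds the data frame by hand, then writes it to a file and reads it back to show the round trip through read.csv. The temporary file path from tempfile() is an assumption for illustration only; it stands in for the Desktop path used above.

```r
# Rebuild the ratings table by hand (values copied from the printout above).
dt <- data.frame(
  articlea = c(85, 65, 75, 70, 60, 90, 97, 52),
  articleb = c(50, 42, 57, 65, 20, 31, 86, 64),
  articlec = c(63, 42, 82, 92, 24, 63, 23, 13)
)

# Round trip through a CSV file; tempfile() is a stand-in path (assumption).
csv_path <- tempfile(fileext = ".csv")
write.csv(dt, csv_path, row.names = FALSE)  # row.names = FALSE avoids an extra index column
dt2 <- read.csv(csv_path)

# Rows 1-4 are Province1, rows 5-8 are Province2.
print(dt2)
```

Forward slashes also work in Windows paths in R, so doubled backslashes are only one way of writing the path.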
Concatenation of data rows:
Now we will concatenate the data rows in the table “dt” into a single vector “ratings”, as follows:
> ratings = c(t(as.matrix(dt))) # response data
> ratings
[1] 85 50 63 65 42 42 75 57 82 70 65 92 60 20 24 90 31 63 97 86 23 52 64 13
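The transpose matters here. R stores a matrix column by column, so c() on its own would list all of articlea first; transposing first makes c() walk through the table row by row, which is the order the factor vectors later assume. A small sketch comparing the two orders, with the table typed in by hand:

```r
# Same values as the table "dt" above, typed in by hand.
dt <- data.frame(
  articlea = c(85, 65, 75, 70, 60, 90, 97, 52),
  articleb = c(50, 42, 57, 65, 20, 31, 86, 64),
  articlec = c(63, 42, 82, 92, 24, 63, 23, 13)
)

by_column <- c(as.matrix(dt))     # all of articlea, then articleb, then articlec
by_row    <- c(t(as.matrix(dt)))  # row 1 across the articles, then row 2, ...

head(by_column, 6)  # 85 65 75 70 60 90
head(by_row, 6)     # 85 50 63 65 42 42
```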
We have two factors, whose levels are written in R as follows:
> a1 = c("articlea", "articleb", "articlec") # 1st factor levels
> p2 = c("Province1", "Province2") # 2nd factor levels
The number of levels of each factor is obtained as follows:
> k1 = length(a1) # number of 1st factor levels
> k2 = length(p2) # number of 2nd factor levels
We have four observations for each combination of factor levels, as mentioned above.
> n = 4 # number of observations per treatment combination
Now we use the gl function to create a factor that gives the 1st factor level (the article) for each element of “ratings”. Its arguments are the number of levels, the length of each run of repeated levels, the total length and the labels:
> fl1 = gl(k1, 1, n*k1*k2, factor(a1))
> fl1
[1] articlea articleb articlec articlea articleb articlec articlea articleb
[9] articlec articlea articleb articlec articlea articleb articlec articlea
[17] articleb articlec articlea articleb articlec articlea articleb articlec
Levels: articlea articleb articlec
Next we use the gl function to create a factor that gives the 2nd factor level (the province) for each element of “ratings”. Each province label is repeated n*k1 = 12 times in a row:
> fl2 = gl(k2, n*k1, n*k1*k2, factor(p2))
> fl2
[1] Province1 Province1 Province1 Province1 Province1 Province1 Province1
[8] Province1 Province1 Province1 Province1 Province1 Province2 Province2
[15] Province2 Province2 Province2 Province2 Province2 Province2 Province2
[22] Province2 Province2 Province2
Levels: Province1 Province2
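Before fitting the model, it is worth checking that the two factor vectors really describe the design. The sketch below rebuilds them and cross-tabulates them; in a balanced design every article and province combination should contain n = 4 observations. As a small variation (an assumption, not the call used above), the labels are passed as plain character vectors instead of factor(a1) and factor(p2).

```r
a1 <- c("articlea", "articleb", "articlec")  # 1st factor levels
p2 <- c("Province1", "Province2")            # 2nd factor levels
k1 <- length(a1)
k2 <- length(p2)
n  <- 4                                      # observations per treatment combination

# gl(number of levels, run length, total length, labels)
fl1 <- gl(k1, 1,      n * k1 * k2, labels = a1)  # cycles through the articles
fl2 <- gl(k2, n * k1, n * k1 * k2, labels = p2)  # 12 x Province1, then 12 x Province2

# Cross-tabulate: every cell of the table should equal n.
design <- table(fl1, fl2)
print(design)
```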
ANOVA analysis:
Now we apply the aov function. In the formula, fl1 * fl2 stands for both main effects plus their interaction:
> avrf = aov(ratings ~ fl1 * fl2) # include interaction
> summary(avrf)
            Df Sum Sq Mean Sq F value Pr(>F)
fl1          2   2878  1439.0   3.388 0.0564 .
fl2          1   1134  1134.4   2.671 0.1196
fl1:fl2      2   1931   965.4   2.273 0.1318
Residuals   18   7645   424.7
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
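To see where the numbers in this table come from, the sums of squares can be recomputed by hand. The following is a minimal sketch that relies on the design being balanced (8 ratings per article, 12 per province, 4 per cell); the variable names such as ss_article are my own and do not come from aov.

```r
dt <- data.frame(
  articlea = c(85, 65, 75, 70, 60, 90, 97, 52),
  articleb = c(50, 42, 57, 65, 20, 31, 86, 64),
  articlec = c(63, 42, 82, 92, 24, 63, 23, 13)
)
ratings <- c(t(as.matrix(dt)))
fl1 <- gl(3, 1, 24, labels = names(dt))
fl2 <- gl(2, 12, 24, labels = c("Province1", "Province2"))

grand <- mean(ratings)
# Main effects: group size times squared deviation of each group mean.
ss_article  <- sum(8  * (tapply(ratings, fl1, mean) - grand)^2)
ss_province <- sum(12 * (tapply(ratings, fl2, mean) - grand)^2)
# Between-cell sum of squares from the six article x province cell means.
ss_cells <- sum(4 * (tapply(ratings, list(fl1, fl2), mean) - grand)^2)
ss_inter <- ss_cells - ss_article - ss_province
ss_resid <- sum((ratings - grand)^2) - ss_cells

# F statistic and p-value for the article effect (2 and 18 degrees of freedom).
f_article <- (ss_article / 2) / (ss_resid / 18)
p_article <- pf(f_article, 2, 18, lower.tail = FALSE)

# These should agree with the Sum Sq, F value and Pr(>F) columns above.
round(c(ss_article, ss_province, ss_inter, ss_resid), 1)
round(c(f_article, p_article), 4)
```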
Basic interpretation of the results:
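At the 5% significance level, none of the three effects is significant, so the data do not show that the articles differ in popularity, that the provinces differ, or that the popularity of the articles depends on the province.
The p-value for the articles (fl1), 0.0564, is slightly above 0.05, so we do not reject the null hypothesis (H0) that the mean ratings of the three articles are equal. The result is borderline: it would be significant at the 10% level, which is what the ‘.’ mark in the table indicates.
The p-values (Pr(>F)) of 0.1196 for the provinces (fl2) and 0.1318 for the interaction (fl1:fl2) are also greater than the 0.05 (5%) significance level, so we do not reject the null hypothesis (H0) for either of them. There is no significant evidence that the ratings differ between the provinces, or of an interaction between the articles and the provinces. Note that failing to reject H0 does not prove that the articles are equally popular; it only means that this sample gives no strong evidence of a difference.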
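The cell means help to read these results, and the p-values can be pulled out of the ANOVA table instead of being read off the screen. This is a minimal sketch with the data typed in by hand; the names cell_means and p_val are illustrative, and alpha = 0.05 is the significance level used above.

```r
dt <- data.frame(
  articlea = c(85, 65, 75, 70, 60, 90, 97, 52),
  articleb = c(50, 42, 57, 65, 20, 31, 86, 64),
  articlec = c(63, 42, 82, 92, 24, 63, 23, 13)
)
ratings <- c(t(as.matrix(dt)))
fl1 <- gl(3, 1, 24, labels = names(dt))
fl2 <- gl(2, 12, 24, labels = c("Province1", "Province2"))
avrf <- aov(ratings ~ fl1 * fl2)

# Mean rating in each article x province cell.
cell_means <- tapply(ratings, list(article = fl1, province = fl2), mean)
print(cell_means)

# Extract the p-values of the three effects (the Residuals row has none).
tab   <- summary(avrf)[[1]]
p_val <- setNames(tab[["Pr(>F)"]][1:3], trimws(rownames(tab)[1:3]))

alpha <- 0.05
print(p_val < alpha)  # TRUE would mean "reject H0" for that effect
```

If a picture is preferred, interaction.plot(fl1, fl2, ratings) from the stats package draws the same cell means as lines, one per province.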
You can ask questions or give suggestions in the comments.[/sociallocker]