A look at some data
Here comes some more about using R, about teeth in fossil fish and more. As none of this has been published yet, the active scientists among you will understand why some information (specimen number, etc…) is not provided. On the other hand, we will be looking at actual real data, microtexture data from the teeth of a fossil fish. This is how sexy it actually looks when I get it:
| Tooth |
Pos1 |
Pos2 |
Rank |
Par1 |
Par2 |
Par3 |
Par4 |
Par5 |
Par6 |
Par7 |
Par8 |
| Tooth8 |
posterior |
inner |
24
|
0.604
|
0.495
|
11.327
|
12.43
|
8.819
|
21.249
|
2.69E+06
|
0.48
|
| Tooth8 |
anterior |
inner |
23
|
0.431
|
0.473
|
6.763
|
1.882
|
1.488
|
3.37
|
1.85E+06
|
0.73
|
| Tooth8 |
anterior |
inner |
23
|
0.306
|
-0.196
|
8.157
|
2.112
|
2.283
|
4.395
|
2.32E+06
|
0.806
|
| Tooth8 |
anterior |
outer |
22
|
0.318
|
-0.71
|
16.741
|
4.555
|
4.979
|
9.535
|
3.41E+06
|
0.586
|
| Tooth7 |
posterior |
inner |
21
|
0.295
|
-1.452
|
9.657
|
1.378
|
2.381
|
3.759
|
2.15E+06
|
0.097
|
| Tooth7 |
central |
central |
20
|
0.249
|
-0.492
|
14.577
|
2.951
|
4.578
|
7.529
|
3.16E+06
|
0.745
|
| Tooth7 |
anterior |
inner |
19
|
0.222
|
-0.676
|
7.371
|
1.883
|
1.616
|
3.499
|
1.61E+06
|
0.752
|
| Tooth6 |
posterior |
outer |
18
|
0.506
|
-0.127
|
6.922
|
3.762
|
3.62
|
7.382
|
2.39E+06
|
0.723
|
| Tooth6 |
posterior |
outer |
18
|
0.416
|
2.006
|
21.118
|
5.462
|
3.661
|
9.122
|
2.82E+06
|
0.237
|
| Tooth6 |
posterior |
inner |
17
|
0.335
|
-0.155
|
9.519
|
2.801
|
3.192
|
5.992
|
1.81E+06
|
0.88
|
| Tooth6 |
anterior |
inner |
16
|
0.244
|
-0.103
|
4.857
|
2.737
|
1.922
|
4.66
|
1.79E+06
|
0.776
|
| Tooth6 |
anterior |
outer |
15
|
0.481
|
0.048
|
9.606
|
2.038
|
3.634
|
5.671
|
2.08E+06
|
0.814
|
| Tooth4 |
posterior |
outer |
14
|
0.269
|
0.145
|
5.837
|
2.586
|
0.925
|
3.511
|
2.47E+06
|
0.746
|
| Tooth4 |
posterior |
inner |
13
|
0.342
|
-0.057
|
5.235
|
2.44
|
1.09
|
3.53
|
2.28E+06
|
0.674
|
| Tooth4 |
anterior |
inner |
12
|
0.147
|
-0.552
|
5.111
|
0.687
|
1.311
|
1.997
|
1.03E+06
|
0.86
|
| Tooth3 |
posterior |
outer |
11
|
0.249
|
-0.79
|
7.007
|
1.277
|
3.059
|
4.335
|
2.10E+06
|
0.75
|
| Tooth3 |
posterior |
inner |
10
|
0.484
|
-1.233
|
13.272
|
2.867
|
4.424
|
7.291
|
1.62E+06
|
0.768
|
| Tooth3 |
central |
central |
9
|
0.362
|
-0.881
|
7.34
|
1.132
|
1.838
|
2.969
|
1.87E+06
|
0.787
|
| Tooth3 |
anterior |
inner |
8
|
0.293
|
-1.381
|
11.455
|
1.572
|
3.303
|
4.874
|
1.92E+06
|
0.624
|
| Tooth2 |
posterior |
outer |
7
|
0.646
|
-3.454
|
25.219
|
1.882
|
4.75
|
6.631
|
2.65E+06
|
0.436
|
| Tooth2 |
posterior |
inner |
6
|
0.3
|
-0.532
|
9.535
|
2.425
|
3.356
|
5.78
|
1.65E+06
|
0.745
|
| Tooth2 |
central |
central |
5
|
0.487
|
-0.681
|
6.548
|
2.25
|
3.906
|
6.157
|
1.71E+06
|
0.755
|
| Tooth2 |
anterior |
inner |
4
|
0.221
|
-0.214
|
5.203
|
1.404
|
1.801
|
3.205
|
1.54E+06
|
0.858
|
| Tooth1 |
posterior |
outer |
3
|
0.394
|
-0.042
|
11.118
|
4.264
|
2.561
|
6.825
|
1.83E+06
|
0.863
|
| Tooth1 |
posterior |
inner |
2
|
0.25
|
-0.795
|
9.868
|
1.921
|
2.775
|
4.696
|
2.08E+06
|
0.665
|
| Tooth1 |
anterior |
inner |
1
|
0.218
|
-0.087
|
11.446
|
2.482
|
2.121
|
4.603
|
1.28E+06
|
0.782
|
There already are a few modifications: Pos1 and Pos2 indicate where the data comes from on the tooth: the anterior, posterior or central part of the inner, outer or central rings of ornamentation. If you have a hard time picturing it, just imagine that a tooth from that fossil fish is similar to a simple target, bull’s eye kind of like, with one outer ring, one inner ring and a central point, and the data comes from top to bottom in the order: top outer ring, top inner ring, central point, bottom inner ring, bottom outer ring. I could not always get good data from everywhere, though, and sometimes I thought I had bad data so I sampled again, in the end I have two measurements for similarly placed areas. Oh, and we have several teeth along a tooth row so… stack them targets one atop another along the wall. And let’s stop mentioning targets, before I start getting confused as to what is the topic here. From the most anterior part to the most posterior, positions have been numbered in the “rank” column.
Just to let you know, I did not collect the data for the fun of it. Nor was it gathered in that manner for trivial reasons, no, we will be testing a hypothesis and further exploring the data here. The hypothesis being: “Microtextural data is similar at any position along the tooth row.” The alternative one: “Microtextural data does change with the sampling position.” Why test that? Well if only a part of the tooth is preserved undamaged, it would be useful to know whether it can be compared with the rest of our sample anyways. Otherwise, one would need to only compare similar positions between them, to compare like with like, also take good notes and be very careful when acquiring data.
First let’s get the numbers: just copy the data frame above in anything that will allow you to save it and read it. For this example and because I write from a computer using the most widespread OS, I will use notepad to paste the copied data and save it in the R folder as “test.txt”. Then one just needs to set it as the working directory for the session. For me, the R folder is on the D:/ drive so in R I just set the directory as:
setwd(“D:/R”)
And I bring the data in, before checking it.
test<-read.table(“test.txt”,header=TRUE)
str(test)
We can now test the hypothesis in several ways. What I will use here is a standard analysis of variance. One could write a lot about the analysis of variance and the many precautionary steps one should take before using it, but this is a topic for another time. We will assume that there is no particular issue to apply it here. Analysis of variance, or anova, can be done in R using a command line such as:
aov(test$Par1~test$Tooth)
Which tells you a lot or pretty little, depending on how much you know about it. But it does not provide what we are interested in: a result telling us if the variance in the parameter 1 (Par1) can be explained by the fact that it comes from different teeth (“Tooth” factor). For that, we need to use the summary function
summary(aov(test$Par1~test$Tooth))
What have we got here? Similar information (degrees of freedom, sum of squares) and also the F-statistic, and the p-value. What the software does is that it compares the F-statistic calculated to that which you would expect if the factor Tooth had no influence on the data whatsoever. The p-value expresses broadly an estimate of the chances of making a mistake when considering that the factor explains the observed values. As a rule of thumb, usually a p-value below 0.05 or 0.01 is used to reject the null hypothesis (according to which the tested variable is not under the influence of the factor). Here you should have a P-value of 0.3333, henceforth, we cannot show that the tooth factor influences the data. Perhaps it does but because of bad luck our data does not show it, in that case we accept the null hypothesis since there is no support for the alternative one.
Then we can do the same test for all parameters, and also look at different factors if we want, don’t hesitate to give a try:
summary(aov(test$Par2~test$Tooth))
summary(aov(test$Par3~test$Tooth))
summary(aov(test$Par1~test$Pos1))
summary(aov(test$Par1~test$Pos2))
Typing it all is a bit tedious, though. One thing that can be done quite easily in R is to have the same commands done on similar series of data. We can do it with “for”. In fact we will use the for function to have the anovas (with the tooth factor) done for columns 5 to 12 (Par1 to Par8), just make sure that it makes sense, so “for any value of “i” in the series 5, 6, 7, 8, 9, 10, 11, 12, perform an anova testing whether the values in column number “i” can be explained by the tooth factor”. In R code this gives:
for(i in 5:12){summary(aov(test[,i]~test$Tooth))}
Surely you typed/entered it and saw nothing happen. The reason is that we asked the software to perform the analysis, not to print the results on the screen:
for(i in 5:12){print(summary(aov(test[,i]~test$Tooth)))}
All right, now we have the results but we do not know for which parameter. Luckily, we just need to have the software print that as well, before the test, i.e. we ask R to perform two tasks, just do it by adding “;” between the commands:
for(i in 5:12){print(colnames(test)[i]);print(summary(aov(test[,i]~test$Tooth)))}
Better, no? The “tooth” factor does not seem to have a significant influence on the data. Let’s see if another one has. Perhaps “Pos1” (position 1) would make sense: could areas from the posterior part of the tooth have a different microtexture than the ones from the anterior part?
for(i in 5:12){print(colnames(test)[i]);print(summary(aov(test[,i]~test$Pos1)))}
Still nothing. What about the actual position along the tooth row? We can do the test with the variable Rank, used as a poor surrogate for the “distance from the anterior part of the jaw”.
for(i in 5:12){print(colnames(test)[i]);print(summary(aov(test[,i]~test$Rank)))}
HAHA! The parameter 7 seems to vary with position along the jaw. That is a result, and as a matter of fact it can be biologically interesting, as it means that from anterior to posterior, the values we can expect to get for it are not identical, and since Par7 is expressing some aspect of the homogeneity of the texture, we can conclude that from anterior to posterior, the different areas are more (or less) heterogeneous in texture. We also know that this is not because the different teeth are worn in different ways. Well they are, but it is not which tooth is considered that explains the values of the data (before, it was assumed that each tooth had its own “microtexture signature”, i.e. that any point on the tooth was as good as another but that the teeth were different in respective signatures), rather the position of the sampled point matters. If you recall the previous entry about R, now is the time to make good use of it to look at the data. Apparently, values get higher as one gets farther from the anterior part of the jaw.
plot(Par7~Rank,data=test,pch=16)
Time to have fun! We can also have one symbol for each position (anterior, posterior, central) and one colour for each considered ring (inner, outer, central point). And do you remember the linear models and predictions as well? No reason not to make use of it. Before that, we will also separate the plot device into several ones in order to check on our progress:
par(mfrow=c(2,2))
plot(Par7~Rank,data=test,pch=c(15,16,17)[Pos1],col=c(“red”,”green”,”blue”)[Pos2],ylim=c(min(test$Par7),max(test$Par7)),main=”Par7~distance from the anterior”,xlab=”Distance from the anterior”)
Model1<-lm(Par7~Rank,data=test)
Pred1<-predict(Model1,se.fit=TRUE,interval=”confidence”)
par(new=TRUE,bty=”n”,xaxt=”n”,yaxt=”n”)
matplot(test$Rank,cbind(Pred1$fit),lty=c(1,3,3),type=”l”,col=”black”,lwd=2,ylim=c(min(test$Par7),max(test$Par7)),xlab=NA,ylab=NA)
You know what? I would rather have those three lines displayed as on line in the middle of an area, a bit transparent if at all possible. The line part is fairly simple if using what we just typed:
par(xaxt=”l”,yaxt=”l”)#this is just to have the axes drawn
plot(Par7~Rank,data=test,pch=c(15,16,17)[Pos1],col=c(“red”,”green”,”blue”)[Pos2],ylim=c(min(test$Par7),max(test$Par7)),main=”Par7~distance from the anterior”,xlab=”Distance from the anterior”,bty=”l”)
par(new=TRUE,bty=”n”,xaxt=”n”,yaxt=”n”)#and here so as not to have new axes drawn
matplot(test$Rank,Pred1$fit[,1],pch=20,type=”p”,col=rgb(0.1,0.1,0.7,0.5,1),ylim=c(min(test$Par7),max(test$Par7)),xlab=NA,ylab=NA)
The way the colour is set here uses the function rgb (for “red”,”green”,”blue”), the first numbers stand for the values of red, green and blue respectively, the fourth one is for the opacity and the fifth one sets the maximum value for the first four. i.e. here we have a colour set as 10% of the maximum red, 10% maximum green, 70% maximum blue, half-transparent.
You may notice that two points appear more opaque than others. Those are the positions for which two samples were taken, so two points are drawn one over another. We could manage to have only one point, but for the time being it is unnecessary hassle. The fun bit starts here, for we wish to display the confidence interval as an area. For that we will use the function polygon and a little trick. Polygon needs coordinates to display the shape correctly, so we can’t really just give values of X and Y for the upper and lower limits. See for yourselves:
par(xaxt=”l”,yaxt=”l”)
plot(Par7~Rank,data=test,pch=c(15,16,17)[Pos1],col=c(“red”,”green”,”blue”)[Pos2],ylim=c(min(test$Par7),max(test$Par7)),main=”Par7~distance from the anterior”,xlab=”Distance from the anterior”,bty=”l”)
polygon(test$Rank,Pred1$fit[,2])
polygon(test$Rank,Pred1$fit[,3])
And just concatenating the two series of values won’t do, it would just give crossed lines as one goes from an extremity of the X axis to the other:
polygon(rep(test$Rank,2),c(Pred1$fit[,2], Pred1$fit[,3]))
In the end, I find it more convenient to have an object containing the values in the order. Since Rank is in decreasing order in the data frame, it will need sorting:
PolX<-c(sort(test$Rank,decreasing=FALSE),test$Rank)
That’s for the X part of the polygon, for the Y part, it is going to be tricky, since we have 26 values to sort but using the sort function may end up in mismatched results. Actually, here, with the values we have, it could work, but imagine that the slope of the line were not that steep, then sorting the values in increasing or decreasing order would not do. We cannot either use the values in the Rank variable as a factor to order values from 1 to 26 with a factor going from 1 to 24. But one could use a factor made from a sequence from 26 to 1, then the polygon should be fine:
PolY1<-as.numeric(c(Pred1$fit[,2][as.factor(seq(26,1))],Pred1$fit[,3]))
par(xaxt=”l”,yaxt=”l”)
plot(Par7~Rank,data=test,pch=c(15,16,17)[Pos1],col=c(“red”,”green”,”blue”)[Pos2],ylim=c(min(test$Par7),max(test$Par7)),main=”Par7~distance from the anterior”,xlab=”Distance from the anterior”,bty=”l”)
polygon(x=PolX,y=PolY1)

And now to start colouring the polygon, without border, then we add the regression line:
par(xaxt=”l”,yaxt=”l”)
plot(Par7~Rank,data=test,pch=c(15,16,17)[Pos1],col=c(“red”,”green”,”blue”)[Pos2],ylim=c(min(test$Par7),max(test$Par7)),main=”Par7~distance from the anterior”,xlab=”Distance from the anterior”,bty=”l”)
polygon(x=PolX,y=PolY1,border=NA,col=rgb(0.2,0.2,0.6,0.2,1))
par(new=TRUE,bty=”n”,xaxt=”n”,yaxt=”n”)#and here so as not to have new axes drawn
matplot(test$Rank,Pred1$fit[,1],pch=20,type=”p”,col=rgb(0.1,0.1,0.7,0.5,1),ylim=c(min(test$Par7),max(test$Par7)),xlab=NA,ylab=NA)
For further improvement we can also add an area showing the 99% confidence interval instead of the 95% or on top of it.
Pred2<-predict(Model1,se.fit=TRUE,interval=”confidence”,level=0.99)
PolY2<-as.numeric(c(Pred2$fit[,2][as.factor(seq(26,1))],Pred2$fit[,3]))
par(xaxt=”l”,yaxt=”l”)
plot(Par7~Rank,data=test,ylim=c(min(test$Par7),max(test$Par7)),main=”Par7~distance from the anterior”,xlab=”Distance from the anterior”,bty=”l”,type=”n”)
polygon(x=PolX,y=PolY1,border=NA,col=rgb(0.2,0.2,0.6,0.2,1))
par(new=TRUE,bty=”n”,xaxt=”n”,yaxt=”n”)#and here so as not to have new axes drawn
matplot(test$Rank,Pred1$fit[,1],pch=20,type=”p”,col=rgb(0,0,0.7,0.5,1),ylim=c(min(test$Par7),max(test$Par7)),xlab=NA,ylab=NA)
polygon(x=PolX,y=PolY2,border=NA,col=rgb(0,0.6,0,0.2,1))
points(Par7~Rank,data=test,pch=c(21,22,23)[Pos1],bg=c(“red”,”green”,”blue”)[Pos2],col=”black”)

We have had our fair share of graphical fun, and looking at the graph you may have noticed something striking: the green points all seem to have lower values than the blue ones. This is something that was not tested for in the anovas from earlier (though you may have cleverly checked it for yourself). This shows how plotting the data with points and colours according to different factors can give you an idea of where the important information might be:
for(i in 5:12){print(colnames(test)[i]);print(summary(aov(test[,i]~test$Pos2)))}
As you can see, the values we get for the parameter 7 can be explained by the factor considered here. This factor distinguishes the outer, inner and central parts of the teeth. Now we know that the outside part of the tooth is not heterogeneous in the same way as the inner ring, and perhaps not the center part, though for it we have only three points, which is very low, so we will not draw conclusions for that bit. There are actually ways to highlight which categories are different one from another, but this is not our purpose here. Just know that we could explore the data so as to see if the data is under the influence of only the factor Pos2, the distance from the anterior or an interaction with other factors e.g.:
summary(aov(Par7~Pos2*Pos1,data=test)) #only Pos2 has an influence on the data
summary(aov(Par7~Pos2*Tooth,data=test)) #only Pos2 has an influence on the data
summary(aov(Par7~Pos2*Rank,data=test)) #Both Pos2 and Rank influence the data, though each in its own way (their interaction is not significant, btw this is an ancova, analysis of covariance, but more about that another time)
So there is no need to worry, what we have done so far is not useless or uninformative, there is just more to the data. Were the interaction significant, it could be justified to look at separate models, but here the two elements of importance (higher values away from the anterior extremity and higher values for the outer ring) are visible enough. We can conclude that for that one parameter there is ground to pay attention to where the data comes from along the jaw. All other parameters (examined here and many others) actually do not vary with the distance from the anterior extremity or the considered ring of the teeth. Why does the heterogeneity vary with which part of the ornamentation is sampled? I would suggest that the bundles formed of crystals of hydroxylapatite, the hard parts of the tooth enameloid or dentine, are not oriented in a similar way all over the tooth. Actually, in many fish species, they have a radiating pattern in the enameloid (just like rays around the sun on a child’s drawing), so they are oriented more parallel to the long axis of the jaw on the outer ring than on the inner one.
Just a potential explanation, associated with a pretty graph.
par(bg=”transparent”,mfrow=c(1,1),xaxt=”l”,yaxt=”l”,fg=”navajowhite”,col.axis=”navajowhite”,col.main=”navajowhite”,col.lab=”navajowhite”,col.sub=”navajowhite”)
plot(Par7~Rank,data=test,ylim=c(min(test$Par7),max(test$Par7)),main=”Par7~distance from the anterior”,xlab=”Distance from the anterior”,bty=”l”,type=”n”)
polygon(x=PolX,y=PolY1,border=NA,col=rgb(0.4,0.4,0.9,0.5,1))
par(new=TRUE,bty=”n”,xaxt=”n”,yaxt=”n”)#and here so as not to have new axes drawn
matplot(test$Rank,Pred1$fit[,1],pch=20,type=”p”,col=”slateblue”,ylim=c(min(test$Par7),max(test$Par7)),xlab=NA,ylab=NA)
polygon(x=PolX,y=PolY2,border=NA,col=rgb(0,0.8,0,0.4,1))
points(Par7~Rank,data=test,pch=c(21,22,23)[Pos1],bg=c(“mistyrose”,”violet”,”orchid4″)[Pos2])

If you're the sharing kind:
Like this:
Be the first to like this post.