Posts Tagged ‘hypothesis’

The distribution of rho…

There was a post here about obtaining non-standard p-values for testing the correlation coefficient. The R package SuppDists deals with this problem efficiently.

library(SuppDists)

plot(function(x)dPearson(x,N=23,rho=0.7),-1,1,ylim=c(0,10),ylab="density")
plot(function(x)dPearson(x,N=23,rho=0),-1,1,add=TRUE,col="steelblue")
plot(function(x)dPearson(x,N=23,rho=-.2),-1,1,add=TRUE,col="green")
plot(function(x)dPearson(x,N=23,rho=.9),-1,1,add=TRUE,col="red");grid()

# colors listed in the same order as the labels (rho=-.2 was drawn in green, rho=.9 in red)
legend("topleft", col=c("black","steelblue","green","red"),lty=1,
		legend=c("rho=0.7","rho=0","rho=-.2","rho=.9"))

This is what it looks like:

[Figure: density of the sample correlation coefficient for N=23 and rho = 0.7, 0, -0.2 and 0.9]
Now, let’s construct a table of critical values for a few significance levels, some conventional and some arbitrary.

q=c(.025,.05,.075,.1,.15,.2)
xtabs(qPearson(p=q, N=23, rho = 0, lower.tail = FALSE, log.p = FALSE) ~ q )
# q
#     0.025      0.05     0.075       0.1      0.15       0.2
# 0.4130710 0.3514298 0.3099236 0.2773518 0.2258566 0.1842217

We can calculate p-values as usual too…

1-pPearson(.41307,N=23,rho=0)
# [1] 0.0250003
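
For the special case rho = 0 these numbers can be cross-checked without SuppDists: under the null, t = r·sqrt((N-2)/(1-r^2)) follows a Student t distribution with N-2 degrees of freedom. The sketch below uses base R only. The helper names r_crit and r_pval are made up for this illustration, and it assumes that N is the number of observations.

```r
# Cross-check for rho = 0 using base R only.
# Assumption: N is the number of observations, so the t statistic has N - 2 df.
# r_crit() and r_pval() are hypothetical helper names, not part of SuppDists.

# Upper-tail critical value of r at level alpha
r_crit <- function(alpha, N) {
  tq <- qt(alpha, df = N - 2, lower.tail = FALSE)
  tq / sqrt(tq^2 + N - 2)
}

# Upper-tail p-value for an observed correlation r
r_pval <- function(r, N) {
  tstat <- r * sqrt((N - 2) / (1 - r^2))
  pt(tstat, df = N - 2, lower.tail = FALSE)
}

q <- c(.025, .05, .075, .1, .15, .2)
crit <- r_crit(q, N = 23)
print(round(crit, 4))

pval <- r_pval(.41307, N = 23)
print(pval)
```

The results should land close to the qPearson and pPearson values above. Small differences in the later decimals are possible, since the two routes compute the same quantity in different ways.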

Show me the mean(ing)…

Testing which of several samples comes from the population with the largest mean isn’t that common, yet a simple test is at hand. Under the self-explanatory title “The rank sum maximum test for the largest K population means”, the test relies on the sum of the ranks of each sample within the combined sample of size nk, where n is the common size of the k samples.

For illustration purposes the following data are used. They consist of 6 samples of 5 observations.

> data
[1]  4.17143986  1.31264787  0.12109036  0.63031601  1.56705511  0.58817076
[7]  1.98011001  1.63226118 -0.03869368  1.80964611  4.80878278  0.67015153
[13]  2.07602321  1.52952749  1.68483297  2.00147364  9.30173048  0.58331012
[19]  2.49537140  1.31229842  1.40193543  0.11906268  4.76253012  1.26550467
[25]  0.69497074 -0.27612056  5.05751484  1.96589383  2.58427547 -0.36979229

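Next we arrange the data in a convenient data frame, with one row per observation, a label for the sample it belongs to, and its rank in the combined sample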

data.mat=expand.grid(x=rep(NA,5),sample=c("1","2","3","4","5","6"))
data.mat$x=data
data.mat$Rank=rank(data.mat$x)

and we compute the rank sum of each sample

R=rep(NA,6)
for (i in 1:6)
{
R[i]=sum(subset(data.mat,data.mat$sample==i)$Rank)
}
> rank(R)
[1] 3 2 5 6 1 4
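
The same rank sums can be obtained without the explicit loop. This is only a sketch of an alternative, assuming (as above) that the 30 observations are stored sample by sample; sample_id is a name introduced here for illustration.

```r
# The 30 observations from the post: 6 samples of 5, stored sample by sample.
data <- c( 4.17143986,  1.31264787,  0.12109036,  0.63031601,  1.56705511,
           0.58817076,  1.98011001,  1.63226118, -0.03869368,  1.80964611,
           4.80878278,  0.67015153,  2.07602321,  1.52952749,  1.68483297,
           2.00147364,  9.30173048,  0.58331012,  2.49537140,  1.31229842,
           1.40193543,  0.11906268,  4.76253012,  1.26550467,  0.69497074,
          -0.27612056,  5.05751484,  1.96589383,  2.58427547, -0.36979229)

# Sample labels 1,1,1,1,1,2,2,... (hypothetical name, not in the original code)
sample_id <- rep(1:6, each = 5)

# Rank every observation in the combined sample, then sum the ranks per sample
R <- as.vector(tapply(rank(data), sample_id, sum))
print(R)

# Ordering of the rank sums; the post reports 3 2 5 6 1 4
print(rank(R))
```

A quick sanity check on any such computation: with no ties the rank sums must add up to 30*31/2 = 465.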

So we would test whether the 4th sample has the largest population mean. First we need critical values.

## Critical values 115/119/127/134 for the 10%, 5%, 1% and 0.1% levels
> max(R) > 119
[1] FALSE

Since the largest rank sum does not exceed the 5% critical value, we cannot conclude that the 4th sample has the largest population mean.
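
The comparison can be repeated at every tabulated level at once. A minimal sketch: the function name rank_sum_max_test is made up here, the critical values are the ones quoted above, and the rank sums are typed in as recomputed from the data above (rerun the rank-sum step to confirm them).

```r
# Hypothetical helper: compare the largest rank sum with each critical value.
rank_sum_max_test <- function(rank_sums, crit) {
  list(
    sample    = which.max(rank_sums),   # sample with the largest rank sum
    statistic = max(rank_sums),         # the test statistic
    reject    = max(rank_sums) > crit   # TRUE where the statistic exceeds the critical value
  )
}

# Critical values quoted in the post
crit <- c("10%" = 115, "5%" = 119, "1%" = 127, "0.1%" = 134)

# Rank sums of the six samples (assumption: recomputed from the data above)
R <- c(68, 67, 93, 94, 66, 77)

res <- rank_sum_max_test(R, crit)
print(res)
```

Returning the decision for all levels together makes it easy to see that the statistic falls short even of the most lenient (10%) critical value.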

Look it up… Gopal K. Kanji, 100 Statistical Tests, Sage Publications.
