Archive

Archive for the ‘statistics’ Category

Probability manipulations with Mathematica…

There comes a time when a statistician needs to do some analytic calculations. There are plenty of tools for the job, but I usually prefer Mathematica or Maple. Today, I’m gonna use Mathematica for a simple demonstration.

Let’s build this example on the  U(2 \theta _1-\theta _2\leq x\leq 2 \theta _1+\theta _2)   distribution.

pfun = PDF[UniformDistribution[{2*Subscript[θ, 1] - Subscript[θ, 2],
2*Subscript[θ, 1] + Subscript[θ, 2]}], x]

\begin{cases}  \frac{1}{2 \theta _2} & 2 \theta _1-\theta _2\leq x\leq 2 \theta _1+\theta _2 \\  0 & \text{otherwise}  \end{cases}

One of the more tedious calculations is the characteristic function (and, likewise, the moment generating function). Here it is straightforward to derive.

cfun=CharacteristicFunction[UniformDistribution[
{2*Subscript[θ, 1]-Subscript[θ, 2],2*Subscript[θ, 1]+Subscript[θ, 2]}],x]

-\frac{i \left(-e^{i x \left(2 \theta _1-\theta _2\right)}+e^{i x \left(2 \theta _1+\theta _2\right)}\right)}{2 x \theta _2} .

The Table[] command computes the raw moments of our distribution for us.

Table[Limit[D[cfun, {x, n}], x -> 0]/I^n, {n, 4}]

\left\{2 \theta _1,\frac{1}{3} \left(12 \theta _1^2+\theta _2^2\right),2 \theta _1 \left(4 \theta _1^2+\theta _2^2\right),16 \theta _1^4+8 \theta _1^2 \theta _2^2+\frac{\theta _2^4}{5}\right\} .

Calculate the sample statistics.

T=List[8.23,6.9,1.05,4.8,2.03,6.95];
{Mean[T],Variance[T]}

\{4.99333,8.46171\} .

Now we can use a simple moment-matching technique to get estimates of the parameters.

Solve[{Mean[T] - 2*Subscript[θ, 1] == 0,
  -(2*Subscript[θ, 1])^2 + 1/3 (12 Subscript[θ, 1]^2 + Subscript[θ, 2]^2) -
   Variance[T] == 0}, {Subscript[θ, 2], Subscript[θ, 1]}]

\left\{\left\{\theta _1\to 2.49667,\theta _2\to -5.03836\right\},\left\{\theta _1\to 2.49667,\theta _2\to 5.03836\right\}\right\} .

Check which root is valid for \theta _2; the support requires the lower endpoint not to exceed the upper one.

Reduce[2 Subscript[θ, 1]-Subscript[θ, 2]<=2 Subscript[θ, 1]+Subscript[θ, 2],
Subscript[θ, 2]]

\theta _1\in \text{Reals}\&\&\theta _2\geq 0 .

Then, \left\{\left\{\theta _1\to 2.49667,\theta _2\to  5.03836\right\}\right\} .
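As a quick cross-check (my addition, not in the original post), the moment equations solve in closed form, which is easy to verify in R:

T <- c(8.23, 6.90, 1.05, 4.80, 2.03, 6.95)
# For U(2*t1 - t2, 2*t1 + t2): E[X] = 2*t1 and Var[X] = (2*t2)^2 / 12 = t2^2 / 3
theta1 <- mean(T) / 2        # 2.49667
theta2 <- sqrt(3 * var(T))   # non-negative root, 5.03836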


A von Mises variate…

Inspired by an email that arrived after the previous random-generation post, the following question arose:

How to draw random variates from the Von Mises distribution?

First of all, let’s check the pdf of this probability law: it is f(x):=\frac{e^{b \cos (x-a)}}{2 \pi  I_0(b)}, for -\pi \leq x\leq \pi .

OK, I admit that Bessel functions can be a bit frightening, but there is a workaround. The solution is a Metropolis algorithm simulation: it is not necessary to know the normalizing constant, because it cancels in the computation of the acceptance ratio. The following code is adapted from James Gentle’s notes on Mathematical Statistics.

n <- 1000
x <- rep(NA, n)
a <- 1      # half-width of the uniform random-walk proposal
c <- 3      # concentration parameter b of the von Mises density
yi <- 3     # starting value of the chain
x[1] <- yi

i <- 1
while (i < n) {
	i <- i + 1
	# symmetric uniform proposal on (yi - a, yi + a)
	yip1 <- yi + a * (2 * runif(1) - 1)
	if (yip1 < pi && yip1 > -pi) {
		# the normalizing constant cancels in the ratio
		if (exp(c * (cos(yip1) - cos(yi))) > runif(1)) yi <- yip1
	}
	# record the current state (it repeats on rejection)
	x[i] <- yi
}
hist(x,probability=TRUE,fg = gray(0.7), bty="7")
lines(density(x,na.rm=TRUE),col="red",lwd=2)
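As a sanity check (my addition, not in the original code), we can overlay the exact density for location 0 and concentration 3, since base R’s besselI() gives the normalizing constant:

curve(exp(3 * cos(x)) / (2 * pi * besselI(3, 0)),
      from = -pi, to = pi, add = TRUE, col = "blue", lwd = 2)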

The distribution of rho…

There was a post here about obtaining non-standard p-values for testing the correlation coefficient. The R library SuppDists deals with this problem efficiently.

library(SuppDists)

plot(function(x)dPearson(x,N=23,rho=0.7),-1,1,ylim=c(0,10),ylab="density")
plot(function(x)dPearson(x,N=23,rho=0),-1,1,add=TRUE,col="steelblue")
plot(function(x)dPearson(x,N=23,rho=-.2),-1,1,add=TRUE,col="green")
plot(function(x)dPearson(x,N=23,rho=.9),-1,1,add=TRUE,col="red");grid()

legend("topleft", col=c("black","steelblue","red","green"),lty=1,
		legend=c("rho=0.7","rho=0","rho=-.2","rho=.9"))</pre>

This is how it looks:

[Plot: the four dPearson densities overlaid]
Now, let’s construct a table of critical values for some, more or less arbitrary, significance levels.

q=c(.025,.05,.075,.1,.15,.2)
xtabs(qPearson(p=q, N=23, rho = 0, lower.tail = FALSE, log.p = FALSE) ~ q )
# q
#     0.025      0.05     0.075       0.1      0.15       0.2
# 0.4130710 0.3514298 0.3099236 0.2773518 0.2258566 0.1842217

We can calculate p-values as usual too…

1-pPearson(.41307,N=23,rho=0)
# [1] 0.0250003
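For reference (this check is my addition), under rho = 0 the same p-value follows from the classical t transform of r:

r <- 0.41307
N <- 23
tstat <- r * sqrt((N - 2) / (1 - r^2))    # t with N - 2 df under rho = 0
pt(tstat, df = N - 2, lower.tail = FALSE)
# close to 0.025, in agreement with pPearson above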

In search of a random gamma variate…

March 16, 2010

One of the most common exercises given in Statistical Computing, Simulation, or related classes is the generation of random numbers from a gamma distribution. At first this might seem straightforward, thanks to the lifesaving relation between exponential and gamma random variables. So, it’s easy to get a gamma random variate using the fact that

X_i \overset{\text{iid}}{\sim} \text{Exp}(\lambda) \Rightarrow \sum_{i=1}^{k} X_i \sim \text{Ga}(k,\lambda).

The code to do this is the following

rexp1 <- function(lambda, n) {
  u <- runif(n)
  -log(u)/lambda   # inverse-CDF transform of a uniform
}

rgamma1 <- function(k, lambda) {
  sum(rexp1(lambda, k))
}
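A quick sanity check of the generator against the theoretical moments (my addition):

set.seed(1)
draws <- replicate(10000, rgamma1(k = 3, lambda = 2))
mean(draws)   # expect about k/lambda = 1.5
var(draws)    # expect about k/lambda^2 = 0.75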

Unfortunately, this works only for the case k\in \mathbb{N} .
Read more…

\pi day!

It’s π-day today, so we’re gonna have a little fun with Buffon’s needle and, of course, R. A well-known approximation to the value of \pi comes from the experiment Buffon performed using a needle of length l dropped on a floor ruled with lines a distance t apart. What I do next is simply copy the function estPi from the following file and use an ergodic sample plot… Lame, huh?

estPi <- function(n, l = 1, t = 2) {
  # Buffon's needle: length l, lines t apart (requires l <= t)
  m <- 0                                    # crossing counter
  for (i in 1:n) {
    x <- runif(1, min = 0, max = t/2)       # centre-to-nearest-line distance
    theta <- runif(1, min = 0, max = pi/2)  # needle angle
    if (x < l/2 * sin(theta)) {
      m <- m + 1                            # the needle crosses a line
    }
  }
  return(2*l*n/(t*m))                       # since P(cross) = 2l/(t*pi)
}
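For a quick illustration (my addition, not the estimate from the post; results vary from run to run):

set.seed(314)
estPi(100000)   # typically lands within a couple of decimals of pi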

So, an estimate would be…
Read more…

In a nls star things might be different than the lm planet…

March 10, 2010

The nls() function has a well-documented (and much-discussed) behavior that differs from lm()’s. Specifically, you can’t just pass an indexed column of a data frame as an input or output of the model.

> nls(data[,2] ~ c + expFct(data[,4],beta), data = time.data,
+ start = start.list)
Error in parse(text = x) : unexpected end of input in "~ "

The following will work, once we assign the columns to plain vectors (y and x).

> nls(y ~ c + expFct(x,beta), data = time.data,start = start.list)
#
# Formula: y ~ c + expFct(x,beta)
#
# Parameters:
#        Estimate Std. Error t value Pr(>|t|)    
# c     3.7850419  0.0042017  900.83  < 2e-16 ***
# beta  0.0053321  0.0003733   14.28 1.31e-12 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 0.01463 on 22 degrees of freedom
#
# Number of iterations to convergence: 1
# Achieved convergence tolerance: 7.415e-06
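Since the post doesn’t show expFct, time.data, or start.list, here is a hypothetical self-contained setup along the same lines (everything below is my guess, not the original data):

expFct <- function(x, beta) exp(beta * x)   # hypothetical model term
set.seed(1)
x <- seq(0, 120, by = 5)
y <- 3.785 + expFct(x, 0.0053) + rnorm(length(x), sd = 0.015)
time.data <- data.frame(x = x, y = y)
start.list <- list(c = 1, beta = 0.005)
nls(y ~ c + expFct(x, beta), data = time.data, start = start.list)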

A quickie…

February 23, 2010

If you’re interested in principal components (and you should be), take a good look at this. The linked post will take you by the hand through doing everything from scratch. If you’re not in the mood, then the following R functions will help you.

An example.

# Generates sample matrix of five discrete clusters that have
# very different mean and standard deviation values.
z1 <- rnorm(10000, mean=1, sd=1);
z2 <- rnorm(10000, mean=3, sd=3);
z3 <- rnorm(10000, mean=5, sd=5);
z4 <- rnorm(10000, mean=7, sd=7);
z5 <- rnorm(10000, mean=9, sd=9);
mydata <- matrix(c(z1, z2, z3, z4, z5), 2500, 20, byrow=T,
dimnames=list(paste("R", 1:2500, sep=""), paste("C", 1:20, sep="")))

# Performs principal component analysis after scaling the data.
# It returns a list with class "prcomp" that contains five components:
#   (1) the standard deviations (sdev) of the principal components,
#   (2) the matrix of eigenvectors (rotation),
#   (3) the principal component data (x),
#   (4) the centering (center) and
#   (5) scaling (scale) used.
pca <- prcomp(mydata, scale=T)
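A couple of usual follow-ups (my addition):

summary(pca)$importance[, 1:3]   # variance explained by the leading components
plot(pca$x[, 1:2], pch = 20)     # scores on the first two PCs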
Read more…