## In search of a random gamma variate…

One of the most common exercises given in Statistical Computing, Simulation, or related classes is the generation of random numbers from a gamma distribution. At first this might seem straightforward, thanks to the lifesaving relation that exponential and gamma random variables share. So, it’s easy to get a gamma random variate using the fact that

$X_1,\dots,X_k \overset{\text{iid}}{\sim} \mathrm{Exp}(\lambda) \Rightarrow \sum_{i=1}^{k} X_i \sim \mathrm{Ga}(k,\lambda)$.

The code to do this is the following:

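rexp1 <- function(lambda, n) {
  # inverse transform: -log(U)/lambda ~ Exp(lambda) for U ~ Unif(0, 1)
  u <- runif(n)
  x <- -log(u)/lambda
  x  # return the sample visibly
}

rgamma1 <- function(k, lambda) {
  # sum of k independent Exp(lambda) draws; k must be a positive integer
  sum(rexp1(lambda, k))
}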

Unfortunately, this works only for the case $k\in \mathbb{N}$.
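As a quick sanity check of the sum-of-exponentials construction, the sketch below compares the sample mean and variance of many simulated variates with the theoretical values $k/\lambda$ and $k/\lambda^2$. The values of `k`, `lambda`, the seed and the number of replications are arbitrary choices for illustration, and the two generators are restated compactly so that the snippet runs on its own.

```r
# Compact restatements of the generators above, so the snippet is self-contained
rexp1 <- function(lambda, n) -log(runif(n)) / lambda
rgamma1 <- function(k, lambda) sum(rexp1(lambda, k))

set.seed(1)                # arbitrary seed, for reproducibility only
k <- 3                     # shape: must be a positive integer for this method
lambda <- 2                # rate
x <- replicate(10000, rgamma1(k, lambda))

# Sample moments next to the Ga(k, lambda) moments
c(mean = mean(x), theory = k / lambda)
c(var = var(x), theory = k / lambda^2)
```

If the two numbers in each pair are close, the generator is at least consistent with a $Ga(k,\lambda)$ distribution in its first two moments.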

Categories: statistics

## $\pi$ day!

It’s π-day today, so we are going to have a little fun with Buffon’s needle and, of course, R. A well-known approximation to the value of $\pi$ comes from the experiment that Buffon performed using a needle of length $l$. All I do next is copy the function estPi from the following file and use an ergodic sample plot… Lame, huh?

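estPi <- function(n, l = 1, t = 2) {
  # n needle drops, needle length l, line spacing t (assumes l <= t)
  m <- 0  # number of drops that cross a line
  for (i in 1:n) {
    x <- runif(1, min = 0, max = t/2)  # distance from needle centre to nearest line
    theta <- runif(1, min = 0, max = pi/2)  # acute angle between needle and lines
    if (x < l/2 * sin(theta)) {
      m <- m + 1
    }
  }
  # P(cross) = 2l/(t*pi), so pi is estimated by 2*l*n/(t*m)
  return(2*l*n/(t*m))
}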

So, an estimate would be…
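A minimal sketch of the ergodic (running) estimate mentioned above, assuming a vectorised helper called `runningPi`. That name is hypothetical and does not come from the original file; the seed, the number of throws and the plot limits are arbitrary choices too.

```r
# Hypothetical helper: Buffon's-needle estimate of pi after each throw.
# Assumes needle length l <= line spacing t.
runningPi <- function(n, l = 1, t = 2) {
  x <- runif(n, min = 0, max = t / 2)        # centre-to-nearest-line distance
  theta <- runif(n, min = 0, max = pi / 2)   # acute angle between needle and lines
  hits <- cumsum(x < l / 2 * sin(theta))     # crossings after 1, 2, ..., n throws
  2 * l * seq_len(n) / (t * hits)            # Inf until the first crossing occurs
}

set.seed(314)
est <- runningPi(10000)

# Ergodic sample plot: the running estimate against the number of throws
plot(est, type = "l", ylim = c(2.5, 4),
     xlab = "number of throws", ylab = "estimate of pi")
abline(h = pi, lty = 2)
```

The running estimate is infinite until the first crossing, which is why the plot fixes `ylim` instead of letting R choose the range.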

## A normal philosophy…

The following was sent to me by email. It originates with Youden.

                                  THE
                                NORMAL
                             LAW OF ERROR
                           STANDS OUT IN THE
                         EXPERIENCE OF MANKIND
                        AS ONE OF THE BROADEST
                      GENERALIZATIONS OF NATURAL
                     PHILOSOPHY . IT SERVES AS THE
                   GUIDING INSTRUMENT IN RESEARCHES
                IN THE PHYSICAL AND SOCIAL SCIENCES AND
               IN MEDICINE AGRICULTURE AND ENGINEERING .
         IT IS AN INDISPENSABLE TOOL FOR THE ANALYSIS AND THE
INTERPRETATION OF THE BASIC DATA OBTAINED BY OBSERVATION AND EXPERIMENT

–W.J. Youden

To me, Youden is one of the truly inspiring statisticians.

Categories: infos

## In a nls star things might be different than the lm planet…

March 10, 2010

The nls() function behaves differently from lm(), a difference that is well documented (and discussed). Specifically, you can’t just use an indexed column of a data frame as the response or a predictor in the model formula.

> nls(data[,2] ~ c + expFct(data[,4],beta), data = time.data,
+ start = start.list)
Error in parse(text = x) : unexpected end of input in "~ "


The following will work once the response and the predictor are assigned to named vectors.

> summary(nls(y ~ c + expFct(x,beta), data = time.data,start = start.list))
#
# Formula: y ~ c + expFct(x,beta)
#
# Parameters:
#        Estimate Std. Error t value Pr(>|t|)
# c     3.7850419  0.0042017  900.83  < 2e-16 ***
# beta  0.0053321  0.0003733   14.28 1.31e-12 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 0.01463 on 22 degrees of freedom
#
# Number of iterations to convergence: 1
# Achieved convergence tolerance: 7.415e-06
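
A minimal, self-contained sketch of the working pattern: copy the indexed columns into a data frame with named columns and refer to those names in the formula. Everything here is made up for illustration. The data are simulated, the form of `expFct` is an assumption (its definition is not shown above), and the starting values are arbitrary.

```r
set.seed(1)

# Assumed form of the model function; the original definition is not shown
expFct <- function(x, beta) exp(-beta * x)

# Simulated stand-in for the raw data: response in column 2, predictor in column 4
time <- seq(0, 24)
raw <- data.frame(id   = seq_along(time),
                  resp = 3.8 + expFct(time, 0.2) + rnorm(length(time), sd = 0.01),
                  grp  = "a",
                  time = time)

# Give the columns of interest names instead of indexing them inside the formula
time.data <- data.frame(y = raw[, 2], x = raw[, 4])

fit <- nls(y ~ c + expFct(x, beta), data = time.data,
           start = list(c = 3, beta = 0.1))
coef(fit)
```

Calling `summary(fit)` then prints a parameter table in the same layout as the output shown above.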

Categories: statistics

## A known problem with a twist…

The following was sent to me by email. It’s the old-fashioned gambler’s ruin problem with one more option. If you love theory then this is a good treat 😉

A gambler plays the following game. He starts with $r$ dollars, and is trying to end up with $\alpha$ dollars. At each go he chooses an integer $s$ between 1 and $\min(r,\alpha-r)$ and then tosses a fair coin. If the coin comes up heads, then he wins $s$ dollars, and if it comes up tails then he loses $s$ dollars.

The game finishes if he runs out of money (in which case he loses) or reaches $\alpha$ dollars (in which case he wins). Prove that whatever strategy the gambler adopts (that is, however he chooses each stake based on what has happened up to that point), the probability that the game finishes is 1 and the probability that the gambler wins is $r/\alpha$.
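
A simulation is no proof, but it gives a quick numerical check of the claim. The sketch below plays the game under three staking strategies and compares the observed winning proportions with $r/\alpha$. The function names (`playGame`, `timid`, `bold`, `randomStake`), the values of `r` and `alpha`, and the number of replications are all hypothetical choices made for illustration.

```r
# Play one game; `stake` maps the current fortune and the target to an
# integer stake between 1 and min(fortune, alpha - fortune).
playGame <- function(r, alpha, stake) {
  fortune <- r
  while (fortune > 0 && fortune < alpha) {
    s <- stake(fortune, alpha)
    fortune <- fortune + if (runif(1) < 0.5) s else -s   # fair coin
  }
  fortune == alpha   # TRUE if the gambler reached the target
}

# Three illustrative strategies
timid <- function(fortune, alpha) 1
bold <- function(fortune, alpha) min(fortune, alpha - fortune)
randomStake <- function(fortune, alpha) sample.int(min(fortune, alpha - fortune), 1)

set.seed(42)
r <- 3
alpha <- 10
winRate <- sapply(list(timid = timid, bold = bold, random = randomStake),
                  function(f) mean(replicate(5000, playGame(r, alpha, f))))
winRate      # each proportion should be close to r / alpha
r / alpha
```

Every simulated game terminating is consistent with the first half of the claim, and the three proportions agreeing with each other is consistent with the second.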

Categories: probability

## Thou shalt… (could do better with the title, huh?)

Are these the ten commandments? MacIntyre, P.D., serves as Moses of our industry… 😉

I. Get as large a sample as you can.
A. Large N provides for more stable measurement of variables; they are less likely to be affected by outliers.
B. Large N also provides for distributions that are more normal, or better reflect the full range of scores in the population.

II. Run as few statistical tests as you can.
A. Running several tests increases the risk of a Type 1 error.
B. Focus your results as much as possible.

III. Never report the same data twice.
A. All of the statistics you have learned are part of the same model; thus, if one test is significant (e.g., correlation), then a different statistic (e.g., regression) will be too.
B. When doing tests of means following ANOVA, especially for analysis of interactions, include each mean in only 1 test (if possible).

IV. When using multivariate tests, always get the most for the least.
A. In factor analysis, account for a high percentage of variance with as few factors as possible.
B. In multiple regression, get the highest $R^2$ with the fewest predictors.
C. In path analysis, specify as few paths as possible that account for most of the correlations.

V. Use the most reliable measures possible.
A. Always test for the reliability of scales or multi-item tests before computing a total score for the test.
B. If a variable is unreliable, its correlations with other variables are almost always lower than they should be. Thus, you underestimate the true degree of correlation but you don’t know by how much.

VI. Plan your analysis before collecting the data.
A. There are some studies whose data cannot be analyzed because the analyses were not planned in advance.
B. Control for potential problems when designing the study, not when analyzing the data.

VII. Use statistics to support the written (verbal) argument, not to substitute for it.
A. Also, do not write statistics that you don’t understand; it shows.

VIII. Never do multiple tests without controlling for Type 1 error.
A. Never do several t-tests when ANOVA is appropriate.
B. Never do several ANOVAs when MANOVA is appropriate.
C. Never do post hoc t-tests or many correlations without adjusting alpha (or at least admitting to the risk of Type 1 error when writing them up).

IX. Never try to prove the null hypothesis.
A. Do not design a study to show “no difference” between means or “no correlation” between variables.

X. Others
B. Never interpret a correlation without looking at the scatterplot
C. Look for outliers but never toss them out unless you know that the data are inaccurate
D. Don’t tug on superman’s cape, spit into the wind, offer to pet a porcupine, or walk downtown with a live duck on your head.
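
Commandments II and VIII can be made concrete with a few lines of R. The sketch below first computes the probability of at least one Type 1 error across `m` tests, under the simplifying assumption that the tests are independent, and then adjusts a vector of p-values with `p.adjust()`. The p-values are made up for illustration.

```r
alpha <- 0.05
m <- c(1, 5, 10, 20)

# P(at least one false positive) across m independent tests at level alpha
fwer <- 1 - (1 - alpha)^m
round(fwer, 3)

# Made-up p-values from four tests
p <- c(0.004, 0.012, 0.030, 0.047)

# Two standard corrections that control the familywise error rate
bonf <- p.adjust(p, method = "bonferroni")
holm <- p.adjust(p, method = "holm")
rbind(raw = p, bonferroni = bonf, holm = holm)
```

All four raw p-values fall below 0.05, yet only the first two stay below it after either correction, which is the risk that commandment VIII.C asks you to adjust for or at least admit to.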

Categories: infos