We Moved @ statsravingmad.com/blog

## In search of a random gamma variate…

One of the most common exercises given in Statistical Computing, Simulation, or related classes is the generation of random numbers from a gamma distribution. At first this might seem straightforward thanks to the lifesaving relation that exponential and gamma random variables share. So, it’s easy to get a gamma random variate using the fact that

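$$X_1, \dots, X_k \overset{\text{iid}}{\sim} \mathrm{Exp}(\lambda) \;\Longrightarrow\; \sum_{i=1}^{k} X_i \sim \mathrm{Gamma}(k, \lambda).$$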

The code to do this is the following:

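```
# Exponential variates by inversion: -log(U)/lambda ~ Exp(lambda) for U ~ Uniform(0, 1)
rexp1 <- function(lambda, n) {
  u <- runif(n)
  -log(u) / lambda
}

# One Gamma(k, lambda) variate as the sum of k independent Exp(lambda) variates
rgamma1 <- function(k, lambda) {
  sum(rexp1(lambda, k))
}
```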
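As a quick sanity check on the sum-of-exponentials idea, one can compare the sample mean and variance of many draws against the theoretical values k/λ and k/λ². This is only a sketch: the seed, the choice k = 3, λ = 2, and the number of draws are arbitrary assumptions, and the two generator functions are restated so that the block runs on its own.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Restated from the post so the block is self-contained
rexp1 <- function(lambda, n) -log(runif(n)) / lambda
rgamma1 <- function(k, lambda) sum(rexp1(lambda, k))

k <- 3       # assumed shape (must be a positive integer for this method)
lambda <- 2  # assumed rate

# Each call to rgamma1() yields a single variate, so replicate it
draws <- replicate(10000, rgamma1(k, lambda))

# Compare empirical moments with the Gamma(k, lambda) theory
c(sample_mean = mean(draws), theory_mean = k / lambda)
c(sample_var  = var(draws),  theory_var  = k / lambda^2)
```

The sample moments should land close to 1.5 and 0.75 respectively, up to Monte Carlo noise.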

Unfortunately, this works only for the case where $k$ is a positive integer.

Read more…

## π day!

It’s π day today, so we’re going to have a little fun with Buffon’s needle and, of course, R. A well-known approximation to the value of π comes from the experiment that Buffon performed using a needle of length $l$. All I do next is copy the function estPi from the following file and use an ergodic sample plot… Lame, huh?

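```
estPi <- function(n, l = 1, t = 2) {
  m <- 0                                      # needles that cross a line
  for (i in 1:n) {
    x <- runif(1, min = 0, max = t / 2)       # centre-to-nearest-line distance
    theta <- runif(1, min = 0, max = pi / 2)  # acute angle between needle and lines
    if (x < l / 2 * sin(theta)) {
      m <- m + 1
    }
  }
  return(2 * l * n / (t * m))                 # from P(cross) = 2l / (t * pi), l <= t
}
```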
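The running estimate behind an ergodic sample plot can be sketched in vectorized form. This is not the original plotting code, which is not shown here; the seed, the number of throws, and the variable names `hits` and `running` are assumptions made for illustration.

```r
set.seed(314)  # arbitrary seed, only for reproducibility

n <- 100000    # assumed number of needle throws
l <- 1         # needle length
t <- 2         # spacing between the parallel lines (l <= t)

x     <- runif(n, min = 0, max = t / 2)   # centre-to-nearest-line distances
theta <- runif(n, min = 0, max = pi / 2)  # acute angles between needle and lines
hits  <- x < (l / 2) * sin(theta)         # TRUE when a needle crosses a line

# Estimate of pi after each throw; Inf until the first crossing occurs
running <- 2 * l * seq_len(n) / (t * cumsum(hits))

tail(running, 1)  # estimate after all n throws

# Draw the ergodic plot only in an interactive session
if (interactive()) {
  plot(running, type = "l", log = "x", ylim = c(2.5, 4),
       xlab = "number of throws", ylab = "estimate of pi")
  abline(h = pi, lty = 2)
}
```

The curve wanders widely over the first few hundred throws and then settles slowly around π, which is exactly what such a plot is meant to show.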

So, an estimate would be…

Read more…

## A normal philosophy…

The following was sent to me by email. It is attributed to Youden.

THE NORMAL LAW OF ERROR STANDS OUT IN THE EXPERIENCE OF MANKIND AS ONE OF THE BROADEST GENERALIZATIONS OF NATURAL PHILOSOPHY. IT SERVES AS THE GUIDING INSTRUMENT IN RESEARCHES IN THE PHYSICAL AND SOCIAL SCIENCES AND IN MEDICINE AGRICULTURE AND ENGINEERING. IT IS AN INDISPENSABLE TOOL FOR THE ANALYSIS AND THE INTERPRETATION OF THE BASIC DATA OBTAINED BY OBSERVATION AND EXPERIMENT

*–W.J. Youden*

To me, Youden is one of the truly inspiring statisticians.

## In a nls star things might be different than the lm planet…

The nls() function behaves differently from lm() in a well-documented (and much-discussed) way. Specifically, you can’t just put an indexed column of a data frame, such as data[,2], into the formula as an input or output of the model.

```
> nls(data[,2] ~ c + expFct(data[,4],beta), data = time.data,
+ start = start.list)
Error in parse(text = x) : unexpected end of input in "~ "
```

The following works once we assign the columns to named vectors.

```
> nls(y ~ c + expFct(x,beta), data = time.data,start = start.list)
#
# Formula: y ~ c + expFct(x,beta)
#
# Parameters:
# Estimate Std. Error t value Pr(>|t|)
# c 3.7850419 0.0042017 900.83 < 2e-16 ***
# beta 0.0053321 0.0003733 14.28 1.31e-12 ***
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 0.01463 on 22 degrees of freedom
#
# Number of iterations to convergence: 1
# Achieved convergence tolerance: 7.415e-06
```
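A self-contained sketch of the same workaround on made-up data may help. Everything here is an assumption for illustration: the data frame `toy`, its columns, the true parameter values, and in particular the definition of `expFct()`, which the post does not show.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical stand-in: the post does not show how expFct() is defined
expFct <- function(x, beta) exp(-beta * x)

# Made-up data frame; column 2 is the response, column 4 the predictor
toy <- data.frame(id = 1:24, resp = NA_real_, grp = "a", time = 1:24)
toy[, 2] <- 3.8 + expFct(toy[, 4], 0.05) + rnorm(24, sd = 0.01)

# Pull the indexed columns out into named vectors first ...
y <- toy[, 2]
x <- toy[, 4]

# ... and refer to them by name in the formula
fit <- nls(y ~ c + expFct(x, beta), start = list(c = 3.5, beta = 0.03))
coef(fit)
```

With this toy setup the fitted `c` and `beta` should come out near the values used to simulate the data (3.8 and 0.05).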

## A known problem with a twist…

The following was sent to me by email. It’s the old-fashioned gambler’s ruin problem with one more option. If you love theory, then this is a good treat 😉

A gambler plays the following game. He starts with $r$ dollars, and is trying to end up with $a$ dollars. At each go he chooses an integer $s$ between 1 and $\min(r, a-r)$ and then tosses a fair coin. If the coin comes up heads, then he wins $s$ dollars, and if it comes up tails then he loses $s$ dollars.

The game finishes if he runs out of money (in which case he loses) or reaches $a$ dollars (in which case he wins). Prove that whatever strategy the gambler adopts (that is, however he chooses each stake based on what has happened up to that point), the probability that the game finishes is 1 and the probability that the gambler wins is $r/a$.
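A simulation is no proof, but it can make the claim plausible. In the sketch below the labels are assumptions: `r` is the starting fortune, `a` the target, and each stake is capped at `min(x, a - x)` for the current fortune `x`, so the gambler can neither go below zero nor overshoot. The three strategies (`timid`, `bold`, `random_stake`) are hypothetical examples. Since the coin is fair, the fortune is a martingale, which suggests a win probability equal to the starting fortune divided by the target.

```r
set.seed(2024)  # arbitrary seed, only for reproducibility

# Play one game; `stake` maps (current fortune, target) to the next bet
play <- function(r, a, stake) {
  x <- r
  while (x > 0 && x < a) {
    s <- stake(x, a)
    x <- if (runif(1) < 0.5) x + s else x - s  # fair coin
  }
  x == a  # TRUE if the gambler reached the target
}

# Three hypothetical strategies, all respecting 1 <= s <= min(x, a - x)
timid        <- function(x, a) 1
bold         <- function(x, a) min(x, a - x)
random_stake <- function(x, a) sample.int(min(x, a - x), 1)

r <- 3   # assumed starting fortune
a <- 10  # assumed target

strategies <- list(timid = timid, bold = bold, random = random_stake)
est <- sapply(strategies, function(f) mean(replicate(5000, play(r, a, f))))
est  # each entry should be close to r / a
```

All three estimated win frequencies should sit near 0.3, whatever the betting rule.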

## Thou shalt… (could do better with the title, huh?)

Are these the ten commandments? MacIntyre, P.D., serves as the Moses of our industry… 😉

I. Get as large a sample as you can.

A. Large N provides for more stable measurement of variables; they are less likely to be affected by outliers.

B. Large N also provides for distributions that are more normal, or that better reflect the full range of scores in the population.

II. Run as few statistical tests as you can.

A. Running several tests increases the risk of a Type 1 error.

B. Focus your results as much as possible.

III. Never report the same data twice.

A. All of the statistics you have learned are part of the same model; thus, if one test is significant (e.g., correlation), then a different statistic will also be (e.g., regression).

B. When doing tests of means following ANOVA, especially for analysis of interactions, include each mean in only one test (if possible).

IV. When using multivariate tests, always get the most for the least.

A. In factor analysis, account for a high percentage of variance with as few factors as possible.

B. In multiple regression, get the highest $R^2$ with the fewest predictors.

C. In path analysis, specify as few paths as possible that account for most of the correlations.

V. Use the most reliable measures possible.

A. Always test for the reliability of scales or multi-item tests before computing a total score for the test.

B. If a variable is unreliable, its correlations with other variables are almost always lower than they should be. Thus, you underestimate the true degree of correlation but you don’t know by how much.

VI. Plan your analysis before collecting the data.

A. There are some studies whose data cannot be analyzed because the analyses were not planned in advance.

B. Control for potential problems when designing the study, not when analyzing the data.

VII. Use statistics to support the written (verbal) argument, not to substitute for it.

A. Also, do not write statistics that you don’t understand; it shows.

VIII. Never do multiple tests without controlling for Type 1 error.

A. Never do several t-tests when ANOVA is appropriate.

B. Never do several ANOVAs when MANOVA is appropriate.

C. Never do post hoc t-tests or many correlations without adjusting alpha (or at least admitting to the risk of Type 1 error when writing them up).

IX. Never try to prove the null hypothesis.

A. Do not design a study to show “no difference” between means or “no correlation” between variables.

X. Others

A. Never trust a factor with fewer than three substantial loadings.

B. Never interpret a correlation without looking at the scatterplot.

C. Look for outliers, but never toss them out unless you know that the data are inaccurate.

D. Don’t tug on Superman’s cape, spit into the wind, offer to pet a porcupine, or walk downtown with a live duck on your head.
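Commandments II and VIII both rest on how quickly the chance of at least one Type 1 error grows with the number of tests. A small sketch, assuming independent tests at level 0.05 and a made-up vector of p-values, shows the inflation and two standard adjustments available through base R's `p.adjust()`:

```r
alpha <- 0.05
m <- c(1, 5, 10, 20)  # number of independent tests (assumed)

# Probability of at least one false positive among m independent tests
fwer <- 1 - (1 - alpha)^m
round(fwer, 3)
# 0.050 0.226 0.401 0.642

# Hypothetical raw p-values from four post hoc comparisons
p <- c(0.004, 0.020, 0.030, 0.045)

# Bonferroni multiplies each p-value by the number of tests
p_bonf <- p.adjust(p, method = "bonferroni")

# Holm is a step-down variant that is never less powerful than Bonferroni
p_holm <- p.adjust(p, method = "holm")

rbind(raw = p, bonferroni = p_bonf, holm = p_holm)
```

All four raw p-values fall below 0.05, yet only the smallest survives either adjustment, which is the risk the list warns about.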

## PoRtable…

Jobless as I might be, I do have some clients for data analysis. I try not to visit them in their office because then things get really slow and time-consuming. When I can’t escape this, the worst part is setting up the data and software with the client. So, I carry a USB drive with portable versions of my toolbox. Yesterday I installed R on it, and today I tested it. It worked fine!

If you want to try it, download the software platform from here. Install it (when prompted to extract/unzip) into a folder on your USB drive, e.g. /OS, and then download the portable R from here. From the portable OS menu (bottom right of your screen) select install software and browse to the portable R file. After a couple of minutes you’re done!

*Special thanks to Andrew Redd for providing us with the portable version.*