In search of a random gamma variate…

March 16, 2010

One of the most common exercises given in Statistical Computing, Simulation, or related classes is the generation of random numbers from a gamma distribution. At first this might seem straightforward, thanks to the lifesaving relation between exponential and gamma random variables. So, it’s easy to get a gamma random variate using the fact that

X_i \sim \mathrm{Exp}(\lambda) \Rightarrow \sum\limits_{i} X_i \sim \mathrm{Ga}(k, \lambda).

The code to do this is the following

rexp1 <- function(lambda, n) {
  u <- runif(n)
  -log(u)/lambda          # inverse-CDF method for Exp(lambda)
}

rgamma1 <- function(k, lambda) {
  sum(rexp1(lambda, k))   # sum of k iid Exp(lambda) draws is Ga(k, lambda)
}
Unfortunately, this works only for the case k \in \mathbb{N}, i.e. integer shape.
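As a quick sanity check (a minimal sketch, redefining the two functions so the snippet is self-contained), the simulated draws should match the theoretical gamma moments E[X] = k/\lambda and Var[X] = k/\lambda^2:

```r
# Exponential draws via the inverse-CDF method, summed to one Ga(k, lambda) draw
rexp1 <- function(lambda, n) -log(runif(n)) / lambda
rgamma1 <- function(k, lambda) sum(rexp1(lambda, k))

set.seed(1)
draws <- replicate(10000, rgamma1(k = 3, lambda = 2))
mean(draws)  # should be close to k/lambda   = 1.5
var(draws)   # should be close to k/lambda^2 = 0.75
```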


\pi day!

It’s π-day today, so we’re going to have a little fun with Buffon’s needle and, of course, R. A well-known approximation to the value of \pi comes from the experiment that Buffon performed using a needle of length l. All I do in what follows is copy the function estPi from the following file and use an ergodic sample plot… Lame, huh?

estPi <- function(n, l = 1, t = 2) {
  m <- 0
  for (i in 1:n) {
    x <- runif(1, min = 0, max = t/2)      # distance from needle centre to nearest line
    theta <- runif(1, min = 0, max = pi/2) # acute angle between needle and the lines
    if (x < l/2 * sin(theta)) {
      m <- m + 1                           # the needle crosses a line
    }
  }
  2 * l * n / (t * m)                      # Buffon's estimate of pi
}
So, an estimate would be…
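The estimate follows from the crossing probability P(\text{cross}) = 2l/(t\pi): with m crossings in n throws, \pi \approx 2ln/(tm). A self-contained vectorized sketch (the sample size is arbitrary):

```r
set.seed(314)
n <- 100000; l <- 1; t <- 2             # needle length l, line spacing t
x <- runif(n, min = 0, max = t/2)       # distance from needle centre to nearest line
theta <- runif(n, min = 0, max = pi/2)  # acute angle between needle and the lines
m <- sum(x < (l/2) * sin(theta))        # number of crossings
2 * l * n / (t * m)                     # estimate of pi
```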

A normal philosophy…

The following was sent to me by email. It originates with W.J. Youden.

                                THE
                              NORMAL
                           LAW OF ERROR
                         STANDS OUT IN THE
                       EXPERIENCE OF MANKIND
                      AS ONE OF THE BROADEST
                    GENERALIZATIONS OF NATURAL
                   PHILOSOPHY ♦ IT SERVES AS THE
                 GUIDING INSTRUMENT IN RESEARCHES
             IN THE PHYSICAL AND SOCIAL SCIENCES AND
             IN MEDICINE AGRICULTURE AND ENGINEERING ♦
          IT IS AN INDISPENSABLE TOOL FOR THE ANALYSIS AND THE
        INTERPRETATION OF THE BASIC DATA OBTAINED BY OBSERVATION
                           AND EXPERIMENT

–W.J. Youden

To me, Youden is one of the truly inspiring statisticians.

Categories: infos

In a nls star things might be different than the lm planet…

March 10, 2010

The nls() function has a well-documented (and much-discussed) behavior that differs from lm()’s. Specifically, you can’t just use an indexed column of a data frame as an input or output in the model formula.

> nls(data[,2] ~ c + expFct(data[,4],beta), data =,
+ start = start.list)
Error in parse(text = x) : unexpected end of input in "~ "

The following will work, once we assign the columns to plain vectors.

> nls(y ~ c + expFct(x,beta), data =,start = start.list)
# Formula: y ~ c + expFct(x,beta)
# Parameters:
#        Estimate Std. Error t value Pr(>|t|)    
# c     3.7850419  0.0042017  900.83  < 2e-16 ***
# beta  0.0053321  0.0003733   14.28 1.31e-12 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Residual standard error: 0.01463 on 22 degrees of freedom
# Number of iterations to convergence: 1
# Achieved convergence tolerance: 7.415e-06
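For completeness, the assignment step between the two calls looks like this (the column indices are taken from the failing call; the toy data frame is a hypothetical stand-in for the real one):

```r
# hypothetical stand-in for the post's data frame
data <- data.frame(id = 1:5, y = c(3.80, 3.85, 3.90, 4.00, 4.10),
                   grp = 1, x = c(0, 10, 20, 40, 80))
y <- data[, 2]   # response as a plain vector
x <- data[, 4]   # predictor as a plain vector
```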
Categories: statistics

A known problem with a twist…

The following was sent to me by email. It’s the old-fashioned gambler’s ruin problem with one more option. If you love theory then this is a good treat 😉

A gambler plays the following game. He starts with r dollars, and is trying to end up with \alpha dollars. At each go he chooses an integer s between 1 and min(r,\alpha-r) and then tosses a fair coin. If the coin comes up heads, then he wins s dollars, and if it comes up tails then he loses s dollars.

The game finishes if he runs out of money (in which case he loses) or reaches \alpha dollars (in which case he wins). Prove that whatever strategy the gambler adopts (that is, however he chooses each stake based on what has happened up to that point), the probability that the game finishes is 1 and the probability that the gambler wins is r/\alpha .
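Theory aside, the claim is easy to check by simulation for one admissible strategy, say the bold stake s = min(r, \alpha - r) (a minimal sketch; any stake rule in 1..min(r, \alpha - r) should give the same win probability r/\alpha):

```r
play <- function(r, alpha) {
  while (r > 0 && r < alpha) {
    s <- min(r, alpha - r)           # the stake; any value in 1..min(r, alpha - r) is allowed
    r <- r + sample(c(s, -s), 1)     # fair coin: win or lose s dollars
  }
  r == alpha                         # TRUE if the gambler reached alpha dollars
}

set.seed(42)
mean(replicate(20000, play(r = 3, alpha = 8)))  # should be close to 3/8
```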

Categories: probability

Thou shalt… (could do better with the title, huh?)

February 26, 2010

Are these the ten commandments? P.D. MacIntyre serves as the Moses of our industry… 😉

I. Get as large a sample as you can.
A. Large N provides for more stable measurement of variables; estimates are less likely to be affected by outliers.
B. Large N also provides for distributions that are more normal, or better reflect the full range of scores in the population.

II. Run as few statistical tests as you can.
A. running several tests increases the risk of a Type 1 error
B. focus your results as much as possible

III. Never report the same data twice.
A. all of the statistics you have learned are part of the same model, thus if one test is significant (e.g., correlation) then a different statistic will also be (e.g., regression).
B. when doing tests of means following ANOVA, especially for analysis of interactions, include each mean in only 1 test (if possible).

IV. When using multivariate tests, always get the most for the least.
A. in factor analysis, account for high percentage of variance with as few factors as possible.
B. in multiple regression, get the highest R2 with the fewest predictors.
C. in path analysis, specify as few paths as possible that account for most of the correlations

V. Use the most reliable measures possible.
A. always test for the reliability of scales or multi-item tests before computing a total score for the test.
B. if a variable is unreliable, its correlations with other variables are almost always lower than they should be. Thus, you underestimate the true degree of correlation but you don’t know by how much.

VI. Plan your analysis before collecting the data.
A. there are some studies whose data cannot be analyzed because the analyses were not planned in advance.
B. control for potential problems when designing the study, not when analyzing the data.

VII. Use statistics to support the written (verbal) argument, not to substitute for it.
A. also, do not write statistics that you don’t understand, it shows.

VIII. Never do multiple tests without controlling for Type 1 error.
A. never do several t-tests when ANOVA is appropriate
B. never do several ANOVAs when MANOVA is appropriate
C. never do post hoc t-tests or many correlations without adjusting alpha (or at least admitting to the risk of Type 1 error when writing them up)

IX. Never try to prove the null hypothesis.
A. do not design a study to show “no difference” between means or “no correlation” between variables.

X. Others
A. Never trust a factor with less than three substantial loadings.
B. Never interpret a correlation without looking at the scatterplot
C. Look for outliers but never toss them out unless you know that the data are inaccurate
D. Don’t tug on superman’s cape, spit into the wind, offer to pet a porcupine, or walk downtown with a live duck on your head.
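Commandment VIII can be put into practice with base R’s p.adjust; a small illustration with hypothetical raw p-values:

```r
p <- c(0.012, 0.03, 0.04, 0.21)     # hypothetical raw p-values from four tests
p.adjust(p, method = "bonferroni")  # multiply each by the number of tests (capped at 1)
p.adjust(p, method = "holm")        # less conservative step-down correction
```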

Categories: infos


February 24, 2010

Jobless as I might be, I do have some clients for data analysis. I try not to visit them at their office, because then things get really slow and time-consuming. When I can’t escape it, the worst part is setting up data and software with the client. So, I carry a USB stick with portable versions of my toolbox. Yesterday I installed R on it, and today I tested it. It worked fine!

If you want to try it, download the software platform from here. Install it (when prompted to extract/unzip) into a folder on your USB stick, e.g. /OS, and then download R portable from here. From the portable OS menu (bottom-right of your screen) select install software and browse to the R portable file. After a couple of minutes you’re done!

Special thanks to Andrew Redd for providing us with the portable version.

Categories: infos