The test-choosing wizard implies that you will know whether your data have a Normal distribution and should use this knowledge to decide whether to compare means of two groups or do a Mann-Whitney U test. But you won’t and you shouldn’t.

Firstly, people who want to use the Mann-Whitney U test because of its resistance to outliers should always use it: you can't reliably know that your data are Normal enough to prefer the t-test. After all, you are doing the test because you don't even know whether the means are different, so it's unlikely you know the whole distribution accurately. And even if the data really do come from a Normal distribution, the Mann-Whitney U test performs very well: its asymptotic efficiency relative to the t-test is 3/π, about 95%.
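A rough simulation can check the claim about Normal data. This is a sketch under stated assumptions, not a reference implementation. It uses two Normal groups of 40 observations whose means differ by half a standard deviation, Welch's t statistic, and a large-sample Normal approximation to the U statistic, with 1.96 as an approximate two-sided 5% critical value for both tests. The helper names (`welch_t`, `mann_whitney_z`, `rejection_rates`) are hypothetical.

```python
import math
import random


def welch_t(xs, ys):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    n1, n2 = len(xs), len(ys)
    m1, m2 = sum(xs) / n1, sum(ys) / n2
    v1 = sum((x - m1) ** 2 for x in xs) / (n1 - 1)
    v2 = sum((y - m2) ** 2 for y in ys) / (n2 - 1)
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)


def mann_whitney_z(xs, ys):
    """Large-sample z score for the Mann-Whitney U statistic.

    U counts the pairs (x, y) with x > y. Assumes no ties, which holds
    with probability 1 for continuous data.
    """
    n1, n2 = len(xs), len(ys)
    pooled = sorted([(x, 0) for x in xs] + [(y, 1) for y in ys])
    # Rank sum of the first group (ranks start at 1).
    rank_sum_x = sum(r for r, (_, g) in enumerate(pooled, start=1) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


def rejection_rates(shift=0.5, n=40, reps=2000, crit=1.96, seed=2):
    """Estimate the power of both tests when the data really are Normal."""
    rng = random.Random(seed)
    hits_t = hits_u = 0
    for _ in range(reps):
        xs = [rng.gauss(shift, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        hits_t += abs(welch_t(xs, ys)) > crit
        hits_u += abs(mann_whitney_z(xs, ys)) > crit
    return hits_t / reps, hits_u / reps


power_t, power_u = rejection_rates()
print(f"t-test power ~ {power_t:.3f}, Mann-Whitney power ~ {power_u:.3f}")
```

The two estimated powers should come out close, with the t-test only slightly ahead.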

Secondly, even non-Normal distributions have means. These are well estimated by sample means (the Law of Large Numbers), and the sample means are approximately Normally distributed except in very small samples (the Central Limit Theorem). If you want to compare the means, you can use the t-test unless the sample size is tiny.
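A small simulation can illustrate the point for skewed data. It is a sketch under stated assumptions, not a definitive check. Both groups are drawn from an Exponential(1) distribution, so the null hypothesis of equal means is true. There are 40 observations per group, the statistic is Welch's t, and 1.96 stands in for the exact t critical value. `false_positive_rate` is a hypothetical helper name.

```python
import math
import random


def welch_t(xs, ys):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    n1, n2 = len(xs), len(ys)
    m1, m2 = sum(xs) / n1, sum(ys) / n2
    v1 = sum((x - m1) ** 2 for x in xs) / (n1 - 1)
    v2 = sum((y - m2) ** 2 for y in ys) / (n2 - 1)
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)


def false_positive_rate(n=40, reps=4000, crit=1.96, seed=1):
    """Fraction of simulated datasets in which |t| > crit when H0 is true.

    Both groups come from the same right-skewed Exponential(1)
    distribution, so the population means are equal and every
    rejection is a false positive.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        ys = [rng.expovariate(1.0) for _ in range(n)]
        rejections += abs(welch_t(xs, ys)) > crit
    return rejections / reps


print(f"false-positive rate ~ {false_positive_rate():.3f} (nominal 0.05)")
```

If the Central Limit Theorem is doing its job, the printed rate should land near 0.05 even though the raw data are far from Normal.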

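Thirdly, the Mann-Whitney U test also has assumptions, which don't go away with large samples, and strange things can happen if they are violated. For an example, Google for “Efron non-transitive dice” and think about how the U statistic works: it estimates the probability that an observation from one group exceeds an observation from the other, and that comparison need not be transitive.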
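The dice example can be worked through directly. The sketch below assumes one common labelling of Efron's four dice. It computes U/(n1·n2) between the full face lists, which equals the probability that one die rolls higher than the other (ties would count one half, though none of these pairs ever tie). `u_fraction` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# One common labelling of Efron's four non-transitive dice.
DICE = {
    "A": [4, 4, 4, 4, 0, 0],
    "B": [3, 3, 3, 3, 3, 3],
    "C": [6, 6, 2, 2, 2, 2],
    "D": [5, 5, 5, 1, 1, 1],
}


def u_fraction(xs, ys):
    """Mann-Whitney U for xs vs ys, divided by len(xs) * len(ys).

    Pairs with x > y score 1 and ties score 1/2, so applied to the full
    face lists this equals P(X > Y) + P(X = Y) / 2 for one roll of each die.
    """
    score = Fraction(0)
    for x, y in product(xs, ys):
        if x > y:
            score += 1
        elif x == y:
            score += Fraction(1, 2)
    return score / (len(xs) * len(ys))


for a, b in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]:
    print(f"P({a} beats {b}) = {u_fraction(DICE[a], DICE[b])}")

for name, faces in DICE.items():
    print(f"mean of {name} = {Fraction(sum(faces), len(faces))}")
```

Each comparison comes out as 2/3, so A beats B, B beats C, C beats D and D beats A: there is no consistent ordering. The means tell a different story: A averages 8/3 while B averages 3, so the U statistic and the difference in means can point in opposite directions.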

People “un-liked” it so much that the thread got closed:

http://stackoverflow.com/questions/2707887/why-might-someone-say-r-is-not-a-programming-language-closed