Tal Galili. You ask them "why this? Watch this, it precisely answers your question.
Michael Lew. In essence, the correction is n-1 rather than n-2, etc., because the n-1 correction gives results that are very close to what we need. More exact corrections are shown here: en. What if it overestimates? Dror Atariah.
Why is it that the total variance of the population would be the sum of the variance of the sample from the sample mean and the variance of the sample mean itself? How come we sum the variances?
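One way to see why the two pieces add is an exact algebraic identity: for any sample and any reference value mu, the sum of squared deviations from mu equals the sum of squared deviations from the sample mean plus n times the squared distance between the sample mean and mu. The cross term vanishes because deviations from the sample mean sum to zero. A minimal sketch with made-up numbers (the sample values, `mu`, and the function name are hypothetical):

```python
from fractions import Fraction

def split_sum_of_squares(sample, mu):
    """Return (total, within, between) for the identity
    sum((x - mu)^2) == sum((x - xbar)^2) + n * (xbar - mu)^2."""
    n = len(sample)
    xbar = Fraction(sum(sample), n)
    total = sum((x - mu) ** 2 for x in sample)
    within = sum((x - xbar) ** 2 for x in sample)
    between = n * (xbar - mu) ** 2
    return total, within, between

# Hypothetical sample and population mean, chosen only for illustration.
sample = [2, 4, 9]
mu = 6
total, within, between = split_sum_of_squares(sample, mu)
print(total, within, between)
assert total == within + between
```

Dividing by n and averaging over many samples, the `between` term becomes the variance of the sample mean, which is why the two variances are summed.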
See here for intuition and proof. I have to teach the students with the n-1 correction, so dividing by n alone is not an option. As written before me, mentioning the connection to the second moment is not an option.
Although mentioning how the mean was already estimated, thereby leaving us with less "data" for the sd - that's important. Regarding the bias of the sd - I remembered encountering it - thanks for driving that point home.
In other words, I interpreted "intuitive" in your question to mean intuitive to you. Thank you for the vote of confidence :). The loss of a degree of freedom from estimating the expectation is one that I was thinking of using in class. But combining it with some of the other answers given in this thread will be useful to me, and I hope to others in the future. You know non-mathers like us can't tell.
I did say gradually. Mooncrater. Any way to sum up the intuition, or is that not likely to be possible? I'm not sure it's really practical to use this approach with your students unless you adopt it for the entire course, though. Mark L. Stone. I am unhappy to see the downvotes and can only guess that they are responding to the last sentence, which could easily be seen as attacking the O. Richard Hansen.
Dilip Sarwate. Ben. B Student. Even though the equation is interesting, I don't get how it could be used to teach n-1 intuitively.
This shows the sleight-of-hand that has occurred: somehow, you need to justify not including such self-pairs. Because they are included in the analogous population definition of variance, this is not an obvious thing.
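The self-pair argument can be checked numerically. Averaging half the squared difference over all n² ordered pairs (self-pairs included) reproduces the divide-by-n variance, while averaging over only the n(n-1) pairs with distinct indices reproduces the n-1 version. A small sketch, assuming a made-up data set; the function name is hypothetical:

```python
from fractions import Fraction
from itertools import product
import statistics

def pairwise_variance(data, include_self_pairs):
    """Average of (x_i - x_j)^2 / 2 over ordered index pairs (i, j)."""
    n = len(data)
    pairs = [
        (i, j)
        for i, j in product(range(n), repeat=2)
        if include_self_pairs or i != j
    ]
    total = sum(Fraction((data[i] - data[j]) ** 2, 2) for i, j in pairs)
    return total / len(pairs)

data = [1, 3, 4, 8]  # hypothetical values
with_self = pairwise_variance(data, include_self_pairs=True)
without_self = pairwise_variance(data, include_self_pairs=False)
print(with_self, statistics.pvariance(data))    # divide-by-n version
print(without_self, statistics.variance(data))  # divide-by-(n-1) version
```

The self-pairs each contribute a squared difference of zero, so including them only enlarges the denominator and pulls the average down.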
Vivek. Laurent Duval. Indeed, you seem to use "sample variance" in the sense of a variance estimator, which is more confusing yet. Sahil Chaudhary. G. Grothendieck.
We see that the biased measure of variance is indeed biased. The average variance is lower than the true variance (indicated by the dashed line) for each sample size. We also see that the unbiased variance is indeed unbiased. On average, the sample variance matches the population variance. The results of using the biased measure of variance reveal several clues for understanding the solution to the bias. We see that the amount of bias is larger when the sample size is smaller.
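The pattern can be reproduced exactly on a toy example. The sketch below is not the code behind the original figure: it assumes a hypothetical four-value population and enumerates every equally likely sample of size n drawn with replacement, so the averages are exact expectations and not simulation results.

```python
from fractions import Fraction
from itertools import product

def expected_variance_estimates(population, n):
    """Exact averages of the divide-by-n and divide-by-(n-1) estimators
    over all samples of size n drawn with replacement."""
    biased_total = Fraction(0)
    unbiased_total = Fraction(0)
    count = 0
    for sample in product(population, repeat=n):
        xbar = Fraction(sum(sample), n)
        ss = sum((x - xbar) ** 2 for x in sample)
        biased_total += ss / n
        unbiased_total += ss / (n - 1)
        count += 1
    return biased_total / count, unbiased_total / count

population = [1, 2, 4, 9]  # hypothetical population
mu = Fraction(sum(population), len(population))
true_variance = sum((x - mu) ** 2 for x in population) / len(population)

for n in (2, 3, 5):
    biased, unbiased = expected_variance_estimates(population, n)
    # Ratios to the true variance: the biased one is (n - 1) / n.
    print(n, biased / true_variance, unbiased / true_variance)
```

The biased estimator averages to (n-1)/n of the true variance, so its shortfall shrinks as n grows, while the n-1 version averages to the true variance at every sample size.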
So the solution should be a function of sample size, such that the required correction will be smaller as the sample size increases. Ideally we would estimate the variance of the sample by subtracting the population mean from each value. In practice the population mean is usually unknown, so we use the sample mean instead.
This is where the bias comes in. In fact, the mean of a sample minimizes the sum of squared deviations: no other reference value gives a smaller sum for that sample. This means that the sum of squared deviations from the sample mean is always smaller than the sum of squared deviations from the population mean. The only exception is when the sample mean happens to equal the population mean.
Below are two graphs. In each graph I show 10 data points that represent our population. I also highlight two data points from this population, which represent our sample. In the left graph I show the deviations from the sample mean and in the right graph the deviations from the population mean. We see that in the left graph the sum of squared deviations is much smaller than in the right graph. The sum is smaller when using the sample mean compared to using the population mean.
This is true for any sample you draw from the population (again, except when the sample mean happens to be the same as the population mean). The difference is small now, but using the sample mean still results in a smaller sum compared to using the population mean.
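The claim can be checked by brute force. The sketch below assumes a hypothetical 10-point population (not the data in the original graphs) and goes through every possible 2-point sample, comparing the sum of squared deviations from the sample mean with the sum from the population mean.

```python
from fractions import Fraction
from itertools import combinations

def sum_sq(values, center):
    """Sum of squared deviations of values from a given center."""
    return sum((v - center) ** 2 for v in values)

# Hypothetical population of 10 points; its mean is 10.
population = [1, 3, 4, 6, 9, 11, 12, 15, 17, 22]
pop_mean = Fraction(sum(population), len(population))

smaller = equal = 0
for sample in combinations(population, 2):
    sample_mean = Fraction(sum(sample), len(sample))
    from_sample_mean = sum_sq(sample, sample_mean)
    from_pop_mean = sum_sq(sample, pop_mean)
    assert from_sample_mean <= from_pop_mean  # never larger
    if from_sample_mean == from_pop_mean:
        # Ties occur only when the two means coincide.
        assert sample_mean == pop_mean
        equal += 1
    else:
        smaller += 1

print(smaller, equal)
```

Every sample passes the inequality, and the only ties are samples whose mean equals the population mean.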
In short, the bias comes from using the sample mean instead of the population mean. The sample mean is always guaranteed to be in the middle of the observed data, thereby reducing the variance and creating an underestimate. Now that we know that the bias is caused by using the sample mean, we can figure out how to solve the problem.
Looking at the previous graphs, we see that if the sample mean is far from the population mean, the sample variance is smaller and the bias is large.
The resulting SD is the SD of those particular values. It makes no sense to compute the SD this way if you want to estimate the SD of the population from which those points were drawn. It only makes sense to use n in the denominator when there is no sampling from a population and there is no desire to make general conclusions.
The goal of science is always to generalize, so the equation with n in the denominator should not be used. The only example I can think of where it might make sense is in quantifying the variation among exam scores.
But much better would be to show a scatterplot of every score, or a frequency distribution histogram.
How to calculate the standard deviation 1. Compute the square of the difference between each value and the sample mean.
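Only the first step of the recipe appears here. As a hedged sketch, the function below follows the standard textbook definition (squared differences from the sample mean, summed, divided by n-1 or n, then square-rooted) and compares the result with Python's `statistics.stdev` and `statistics.pstdev`. The function name, its parameter, and the exam scores are made up for illustration.

```python
import math
import statistics

def standard_deviation(values, use_n_minus_1=True):
    """SD from squared differences to the sample mean.

    use_n_minus_1=True estimates the SD of the population the values
    were sampled from; False describes only these particular values.
    """
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(v - mean) ** 2 for v in values]
    denominator = n - 1 if use_n_minus_1 else n
    return math.sqrt(sum(squared_diffs) / denominator)

scores = [62, 70, 75, 81, 92]  # hypothetical exam scores
print(standard_deviation(scores), statistics.stdev(scores))
print(standard_deviation(scores, use_n_minus_1=False), statistics.pstdev(scores))
```

The n-1 result is always the larger of the two, and the gap narrows as the number of values grows.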