How To Deal With the Central Limit Theorem, and How Is It Related to the Normal Distribution?
There must be a reason why the normal distribution is SO popular. I mean, if we consider that a normal distribution with mean \(\mu\) and variance \({{\sigma }^{2}}\) has the density function shown below
\[f\left( x \right)=\frac{1}{\sqrt{2\pi {{\sigma }^{2}}}}\exp \left( -\frac{{{\left( x-\mu \right)}^{2}}}{2{{\sigma }^{2}}} \right)\]
then one has to suspect that its popularity is not precisely due to the simplicity of its density function.
Manipulating the Normal Distribution
Indeed, Stats students dread having to deal with the normal distribution when it comes to its algebraic manipulation because, granted, it can be cumbersome. For example, the density function \(f\left( x \right)\) presented above is indeed a valid density, as it can be proven (although it is not elementary to do so) that
\[\int\limits_{-\infty }^{\infty }{\frac{1}{\sqrt{2\pi {{\sigma }^{2}}}}\exp \left( -\frac{{{\left( x-\mu \right)}^{2}}}{2{{\sigma }^{2}}} \right)dx}=1\]
And since \(f\left( x \right)\) is the density of a distribution with mean \(\mu\) and variance \({{\sigma }^{2}}\), we must then have that
\[\int\limits_{-\infty }^{\infty }{\frac{x}{\sqrt{2\pi {{\sigma }^{2}}}}\exp \left( -\frac{{{\left( x-\mu \right)}^{2}}}{2{{\sigma }^{2}}} \right)dx}=\mu\]
and
\[\int\limits_{-\infty }^{\infty }{\frac{{{x}^{2}}}{\sqrt{2\pi {{\sigma }^{2}}}}\exp \left( -\frac{{{\left( x-\mu \right)}^{2}}}{2{{\sigma }^{2}}} \right)dx}={{\mu }^{2}}+{{\sigma }^{2}}\]
which are not trivial to prove (especially the last one). So, yes, it is hard to deal with the normal distribution algebraically. But then, why is it so popular?
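While the proofs are beyond our scope, nothing stops us from checking these three identities numerically. Below is a minimal sketch in Python (assuming NumPy and SciPy are available; the values \(\mu = 2\) and \(\sigma = 3\) are arbitrary choices of mine for illustration):

```python
# Numerical sanity check of the three integrals above using SciPy's
# quad integrator, for one arbitrary choice of mu and sigma.
import numpy as np
from scipy.integrate import quad

mu, sigma = 2.0, 3.0

def density(x):
    """Normal density f(x) with mean mu and variance sigma**2."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

total, _ = quad(density, -np.inf, np.inf)                         # should be 1
mean, _ = quad(lambda x: x * density(x), -np.inf, np.inf)         # should be mu
second, _ = quad(lambda x: x ** 2 * density(x), -np.inf, np.inf)  # should be mu**2 + sigma**2

print(total)   # ~1.0
print(mean)    # ~2.0
print(second)  # ~13.0  (mu**2 + sigma**2 = 4 + 9)
```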
Standard Normal Distribution and Z-scores
One good reason, which is probably a strong enough reason on its own, is that via a very simple standardization process, we can reduce ANY normal distribution \(N\left( \mu ,{{\sigma }^{2}} \right)\) to the standard normal distribution, which is the normal distribution that has a mean of zero and a standard deviation of 1, or \(N\left( 0,1 \right)\). The standardization consists of reducing the original variable X to z-scores using the following expression:
\[Z=\frac{X-\mu }{\sigma }\]
Indeed, it can be proven that if X has a normal distribution with mean \(\mu\) and variance \({{\sigma }^{2}}\), \(N\left( \mu ,{{\sigma }^{2}} \right)\), then \(Z\) defined as
\[Z=\frac{X-\mu }{\sigma}\]
also has a normal distribution, but with mean 0 and standard deviation 1. This little reduction turns out to be EXTREMELY useful, because by using it we can reduce the calculation of probabilities for ANY normal distribution to the calculation of probabilities for the standard normal distribution.
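If you want to see the standardization at work, here is a minimal simulation sketch in Python (assuming NumPy is available; the values \(\mu = 10\) and \(\sigma = 2\) are arbitrary choices of mine):

```python
# A quick simulation check that Z = (X - mu) / sigma has mean ~0 and
# standard deviation ~1 when X is drawn from N(mu, sigma^2).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 10.0, 2.0

x = rng.normal(mu, sigma, size=1_000_000)  # draws from N(mu, sigma^2)
z = (x - mu) / sigma                       # standardized draws (z-scores)

print(z.mean())  # close to 0
print(z.std())   # close to 1
```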
Have you ever wondered why the backs of Stats textbooks come with normal distribution tables ONLY for the standard normal distribution? It is because all normal distributions can be reduced to the standard normal distribution via z-scores, and it would be really impractical, if not impossible, to print out tables for ALL possible normal distributions.
Example: Assume that the mean weight of children in fifth grade is 72 pounds, with a standard deviation of 8 pounds, and that the weights follow a normal distribution. Compute the probability that a randomly selected child weighs less than 75.5 pounds.
Solution: Observe that the event \(X<75.5\) can be expressed equivalently as
\[X-72<75.5-72\]
Why? Because we simply subtracted 72 from both sides of the inequality, which does not change the solutions of the inequality. By the same reasoning, I can divide both sides by 8 to get an equivalent event
\[\frac{X-72}{8}<\frac{75.5-72}{8}\]
PLEASE, DON'T GET CONFUSED HERE: All we are saying is that if X is a solution of \(X<75.5\), then X is also a solution of \(X-72<75.5-72\), and then X is also a solution of \(\frac{X-72}{8}<\frac{75.5-72}{8}\). And conversely, if X is a solution of \(\frac{X-72}{8}<\frac{75.5-72}{8}\), then X is also a solution of \(X-72<75.5-72\) and X is also a solution of \(X<75.5\). That is what we mean when we say that the events \(\left\{ X<75.5 \right\}\), \(\left\{ X-72<75.5-72 \right\}\) and \(\left\{ \frac{X-72}{8}<\frac{75.5-72}{8} \right\}\) are EQUIVALENT (that is, they define the same set of solutions).
Therefore, in this example, we need to compute the following probability, using z-scores:
\[\Pr \left( X<75.5 \right)=\Pr \left( \frac{X-72}{8}<\frac{75.5-72}{8} \right)=\Pr \left( Z<0.4375 \right)=0.6691\]
As you can see, starting with a certain normal distribution, I made the transformation to get an equivalent event that involves a z-score, and then I can use any standard normal distribution table (or Excel) to compute the final probability.
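For those who prefer software over printed tables, here is a minimal sketch of the same calculation in Python (assuming SciPy is available):

```python
# Reproducing the example with SciPy's normal CDF: norm.cdf(z) gives
# Pr(Z < z) for the standard normal distribution.
from scipy.stats import norm

mu, sigma = 72.0, 8.0
z = (75.5 - mu) / sigma  # z-score: 0.4375

print(norm.cdf(z))                          # ~0.6691, via the standard normal
print(norm.cdf(75.5, loc=mu, scale=sigma))  # same answer, without standardizing
```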
The Central Limit Theorem (CLT)
If the above was not a strong enough reason for you to LOVE the normal distribution (in spite of its cumbersome algebraic shape), I'll give you a reason you cannot resist. It turns out that there are many types of probability distributions (I mean, MANY) that can have completely different properties from the normal distribution.
But if you take repetitions of a random variable, from ANY distribution, and you compute their average, those averages will (what do you think?) dangerously resemble a normal distribution, especially when the sample size (number of repetitions) is large.
So when we take averages of a sample of values coming from ANY probability distribution and analyze the distribution of those averages, we start seeing a normal distribution (when the sample size is large). Somehow, taking averages bends the original shape of the distribution and turns it into a normal one, REGARDLESS of the underlying distribution. This fact is one of the most amazing discoveries in Statistics, with early versions due to Abraham de Moivre and Pierre-Simon Laplace.

A word of caution: the Central Limit Theorem has a formal statistical formulation, which we won't include here, but it states that the sample averages CONVERGE to a normal distribution, in a certain probability sense. Without entering into too many technicalities, that means that in most cases the sample averages have an APPROXIMATE normal distribution for a sufficiently large sample size. It is all too common for instructors to give the wrong interpretation by saying that the distribution of sample averages BECOMES a normal distribution, which is not true in general (actually, it is only true when the underlying original distribution is itself normal).
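You can watch the theorem at work with a small simulation. Below is a minimal sketch in Python (assuming NumPy is available; the exponential distribution, the sample size, and the number of repetitions are arbitrary choices of mine):

```python
# A simulation sketch of the CLT: averages of n draws from a heavily
# skewed distribution (the exponential) start to look normal.
import numpy as np

rng = np.random.default_rng(42)
n, reps = 50, 100_000

samples = rng.exponential(scale=1.0, size=(reps, n))  # mean 1, std 1, very skewed
averages = samples.mean(axis=1)                       # one average per repetition

# The CLT predicts the averages are approximately N(1, 1/50),
# i.e., mean ~1 and standard deviation ~1/sqrt(50) ~ 0.1414.
print(averages.mean())  # close to 1
print(averages.std())   # close to 0.1414
# A histogram of `averages` (e.g., with matplotlib) shows the familiar bell shape.
```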
So that is why the normal distribution is highly cherished: it has this kind of magic property that by taking averages from any distribution, you will end up with something that looks fairly normal, as long as the sample size is large enough.