The Use of Notation in Basic Statistics – Part II
This is a follow-up to the previous section, where the most common notation for descriptive statistics was presented. It is crucial to understand how notation is used: in Math and Statistics, notation works as a shortcut, and if you do not understand what a symbol means, you will soon be lost and REALLY not understand what is being talked about.
In the following paragraphs we continue this series, attempting to clarify the use of notation in Inferential Statistics, where the notation is more abundant and more sophisticated, so you should pay close attention to what follows.
Notation in Inferential Statistics
The following symbols and notations are commonly used when working with Inferential Statistics. They keep appearing throughout most of a Statistics class.
· \(\mu\): This is the generic symbol that represents the population mean. This is a parameter (because it is a constant that is not constructed from sample information). Sometimes \(\mu\) comes with a sub-index that indicates which variable's population mean we are talking about. For example, if we see \({{\mu }_{X}}\), that symbol refers to the population mean of the random variable \(X\). In general terms, if \(f\left( x \right)\) is the distribution (density) of the random variable \(X\), the population mean is computed with the following expression:
\[{{\mu }_{X}}=\int\limits_{-\infty }^{\infty }{x\,f\left( x \right)dx}\]
in the case of a continuous random variable, or
\[{{\mu }_{X}}=\sum\limits_{k}{{{x}_{k}}f\left( {{x}_{k}} \right)}\]
in the case of a discrete random variable, where \(f\) is the probability function of \(X\).
A couple of things to keep in mind: Although \(\mu\) is the generic symbol for the population mean, certain distributions customarily use different symbols. For example, if \(X\) is a Poisson random variable, the tradition is to use \(\lambda\) as the symbol for the population mean. The important thing to keep in mind is that this is only notation, that is, a CONVENTION.
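As a quick numerical check of the two formulas for \({{\mu }_{X}}\) above, here is a minimal Python sketch. The function names, the choice of a Poisson distribution with \(\lambda = 3\) and of a normal density with mean 2 and standard deviation 1, and the truncation points are all assumptions made for illustration. The infinite sum and the integral are only approximated.

```python
import math


def poisson_pmf_values(lam, k_max):
    """Return [f(0), f(1), ..., f(k_max)] for a Poisson(lam) distribution."""
    values = [math.exp(-lam)]          # f(0) = e^{-lam}
    for k in range(1, k_max + 1):
        # f(k) = f(k-1) * lam / k, which avoids computing large factorials
        values.append(values[-1] * lam / k)
    return values


def discrete_mean(lam, k_max=100):
    """Approximate mu = sum_k x_k f(x_k), truncating the sum at k_max."""
    pmf = poisson_pmf_values(lam, k_max)
    return sum(k * p for k, p in enumerate(pmf))


def normal_density(x, mean, sd):
    """Density f(x) of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def continuous_mean(mean, sd, steps=20000):
    """Approximate mu = integral of x f(x) dx with the midpoint rule.

    The infinite range is truncated to mean +/- 10 sd (an assumption),
    outside of which the normal density is negligible.
    """
    lower, upper = mean - 10.0 * sd, mean + 10.0 * sd
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width
        total += x * normal_density(x, mean, sd) * width
    return total


print(discrete_mean(3.0))         # close to lambda = 3
print(continuous_mean(2.0, 1.0))  # close to the mean 2
```

In both cases the recipe is the same: weight each possible value by how likely it is, then add everything up.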
· \({{\sigma }^{2}}\): This is the population variance, which for a continuous random variable with density \(f\left( x \right)\) is computed as
\[{{\sigma }^{2}}=\int\limits_{-\infty }^{\infty }{{{x}^{2}}\,f\left( x \right)dx}-{{\mu }^{2}}=\int\limits_{-\infty }^{\infty }{{{x}^{2}}\,f\left( x \right)dx}-{{\left( \int\limits_{-\infty }^{\infty }{xf\left( x \right)dx} \right)}^{2}}\]
This is a population parameter, because it is a fixed number (not a random variable) that is not constructed from sample information. As with the population mean, it is customary to add a sub-index to indicate the underlying variable. That is, \(\sigma _{X}^{2}\) represents the population variance of the random variable \(X\), whereas \(\sigma _{Y}^{2}\) represents the population variance of the random variable \(Y\).
Again, as in the previous case, this is the most common NOTATION (or shortcut, if you will) for the population variance, but there are cases where the tradition is to use something else. For example, if \(X\) has a Poisson distribution, we mentioned before that the population mean is referred to as \(\lambda\), and it turns out that when we compute the population variance, it is equal to \(\lambda\) as well. In that case we would write \(\sigma _{X}^{2}=\lambda\). So, please, please, do not confuse the notation part of \(\sigma _{X}^{2}=\lambda\) (the symbol \(\sigma _{X}^{2}\) is just the name of the population variance) with the calculation part (the value of that variance turns out to be \(\lambda\)).
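To see the calculation part in action, here is a short Python sketch that evaluates the discrete analogue of the variance formula, \({{\sigma }^{2}}=\sum\limits_{k}{x_{k}^{2}f\left( {{x}_{k}} \right)}-{{\mu }^{2}}\), for a Poisson distribution. The value \(\lambda = 4\), the function name, and the truncation of the infinite sums at 200 terms are assumptions made for illustration.

```python
import math


def poisson_mean_and_variance(lam, k_max=200):
    """Approximate the population mean and variance of Poisson(lam).

    Uses mu = sum k f(k) and sigma^2 = sum k^2 f(k) - mu^2,
    truncating both infinite sums at k_max.
    """
    p = math.exp(-lam)      # f(0); the k = 0 term adds nothing to either sum
    first_moment = 0.0      # accumulates sum k f(k)
    second_moment = 0.0     # accumulates sum k^2 f(k)
    for k in range(1, k_max + 1):
        p = p * lam / k     # f(k) obtained from f(k-1)
        first_moment += k * p
        second_moment += k * k * p
    variance = second_moment - first_moment ** 2
    return first_moment, variance


mu, sigma2 = poisson_mean_and_variance(4.0)
print(mu, sigma2)  # both are close to lambda = 4
```

Both numbers come out equal to \(\lambda\): the symbol \(\sigma _{X}^{2}\) is only the name, while \(\lambda\) is the value the calculation produces.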
· \(\sigma\): This is the population standard deviation, which is computed by taking the square root of the population variance or, equivalently, by using the formula below:
\[\sigma =\sqrt{\int\limits_{-\infty }^{\infty }{{{x}^{2}}\,f\left( x \right)dx}-{{\left( \int\limits_{-\infty }^{\infty }{xf\left( x \right)dx} \right)}^{2}}}\]
This is a parameter, because it is a fixed number that is not constructed from sample information.
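The formula for \(\sigma\) can also be checked numerically. The sketch below is one possible way to do it in Python, using a simple midpoint rule for the integrals. The exponential density with rate 0.5, the helper names, and the finite integration range are assumptions chosen for illustration; for that distribution the standard deviation is known to be 1/rate.

```python
import math


def integrate(g, lower, upper, steps=200000):
    """Midpoint-rule approximation of the integral of g over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(g(lower + (i + 0.5) * width) for i in range(steps)) * width


def population_sd(density, lower, upper):
    """sigma = sqrt( int x^2 f(x) dx - (int x f(x) dx)^2 ), on a finite range."""
    first = integrate(lambda x: x * density(x), lower, upper)
    second = integrate(lambda x: x * x * density(x), lower, upper)
    return math.sqrt(second - first ** 2)


rate = 0.5  # hypothetical rate of an exponential distribution


def exponential_density(x):
    """f(x) = rate * exp(-rate * x) for x >= 0, and 0 otherwise."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


# The density is 0 below 0 and negligible beyond 100, so the range [0, 100]
# stands in for the whole real line in this example.
sigma = population_sd(exponential_density, 0.0, 100.0)
print(sigma)  # close to 1 / rate = 2
```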
· \({{H}_{0}}\): This is the notation for the null hypothesis. In hypothesis testing, the null hypothesis is the hypothesis of no effect.
· \({{H}_{A}}\): This is the notation for the alternative hypothesis. In hypothesis testing, the alternative hypothesis is the hypothesis that is supported when the sample data would be sufficiently unlikely if the null hypothesis \({{H}_{0}}\) were true.
· \(\Theta\): This is a less commonly used symbol, and it represents the set of all possible values of the population parameter (often called the parameter space). For example, if \(X\) is a normally distributed random variable with a population variance of \({{\sigma }^{2}}=1\) and an unknown population mean \(\mu\), the set of all possible values that \(\mu\) can take is the whole real line. In other words, in that case \(\Theta =\left( -\infty ,\infty \right)\).
· \({{\Theta }_{0}}\): In the context of the above symbol, this symbol represents the possible values taken by a population parameter as stated in the null hypothesis of a hypothesis test. For example, assume that X is a normally distributed random variable, with a population variance of \({{\sigma }^{2}}=1\), and an unknown population mean, and we are interested in testing the following null and alternative hypotheses:
\[\begin{align} & {{H}_{0}}:\mu =0 \\ & {{H}_{A}}:\mu \ne 0 \\ \end{align}\]
In that case, we would have that \({{\Theta }_{0}}=\left\{ 0 \right\}\).
· \({{\Theta }_{A}}\): Along the lines of the previous symbols, this symbol represents the possible values taken by a population parameter as stated in the alternative hypothesis of a hypothesis test. For example, assume that X is a normally distributed random variable, with a population variance of \({{\sigma }^{2}}=1\), and an unknown population mean, and we are interested in testing the following null and alternative hypotheses:
\[\begin{align} & {{H}_{0}}:\mu =0 \\ & {{H}_{A}}:\mu \ne 0 \\ \end{align}\]
In that case, we would have that \({{\Theta }_{A}}=\left( -\infty ,0 \right)\cup \left( 0,\infty \right)\). Notice that, by definition, we need to have \(\Theta ={{\Theta }_{0}}\cup {{\Theta }_{A}}\).
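The sets \({{\Theta }_{0}}\) and \({{\Theta }_{A}}\) do not have to be a single point and everything else. As a hypothetical variation on the example above (it is an assumption introduced here only for illustration), consider a one-sided test for the same normal model:

\[\begin{align} & {{H}_{0}}:\mu \le 0 \\ & {{H}_{A}}:\mu >0 \\ \end{align}\]

In that case, \({{\Theta }_{0}}=\left( -\infty ,0 \right]\) and \({{\Theta }_{A}}=\left( 0,\infty \right)\), and once more \({{\Theta }_{0}}\cup {{\Theta }_{A}}=\left( -\infty ,\infty \right)=\Theta\).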
· \(\rho\): This corresponds to the population correlation between variables X and Y. In order to be more explicit about the variables involved, the notation can be written as \(\rho \left( X,Y \right)\) or even \({{\rho }_{X,Y}}\).
· \(\pi\): Although not universal, this symbol is used to represent a population proportion. Along those lines, \({{\pi }_{1}}\) represents the population proportion (for some categorical variable) in population 1, and so on. In practice, a plain \(p\) is probably the most commonly used notation for a population proportion, but I think that is a bad idea.
· \(\sim\): The "tilde" symbol is used to indicate that a certain random variable has a specified distribution. For example, if we see \(X\sim \text{Poisson}\left( \lambda \right)\), we interpret it as: "\(X\) is a random variable that has a Poisson distribution with mean \(\lambda\)".
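One way to make the statement \(X\sim \text{Poisson}\left( \lambda \right)\) concrete is to simulate values of \(X\) and check that their average lands near \(\lambda\). The sketch below uses only the Python standard library and draws the values with Knuth's multiplication method. That method, the function name, the seed, the value \(\lambda = 4\), and the sample size are all choices made here for illustration.

```python
import math
import random


def draw_poisson(lam, rng):
    """Draw one value of X ~ Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    # Keep multiplying uniform draws until the product falls to the threshold
    # or below; the number of extra draws needed is Poisson(lam) distributed.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(0)   # fixed seed so the run is repeatable
lam = 4.0                # hypothetical value of the population mean
sample = [draw_poisson(lam, rng) for _ in range(20000)]
sample_mean = sum(sample) / len(sample)
print(sample_mean)  # should land near lambda = 4
```

The simulated average is a sample quantity, so it will differ slightly from run to run (with a different seed), whereas \(\lambda\) itself is a fixed population parameter.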