Problem Set 4
Part I
(1) Poisson versus normal approximation to binomial
Let X have a Binomial(n = 1000, p) distribution. For each p and each event A below, use MINITAB (and possibly a spreadsheet package of your choice) to compute the following quantities:
- exact probability \({{P}_{b}}\left( A \right)\) of A under the binomial model;
- probability \({{P}_{p}}\left( A \right)\) of A using the Poisson approximation, and the relative error of approximation:
\[\frac{|{{P}_{p}}\left( A \right)-{{P}_{b}}\left( A \right)|}{{{P}_{b}}\left( A \right)}\]
- probability \({{P}_{n}}\left( A \right)\) of A using the normal approximation, without and with the continuity correction of 0.5, and relative errors of approximation.
Specify parameters of the distribution used for approximation.
Organize the results of your computation in a table and briefly discuss your findings for each case.
- p = 0.01; A = [X = i] for i = 1, 2, ..., 10.
- p = 0.3; A = [250 + 10(i-1) < X ≤ 250 + 10i] for i = 1, 2, ..., 10.
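The quantities requested for the first case (p = 0.01, A = [X = i]) can also be cross-checked outside MINITAB. The following is a minimal Python sketch, not the required workflow: the helper names (`binom_pmf`, `poisson_pmf`, `normal_cdf`, `rel_err`) are hypothetical, and only the continuity-corrected normal approximation is shown. It assumes the usual matching parameters, Poisson(np) and Normal(np, np(1-p)).

```python
import math

def binom_pmf(k, n, p):
    """Exact Binomial(n, p) probability P(X = k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson(lam) probability P(X = k), computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def normal_cdf(x, mu, sigma):
    """Normal(mu, sigma^2) c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def rel_err(approx, exact):
    """Relative error |approx - exact| / exact."""
    return abs(approx - exact) / exact

n, p = 1000, 0.01
lam = n * p                                     # Poisson parameter (assumed: lambda = np)
mu, sigma = n * p, math.sqrt(n * p * (1 - p))   # normal parameters (assumed: matching moments)

rows = []
for i in range(1, 11):
    exact = binom_pmf(i, n, p)
    pois = poisson_pmf(i, lam)
    # Continuity-corrected normal approximation of P(X = i).
    norm_cc = normal_cdf(i + 0.5, mu, sigma) - normal_cdf(i - 0.5, mu, sigma)
    rows.append((i, exact, pois, rel_err(pois, exact), norm_cc, rel_err(norm_cc, exact)))

print(" i  binomial  Poisson   relerr  normal+cc relerr")
for r in rows:
    print("{:2d}  {:.6f}  {:.6f}  {:.4f}  {:.6f}  {:.4f}".format(*r))
```

The second case (p = 0.3, interval events) would follow the same pattern with sums of probabilities over each interval.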
(2) What n is large enough for the CLT to work?
The setup here is as follows:
- You have an i.i.d. sample \(X_1, X_2, \ldots, X_n\) from some distribution.
- The CLT states that for all n large enough the sample mean \({{\bar{X}}_{n}}={{n}^{-1}}\sum\limits_{i=1}^{n}{{{X}_{i}}}\) has an approximately normal distribution. Equivalently, \({{S}_{n}}=n{{\bar{X}}_{n}}\) must also be roughly normal (why?).
- This means that if we have m independent samples, each of size n, from the given distribution, the histogram of the m sample means must have the shape of a normal density.
1. Data from uniform distribution
- Use MINITAB to generate 20 columns of data, each having m = 10 rows, from a Uniform(0, 1) distribution.
- For the value of n of your choice (between 1 and 20), compute the row sums using the first n columns of your data (Calc -> Row Statistics). Store the sums in a column, say, C1.
- Obtain a histogram of your data in C1 and compare its shape with that of the histogram of a sample of size m = 10 from a normal distribution.
- Find the smallest n for which the histogram of the sums of the first n columns looks like a normal density. This is your "n large enough" for the CLT.
What to include in your answer:
- A histogram for a case when n is not "large enough".
- A histogram for the case of your least n that is "large enough".
- For each of these histograms, state what n is and what the (theoretical) expectation and variance of the distribution of \(S_n\) are.
- A couple of intelligent sentences that summarize your findings.
- Please do not submit printouts of more than two histograms or of your simulated data!
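For readers who want to preview the experiment outside MINITAB, here is a rough Python sketch of the row-sum simulation. The function names `simulate_sums` and `text_histogram`, the seed, and the choices n = 3 and m = 10000 are all assumptions made for illustration. The sketch compares the sample mean and variance of the sums with the values n/2 and n/12 that follow from a Uniform(0, 1) variable having mean 1/2 and variance 1/12.

```python
import random
import statistics

def simulate_sums(n, m, seed=0):
    """Return m draws of S_n = U_1 + ... + U_n with U_i i.i.d. Uniform(0, 1)."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) for _ in range(m)]

def text_histogram(data, bins=10, width=40):
    """Build a crude text histogram: one line per bin, bar length scaled to the tallest bin."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        idx = min(int((x - lo) / step), bins - 1)  # put the maximum in the last bin
        counts[idx] += 1
    peak = max(counts)
    return [f"{lo + b * step:7.3f} | " + "#" * round(width * c / peak)
            for b, c in enumerate(counts)]

n, m = 3, 10000  # illustrative choices, not prescribed by the assignment
sums = simulate_sums(n, m)
sample_mean = statistics.fmean(sums)
sample_var = statistics.variance(sums)

print(f"n = {n}: sample mean {sample_mean:.3f} (theory {n / 2:.3f}), "
      f"sample variance {sample_var:.3f} (theory {n / 12:.3f})")
print("\n".join(text_histogram(sums)))
```

Rerunning with different n shows how the shape of the histogram changes; judging when it "looks normal" is still left to the reader.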
2. Data from Poisson distribution
Repeat the previous exercise for \(m = 10^4\) and the \(X_i\)'s being i.i.d. Poisson(λ = 0.1) random variables. Your objective is to find "n large enough" for the normal approximation to work.
(Hint: do not generate n columns of data from the Poisson(0.1) distribution and sum up the data values to get a sample from the distribution of \(S_n\) like you did in the previous subproblem. The n you are after may exceed 20, so find the distribution of \(S_n\) (results from HW3 should be useful) for a general n and generate data from the distribution of \(S_n\) directly.)
Report the same collection of summaries as in the previous subproblem.
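The hint's idea of sampling \(S_n\) directly can be sketched in Python as follows. This is only an illustration under stated assumptions: it relies on the standard fact that a sum of n independent Poisson(λ) variables is Poisson(nλ), the sampler `poisson_sample` is a hypothetical helper that inverts the c.d.f. (adequate for moderate means only), and n = 50 is an arbitrary choice.

```python
import math
import random
import statistics

def poisson_sample(mean, rng):
    """Draw one Poisson(mean) variate by inverting the c.d.f. (moderate means only)."""
    u = rng.random()
    k = 0
    prob = math.exp(-mean)   # P(S = 0)
    cum = prob
    # Walk up the c.d.f. until it exceeds u; stop if prob underflows to zero.
    while u > cum and prob > 0.0:
        k += 1
        prob *= mean / k     # P(S = k) from P(S = k - 1)
        cum += prob
    return k

lam = 0.1
n = 50                       # illustrative n, not the answer to the exercise
m = 10**4
rng = random.Random(0)
draws = [poisson_sample(n * lam, rng) for _ in range(m)]

print("sample mean    :", statistics.fmean(draws), "(theory", n * lam, ")")
print("sample variance:", statistics.variance(draws), "(theory", n * lam, ")")
```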
3. Briefly compare and summarize your findings for the subproblems (2.1) and (2.2).
4. Suppose \(X_1, X_2, \ldots, X_n\) is an i.i.d. sample from a scale family, e.g., exponential. Does the size of the "n large enough" depend on the scale parameter? Briefly explain: if yes, then how; if no, then why? If you do not see the answer, try doing a few simulations for an exponential distribution for various values of the scale/rate parameter to gain intuition. In the end, however, your answer should not be a mere restatement of your empirical findings.
5. What is "n large enough" for the CLT to hold if the \(X_i\)'s are i.i.d. \(N(\mu, \sigma^2)\)? Justify your answer.
(3) Memoryless/lack of memory property
- Verify the "memoryless property": if X is an exponential random variable with expectation α, then
\[\Pr \left( X>s+t|X>t \right)=\Pr \left( X>s \right)\]
for all s, t > 0.
- Let \(Y=\left\lceil X \right\rceil \). (Recall the distribution of Y from HW3.) If s and t are any positive integers, is it true that \(\Pr \left( Y>s+t|Y>t \right)=\Pr \left( Y>s \right)\)? Justify your answer.
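A quick simulation can build intuition for the identity before proving it. The sketch below is an empirical check only, not a verification: the helper names, the seed, and the values α = 2, s = 1, t = 1.5 are assumptions chosen for illustration. It estimates both sides of the memoryless identity from one exponential sample.

```python
import random

def tail(sample, s):
    """Estimate Pr(X > s) from a sample."""
    return sum(x > s for x in sample) / len(sample)

def conditional_tail(sample, s, t):
    """Estimate Pr(X > s + t | X > t) using only the observations exceeding t."""
    survivors = [x for x in sample if x > t]
    return sum(x > s + t for x in survivors) / len(survivors)

alpha = 2.0                      # expectation of X (illustrative value)
rng = random.Random(1)
# random.expovariate takes the rate, i.e. 1 / expectation.
sample = [rng.expovariate(1.0 / alpha) for _ in range(100000)]

s, t = 1.0, 1.5
lhs = conditional_tail(sample, s, t)
rhs = tail(sample, s)
print(f"Pr(X > s+t | X > t) ~ {lhs:.4f}")
print(f"Pr(X > s)           ~ {rhs:.4f}")
```

Replacing each draw `x` by `math.ceil(x)` gives a sample of Y and lets the same two functions be reused for the second question.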
(4) Laplace density
Let X be an exponential random variable with rate λ = 1 and let B be a Bernoulli (p = 0.5) random variable. Assume X and B are independent. Define \(Y=X\left( 2B-1 \right)\). (In words, you are taking a standard exponential random variable and assigning a sign to it at random.)
- Find the c.d.f. of Y. (Hint: use the total probability formula
\(\Pr \left( A \right)=\Pr \left( A|B=1 \right)\Pr \left( B=1 \right)+\Pr \left( A|B=0 \right)\Pr \left( B=0 \right)\)
for an appropriately defined event.)
- Find the p.d.f. of Y.
- Let F be the collection of cdfs of random variables defined as \(\mu +\sigma Y\), for all \(\mu \in \mathbb{R}\) and σ > 0. Is F a location/scale/scale-and-location family? Justify your answer by checking the definition.
- Find the p.d.f. \({{f}_{Z}}\), expectation and variance of \(Z=\mu +\sigma Y\).
- Recall the notation for the p-th quantile/percentile:
\[\eta \left( p \right)=\min \left\{ x:\,\,F\left( x \right)\ge p \right\},\,\,p\in \left( 0,1 \right)\]
Find \(\eta \left( 0.87 \right)\) when \(\mu =3.14\) and σ = 0.77. (You can do this either by hand or using MINITAB.)
- Without using the inverse of the c.d.f. of Z, show how to use η(0.87) to find η(0.13). (Hint: use symmetry of \(f_Z\). If you are still hesitant, examine the derivation of the pdf of a \(\chi^2_1\) rv we did in class.)
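The construction of Y and Z is easy to simulate, which gives a sanity check for hand-derived quantiles. The Python sketch below is an assumption-laden illustration: `simulate_z` and `empirical_quantile` are hypothetical helpers, the sample size and seed are arbitrary, and the empirical quantile applies the definition of η(p) to the empirical c.d.f. rather than to F itself.

```python
import math
import random

def simulate_z(m, mu, sigma, seed=0):
    """Draw m values of Z = mu + sigma * X * (2B - 1)."""
    rng = random.Random(seed)
    out = []
    for _ in range(m):
        x = rng.expovariate(1.0)             # X ~ Exponential(rate 1)
        b = 1 if rng.random() < 0.5 else 0   # B ~ Bernoulli(0.5), independent of X
        out.append(mu + sigma * x * (2 * b - 1))
    return out

def empirical_quantile(data, p):
    """min{x : F_hat(x) >= p}, where F_hat is the empirical c.d.f. of data."""
    ordered = sorted(data)
    k = math.ceil(p * len(ordered))          # smallest k with k/m >= p (up to rounding)
    return ordered[max(k, 1) - 1]

mu, sigma = 3.14, 0.77
z = simulate_z(20000, mu, sigma)
q87 = empirical_quantile(z, 0.87)
q13 = empirical_quantile(z, 0.13)
print(f"empirical eta(0.87) ~ {q87:.3f}")
print(f"empirical eta(0.13) ~ {q13:.3f}")
print(f"their sum ~ {q87 + q13:.3f}, compare with 2*mu = {2 * mu:.3f}")
```

The simulated values are only approximations; the exercise still asks for an exact argument.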
(5) Experiments with distributions
Use MINITAB to explore characteristics of probability distributions. For 3 distributions (and respective parameters) of your choice, generate m = 1000 observations from each distribution (Calc -> Random Data). For each of the three samples, prepare
- a histogram;
- a Q-Q plot of the sample quantiles against theoretical normal (i.e., Gaussian) quantiles (be careful about axes; I want you to do precisely what I am asking);
- a box plot (Graph -> Boxplot);
Use the above graphs to determine whether the distribution is symmetric (if so, locate the center of symmetry), whether its tails are lighter or heavier than normal, whether the distribution is skewed (if so, in which direction?).
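To make explicit what a normal Q-Q plot pairs up, the following Python sketch computes the plotted points by hand. It is not how MINITAB does it: the function name `normal_qq_pairs` is hypothetical, the plotting positions (i - 0.5)/m are one common convention assumed here, and the exponential sample is just an example of a skewed distribution.

```python
import random
from statistics import NormalDist

def normal_qq_pairs(sample):
    """Return (theoretical standard normal quantile, sample quantile) pairs."""
    ordered = sorted(sample)
    m = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / m are an assumed convention, not MINITAB's default.
    return [(std_normal.inv_cdf((i - 0.5) / m), x)
            for i, x in enumerate(ordered, start=1)]

rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(1000)]  # m = 1000 observations
pairs = normal_qq_pairs(sample)

# First and last few points: theoretical quantile, then sample quantile.
for theo, obs in pairs[:3] + pairs[-3:]:
    print(f"{theo:8.3f}  {obs:8.3f}")
```

Each pair lists the theoretical quantile first and the sample quantile second; which one goes on which axis should follow the wording of the task.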
(6) Sum of two independent \(\chi^2\) rvs
A chi-squared random variable with \(k\in \left\{ 1,2,3,\ldots \right\}\) degrees of freedom has a pdf
\[f\left( x \right)={{c}_{k}}{{x}^{k/2-1}}{{e}^{-x/2}}\]
for x ≥ 0. The number \(c_k\) is called the normalization constant; it is determined by the piece of f(x) that involves x as \(\frac{1}{{{c}_{k}}}=\int\limits_{0}^{\infty }{{{x}^{k/2-1}}{{e}^{-x/2}}dx}\) (why?).
Let \(X_p\) and \(X_q\) be two independent chi-squared random variables with p and q degrees of freedom, \(p,q\in \left\{ 1,2,3,\ldots \right\}\).
- Use the convolution formula to find the density of \({{X}_{p+q}}={{X}_{p}}+{{X}_{q}}\). Is \({{X}_{p+q}}\) a chi-squared rv? If so, what is its degrees-of-freedom parameter?
- If p = q = 1, what is the density of \({{X}_{p+q}}\) (also determine the normalization constant)? Does it look familiar?
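As a possible starting point, and only as a sketch, the convolution formula for two independent nonnegative random variables can be written out with the chi-squared densities above. Here \(f_p\) and \(f_q\) denote the densities of \(X_p\) and \(X_q\) (notation introduced for this sketch), and the second equality uses only \(e^{-x/2}e^{-(z-x)/2}=e^{-z/2}\):
\[{{f}_{{{X}_{p+q}}}}\left( z \right)=\int\limits_{0}^{z}{{{f}_{p}}\left( x \right){{f}_{q}}\left( z-x \right)dx}={{c}_{p}}{{c}_{q}}{{e}^{-z/2}}\int\limits_{0}^{z}{{{x}^{p/2-1}}{{\left( z-x \right)}^{q/2-1}}dx},\quad z\ge 0.\]
The remaining work is to evaluate how the last integral depends on z.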
Deliverable: Word Document
