Birthday Problem Simulation Using Excel Simulations can be conducted using a variety of computer-based
Birthday Problem Simulation Using Excel
Simulations can be conducted using a variety of computer-based methodologies. This document will attempt to describe an example of the logic which could be employed to find the probability that among 25,50,75,100,500, or 1,000 randomly selected people, at least 2 people share the same birthday. For the cases where the number of random selected people or birthdays equals 20 and 35 , the steps necessary to find the target probabilities will be provided using MS Excel. Students are encouraged to either use this design or one of their own choosing to determine the remaining probabilities.
NOTE: These steps are relevant to MS Excel 2007. A very similar logic can be used using earlier versions of Excel.
-
Open an Excel file and save it to your desktop. For the purposes of this simulation, let January \(1^{\text {st }}=1\), January \(2^{\text {nd }}=2\), January \(3^{\text {rd }}=3, \ldots\) etc. Any number from 1 to 365 will represent a person's birthday on that day of the year. We will investigate the birthdays of 30 groups of people. For the first case that we will explore, each group will contain 20 randomly selected birthdays. Make sure that you label your spreadsheet appropriately.
NOTE: Your results will more accurately approach the theoretical probability if you use larger and larger groups of people. You may want to repeat this procedure for 40,50,60, etc. groups of people to see if there are differences in the results that you generate. -
Place the formula shown below within cells \(A 2, B 2, C 2, \ldots, A B 2, A C 2, A D 2\). This formula will generate a randomly selected birthday for the \(1^{\text {st }}\) person within each of the 30 groups.
\(=\operatorname{RANDBETWEEN(1,365)}\)
You should end up with Excel output which is shown on the next page.
NOTE: One thing to keep in mind as you review the following steps is that you will notice that the cell values will change up until the time that we are done. Why does this occur? Excel is on autocalculate. As the data is manipulated from step-to-step, the numbers will change. Don't be alarmed. What is important is the final output. - Highlight cells \(\mathrm{A} 2, \mathrm{~B} 2, \mathrm{C} 2, \ldots, \mathrm{AB} 2, \mathrm{AC} 2, \mathrm{AD} 2\) and then drag the formulas down to line 21 . This process will create the remaining randomly selected birthdays for the \(2^{\text {nd }}\) through \(20^{\text {th }}\) person within each of the 30 groups.
- At this point, you would like to identify all birthdays which match (duplicate) within each group.
- Highlight one column of cells (e.g., cells A2 to A21).
- Click on the Home tab.
- Click on Conditional Formatting.
- Click on Highlight Cells Rules.
- Click on Duplicate Values ...
- Click on OK
At this point, all randomly selected birthdays which match (duplicate) will be highlighted within the chosen column (e.g., cells A2 to A 21).
NOTE: If no matches or duplicates appear, then none of the cells within the chosen column will be highlighted.
Repeat this step-by-step process for each column to acquire duplicate values for each column (e.g., cells B2 to B21, cells C 2 to C21, D2 to \(D 21, \ldots, A B 2\) to A B 21, A C 2 to A C 21, A D 2 to A D 21.
On the next couple of pages, you will see how the Excel output should look after duplicates were identified for cells A2 to A21 and then how your spreadsheet should look after you complete this process for each column. Remember that the general appearance of your spreadsheet should look like this. The numbers will be different because we are randomly generating birthdays. One would expect the numbers to be different, but the overall observed final results to be similar.
Deliverable: Word Document
