Project One
In this project, you are asked to prepare a short data analysis on life expectancy across the world for a program director at UNICEF. You will use the data set ‘demog.sav’ from UNICEF. It contains measurements at the country level on health and economic indicators in 2003. Measurements were recorded for most of the 178 countries in the survey; these countries are grouped into 7 unique regions. The data set contains the following variables that will be useful for you:
e0: Life expectancy at birth (years). It measures the number of years of life a newborn is expected to live.
GNIpc: Gross National Income per capita (in US dollars).
IMR: Infant mortality rate. It is measured as the number of infant deaths (aged one year or younger) per 1,000 live births.
Region: The region of the world (note the value labels for this variable).
Country: The name of the country or territory.
The following sections will guide you through the analysis.
Section One:
Describe to the program director how the two demographic variables, life expectancy at birth and infant mortality rate, are distributed around the world. This paragraph should include the following information:
- First check the data and report how many countries are missing data on either of these two variables.
- What are the ranges of the life expectancy and infant mortality rate around the world? Which countries have the lowest and highest values of each of the variables?
- Describe these two variables using important measures of central tendency and spread.
- Include histograms of both variables. Based on the graphs, comment on the modality and skewness of each distribution.
- Based on the shape of each histogram, explain to the program director how she should interpret the mean and median values of the two variables, and which measure is the more appropriate summary of central tendency.
- Present a scatterplot with life expectancy on the Y axis and infant mortality rate on the X axis. Comment on the relationship between these two variables.
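The numerical checks listed above (missing values, range, central tendency, spread) can be illustrated with a minimal Python sketch. The project itself expects SPSS output, so this is only an illustration of the arithmetic. The values in `e0_toy` and the helper `describe` are hypothetical and are not taken from demog.sav.

```python
from statistics import mean, median, stdev

# Hypothetical toy values (NOT from demog.sav); None marks a missing measurement.
e0_toy = [45, 52, None, 68, 71, 74, 76, None, 78, 79]


def describe(values):
    """Return the missing count, range, and basic centre/spread measures."""
    valid = [v for v in values if v is not None]
    return {
        "n_missing": len(values) - len(valid),
        "minimum": min(valid),
        "maximum": max(valid),
        "mean": mean(valid),
        "median": median(valid),
        "std_dev": stdev(valid),  # sample standard deviation (n - 1 denominator)
    }


summary = describe(e0_toy)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this toy sample the mean falls below the median, which is the pattern a few unusually low values (a left-skewed distribution) would produce. In that situation the median is usually the less distorted summary of a typical value.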
Section Two:
Describe the regional variation in life expectancy around the world to the program director. This paragraph should include the following information:
- Fill the template table below based on the SPSS output.

Table 1 Life expectancy at birth (years) around the world

Region                         Mean    N valid   N missing   Std. Deviation   Minimum   Maximum   Skewness
Sub-Saharan Africa             47.1    41        1           9.380            33        72        1.019
Middle East and North Africa   68.72   18        0           7.954            46        77        -1.764
South Asia                     61.75   8         0           8.681            43        73        -1.431
East Asia and Pacific          67.37   19        6           7.463            50        78        -0.944
Latin America and Caribbean    70.86   28        4           5.675            50        78        -1.913
Central/east Europe            71      17        0           2.646            67        74        -0.275
Developed ("West")             77.23   31        5           2.629            71        82        -0.921
Total                          64.98   162       16          12.974           33        82        -0.918

- Rank the regions based on the average life expectancy from low to high. Also point out which regions consist of countries that are most homogeneous and which regions are least homogeneous (in terms of life expectancy).
- Include any other interesting features (based on the information from the table) of the regional differences in life expectancy that you would like to point out to the director. (This is an open question. You need to make at least one comment here.)
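As a rough illustration of the ranking step, the Python sketch below sorts the regions using the means and standard deviations copied from Table 1. Treating the standard deviation as the measure of homogeneity is an assumption of this sketch. The variable names are hypothetical, and the project itself expects the comparison to be read from the SPSS output.

```python
# (region, mean life expectancy, standard deviation), copied from Table 1.
table1 = [
    ("Sub-Saharan Africa", 47.10, 9.380),
    ("Middle East and North Africa", 68.72, 7.954),
    ("South Asia", 61.75, 8.681),
    ("East Asia and Pacific", 67.37, 7.463),
    ("Latin America and Caribbean", 70.86, 5.675),
    ("Central/east Europe", 71.00, 2.646),
    ('Developed ("West")', 77.23, 2.629),
]

# Rank regions from lowest to highest mean life expectancy.
by_mean = sorted(table1, key=lambda row: row[1])

# Assumption: a smaller standard deviation means a more homogeneous region.
by_spread = sorted(table1, key=lambda row: row[2])

print("Ranked by mean life expectancy (low to high):")
for region, avg, sd in by_mean:
    print(f"  {region}: {avg}")

print("Most homogeneous:", by_spread[0][0])
print("Least homogeneous:", by_spread[-1][0])
```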
Section Three:
Note that the regional variation is partly due to the different levels of development of the member countries. Prepare a paragraph to explain to the director that the observed variability in life expectancy can be partly attributed to the level of development. In this report, you will use per capita gross national income (GNIpc) as a proxy for the level of development*. Please classify the countries into high income and low income groups based on the World Bank’s criteria: if a country’s per capita GNI is greater than or equal to $4,000 (>=$4,000), it is classified as a "high income country"; if a country’s per capita GNI is less than $4,000 (<=$3,999), it is classified as a "low income country". (*Note: in the variable "region", some countries are classified as "developed (west)". Please do not use this definition to measure the level of development.)
- Explain the classification rule (as given in the question) to the program director. Present the following information: the number of countries in each of the two income categories, and the average life expectancy and its standard error in each category.
- Present a boxplot of life expectancy by income category. Comment on the differences in the distribution of life expectancy (both central tendency and spread) between the two categories.
- Discuss what factors might contribute to the observed difference in the distribution of life expectancy between high and low income countries. Hint: Think about which factors can affect life expectancy and are related to different levels of income at the country level. You can either look at variables that are already included in this dataset, or think more broadly about the determinants of life expectancy. You do not need to use statistics to support your comments. (This is an open question. You need to make at least one reasonable point.)
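The classification rule and the standard error of the mean can be sketched in Python as follows. The $4,000 cut-off comes from the rule stated above, and the standard error is computed as the sample standard deviation divided by the square root of the group size. The `(GNIpc, e0)` pairs and the function names are hypothetical illustrations, not values from demog.sav, and the project itself expects these figures from SPSS.

```python
from math import sqrt
from statistics import mean, stdev

THRESHOLD = 4000  # US dollars, from the classification rule in the assignment


def income_group(gni_pc):
    """Classify a country by per capita GNI: >= 4000 is high, otherwise low."""
    return "high income" if gni_pc >= THRESHOLD else "low income"


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


# Hypothetical (GNIpc, e0) pairs, NOT taken from demog.sav.
toy = [(350, 48), (3999, 66), (4000, 70), (1200, 60), (25000, 79), (9000, 74)]

# Collect life expectancy values by income category.
groups = {}
for gni_pc, e0 in toy:
    groups.setdefault(income_group(gni_pc), []).append(e0)

for name, values in groups.items():
    print(f"{name}: n={len(values)}, mean={mean(values):.2f}, "
          f"SE={standard_error(values):.2f}")
```

Note that the boundary case matters: a country with a per capita GNI of exactly $4,000 falls in the high income group, while $3,999 falls in the low income group.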
Deliverable: Word Document
