Problem 1:
Researchers have been attempting to evaluate the impact of employment and training policy ever since such programs became a part of federal social policy in the early 1960s. The central problem has always been generating a credible estimate of what would have happened to those receiving the training had the program not been available.
In the mid-1980s, two different committees commissioned by the National Academy of Sciences and the Department of Labor concluded that estimates using non-experimental comparison groups did not provide a sufficiently reliable basis for making federal policy. As a result, the U.S. Department of Labor began planning in 1986 for an experimental evaluation of the training provided under the Job Training Partnership Act. The National JTPA study is the largest randomized training evaluation ever undertaken in the United States, collecting data on roughly 20,000 persons in 16 different sites around the country. Between November 1987 and September 1989, eligible persons who applied for JTPA services were screened and their baseline data was collected. During the application and assessment process, staff members explained to applicants that not all of them would be served and that a lottery would be conducted to determine who would participate in the training. It was only after assessment that staff members telephoned a random assignment clerk at the central office to determine which applicants would be eligible to receive the training. About two-thirds of the applicants were randomly assigned to receive JTPA training, and one-third were assigned to a control group and prevented from receiving JTPA training.
For this problem set, we have provided you with a subset of the original JTPA data. The data is available on the course website (lalondeExperiment.csv). It contains baseline characteristics for all applicants as well as their total earnings over the 30 months following random assignment. Although the original program served both youth and adults, we will focus on the results for adults. Below is a list of most of the variables in your dataset.
- recid - Unique person identification code
- assignmt - Assignment to training (1 = Treatment, 0 = Control)
- site - Code identifying 1 of 16 sites
- training - Enrollment in training (1 = Enrolled, 0 = Not Enrolled)
- sex - Gender (1 = Male, 0 = Female)
- age - Age in years at random assignment
- earnings - Total earnings over 30 months following random assignment
- prevearn - Earnings in year prior to random assignment
- married - 1 if married and living with spouse, 0 otherwise (some intermediate values)
- hsorged - 1 if has HS diploma or GED, 0 otherwise (some intermediate values)
- black - 1 if African American, 0 otherwise
- hispanic - 1 if Hispanic, 0 otherwise
Note: Please round all your answers to three decimal places.
- Obtain an unbiased estimate of the average effect of assignment on earnings, not adjusting for any of the other variables. Report the estimate, the standard error, and a 95 percent confidence interval. Briefly interpret the substantive and statistical significance of this estimate. What feature of the data generation process allows us to claim this is truly an "effect"?
Answer: Assignment causes earnings to increase by 1390.091 dollars, with a standard error of 409.076. The t-value is 3.398, so we reject the null of no effect at conventional levels. The 95 percent confidence interval is (588.193, 2191.990). This effect is economically significant: relative to mean previous earnings of 4273 dollars, it represents an increase of roughly 32 percent. However, we have to recognize that the outcome is cumulative earnings over a 30-month period, whereas prevearn covers only the year before assignment. Another factor one may want to consider in judging the magnitude is the cost of the program itself.
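The estimate, standard error, and interval reported here are difference-in-means quantities. As a minimal sketch of that arithmetic, the Python below uses made-up earnings (not the JTPA data) and a hypothetical helper named `diff_in_means`. It uses the unequal-variance standard error and the normal critical value 1.96, both of which are assumptions; a bivariate OLS regression gives the same point estimate, but its default standard error and t-based interval can differ slightly.

```python
import math
from statistics import mean, variance


def diff_in_means(treated, control, z=1.96):
    """Return (estimate, standard error, CI) for mean(treated) - mean(control)."""
    est = mean(treated) - mean(control)
    # Unequal-variance standard error for a difference in two sample means.
    se = math.sqrt(variance(treated) / len(treated)
                   + variance(control) / len(control))
    return est, se, (est - z * se, est + z * se)


# Made-up 30-month earnings for illustration only; NOT the JTPA data.
treated = [12000.0, 18500.0, 0.0, 22000.0, 9500.0, 15000.0]
control = [10000.0, 16000.0, 0.0, 19000.0, 8000.0]

est, se, ci = diff_in_means(treated, control)
print(round(est, 3), round(se, 3), tuple(round(x, 3) for x in ci))
```

With the real data, the two lists would hold earnings for the assignmt = 1 and assignmt = 0 rows respectively.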
We can claim this as an "effect" because assignment was randomized. Under randomization we can assume that exchangeability (or unconfoundedness) holds; in other words, no confounding variable affects both assignment and the respondents' eventual earnings. Another way to think about this is that we have very good reason to believe that the control and treatment groups look very similar to each other, and that the only systematic difference between them is that one was treated (i.e., assigned to the job training program) and the other was not.
- Run a regression of earnings on actual enrollment in the training program and report the resulting "effect" and standard error. Explain why this regression fails to give us an unbiased estimate of the average effect of enrollment in the training program on earnings, using either the econometric assumptions in the ALZ text or an exchangeability argument from the perspective of the Hernan and Robins text.
In this new regression, we obtain a coefficient on training of $\$ 2229.6$ with a standard error of $\$ 387.5$. We may not assume that regressing earnings on enrollment produces a causal estimate, because many variables could affect both the decision to enroll in the program and eventual earnings. Motivation is a good example: people who are more motivated might be more inclined both to enroll in the program and to perform better at their jobs, and therefore to earn more in wages. Because there are a host of other potential confounders (some of which, like marital status, we have data on), we cannot say that a simple regression of earnings on enrollment gives a causal estimate.
- Now estimate the effect of treatment assignment while controlling for the covariates sex, age, prevearn, married, hsorged, black, and hispanic. How different is the estimated effect from the effect you estimated in part A? (Hint: look at the standard errors/confidence intervals to get a sense of what a "significant" difference might be.) Why might we expect this result?
In this new regression, the coefficient on assignmt is $\$ 1,517$ with a standard error of $\$ 382.4$. This is very similar to the previous estimate of $\$ 1,390$; given the associated standard errors, the two estimates are virtually indistinguishable statistically. One way of showing this is to plot each coefficient with its \(95 \%\) confidence interval (this was NOT required for credit).
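The comparison can also be checked numerically from the reported estimates and standard errors alone. The sketch below assumes the normal critical value 1.96, so the endpoints differ slightly from the t-based interval reported in part A, and `ci95` is a hypothetical helper. Checking overlap is an informal device: both estimates come from the same sample, so this is not a formal test of their difference.

```python
def ci95(estimate, se, z=1.96):
    """Normal-approximation 95 percent confidence interval."""
    return (estimate - z * se, estimate + z * se)


# Point estimates and standard errors as reported in parts A and C.
unadjusted_est, unadjusted_se = 1390.091, 409.0759
adjusted_est, adjusted_se = 1517.0, 382.4

unadjusted_ci = ci95(unadjusted_est, unadjusted_se)
adjusted_ci = ci95(adjusted_est, adjusted_se)

# The intervals overlap if the larger lower bound is below the smaller upper bound.
overlap = max(unadjusted_ci[0], adjusted_ci[0]) <= min(unadjusted_ci[1], adjusted_ci[1])

# A stronger informal check: each point estimate lies inside the other's interval.
each_contains_other = (
    unadjusted_ci[0] <= adjusted_est <= unadjusted_ci[1]
    and adjusted_ci[0] <= unadjusted_est <= adjusted_ci[1]
)

print(unadjusted_ci, adjusted_ci, overlap, each_contains_other)
```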
We expect that conditioning on the pretreatment covariates won't change our estimate of the average effect of treatment because the treatment was randomized, so in expectation the exchangeability / unconfoundedness assumption holds. This means that we don't need to condition on any covariates to get unbiased estimates.
- R students only: Create a figure showing the differences between the treatment and control units on the covariates you included in the regression in part C. There are many ways to do this; for now we will have you create a type of plot that is currently popular in political science, called a balance plot. This is similar to what you did on Problem Set 7 for missing data. For each of the covariates, calculate \(\frac{\bar{x}_{\text{treated}}-\bar{x}_{\text{control}}}{s_{\text{treated}}}\), where \(\bar{x}\) is the group mean of the covariate and \(s_{\text{treated}}\) is the standard deviation of the variable for the treated units. The reason to standardize these values by the standard deviation of the variable is so that they can all be plotted on the same axis and roughly compared. Calculate a \(95 \%\) confidence interval for each standardized difference (remember to use the formula for the difference in means) and show these confidence intervals on the graph. Our figure is shown here - you don't have to reproduce it but your plot should have the same information.
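As a rough illustration of the calculation behind a balance plot, the sketch below computes the standardized difference and an approximate 95 percent interval for each covariate. It is written in Python rather than R, uses made-up covariate values, and is not the code that produced the course figure. The function name is hypothetical, and the interval treats the treated-group standard deviation as a fixed scaling constant, which is an assumption. Plotting is left out; each (estimate, interval) pair would become one point with error bars.

```python
import math
from statistics import mean, stdev


def standardized_difference(treated, control, z=1.96):
    """Return (standardized difference, CI), scaled by the treated-group SD."""
    s_treated = stdev(treated)
    diff = mean(treated) - mean(control)
    # Standard error of the raw difference in means (unequal variances).
    se_diff = math.sqrt(stdev(treated) ** 2 / len(treated)
                        + stdev(control) ** 2 / len(control))
    std_diff = diff / s_treated
    half_width = z * se_diff / s_treated
    return std_diff, (std_diff - half_width, std_diff + half_width)


# Made-up covariate values as (treated, control); NOT the JTPA data.
covariates = {
    "age": ([25, 34, 41, 29, 38], [27, 33, 40, 30, 36]),
    "prevearn": ([3000.0, 5200.0, 0.0, 4100.0, 6000.0],
                 [2800.0, 5000.0, 500.0, 4500.0, 5800.0]),
}

balance = {name: standardized_difference(t, c) for name, (t, c) in covariates.items()}
for name, (std_diff, ci) in balance.items():
    print(name, round(std_diff, 3), tuple(round(x, 3) for x in ci))
```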
The code we used to produce the plot is here:
- Regress earnings on assignment and an interaction term between assignment and sex (i.e., include only assignmt and assignmt \(\times\) sex as predictors of earnings). Obtain and report four point estimates (no need to report standard errors) for the expected average earnings for the four groups: unassigned males, unassigned females, assigned males, and assigned females in a 2 by 2 matrix. What is the average effect of assignment for males? What is the average effect of assignment for females?
Answer: The average estimated earnings for the four groups are given in Table 1 below. The average effect of assignment on earnings is a loss of $1776.211$ dollars for women and a gain of $4769.65$ dollars for men.
- Regress earnings on assignment, sex itself, and an interaction term between assignment and sex (i.e., include assignmt, sex, and assignmt \(\times\) sex as predictors of earnings). Obtain and report four point estimates (no need to report standard errors) for the expected average earnings for the four groups: unassigned males, unassigned females, assigned males, and assigned females in a 2 by 2 matrix. What is the average effect of assignment for males? What is the average effect of assignment for females? Comparing E and F, which model do you like better and why?
Answer: The average estimated earnings for the four groups are given in Table 2 below. Now the estimated effects are an average increase of $1031.275$ dollars for women and $1970.815$ dollars for men. This fully saturated model is clearly better because it does not impose the unrealistic assumption that \(\beta_{\text{sex}}=0\), i.e., that unassigned men and unassigned women have the same expected earnings. This assumption is soundly rejected by the data. The general lesson is to never leave out a lower-order term (here, sex or assignment) when an interaction term (sex \(\times\) assignment) is included. Leaving out a lower-order term almost always imposes an implausible restriction of this kind.
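One way to see what dropping the sex main effect does: with only binary regressors, OLS fitted values equal the means of the distinct regressor cells. Without a sex term, unassigned men and women share the same regressor values, so they are pooled into a single control mean. The sketch below illustrates this with made-up earnings (not the JTPA data) and hypothetical function names. It reproduces the qualitative pattern of an apparent loss for women and an inflated gain for men.

```python
from statistics import mean

# Made-up earnings keyed by (assignmt, sex), with sex = 1 for male.
# Illustration only; NOT the JTPA data.
cells = {
    (0, 0): [9000.0, 11000.0],
    (0, 1): [17000.0, 19000.0],
    (1, 0): [10500.0, 11500.0],
    (1, 1): [19000.0, 21000.0],
}


def effects_saturated(cells):
    """Effect by sex when assignmt, sex, and assignmt x sex are all included."""
    return {sex: mean(cells[(1, sex)]) - mean(cells[(0, sex)]) for sex in (0, 1)}


def effects_without_sex_term(cells):
    """Effect by sex when only assignmt and assignmt x sex are included.

    Both control cells have regressors (assignmt=0, interaction=0), so the
    intercept is the pooled mean of all unassigned units.
    """
    pooled_control = mean(cells[(0, 0)] + cells[(0, 1)])
    return {sex: mean(cells[(1, sex)]) - pooled_control for sex in (0, 1)}


saturated = effects_saturated(cells)
restricted = effects_without_sex_term(cells)
print("saturated:", saturated)
print("no sex main effect:", restricted)
```

In this toy example, women earn less than men regardless of assignment, so comparing assigned women to the pooled control mean mixes the sex gap into the estimated effect.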
Problem 2:
We will work a simplified example of a conditionally randomized experiment based on Olken's paper "Direct Democracy and Local Public Goods: Evidence from a Field Experiment in Indonesia" (American Political Science Review, 2010, 104(2)). The data is available as Olken-villagedata.csv.
Olken runs an experiment on 49 Indonesian villages through which he studies the role of two alternative democratic institutions on various outcomes. The two democratic institutions are representative-based meetings and direct election-based plebiscites. We will take the meetings to be the control group (elect=0) and the plebiscites to be the treatment group (elect=1). Treatment assignment was randomized conditional on the wave of the experiment. There were two waves: the first wave/phase (wave=0) that included villages from North Sumatra and East Java and the second wave/phase (wave=1) that included villages from Southeast Sulawesi. Within each wave, treatment assignment was unconditionally randomized.
Once assigned a decision mechanism, each village had to choose an infrastructure project to carry out. This was all carried out under the auspices of the Kecamatan Development Program (KDP). Olken then measures the characteristics of the projects that were selected, as well as various attitudinal outcomes of the villagers to determine the efficacy of the democratic institutions.
Note that this is a highly simplified version of the dataset since we have removed the multilevel structure of the data - everything is at the level of the village now. As such, we won't be directly replicating the causal effects that Olken is finding, although the intuition behind what we're showing here and his analysis is quite similar.
- Find the proportion of villages treated within each of the two waves (defined by wave).
1st wave: \(P(\bar{T} \mid W=0)=\frac{8}{29}=0.276\)
2nd wave: \(P(\bar{T} \mid W=1)=\frac{9}{20}=0.450\)
- Table 2 in Olken provides the distribution of various background characteristics across the treated and untreated units. Let's focus on asphalt, which gives the percent of village roads that are asphalt. This is a pretreatment covariate in the experiment.
- Calculate the mean of asphalt in the meeting group (elect=0) and the plebiscite group (elect=1).
Answer:
The mean of asphalt (the share of village roads that are asphalt) is $0.305$ in the meeting group and $0.206$ in the plebiscite group.
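Both the proportion treated within each wave and the mean of asphalt within each group are grouped averages of a single column. A minimal sketch follows, using a handful of made-up village rows (not the Olken data) and a hypothetical helper `group_mean`. Only the column names wave, elect, and asphalt are taken from the text.

```python
from collections import defaultdict
from statistics import mean

# Made-up village rows for illustration only; NOT the Olken data.
villages = [
    {"wave": 0, "elect": 0, "asphalt": 0.50},
    {"wave": 0, "elect": 0, "asphalt": 0.25},
    {"wave": 0, "elect": 0, "asphalt": 0.25},
    {"wave": 0, "elect": 1, "asphalt": 0.25},
    {"wave": 1, "elect": 0, "asphalt": 0.50},
    {"wave": 1, "elect": 1, "asphalt": 0.25},
]


def group_mean(rows, by, value):
    """Mean of column `value` within each level of column `by`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[value])
    return {key: mean(vals) for key, vals in sorted(groups.items())}


# The mean of a 0/1 treatment indicator is the proportion treated.
share_treated_by_wave = group_mean(villages, by="wave", value="elect")
asphalt_by_group = group_mean(villages, by="elect", value="asphalt")

print(share_treated_by_wave)
print(asphalt_by_group)
```

With the real file, `villages` would instead be read from Olken-villagedata.csv, for example with `csv.DictReader`, converting each field to a number.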
Deliverable: Word Document
