Problem 1:
Washington (American Economic Review, 2008) found that having a daughter (as opposed to a son) might affect how politicians view women's issues. In particular, Washington finds that having a daughter causes a Congressional representative to vote more liberally, particularly on issues important to women. Having a daughter, in other words, directly increases one's support for feminist issues.
This question explores Washington's claims using her own data set. Download the data file washington.dta from the course website. The dataset is in Stata format: in Stata you can open it directly, but in R you will need to load it with the read.dta command from the foreign package. The data in the paper cover the 105th, 106th, 107th, and 108th Congresses, but we have restricted the data to the 105th Congress only.
The other key variables are as follows:
- ngirls and nboys: The number of female or male children;
- totchi: The total number of children;
- aauw: Legislator's voting score assigned by the American Association of University Women.
As a proxy for liberal, feminist-leaning votes, we will use, as Washington did, the voting score assigned to representatives by the American Association of University Women (aauw). For the following parts, you may assume that all representatives included in the dataset had their last child by the time the 105th Congress convened. You may also assume that the probability that a child will be male or female is one half, and that representatives are not practicing selective fertility techniques or abortions. You may also assume that there are no adopted children.
- (A) Regress a representative's aauw score on his or her number of female children, ngirls, and report your results (including standard errors and p-values) in a nicely formatted table. What is the relationship between the number of female children and a representative's AAUW score?
- (B) Suppose that you're interested, as Washington was, in the causal effect that the "treatment" of having female children has on liberal voting (as measured by AAUW scores). What assumptions would you have to make in order for your estimate in Part A to be a causal estimate of this effect? Are these assumptions satisfied? Why or why not?
- (C) Regress a representative's aauw score on his or her number of female children (ngirls) and his or her total number of children (totchi), and report your results (including standard errors and p-values) in a nicely formatted table. Compare your results from this model to those in Part A and provide a substantive interpretation. Assuming that linearity and additivity are reasonable assumptions, can you make a causal claim based on this model? Why or why not? Please be specific.
- (D) Suppose you believe that the influence of having daughters increases or decreases along with the total number of children a legislator has and therefore additivity is not a reasonable assumption. Implement this theory by regressing a representative's aauw score on his or her number of female children, his or her total number of children, and an interaction between the two. Report your results (including standard errors and p-values) in a nicely formatted table. Causally interpret the lower order terms (i.e., not the interaction term) if they have meaningful interpretations (if not meaningful, explain why).
- (E) Suppose you would like to relax the assumption that the treatment has a linear relationship with the outcome, but you're not worried about potential interactions. To do so, generate a set of dummy variables for every observed value of ngirls, but DO NOT generate one for ngirls=0. This will be the omitted category. Call these dummies ngirls1 through ngirls6. ngirls1, for example, is 1 for any observation with exactly 1 female child, and 0 otherwise. Once you have these dummy variables (there should be six of them), run a regression of aauw on totchi and the dummy variables, ngirls1 through ngirls6. (If it is convenient, you may have R generate the dummies automatically in the regression using factor(), in which case you can disregard the instructions about naming the dummies). Explain in words how you would interpret the coefficients on these dummy variables.
- (F) Using the regression from Part E, present a plot of the estimated treatment effect across values for the treatment, holding totchi constant at 5. The horizontal axis should represent the different values for ngirls, and the vertical axis should be the predicted aauw at those values. Comment on whether the relationship between ngirls and aauw, conditional on totchi=5, appears to be linear or not.
- (G) For simplicity, subset the data so that you only have observations with \(1 \leq \text{totchi} \leq 5\). Now regress aauw on ngirls separately for every value of totchi. So you should run five different regressions, one each for \(\text{totchi} \in\{1,2,3,4,5\}\). Plot the estimated treatment effect, along with its 95% confidence interval, for each value of totchi. Based on the plotted confidence intervals, can you reject the null hypothesis that the marginal effect of ngirls is equal for individuals with totchi=3 and totchi=4? Explain.
- (H) Using the results from Part G, construct an estimate for the Average Treatment Effect by taking a weighted average of the estimates at each value of totchi, where the weights are the proportion of observations in that stratum. Formally, if \(ATE_{i}\) is the ATE estimated for individuals with totchi=i, calculate the quantity
  \(ATE=\sum_{i=1}^{5} \frac{n_{i}}{N} ATE_{i}\)
  where \(N\) is the total sample size and \(n_{i}\) is the number of individuals with totchi=i.
- (I) R Users Only: Create a function that combines the five separate ATEs, as in Part H. Using 400 iterations, bootstrap the dataset (i.e., sample from the data with replacement) and get 400 estimates for the overall ATE. Report the bootstrapped standard error of the overall ATE, and present a density plot of the estimated sampling distribution.
- (J) Continue to use the subsetted data from Part G. Generate a set of dummy variables for every value that totchi takes on, but DO NOT generate one for totchi=1. So you should have four different dummy variables. Call these totchi2 through totchi5. totchi2, for example, takes the value one for observations with exactly 2 total children, and 0 otherwise. Regress aauw on ngirls and totchi2 through totchi5 and report the coefficient on ngirls. (If it is convenient, you may have R generate the dummies automatically in the regression using factor(), in which case you can disregard the instructions about naming the dummies). Is the coefficient on ngirls the same as the ATE calculated in Part H? Why or why not?
- (K) As in previous problem sets, create a plot showing the estimated effect for different levels of totchi. The horizontal axis should represent ngirls, the vertical axis aauw, with different lines for the different levels of totchi.
- (L) Continue to use the subsetted data from Part G. Along with the dummy variables for totchi that you generated in Part J, generate interactions of each dummy with ngirls. Call these interactions ngirls_totchi2 through ngirls_totchi5. Regress aauw on ngirls, the dummies (totchi2 through totchi5), and the interactions (ngirls_totchi2 through ngirls_totchi5). (If it is convenient, you may have R generate the dummies automatically in the regression using factor(), in which case you can disregard the instructions about naming the dummies). Create another plot of the results, as in Part K. How are the effects different here? Why?
- (M) Returning to the regression from Part C above and using any of the tools from earlier in this course, evaluate the assumption of constant variance. If necessary, replicate your analysis while correcting for any violations of this assumption.
- (N) Again returning to the regression from Part C above and using any of the tools from earlier in this course, evaluate the assumption of normality.
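To illustrate the mechanics of the stratum-weighted average and the bootstrap described above, here is a minimal sketch. It is written in Python rather than the R or Stata the assignment calls for, uses only the standard library, and runs on synthetic data, not washington.dta. The function names (ols_slope, weighted_ate, bootstrap_ate), the sample sizes, and the made-up slope of 2 are all assumptions for illustration. The within-stratum slope of aauw on ngirls stands in for the stratum-level ATE, as in the text.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Slope from a one-regressor OLS fit (with intercept) of ys on xs."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("no variation in the regressor")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def weighted_ate(rows):
    """rows holds (totchi, ngirls, aauw) tuples; returns sum_i (n_i / N) * slope_i."""
    strata = {}
    for totchi, ngirls, aauw in rows:
        strata.setdefault(totchi, []).append((ngirls, aauw))
    n_total = len(rows)
    estimate = 0.0
    for pairs in strata.values():
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        # Weight each stratum's slope by its share of the observations.
        estimate += len(pairs) / n_total * ols_slope(xs, ys)
    return estimate


def bootstrap_ate(rows, n_iter=400, seed=0):
    """Resample rows with replacement; return (standard error, list of estimates)."""
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_iter:
        sample = rng.choices(rows, k=len(rows))
        try:
            estimates.append(weighted_ate(sample))
        except ValueError:
            # Simplification (an assumption of this sketch): redraw when a
            # resampled stratum has no variation in ngirls.
            continue
    return statistics.stdev(estimates), estimates


# Synthetic data only: 40 hypothetical legislators per value of totchi,
# with a made-up slope of 2 AAUW points per daughter.
rng = random.Random(1)
rows = []
for totchi in range(1, 6):
    for _ in range(40):
        ngirls = sum(rng.random() < 0.5 for _ in range(totchi))
        rows.append((totchi, ngirls, 50 + 2 * ngirls + rng.gauss(0, 10)))

point_estimate = weighted_ate(rows)
std_error, draws = bootstrap_ate(rows)
print(f"weighted ATE: {point_estimate:.3f}, bootstrap SE: {std_error:.3f}")
```

The list draws holds the 400 bootstrap estimates; a density plot of these values would approximate the sampling distribution. Redrawing degenerate resamples is one of several possible conventions and is chosen here only to keep the sketch short.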
Deliverable: Word Document
