For this project, you will work in groups of two. You may choose your partner. This will test your ability to collaborate on GitHub.
In this problem set, you will experiment with cleaning a messy dataset and producing some basic results. You will replicate graphs from the paper "Diversifying Society’s Leaders? The Determinants and Causal Effects of Admission to Highly Selective Private Colleges" by Raj Chetty, David J. Deming, and John N. Friedman (and a variety of co-authors). This working paper uses anonymized admissions data from 139 elite colleges linked to income tax records to ask two questions: whether these highly selective schools favor high-income students even among applicants with comparable SAT/ACT scores, and how attending one of these schools affects future earnings.
See a non-technical summary and a New York Times report for more information. Please familiarize yourself with this working paper.
I have provided starter code, which you will use to complete this project. The code includes:
- housekeeping.R
- download_data.R
- clean_data.R
- PS1_writeup.Rmd
The problem set questions are in PS1_writeup.Rmd.
I will be grading this problem set based on the following criteria:
- Quality of code (33%): Is it well-commented? Is it easy to follow? Can I run it?
- Quality of graphs (33%): Are they well-labeled? Do they have titles? Do they have legends? Are they formatted well?
- Quality of answers (33%): Are they clear? Do they answer the question?
In order to submit this project, you will need to:
- Stage your changes using RStudio, GitHub Desktop, or git from the command line
- Commit these changes
- Pull from the repository (always pull before you push)
- Push your changes to the repository
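The four steps above can be rehearsed from the command line before you touch the real repository. The script below is only a practice sketch under stated assumptions, not part of the starter code: a local bare repository stands in for GitHub, a second clone plays the role of your partner, and the file contents, user names, and commit messages are placeholders.

```shell
#!/bin/sh
set -e

# Assumption: a throwaway sandbox; a local bare repository stands in for GitHub.
sandbox=$(mktemp -d)
git init --quiet --bare "$sandbox/remote.git"

# Hypothetical partner: seeds the shared repository with a first commit.
git clone --quiet "$sandbox/remote.git" "$sandbox/partner" 2>/dev/null
cd "$sandbox/partner"
git config user.name "Partner"
git config user.email "partner@example.com"
echo "# placeholder" > PS1_writeup.Rmd
git add PS1_writeup.Rmd
git commit --quiet -m "Add writeup template"
git push --quiet origin HEAD

# You: clone the shared repository.
git clone --quiet "$sandbox/remote.git" "$sandbox/work"

# Meanwhile, your partner pushes another change you do not have yet.
echo "# placeholder" > download_data.R
git add download_data.R
git commit --quiet -m "Add download script"
git push --quiet origin HEAD

# Your turn: the four submission steps.
cd "$sandbox/work"
git config user.name "Student"
git config user.email "student@example.com"
echo "# placeholder" > clean_data.R

git add clean_data.R                      # 1. stage your changes
git commit --quiet -m "Add cleaning step" # 2. commit them
git pull --quiet --no-rebase --no-edit    # 3. pull (merges your partner's work)
git push --quiet                          # 4. push the combined history

# Print the number of commits now in your history.
git rev-list --count HEAD
```

In the real project only the four numbered commands matter. The pull in step 3 is what brings in your partner's commit; without it, the push in step 4 would be rejected because the remote has history you do not have locally.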
I encourage you to actively use generative AI to assist with writing code for this assignment.