Problem Set 1 for Big Data Class

Due: 9/24/2023

Group problem set

For this project you will work in groups of two. You may choose your own partner. This will test your ability to collaborate on GitHub.

Problem Set description

In this problem set, you will experiment with cleaning a messy dataset and producing some basic results. You will replicate graphs from the paper "Diversifying Society’s Leaders? The Determinants and Causal Effects of Admission to Highly Selective Private Colleges" by Raj Chetty, David J. Deming, and John N. Friedman (and a variety of co-authors). This working paper uses anonymized admissions data from 139 elite colleges linked to income tax records to ask two questions: whether these highly selective schools show a preference for high-income students beyond SAT/ACT scores, and how attending one of these schools affects future earnings.

See a non-technical summary and a New York Times report for more information. Please familiarize yourself with this working paper.

I have provided starter code, which you will use to complete this project. The code includes:

housekeeping.r
download_data.R
clean_data.R
ps1_writeup.Rmd

The problem set questions are in ps1_writeup.Rmd.

Grading

I will be grading this problem set based on the following criteria:

Quality of code (33%): Is it well-commented? Is it easy to follow? Can I run it?
Quality of graphs (33%): Are they well-labeled? Do they have titles? Do they have legends? Are they formatted well?
Quality of answers (33%): Are they clear? Do they answer the question?

Submitting project

In order to submit this project, you will need to:

Stage your changes using RStudio, GitHub Desktop, or git from the command line
Commit these changes
Pull from the repository (always pull before you push)
Push your changes to the repository
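
For those using git from the command line, the four steps above might be sketched as a small shell helper. The function name submit_ps1 is hypothetical (it is not part of the starter code), and `git add -A` assumes the repository's .gitignore already excludes anything you do not want to commit. Each step runs only if the previous one succeeded, so a merge conflict during the pull stops the push.

```shell
# Hypothetical helper (not part of the starter code): stage, commit, pull, push.
submit_ps1() {
  # Refuse to run without a commit message.
  if [ -z "${1:-}" ]; then
    echo 'usage: submit_ps1 "commit message"' >&2
    return 1
  fi

  git add -A &&           # 1. stage all changes (assumes .gitignore is set up)
  git commit -m "$1" &&   # 2. commit them with the given message
  git pull &&             # 3. pull first, so conflicts surface before pushing
  git push                # 4. push your commits to the repository
}

# Example call, from inside your clone of the repository:
#   submit_ps1 "Answer question 1"
```

If there is nothing new to commit, `git commit` exits with an error and the helper stops there. In that case, run `git pull` and `git push` by hand if you still have unpushed commits.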

ChatGPT/GitHub Copilot

I encourage you to actively use generative AI to assist with writing code for this assignment.

tcastriotta/big-data-PS1
