Pull Federal Spending from USspendin API #11

Open
6 tasks done
ouslan opened this issue Oct 14, 2024 · 0 comments
ouslan commented Oct 14, 2024

Objective

Create a function inside the DataPull class to download the data in bulk from the USAspending API. Use the pull_file() function to pull all the required files, then unzip them and store them inside the data/raw directory.

  • Create a helper function to query the API and get the download link
  • Use pull_file() to download each file
  • Iterate through all available data in the API and download each file as year_spending.zip inside the raw folder
  • Create another helper function in data_pull.py to unzip the file, read the *.csv, and do basic cleaning (such as creating a date string and ensuring the correct data types)
  • Create a DuckDB schema for the table in src/models.py (there is no need to create multiple tables for the schema)
  • Iterate over each file and insert it into the database (you can use self.conn.insert("table_name", df) to insert the data automatically)
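
A rough sketch of what the unzip-and-clean helper could look like, using only the standard library. The function names (unzip_and_clean, clean_row) and the column names (fiscal_year, month, agency, amount) are hypothetical and chosen for illustration; the real files will have their own columns, and the project may prefer a DataFrame library for this step.

```python
import csv
import zipfile
from pathlib import Path


def clean_row(row: dict) -> dict:
    """Build a date string and enforce data types.

    The column names used here are assumptions, not the real schema.
    """
    year = int(row["fiscal_year"])
    month = int(row["month"])
    return {
        "date": f"{year:04d}-{month:02d}-01",  # first day of the month
        "agency": row["agency"].strip(),
        "amount": float(row["amount"] or 0),  # empty amounts become 0.0
    }


def unzip_and_clean(zip_path: Path, out_dir: Path) -> list[dict]:
    """Extract every CSV in the archive into out_dir and return cleaned rows."""
    out_dir.mkdir(parents=True, exist_ok=True)
    rows: list[dict] = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV
            extracted = Path(zf.extract(name, out_dir))
            with extracted.open(newline="", encoding="utf-8") as fh:
                rows.extend(clean_row(r) for r in csv.DictReader(fh))
    return rows
```

Under these assumptions, calling unzip_and_clean(Path("data/raw/2024_spending.zip"), Path("data/raw")) would return a list of cleaned rows that can then be converted to whatever object self.conn.insert() expects.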
ouslan self-assigned this Oct 14, 2024
ouslan changed the title from "Add USspending data" to "Pull Federal Spending from USspendin API" Jan 30, 2025
NatiPerez459 self-assigned this Jan 31, 2025