Create a function inside the `DataPull` class to download the data in bulk from the USAspending API. Use the `pull_file()` function to pull all the required files, then unzip and store them inside the `data/raw` directory.
- Create a helper function to query the API and get the download link.
- Use `pull_file()` to download each file.
- Iterate through all available data in the API and download each file as `year_spending.zip` inside the raw folder.
- Create another helper function in `data_pull.py` to unzip the file, read the `*.csv`, and do basic cleaning (creating a date string and ensuring the correct data types).
- Create a DuckDB schema for the table in `src/models.py` (there is no need to create multiple tables for the schema).
- Iterate over each file and insert it into the database (you can use `self.conn.insert("table_name", df)` to automatically insert the data).
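The download, unzip, and cleaning steps above could be sketched roughly as follows. This is only an illustration under several assumptions: `get_link(year)` and `pull_file(url, filename)` are passed in as callables because their real signatures are not given in this issue, the `{year}_spending.zip` naming is one reading of `year_spending.zip`, and the CSV columns `year`, `month`, and `amount` are hypothetical placeholders for whatever the real files contain. It uses only the standard library; the actual helper would presumably return a DataFrame for `self.conn.insert`.

```python
import csv
import zipfile
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def download_all(years: Iterable[int],
                 get_link: Callable[[int], str],
                 pull_file: Callable[[str, str], None],
                 raw_dir: Path) -> List[Path]:
    """Download one archive per year into raw_dir.

    get_link and pull_file are stand-ins (assumed signatures) for the
    API helper and the existing pull_file() method.
    """
    raw_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for year in years:
        target = raw_dir / f"{year}_spending.zip"  # assumed naming scheme
        if not target.exists():  # skip archives that are already on disk
            pull_file(get_link(year), str(target))
        saved.append(target)
    return saved


def extract_csvs(zip_path: Path, out_dir: Path) -> List[Path]:
    """Extract every *.csv member of zip_path into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                extracted.append(Path(archive.extract(name, out_dir)))
    return extracted


def clean_rows(csv_path: Path) -> List[Dict[str, object]]:
    """Read one CSV, build a date string, and coerce the numeric column.

    The column names here are hypothetical, not the real file layout.
    """
    rows = []
    with csv_path.open(newline="", encoding="utf-8") as handle:
        for raw in csv.DictReader(handle):
            rows.append({
                "date": f"{int(raw['year']):04d}-{int(raw['month']):02d}-01",
                "amount": float(raw["amount"] or 0),  # empty cell becomes 0.0
            })
    return rows
```

Skipping archives that already exist keeps a rerun from downloading every year again; whether that is desirable depends on how often the upstream files change.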