A Python script that extracts historical US federal budget data from Congressional Budget Office (CBO) Excel workbooks into CSV format.
Important: We deliberately avoid any calculations or transformations here. All values are extracted directly from CBO's published data.
You can audit the extraction script, scripts/extract_budget_data.py. The only changes made during extraction are:
- Converting surplus values to deficits (multiplying by -1) for more intuitive interpretation
- Minor column renaming and reordering for readability
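The sign flip is the only numeric change to the data. The snippet below is a minimal sketch of what it amounts to for one row. The `surplus_to_deficit` helper and the `surplus`/`deficit` column names are hypothetical and do not necessarily match the repository's code or CBO's column labels:

```python
def surplus_to_deficit(row, surplus_key="surplus", deficit_key="deficit"):
    """Return a copy of `row` with the surplus column replaced by a deficit column.

    Hypothetical helper: column names are assumptions, not CBO's labels.
    A positive deficit means spending exceeded revenue.
    """
    out = dict(row)  # copy so the input row is not mutated
    out[deficit_key] = -out.pop(surplus_key)  # multiply by -1
    return out


# Illustrative values only, not CBO data.
row = {"revenue": 10.0, "surplus": -3.0}
print(surplus_to_deficit(row))  # {'revenue': 10.0, 'deficit': 3.0}
```

The function returns a new dict instead of editing in place, so the original row stays available for checking against the workbook.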
The script extracts CBO data into two CSV files:
- budget_gdp.csv: Budget values as a percentage of gross domestic product (GDP)
- budget_nominal.csv: Raw nominal budget values in billions of dollars (not adjusted for inflation)
GDP percentages are generally more meaningful for analysis, since they show spending, revenue, and deficits relative to the size of the economy (which is what matters), making figures from different years directly comparable.
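The script does not compute these percentages, because CBO publishes the GDP shares directly. As a rough illustration of what a "% of GDP" figure means, here is a minimal sketch that relates a nominal value to nominal GDP. The `percent_of_gdp` function is hypothetical and the numbers are round, made-up values, not CBO data:

```python
def percent_of_gdp(value_billions, gdp_billions):
    """Express a nominal value (billions of $) as a percentage of nominal GDP.

    Hypothetical helper for illustration; CBO already publishes these shares.
    Both inputs are in the same year's dollars, so inflation cancels out.
    """
    if gdp_billions <= 0:
        raise ValueError("GDP must be positive")
    # Multiply before dividing to keep simple cases exact in floating point.
    return value_billions * 100 / gdp_billions


# Made-up round numbers: a $1,000B deficit in a $25,000B economy.
print(percent_of_gdp(1_000, 25_000))  # 4.0
```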
Requirements:
- Python 3.8+
- pip
Setup:
- Clone this repository:
git clone https://github.com/holdenmatt/us-budget-csv.git
cd us-budget-csv
- Create a Python virtual environment:
python -m venv venv
- Activate it:
source venv/bin/activate # On Unix/macOS
# or
.\venv\Scripts\activate # On Windows
- Install dependencies:
pip install -r requirements.txt
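Before installing dependencies, you may want to confirm that the interpreter meets the Python 3.8+ requirement and that the virtual environment is actually active. A minimal sketch using only the standard library (the `check_environment` helper is hypothetical and is not part of the repository):

```python
import sys


def check_environment(min_version=(3, 8)):
    """Return (version_ok, in_venv) for the running interpreter."""
    # Tuple comparison: (3, 11, 4, ...) >= (3, 8) is True.
    version_ok = sys.version_info >= min_version
    # Inside a venv, sys.prefix points at the venv while
    # sys.base_prefix still points at the base installation.
    in_venv = sys.prefix != sys.base_prefix
    return version_ok, in_venv


if __name__ == "__main__":
    version_ok, in_venv = check_environment()
    major, minor = sys.version_info[:2]
    print(f"Python {major}.{minor}: {'OK' if version_ok else 'too old (need 3.8+)'}")
    print("Virtual environment active" if in_venv else "No virtual environment active")
```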
Usage:
- Download the latest "Historical Budget Data" Excel file from the CBO website.
- Move the Excel file to the input/ directory.
- Run the script:
python scripts/extract_budget_data.py
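If the script can't find the workbook, check the input/ directory first. The sketch below shows one way to do that check. It assumes the download is a single `.xlsx` file, and the `find_cbo_workbook` helper is hypothetical. It does not reflect how scripts/extract_budget_data.py actually locates its input:

```python
from pathlib import Path


def find_cbo_workbook(input_dir="input"):
    """Return the single .xlsx workbook in input_dir.

    Hypothetical helper: assumes exactly one Excel file was placed there.
    """
    candidates = sorted(Path(input_dir).glob("*.xlsx"))
    if not candidates:
        raise FileNotFoundError(f"No .xlsx file found in {input_dir}/")
    if len(candidates) > 1:
        names = ", ".join(p.name for p in candidates)
        raise RuntimeError(f"Expected one workbook in {input_dir}/, found: {names}")
    return candidates[0]


if __name__ == "__main__":
    print(find_cbo_workbook())
```

Raising an error when there are several workbooks avoids silently extracting from an older download left in the directory.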