Python source code for analyzing penalization under HACRP due to SIRs

Rush-Quality-Analytics/HACRP-HAIs

HACRP-HAIs

Python source code and publicly available data for analyzing a history of biased hospital penalization under the Hospital Acquired Conditions Reduction Program (HACRP), a program administered by the Centers for Medicare and Medicaid Services (CMS). The associated manuscript has been submitted to a peer-reviewed journal and a link to the paper will appear once it is published. This public repository is provided to promote transparency and permit reproducibility of the associated research.

Directories and files

All result figures, statistics, and most of the data in this repository can be reproduced exactly by running files within this repository. Below is a breakdown of the repository's contents. The directories are numbered to indicate the order that users should follow when reproducing files within the repository and the results of the associated manuscript.

1_CleanCurateCompile_CareCompare_Data Each Python file in this directory aggregates multiple years of archived CMS Care Compare data into a single file.
  • HACRP_Facility_Files_CombineYears.py
  • HAI_Facility_Files_CombineYears.py
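As a rough illustration of what "aggregating years into a single file" involves, the sketch below stacks per-year pandas DataFrames and tags each row with its source year. It assumes each year's archived file has already been read into a DataFrame; the function name `combine_years` and the `file_year` column are hypothetical and are not taken from the repository's scripts.

```python
from typing import Dict

import pandas as pd


def combine_years(frames_by_year: Dict[int, pd.DataFrame]) -> pd.DataFrame:
    """Stack yearly Care Compare extracts into one long DataFrame.

    Hypothetical helper: keys are release years, values are the frames
    read from each year's archived file.
    """
    tagged = []
    for year, df in sorted(frames_by_year.items()):
        df = df.copy()
        # Record which archive release each row came from.
        df["file_year"] = year
        tagged.append(df)
    # ignore_index gives a clean 0..n-1 index; columns missing in some
    # years are filled with NaN by pd.concat.
    return pd.concat(tagged, ignore_index=True, sort=False)


y2015 = pd.DataFrame({"Facility ID": ["010001"], "Score": [1.0]})
y2016 = pd.DataFrame({"Facility ID": ["010001", "010005"], "Score": [2.0, 3.0]})
combined = combine_years({2016: y2016, 2015: y2015})
print(combined)
# A combined frame like this could then be saved with combined.to_pickle(...).
```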
2_Preprocess_CareCompare_data Each Python file in this directory preprocesses time-aggregated HAI data to produce datasets with standardized feature names and filtered features.
  • Generate_CAUTI_data.py
  • Generate_CLABSI_data.py
  • Generate_MRSA_data.py
  • Generate_CDI_data.py
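A minimal sketch of this kind of preprocessing, assuming the aggregated HAI data are in long format (one row per facility, year, and measure) with `Facility ID`, `file_year`, `Measure ID`, and `Score` columns. The CAUTI measure IDs and the standardized names in `CAUTI_MEASURES` are assumptions for illustration; the repository's `Generate_*_data.py` scripts may use different names and additional filters.

```python
import pandas as pd

# Assumed Care Compare measure IDs for CAUTI (HAI_2) mapped to hypothetical
# standardized names; check them against the archived files before use.
CAUTI_MEASURES = {
    "HAI_2_NUMERATOR": "CAUTI_observed",
    "HAI_2_ELIGCASES": "CAUTI_predicted",
    "HAI_2_DOPC": "CAUTI_device_days",
    "HAI_2_SIR": "CAUTI_SIR",
}


def to_wide(long_df: pd.DataFrame, measures: dict) -> pd.DataFrame:
    """Filter to the requested measures and pivot to one row per facility and year."""
    subset = long_df[long_df["Measure ID"].isin(list(measures))].copy()
    # Scores arrive as text and can be "Not Available"; coerce those to NaN.
    subset["Score"] = pd.to_numeric(subset["Score"], errors="coerce")
    wide = (
        subset.groupby(["Facility ID", "file_year", "Measure ID"])["Score"]
        .first()  # first non-missing value per facility, year, and measure
        .unstack("Measure ID")
        .rename(columns=measures)
        .reset_index()
    )
    wide.columns.name = None
    return wide


long_df = pd.DataFrame({
    "Facility ID": ["010001", "010001", "010001", "010005", "010005"],
    "file_year": [2016] * 5,
    "Measure ID": ["HAI_2_NUMERATOR", "HAI_2_ELIGCASES", "HAI_1_SIR",
                   "HAI_2_NUMERATOR", "HAI_2_ELIGCASES"],
    "Score": ["3", "2.5", "0.9", "Not Available", "4.0"],
})
print(to_wide(long_df, CAUTI_MEASURES))
```

Grouping and unstacking, rather than `pivot_table(dropna=False)`, keeps only facility and year combinations that actually occur in the data.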
3_Merge_HAC_with_HAI Jupyter notebook files in this directory merge data from HAI files with data from HACRP files. These files are also responsible for reproducing HACRP penalty assignments from scratch (a vital validation step). Each year is represented by its own file, due to the complexity of the tasks and varied changes in the HACRP program from one year to the next.
  • 2015.ipynb
  • 2016.ipynb
  • 2017-Part1.ipynb
  • 2017-Part2.ipynb
  • 2018.ipynb
  • 2019.ipynb
  • 2020.ipynb
  • 2021.ipynb
  • 2022.ipynb
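The core of the validation step can be sketched as follows. Under the HACRP, hospitals whose Total HAC Score falls above the 75th percentile (the worst-performing quartile) are penalized. This simplified sketch merges HAI and HACRP rows on a facility ID, flags that quartile, and measures agreement with the published penalty. The column names `Total HAC Score` and `Payment Reduction` are assumptions, and the year-specific scoring rules and percentile conventions that the notebooks handle are not modeled here.

```python
import pandas as pd


def reproduce_penalties(hai: pd.DataFrame, hac: pd.DataFrame):
    """Merge HAI and HACRP data and flag the worst-performing quartile."""
    # validate= raises if a facility appears more than once on either side.
    merged = hai.merge(hac, on="Facility ID", how="inner", validate="one_to_one")
    cutoff = merged["Total HAC Score"].quantile(0.75)
    merged["penalized_reproduced"] = merged["Total HAC Score"] > cutoff
    # Share of hospitals whose reproduced status matches the published one.
    actual = merged["Payment Reduction"].eq("Yes")
    agreement = (merged["penalized_reproduced"] == actual).mean()
    return merged, agreement


hai = pd.DataFrame({"Facility ID": list("ABCD"), "CAUTI_SIR": [0.5, 1.0, 1.5, 2.0]})
hac = pd.DataFrame({
    "Facility ID": list("ABCD"),
    "Total HAC Score": [1.0, 2.0, 3.0, 10.0],
    "Payment Reduction": ["No", "No", "No", "Yes"],
})
merged, agreement = reproduce_penalties(hai, hac)
print(merged[["Facility ID", "penalized_reproduced"]], agreement)
```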
4_Merge_HAC-HAI_with_HCRIS Jupyter notebook files in this directory take the merged HAC-HAI data and then merge it with data from the CMS Healthcare Cost Report Information System (HCRIS).
  • 1_generate_filtered_PUF_df.ipynb
    • This Jupyter notebook file checks, constructs, and/or reproduces payments from the Inpatient Prospective Payment System (IPPS) and penalties from the Hospital Acquired Conditions Reduction Program (HACRP). These data are obtained from HCRIS data sets.
  • 2_generate_compiled_df.ipynb
    • This Jupyter notebook produces the compiled file of HAI, HACRP, and HCRIS data that is used, in part, to optimize random sampling models, which are in turn used to calculate the standardized infection score (SIS).
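Once IPPS payments are known for each hospital, the penalty amount follows from the HACRP's 1% payment reduction. The sketch below assumes, for illustration, that the penalty equals 1% of a hypothetical `ipps_payments` column for hospitals flagged in a hypothetical boolean `penalized` column; the notebook itself may derive payments and penalties from more specific HCRIS fields.

```python
import pandas as pd

HACRP_REDUCTION = 0.01  # 1% payment reduction for penalized hospitals


def add_penalty_dollars(df: pd.DataFrame) -> pd.DataFrame:
    """Add a column with the dollar penalty implied by a 1% reduction."""
    out = df.copy()
    # Unpenalized hospitals contribute 0; penalized ones lose 1% of payments.
    base = out["ipps_payments"].where(out["penalized"], 0.0)
    out["hacrp_penalty"] = base * HACRP_REDUCTION
    return out


example = pd.DataFrame({
    "ipps_payments": [1_000_000.0, 2_500_000.0],
    "penalized": [True, False],
})
result = add_penalty_dollars(example)
print(result["hacrp_penalty"].sum())
```

Summing `hacrp_penalty` across hospitals and years gives the corresponding CMS savings.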
5_Optimize_random_sampling_models The files in this directory explain the variation in reported numbers of infections for specific types of HAIs (across years and among hospitals) as a consequence of random variation based on hospital volume. The models used here are based on a simple binomial random sampling approach (similar to a model based on coin flips).
  • CAUTI_opt_DataGen.py - A small file (only 8 lines). The file is used to pass CAUTI-based parameters and arguments to functions in the HAI_optimize.py file.
  • CLABSI_opt_DataGen.py - A small file (only 8 lines). The file is used to pass CLABSI-based parameters and arguments to functions in the HAI_optimize.py file.
  • MRSA_opt_DataGen.py - A small file (only 8 lines). The file is used to pass MRSA-based parameters and arguments to functions in the HAI_optimize.py file.
  • CDI_opt_DataGen.py - A small file (only 8 lines). The file is used to pass CDI-based parameters and arguments to functions in the HAI_optimize.py file.
  • HAI_optimize.py - This Python file optimizes parameters of random sampling models for particular types of HAIs (CAUTI, CLABSI, MRSA, CDI).
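To make the coin-flip analogy concrete, here is a minimal binomial sketch. Each unit of hospital volume (assumed here to be something like device days or patient days) is treated as an independent trial with infection probability `p`. When all hospitals share one `p`, its maximum-likelihood estimate is total infections divided by total volume. This is only an illustration of the idea, not the parameter optimization performed in `HAI_optimize.py`, and the function names are hypothetical.

```python
import numpy as np


def fit_pooled_rate(volume, observed):
    """MLE of a single per-unit infection probability shared by all hospitals."""
    volume = np.asarray(volume, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return observed.sum() / volume.sum()


def simulate_counts(volume, p, n_draws=1000, seed=0):
    """Draw infection counts as Binomial(volume, p) for each hospital."""
    rng = np.random.default_rng(seed)
    volume = np.asarray(volume, dtype=np.int64)
    # Shape (n_draws, n_hospitals): each row is one simulated reporting period.
    return rng.binomial(volume, p, size=(n_draws, volume.size))


volume = [1000, 2000, 7000]
observed = [2, 3, 5]
p = fit_pooled_rate(volume, observed)
expected = np.asarray(volume) * p  # expected infections from volume alone
sims = simulate_counts(volume, p)
print(p, expected, sims.mean(axis=0))
```

Larger hospitals have larger expected counts, while smaller hospitals show proportionally more chance variation around their expectation.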
6_Generate_SIS_results The Jupyter notebooks in this directory are similar to those in the 3_Merge_HAC_with_HAI directory. However, rather than attempting to reproduce actual HACRP penalty assignments using numbers of observed infections, each notebook produces HACRP penalty assignments based on the numbers of infections expected at random based on volume.
  • SIS_2015.ipynb
  • SIS_2016.ipynb
  • SIS_2017.ipynb
  • SIS_2018.ipynb
  • SIS_2019.ipynb
  • SIS_2020.ipynb
  • SIS_2021.ipynb
  • SIS_2022.ipynb
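The following sketch illustrates why random expectations matter for penalty assignment. Every simulated hospital has the same true infection probability, yet ranking an observed/expected ratio and penalizing the top quartile still penalizes roughly a quarter of them. This is not the SIS, which is defined in the manuscript and notebooks. The function name, volume range, and `p` are arbitrary assumptions.

```python
import numpy as np


def random_only_penalties(volume, p, seed=0):
    """Penalize the worst quartile of an observed/expected ratio when every
    hospital shares the same true infection probability p."""
    rng = np.random.default_rng(seed)
    volume = np.asarray(volume, dtype=np.int64)  # must be positive
    expected = volume * p
    observed = rng.binomial(volume, p)  # infections arising purely by chance
    ratio = observed / expected  # an SIR-like observed/expected ratio
    cutoff = np.quantile(ratio, 0.75)
    return ratio > cutoff


volumes = np.random.default_rng(1).integers(500, 20_000, size=1000)
penalized = random_only_penalties(volumes, p=0.002)
print(f"{penalized.mean():.1%} of hospitals penalized by chance alone")
```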
7_Generate_Final_results The Jupyter notebook in this directory imports a single file containing merged data from all of the above directories (1 to 6). It then produces all tables and figures contained in the associated manuscript.
  • generate_final_results.ipynb
data This directory contains other directories, each containing data that are either imported or produced by the above directories (1 to 7).
  • CareCompare_data ... stuff ...
  • CombinedFiles_HACRP
    • Facility.pkl A pickle file containing cleaned and curated data from the Hospital Acquired Conditions Reduction Program (HACRP) files obtained from the CMS Care Compare hospitals archive.
  • CombinedFiles_HAI
    • Facility.pkl A pickle file containing cleaned and curated data on Healthcare Associated Infections (HAIs) obtained from the CMS Care Compare hospitals archive.
  • Compiled_HCRIS-HACRP-HAI-RAND Files in this directory contain data merged from HCRIS, cost report data from RAND, and files from the CMS Care Compare archive for HAIs and the HACRP. The two files below contain the exact same data, but in different file formats.
    • Compiled_HCRIS-HACRP-HAI-RAND.csv
    • Compiled_HCRIS-HACRP-HAI-RAND.pkl
  • finalized This directory contains files that are the final product of merging data from HCRIS, RAND, and HAI and HACRP data from Care Compare, as well as data on reproduced penalty assignments, penalty assignments based on the standardized infection score (SIS), and penalty assignments based on random expectations.
    • final_2015.pkl
    • final_2016.pkl
    • final_2017.pkl
    • final_2018.pkl
    • final_2019.pkl
    • final_2020.pkl
    • final_2021.pkl
    • final_2022.pkl
  • HCRIS_data This directory contains a file engineered from freely available SAS-based HCRIS cost report files. The file is generated by the `1_generate_filtered_PUF_df.ipynb` file. The resulting data file is then used by the `2_generate_compiled_df.ipynb` file.
    • FilteredEngineeredPUF_p5.pkl
  • merged_HAC_HAI Files in this directory are serialized python data files. These files contain HACRP data merged with HAI data, as well as reproduced penalty assignments and their associated data.
    • HAI_HAC_2015.pkl
    • HAI_HAC_2016.pkl
    • HAI_HAC_2017.pkl
    • HAI_HAC_2018.pkl
    • HAI_HAC_2019.pkl
    • HAI_HAC_2020.pkl
    • HAI_HAC_2021.pkl
    • HAI_HAC_2022.pkl
    • P1_HAI_HAC_2017_holdout.pkl
    • P1_HAI_HAC_2017.pkl
  • optimized_by_HAI_file_date This directory contains four other directories, each containing the outputs of random sample based modeling, including optimized model parameters.
    • CAUTI
    • CLABSI
    • MRSA
    • CDI
  • preprocessed_HAI_data This directory holds curated and processed data from the CMS Care Compare hospitals archive. The contents are:
    • CAUTI_Data.pkl
    • CLABSI_Data.pkl
    • MRSA_Data.pkl
    • CDI_Data.pkl
  • Rand_CostReport This directory contains a file obtained from the RAND hospital cost report tool, which offers a single freely available file. This file is used in the current project to verify derived IPPS payment values.
    • rand_hcris_free_2022_11_01.csv
  • states_codes This directory contains a single small file. The file contains state codes used in the formation of 6-digit CMS facility numbers. These are used in verifying and engineering HCRIS data.
    • HCRIS_STATE_CODES.csv
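Because the first two digits of a 6-digit CMS facility number encode the state, IDs must keep their leading zeros when files are merged. Reading a CSV without `dtype=str` can silently turn "010001" into 10001. The helpers below are a hypothetical sketch of normalizing IDs and extracting the state prefix. The prefix could then be matched against HCRIS_STATE_CODES.csv, whose column layout is not described here.

```python
import pandas as pd


def normalize_ccn(ids: pd.Series) -> pd.Series:
    """Return 6-character CMS facility numbers as zero-padded strings."""
    as_str = ids.astype(str).str.strip()
    # Numeric parsing can produce 10001 or 10001.0; drop a trailing ".0".
    as_str = as_str.str.replace(r"\.0$", "", regex=True)
    return as_str.str.zfill(6)


def state_code(ids: pd.Series) -> pd.Series:
    """The first two characters of a facility number encode the state."""
    return normalize_ccn(ids).str[:2]


ids = pd.Series([10001, "050002", 10001.0])
print(normalize_ccn(ids).tolist(), state_code(ids).tolist())
```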
figures The files in this directory comprise the graphical results of the associated manuscript and its appendix.
  • `Hists_HAC.png` A figure showing that the distribution of random-based HAC scores is highly similar to the distribution of actual HAC scores.
  • `Obs_v_Pred.png` A figure showing that random expectations based on volume explain the majority of variation in reported numbers of infections across hospitals.
  • `change_in_rank.png` A figure showing the result of accounting for random expectations based on volume when calculating rates of HAIs. Specifically, hospital rankings in HAC scores drastically change when using the SIS to account for random expectations.
  • `expected_penalties.png` A figure showing the accumulation of biased HACRP penalties and inappropriate CMS savings across program years (2015 to 2022).

Reproducing data files and results of the associated research

Instructions are provided here for exactly reproducing nearly all data files and results from scratch. The code in this project is research code and is not constructed to software development standards. The user will need a basic-to-intermediate working knowledge of Python. Additionally, the instructions and file paths are macOS-based; Windows users will need to modify the source code as needed.

1. Download this repository or fork it on GitHub.
  • The directory should be stored in a GitHub directory directly below the user directory. Otherwise, the user will need to change file paths in each python (.py) and jupyter notebook (.ipynb) file.
2. Ensure the following software is installed. The versions listed are those used in the current project; similar versions will likely work as well.
  • python==3.8.12
  • pandas==1.4.0
  • numpy==1.22.1
  • scipy==1.7.3
  • matplotlib==3.3.4
  • matplotlib-inline==0.1.3
  • jupyter-book==0.12.2
  • jupyter-core==4.9.1
  • ipython==8.0.1
  • scikit_posthocs==0.7.0
3. Obtain HCRIS public use files (PUFs) for federal fiscal years (FFY) 2015 to 2022.
  • These files are downloaded from: https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/Cost-Reports/Cost-Reports-by-Fiscal-Year.

  • For each FFY, these PUFs consist of a report table, a numeric table, and an alpha-numeric table.

  • These files are too large to provision with this project's repository.

  • Store the files on this path: ~/Desktop/HCRIS/HCRIS_PUFs/, where the tilde (~) indicates the user directory. Alternatively, you can store them wherever you like, so long as the path in the 1_generate_filtered_PUF_df.ipynb file is changed to reflect the PUFs' location.

4. Run programs following the numerical file structure.
  • 1_CleanCurateCompile_CareCompare_Data Run these files to generate aggregated HAI and HACRP data. It doesn't matter which is run first.
    • HACRP_Facility_Files_CombineYears.py
    • HAI_Facility_Files_CombineYears.py
  • 2_Preprocess_CareCompare_data Run these files to preprocess aggregated HAI data. Each file will take a few hours, so it's best to run each in a different terminal window. It doesn't matter which is run first.
    • Generate_CAUTI_data.py
    • Generate_CLABSI_data.py
    • Generate_MRSA_data.py
    • Generate_CDI_data.py
  • 3_Merge_HAC_with_HAI Run each of these Jupyter notebook files. With the exception of part 1 and part 2 for 2017, it doesn't matter which notebook you run first. For 2017, run part 1 first.
    • 2015.ipynb
    • 2016.ipynb
    • 2017-Part1.ipynb
    • 2017-Part2.ipynb
    • 2018.ipynb
    • 2019.ipynb
    • 2020.ipynb
    • 2021.ipynb
    • 2022.ipynb
  • 4_Merge_HAC-HAI_with_HCRIS Run these Jupyter notebook files to merge HAC/HAI data with HCRIS data. Run `1_generate_filtered_PUF_df.ipynb` first.
    • 1_generate_filtered_PUF_df.ipynb
    • 2_generate_compiled_df.ipynb
  • 5_Optimize_random_sampling_models Run each of the `...opt_DataGen.py` files to generate optimized random expectations for each type of HAI, for each hospital in each year. Each file will take several hours to run, so it is recommended to run each of the 4 files in its own terminal window.
    • CAUTI_opt_DataGen.py
    • CLABSI_opt_DataGen.py
    • MRSA_opt_DataGen.py
    • CDI_opt_DataGen.py
    • HAI_optimize.py
  • 6_Generate_SIS_results Run each of these Jupyter notebook files to produce HACRP penalty assignments based on the numbers of infections expected at random based on volume, rather than on numbers of observed infections.
    • SIS_2015.ipynb
    • SIS_2016.ipynb
    • SIS_2017.ipynb
    • SIS_2018.ipynb
    • SIS_2019.ipynb
    • SIS_2020.ipynb
    • SIS_2021.ipynb
    • SIS_2022.ipynb
  • 7_Generate_Final_results Run this Jupyter notebook to reproduce all tables and figures contained in the associated manuscript.
    • generate_final_results.ipynb
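Because the scripts and notebooks use macOS-style paths, one portable way to express the layout described above is with `pathlib`. Here `Path.home()` resolves the user directory on macOS, Linux, and Windows. The sketch assumes the repository folder is named `HACRP-HAIs` inside a `GitHub` folder directly below the user directory. `data_path` is a hypothetical helper, and you would still need to substitute such paths into each file by hand.

```python
from pathlib import Path

# Assumed layout from this README: the repository sits in a folder named
# "GitHub" directly below the user directory, and the HCRIS PUFs sit in
# ~/Desktop/HCRIS/HCRIS_PUFs/.
REPO_ROOT = Path.home() / "GitHub" / "HACRP-HAIs"
PUF_DIR = Path.home() / "Desktop" / "HCRIS" / "HCRIS_PUFs"


def data_path(*parts: str, root: Path = REPO_ROOT) -> Path:
    """Build a platform-independent path inside the repository's data directory."""
    return root.joinpath("data", *parts)


print(data_path("finalized", "final_2015.pkl"))
print("PUF directory found:", PUF_DIR.exists())
```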
