Issue 1803: percent of zips (WIP) #1916

Open
wants to merge 96 commits into
base: main
Changes from 1 commit
Commits
218fa48
Create deploy_be_staging.yml (#1575)
emma-nechamkin Apr 18, 2022
f680d86
Imputing income using geographic neighbors (#1559)
emma-nechamkin Apr 27, 2022
3a96001
Adding HOLC indicator (#1579)
emma-nechamkin May 12, 2022
2e38aaa
Update backend for Puerto Rico (#1686)
switzersc-usds Jun 23, 2022
92d68ba
updating
emma-nechamkin Jul 13, 2022
f8a6567
Do not drop Guam and USVI from ETL (#1681)
switzersc-usds Jul 7, 2022
002cddf
Emma nechamkin/holc patch (#1742)
emma-nechamkin Jul 15, 2022
e98282d
updating ejscreen data, try two (#1747)
emma-nechamkin Jul 18, 2022
29419dd
Rescaling linguistic isolation (#1750)
emma-nechamkin Aug 2, 2022
daf188c
adds UST indicator (#1786)
emma-nechamkin Aug 3, 2022
bbb5bbc
Changing LHE in tiles to a boolean (#1767)
emma-nechamkin Aug 3, 2022
cac1e04
added indoor plumbing to chas
emma-nechamkin Aug 3, 2022
19d3bde
added indoor plumbing to score housing burden
emma-nechamkin Aug 3, 2022
3aa03f1
added indoor plumbing to score housing burden
emma-nechamkin Aug 3, 2022
ed9b717
first run through
emma-nechamkin Aug 3, 2022
9635ef5
Refactor DOE Energy Burden and COI to use YAML (#1796)
mattbowen-usds Aug 10, 2022
d55b7c0
Update etl_score_geo.py
emma-nechamkin Aug 11, 2022
485a9a8
Create deploy_be_staging.yml (#1575)
emma-nechamkin Apr 18, 2022
f047ca9
Imputing income using geographic neighbors (#1559)
emma-nechamkin Apr 27, 2022
1782d02
Adding HOLC indicator (#1579)
emma-nechamkin May 12, 2022
05748c9
Update backend for Puerto Rico (#1686)
switzersc-usds Jun 23, 2022
b41a287
updating
emma-nechamkin Jul 13, 2022
3071815
Do not drop Guam and USVI from ETL (#1681)
switzersc-usds Jul 7, 2022
7559cf4
Emma nechamkin/holc patch (#1742)
emma-nechamkin Jul 15, 2022
2ab24c6
updating ejscreen data, try two (#1747)
emma-nechamkin Jul 18, 2022
f6efdd4
Rescaling linguistic isolation (#1750)
emma-nechamkin Aug 2, 2022
b0a7284
adds UST indicator (#1786)
emma-nechamkin Aug 3, 2022
0d90ae5
Changing LHE in tiles to a boolean (#1767)
emma-nechamkin Aug 3, 2022
8c75190
added indoor plumbing to chas
emma-nechamkin Aug 3, 2022
15450cf
added indoor plumbing to score housing burden
emma-nechamkin Aug 3, 2022
4f6a1b5
added indoor plumbing to score housing burden
emma-nechamkin Aug 3, 2022
baa591a
first run through
emma-nechamkin Aug 3, 2022
97e1754
Refactor DOE Energy Burden and COI to use YAML (#1796)
mattbowen-usds Aug 10, 2022
94cdc47
Update etl_score_geo.py
emma-nechamkin Aug 11, 2022
dcda155
fixing rebase
emma-nechamkin Aug 11, 2022
481a2a0
updated to fix linting errors (#1818)
emma-nechamkin Aug 11, 2022
13e7908
Adding back MapComparison video
vim-usds Aug 16, 2022
d5fbb80
Add FUDS ETL (#1817)
mattbowen-usds Aug 16, 2022
d6c04b1
Disable markdown check for link
vim-usds Aug 16, 2022
9321798
Merge branch 'emma-nechamkin/release/score-narwhal' of https://github…
vim-usds Aug 16, 2022
ebac552
Adding DOT composite to travel score (#1820)
emma-nechamkin Aug 16, 2022
5e378ae
Adding first street foundation data (#1823)
emma-nechamkin Aug 17, 2022
981a36c
first run -- adding NCLD data to the ETL, but not yet to the score
emma-nechamkin Aug 17, 2022
49623e4
Add abandoned mine lands data (#1824)
mattbowen-usds Aug 17, 2022
2e05b1d
Merge branch 'emma-nechamkin/release/score-narwhal' of github.com:usd…
emma-nechamkin Aug 17, 2022
7d89d41
Adding NLCD data (#1826)
emma-nechamkin Aug 17, 2022
88dc2e5
updating to avoid conflicts
emma-nechamkin Aug 17, 2022
6e41e0d
Add donut hole calculation to score (#1828)
mattbowen-usds Aug 18, 2022
cb4866b
Adding eamlis and fuds data to legacy pollution in score (#1832)
emma-nechamkin Aug 18, 2022
3ba1c62
Update to use new FSF files (#1838)
emma-nechamkin Aug 18, 2022
1ee26bf
Quick fix to kitchen or plumbing indicator
emma-nechamkin Aug 18, 2022
d892bce
Fast flag update (#1844)
emma-nechamkin Aug 19, 2022
ad1ce2b
Tiles fix (#1845)
emma-nechamkin Aug 19, 2022
e6385c1
Update etl_score_geo.py
emma-nechamkin Aug 19, 2022
4bf7773
Issue 1827: Add demographics to tiles and download files (#1833)
lucasmbrown-usds Aug 22, 2022
6418335
Updates backend constants to N (#1854)
emma-nechamkin Aug 23, 2022
637b8c3
updated to show T/F/null vs T/F for AML and FUDS (#1866)
emma-nechamkin Aug 25, 2022
d3efcbd
fix markdown
esfoobar-usds Aug 25, 2022
e539db8
tuple type
esfoobar-usds Aug 26, 2022
1c4d3e4
Score tests (#1847)
emma-nechamkin Aug 26, 2022
b0b7ff0
just testing that the boolean is preserved on gha (#1867)
emma-nechamkin Aug 31, 2022
5201f9e
Adding tests to ensure proper calculations (#1871)
emma-nechamkin Aug 31, 2022
ccd72e2
tribal tiles fix (#1874)
esfoobar-usds Sep 1, 2022
9c0e199
Pipeline tile tests (#1864)
emma-nechamkin Sep 1, 2022
d41153d
Add tests to make sure each source makes it to the score correctly (#…
mattbowen-usds Sep 6, 2022
426328e
Updating traffic barriers to include low pop threshold (#1889)
emma-nechamkin Sep 7, 2022
fb4c484
Remove no land tracts from map (#1894)
emma-nechamkin Sep 8, 2022
6e9c44e
Issue 1831: missing life expectancy data from Maine and Wisconsin (#1…
lucasmbrown-usds Sep 10, 2022
60164c8
Removing low pop tracts from FEMA population loss (#1898)
emma-nechamkin Sep 12, 2022
4d02525
1831 Follow up (#1902)
lucasmbrown-usds Sep 15, 2022
876655d
Add tests for all non-census sources (#1899)
mattbowen-usds Sep 19, 2022
aca2261
Issue 1900: Tribal overlap with Census tracts (#1903)
lucasmbrown-usds Sep 20, 2022
f70f30d
Improve score test documentation based on Lucas's feedback (#1835) (#…
mattbowen-usds Sep 23, 2022
d8dd4cf
Cleanup source tests (#1912)
mattbowen-usds Sep 23, 2022
6e0ef33
Add tribal count notebook (#1917) (#1919)
mattbowen-usds Sep 23, 2022
9e85375
Add tribal overlap to downloads (#1907)
mattbowen-usds Sep 23, 2022
9fb9874
Issue 1910: Do not impute income for 0 population tracts (#1918)
lucasmbrown-usds Sep 26, 2022
15d946c
updating click
esfoobar-usds Sep 26, 2022
2f61900
updating click
esfoobar-usds Sep 26, 2022
48d961b
Bump just jupyterlab (#1930)
mattbowen-usds Sep 27, 2022
4da55a9
Fixing link checker (#1929)
lucasmbrown-usds Sep 27, 2022
0f0d6db
Update deps safety says are vulnerable (#1937) (#1938)
mattbowen-usds Sep 28, 2022
8e5ed5b
Add demos for island areas (#1932)
mattbowen-usds Sep 29, 2022
247db4a
Reorder download fields, add plumbing back (#1942)
mattbowen-usds Sep 29, 2022
f4adf17
refactoring tribal (#1960)
lucasmbrown-usds Sep 30, 2022
f284d75
renaming geocorr to geocorr_urban
lucasmbrown-usds Sep 21, 2022
d4d72c8
placeholder etl files
lucasmbrown-usds Sep 21, 2022
7ceab51
wip on ETL
lucasmbrown-usds Sep 21, 2022
a3ad7e0
fixing up validation
lucasmbrown-usds Sep 21, 2022
9f0918d
adding todos
lucasmbrown-usds Sep 22, 2022
ed364fb
updating to directly calculate overlay
lucasmbrown-usds Sep 28, 2022
a6ba9f6
fixing pylint error
lucasmbrown-usds Sep 28, 2022
a7a4df0
wip
lucasmbrown-usds Sep 28, 2022
f080464
renaming
lucasmbrown-usds Sep 28, 2022
bfb08e4
pynb
lucasmbrown-usds Sep 28, 2022
74bf497
updating with tract area
emma-nechamkin Oct 13, 2022
9 changes: 9 additions & 0 deletions data/data-pipeline/data_pipeline/content/config/csv.yml
@@ -257,3 +257,12 @@ fields:
   - score_name: Percent of population not currently enrolled in college or graduate school
     label: Percent of residents who are not currently enrolled in higher ed
     format: percentage
+  - score_name: Greater than or equal to the 90th percentile for leaky underground storage tanks and is low income?
+    label: Greater than or equal to the 90th percentile for leaky underground storage tanks and is low income?
+    format: bool
+  - score_name: Leaky underground storage tanks (percentile)
+    label: Leaky underground storage tanks (percentile)
+    format: percentage
+  - score_name: Leaky underground storage tanks
+    label: Leaky underground storage tanks
+    format: float
9 changes: 9 additions & 0 deletions data/data-pipeline/data_pipeline/content/config/excel.yml
@@ -153,12 +153,21 @@ sheets:
       - score_name: Greater than or equal to the 90th percentile for wastewater discharge, is low income, and has a low percent of higher ed students?
         label: Greater than or equal to the 90th percentile for wastewater discharge, is low income, and high percent of residents that are not higher ed students?
         format: bool
+      - score_name: Greater than or equal to the 90th percentile for leaky underground storage tanks and is low income?
+        label: Greater than or equal to the 90th percentile for leaky underground storage tanks and is low income?
+        format: bool
       - score_name: Wastewater discharge (percentile)
         label: Wastewater discharge (percentile)
         format: percentage
+      - score_name: Leaky underground storage tanks (percentile)
+        label: Leaky underground storage tanks (percentile)
+        format: percentage
       - score_name: Wastewater discharge
         label: Wastewater discharge
         format: float
+      - score_name: Leaky underground storage tanks
+        label: Leaky underground storage tanks
+        format: float
       - score_name: Greater than or equal to the 90th percentile for asthma, is low income, and has a low percent of higher ed students?
         label: Greater than or equal to the 90th percentile for asthma, is low income, and high percent of residents that are not higher ed students?
         format: bool
4 changes: 4 additions & 0 deletions data/data-pipeline/data_pipeline/etl/score/constants.py
@@ -195,6 +195,8 @@
     + field_names.PERCENTILE_FIELD_SUFFIX: "UF_PFS",
     field_names.WASTEWATER_FIELD
     + field_names.PERCENTILE_FIELD_SUFFIX: "WF_PFS",
+    field_names.UST_FIELD
+    + field_names.PERCENTILE_FIELD_SUFFIX: "UST_PFS",
     field_names.M_WATER: "M_WTR",
     field_names.M_WORKFORCE: "M_WKFC",
     field_names.M_CLIMATE: "M_CLT",
@@ -220,6 +222,7 @@
     field_names.SUPERFUND_LOW_INCOME_LOW_HIGHER_ED_FIELD: "SFLI",
     field_names.HAZARDOUS_WASTE_LOW_INCOME_LOW_HIGHER_ED_FIELD: "HWLI",
     field_names.WASTEWATER_DISCHARGE_LOW_INCOME_LOW_HIGHER_ED_FIELD: "WDLI",
+    field_names.UST_LOW_INCOME_FIELD: "USTLI",
     field_names.DIABETES_LOW_INCOME_LOW_HIGHER_ED_FIELD: "DLI",
     field_names.ASTHMA_LOW_INCOME_LOW_HIGHER_ED_FIELD: "ALI",
     field_names.HEART_DISEASE_LOW_INCOME_LOW_HIGHER_ED_FIELD: "HDLI",
@@ -242,6 +245,7 @@
     field_names.NPL_PCTILE_THRESHOLD: "NPL_ET",
     field_names.TSDF_PCTILE_THRESHOLD: "TSDF_ET",
     field_names.WASTEWATER_PCTILE_THRESHOLD: "WD_ET",
+    field_names.UST_PCTILE_THRESHOLD: "UST_ET",
     field_names.DIABETES_PCTILE_THRESHOLD: "DB_ET",
     field_names.ASTHMA_PCTILE_THRESHOLD: "A_ET",
     field_names.HEART_DISEASE_PCTILE_THRESHOLD: "HD_ET",
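Reviewer note: the entries in this file map long score column names to short tile codes. The following is a minimal sketch of how such a mapping might be applied to one row of score data. It assumes the percentile suffix is " (percentile)" (inferred from the YAML labels in this PR); `TILE_COLUMN_CODES` and `shorten_columns` are hypothetical names for illustration, not identifiers from the repository.

```python
# Assumed values, inferred from the labels in csv.yml and field_names.py.
PERCENTILE_FIELD_SUFFIX = " (percentile)"
WASTEWATER_FIELD = "Wastewater discharge"
UST_FIELD = "Leaky underground storage tanks"

# Hypothetical mapping: long score column name -> short tile column code.
TILE_COLUMN_CODES = {
    WASTEWATER_FIELD + PERCENTILE_FIELD_SUFFIX: "WF_PFS",
    UST_FIELD + PERCENTILE_FIELD_SUFFIX: "UST_PFS",
}


def shorten_columns(row: dict) -> dict:
    """Keep only the mapped columns and rename them to their short codes."""
    missing = [name for name in TILE_COLUMN_CODES if name not in row]
    if missing:
        # Fail loudly so that a renamed field (such as UST_FIELD) is noticed early.
        raise KeyError(f"Missing score columns: {missing}")
    return {code: row[name] for name, code in TILE_COLUMN_CODES.items()}


example_row = {
    "Wastewater discharge (percentile)": 0.42,
    "Leaky underground storage tanks (percentile)": 0.93,
    "Unmapped column": 1,
}
short_row = shorten_columns(example_row)
```

Because the keys are built from `UST_FIELD`, renaming that constant (as this PR does) changes the expected long column name everywhere the mapping is used.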
1 change: 1 addition & 0 deletions data/data-pipeline/data_pipeline/etl/score/etl_score.py
@@ -413,6 +413,7 @@ def _prepare_initial_df(self) -> pd.DataFrame:
             field_names.NPL_FIELD,
             field_names.WASTEWATER_FIELD,
             field_names.LEAD_PAINT_FIELD,
+            field_names.UST_FIELD,
             field_names.UNDER_5_FIELD,
             field_names.OVER_64_FIELD,
             field_names.LINGUISTIC_ISO_FIELD,
6 changes: 5 additions & 1 deletion data/data-pipeline/data_pipeline/score/field_names.py
@@ -170,7 +170,7 @@
 NPL_FIELD = "Proximity to NPL sites"
 AIR_TOXICS_CANCER_RISK_FIELD = "Air toxics cancer risk"
 RESPIRATORY_HAZARD_FIELD = "Respiratory hazard index"
-UST_FIELD = "Underground storage tanks"
+UST_FIELD = "Leaky underground storage tanks"

 LOW_INCOME_THRESHOLD = "Exceeds FPL200 threshold"

@@ -430,6 +430,8 @@

 # Critical Clean Water and Waste Infrastructure
 WASTEWATER_DISCHARGE_LOW_INCOME_FIELD = f"Greater than or equal to the {PERCENTILE}th percentile for wastewater discharge and is low income?"
+UST_LOW_INCOME_FIELD = f"Greater than or equal to the {PERCENTILE}th percentile for leaky underground storage tanks and is low income?"
+

 # Health Burdens
 DIABETES_LOW_INCOME_FIELD = f"Greater than or equal to the {PERCENTILE}th percentile for diabetes and is low income?"
@@ -629,6 +631,8 @@
 NPL_PCTILE_THRESHOLD = f"Greater than or equal to the {PERCENTILE}th percentile for NPL (superfund sites) proximity"
 TSDF_PCTILE_THRESHOLD = f"Greater than or equal to the {PERCENTILE}th percentile for proximity to hazardous waste sites"
 WASTEWATER_PCTILE_THRESHOLD = f"Greater than or equal to the {PERCENTILE}th percentile for wastewater discharge"
+UST_PCTILE_THRESHOLD = f"Greater than or equal to the {PERCENTILE}th percentile for leaky underground storage tanks"
+
 DIABETES_PCTILE_THRESHOLD = (
     f"Greater than or equal to the {PERCENTILE}th percentile for diabetes"
 )
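Reviewer note: the UST threshold and low-income constants in this file repeat the same indicator phrase in two hand-written f-strings. The following is a minimal sketch of one way to keep the wording in sync by deriving one string from the other. It assumes `PERCENTILE` is 90 (as the YAML labels in this PR suggest); `UST_PHRASE` is a hypothetical helper constant that does not exist in the PR.

```python
PERCENTILE = 90  # assumed value, matching the "90th percentile" YAML labels

# Hypothetical helper: the indicator phrase is written exactly once.
UST_PHRASE = "leaky underground storage tanks"

UST_PCTILE_THRESHOLD = (
    f"Greater than or equal to the {PERCENTILE}th percentile for {UST_PHRASE}"
)
# The low-income field name extends the threshold name instead of retyping it.
UST_LOW_INCOME_FIELD = f"{UST_PCTILE_THRESHOLD} and is low income?"
```

With this layout, the low-income field name still matches the `score_name` entries added to `csv.yml` and `excel.yml`, and the two constants cannot disagree about the indicator wording.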
33 changes: 26 additions & 7 deletions data/data-pipeline/data_pipeline/score/score_narwhal.py
@@ -442,23 +442,42 @@ def _water_factor(self) -> bool:
             ]
             >= self.ENVIRONMENTAL_BURDEN_THRESHOLD
         )
-
-        # Straight copy here in case we add additional water fields.
-        self.df[field_names.WATER_THRESHOLD_EXCEEDED] = self.df[
-            field_names.WASTEWATER_PCTILE_THRESHOLD
-        ].copy()
+        self.df[field_names.UST_PCTILE_THRESHOLD] = (
+            self.df[field_names.UST_FIELD + field_names.PERCENTILE_FIELD_SUFFIX]
+            >= self.ENVIRONMENTAL_BURDEN_THRESHOLD
+        )

         self.df[field_names.WASTEWATER_DISCHARGE_LOW_INCOME_FIELD] = (
             self.df[field_names.WASTEWATER_PCTILE_THRESHOLD]
             & self.df[field_names.FPL_200_SERIES_IMPUTED_AND_ADJUSTED]
         )

+        self.df[field_names.UST_LOW_INCOME_FIELD] = (
+            self.df[field_names.UST_PCTILE_THRESHOLD]
+            & self.df[field_names.FPL_200_SERIES_IMPUTED_AND_ADJUSTED]
+        )
+
+        self.df[field_names.WATER_THRESHOLD_EXCEEDED] = self.df[
+            [
+                field_names.WASTEWATER_PCTILE_THRESHOLD,
+                field_names.UST_PCTILE_THRESHOLD,
+            ]
+        ].max(axis=1)
+
         self._increment_total_eligibility_exceeded(
-            [field_names.WASTEWATER_DISCHARGE_LOW_INCOME_FIELD],
+            [
+                field_names.WASTEWATER_DISCHARGE_LOW_INCOME_FIELD,
+                field_names.UST_LOW_INCOME_FIELD,
+            ],
             skip_fips=constants.DROP_FIPS_FROM_NON_WTD_THRESHOLDS,
         )

-        return self.df[field_names.WASTEWATER_DISCHARGE_LOW_INCOME_FIELD]
+        return self.df[
+            [
+                field_names.WASTEWATER_DISCHARGE_LOW_INCOME_FIELD,
+                field_names.UST_LOW_INCOME_FIELD,
+            ]
+        ].any(axis=1)

     def _health_factor(self) -> bool:
         # In Xth percentile or above for diabetes (Source: CDC Places)
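Reviewer note: the `_water_factor` change above flags a tract when either water indicator is at or above the burden threshold and the tract is low income. The following is a minimal standalone sketch of that logic on toy data. The column names, the 0.90 threshold, and the `water_factor` function are assumptions for illustration; they stand in for the `field_names` constants and `ENVIRONMENTAL_BURDEN_THRESHOLD` and are not the pipeline's actual values.

```python
import pandas as pd

# Hypothetical stand-ins for the field_names constants used in the diff.
WASTEWATER_PCTILE = "Wastewater discharge (percentile)"
UST_PCTILE = "Leaky underground storage tanks (percentile)"
LOW_INCOME = "Is low income?"
THRESHOLD = 0.90  # assumed value of ENVIRONMENTAL_BURDEN_THRESHOLD


def water_factor(df: pd.DataFrame) -> pd.Series:
    """Return True for tracts flagged by either water indicator plus low income."""
    wastewater_exceeded = df[WASTEWATER_PCTILE] >= THRESHOLD
    ust_exceeded = df[UST_PCTILE] >= THRESHOLD

    wastewater_low_income = wastewater_exceeded & df[LOW_INCOME]
    ust_low_income = ust_exceeded & df[LOW_INCOME]

    # Row-wise OR across the two boolean columns, like .any(axis=1) in the diff.
    combined = pd.concat([wastewater_low_income, ust_low_income], axis=1)
    return combined.any(axis=1)


tracts = pd.DataFrame(
    {
        WASTEWATER_PCTILE: [0.95, 0.50, 0.20, 0.91],
        UST_PCTILE: [0.10, 0.92, 0.30, None],  # None becomes NaN (missing data)
        LOW_INCOME: [True, True, True, False],
    }
)
flags = water_factor(tracts)
```

In this sketch a missing percentile compares as `False` against the threshold, so a tract with no UST data can only be flagged through wastewater discharge. Whether the real pipeline should treat missing values the same way is a question for the PR authors.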