feat: use bhepop2 package for income assignment (#243)
* feat: use bhepop2 for eqasim income assignment
* todo: manage zip
* corrections from rebase (Filosofi is now read from .zip file)
* restore default config file
* add missing configuration of income_com_path
* add missing request of income_com_xlsx config
* remove columns filters and rename from _read_filosofi_excel
* cleanup
* convert income to integer
* remove pandas warning
* improve error catches and code layout
* include population size in plot title
* remove MODALITIES constant
  describe attribute modalities when giving attribute selection
* wip use bhepop2
* merged municipality_attributes.py in municipality.py
  municipality stage now returns a DataFrame containing income deciles per attributes in addition to the usual global deciles.
  Two columns "attribute" and "modality" have been added to specify the related attribute and the value it takes (modality).
  Attribute and modality for global deciles are "all".
  Filter on "all" attribute and modality have been added where data.income.municipality were used.
* move income_uniform_sample and MAXIMUM_INCOME_FACTOR to a utils module
* change MAXIMUM_INCOME_FACTOR back to original value
* renamed "eqasim method" into "uniform method"
* move compare_methods.py to analysis/methods/income/
* renamed bhepop2_income.py module into bhepop2.py
* refactor: improve bhepop2 integration
  merge municipality_attributes.py into municipality.py
  move compare_methods.py to analysis/methods/income/
  created a utils.py module in synthesis/population/income/ to store common functions
  added test dataset and tests
---------
Co-authored-by: leo-desbureaux-tellae <[email protected]>
* fix: conflicts
* try to make test_determinism work (#6)
* add income_assignation_method config documentation
* remove blank lines
* remove debug print
* remove float casting
* update docs
* renamed "modality" column in "value"
---------
Co-authored-by: Valentin LE BESCOND <[email protected]>
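The "all" filter mentioned in the commit message can be illustrated with a minimal sketch. The column names `attribute` and `value` follow the commit message (the `modality` column was renamed to `value` in the last item); the sample rows, the commune identifiers, the modality labels and the helper name `global_deciles` are invented for illustration and are not taken from the repository.

```python
import pandas as pd

# Hypothetical sample of the municipality stage output: global deciles are
# tagged with attribute == "all" and value == "all"; the other rows hold
# deciles for a specific attribute value. Only the median (q5) is shown.
df_income = pd.DataFrame({
    "commune_id": ["00001", "00001", "00001", "00002"],
    "attribute": ["all", "size", "size", "all"],
    "value": ["all", "1_pers", "2_pers", "all"],
    "q5": [21000, 18000, 23000, 19500],
})


def global_deciles(df):
    """Keep only the global rows (hypothetical helper, not from the repo)."""
    mask = (df["attribute"] == "all") & (df["value"] == "all")
    return df[mask].reset_index(drop=True)


df_global = global_deciles(df_income)
print(df_global)
```

Stages that only need one distribution per commune would apply a filter of this kind before using the table, so that the per-attribute rows do not duplicate communes.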
1 parent 90e9416, commit f74bd98
Showing 19 changed files with 497 additions and 48 deletions.
@@ -6,5 +6,6 @@ output
data/nantes_2015
data/lyon_2015
.vscode
.idea

config_local_*.yml
config_local_*.yml
@@ -0,0 +1,103 @@
from synthesis.population.income.utils import MAXIMUM_INCOME_FACTOR
from bhepop2.tools import add_household_size_attribute, add_household_type_attribute
from bhepop2.sources.marginal_distributions import QuantitativeMarginalDistributions
import pandas as pd
import os

"""
Compare income assignation methods available in Eqasim.
Comparison is realised on the synthetic population of the most populated commune.
"""

COMPARE_INCOME_FOLDER = "compare_income_methods"


def configure(context):
    context.config("output_path")
    context.stage("data.income.municipality")
    context.stage("synthesis.population.income.uniform", alias="uniform")
    context.stage("synthesis.population.income.bhepop2", alias="bhepop2")
    context.stage("synthesis.population.sampled")


def execute(context):

    # get complete population (needed to add attributes)
    df_population = context.stage("synthesis.population.sampled")
    df_population = add_household_size_attribute(df_population)
    df_population = add_household_type_attribute(df_population)

    # get most populated commune
    commune_id = df_population.groupby(["commune_id"], observed=True)["commune_id"].count().drop("undefined").idxmax()

    # get income distributions by attributes
    income_df = context.stage("data.income.municipality").query(f"commune_id == '{commune_id}'")
    income_df = income_df.rename(
        columns={
            "value": "modality",
            "q1": "D1",
            "q2": "D2",
            "q3": "D3",
            "q4": "D4",
            "q5": "D5",
            "q6": "D6",
            "q7": "D7",
            "q8": "D8",
            "q9": "D9",
        }
    )

    households_with_attributes = df_population[[
        "household_id", "commune_id", "size", "family_comp"
    ]].drop_duplicates("household_id")

    # get enriched population with different methods
    uniform_pop_df = context.stage("uniform")
    uniform_pop_df = uniform_pop_df.merge(households_with_attributes, on="household_id")
    uniform_pop_df["household_income"] = (
        uniform_pop_df["household_income"] * 12 / uniform_pop_df["consumption_units"]
    )
    uniform_pop_df = uniform_pop_df.query(f"commune_id == '{commune_id}'")

    bhepop2_pop_df = context.stage("bhepop2")
    bhepop2_pop_df = bhepop2_pop_df.merge(households_with_attributes, on="household_id")
    bhepop2_pop_df["household_income"] = (
        bhepop2_pop_df["household_income"] * 12 / bhepop2_pop_df["consumption_units"]
    )
    bhepop2_pop_df = bhepop2_pop_df.query(f"commune_id == '{commune_id}'")

    # prepare populations analysis

    # create a source from the Filosofi distributions
    marginal_distributions_source = QuantitativeMarginalDistributions(
        income_df,
        "Filosofi",
        ["size", "family_comp"],
        0,
        relative_maximum=MAXIMUM_INCOME_FACTOR,
        delta_min=1000
    )

    # check output folder existence
    compare_output_path = os.path.join(context.config("output_path"), COMPARE_INCOME_FOLDER)
    if not os.path.exists(compare_output_path):
        os.mkdir(compare_output_path)

    # create an analysis instance
    analysis = marginal_distributions_source.compare_with_populations(
        {
            "Uniform": uniform_pop_df,
            "Bhepop2": bhepop2_pop_df
        },
        feature_name="household_income",
        output_folder=compare_output_path
    )
    analysis.plot_title_format = analysis.plot_title_format + f" \n(commune={commune_id})"

    analysis.generate_analysis_plots()
    analysis.generate_analysis_error_table()

    print(f"Generated compared analysis of income assignation methods in {compare_output_path}")
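Two data-preparation steps in the stage above can be tried in isolation with plain pandas: selecting the most populated commune, and rescaling `household_income` by `12 / consumption_units`. The sketch below uses invented tables and does not call synpp or bhepop2. Reading the rescaling as "monthly household income converted to annual income per consumption unit" is an assumption inferred from the formula, and `errors="ignore"` and the `income_per_cu` column are additions for this sketch, not part of the commit.

```python
import pandas as pd

# Invented persons table: one row per person, as in a sampled population.
df_population = pd.DataFrame({
    "household_id": [1, 1, 2, 3, 3, 3, 4],
    "commune_id": ["A", "A", "B", "A", "A", "A", "undefined"],
})

# Count persons per commune, discard the "undefined" placeholder, and keep
# the label with the highest count. errors="ignore" (an addition here)
# avoids a KeyError when no "undefined" label is present.
counts = df_population.groupby("commune_id")["commune_id"].count()
commune_id = counts.drop("undefined", errors="ignore").idxmax()

# Invented household table; household_income is assumed to be monthly.
df_households = pd.DataFrame({
    "household_id": [1, 2, 3, 4],
    "household_income": [3000.0, 1500.0, 4200.0, 2000.0],
    "consumption_units": [1.5, 1.0, 2.1, 1.0],
})

# Same arithmetic as in the stage: income * 12 / consumption units.
df_households["income_per_cu"] = (
    df_households["household_income"] * 12 / df_households["consumption_units"]
)

print(commune_id)
print(df_households)
```

The stage overwrites `household_income` in place; a separate column is used here only to keep the input values visible next to the result.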
@@ -29,3 +29,4 @@ dependencies:

  - pip:
    - synpp==1.5.1
    - bhepop2==2.0.0