-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pairwise compare for optimization #11
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM @jaceybronte, good job!
I added some suggestions to consider. Additionally, if you wanted to perform a sanity check, you could perform these steps:
- Calculate the number of comparisons you expect and make sure it equals the number of samples in the output dataframe. (this is implemented as a test in Pairwise Compare with the NF1 dataset)
- Choose
n
pairs of samples from the input dataframes and compute the pearsons correlation of these pairs using an existing tool, such as a pandas correlation function, then compare these results to the output dataframe for the corresponding pairs. (this is not implemented as a test in Pairwise Compare)
|
||
results = [] | ||
|
||
for plate in plate_names: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also combine all of the plates into the same dataframe and specify the plate column name in _same_columns
(e.g. Metadata_Plate
) instead of using a for loop. This works too though, and may be a better solution if memory is a concern
_same_columns=["Metadata_cell_line", "Metadata_seeding_density", "Metadata_time_point"], | ||
_different_columns=["Metadata_Well"], | ||
_feat_cols=feat_cols, | ||
_drop_cols=["Metadata_Concentration", "Metadata_Well"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you specify a column _drop_cols
that is not specified in _same_columns
or _different_columns
then the PairwiseCompareManager will ignore it. Could consider removing Metadata_Concentration
here to improve clarity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
comparer = PairwiseCompareManager( | ||
_df=plate_df.copy(), | ||
_comparator=pearsons_comparator, | ||
_same_columns=["Metadata_cell_line", "Metadata_seeding_density", "Metadata_time_point"], | ||
_different_columns=["Metadata_Well"], | ||
_feat_cols=feat_cols, | ||
_drop_cols=["Metadata_Concentration", "Metadata_Well"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider describing what you are aiming to compare here for people less familiar with the tool. You could also consider providing a link to the tool for those who are curious. It also may be worth commenting here, at the top of the notebook, or in both places describing the purpose of performing these comparisons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
worst_results = ( | ||
combined_df.loc[ | ||
combined_df.groupby("Metadata_cell_line__antehoc_group0")["pearsons_correlation"].idxmin(), | ||
[ | ||
"Metadata_cell_line__antehoc_group0", | ||
"Metadata_seeding_density__antehoc_group0", | ||
"Metadata_time_point__antehoc_group0", | ||
"pearsons_correlation" | ||
] | ||
] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the column name here could make sense, although, it may be worth changing in the PairwiseCompareManager. I kept it the same, because the other tool (PairwiseCompare.py) has no concept of same or different columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
This PR implements the pairwise compare tool to optimize conditions for ALSF