Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pairwise compare for optimization #11

Merged
merged 3 commits into from
Feb 18, 2025

Conversation

jaceybronte
Copy link
Member

This PR implements the pairwise compare tool to optimize conditions for ALSF

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Member

@MattsonCam MattsonCam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @jaceybronte, good job!

I added some suggestions to consider. Additionally, if you wanted to perform a sanity check, you could perform these steps:

  1. Calculate the number of comparisons you expect and make sure it equals the number of samples in the output dataframe. (this is implemented as a test in Pairwise Compare with the NF1 dataset)
  2. Choose n pairs of samples from the input dataframes and compute the pearsons correlation of these pairs using an existing tool, such as a pandas correlation function, then compare these results to the output dataframe for the corresponding pairs. (this is not implemented as a test in Pairwise Compare)


results = []

for plate in plate_names:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also combine all of the plates into the same dataframe and specify the plate column name in _same_columns (e.g. Metadata_Plate) instead of using a for loop. This works too though, and may be a better solution if memory is a concern

_same_columns=["Metadata_cell_line", "Metadata_seeding_density", "Metadata_time_point"],
_different_columns=["Metadata_Well"],
_feat_cols=feat_cols,
_drop_cols=["Metadata_Concentration", "Metadata_Well"],
Copy link
Member

@MattsonCam MattsonCam Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you specify a column _drop_cols that is not specified in _same_columns or _different_columns then the PairwiseCompareManager will ignore it. Could consider removing Metadata_Concentration here to improve clarity

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Comment on lines 51 to 57
comparer = PairwiseCompareManager(
_df=plate_df.copy(),
_comparator=pearsons_comparator,
_same_columns=["Metadata_cell_line", "Metadata_seeding_density", "Metadata_time_point"],
_different_columns=["Metadata_Well"],
_feat_cols=feat_cols,
_drop_cols=["Metadata_Concentration", "Metadata_Well"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider describing what you are aiming to compare here for people less familiar with the tool. You could also consider providing a link to the tool for those who are curious. It also may be worth commenting here, at the top of the notebook, or in both places describing the purpose of performing these comparisons.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Comment on lines 91 to 101
worst_results = (
combined_df.loc[
combined_df.groupby("Metadata_cell_line__antehoc_group0")["pearsons_correlation"].idxmin(),
[
"Metadata_cell_line__antehoc_group0",
"Metadata_seeding_density__antehoc_group0",
"Metadata_time_point__antehoc_group0",
"pearsons_correlation"
]
]
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the column name here could make sense, although, it may be worth changing in the PairwiseCompareManager. I kept it the same, because the other tool (PairwiseCompare.py) has no concept of same or different columns

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@jaceybronte jaceybronte merged commit 1116d15 into WayScience:main Feb 18, 2025
@jaceybronte jaceybronte deleted the pairwise-compare branch February 18, 2025 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants