Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add stability calculation #105

Merged
merged 8 commits into from
Feb 6, 2025
Merged

Add stability calculation #105

merged 8 commits into from
Feb 6, 2025

Conversation

TuomasBorman
Copy link
Collaborator

Related to this: #24 and #63

This PR implements draft for stability analysis. I tried to follow this publication in the implementation:

Lahti, L., Salojärvi, J., Salonen, A. et al. Tipping elements in the human intestinal ecosystem. Nat Commun 5, 4344 (2014). https://doi.org/10.1038/ncomms5344


This is from the current documentation of the implementation in this PR:

The stability is calculated by first defining the reference point, $(R_{f})$,
for each feature. This is defined by taking the mean of the range of values:

$$ R_{f} = \frac{x_{f, \min} + x_{f, \max}}{2} $$

where $(f)$ denotes a single feature and $(x)$ is the abundance.

Then, the difference between consecutive time points, $(\Delta_{f, x})$,

$$ \Delta_{f, x} = |x_{f, t} - x_{f, t-1}| $$

and the difference between the previous time point and the reference, $(\Delta_{f, R})$, are calculated:

$$ \Delta_{f, R} = |x_{f, t-1} - R_{f}| $$

where $(t)$ denotes the time point.

Optionally, the time difference $(\Delta_{f, t})$ is calculated:

$$ \Delta_{f, t} = t_{f, t} - t_{f, t-1} $$

The stability coefficient $(s_{f})$ is calculated with the correlation:

$$ s_{f} = \text{corr}(\Delta_{f, x}, \Delta_{f, R}) $$

Alternatively, the stability coefficient can be calculated using a linear model as follows:

$$ s_{f} = \frac{\Delta_{f, x} - \beta_{0} - \beta_{2} \cdot \Delta_{f, t} - \epsilon_{f}}{\Delta_{f, R}} $$


microbiome::intermediate_stability.R seems to calculate global reference point, i.e, the same reference point is used for all features:

https://github.com/microbiome/microbiome/blob/497576a6380bbe2334c09086ae9fcc374da59fc7/R/intermediate_stability.R#L161

Is this meaningful as the abundances can differ for each feature? Should the reference be calculated for each feature individually (as the function in this PR is doing)?

The function in this PR does not yet allow user to specify reference point; should user be able to do that (as in microbiome::intermediate_stability) or can/should we standardize this?

Moreover, microbiome::intermediate_stability seems to calculate the stability separately for values that are under and over the reference value: https://github.com/microbiome/microbiome/blob/497576a6380bbe2334c09086ae9fcc374da59fc7/R/intermediate_stability.R#L196

What is the justification of this, should we implement this also?

@antagomir

@antagomir
Copy link
Member

This is a rough heuristic method for stability calculations. It is fast & intuitive to calculate but it may not provide the most elegant measure and it makes many simplifying assumptions. It can be useful for exploration, and it can provide a baseline for more advanced measures. It would be good to mention these reservations in the documentation.

Reference should be calculated per feature if there is some idea how that should be done. By default, there is no information how reference point should be defined. In such case, median of the entire data matrix is as good as any other value. A good feature-specific reference is the median value. That seemed to approximate the unstable region in our experiments with HITChip Atlas in 2014 and before.

User should be able to define the reference point per feature. This can also help to scan the stability landscape for rough differences regarding this.

Left and right stability: the alternative basins of attraction might have different strengths. It is potentially interesting to see whether there is a difference between the directionality of stability. It can be ignored if there is a need to simplify but it can be interesting on its own right in some cases. Or one could calculate both left, right, and combined.

@TuomasBorman
Copy link
Collaborator Author

Now one can also provide own reference values (single value or value for each feature). However, there is a problem:

  • If the dataset includes multiple systems (patient or other sample grouping), user cannot currently give own reference samples for each system-feature pair.

It could be added but requires more thinking. Perhaps the easiest solution is to allow these reference samples through rowData. The biggest problem here is probably the added complexity from user-perspective.

TODO:

  • Left/right stability

@TuomasBorman
Copy link
Collaborator Author

Now one can also provide own reference values (single value or value for each feature). However, there is a problem:

* If the dataset includes multiple systems (patient or other sample grouping), user cannot currently give own reference samples for each system-feature pair.

It could be added but requires more thinking. Perhaps the easiest solution is to allow these reference samples through rowData. The biggest problem here is probably the added complexity from user-perspective.

TODO:

* Left/right stability

Alright, now it should work so that user can also specify reference samples for each system separately, i.e.:

  • system 1 & taxa 1 : reference value 1
  • system 2 & taxa 1 : reference value 2
  • system 1 & taxa 2 : reference value 3
  • system 2 & taxa 2 : reference value 4

... although, not sure if there is any use-case for this,. But good thing is that it did not add much complexity to code, just couple of lines

@TuomasBorman
Copy link
Collaborator Author

  • Left/right stability now works

TODO:

  • Update documentation
  • Add unit tests
  • Polish the code

@TuomasBorman
Copy link
Collaborator Author

This PR is now ready.

@TuomasBorman TuomasBorman merged commit 158f5b2 into devel Feb 6, 2025
3 checks passed
@TuomasBorman TuomasBorman deleted the stability branch February 6, 2025 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants