
Correct permutation t-tests #684

Draft · wants to merge 13 commits into base: develop
Conversation

@gcattan (Collaborator) commented Feb 7, 2025

I think the out array should be initialized with zeros.
If not, out can take nperms + 1 as a value, and the corrective statement (out[out == nperms] = nperms - 1) is applied to the wrong entries.
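A minimal sketch of the issue, with made-up numbers and reduced to a single comparison (not the actual moabb code):

```python
import numpy as np

nperms = 10
true_stat = 0.0                 # observed statistic for one pipeline pair
randperm = np.zeros(nperms)     # worst case: every permutation ties the observed value

out = 1.0                       # initialized to one, as in the current code
for r in randperm:
    out += (r >= true_stat)     # incremented once per permutation

# out is now nperms + 1 = 11, so out / nperms > 1 and the
# out[out == nperms] = nperms - 1 guard never triggers
print(out, out / nperms)
```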

@bruAristimunha what do you think?

@bruAristimunha (Collaborator)

hey @gcattan,

I am not a super expert on t-tests, so I am pinging @sylvchev here.

@gcattan (Collaborator, Author) commented Feb 7, 2025 via email

@@ -121,15 +121,15 @@ def _pairedttest_random(data, nperms):
     pvals: ndarray of shape (n_pipelines, n_pipelines)
         array of pvalues
     """
-    out = np.ones((data.shape[1], data.shape[1]))
+    out = np.zeros((data.shape[1], data.shape[1]))

gcattan (Collaborator, Author):
In the exhaustive case, this is not necessary.
But in the random case, the initial statistic is not necessarily among those found within the permutation distribution.

Comment on lines 133 to 134
out[out == nperms] = nperms - 1
return out / nperms

gcattan (Collaborator, Author):

Should it be out / (nperms + 1) instead?
That way the correction applies to all p-values, not only to the extreme ones.
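For reference, a common convention (e.g. Phipson & Smyth, 2010) is to count the observed statistic as one extra permutation in both the numerator and the denominator; a minimal sketch, not the current moabb code:

```python
import numpy as np

def perm_pvalue(true_stat, perm_stats):
    """(#{permutations >= observed} + 1) / (nperms + 1).

    Adding 1 to both counts treats the observed statistic as one of the
    permutations, so the p-value can never be 0 and never exceeds 1.
    """
    perm_stats = np.asarray(perm_stats)
    return (np.sum(perm_stats >= true_stat) + 1) / (perm_stats.size + 1)

rng = np.random.default_rng(0)
print(perm_pvalue(2.0, rng.normal(size=999)))   # small p-value, but strictly > 0
print(perm_pvalue(-5.0, rng.normal(size=999)))  # 1.0: p = 1 is still possible here
```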

toncho11 and others added 4 commits February 10, 2025 16:08
Switch back to 1 for the random t-test to have at least 1 in the result.
And changed check for p=1 values for _pairedttest_exhaustive to be the same as for _pairedttest_random.
@toncho11 (Contributor)

This last version was discussed with @Marco-Congedo.

@qbarthelemy (Contributor) left a comment:

Unit tests could be extended to check that p-values are never equal to 0.

return out
out += randperm >= true

out[out >= nperms] = nperms - 1

Contributor:

You could add another check to avoid a p-value equal to 0:

Suggested change:
-    out[out >= nperms] = nperms - 1
+    out[out >= nperms] = nperms - 1
+    out[out == 0] = 1

gcattan (Collaborator, Author):

I agree this may deserve a comment in the code. If out is initialized to 1, or if it is initialized to 0 and we use randperm >= true, then this check should not be necessary.

out[out == nperms] = nperms - 1
out += randperm >= true

out[out >= nperms] = nperms - 1

Contributor:

Suggested change:
-    out[out >= nperms] = nperms - 1
+    out[out >= nperms] = nperms - 1
+    out[out == 0] = 1

gcattan (Collaborator, Author):

Here also, I will add a comment.

@@ -89,7 +89,7 @@ def _pairedttest_exhaustive(data):
     pvals: ndarray of shape (n_pipelines, n_pipelines)
         array of pvalues
     """
-    out = np.ones((data.shape[1], data.shape[1]))
+    out = np.zeros((data.shape[1], data.shape[1]), dtype=np.int32)

Contributor:

Initialization of out should be similar between _pairedttest_exhaustive and _pairedttest_random:

Suggested change:
-    out = np.zeros((data.shape[1], data.shape[1]), dtype=np.int32)
+    out = np.ones(data.shape[1:], dtype=np.int32)

@gcattan (Collaborator, Author) commented Feb 11, 2025:

Hm, ok. But then we would take the original statistic into account twice, since we have randperm >= true.

moabb/analysis/meta_analysis.py (outdated review thread, resolved)
@toncho11 (Contributor) commented Feb 11, 2025

The main problem was that p-values equal to or greater than 1 were being produced.

Co-authored-by: Quentin Barthélemy <[email protected]>
Signed-off-by: gcattan <[email protected]>
@PierreGtch (Collaborator)
Hi @gcattan, thank you for investing time into this issue!

As we can see here, custom re-implementations in our codebase are prone to errors. Do you think it would be possible to reimplement these using functions from common libraries, such as scipy.stats.permutation_test, which have been validated by a larger community?

If yes, I think it could be done in two steps:

  1. Test if the current implementation or your fix is consistent with the SciPy version.
  2. Replace the custom implementation with the SciPy version.

These two steps could possibly be done in two separate PRs.
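If that route is explored, a rough sketch of what a SciPy-based paired comparison could look like (made-up scores; the statistic and the alternative would have to be chosen to match the current behaviour):

```python
import numpy as np
from scipy.stats import permutation_test

rng = np.random.default_rng(0)
# toy paired scores: 12 subjects, two pipelines
scores_a = rng.uniform(0.6, 0.9, size=12)
scores_b = scores_a + rng.normal(0.02, 0.05, size=12)

def statistic(x, y):
    # mean paired difference; any paired statistic would do
    return np.mean(x - y)

res = permutation_test(
    (scores_a, scores_b),
    statistic,
    permutation_type="samples",  # paired design: permute within subjects
    n_resamples=10_000,
    alternative="less",          # H1: pipeline A scores lower than pipeline B
    random_state=0,
)
print(res.statistic, res.pvalue)
```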

@Marco-Congedo
Hi,

The problem is not with the implementation, but with the fact that a permutation test can give a p-value in [1/nperm, 1], and any p-value equal to 1 will screw up the Liptak (Stouffer) combination function, since z(1) = Inf. The proposed solution is to allow p to be in [1/nperm, 1 - 1/nperm] only. This is not very elegant, but I don't see any other solution if the Liptak combination function is to be used. In any case, using any other implementation, one would still need to constrain the p-values to [1/nperm, 1 - 1/nperm].
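A small numerical illustration of the problem (made-up p-values and weights, not MOABB's actual combination code):

```python
import numpy as np
from scipy.stats import norm

pvals = np.array([0.02, 0.20, 1.00])    # one pipeline loses for every subject -> p = 1
weights = np.array([10.0, 20.0, 15.0])  # e.g. number of subjects per dataset

z = norm.ppf(1 - pvals)                 # Liptak/Stouffer transform: z(1) = -inf
print(z)                                # approx [ 2.054  0.842  -inf]
print(weights @ z / np.linalg.norm(weights))  # -inf: the single p = 1 dominates everything

# Restricting p to [1/nperm, 1 - 1/nperm] keeps every z finite
nperm = 1000
z_clipped = norm.ppf(1 - np.clip(pvals, 1 / nperm, 1 - 1 / nperm))
print(z_clipped)                        # all finite
```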

@gcattan (Collaborator, Author) commented Feb 12, 2025

@PierreGtch Then the question is rather whether you want to change the current implementation of combine_pvalues:

[screenshot of the current combine_pvalues implementation]

What I can do as part of this PR is to validate the latest modification against the datasets we used and push some unit tests, as suggested by @qbarthelemy.

@bruAristimunha (Collaborator) commented Feb 12, 2025 via email

@bruAristimunha (Collaborator) commented Feb 12, 2025 via email

@gcattan (Collaborator, Author) commented Feb 12, 2025

Sure, but let's wait for testing before merging; this is not covered by the current test suite. I will try to tackle this during the week.

@PierreGtch (Collaborator)
> Hi,
>
> The problem is not with the implementation, but with the fact that a permutation test can give a p-value in [1/nperm, 1], and any p-value equal to 1 will screw up the Liptak (Stouffer) combination function, since z(1) = Inf. The proposed solution is to allow p to be in [1/nperm, 1 - 1/nperm] only. This is not very elegant, but I don't see any other solution if the Liptak combination function is to be used. In any case, using any other implementation, one would still need to constrain the p-values to [1/nperm, 1 - 1/nperm].

I have never used this code from moabb, so your guess is better than mine :)
My remark was more general: it is a better habit to use code that is already validated than to rewrite custom implementations.

@gcattan (Collaborator, Author) commented Feb 12, 2025 via email:

The implementation is custom because you need to correct your p-values for the Stouffer method of combining p-values. If you really want to move away from a custom implementation to a more standardized one, you need to use another method of combining p-values (in MOABB). Marco can probably advise on what is best to do in this situation. The trade-off is that previously reported analyses may have slightly different results.

@Marco-Congedo

Yeah, and who knows how the problem has been handled before, as it is not uncommon that one pipeline gives worse results than another for all subjects, which by definition results in a p-value of 1 when running a paired t-test.

The Fisher combination function does not have this limitation (and probably that is why Fisher did not propose the Liptak combination function), since it is based on the sum of the logarithms of the p-values; for a permutation test those are in [1/nperm, 1], so the sum is always defined. However, with the Fisher combination function weights cannot be used anymore, and if I understand well, MOABB currently weights the combinations to account for the different numbers of subjects in the different databases. That is why, all in all, if you wish to stick to the current methodology for comparing pipelines across databases in MOABB, I don't see for the moment a better solution than artificially restricting the p-values to [1/nperm, 1 - 1/nperm].
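To make the difference concrete, a quick comparison with scipy.stats.combine_pvalues (made-up p-values and weights; MOABB's own combination code may differ):

```python
from scipy.stats import combine_pvalues

pvals = [0.01, 0.03, 1.0]   # one comparison where the pipeline loses for every subject

# Fisher: -2 * sum(log(p)); log(1) = 0, so the p = 1 entry is merely uninformative
print(combine_pvalues(pvals, method="fisher"))

# Stouffer: z(1) = -inf, so a single p = 1 drags the combined statistic to -inf
# and the combined p-value to 1, no matter how small the other p-values are
print(combine_pvalues(pvals, method="stouffer", weights=[10, 20, 15]))
```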

@toncho11 (Contributor)

One important thing for the future test is that the error was non-deterministic: for the same code and the same results data, it would only sometimes produce p-values equal to or greater than 1, probably because of the random permutation test, so it was hard to detect. So if you write a test, maybe it should be run 20 times in order to be sure that the handling of the p-values is correct. There were already guards for the p-values before, but they were not working; or at least that was the idea, to warn you that these p-values are strange.
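A sketch of such a test, along the lines of what @qbarthelemy suggested above; the import path and the shape of data are assumptions based on the diff in this PR, not verified:

```python
import numpy as np
import pytest

# Assumed import path, based on the file touched in this PR
from moabb.analysis.meta_analysis import _pairedttest_random


@pytest.mark.parametrize("seed", range(20))
def test_random_pairedttest_pvalues_stay_in_open_unit_interval(seed):
    """The random test is stochastic, so repeat it to catch rare p == 0 or p >= 1."""
    rng = np.random.default_rng(seed)
    # assumed shape: (n_subjects, n_pipelines, n_pipelines)
    data = rng.normal(size=(12, 3, 3))
    pvals = _pairedttest_random(data, nperms=100)

    off_diag = ~np.eye(pvals.shape[0], dtype=bool)
    assert np.all(pvals[off_diag] > 0)
    assert np.all(pvals[off_diag] < 1)
```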

Labels: none yet
Projects: none yet
6 participants