CI failing randomly with NaN arrays in state_dict_io test #368

Open
jdeschamps opened this issue Jan 22, 2025 · 3 comments
Labels
bug Something isn't working

Comments

jdeschamps commented Jan 22, 2025

Describe the bug

A problem that had disappeared is now present again in #365: some tests randomly fail with NaN arrays.

 ______________________________ test_state_dict_io ______________________________

tmp_path = PosixPath('/tmp/pytest-of-runner/pytest-0/test_state_dict_io0')
ordered_array = <function ordered_array.<locals>._ordered_array at 0x7f5f18672020>
pre_trained = PosixPath('/tmp/pytest-of-runner/pytest-0/test_state_dict_io0/checkpoints/last.ckpt')

    def test_state_dict_io(tmp_path, ordered_array, pre_trained):
        """Test exporting and loading a state dict."""
        # training data
        train_array = ordered_array((32, 32))
        path = tmp_path / "model.pth"
    
        # instantiate CAREamist
        careamist = CAREamist(source=pre_trained, work_dir=tmp_path)
    
        # predict (no tiling and no tta)
        predicted_output = careamist.predict(train_array, tta_transforms=False)
        predicted = np.concatenate(predicted_output, axis=0)
    
        # save model
        _export_state_dict(careamist.model, path)
        assert path.exists()
    
        # load model
        _load_state_dict(careamist.model, path)
    
        # predict (no tiling and no tta)
        predicted_loaded = careamist.predict(train_array, tta_transforms=False)

>       assert (predicted_loaded == predicted).all()
E       assert False
E        +  where False = <built-in method all of numpy.ndarray object at 0x7f5f186ca6d0>()
E        +    where <built-in method all of numpy.ndarray object at 0x7f5f186ca6d0> = [array([[[[na...type=float32)] == array([[[[nan...dtype=float32)
E             
E             Full diff:
E             + [
E             - array([[[[nan, nan, nan, ..., nan, nan, nan],
E             +     array([[[[nan, nan, nan, ..., nan, nan, nan],
E             ? ++++
E                        [nan, nan, nan, ..., nan, nan, nan],
E                        [nan, nan, nan, ..., nan, nan, nan],
E                        ...,
E                        [nan, nan, nan, ..., nan, nan, nan],
E                        [nan, nan, nan, ..., nan, nan, nan],
E             -          [nan, nan, nan, ..., nan, nan, nan]]]], dtype=float32)
E             +          [nan, nan, nan, ..., nan, nan, nan]]]], dtype=float32),
E             ?                                                                +
E             + ].all

tests/model_io/test_bmz_io.py:32: AssertionError

If we re-run the tests, then they often pass again. It is very inconsistent.
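
In the traceback above, both sides of the comparison are entirely NaN, so the prediction was already NaN before the state dict was exported and reloaded. A minimal sketch of a guard that would surface this directly is shown below. `assert_finite` is a hypothetical helper, not part of the project, and the arrays are stand-ins for the real predictions.

```python
import numpy as np


def assert_finite(array, name="array"):
    """Hypothetical helper: fail with a readable message on NaN or inf values."""
    array = np.asarray(array)
    n_bad = int((~np.isfinite(array)).sum())
    assert n_bad == 0, f"{name} contains {n_bad} non-finite values out of {array.size}"


# Stand-in for a healthy prediction: passes silently.
good = np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4)
assert_finite(good, "predicted")

# Stand-in for a prediction containing a NaN: fails with an explicit message.
bad = good.copy()
bad[0, 0, 0, 0] = np.nan
try:
    assert_finite(bad, "predicted")
    message = None
except AssertionError as err:
    message = str(err)

print(message)
```

Calling such a check on `predicted` right after the first `careamist.predict(...)` would separate "the model produced NaN" from "the save/load round trip changed the output".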

Does anybody remember whether we ever figured it out? Did this happen locally as well? Or is this PR changing something that triggers it more often?

@CatEek @melisande-c @veegalinova

jdeschamps added the bug label Jan 22, 2025
jdeschamps changed the title from "CI failing with NaN arrays randomly." to "CI failing randomly with NaN arrays in state_dict_io test." Jan 22, 2025
@melisande-c

Did we find (only empirically) that np.testing.assert_array_equal was more stable?
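
One possible reason for that empirical observation, offered as a sketch rather than a confirmed diagnosis: element-wise `==` follows IEEE 754, where NaN never equals NaN, while `np.testing.assert_array_equal` treats NaNs in matching positions as equal. The arrays below are stand-ins for the two predictions in the failing run.

```python
import numpy as np

# Two identical arrays that are entirely NaN, like the ones in the traceback.
a = np.full((1, 1, 4, 4), np.nan, dtype=np.float32)
b = a.copy()

# Element-wise == is False wherever a NaN appears, so .all() is False
# even though the two arrays hold the same values.
elementwise_equal = bool((a == b).all())

# assert_array_equal accepts NaNs located at the same positions,
# so this call does not raise.
np.testing.assert_array_equal(a, b)
testing_equal = True

# np.array_equal exposes the same behavior through an explicit flag.
flagged_equal = bool(np.array_equal(a, b, equal_nan=True))

print(elementwise_equal, testing_equal, flagged_equal)
```

Note that switching the assertion this way would make the test pass on all-NaN output, which hides the NaN predictions instead of explaining them.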

@melisande-c

But also I can't see any of the tests failing for that reason?

jdeschamps commented Jan 22, 2025

> But also I can't see any of the tests failing for that reason?

Yeah, because I manually re-run them so that they pass... I will edit the main post with this info.

jdeschamps changed the title from "CI failing randomly with NaN arrays in state_dict_io test." to "CI failing randomly with NaN arrays in state_dict_io test" Jan 23, 2025