
[Core] predict_insample with multivariate models gives NaN values on last input_size #1269

Closed
carusyte opened this issue Feb 19, 2025 · 6 comments

@carusyte
What happened + What you expected to happen

It's unclear what the expected result of predict_insample should look like. In previous versions, it returned predictions for all of the training data provided during fit. However, using the latest commit from the main branch, the predictions for the last input_size samples are NaN instead.

Versions / Dependencies

Main branch with commit 939056c

Reproduction script

import pandas as pd
import numpy as np
import logging
import torch
from neuralforecast.core import NeuralForecast
from neuralforecast.models import TSMixerx

# Prep dummy data
start_date = "2024-06-01"
end_date = "2025-02-19"
date_range = pd.date_range(start=start_date, end=end_date, freq="B")
np.random.seed(0)
df = pd.DataFrame(
    {
        "unique_id": "dummy",
        "ds": date_range,
        "y": np.random.randn(len(date_range)),
        "val1": np.random.randn(len(date_range)),
        "val2": np.random.rand(len(date_range)),
    }
)

logging.getLogger("pytorch_lightning").setLevel(logging.ERROR)
torch.set_float32_matmul_precision("medium")

horizon = 10
input_size = 30
val_size = 50
models = [
    TSMixerx(
        h=horizon,
        input_size=input_size,
        n_series=1,
        max_steps=100,
        random_seed=0,
        hist_exog_list=["val1", "val2"],
    ),
]

nf = NeuralForecast(
    models=models,
    freq="B",
    local_scaler_type="robust",
)
nf.fit(df=df, val_size=val_size)

Y_hat_insample = nf.predict_insample(step_size=horizon)
# Y_hat_insample = nf.predict_insample()

print(Y_hat_insample)
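To illustrate the symptom without rerunning the model, a check like the following counts the NaN rows at the tail of the in-sample predictions. Note this is a sketch only: `Y_hat_insample` here is a synthetic stand-in frame mimicking the reported output, not the real result of nf.predict_insample().

```python
import numpy as np
import pandas as pd

# Stand-in for the real predict_insample output: 100 rows whose last
# 30 (= input_size) predictions are NaN, mimicking the reported bug.
input_size = 30
np.random.seed(0)
Y_hat_insample = pd.DataFrame(
    {"TSMixerx": np.concatenate([np.random.randn(70), np.full(input_size, np.nan)])}
)

# Count trailing NaNs: reverse the column, then count rows before the
# first non-NaN value appears.
trailing_nan = Y_hat_insample["TSMixerx"].iloc[::-1].notna().cumsum().eq(0).sum()
print(trailing_nan)  # 30 -> matches input_size, the reported pattern
```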

Issue Severity

High: It blocks me from completing my task.

@carusyte carusyte added the bug label Feb 19, 2025
@marcopeix
Contributor

The issue happens only with multivariate models. Univariate models work fine. I need to investigate further.

@marcopeix marcopeix changed the title [Core] predict_insample gives NaN values on trailing sample data [Core] predict_insample with multivariate models gives NaN values on last input_size Feb 19, 2025
@marcopeix
Contributor

marcopeix commented Feb 19, 2025

Note that this behaviour is also observed in previous versions for multivariate models. Our tests did not include multivariate models, so we missed it.

@carusyte
Author

Note that this behaviour is also observed in previous versions for multivariate models. Our tests did not include multivariate models, so we missed it.

You're right, sorry for my incorrect description. In fact, predict_insample previously raised an error, as recorded in issue #1056, and I nearly forgot that I had applied a stopgap from this #1056 (comment) just to continue my work. I'm not sure whether that is the correct fix, anyway.

@carusyte
Author

You may notice that the windows tensor starts to lose elements here:

if step == "predict":
    predict_step_size = self.predict_step_size
    cutoff = -self.input_size - self.test_size
    temporal = batch["temporal"][:, :, cutoff:]

Whereas the same function in _base_windows.py applies left padding to the temporal tensor:

if step == "predict":
    initial_input = temporal.shape[-1] - self.test_size
    if (
        initial_input <= self.input_size
    ):  # There is not enough data to predict first timestamp
        padder_left = nn.ConstantPad1d(
            padding=(self.input_size - initial_input, 0), value=0.0
        )
        temporal = padder_left(temporal)
    predict_step_size = self.predict_step_size
    cutoff = -self.input_size - self.test_size
    temporal = temporal[:, :, cutoff:]
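The effect of that left padding can be sketched in NumPy, so the logic is runnable without torch. This is illustrative only: the variable names (`temporal`, `input_size`, `test_size`) follow the snippet above, and np.pad with zeros stands in for nn.ConstantPad1d. Without the padding step, the cutoff slice would reach further back than the available history, and the earliest windows would be short, which is where the missing predictions come from.

```python
import numpy as np

# Sketch of the _base_windows.py padding logic using NumPy.
# Left-pad the time axis with zeros when there is not enough history
# to build a full input_size window for the first timestamp.
input_size, test_size = 30, 10
temporal = np.random.randn(1, 3, 35)  # (batch, channels, time): only 25 steps before test

initial_input = temporal.shape[-1] - test_size  # 25 < input_size
if initial_input <= input_size:
    pad = input_size - initial_input  # 5 zero steps on the left
    temporal = np.pad(temporal, ((0, 0), (0, 0), (pad, 0)), constant_values=0.0)

cutoff = -input_size - test_size
temporal = temporal[:, :, cutoff:]
print(temporal.shape)  # (1, 3, 40): enough for input_size + test_size windows
```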

@marcopeix
Contributor

Yes, I saw that too! It seems to fix the issue. I'm testing whether it impacts multivariate models' performance at all before pushing a fix. Thanks for pointing it out!

@marcopeix marcopeix linked a pull request Feb 21, 2025 that will close this issue
@marcopeix marcopeix self-assigned this Feb 21, 2025
@marcopeix
Contributor

This is now fixed by the merge of #1023
