Large DataFrame in WASM causes infinite loop #3599
Comments
Thanks for the example. It does render the first time, but fails even on the first re-run for me. I'll continue to investigate, but tagging the pyodide maintainers if they have ideas (@hoodmane, @ryanking13, @agriyakhetarpal) |
Are you loading Pyodide from jsdelivr? Can you use the debug build so we can get symbols in the traceback? |
It is of course the prerogative of V8 isolates to say no to allocations of any size. I'm not personally familiar with the chromium code that determines how much memory a webpage is allowed to allocate but I think it's a bit complicated. |
@hoodmane, yes, from jsDelivr. Is there a debug build hosted on jsDelivr? Is that with version |
Yes, for instance: |
@hoodmane, I ran this with the debug build locally and did not get any additional logging or info.
If it helps, I can reproduce this in the Pyodide REPL as well. First paste this code in:
import datetime
import tzdata
import polars as pl

def gen_test_df():
    timestamps = pl.datetime_range(
        start=datetime.datetime(2024, 1, 1, tzinfo=datetime.UTC),
        end=datetime.datetime(2025, 1, 1, tzinfo=datetime.UTC),
        interval="60m",
        eager=True,
    )
    return pl.DataFrame(
        data={
            "timestamp": timestamps,
            "value": pl.Series([i % 2 for i in range(len(timestamps))]),
        },
    )

df = gen_test_df()
print(df)

def resample(df: pl.DataFrame, every: str) -> pl.DataFrame:
    return (
        df.group_by_dynamic(
            index_column="timestamp",
            every=every,
        )
        .agg(
            pl.col("value").last(),
        )
        .upsample(time_column="timestamp", every=every)
        .fill_null(strategy="forward")
    )

res_df = resample(df, every="1s")
print(res_df)
Then |
Right, this is a problem that bites us occasionally: we only use debug symbols for the Python interpreter, not for packages. The traceback you have is in polars frames, so we would need a debug build of polars. But the Emscripten build of polars is built against a fork of LLVM, so I'm not even really sure how to make one myself. Annoying. But presumably we should be able to set a breakpoint in the |
Never mind. I could reproduce it by calling |
I have tested some more and am able to reproduce the problem with a small DataFrame of only ~8k rows, or an estimated 0.13 MB in size, where it consistently fails on the 8th run instead of the 2nd (including the initial run). To reproduce the issue with a small DataFrame, use the original included code, but comment out
def resample(df: pl.DataFrame, every: str) -> pl.DataFrame:
    return (
        df.group_by_dynamic(
            index_column="timestamp",
            every=every,
        )
        .agg(
            pl.col("value").last(),
        )
        # .upsample(time_column="timestamp", every=every)
        # .fill_null(strategy="forward")
    )
Running |
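The "fails on the Nth run" observations in this thread can be checked with a small loop instead of re-running the cell by hand. The following is only a sketch: `first_failing_run` is a hypothetical helper, not part of marimo, Pyodide, or polars, and it assumes the failure surfaces as a Python exception. A hang or a fatal runtime error would not be caught, which is why the run number is printed before each call.

```python
def first_failing_run(func, max_runs=10):
    """Call func repeatedly and report the first run that raises.

    Hypothetical helper for illustration. Returns (run_number, exception)
    for the first failing run, or None if all runs succeed. Run numbers
    start at 1, so the initial run counts as run 1.
    """
    for run in range(1, max_runs + 1):
        # Printed before the call so a hang still shows which run started.
        print(f"run {run}", flush=True)
        try:
            func()
        except Exception as exc:
            return run, exc
    return None
```

With the `resample` function and `df` from the comments above, it could be called as `first_failing_run(lambda: resample(df, every="1s"))`.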
Describe the bug
I encountered some unexpected behavior while attempting to upsample a DataFrame by a large factor. I can reliably reproduce the behavior in WebAssembly notebooks, but never in non-WebAssembly notebooks. I have verified that this occurs both in Community Cloud notebooks and in notebooks generated using the "Create WebAssembly link" functionality.
Problem Description
When upsampling a polars DataFrame from ~8,000 rows to ~32,000,000 rows in a WASM notebook, the cell will usually run fine the first time, but if I rerun the cell a few times it will suddenly get caught in an infinite loop and never complete.
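As a rough sanity check on those row counts, the arithmetic can be sketched with the standard library alone. This assumes an inclusive range from 2024-01-01 to 2025-01-01 (UTC) and, as a labeled assumption, 16 bytes per row (an 8-byte timestamp plus an 8-byte integer value); actual memory use inside polars during the upsample may well be higher.

```python
import datetime

# Assumption: 8-byte timestamp + 8-byte integer value per row.
BYTES_PER_ROW = 16

def expected_rows(start, end, every):
    """Number of rows in an inclusive range from start to end at a fixed interval."""
    return (end - start) // every + 1

start = datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc)
end = datetime.datetime(2025, 1, 1, tzinfo=datetime.timezone.utc)

intervals = [
    ("60m", datetime.timedelta(minutes=60)),
    ("15m", datetime.timedelta(minutes=15)),
    ("1s", datetime.timedelta(seconds=1)),
]

for label, every in intervals:
    rows = expected_rows(start, end, every)
    mib = rows * BYTES_PER_ROW / 2**20
    print(f"every={label}: {rows:,} rows, ~{mib:.2f} MiB")
```

Under these assumptions the hourly frame has 8,785 rows and the `every="1s"` result has 31,622,401 rows, in line with the ~8,000 and ~32,000,000 figures above, while `every="15m"` yields only 35,137 rows.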
Reproducibility
I've included reproducible code below, but here is also a permalink to a notebook that reproduces the problem.
https://marimo.app/l/yl88gn
Try re-running the last cell a few times to trigger the bug. Note that reducing the target sample rate, e.g. using every="15m", reduces and at some point eliminates the problem.
Environment
Code to reproduce