We should end a running job only if the program crashes on the executor or the user explicitly calls `destroy_job`.
When scheduling fails
On a call to a writing function or to `collect`, the recorded lazy computation is scheduled and executed. If scheduling fails, we currently destroy the job. If you're using Banyan Julia from a notebook, this is undesirable: you have to restart the job (which can take 1-2 minutes) just because a single cell failed. Instead, a failed call to a writing function or to `collect` should leave global state as it was before the call, rolling back any changes made along the way, and the job should keep running.
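To make the proposed behavior concrete, here is a minimal sketch of snapshot-and-rollback around a scheduling call. It is written in Python purely for illustration and is not Banyan's implementation: `Session`, `SchedulingError`, `record`, and the `schedule` callback are hypothetical names, and the real global state would contain more than a list of pending tasks.

```python
import copy


class SchedulingError(Exception):
    """Hypothetical error raised when recorded computation cannot be scheduled."""


class Session:
    """Hypothetical stand-in for client-side global state (not Banyan's API)."""

    def __init__(self):
        self.job_running = True
        self.pending = []  # recorded lazy computation awaiting scheduling

    def record(self, task):
        self.pending.append(task)

    def collect(self, schedule):
        # Snapshot the state before scheduling so it can be restored on failure.
        snapshot = copy.deepcopy(self.pending)
        try:
            result = schedule(self.pending)
        except SchedulingError:
            # Roll back: restore the pre-call state and keep the job alive.
            self.pending = snapshot
            raise
        # Commit only after scheduling and execution succeeded.
        self.pending.clear()
        return result
```

With this shape, a failing notebook cell raises an error but leaves the session exactly as it was, so the next cell can retry without restarting the job.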
When an exception occurs on the cluster
If the job crashes in the backend, we have little choice but to destroy the job. But if the code merely raises an exception, we should ideally propagate it back to the client side and roll back in the same way as for a scheduling failure.
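The distinction between a crash and an ordinary exception could be sketched as follows. This is an illustrative Python sketch under assumed names: the message format (`kind`, `value`, `detail`), the `job` dictionary, and the `RemoteException` and `JobCrashed` classes are all hypothetical and do not describe Banyan's actual protocol.

```python
class RemoteException(Exception):
    """Hypothetical: an exception raised by user code on the cluster."""


class JobCrashed(Exception):
    """Hypothetical: the executor process itself died."""


def handle_result(message, job):
    """Interpret a (hypothetical) result message from the cluster.

    Only a crash marks the job as destroyed; an ordinary exception is
    re-raised on the client while the job stays alive.
    """
    kind = message["kind"]
    if kind == "ok":
        return message["value"]
    if kind == "exception":
        # Propagate to the client; the caller rolls back its own state.
        raise RemoteException(message["detail"])
    if kind == "crash":
        # The executor is gone, so the job cannot be reused.
        job["alive"] = False
        raise JobCrashed(message["detail"])
    raise ValueError(f"unknown message kind: {kind!r}")
```

The caller would catch `RemoteException` with the same rollback path used for scheduling failures, and treat only `JobCrashed` as a reason to start a new job.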