Currently the workflow engine itself cannot be scaled horizontally. What prevents this is that workflow invocations are not assigned to a specific workflow engine instance. If you scale the engine right now, every engine instance will use the event queue to fetch all active invocations and try to continue executing all of them, which leads to duplicate function executions.
To fix this, we need the following:
- Assign an "owner" to each invocation. This can be done by adding an owner explicitly to the workflow invocation model, or by adding workflow engine namespaces to the event queue implementation.
- A way for a workflow engine to recognize orphaned invocations. It might happen that during downscaling a workflow engine instance is killed while it is still managing invocations. Other instances should be able to recognize this and hand off the orphaned invocations to another workflow engine instance.
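The two requirements above could be combined into a heartbeat-based ownership scheme. The following is only a minimal sketch to make the idea concrete, not the engine's actual design: `InvocationStore`, `LEASE_TTL`, and all method names are hypothetical, and the in-memory dictionaries stand in for whatever shared store or event queue the engine would really use.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical value: how long an engine may go without a heartbeat
# before its invocations are considered orphaned.
LEASE_TTL = 30.0  # seconds


@dataclass
class InvocationStore:
    """Hypothetical in-memory stand-in for the shared invocation store."""

    # invocation id -> id of the engine instance that owns it
    owners: Dict[str, str] = field(default_factory=dict)
    # engine id -> timestamp of its last heartbeat
    heartbeats: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, engine_id: str, now: float) -> None:
        """Record that an engine instance is still running."""
        self.heartbeats[engine_id] = now

    def is_alive(self, engine_id: str, now: float) -> bool:
        """An engine is alive if it sent a heartbeat within LEASE_TTL."""
        last = self.heartbeats.get(engine_id)
        return last is not None and now - last <= LEASE_TTL

    def claim(self, invocation_id: str, engine_id: str, now: float) -> bool:
        """Take ownership unless another live engine already owns the invocation."""
        owner = self.owners.get(invocation_id)
        if owner is not None and owner != engine_id and self.is_alive(owner, now):
            return False
        self.owners[invocation_id] = engine_id
        return True

    def orphans(self, now: float) -> List[str]:
        """Invocations whose owner has stopped sending heartbeats."""
        return sorted(
            inv for inv, owner in self.owners.items()
            if not self.is_alive(owner, now)
        )
```

With this sketch, an engine only continues invocations it has successfully claimed, and any surviving instance can periodically call `orphans` and `claim` to take over work from an instance that was killed during downscaling. In a real multi-process deployment the check-and-set inside `claim` would have to be atomic in the shared store (for example a compare-and-swap), otherwise two engines could still claim the same invocation.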
cc @thenamly
Reasons for horizontally scaling the workflow engine:
- A single workflow engine pod sometimes uses up to 8 GB of memory, which causes OOM kills on smaller nodes and leaves the engine unavailable until the pod recovers.
- With multiple pods, the workflow engine can handle more concurrent connections.
- A faulty pod does not take down all workflow engine pods.