Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API for Workflows #1887

Closed
taylordowns2000 opened this issue Mar 13, 2024 · 15 comments · Fixed by #2792
Closed

API for Workflows #1887

taylordowns2000 opened this issue Mar 13, 2024 · 15 comments · Fixed by #2792
Assignees
Labels

Comments

@taylordowns2000
Copy link
Member

taylordowns2000 commented Mar 13, 2024

WCS has a use case where they want to create and update workflows(with steps) from a step. This was available in V1, and it blocks migration to v2 for this edge case.

Image

A typical use case can be to build a new workflow every time a new form is created in CommCare.

We need to expose an API that developers can call from OpenFn or an external application to:

  • Get a list of workflows (incl. triggers, jobs, paths),
  • Create workflows, and
  • Update workflows (in their entirety… the whole thing at once.)

All endpoints should be authenticated with the user’s personal access token (Whatever is best, technically.)

We also need to add documentation on how to use the Workflow JSON API.

No changes to the UI are required here.

Two important things to note:

  1. In v1, we used the openfn adaptor to do this. While we may update this adaptor to use a personal access token for v2, that's probably out of scope for now.
  2. Our GovStack compliant JSON api for jobs (lib/lightning_web/controllers/api/job_controller.ex) and runs (lib/lightning_web/controllers/api/run_controller.ex) is now outdated. Please consider how the new workflows REST api may (??) replace the outdated jobs_controller and how it does or does not relate to the upcoming workorders/runs API for GovStack. Update RunController and JSON api for GovStack compliance #1656
@taylordowns2000 taylordowns2000 moved this to New Issues in v2 Mar 13, 2024
@christad92 christad92 moved this from New Issues to Backlog in v2 Mar 17, 2024
@taylordowns2000 taylordowns2000 changed the title Expose JSON API so that OpenFn workflows can be used to build OpenFn workflows API for Workflows Nov 22, 2024
@theroinaochieng
Copy link
Collaborator

@josephjclark added you to collaborate with @jyeshe

@jyeshe jyeshe moved this from Ready to In progress in v2 Dec 16, 2024
@jyeshe
Copy link
Member

jyeshe commented Dec 16, 2024

@taylordowns2000 for the update on a first draft/version, if there is any Presence on the workflow I am going to reject the update from the API okay? It's the simplest and safest validation for now and would give us time to think about prioritization rules involving the API.

@josephjclark
Copy link
Contributor

josephjclark commented Dec 16, 2024

The Workflow structure used by the API - the JSON body to GET and POST - needs to be consistent with other representations within Lightning.

To that end, I recommend we use the same { triggers, jobs, edges, options } format used by the workflow diagram and the Runs that are compiled for the worker (obviously some of the run-specific properties, like dataclip_id, would be excluded).

I think that means the workflows schema in workflow.ex is the correct structure. Minus runs and snapshots.

There's also the question of options: we recognise a bunch of options on the run which I think are saved to the workflow?

type LightningPlanOptions = {
  run_timeout_ms?: number;
  sanitize?: SanitizePolicies;
  start?: StepId;
  output_dataclips?: boolean;

  run_memory_limit_mb?: number;
  payload_limit_mb?: number;
};

I suppose we could exclude options for now and just use default values

Quick note that the CLI and Runtime use a different JSON workflow representation internally. That structure is a bit easier to write out by hand, meaning that WCS for now will have to write out a slightly more verbose JSON object to create their new workflows. This can be improved later when we update the openfn adaptor (and if we add CLI support (although this is what deploy does so I don't know if we need more) we can simplify the structure there)

@taylordowns2000
Copy link
Member Author

taylordowns2000 commented Dec 16, 2024 via email

@josephjclark
Copy link
Contributor

@taylordowns2000 Yes, worth bearing in mind.

For today we're doing a simple bulk upsert, where you POST the whole workflow and it'll all be saved.

The workflow editor will want to PUT or PATCH a partial diff, probably compliant with JSON patch to maintain parity with the current editor. I suggest we do this as a second pass in a couple of weeks.

@josephjclark
Copy link
Contributor

Roger and I have agreed on this:

GET workflows/:projectId  // get workflows for project
GET workflows/:projectId/:workflowId // get one workflow

POST workflows/:projectId // create a new workflow
PUT workflows/:projectId/:workflowId // update a workflow

This explicitly surfaces the data model - in Roger's words, "the project id is king".

We do not want an API which works like "we'll return all workflows that you're allowed to see with your credential), because it's arbitrary and PAT tokens aren't bound to a project or anything. So instead, we require that a project id must be included in the request (even if it's technically redundant)

POST and PUT must include a whole workflow object. The project id in the body must match the id in the path.

We'll probably add a PATCH later which accepts a diff in the body (probably JSON Patch).

You can easily imagine a GET/POST/PUT projects/:projectId API being added later.

@jyeshe
Copy link
Member

jyeshe commented Dec 17, 2024

Personally I would go with the proposed REST api having the GET /workflows a required project_id query parameter (it would be possible to specify that on an Open API spec). I believe a REST-like one would make it easier the adoption of the API by external applications.

On the other hand, this way having the :project_id in all paths makes it clear that the API usage is always bound to a project. So I agree it's an important aspect for the dev to be aware considering the PAT has a user scope not a project one. Another justification was that it wouldn't make much sense on a Workflow Job to receive workflows from another project. With the REST it would as well but I believe making it explicitly required will help some group of devs and we won't need a bot to reply "Please refer to the docs" (((:

Joe (@josephjclark) here is a draft with the GET 1, GET many and POST for creation in case it helps anyway on the client side: #2792
Image

@jyeshe
Copy link
Member

jyeshe commented Dec 18, 2024

@josephjclark I was thinking about the external applications API usage and to enable their systems to be fully automated, without manual input for the projects in this case, currently they would need a GET /workflows regardless of the project. May we go with the REST version making the project_id query param optional? This way the API could serve flexibility the Runtime and other apps whose constraints we might be not very sure of yet.

Sorry about it but another detail that passed through my eyes I am noticing now is that considering a desired replacement for the existing jobs API it behaves exactly like this. There is an optional project_id on the index (GET all) function.

@josephjclark
Copy link
Contributor

josephjclark commented Dec 18, 2024

Refinement of this after a call with Taylor. The project id should be at the base of the path always. That fixes everything and ensures the project id is still king.

The workflow API will be:

GET    api/projects/:projectid/workflows
GET    api/projects/:projectid/workflows/:workflowId
POST   api/projects/:projectid/workflows (return workflow id)
PUT    api/projects/:projectid/workflows/:workflowId
PATCH  api/projects/:projectid/workflows/:workflowId (JSON patch)

We could maybe implement "get projects" down at the root

GET  api/projects?billingaccount=
GET  api/projects/:projectid

Later we might add other APIs along these lines:

GET  api/projects/:projectid/runs
POST api/projects/:projectid/runs // start a run?
GET  api/projects/:projectid/runs/:runId
PUT  api/projects/:projectid/steps/:runId ?
GET  api/projects/:projectid/workorders
GET  api/projects/:projectid/workorders/workorderId

@jyeshe jyeshe moved this from In progress to In review in v2 Dec 19, 2024
@josephjclark
Copy link
Contributor

One question coming out of the PR is: what to do about IDs for new steps and edges in the body?

The strict solution is for Lightning to demand that all ids in the body are UUIDs, putting the emphasis on the client to generate them.

I am uncomfortable with this, simply because for many clients it's very hard to do. I also think it's a bit of an artificial problem and pretty easy for the server to solve.

In our WCS use-case for example, we don't provide a good mechanism to job code right now to generate a UUID. Like it's literally impossible. We'll need to extend the adaptor for API support. And even then, creating workflows with UUIDs is kinda hard because you need to not just generate an id for each step, but you need to cross-reference it elsewhere in the workflow.

More generally, the way the CLI works is that it's not very fussy about defining ids. It's quite happy for me to run a workflow with an id like step-2, which is the sort of id you want to use when creating workflows by hand. If you're used to writing workflows by hand and running them with the CLI, and you want to now POST those workflows to Lightning, you'll get a bit of a shock.

I appreciate that this is a technical and low-level API, but it is still a human facing one.

If I wanted to support a user-friendly API to allow third-parties to create Workflows, I wouldn't be strict with them about ID formats. I'd ask that IDs be internally consistent but arbitrary, and I would swap them out before loading them into Lightning's database.

So I'd really like to ask for a rule that says: when creating new steps or edges, the id you submit, and any references to it, will be replaced by a UUID by the server. Existing steps or edges must use the server's UUID (the mapping is not persisted). An updated workflow will be returned to you (an id map would be more efficient in the case of large workflows, maybe we should just do that).

Maybe I should speak to someone in more detail tomorrow about WCS and make sure I'm super sure of the requirements for that specific workflow.

@theroinaochieng theroinaochieng moved this from In review to In progress in v2 Jan 7, 2025
@josephjclark
Copy link
Contributor

I've gotten hold of the WCS workflow and I have a few observations:

  • All WCS needs is to GET a whole workflow and PUT it back. We have zero need for any kind of incremental/diffing API right now, and I suggest we cut cloth and raise an issue to do it later (via PATCH and a probably JSONPatch)
  • In V1, the way this worked was you'd POST a job or trigger without an ID, and you'd be returned an id by the server. The WCS workflow expects this behaviour (although that'll all have to be re-written anyway).
  • The WCS workflow for v1 creates multiple triggers within the same workflow. Is that going to be disallowed by validation rules in this API? In v2 I would actually expect multiple workflows to be created to support multiple triggers, so maybe that migrated project just needs to work differently

I still think the server should handle id replacement. But we could also solve this in the openfn adaptor, providing a clean interface to create jobs and edges with valid ids. That solves my problem of having a user-friendly solution. That is quite a lot of work in the adaptor though and I'm not sure we have the resources to do it.

@josephjclark
Copy link
Contributor

TBH I really don't think server-side id replacement is a big deal and we just should push on and do it. It could be implemented for about the same effort that I've spent arguing for it 🤷

@theroinaochieng theroinaochieng moved this from In progress to Blocked in v2 Jan 10, 2025
@theroinaochieng
Copy link
Collaborator

This issue is moved to blocked pending discussion with @stuartc on Monday to discuss:

  1. the way forward with ID replacement (server-side or adaptor implementation)
  2. a decision on using JSON patch on the Workflows API and the React frontend

@josephjclark josephjclark moved this from Blocked to In progress in v2 Jan 14, 2025
@jyeshe jyeshe moved this from In progress to In review in v2 Jan 15, 2025
@github-project-automation github-project-automation bot moved this from In review to Done in v2 Jan 15, 2025
@theroinaochieng
Copy link
Collaborator

Tracking documentation effort by @josephjclark for the Workflows API:

Input on docs requirements from the implementation team, we're keeping documentation minimal. Further iteration on the documentation can be scheduled on need basis

@jyeshe
Copy link
Member

jyeshe commented Jan 17, 2025

And for a full API spec, it can be found here:
https://github.com/OpenFn/lightning/blob/main/priv/static/workflow-api.yaml

@stuartc this might need to be publicly available via endpoint or ideally by a hosted API spec UI. I'd suggest postman or readme.com.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants