
Improvements to dismissing a deployment #2522

Open · sagerb wants to merge 27 commits into main

Conversation

@sagerb (Collaborator) commented Jan 10, 2025

Intent

Resolves #2179

Type of Change

    • Bug Fix
    • New Feature
    • Breaking Change
    • Documentation
    • Refactor
    • Tooling

Approach

The challenge in resolving this issue was finding a way to reflect only the latest status in the UX when the agent can have multiple deployment threads active against the same Connect server for the same content target, including cases where that status is unknown because the deployment was dismissed (cancelled).

I found that there were a few base issues with our existing implementation:

  1. The deployment file could not indicate that the last deployment was aborted.
  2. The content-record/deployment file had no concept of which deployment (when there are multiple in progress) it was trying to represent.
  3. The agent had no API that could be used to indicate that the user no longer wants status on a deployment (our version of canceling a deployment, since Connect does not support cancel).

It was simple to add these concepts. Two fields were added to the content-record/deployment file:

  • aborted_at: a new field that is blank unless the deployment was aborted (or dismissed in our case).
  • local_id: a new field that mimics the value used within the agent and the UX to indicate a particular deployment run.

We also added a new cancel API endpoint to the agent. (POST /api/deployments/$NAME/cancel/$LOCALID)
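
For illustration, the two fields could land on the deployment record roughly like this (the struct shape and tag names are assumptions based on the description above, not the PR's actual code):

```go
package deployment

// Illustrative sketch only: field names follow the PR description; the
// actual struct in internal/deployment/deployment.go may differ.
type Deployment struct {
	// ...existing content-record fields...

	// AbortedAt is blank unless the deployment was aborted (dismissed).
	AbortedAt string `toml:"aborted_at" json:"abortedAt"`

	// LocalID mirrors the per-run ID used by the agent and the UX to
	// identify the deployment run that owns this record.
	LocalID string `toml:"local_id" json:"localId"`
}
```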

Using these capabilities, I was then able to implement the functionality we needed to solve the challenges mentioned above:

  1. When a new deployment is initiated, the corresponding deployment file needs to be initialized to include the active thread's local_id.
  2. User cancellation of an active deployment from the UX needs to update the deployment file with an aborted status, but only if the cancellation targets the run recorded as the latest deployment (same local_id); a sketch of this check follows the list.
  3. A deployment thread should not update the deployment file with results or errors if the file now belongs to a different deployment run (different local_id), OR if the file shows that the user has canceled this thread's run.
  4. The UX needs to show that the last deployment was aborted when the user has dismissed it, with success or failure unknown. (It should also not display a link to the content, since we don't know whether it even exists on the server.)
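
Here is a minimal sketch of the ownership check behind steps 2 and 3 (the helper name and timestamp format are assumptions, not the PR's actual code):

```go
package deployment

import "time"

// markAborted records a user dismissal, but only when the cancel request's
// local ID still owns the deployment record, so a newer run's record is
// never clobbered. It reports whether the record was updated.
func markAborted(d *Deployment, localID string) bool {
	if d.LocalID != "" && d.LocalID != localID {
		return false // a different run owns the record; leave it alone
	}
	d.AbortedAt = time.Now().UTC().Format(time.RFC3339)
	return true
}
```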

With the above functionality in place, the scenarios are solved:

  1. The user sees the appropriate status immediately when canceling a deployment in progress.
  2. The user also sees this canceled status if the extension is reloaded.
  3. If the user cancels and immediately initiates a new deployment, the UX displays correctly with the latest deployment updating the deployment file.

It is important to understand that the current implementation allows deployments in progress to continue to normal completion (success or failure). The big difference here is that if a deployment is dismissed, we no longer stay in sync with what that thread is doing and do not attempt to cache its status into any of the files.

Screenshot: (image attached in the original PR, captured 2025-01-09 at 5:05 PM)

User Impact

Users will now have a better experience when dismissing a deployment in progress.

Automated Tests

Unit tests have been added to validate the different combinations of update logic for the deployment file.

Existing unit tests have been updated for the function signature changes introduced as part of this PR.

Directions for Reviewers

The base functionality on the agent resides within the internal/deployment/deployment.go module. It is within this file that the logic has been implemented to control the update requests for writing the deployment file.

Minor changes have been made to internal/publish/publish.go to properly initialize a deployment file with the latest local_id, and to support cancellation.
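
As a sketch of that initialization (a hypothetical helper, not the actual publish.go change):

```go
// initializeForRun stamps the record with the new run's local ID and
// clears any stale aborted marker before the first write.
func initializeForRun(d *Deployment, localID string) {
	d.LocalID = localID
	d.AbortedAt = ""
}
```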

A schema change was included within this PR. We'll need to push up an update to S3 once we merge this PR.

The UX was updated to understand the cancellation status within extensions/vscode/src/views/deployProgress.ts.

The majority of the remaining changes in this PR reflect the impact of the signature change for deployment.WriteFile().


@sagerb self-assigned this Jan 10, 2025
…d filtering to include some log messages from connect which had the local ID value in snake_case rather than camelCase
@dotNomad (Collaborator) left a comment:

I have a handful of comments - overall this looks fantastic though.

I did encounter one bit of weirdness where I am getting the below error when I deploy, cancel, then deploy again quickly:
(screenshot of the error attached in the original comment)

I mentioned this already and I believe you have it fixed up in the follow-ups. I'll take a look at those right now.

Resolved review threads: internal/deployment/deployment.go (one thread), internal/publish/publish.go (two threads).
Comment on lines 175 to 181
if existingDeployment.LocalID != "" && existingDeployment.LocalID != string(localId) {
log.Debug("Skipping deployment record update since existing record is being updated by another thread.")
return existingDeployment, nil
}
if existingDeployment.AbortedAt != "" {
log.Debug("Skipping deployment record update since deployment has been cancelled")
return existingDeployment, nil
Collaborator:

Fantastic breakdown of the two schema updates and their usage here.

AbortedAt was easy to understand as an addition, since we want to communicate to the home view that a deployment was aborted, and not conflating deployedAt with abortedAt is a great way to capture both the state and the timestamp.

The initial need for localID was less easy to understand, but because we can have any number of deployment runs occurring at once for any given deployment, we need some way to know which one we care about for updates to the file and, therefore, the rest of the extension.

The only alternative I found to checking the localID in the file, after a deep look, was cancelling the goroutine that handles the deploy:

We could in theory use a channel or a shared context for each goroutine we have, but that would mean knowing where to bail out in all of the steps inside of PublisherDirectory and its sub-function calls. That option would remove the need for localID in the deployment record, but feels a lot more difficult to maintain compared to avoiding writes like you are above.
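
A minimal sketch of that alternative, with hypothetical names standing in for the real steps (this is not code from the PR):

```go
package main

import (
	"context"
	"fmt"
)

// deployStep stands in for the real steps inside PublisherDirectory.
type deployStep func(ctx context.Context) error

// runDeployment shows the shared-context alternative: every step has to
// observe ctx so a dismissal can interrupt the deploy partway through.
func runDeployment(ctx context.Context, steps []deployStep) error {
	for _, step := range steps {
		select {
		case <-ctx.Done():
			return ctx.Err() // deployment was dismissed; bail out early
		default:
		}
		if err := step(ctx); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // dismissing the deployment would call cancel()
	_ = runDeployment(ctx, []deployStep{
		func(ctx context.Context) error { fmt.Println("step ran"); return nil },
	})
}
```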

To achieve this functionality we definitely need to either:

  • ignore certain local IDs
  • cancel related goroutines prior to acting on deployment records

In my mind, since the deployment record isn't user facing in the same way as the configuration, I think the balance between maintainability and schema complexity was managed very well here.

Collaborator Author:

Thanks dude!

Collaborator:

> The initial need for localID was less easy to understand, but because we can have any number of deployment runs occurring at once for any given deployment, we need some way to know which one we care about for updates to the file and, therefore, the rest of the extension.

This might be separate from the specifics of this PR, but what circumstances would generate more than one deployment at once for a given deployment? Is it possible for someone to do this without having cancelled first? Or is this about possibly having multiple VS Code/Positron windows open at once and having multiple deployments running simultaneously?

@dotNomad (Collaborator) commented Jan 13, 2025:

> what circumstances would generate more than one deployment at once for a given deployment?

If a deployment is "cancelled" and another is started there are two deployments occurring and being watched by the binary - the original "cancelled" one and the new one. This occurs as both goroutines in the binary are still running. The change introduced here prevents the first (the "cancelled" one) from causing file writes.

> Is it possible for someone to do this without having cancelled first?

It is possible to have multiple deployments occurring on Connect; however, some of the ways to accomplish this in the VS Code extension also stop the binary (the goroutines handling deploying) - for example, refreshing the VS Code window or restarting VS Code.

The example you laid out where two VS Code windows are open at once is another option to make two happen simultaneously. That would cause multiple file writes and some timing issues, but that is pretty out of scope for what this PR is trying to accomplish - solving the multiple running deploys from a single window.

Collaborator:

Thanks for that info — yeah if we're stuck with the way the goroutines are being dispatched and changing / fixing that is too big of a lift / causes other issues that's ok.

I was mainly asking because it seemed that parallel deployment is YAGNI, and then we end up doing more work to work around it. Again — that might be the expedient thing here, but it smelled funny to me.

> The example you laid out where two VS Code windows are open at once is another option to make two happen simultaneously. That would cause multiple file writes and some timing issues, but that is pretty out of scope for what this PR is trying to accomplish - solving the multiple running deploys from a single window.

Yeah, I agree that we shouldn't actively support or encourage this, but it is an area where adding file-based identifiers like this makes more problems.

Collaborator:

> Thanks for that info — yeah if we're stuck with the way the goroutines are being dispatched and changing / fixing that is too big of a lift / causes other issues that's ok.

Giving the deploy goroutines a shared context isn't difficult, but checking for cancellation to return early, or checking whether we are in the stream we care about, is much harder - mainly because every step where we potentially update the file, send events, etc. would need a check. It is certainly possible, but the work here isolates the checking to the file write, which was a much easier bandage on the problem.

> I was mainly asking because it seemed that parallel deployment is YAGNI, and then we end up doing more work to work around it.

I believe we need to keep support for parallel deployment to keep #2057 resolved, unfortunately.

@sagerb (Collaborator Author) commented Jan 14, 2025:

> I was mainly asking because it seemed that parallel deployment is YAGNI, and then we end up doing more work to work around it.

It's not so much a design to support parallel deployments (I agree that would be YAGNI) but rather protection against parallel threads in the same agent updating common records. (We determined it was too expensive to abort these goroutines once they have been initiated.) The simple concept here is that one of them is always the owner of the deployment (for a given content item), and that ownership is established when a deployment is initiated.

One idea did come to mind as I was reading this thread: the ownership check we're having to implement could be removed from the file itself and instead rely on a singleton interface within the agent. This would remove the need to add local_id to the deployment record file, and remove that from the schema update. After talking this out with @dotNomad, we decided it had enough benefits to warrant making that change in a fourth, child PR tomorrow. I think it does simplify some of this implementation, which is great.
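
For what it's worth, a sketch of that singleton idea might look like this (all names hypothetical; the follow-up PR may implement it differently):

```go
package deployment

import "sync"

// activeRuns is a hypothetical in-agent singleton that tracks, per
// content record, which local ID currently owns the deployment. It
// would replace the local_id field in the deployment file schema.
type activeRuns struct {
	mu     sync.Mutex
	owners map[string]string // deployment name -> owning local ID
}

func newActiveRuns() *activeRuns {
	return &activeRuns{owners: map[string]string{}}
}

// claim records the run that now owns a deployment.
func (a *activeRuns) claim(name, localID string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.owners[name] = localID
}

// owns reports whether the given run may still write the record.
func (a *activeRuns) owns(name, localID string) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.owners[name] == localID
}
```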

(Outdated, resolved review thread on extensions/vscode/src/views/deployProgress.ts.)
@sagerb (Collaborator Author) commented Jan 11, 2025:

> I have a handful of comments - overall this looks fantastic though.
>
> I did encounter one bit of weirdness where I am getting the below error when I deploy, cancel, then deploy again quickly: (screenshot in original comment)
>
> I mentioned this already and I believe you have it fixed up in the follow-ups. I'll take a look at those right now.

Yes, that is fixed in #2525.

@sagerb requested a review from dotNomad on January 13, 2025 at 19:26
@sagerb (Collaborator Author) commented Jan 13, 2025:

@dotNomad Changes pushed. Let me know if you have any questions in my responses.

Development

Successfully merging this pull request may close these issues.

When canceling a deployment I immediately get a Last Deployment Successful message