Scheduler Longhaul Tests #238

Merged Nov 20, 2024 (15 commits)

@cicoyle commented Oct 21, 2024

  1. Robust Jobs API test (all jobs with a TTL set)
  • schedule 100 oneshot jobs indefinitely (repeat = 1)
  • schedule 100 indefinite jobs indefinitely (repeat not set, trigger every 30s)
  • schedule a repeat-job job indefinitely (repeat = 5, trigger every 1s, due immediately)
  • indefinitely schedule and delete a create-delete-job job (repeat = 1, trigger every 1s)
  2. Fun Scheduler Actor Reminders test that simulates a player session in which the player's health decays over time and is periodically increased, as if the player picked up a health pack or the like. Once the player's health reaches 0, the player dies and the reminders (health decay reminder and health increase reminder) are unregistered; the player is then revived and the reminders are started again. Note that the SchedulerReminders config is set.

  3. Workflow app that repeatedly starts, pauses, and resumes workflows. It defines a TestWorkflow that interacts with activities, handles external events, and logs the workflow's progress through stages. The app continuously monitors workflows and terminates them as necessary. Note that the SchedulerReminders config is set.

I added instructions for how to run locally.
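
The player-session scenario in the actor reminders test can be pictured as a small state machine. The sketch below is an in-memory model only, with no Dapr actor runtime involved. The reminder names, the `player` type, the starting health of 100, and the cap on health are all assumptions made for illustration and are not taken from the PR's code.

```go
package main

import "fmt"

const (
	decayReminder    = "health-decay"    // hypothetical reminder name
	increaseReminder = "health-increase" // hypothetical reminder name
	maxHealth        = 100               // assumed starting and maximum health
)

// player is a hypothetical in-memory stand-in for the player actor.
type player struct {
	health    int
	deaths    int
	reminders map[string]bool // reminders that are currently registered
}

func newPlayer() *player {
	p := &player{}
	p.revive()
	return p
}

// revive restores full health and registers both reminders again.
func (p *player) revive() {
	p.health = maxHealth
	p.reminders = map[string]bool{decayReminder: true, increaseReminder: true}
}

// onReminder applies one reminder tick. When health reaches 0 the player
// dies, both reminders are unregistered, and the player is revived.
func (p *player) onReminder(name string, amount int) {
	if !p.reminders[name] {
		return // ignore ticks from reminders that are not registered
	}
	switch name {
	case decayReminder:
		p.health -= amount
	case increaseReminder:
		p.health += amount
		if p.health > maxHealth {
			p.health = maxHealth
		}
	}
	if p.health <= 0 {
		p.deaths++
		p.reminders = map[string]bool{} // unregister both reminders
		p.revive()                      // then start the session over
	}
}

func main() {
	p := newPlayer()
	for i := 0; i < 12; i++ {
		p.onReminder(decayReminder, 30)
		if i%3 == 2 {
			p.onReminder(increaseReminder, 20)
		}
	}
	fmt.Printf("health=%d deaths=%d\n", p.health, p.deaths)
}
```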

Signed-off-by: Cassandra Coyle <[email protected]>
@cicoyle marked this pull request as ready for review October 30, 2024 15:05
Signed-off-by: Cassandra Coyle <[email protected]>

@elena-kolevska left a comment

Looks amazing, @cicoyle! I ran the local multi-app run and it worked great. Before going into a more detailed review of the apps, I'll share a few notes so you can work on them in the meantime:

Also, please double-check whether we need to specify persistence for etcd or whether we are using the default. I don't remember what we decided last.

Signed-off-by: Cassandra Coyle <[email protected]>

@elena-kolevska left a comment

Submitting feedback for the scheduler jobs app. I'm checking the other two tomorrow.

Comment on lines +221 to +222
log.Println("waiting a few seconds to let connections establish")
time.Sleep(5 * time.Second)
Contributor

Does the SDK provide a way to know this for certain? If not, we should probably think about adding one.

Contributor Author

I'm uncertain. I think I added this because, in my local testing, I noticed a discrepancy where I thought some jobs got dropped if they were scheduled too quickly upon startup.

@elena-kolevska left a comment

Adding a review for the scheduler-workflow app.
Looking at actor reminders next.

@mikeee mentioned this pull request Nov 19, 2024
38 tasks
Signed-off-by: Cassandra Coyle <[email protected]>

cicoyle commented Nov 19, 2024

Thanks for the review @elena-kolevska - I've updated the code accordingly :) Ready for your review again

This tests the Scheduler for the underlying storage for Actor Reminders.

## How To Run the Code:
Contributor

I wasn't able to run the code; I keep getting this error:

== APP == dapr client initializing for: 127.0.0.1:65181
== APP == 2024/11/19 23:05:27 error starting health increase reminder: error invoking register actor reminder playerActorType/player-1: rpc error: code = PermissionDenied desc = operations on actor reminders are only possible on hosted actor types
DEBU[0001] api error: code = Internal desc = error invoke actor method: rpc error: code = Canceled desc = context canceled  app_id=player-actor-client instance=Elenas-MacBook-Pro.local scope=dapr.runtime.grpc.api type=log ver=1.14.4

These are the commands I used:

dapr run --app-id player-actor --app-port 3007 --log-level debug --config ../dapr/config.yaml -- go run player-actor.go

and

dapr run --app-id player-actor-client --log-level debug --config ../dapr/config.yaml --placement-host-address=localhost:50007,localhost:50008,localhost:50009 -- go run player-actor-client.go

I tried running with 1.14.4 and latest builds from master.
What am I missing?

Contributor Author

I'm able to run it as-is using the instructions from my README and do not see that error. Did you try the exact commands from my README? I'm also using v1.14.4.

I have seen that error before, outside the context of this PR, in the workflow scheduler e2e PR in CI, but I am not able to reproduce the issue you are seeing, and I am not familiar enough with actors to know what the issue is off the top of my head.

Contributor Author

First, I would try the commands noted in the README, since that's exactly what I've tested and what works for me. If that doesn't work for you, then I would try leaving --placement-host-address unset and see if that changes things. I noticed from your command that you are running placement in HA mode.

Contributor

Yeah, I did run the original commands as they are in the README, and I got this error:

== APP == 2024/11/20 17:25:02 error invoking actor method GetUser: error invoking binding playerActorType/player-1: rpc error: code = Internal desc = the state store is not configured to use the actor runtime. Have you set the - name: actorStateStore value: "true" in your state store component file?

Then I removed the --app-port 3008 argument for the client and got the error I shared above. Really weird. I'll look into it some more.

deploy/dapr-multi-app/dapr.yaml
- appID: scheduler-actor-reminders-client
  appDirPath: ../../scheduler-actor-reminders/client
  appPort: 3008
Contributor

Suggested change (remove this line):
appPort: 3008

Contributor Author

I know it's a client, but I think this is still needed for the connection to Dapr to work.

Comment on lines 49 to 50
- name: dapr
  containerPort: 3008
Contributor

Suggested change (remove these lines):
- name: dapr
  containerPort: 3008

Contributor Author

I think this is still needed for Dapr to be able to communicate with my app in k8s.

elena-kolevska commented Nov 20, 2024

Thanks for updating, @cicoyle. I went through the deployment-related files in the meantime and they mostly look good. I corrected the things I saw; you can just accept the changes, and we'll see if we got everything right when we try to deploy.

The only thing that's left is verifying that the reminders app problem I ran into is just some config thing on my side. Does the program run for you right now? I tried it on the earlier version, before your commits from today.

I'll look into your updates first thing tomorrow.

cicoyle and others added 3 commits November 20, 2024 08:40
Co-authored-by: Elena Kolevska <[email protected]>
Signed-off-by: Cassie Coyle <[email protected]>
Signed-off-by: Cassandra Coyle <[email protected]>
@yaron2 merged commit b0653a0 into dapr:master Nov 20, 2024
9 checks passed