
Worker goes unhealthy after 10 minutes on sidekiq 7 #94

Open
w32-blaster opened this issue Jul 25, 2023 · 30 comments

@w32-blaster

w32-blaster commented Jul 25, 2023

We are trying to upgrade our infrastructure to use sidekiq 7 and are facing the following issue:
sidekiq_alive works fine when registering the worker in redis, and the worker then becomes healthy.
After exactly 10 minutes it becomes unhealthy with the message "Can't find the alive key" and the pod gets restarted.
We have verified that during this 10-minute period the worker really is alive, so it's not that our health check only starts working after 10 minutes.
Any idea where to look to solve this problem?

@arturictus
Owner

Hi @w32-blaster ,
Is the sidekiq-alive queue running? Can you check if it shows in the interface?

@w32-blaster
Author

@arturictus do you mean to check it in the sidekiq UI?

@arturictus
Owner

@w32-blaster, yes

@rakibulislam

rakibulislam commented Aug 29, 2023

Hi @arturictus Yes, the sidekiq-alive queue is running and we can see it in the sidekiq UI. Please let us know if you have any clue why this might be happening with the sidekiq-alive gem when used with sidekiq 7.1. Just another quick bit of info: we also use sidekiq-cron. Since this issue happens every 10 minutes, could the sidekiq-cron gem play any role here somehow? Please let us know if you can think of anything to solve this issue. Thanks! cc: @w32-blaster

@andrcuns
Collaborator

I am also using it with sidekiq 7.1 together with sidekiq-cron and it seems to work fine.

One problem that can happen is that if your cron jobs spawn enough jobs to take all available threads, the alive job will not get picked up and the app will start reporting as unhealthy. I encountered such an issue a while back in my project here: https://gitlab.com/dependabot-gitlab/dependabot/-/issues/293#note_1408148243

This is not really solvable in sidekiq < 7, but capsules in the new version should be able to address it. Potential fix: #96

@w32-blaster
Author

That's not the case. Our worker threads are sitting there doing nothing. Then after 10 minutes they get removed because of the health check.

@andrcuns
Collaborator

You should try and monitor the sidekiq scheduled jobs page and see if it consistently has the alive job scheduled:
[screenshot: Sidekiq scheduled jobs page]

Given that the default ttl value is 10 minutes, this basically means that sidekiq-alive is not working at all. You can change the option config.time_to_live = 10 * 60 in the config to validate that.
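
For reference, the TTL can be changed in the sidekiq_alive initializer; a minimal sketch using the gem's setup block (60 seconds here is just an example value so a failure shows up faster):

# config/initializers/sidekiq_alive.rb
SidekiqAlive.setup do |config|
  # Time (in seconds) the liveness key is kept in Redis; the gem's default is 10 * 60
  config.time_to_live = 60
end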

@rakibulislam

You should try and monitor the sidekiq scheduled jobs page and see if it consistently has alive job scheduled:

Given the default ttl value is 10 minutes, it basically means that sidekiq-alive is not working at all. You can change the option config.time_to_live = 10 * 60 in the config to validate that.

@andrcuns You are right. I just checked and verified this: the sidekiq-alive job is not scheduled in staging (where we deployed sidekiq 7), but it is present in the other environments where we still have sidekiq 6.5.6.

Now, could you please help us debug where the actual issue is? Here is the list of gems and their versions we are using along with sidekiq 7.1.2: sidekiq-cron-1.10.1, sidekiq-failures-1.0.4, sidekiq-limit_fetch-4.4.1, sidekiq-unique-jobs-8.0.3, sidekiq_alive-2.2.3. We are also using redis-5.0.7 and hiredis-client-0.16.0. Are you using hiredis-client as well?

@andrcuns Any clue on how to debug this issue would be highly appreciated.

@andrcuns
Collaborator

andrcuns commented Sep 1, 2023

I am not using hiredis so I can't really comment on that, but it's possible it is somehow at fault here and is causing issues for native redis commands being executed.

You can start by checking the sidekiq startup logs and seeing whether sidekiq-alive even starts correctly. You should see something like this:

[2023-09-01 08:08:17 +0000] INFO  -- Starting sidekiq-alive: {:hostname=>"dependabot-gitlab-worker-fc46db7c4-kpdwp", :port=>"7433", :ttl=>60, :queue=>"healthcheck-dependabot-gitlab-worker-fc46db7c4-kpdwp", :register_set=>"sidekiq-alive-hostnames", :liveness_key=>"SIDEKIQ::LIVENESS_PROBE_TIMESTAMP::dependabot-gitlab-worker-fc46db7c4-kpdwp", :register_key=>"SIDEKIQ_REGISTERED_INSTANCE::dependabot-gitlab-worker-fc46db7c4-kpdwp"}
[2023-09-01 08:08:17 +0000] INFO  -- Successfully started sidekiq-alive, registered with key: SIDEKIQ_REGISTERED_INSTANCE::dependabot-gitlab-worker-fc46db7c4-kpdwp on set sidekiq-alive-hostnames
[2023-09-01 08:08:18 +0000] INFO  -- WEBrick 1.8.1
[2023-09-01 08:08:18 +0000] INFO  -- ruby 3.1.4 (2023-03-30) [x86_64-linux]
[2023-09-01 08:08:18 +0000] INFO  -- WEBrick::HTTPServer#start: pid=7 port=7433

@rakibulislam

I can see that sidekiq-alive is starting successfully:

{"ts":"2023-09-04T14:02:25.641Z","pid":1,"tid":"3qzh","lvl":"INFO","msg":"Starting sidekiq-alive: {:hostname=>\"website-sidekiq-564f44bf99-k29cc\", :port=>7433, :ttl=>600, :queue=>\"sidekiq-alive-website-sidekiq-564f44bf99-k29cc\", :register_set=>\"sidekiq-alive-hostnames\", :liveness_key=>\"SIDEKIQ::LIVENESS_PROBE_TIMESTAMP::website-sidekiq-564f44bf99-k29cc\", :register_key=>\"SIDEKIQ_REGISTERED_INSTANCE::website-sidekiq-564f44bf99-k29cc\"}"}
{"ts":"2023-09-04T14:02:25.648Z","pid":1,"tid":"3qzh","lvl":"INFO","msg":"Successfully started sidekiq-alive, registered with key: SIDEKIQ_REGISTERED_INSTANCE::website-sidekiq-564f44bf99-k29cc on set sidekiq-alive-hostnames"}

I will try removing hiredis, deploy, and see if that fixes the issue. Will reply back once I test it out. Thanks so much for your help this far @andrcuns :-)

@andrcuns
Collaborator

andrcuns commented Sep 4, 2023

If it starts successfully, the issue is then either that the job which is supposed to update the alive key is not being scheduled, or that the job is scheduled but fails to update the key. I believe it's one of these two.

The fact that it starts failing only after 10 minutes, plus the log output, shows that it creates the liveness key successfully at startup.

@rakibulislam

@andrcuns So, I tried removing the hiredis client but the issue still persists. I can see the sidekiq-alive queue here: https://app.com.com/sidekiq/queues but it does not appear in the scheduled tab: https://app.com/sidekiq/scheduled, which means the job is not being scheduled. Do you have any idea where I can dig deeper to see why the job is not being scheduled? Thank you once again!

@andrcuns
Collaborator

andrcuns commented Sep 7, 2023

@rakibulislam Not really sure. I imagine you can try enabling debug logging for sidekiq and see if that reveals anything.

We could probably use some additional debug log messages in this repo as well, like the actions being performed during startup and inside the worker.

If it's not getting scheduled, I'm guessing it means either this line did not schedule the initial job, or it did but this action did not trigger the next job.

@rakibulislam

rakibulislam commented Sep 13, 2023

@andrcuns I was debugging this locally and found out that this line is executed on sidekiq startup, but this line is not executed automatically. I think that's the main issue we need to figure out. However, when I run the command SidekiqAlive::Worker.perform_async(hostname) directly from the rails console, it runs but fails with ConnectionPool::TimeoutError:

{"ts":"2023-09-13T04:33:33.682Z","pid":45635,"tid":"3f5v","lvl":"INFO","msg":"start","ctx":{"class":"SidekiqAlive::Worker","jid":"6e47fd68626fdb61325b0445"}}
SidekiqAlive::Worker.perform begins: 1
SidekiqAlive::Worker.perform begins: 2
****************************************************************************************************
store_alive_key!!! current_lifeness_key: SIDEKIQ::LIVENESS_PROBE_TIMESTAMP::HOSTNAME_NOT_SET
****************************************************************************************************
{"ts":"2023-09-13T04:33:34.791Z","pid":45635,"tid":"3f5v","lvl":"INFO","msg":"fail","ctx":{"class":"SidekiqAlive::Worker","jid":"6e47fd68626fdb61325b0445","elapsed":1.109}}
{"ts":"2023-09-13T04:33:34.792Z","pid":45635,"tid":"3f5v","lvl":"WARN","msg":"{\"context\":\"Job raised exception\",\"job\":{\"retry\":false,\"queue\":\"default\",\"args\":[\"HOSTNAME_NOT_SET\"],\"class\":\"SidekiqAlive::Worker\",\"jid\":\"6e47fd68626fdb61325b0445\",\"created_at\":1694579613.643147,\"enqueued_at\":1694579613.646084,\"error_message\":\"Waited 1 sec, 0/2 available\",\"error_class\":\"ConnectionPool::TimeoutError\",\"processor\":\"Ks-MacBook-Pro.local:45635\",\"failed_at\":1694579614.7854972},\"_config\":{\"labels\":\"#<Set: {}>\",\"require\":\".\",\"environment\":\"development\",\"concurrency\":5,\"timeout\":25,\"poll_interval_average\":null,\"average_scheduled_poll_interval\":5,\"on_complex_arguments\":\"raise\",\"error_handlers\":[\"#<Proc:0x0000000105925350 /Users/k.m.rakibulislam/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sidekiq-7.1.2/lib/sidekiq/config.rb:37 (lambda)>\",\"#<Raven::SidekiqErrorHandler:0x0000000108595d18>\"],\"death_handlers\":[\"#<Proc:0x00000001096d8170 /Users/k.m.rakibulislam/Development/Yoshi/website/config/initializers/sidekiq.rb:9 (lambda)>\",\"#<Proc:0x000000010a6c92c8 /Users/k.m.rakibulislam/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sidekiq-unique-jobs-8.0.3/lib/sidekiq_unique_jobs/server.rb:8 (lambda)>\"],\"lifecycle_events\":{\"startup\":[],\"quiet\":[\"#<Proc:0x000000010a34a070 /Users/k.m.rakibulislam/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sidekiq_alive-2.3.0/lib/sidekiq_alive.rb:45>\"],\"shutdown\":[\"#<Proc:0x000000010a34a048 /Users/k.m.rakibulislam/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sidekiq_alive-2.3.0/lib/sidekiq_alive.rb:50>\",\"#<Proc:0x00000001096d7b08 /Users/k.m.rakibulislam/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sidekiq-unique-jobs-8.0.3/lib/sidekiq_unique_jobs/server.rb:21>\"],\"heartbeat\":[],\"beat\":[\"#<Proc:0x000000010546b328 /Users/k.m.rakibulislam/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sidekiq-7.1.2/lib/sidekiq/metrics/tracking.rb:133>\"]},\"dead_max_jobs\":10000,\"dead_timeout_in_seconds\":15552000,\"reloader\":\"#<Sidekiq::Rails::Reloader @app=Website::Application>\",\"backtrace_cleaner\":\"#<Proc:0x000000010a5693d8 /Users/k.m.rakibulislam/.asdf/installs/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/sidekiq-7.1.2/lib/sidekiq/rails.rb:59 (lambda)>\",\"pidfile\":\"tmp/pids/sidekiq.pid\",\"beta\":{\"concurrency\":5},\"staging\":{\"concurrency\":5},\"production\":{\"concurrency\":20},\"queues\":[\"charging\",\"high_pri\",\"order_positioning\",\"delivery_positioning\",\"membership_renew\",\"messaging\",\"mailers\",\"default\",\"low_pri\",\"set_first_effective_eta\",\"elasticsearch\"],\"limits\":{\"delivery_positioning\":1},\"logfile\":\"log/sidekiq.log\",\"config_file\":\"./config/sidekiq.yml\",\"tag\":\"website\",\"identity\":\"Ks-MacBook-Pro.local:45635:cd297817bf8b\",\"cron_poll_interval\":30,\"fetch_class\":\"Sidekiq::LimitFetch\"}}"}
{"ts":"2023-09-13T04:33:34.793Z","pid":45635,"tid":"3f5v","lvl":"WARN","msg":"ConnectionPool::TimeoutError: Waited 1 sec, 0/2 available"}

Do you see why this might be happening? Thanks for your time and help!

@rakibulislam

@andrcuns Looks like it's enqueuing the SidekiqAlive::Worker job but it's not being processed, so nothing is scheduled. But if I run the same command, SidekiqAlive::Worker.perform_async(hostname), from the rails console, it does get enqueued and then processed, and hence it shows up in the scheduled tab in the sidekiq UI. Any thoughts on what could be happening here?
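
For anyone else debugging this: a quick way to check from a Rails console whether the alive job is sitting enqueued or in the scheduled set is Sidekiq's public API (a sketch; the queue name assumes the default "sidekiq-alive-<hostname>" pattern):

require "sidekiq/api"

# Inspect the per-host alive queue
queue = Sidekiq::Queue.new("sidekiq-alive-#{ENV['HOSTNAME']}")
puts "enqueued: #{queue.size}"

# Entries in the scheduled set that belong to SidekiqAlive::Worker
scheduled = Sidekiq::ScheduledSet.new.select { |job| job.klass == "SidekiqAlive::Worker" }
puts "scheduled: #{scheduled.size}"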

@Javierchik

Hey Team!
The "Can't find the alive key" message is also showing for me in an Azure web app (service) that runs Sidekiq 6.5.9.
I was able to reproduce the exact same behavior, unhealthy after 10 minutes, locally.
It looks to me like Sidekiq somehow does not get the reference to the new queue that is created after 10 minutes (the default TTL).

[screenshot: Sidekiq UI, 2023-09-15]

Appreciate any help!

@rakibulislam

rakibulislam commented Sep 16, 2023

Okay, I finally fixed this issue! For me everything was working fine with sidekiq 6.5.6, but sidekiq-alive stopped processing and scheduling jobs after we upgraded to sidekiq 7.1. The solution was to add the sidekiq-alive queue to the sidekiq.yml file; I defined the queue name this way: <%= "sidekiq-alive-#{ENV["HOSTNAME"] || 'HOSTNAME_NOT_SET'}" %>. After this, sidekiq-alive started working as expected with sidekiq 7.1!
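
For context, this is roughly what the workaround looks like in sidekiq.yml (a sketch; the other queues are just placeholders for whatever your app already lists):

# config/sidekiq.yml
queues:
  - default
  - <%= "sidekiq-alive-#{ENV["HOSTNAME"] || 'HOSTNAME_NOT_SET'}" %>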

@andrcuns
Collaborator

That's not really a fix though; the queue should be created automatically. In the latest version, with sidekiq 7+, it uses a separate capsule as well.

It would be nice to understand why the queue was not there.

@rakibulislam

rakibulislam commented Sep 16, 2023

@andrcuns I agree with you. It would be great to understand why it is not creating the sidekiq-alive queue automatically with sidekiq 7.1. Perhaps creating a new capsule might help? I will explore more in the future, but the fix works for us for now.

@w32-blaster
Author

This really sounds like a workaround to me, but @rakibulislam did great work figuring this out.
I wonder if a good next step would be to add debug-level logs to sidekiq-alive, so the actual root cause can be found or at least narrowed down.

@Javierchik

The workaround from @rakibulislam also worked for me, and the Sidekiq Azure web container now stays alive past the 10-minute threshold. The queue now shows in the scheduled tab and refreshes every 5 min.

Thank you.

@andrcuns
Collaborator

Yeah, some additional debug-level messages wouldn't hurt. I'll see if I can add a PR sometime this week.

@arturictus
Owner

arturictus commented Sep 28, 2023

Hi there,
Trying to figure out what the cause could be here.

  1. Queues not being added automatically:
     In @Javierchik's picture it looks like the queues are set: Worker goes unhealthy after 10 minutes on sidekiq 7 #94 (comment)
     But there are still doubts about that.
     QUESTION: could you please confirm whether the queues are listed in the web UI or in the instance?
     HELP: could you queue a job in that queue and test whether that queue executes jobs? Even better if it is not the SidekiqAlive::Worker.

  2. Hostname not set:
     If you are running multiple instances they will all use the same redis key, which causes the opposite effect: reporting alive even with blocked or inactive instances.
     This library is intended to run with the HOSTNAME present; without it, it will produce false positives (see the sketch at the end of this comment).

  3. Worker not being scheduled:
     Looks improbable, since these logs show success and the queuing is done before success is reported: Worker goes unhealthy after 10 minutes on sidekiq 7 #94 (comment)

  4. Error in job connection pool:
     We should retry the job in case this happens -> 🗒️

  5. Suspicion:
     Maybe sidekiq 7 loads/copies the queues in memory before we run on(:startup), effectively never running our queue even though it is visible in the web UI.
     The best solution I can think of right now is the sidekiq.yml option that @rakibulislam proposed here. But it is a workaround more than a fix.

  6. TODO:
     Review the capsule functionality, just in case we missed something in the implementation.

QUESTION: are you running redis in cluster mode or as a single instance? Any replicas?
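
Regarding point 2, a possible sanity check (a hypothetical initializer guard, not part of the gem) is to fail fast at boot when HOSTNAME is missing, so false positives don't go unnoticed:

# config/initializers/sidekiq_alive.rb
# Hypothetical guard: abort startup when HOSTNAME is missing, since sidekiq_alive
# would otherwise fall back to the shared "HOSTNAME_NOT_SET" key across instances.
if ENV["HOSTNAME"].to_s.empty?
  raise "HOSTNAME is not set; sidekiq_alive liveness keys would collide across instances"
end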

@w32-blaster
Author

@arturictus we are running redis on ElastiCache as a single instance, not in cluster mode. There are no replicas currently.

@breiko83

breiko83 commented Oct 3, 2023

Facing the same issue here. The 10-minute mark is actually based on the default TTL for the Redis key in the config file:

  # ==> Time to live
  # Time for the key to be kept by Redis.
  # Here is where you can set de periodicity that the Sidekiq has to probe it is working
  # Time unit: seconds
  # default: 10 * 60 # 10 minutes

I reduced this to 60 and the worker goes unhealthy after 1 minute (Can't find the alive key) - even though the worker is fine.

+++
I had a look at Redis and the key SIDEKIQ::LIVENESS_PROBE_TIMESTAMP::MY-HOST is set correctly once Sidekiq starts, but it is never updated. So after 10 minutes (or whatever TTL is set) the key gets removed (expired) and webrick returns "Can't find the alive key".

@amkisko

amkisko commented Oct 6, 2023

Confirming the issue: exactly after 10 minutes the worker gets unhealthy status, so basically sidekiq_alive returns "not alive". After that, sidekiq starts failing with ConnectionPool::TimeoutError: Waited 1 sec, 0/2 available, which seems to be related to missing workers.

ENV["HOSTNAME"] is present and available to both sidekiq and sidekiq_alive.

You should try and monitor the sidekiq scheduled jobs page and see if it consistently has alive job scheduled.

The sidekiq_alive job was not present in the scheduled jobs. During another run the job is present. 🤔 I don't see any pattern yet.


What I just saw: the job was still scheduled, but the sidekiq worker was already terminated by the provisioner.
One option is to increase failureThreshold in the kubernetes pod setup if you use kube, so the sidekiq_alive job has a bit more time.

Another option is to get rid of the livenessProbe completely, temporarily.

UPD: So, tuning the kubernetes pod settings helped. The sidekiq_alive queue is present, the job is enqueued, and the status is updated without issues.


UPD 2: 2023-10-06T07:35:07.588Z pid=16 tid=1y2w ERROR: Can't find the alive key - that's the source of the issue.


UPD 3: I checked this manually in the Rails console:

SidekiqAlive.hostname
SidekiqAlive.alive?
SidekiqAlive::Worker.perform_sync

After running it manually the pod became alive, which basically means that SidekiqAlive::Worker had never run before I called it manually. What makes the job not run automatically at start?

Now I also see two jobs enqueued for SidekiqAlive::Worker. After waiting 10 minutes, there are still two jobs in the queue and webrick returns "Can't find the alive key". Something happens during processing of the job; waiting longer does not affect the situation at all, while manually adding a new job definitely resolves it.

Also spotted that the hostname parameter is not used in the worker's perform method, so be aware of this if you are trying to call the worker on another instance: it will use that instance's hostname.


UPD 5: The worker job gets properly scheduled when manually called, so it goes under "Scheduled jobs" and the waiting time is 5 minutes (600/2, following the code). At some point the job gets stuck under "Enqueued jobs" and never gets back to "Scheduled jobs".

2023-10-07T11:23:41.208Z pid=1 tid=1ydx class=SidekiqAlive::Worker jid=c517b1a855d55e53bac31816 INFO: start
2023-10-07T11:23:42.213Z pid=1 tid=1ydx class=SidekiqAlive::Worker jid=c517b1a855d55e53bac31816 elapsed=1.005 INFO: fail
2023-10-07T11:23:42.214Z pid=1 tid=1ydx WARN: {"context":"Job raised exception","job":{"retry":false,"queue":"default","class":"SidekiqAlive::Worker","args":["app-staging-sidekiq-abcdef"],"jid":"c517b1a855d55e53bac31816","created_at":1696677518.307089,"enqueued_at":1696677821.2067275}}
2023-10-07T11:23:42.214Z pid=1 tid=1ydx WARN: ConnectionPool::TimeoutError: Waited 1 sec, 0/2 available

UPD 6: Most probably I will just go with a simple implementation of a web server like this:

# PATH: config/initializers/sidekiq_alive.rb

class SidekiqAliveServer
  def run!
    handler = Rack::Handler.get("webrick")
    Signal.trap("TERM") { handler.shutdown }
    handler.run(self, Port: 7433, Host: "0.0.0.0", AccessLog: [], Logger: Rails.logger)
  end

  def call(_env)
    workers = Sidekiq::Workers.new
    process_set = Sidekiq::ProcessSet.new
    process_set_size = process_set.size
    queues = Sidekiq::Queue.all
    queues_size = queues.size
    queues_avg_latency = queues.sum(&:latency) / queues_size if queues_size.positive?
    response = {
      workers_size: workers.size,
      process_set_size: process_set_size,
      queues_size: queues_size,
      queues_avg_latency: queues_avg_latency
    }
    is_alive = process_set_size.positive? && queues_size.positive? && queues_avg_latency < 30 * 60
    response_code = is_alive ? 200 : 500
    [response_code, { "Content-Type" => "application/json" }, [response.to_json]]
  end
end

Sidekiq.configure_server do |config|
  config.on(:startup) do
    @server_pid = fork { SidekiqAliveServer.new.run! }
  end

  config.on(:quiet) do
  end

  config.on(:shutdown) do
    Process.kill("TERM", @server_pid) unless @server_pid.nil?
    Process.wait(@server_pid) unless @server_pid.nil?
  end
end

Latest version is here: https://gist.github.com/amkisko/95662a67da9c7344ee538786ed3e9d6e

@jasondoc3

I'm not sure if this will help anyone else, but I was also experiencing this problem with sidekiq 6.5.7. After some deep debugging, I discovered that the issue for me was actually the sidekiq-throttled gem.

On the throttled README, it states:

Load order can be an issue if you are using other Sidekiq plugins and/or middleware. To prevent any problems, add the .setup! call to the bottom of your init file.

Because sidekiq_alive uses an on(:startup) lifecycle event to modify the configuration, I modified my sidekiq initializer to set up the throttled gem in a separate on(:startup) lifecycle event to correct the load order:

# config/initializers/sidekiq.rb
require "sidekiq/throttled"
Sidekiq.configure_server do |config|
  config.on(:startup) { Sidekiq::Throttled.setup! }
end

@mleiseca

mleiseca commented Feb 8, 2024

@amkisko I was having the same issue as your comment here

  1. SidekiqAlive.start executes and calls SidekiqAlive::Worker.perform_async
  2. When SidekiqAlive::Worker.perform executes, it calls store_alive_key. This fails with ConnectionPool::TimeoutError: Waited 1 sec, 0/2 available

Fix:

In Sidekiq.configure_server add a section that creates a capsule for SidekiqAlive

Sidekiq.configure_server do |config|

  config.capsule(SidekiqAlive::CAPSULE_NAME) do |cap|
    cap.concurrency = 3
    hostname = ENV["HOSTNAME"] || "HOSTNAME_NOT_SET"
    cap.queues = ["sidekiq-alive-#{hostname}"]
  end
end

I don't really know anything about Sidekiq internals, but it seems like it's creating a Redis connection pool that's too small - somehow there are only two connections, but the pool is exhausted when this method is called:

    def store_alive_key
      redis.set(current_lifeness_key, time: Time.now.to_i, ex: config.time_to_live.to_i)
    end

Setting cap.concurrency = 3 makes a slightly larger connection pool just for sidekiq-alive.

@stefanoc

@jasondoc3 we were having the exact same problem with Sidekiq 6.5.11 and luckily I found your comment. Thanks! :-)

@smoke

smoke commented Aug 24, 2024

Update: I am not sure whether what I described below is relevant; I have opted to implement a custom approach. I was using sidekiq-scheduler to schedule a health check job every minute, with a dedicated queue for each worker but the same job name. This caused the last worker to override the schedules of the others, so there was no way a health check job would be scheduled on each worker. Instead, I opted to do a recursive perform_in and remove the schedule.
Sorry for misleading.

I believe this is caused by a race condition when Sidekiq is starting up and setting up queues in Redis.
When I start more than one new worker all at once, the queue list does not contain each and every new worker that started.

For example, I started 3 workers all at once, and only 1 managed to register its queue:
[screenshot: queue list with only one worker queue registered]

However if I start the workers one by one, all is good.
[screenshot: queue list with all worker queues registered]
