File inventory finishes without updating FileInventoryRequest #1331
Comments
The File Inventory job was re-run at 2:37 PM and ran fine for a while, but then we saw some memory alerts in CheckMK for the server where the job was running (tigerdata-prod2) around 3:20 PM. As of 3:25 PM the job is still running.
It does look like the kernel killed our job:
Notice that file for the job (
The Linux kernel is indeed killing our job:
Notice the process id
I am surprised that we are running out of memory here. We are probably keeping the entire file inventory in memory before saving it to a file, and for 2 million records that does not seem to be a good idea. We should look into saving the file as we go so that we only keep one or a few batches in memory.
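A rough sketch of what "saving the file as we go" could look like in Ruby; each_inventory_batch and the record fields are hypothetical stand-ins for however the job actually fetches the records:

```ruby
require "csv"

# Hypothetical sketch of writing the inventory file incrementally rather than
# accumulating all ~2 million records in memory first. each_inventory_batch
# and the record fields are assumptions, not the real API.
def write_inventory_file(output_path)
  CSV.open(output_path, "w") do |csv|
    csv << ["path", "size", "last_modified"]
    # Assume each_inventory_batch yields one page of records at a time,
    # so only a single batch is ever resident in memory.
    each_inventory_batch do |batch|
      batch.each do |record|
        csv << [record.path, record.size, record.last_modified]
      end
    end
  end
end
```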
@kayiwa has bumped up the memory to 32GB in |
There have been a few instances in which a FileInventoryRequest job "finishes" but does not update the record in the database with the file that it produced. Here are the details on a job that behaved like this today.

Notice that the FileInventoryRequest record in the database was never updated with a file in request_details nor with a completion_time.
The file that was produced was last updated at 18:56 and there is nothing in the log to indicate that there was an error during the process:
Yet, the record in the database was never updated to point to the file and there is no entry in the log that the process got to that point, i.e. the code should have logged "Export file generated" but it never did.
There are no jobs in any of the Sidekiq queues either, so the job did finish (or died), but again there is nothing in the logs to track it.
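For context, the tail end of the job presumably does something along these lines. The request_details and completion_time fields and the "Export file generated" message come from this issue; the method names and the shape of request_details are assumptions:

```ruby
# Hypothetical sketch of the job's final step. Only request_details,
# completion_time, and the "Export file generated" log message are taken
# from this issue; everything else is an assumption.
def complete_inventory_request(request)
  output_file = generate_export_file(request) # assumed helper that writes the inventory file to disk
  request.update!(
    request_details: request.request_details.merge("output_file" => output_file),
    completion_time: Time.current
  )
  Rails.logger.info("Export file generated #{output_file}")
end
```

If the process died after the export file was written but before a final step like this ran (for example, if it was killed by the kernel as discussed in the comments above), the file would exist on disk while the record and the log line never appear, which would match the behavior described here.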
There were a few CheckMK alerts around this time (1:55 PM EST) regarding tigerdata-prod2 (which is the server where this job ran). All these alerts were short-lived and seem to have recovered... but I wonder if they messed up the job. (See also PR #1330 and issue #1274.)