Troubleshooting
It is handy to see all of Django's output in the console, by running:
python3 manage.py runserver
A VCF import, including running the annotation pipeline on any new variants, should usually take less than 2 hours. You'll see a hanging import via:
- Sequencing Runs - this page will have a "spinning" logo for ages where the VCF icons usually are...
- Data - import status of "importing"
There are 3 things that could have happened, and you can tell which from the VCF's "Vcf import stage". To find it, click the "VCF" tab on the Data page, then click the link to the VCF that is stuck on "importing".
Empty - it failed in the import process. Click "View upload processing": a finished import shows a pie chart, but an unfinished one just shows a grid with a few jobs that are not SUCCESS. Click the "retry import" button. Hopefully it works this time.
Annotating Variants -
Check the state of the Variant Annotation Runs via the menu: Annotation -> Variant Annotation Runs (link on the page).
Log in to the server and run management commands (see below)
# Annotation run completed but job stuck as annotating
python3 manage.py annotation_set_all_complete
# Annotation run not started
python3 manage.py annotate_unannotated_variants
You can run either of these commands multiple times in whatever order as they check the state of things to make sure it's ok.
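Because both commands are safe to repeat, the recovery can be sketched as a small Python wrapper that runs them one after the other. This is only a sketch: the function name `run_annotation_recovery`, its `runner` parameter, and the default `/opt/variantgrid` working directory are assumptions for illustration; only the two management command names come from this page.

```python
import subprocess

# Management commands named on this page. Order does not matter because
# each one checks the current state before doing anything.
ANNOTATION_COMMANDS = ("annotation_set_all_complete", "annotate_unannotated_variants")


def run_annotation_recovery(cwd="/opt/variantgrid", python="python3", runner=subprocess.run):
    """Run each annotation recovery command and return the names that failed.

    `runner` is a hypothetical injection point so the function can be
    dry-run without a real VariantGrid checkout.
    """
    failed = []
    for name in ANNOTATION_COMMANDS:
        result = runner([python, "manage.py", name], cwd=cwd)
        if result.returncode != 0:
            failed.append(name)
    return failed
```

An empty list back means both commands exited cleanly; anything else names the command worth re-running by hand to read its output.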
Annotation run broken
If the Celery task is dead but the state isn't errored out (so you can't click "retry upload"), set an error on the run by hand in the Django shell:
python3.8 manage.py shell
In [1]: from annotation.models import AnnotationRun
In [2]: ar = AnnotationRun.objects.get(pk=2722)
In [3]: ar.error_exception = "blah"
In [4]: ar.save()
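The shell session above could be wrapped in a helper like the following. It is a minimal sketch: `mark_annotation_run_errored` is a hypothetical name, and the code assumes only what the session shows, namely that an `AnnotationRun` has an `error_exception` attribute and a `save()` method.

```python
def mark_annotation_run_errored(annotation_run, message="Celery task died; marked errored manually"):
    """Record an error message on a stuck run and persist it.

    Works on any object exposing `error_exception` and `save()`, which is
    all the shell session on this page relies on.
    """
    annotation_run.error_exception = message
    annotation_run.save()
    return annotation_run


# Inside the Django shell (pk=2722 is just the example ID used on this page):
#   from annotation.models import AnnotationRun
#   mark_annotation_run_errored(AnnotationRun.objects.get(pk=2722))
```

A descriptive message is more useful than "blah" to whoever looks at the run later.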
Calculating sample stats -
This is a really CPU/database-intensive task, and we run up to 32 of them at a time, so Celery jobs sometimes crash when the database refuses new connections (or similar).
Log in to the server and run management commands (see below)
python3 manage.py calculate_sample_stats
# ssh onto server
sudo su variantgrid
cd /opt/variantgrid # on SAPath server, it's /mnt/variantgrid on VM
python3 manage.py # This will show you all of the commands you can run.
- On VCF page, click Sharing/Permissions tab, then delete
This should delete the project, which will be uploaded again. You can wait for a max of 2 hours for this to happen, or go to the sequencing page, click "manage disk scans" then trigger it manually.
If it doesn't re-load the project, try deleting the SequencingRun (click link, then "Admin" then delete) - this should reload everything.
Go to the Server Status page. (Settings -> Server Status if you're an admin user)
The Celery workers should be shown in green. If they are red, something is wrong and Celery has crashed. In theory the service should restart itself, but if not, try:
sudo bash
~/stop_services.sh
# wait a while
# maybe check ps aux | grep variant - there should be nothing running except the grep command
~/start_services.sh
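The "nothing running except the grep command" check between stopping and starting can be sketched in Python. The helper name `leftover_processes` is hypothetical, and it simply filters text in the format printed by `ps aux`; it does not know anything about how the stop and start scripts work.

```python
def leftover_processes(ps_output, needle="variant"):
    """Return `ps aux` lines that mention `needle`, ignoring grep itself.

    Lines containing "grep" are skipped because `ps aux | grep variant`
    always lists its own grep process.
    """
    leftovers = []
    for line in ps_output.splitlines():
        if needle in line and "grep" not in line:
            leftovers.append(line)
    return leftovers
```

Feed it the captured output of `ps aux`; an empty list suggests it is safe to run the start script. Note that a process owned by the `variantgrid` user matches on the user column too, so run the check as root, as in the steps above.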
- If the server is reset (for example after a power cut), it should come back up with the services running. If it was down long enough, its IP address may have changed: turn on the monitor, use the keyboard under my desk to log in, run ifconfig, then tell everyone the new address.
If the services aren't running, see above to start them.
To see a list of running processes in the database:
#!bash
sudo su postgres -c 'psql -d snpdb'
Then run SQL:
#!sql
-- To see what queries are running and their PIDs
SELECT * FROM pg_stat_activity;
-- To kill something
SELECT pg_cancel_backend(PID);
-- To REALLY kill something
SELECT pg_terminate_backend(PID);
-- To cancel every running query in the current database (except this session's)
SELECT pg_cancel_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE datname = current_database()
AND pid <> pg_backend_pid();
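`pg_cancel_backend` cancels a backend's current query, while `pg_terminate_backend` ends the whole session, so it makes sense to try cancel first. If you script this, a tiny helper can build the statement and refuse anything that is not a numeric PID. The name `backend_signal_sql` is hypothetical and the sketch only produces SQL text; running it is left to `psql` or whatever client you use.

```python
def backend_signal_sql(pid, terminate=False):
    """Build the SQL that cancels (default) or terminates one backend."""
    # int() raises ValueError for non-numeric strings, so stray text
    # never ends up inside the statement.
    pid = int(pid)
    function = "pg_terminate_backend" if terminate else "pg_cancel_backend"
    return f"SELECT {function}({pid});"
```

For example, `backend_signal_sql(4242)` gives the cancel statement for PID 4242, and passing `terminate=True` gives the "REALLY kill" variant.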
See if you can see any errors here:
Copy the logs and email them to Dave Lawrence ([email protected])
mkdir vg_logs
scp -r [email protected]:/var/log/variantgrid vg_logs
tar czvf vg_logs.tar.gz vg_logs
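If you would rather bundle the copied logs from Python, the standard-library `tarfile` module can write a gzip-compressed archive. This is a sketch under the assumption that the logs are already in a local directory such as `vg_logs`; the function name `archive_logs` is made up for illustration.

```python
import tarfile
from pathlib import Path


def archive_logs(log_dir, archive_path):
    """Write `log_dir` into a gzip-compressed tar archive and return its path."""
    log_dir = Path(log_dir)
    archive_path = Path(archive_path)
    # Mode "w:gz" turns on gzip compression, matching a .tar.gz extension.
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps paths inside the archive relative (vg_logs/...).
        archive.add(log_dir, arcname=log_dir.name)
    return archive_path
```

Calling `archive_logs("vg_logs", "vg_logs.tar.gz")` produces a single file to attach to the email.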
If you get Redis errors with "Read Only Filesystem", you need to add the Redis directory to the systemd service file, /etc/systemd/system/redis.service, e.g.:
ReadWriteDirectories=-/mnt/redis_database
(error) MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk.
Disable saving, clean things up however you need to (redis-cli flushall, or celery --app variantgrid purge), then allow saving again:
127.0.0.1:6379> config get save
1) "save"
2) "900 1 300 10 60 10000"
127.0.0.1:6379> config set save ""
OK
127.0.0.1:6379> config set save "900 1 300 10 60 10000"
OK
127.0.0.1:6379> config get save
1) "save"
2) "900 1 300 10 60 10000"
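The disable, clean, restore sequence above can be sketched as a Python context manager so the original `save` schedule is always put back. It is only a sketch: `rdb_saving_disabled` is a hypothetical name, and `client` is assumed to behave like a redis-py `redis.Redis` object, where `config_get("save")` returns a dict and `config_set(name, value)` changes the setting.

```python
from contextlib import contextmanager


@contextmanager
def rdb_saving_disabled(client):
    """Temporarily clear Redis's `save` setting, restoring it afterwards."""
    # Remember the current schedule, e.g. "900 1 300 10 60 10000".
    original = client.config_get("save")["save"]
    client.config_set("save", "")
    try:
        yield original
    finally:
        # Restore the previous schedule even if the cleanup step raised.
        client.config_set("save", original)
```

Usage would look like `with rdb_saving_disabled(client): client.flushall()`, which mirrors the redis-cli session above.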
Value: 'int' object has no attribute 'signature'.
as per:
Error: This file failed to import due to: Error: File "/home/dlawrence/localwork/variantgrid/upload/tasks/vcf/import_vcf_step_task.py", line 131, in schedule_pipeline_stage_steps parallel_tasks.append(task_class.si(upload_step.pk, 0)) File "/usr/local/lib/python3.6/dist-packages/celery/app/task.py", line 784, in si return self.signature(args, kwargs, immutable=True) Type: <class 'AttributeError'>, Value: 'int' object has no attribute 'signature'.
This is caused by not registering the Celery Task class. You need to register it, e.g.:
ClassificationImportLinkVariantsTask = app.register_task(ClassificationImportLinkVariantsTask())
You need to stop the workers first or you can't purge properly
To clear just 1 queue:
celery -A variantgrid amqp queue.purge seqauto_single_worker
To clear all the queues:
celery --app variantgrid purge
Troubleshooting Variants - dupes, deleting bad inserts etc