Selfheal disk devicename change preventing ASDs to start automatically #1899

Open
FastGeert opened this issue Oct 10, 2018 · 0 comments

When a disk dies or is removed, the remaining disks can get a different kernel name (sda, ...) the next time the machine boots.

When this happens, ASDs fail to start and trigger the following health check (HC) errors:
image

When this HC is in error, we need to check whether it is caused by a disk device name change:

1. Get a list of backends with ASDs in error:
image

2. Get the ASD guid to restart:
image

3. Restart the ASD
image

To get the node guid needed for restarting the ASD, list the nodes (https://ovs-be-g8-4.gig.tech/api/alba/nodes/?sort=ip&contents=node_id%2C_relations&discover=false&timestamp=1539180202498) and look up the node's guid using the disk guid.

4. Analyze the result of restarting the ASD
The restart call returns a task guid in its response. Using this task guid, poll for the result with the following call: https://ovs-be-g8-4.gig.tech/api/tasks/cd1e6539-6ee2-42df-9a71-28c50836159c/?timestamp=1539182979530

If the response contains "UNIQUE constraint failed: disk.name", as in the response below, then we should run the healing code in step 5.
image
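
The poll-and-check logic of step 4 could be sketched roughly as follows. This is only an illustration under assumptions: `fetch_task` is a hypothetical callable that performs the authenticated GET on the tasks URL and returns the decoded JSON, and the `ready` field in the task payload is assumed rather than confirmed here. The error string is searched for in the whole serialized response, so no assumption is made about which field carries it.

```python
import json
import time

UNIQUE_DISK_NAME_ERROR = "UNIQUE constraint failed: disk.name"


def wait_for_task(fetch_task, task_guid, interval=2.0, max_polls=30, sleep=time.sleep):
    """Poll a task until it is finished, or give up after max_polls tries.

    fetch_task is a hypothetical callable: it takes the task guid and
    returns the decoded JSON response of /api/tasks/<task_guid>/ as a dict.
    """
    for _ in range(max_polls):
        task = fetch_task(task_guid)
        # Assumption: the payload has a boolean 'ready' field.
        if task.get("ready"):
            return task
        sleep(interval)
    raise RuntimeError("task {} not finished after {} polls".format(task_guid, max_polls))


def needs_disk_name_heal(task):
    """Return True if the task response mentions the disk.name constraint error."""
    # Search the whole serialized response instead of guessing the field name.
    return UNIQUE_DISK_NAME_ERROR in json.dumps(task, default=str)
```

With a real `fetch_task` in place, `needs_disk_name_heal(wait_for_task(fetch_task, task_guid))` would decide whether step 5 has to run.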

5. Heal the ASDs with the following piece of Python

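from source.dal.lists.disklist import DiskList

# Rename every stored disk so that the old names no longer collide
# with the UNIQUE constraint on disk.name.
disks = DiskList.get_disks()
for disk in disks:
    disk.name = '{}_new'.format(disk.name)
    disk.save()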

6. Restart the asd-manager
systemctl restart asd-manager

7. Retrigger the healthcheck
Make sure, though, that this does not end up in an endless loop.
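
One way to avoid the endless loop is to cap the number of heal attempts. A minimal sketch, assuming two hypothetical callables that are not part of any existing code: `check_health` retriggers the health check and returns True when it passes, and `heal` runs steps 5 and 6.

```python
def heal_until_healthy(check_health, heal, max_attempts=3):
    """Heal and re-check, but never more than max_attempts times.

    check_health and heal are hypothetical callables supplied by the caller.
    Returns the number of heal attempts that were needed.
    """
    for attempt in range(max_attempts + 1):
        if check_health():
            return attempt
        if attempt == max_attempts:
            break  # Still failing after the last allowed heal: stop here.
        heal()
    raise RuntimeError(
        "health check still failing after {} heal attempts".format(max_attempts)
    )
```

When the limit is reached the function raises instead of retrying, so a failure that is not caused by a device name change gets escalated rather than healed over and over.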
