Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resources stuck on DELETING #19

Open
kvaps opened this issue Oct 2, 2018 · 24 comments
Open

Resources stuck on DELETING #19

kvaps opened this issue Oct 2, 2018 · 24 comments

Comments

@kvaps
Copy link

kvaps commented Oct 2, 2018

I have some problem with removing my resources.
I presume it may be same problem connected with long lvm output #8
Any way I can't delete my resources even after correcting global_filter for my lvm.conf

satellite's log:

23:49:44.854 [MainWorkerPool-7] ERROR LINSTOR/Satellite - Problem of type 'java.lang.NullPointerException' logged to report number 5BAFAC32-CDA0B-000000

23:49:44.863 [MainWorkerPool-7] ERROR LINSTOR/Satellite - Access to deleted resource [Report number 5BAFAC32-CDA0B-000001]

ErrorReport-5BAFAC32-CDA0B-000000.log
ErrorReport-5BAFAC32-CDA0B-000001.log

This problem occurs quite often for my installation.

@kvaps
Copy link
Author

kvaps commented Oct 2, 2018

After restarting it is still stuck, but now another error, lvm timed out:

The most interesting, that data/tgt108_00000 already not existing in my system

ErrorReport-5BB322C9-CDA0B-000000.log
ErrorReport-5BB322C9-CDA0B-000001.log

@ghernadi
Copy link
Contributor

ghernadi commented Oct 2, 2018

I will look into this, but can you please re-upload ErrorReport-5BAFAC32-CDA0B-000000.log, as it has the same content as ErrorReport-5BAFAC32-CDA0B-000001.log

@kvaps
Copy link
Author

kvaps commented Oct 2, 2018

@ghernadi, sure, there is logs from another node, with the same problem:

ErrorReport-5BAF8C77-12FF9-000000.log
ErrorReport-5BAF8C77-12FF9-000001.log

@kvaps kvaps changed the title Resources suck on DELETING Resources stuck on DELETING Oct 2, 2018
@ghernadi
Copy link
Contributor

ghernadi commented Oct 2, 2018

Alright, the NullPointerException is already fixed, included in the next release.

The "Access to deleted resource" is most likely a consequence of the previous NullPointerException. However, the NullPointerException triggered this behaviour, but there is definitely a nasty bug in the TransactionMgr which we have to fix. In other words: Nice catch, thank you for finding this :)

Regarding the timeout of lvremove ... command - our default timeouts are 45 seconds as far as I remember. The other issue you referenced was that the satellite disconnects when any external command takes longer than 5 seconds (which is allowed, even by linstor's assumptions). This 5-second-timeout-disconnect bug is fixed.
However, I'm not sure what we can do about the 45 second lvremove ... issue, other than increase linstor's timeout which is not really a preferable choice...

@kvaps
Copy link
Author

kvaps commented Oct 2, 2018

This 5-second-timeout-disconnect bug is fixed.

In the current release?
I've tried to repeeat lvremove -f data/tgt108_00000 manually, it was executing little more than 5 second.

Nice catch, thank you for finding this

I like linstor and drbd, and I'm glad to help you, thanks for your work!

@ghernadi
Copy link
Contributor

ghernadi commented Oct 2, 2018

In the current release?

No, sorry, it is fixed internally and will be in the next release...

@kvaps
Copy link
Author

kvaps commented May 18, 2020

Hi, we just faced with the same situation (on 1.7.1):

# linstor r l -r pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b pvc-f8aa2cba-8784-4a21-b8c1-fc192272d8d9 pvc-7fbdf685-39b0-43b9-80f3-a2bbe0e144a4 -a
+-------------------------------------------------------------------------------------------------+
| ResourceName                             | Node  | Port | Usage  | Conns             |    State |
|=================================================================================================|
| pvc-7fbdf685-39b0-43b9-80f3-a2bbe0e144a4 | m1c12 | 7426 | Unused | Ok                | UpToDate |
| pvc-7fbdf685-39b0-43b9-80f3-a2bbe0e144a4 | m1c6  | 7426 | Unused | Connecting(m1c10) | UpToDate |
| pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b | m1c12 | 7350 | Unused | Ok                | UpToDate |
| pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b | m1c6  | 7350 | Unused | Connecting(m1c5)  | UpToDate |
| pvc-f8aa2cba-8784-4a21-b8c1-fc192272d8d9 | m1c12 | 7365 | InUse  | Ok                | UpToDate |
| pvc-f8aa2cba-8784-4a21-b8c1-fc192272d8d9 | m1c4  | 7365 | Unused | Ok                | UpToDate |
+-------------------------------------------------------------------------------------------------+
# linstor r d m1c5 pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b
SUCCESS:
Description:
    Node: m1c5, Resource: pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b marked for deletion.
Details:
    Node: m1c5, Resource: pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b UUID is: 48e29d6f-3f80-4615-97a6-2a81b1af0b64
SUCCESS:
    Deleted 'pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b' on 'm1c5'
ERROR:
Description:
    (Node: 'm1c6') Creation of the DRBD configuration file for resource 'pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b' failed due to an I/O error
Cause:
    Creation of the DRBD configuration file failed due to an I/O error
Correction:
    - Check whether enough free space is available for the creation of the file
    - Check whether the application has write access to the target directory
    - Check whether the storage is operating flawlessly
Details:
    The error reported by the runtime environment or operating system is:
    No space left on device
Show reports:
    linstor error-reports show 5EBE5356-85D84-000080
SUCCESS:
    Notified 'm1c12' that 'pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b' is being deleted on Node(s): [m1c5]
command terminated with exit code 10
# linstor error-reports show 5EBE5356-85D84-000080
None
# linstor v l -r pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b pvc-f8aa2cba-8784-4a21-b8c1-fc192272d8d9 pvc-7fbdf685-39b0-43b9-80f3-a2bbe0e144a4 -a
+-------------------------------------------------------------------------------------------------------------------------------------------+
| Node  | Resource                                 | StoragePool          | VolNr | MinorNr | DeviceName    | Allocated | InUse  |    State |
|===========================================================================================================================================|
| m1c10 | pvc-7fbdf685-39b0-43b9-80f3-a2bbe0e144a4 | DfltDisklessStorPool |     0 |    1446 | /dev/drbd1446 |           |        |  Unknown |
| m1c12 | pvc-7fbdf685-39b0-43b9-80f3-a2bbe0e144a4 | thindata             |     0 |    1446 | /dev/drbd1446 |  1.41 GiB | Unused | UpToDate |
| m1c6  | pvc-7fbdf685-39b0-43b9-80f3-a2bbe0e144a4 | thindata             |     0 |    1446 | /dev/drbd1446 |  1.41 GiB | Unused | UpToDate |
| m1c12 | pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b | thindata             |     0 |    1350 | /dev/drbd1350 |  1.40 GiB | Unused | UpToDate |
| m1c5  | pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b | DfltDisklessStorPool |     0 |    1350 | /dev/drbd1350 |           |        |  Unknown |
| m1c6  | pvc-9ed4b88e-3ba3-400c-bf9b-4750a3384b6b | thindata             |     0 |    1350 | /dev/drbd1350 |  1.40 GiB | Unused | UpToDate |
| m1c12 | pvc-f8aa2cba-8784-4a21-b8c1-fc192272d8d9 | thindata             |     0 |    1365 | /dev/drbd1365 |  3.14 GiB | InUse  | UpToDate |
| m1c4  | pvc-f8aa2cba-8784-4a21-b8c1-fc192272d8d9 | thindata             |     0 |    1365 | /dev/drbd1365 |  3.15 GiB | Unused | UpToDate |
+-------------------------------------------------------------------------------------------------------------------------------------------+

The error shows No space left on device but all nodes have enough. ErrorReport is not generated for some reson.

@kvaps
Copy link
Author

kvaps commented May 18, 2020

my bad No space left on device was meaning rootfs but not a storagepool itself, it was overfilled on m1c6, but anyway I guessing you'd like to handle this error more correct in the future.

@ghernadi
Copy link
Contributor

Hehe,
I guess linstor was also not able to generate the ErrorReport due to No space left on device? :)

But yes, I will think about what we could do in such a situation :)

@raltnoeder
Copy link
Member

It used to write to stderr if writing the error report fails due to an IOException, so if that still works, it might be in a journal entry or syslog entry. Provided that those were not out of storage space as well...

@kvaps
Copy link
Author

kvaps commented Jan 22, 2021

Hi, I just faced with the similar problem on another installation,

╭───────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node     ┊ Port ┊ Usage ┊ Conns ┊    State ┊ CreatedOn ┊
╞═══════════════════════════════════════════════════════════════════════╡
┊ test2        ┊ proxmox1 ┊ 7077 ┊       ┊ Ok    ┊ DELETING ┊           ┊
╰───────────────────────────────────────────────────────────────────────╯

this command just stuck:

linstor r d proxmox1 test2

tace log:

14:03:25.527 [grizzly-http-server-5] DEBUG LINSTOR/Controller - SYSTEM - REST access api 'DelRsc' from 'RestClient(10.28.28.76; 'PythonLinstor/1.4.2 (API1.0.4)')'
14:03:25.527 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - Peer RestClient(10.28.28.76; 'PythonLinstor/1.4.2 (API1.0.4)'), Background operation 'DelRsc' scope 'Delete resource' start
14:03:25.528 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - Updating RESOURCES's flags from [DELETE] to [DELETE] (NodeName=proxmox1 ResName=test2)
14:03:25.528 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - RESOURCES's flags updated from [DELETE] to [DELETE] (NodeName=proxmox1 ResName=test2)
14:03:25.528 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - Updating VOLUMES's flags from [DELETE] to [DELETE] (NodeName=proxmox1 ResName=test2 VolNum=0)
14:03:25.529 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - VOLUMES's flags updated from [DELETE] to [DELETE] (NodeName=proxmox1 ResName=test2 VolNum=0)
14:03:25.529 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - Peer RestClient(10.28.28.76; 'PythonLinstor/1.4.2 (API1.0.4)'), Background operation 'DelRsc' scope 'Delete resource' end
14:03:25.529 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - Peer RestClient(10.28.28.76; 'PythonLinstor/1.4.2 (API1.0.4)'), Background operation 'DelRsc' scope 'Update for resource deletion' start
14:03:25.529 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - Peer RestClient(10.28.28.76; 'PythonLinstor/1.4.2 (API1.0.4)'), Background operation 'DelRsc' scope 'Update for resource deletion' end
14:03:25.530 [grizzly-http-server-5] TRACE LINSTOR/Controller - SYSTEM - Peer Node: 'proxmox1', API call 87 'ChangedRsc' send
14:03:25.531 [MainWorkerPool-7] TRACE LINSTOR/Controller - SYSTEM - Peer Node: 'proxmox1', API call 87 answer received (seq 338)

@kvaps
Copy link
Author

kvaps commented Jan 22, 2021

UPD: After the manual removal via lvremove. I just got an interesting error in linstor, with no error report:

# linstor v l -r test2
╭──────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node     ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName ┊ Allocated ┊ InUse ┊ State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════╡
┊ proxmox1 ┊ test2    ┊ drbdpool    ┊     0 ┊    1074 ┊ None       ┊           ┊       ┊ Error ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
ERROR:
Description:
    Node: 'proxmox1', resource: 'test2', volume: 0 - Device provider threw a storage exception
Cause:
    The volume could not be found on the system.

@kvaps
Copy link
Author

kvaps commented Jan 22, 2021

Creation of new resource is also stuck:

linstor rd c test4
linstor vd create test4 1G
linstor r c proxmox1 test4  -s drbdpool

controller trace log:

satellite trace log:

@kvaps
Copy link
Author

kvaps commented Jan 22, 2021

Probably related stack trace:

Jan 22 15:18:04 proxmox1 Satellite[21162]: 15:18:04.607 [MainWorkerPool-11] INFO  LINSTOR/Satellite - SYSTEM - Resource 'test5' created for node 'proxmox1'.
Jan 22 15:18:05 proxmox1 Satellite[21162]: Exception in thread "DeviceManager" java.lang.NoSuchMethodError: java.nio.ByteBuffer.limit(I)Ljava/nio/ByteBuffer;
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.layer.drbd.utils.MdSuperblockBuffer.<clinit>(MdSuperblockBuffer.java:40)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.layer.storage.WipeHandler.quickWipe(WipeHandler.java:47)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.layer.storage.AbsStorageProvider.createVolumes(AbsStorageProvider.java:426)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.layer.storage.AbsStorageProvider.process(AbsStorageProvider.java:299)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.layer.storage.StorageLayer.process(StorageLayer.java:281)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.core.devmgr.DeviceHandlerImpl.process(DeviceHandlerImpl.java:789)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.layer.drbd.DrbdLayer.processChild(DrbdLayer.java:452)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.layer.drbd.DrbdLayer.adjustDrbd(DrbdLayer.java:568)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.layer.drbd.DrbdLayer.process(DrbdLayer.java:387)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.core.devmgr.DeviceHandlerImpl.process(DeviceHandlerImpl.java:789)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.core.devmgr.DeviceHandlerImpl.processResourcesAndSnapshots(DeviceHandlerImpl.java:335)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.core.devmgr.DeviceHandlerImpl.dispatchResources(DeviceHandlerImpl.java:145)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.core.devmgr.DeviceManagerImpl.dispatchResources(DeviceManagerImpl.java:268)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.core.devmgr.DeviceManagerImpl.phaseDispatchDeviceHandlers(DeviceManagerImpl.java:924)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.core.devmgr.DeviceManagerImpl.devMgrLoop(DeviceManagerImpl.java:627)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at com.linbit.linstor.core.devmgr.DeviceManagerImpl.run(DeviceManagerImpl.java:545)
Jan 22 15:18:05 proxmox1 Satellite[21162]:         at java.lang.Thread.run(Thread.java:748)

@kvaps
Copy link
Author

kvaps commented Jan 22, 2021

fixed by swaping java 1.8 to 1.11

ln -sf /usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java /etc/alternatives/java

@ghernadi
Copy link
Contributor

ghernadi commented Jan 25, 2021

Hi!

The trace logs from #19 (comment) does not really help as that shows only that the controller set the DELETE flags on rsc and vlm, but not why the satellite does not respond to the update..

Regarding "no error report" from #19 (comment), even if Linstor does not tell you "here is the error report", you might still want to check linstor err list, as that will query the satellite for new error reports.
If an error report is generated, it should usually be included in the client answer, so we might want to look into this.

#19 (comment) again - ErrorReports would be helpful here.
Also the satellite-trace.log shows that you have quite a lot of *-deleted LVs. I guess those are from previous tests that somehow failed or got stuck as you described before. You might want to clean them up manually.

Regarding Java version 8 -> 11 from the last 2 comments.. Looks like there are some incompatibilities if a Java class was compiled with 9+ and being run with 8... Apparently Linstor was not compiled with Java 8 (or at least with 8 as target) before packaging. we will look into this, thanks for the hint!

@rp-
Copy link
Contributor

rp- commented Jan 25, 2021

How do you get your packages? self build?

@kvaps
Copy link
Author

kvaps commented Jan 25, 2021

@rp- this was proxmox-5 cluster upgraded to proxmox-6, linstor was installed directly from linbit's repository:

# cat /etc/apt/sources.list.d/linbit.list 
deb http://packages.linbit.com/proxmox/ proxmox-6 drbd-9.0

Unfortunately, I've got an access to it already after problem has occurred.
The system update and update of linstor components - this is the first thing which I've done there.

@kvaps
Copy link
Author

kvaps commented Jan 25, 2021

@ghernadi unfortunately I have not found any additional error reports :-(

@kvaps
Copy link
Author

kvaps commented Jan 9, 2023

Just faced this again:

# linstor r l -r pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node       ┊ Port ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d ┊ hf-virt-01 ┊ 7008 ┊        ┊ Ok    ┊ DELETING ┊ 2023-01-09 08:49:06 ┊
┊ pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d ┊ hf-virt-02 ┊ 7008 ┊ Unused ┊       ┊  Unknown ┊                     ┊
┊ pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d ┊ hf-virt-03 ┊ 7008 ┊ Unused ┊       ┊  Unknown ┊                     ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
# linstor rd d pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d
SUCCESS:
Description:
    Resource definition 'pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d' marked for deletion.
Details:
    Resource definition 'pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d' UUID is: 51dc6292-36c8-4eb9-91b0-495f507b8ea6
SUCCESS:
    Notified 'hf-virt-01' that diskless resources of 'pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d' are being deleted
ERROR:
    (Node: 'hf-virt-02') An unknown exception occurred while processing the resource pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d
Show reports:
    linstor error-reports show 63BA4220-57D8E-000054
ERROR:
    (Node: 'hf-virt-03') An unknown exception occurred while processing the resource pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d
Show reports:
    linstor error-reports show 63BA421E-94200-000063
command terminated with exit code 10
ERROR REPORT 63BA4220-57D8E-000054

============================================================

Application:                        LINBIT�� LINSTOR
Module:                             Satellite
Version:                            1.20.0
Build ID:                           9c6f7fad48521899f7a99c564b1d33aeacfdbfa8
Build time:                         2022-11-07T16:37:38+00:00
Error time:                         2023-01-09 08:53:29
Node:                               hf-virt-02

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         NullPointerException
Class canonical name:               java.lang.NullPointerException
Generated at:                       Method 'start', Source file 'ProcessBuilder.java', Line #1090


Error context:
    An error occurred while processing resource 'Node: 'hf-virt-02', Rsc: 'pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d''

Call backtrace:

    Method                                   Native Class:Line number
    start                                    N      java.lang.ProcessBuilder:1090
    start                                    N      java.lang.ProcessBuilder:1071
    exec                                     N      com.linbit.extproc.ExtCmd:128
    exec                                     N      com.linbit.extproc.ExtCmd:89
    getCurrentGID                            N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:318
    allVlmsMetaDataNew                       N      com.linbit.linstor.layer.drbd.DrbdLayer:1738
    condInitialOrSkipSync                    N      com.linbit.linstor.layer.drbd.DrbdLayer:1642
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:814
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:396
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:900
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:358
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:168
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
    run                                      N      java.lang.Thread:829


END OF ERROR REPORT.
ERROR REPORT 63BA421E-94200-000063

============================================================

Application:                        LINBIT�� LINSTOR
Module:                             Satellite
Version:                            1.20.0
Build ID:                           9c6f7fad48521899f7a99c564b1d33aeacfdbfa8
Build time:                         2022-11-07T16:37:38+00:00
Error time:                         2023-01-09 08:53:29
Node:                               hf-virt-03

============================================================

Reported error:
===============

Category:                           RuntimeException
Class name:                         NullPointerException
Class canonical name:               java.lang.NullPointerException
Generated at:                       Method 'start', Source file 'ProcessBuilder.java', Line #1090


Error context:
    An error occurred while processing resource 'Node: 'hf-virt-03', Rsc: 'pvc-484e23e7-f6a5-4d16-a46e-41620f8c4a2d''

Call backtrace:

    Method                                   Native Class:Line number
    start                                    N      java.lang.ProcessBuilder:1090
    start                                    N      java.lang.ProcessBuilder:1071
    exec                                     N      com.linbit.extproc.ExtCmd:128
    exec                                     N      com.linbit.extproc.ExtCmd:89
    getCurrentGID                            N      com.linbit.linstor.layer.drbd.utils.DrbdAdm:318
    allVlmsMetaDataNew                       N      com.linbit.linstor.layer.drbd.DrbdLayer:1738
    condInitialOrSkipSync                    N      com.linbit.linstor.layer.drbd.DrbdLayer:1642
    adjustDrbd                               N      com.linbit.linstor.layer.drbd.DrbdLayer:814
    process                                  N      com.linbit.linstor.layer.drbd.DrbdLayer:396
    process                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:900
    processResourcesAndSnapshots             N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:358
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:168
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
    run                                      N      java.lang.Thread:829


END OF ERROR REPORT.

@rp-
Copy link
Contributor

rp- commented Jan 9, 2023

is the drbdmeta binary accessible on that node?

@kvaps
Copy link
Author

kvaps commented Jan 9, 2023

It is accessible In pod with linstor-satellite

@raltnoeder
Copy link
Member

While it could be triggered by an installation problem, the fact that it is a NullPointerException means that there must be a programming error somewhere on the code path too.
It seems that somewhere in the generation of the command to be executed by ExtCmd, either the entire command arguments array or one of its elements is a null reference, because that's what's being handled in ProcessBuilder around line 1090.

@mhkarimi1383
Copy link

Hi

checkout #432

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants