My cluster came apart today and I can't fix it.
Alright, so I logged into my Proxmox cluster today (4 nodes total) and found half of it missing. Two nodes were totally MIA. After some rebooting, I now have two groups:
Two nodes see each other but not the rest.
The other two nodes are doing the same thing, except these two don't always see each other and aren't really working that well. Commands lag.
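Side note for anyone following along: as far as I understand it, the default quorum rule is a strict majority of votes. Assuming every node has exactly one vote and there is no QDevice (both are assumptions on my part, I have not verified my config), a 2/2 split of 4 nodes would leave neither half quorate, which might be part of why things lock up. A minimal Python sketch of that arithmetic, with made-up function names, not anything Proxmox ships:

```python
def votes_needed(total_votes: int) -> int:
    """Strict majority: more than half of all expected votes."""
    return total_votes // 2 + 1


def is_quorate(visible_votes: int, total_votes: int) -> bool:
    """True if a partition that can see visible_votes reaches quorum."""
    return visible_votes >= votes_needed(total_votes)


if __name__ == "__main__":
    total = 4  # my cluster: 4 nodes, ASSUMED 1 vote each
    for group_size in (2, 2):
        print(f"group of {group_size}: quorate={is_quorate(group_size, total)}")
```

If that assumption holds, at least 3 of the 4 nodes have to see each other before either side is quorate again.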
And it gets weirder. Running qm list on some nodes takes forever, if it doesn't time out. Same with pvecm commands, which is driving me nuts. I think this is mostly on the 2nd group.
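To put numbers on the lag instead of just staring at a hung shell, something like the following could time each command with a hard limit. This is only a sketch: run_with_timeout is a helper name I made up, the 15-second limit is arbitrary, and if a process is stuck in uninterruptible I/O the kill may not take effect promptly.

```python
import subprocess
import time


def run_with_timeout(cmd, timeout_s=10.0):
    """Run cmd and return (exit_code, seconds, stdout).

    exit_code is None if the command did not finish within timeout_s.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
        return proc.returncode, time.monotonic() - start, proc.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception
        return None, time.monotonic() - start, ""


if __name__ == "__main__":
    # Meant to be run on a cluster node, where these commands exist
    for cmd in (["pvecm", "status"], ["qm", "list"]):
        code, secs, _ = run_with_timeout(cmd, timeout_s=15.0)
        state = "TIMED OUT" if code is None else f"exit {code}"
        print(f"{' '.join(cmd)}: {state} after {secs:.1f}s")
```

Running it on each node would at least show which group is slow and by how much.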
I even went so far as to delnode one of the problematic nodes, moved its VM config files (/etc/pve/nodes/<node-name>/qemu-server) to another directory, rejoined it to the cluster, and moved the files back. That kind of worked? The VMs show up in the GUI now, but only sometimes with full details (RAM, name, etc.). But then that node went right back to showing as clustered but unable to reach the other nodes.
What’s the deal with this? Why is everything so slow and unstable? Anyone dealt with this level of cluster drama before? You could all see each other, and nothing changed until you made me change things... WHY!!