# HowTo setup a 3-node Proxmox VE 7.1 cluster and configure it for High Availability on a single machine - Part 6 - for learning & testing purposes only

In our earlier [Part 5 article](https://blog.habitats.tech/howto-setup-a-3-node-proxmox-ve-71-cluster-and-configure-it-for-high-availability-on-a-single-machine-part-5-for-learning-and-testing-purposes-only)
 we summarised the 3-node PVE cluster operations to organise our thoughts prior to going fully automated High Availability.

In this guide we will explore the automated Guest failover mechanism. For a detailed understanding read the official [PVE Clustering Guide](https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_pvecm).

> **Guest** = a CT or VM allocated virtualised hardware resources to run

> **GID** = Guest ID (CTID or VMID) - a unique number > 100 across a cluster

> **CT** = Container (LXC)

> **VM** = Virtual Machine (KVM)

> **FS** = FileSystem

> **HA** = High Availability

> **PVE** = Proxmox VE

> **WAC** = Web Admin Console (the web browser GUI to administer PVE nodes & clusters)

In order for Guests to automatically failover from a failed node to a running node we must setup HA.

### Replication

As a preamble, read the [PVE Storage Replication Guide](https://192.168.0.11:8006/pve-docs/chapter-pvesr.html#pvesr_schedule_time_format).

Let's setup replication for a number of Guests as follows:

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644509603304/BxA4kgdIh.png)
![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644510187410/KaZNcXKzJ.png)

> A Guest cannot be replicated on the Node it is running on.

Replication Schedule status, can be listed per Node and per Guest.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644510479111/vFCFTdDj0.png)

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644511273081/CLkemrJMd.png)

### High Availability (HA)

The first thing to consider is if we require [HA Groups](https://192.168.0.10:8006/pve-docs/chapter-ha-manager.html#ha_manager_groups). A Group is a collection of cluster nodes. A Guest can be restricted to run only on the members of such group.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644513239841/lddwBduQO.png)

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644513411720/HpXhE34q6.png)

To setup automated failover for Guest with GID 100 create the following resource. We assign this Guest to HA Group hveg-0 because if Node pve-0 fails the Guest will automatically failover to the remaining Nodes in the Cluster; however when Node pve-0 comes back again, the Guest will automatically failover back to pve-0 because it has been assigned to the HA Group hveg-0 which has only one Node pve-0 assigned.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644513739040/eIaH6iaaV.png)

We assign all other Guests as follows:

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644514402446/fC1FqMnbH.png)

Let's test HA. We will stop VM **100 (pve-0)** on **pve-253**. This action will take offline Cluster Node pve-0.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644515168889/LNqa5B52C.png)

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644515212846/nBxstdTMh.png)

Because we have stopped Cluster Node pve-0, the WAC for pve-0 will stop as well. We now must use the WAC of pve-1 or pve-2 (https://192.168.0.11:8006 or https://192.168.0.12:8006) to access the Cluster.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644515495690/FfhkQiYOK.png)

After a while the Guests of pve-0 should have migrated and spread evenly to the other nodes in the cluster.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644515643947/0kEwgYb4a.png)

Let's start VM **100 (pve-0)** on Node **pve-253**. This action will bring back online Cluster Node pve-0.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644515875659/grKMMysUH.png)

Let's go back to our Cluster WAC **pvec-0** and see if **100 (guest-ct-0)** and **101 (guest-vm-0)** migrate back to Cluster Node **pve-0**.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644515980647/-HC1js-kT.png)

After a while, almost there:

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644516032740/0cpl-3BRa.png)

A while longer, there are issues. The CT migrated, but the VM failed to migrate to pve-0.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644516438465/_Pz7wrxAl.png)

It seems the reason is there are VM Snapshots on pve-0.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644516164190/Uk5F1-KiY.png)

Let's delete the VM **101 (guest-vm-0)** snapshots:

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644516875135/G57Fj4EfQ.png)

Still migration fails (pve will attempt to automatically migrate 101 (guest-vm-0) to pve-0 until it succeeds):

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644517187596/fbfgAEd06.png)

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644517229188/ZNT9_YC2F.png)

Let's remove the snapshot volumes for VM **101 (guest-vm-0)**.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644517385499/Nqf1L1NyZ.png)

We need to shutdown VM **101 (guest-vm-0)** and then remove the CD/DVD drive attached to it.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644517893006/D7SBmzJLV.png)

VM **101 (guest-vm-0)** will now migrate instantly and automatically start.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644518389868/a-3FNxuHa.png)

Let's try to take down 2 cluster nodes, pve-0 and pve-1. Go to the WAC of pve-253 and first stop VM 100 (pve-2). Wait a few mins and then stop VM 101 (pve-1). Do you think the cluster will manage to migrate everything to pve-2? Unfortunately no, because the cluster has lost QUORUM. We need at least 2 members to achieve quorum. Therefore, if we want to sustain up to 2 node failures our cluster must be 4-node.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644553867643/xhZcBChtx.png)

What will happen if pve-0 comes up, while pve-1 is down. Let's see.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644554070380/zCYjN5j8S.png)

It might take a while, but assuming QUORUM is OK, everything should start going live automatically.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1644554358320/GSXdvYATJ.png)

Now bring pve-1 into a running state and see what happens. Try to resolve the migration issues on your own.

> Note: Once you have completed VM installation from media CD/DVD ISO, you should always remove the associated drive from the VM as it is no longer needed, to avoid failover issues during automated migration.

In a future guide we will cover HA using CEPH. We will now take a break and cover other technology topics in our next posts.

---
Please consider subscribing to my blog, as you will only ever get quality content; no time wasting, advertising, spamming or other unproductive activities/practices.

Please also consider visiting and subscribing to our YouTube channel; we have recently started posting videos.

We are committed to improving and enhancing over time.

If there is something you would like us to cover in a future topic/guide please let us know.

> **Important Note:** From time to time we **enhance the content of our posts**. It is therefore recommended you link to our original post in our https://blog.habitats.tech, by either subscribing to our https://blog.habitats.tech or visiting our subreddit https://www.reddit.com/r/HabitatsTech/ (our posts in Reddit link to our original posts).
