15.6. Configuration

The HA stack is well integrated into the Proxmox VE API. For example, HA can be configured via the ha-manager command line interface or the Proxmox VE web interface; both interfaces provide an easy way to manage HA. Automation tools can use the API directly.

All HA configuration files are stored in /etc/pve/ha/, so they are automatically distributed to the cluster nodes, and all nodes share the same HA configuration.

The resource configuration file /etc/pve/ha/resources.cfg stores the list of resources managed by ha-manager. A resource configuration inside that list looks like this:

<type>: <name>
        <property> <value>
        ...

It starts with a resource type followed by a resource-specific name, separated by a colon. Together, these form the HA resource ID, which all ha-manager commands use to uniquely identify a resource (for example, vm:100 or ct:101). The next lines contain additional properties.

Here is a real-world example with one VM and one container. As you can see, the syntax of these files is simple, so you can even read or edit them with your favorite editor:

Configuration Example (/etc/pve/ha/resources.cfg). 

vm: 501
    state started
    max_relocate 2

ct: 102
    # Note: use default settings for everything

The above config was generated using the ha-manager command line tool:

# ha-manager add vm:501 --state started --max_relocate 2
# ha-manager add ct:102
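Because the format is this simple, a few lines of code are enough to read it. The following Python sketch is illustrative only and is not the parser Proxmox VE itself uses; the function name parse_section_config is hypothetical. It assumes that stanza headers are unindented, that property lines are indented, and that lines starting with # are comments, and it joins the type and name of each header into the resource ID described above.

```python
from __future__ import annotations


def parse_section_config(text: str) -> dict[str, dict[str, str]]:
    """Parse '<type>: <name>' stanzas with indented '<property> <value>' lines.

    Illustrative sketch only; not the parser used by Proxmox VE.
    """
    sections: dict[str, dict[str, str]] = {}
    current: dict[str, str] | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if not raw[0].isspace():
            # Unindented header: "<type>: <name>" becomes the ID "<type>:<name>".
            rtype, _, name = line.partition(":")
            current = sections[f"{rtype.strip()}:{name.strip()}"] = {}
        elif current is not None:
            # Indented property line: "<property> <value>".
            key, _, value = line.partition(" ")
            current[key] = value.strip()
    return sections


RESOURCES_CFG = """\
vm: 501
    state started
    max_relocate 2

ct: 102
    # Note: use default settings for everything
"""

resources = parse_section_config(RESOURCES_CFG)
print(resources)
# {'vm:501': {'state': 'started', 'max_relocate': '2'}, 'ct:102': {}}
```

Note that the header is written as "vm: 501" in the file, while the resulting ID is "vm:501", which is the form the ha-manager commands expect.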

The HA group configuration file /etc/pve/ha/groups.cfg is used to define groups of cluster nodes. A resource can be restricted to run only on the members of such a group. A group configuration looks like this:

group: <group>
       nodes <node_list>
       <property> <value>
       ...

A common requirement is that a resource should run on a specific node. Usually the resource is also able to run on other nodes, so you can define an unrestricted group with a single member:

# ha-manager groupadd prefer_node1 --nodes node1

For bigger clusters, it makes sense to define more detailed failover behavior. For example, you may want to run a set of services on node1 if possible. If node1 is not available, you want to split them equally between node2 and node3. If those nodes also fail, the services should run on node4. To achieve this, you could set the node list to the following, where a higher number means a higher priority:

# ha-manager groupadd mygroup1 --nodes "node1:2,node2:1,node3:1,node4"
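To make the priority semantics concrete, here is a minimal Python sketch that parses such a node list and picks the online members with the highest priority. It is a simplified model, not the HA manager's actual placement logic, and the helper names are hypothetical. It assumes that a node listed without a priority (node4 here) has priority 0, and it only returns the candidate nodes without modeling how services are spread across them.

```python
from __future__ import annotations


def parse_node_list(spec: str) -> dict[str, int]:
    """Parse 'node1:2,node2:1,node4' into {node: priority}.

    ASSUMPTION: a node listed without ':<pri>' gets priority 0.
    """
    priorities: dict[str, int] = {}
    for entry in spec.split(","):
        node, _, pri = entry.strip().partition(":")
        priorities[node] = int(pri) if pri else 0
    return priorities


def preferred_nodes(priorities: dict[str, int], online: set[str]) -> list[str]:
    """Return the online group members in the highest priority class."""
    available = {n: p for n, p in priorities.items() if n in online}
    if not available:
        return []
    best = max(available.values())
    return sorted(n for n, p in available.items() if p == best)


group = parse_node_list("node1:2,node2:1,node3:1,node4")
print(preferred_nodes(group, {"node1", "node2", "node3", "node4"}))  # ['node1']
print(preferred_nodes(group, {"node2", "node3", "node4"}))           # ['node2', 'node3']
print(preferred_nodes(group, {"node4"}))                             # ['node4']
```

When node1 is down, node2 and node3 form the highest available priority class, which matches the "split equally between node2 and node3" behavior described above.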

Another use case is a resource that uses other resources only available on specific nodes, let's say node1 and node2. We need to make sure that the HA manager does not use other nodes, so we need to create a restricted group with those nodes:

# ha-manager groupadd mygroup2 --nodes "node1,node2" --restricted

The above commands created the following group configuration file:

Configuration Example (/etc/pve/ha/groups.cfg). 

group: prefer_node1
       nodes node1

group: mygroup1
       nodes node2:1,node4,node1:2,node3:1

group: mygroup2
       nodes node2,node1
       restricted 1
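The difference between unrestricted and restricted groups can be sketched the same way. This hypothetical Python helper ignores priorities and only answers which nodes are eligible, assuming the behavior described above: a resource prefers online group members; if none are online, an unrestricted group (such as prefer_node1) falls back to any online cluster node, while a restricted group (such as mygroup2) yields no eligible node at all.

```python
from __future__ import annotations


def eligible_nodes(members: list[str], restricted: bool, online: set[str]) -> list[str]:
    """Nodes a resource bound to a group may run on (priorities ignored)."""
    online_members = sorted(n for n in members if n in online)
    if online_members:
        return online_members  # prefer any online group member
    if restricted:
        return []  # restricted group: never fall back to non-members
    return sorted(online)  # unrestricted group: any online cluster node will do


online = {"node3", "node4"}  # suppose node1 and node2 are both down

# prefer_node1: unrestricted, single member
print(eligible_nodes(["node1"], restricted=False, online=online))  # ['node3', 'node4']
# mygroup2: restricted to node1 and node2
print(eligible_nodes(["node2", "node1"], restricted=True, online=online))  # []
```

Modeling the two flags separately from priorities keeps each rule easy to check; a real placement decision combines both.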

The nofailback option is mostly useful to avoid unwanted resource movements during administration tasks. For example, if you need to migrate a service to a node that does not have the highest priority in the group, you need to tell the HA manager not to move this service back instantly by setting the nofailback option.

Another scenario is when a service's node was fenced and the service was recovered on another node. The admin tries to repair the fenced node and brings it back online to investigate the cause of the failure and check whether it runs stably again. Setting the nofailback flag prevents the recovered services from moving straight back to the fenced node.
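The effect of nofailback can be modeled as a simple decision. In this hypothetical Python sketch, a running service is moved back to the highest-priority online group member unless nofailback is set. It reuses the priorities of mygroup1 from the example above (with node4 assumed to have priority 0) and deliberately leaves out everything else the real HA manager takes into account.

```python
from __future__ import annotations


def failback_target(current: str, priorities: dict[str, int],
                    online: set[str], nofailback: bool) -> str | None:
    """Node a running service would be moved back to, or None to stay put."""
    if nofailback:
        return None  # nofailback set: stay on the current node
    available = {n: p for n, p in priorities.items() if n in online}
    if not available:
        return None
    best = max(available.values())
    if available.get(current) == best:
        return None  # already in the highest available priority class
    # Pick one node from the highest class (alphabetically, for determinism).
    return min(n for n, p in available.items() if p == best)


# node4 has no explicit priority in mygroup1; 0 is assumed here.
mygroup1 = {"node1": 2, "node2": 1, "node3": 1, "node4": 0}
all_online = {"node1", "node2", "node3", "node4"}

# node1 is back online while the service still runs on node2.
print(failback_target("node2", mygroup1, all_online, nofailback=False))  # node1
print(failback_target("node2", mygroup1, all_online, nofailback=True))   # None
```

With nofailback unset, the repaired node1 immediately attracts the service again; with it set, the service stays on node2 until an administrator moves it.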