vSphere Proactive HA is not a service of vSphere HA but rather a hardware monitoring feature that is available only when vSphere HA is enabled (as described in Chapter 19).

In this chapter, we will discuss the features and benefits of Proactive HA, how it works, and how you can use it to keep your virtual infrastructure up and running.

20.1 What Is vSphere Proactive HA?

vSphere Proactive HA is a feature that provides enhanced high availability protection for virtual machines by monitoring the health of the underlying physical servers and taking proactive measures to prevent potential failures.

Proactive HA monitors server hardware components, such as memory, CPU, and storage, for potential issues that could lead to failure. It then uses this information to predict when a server could fail and takes proactive measures to avoid that failure.

For example, suppose Proactive HA detects a failing memory module in a physical server. In that case, Proactive HA can automatically migrate the VMs from that ESXi host to a healthy ESXi host in the same Cluster. This process allows the VMs to continue running without interruption, even if the degraded server later fails.

Built on the core of vSphere HA, Proactive HA provides advanced monitoring and remediation capabilities to help prevent downtime and disruptions due to hardware failure.

20.2 How Does Proactive HA Work?

Proactive HA detects when an ESXi host is about to fail and then takes action to protect the VMs running on that host. It does this by using vSphere HA and Distributed Resource Scheduler (DRS) to monitor the health of each ESXi host in a cluster.

Proactive HA has the following four stages:

  • Monitoring: Proactive HA monitors hardware components of physical servers, such as memory, CPU, storage, and network. It also monitors environmental factors like temperature, power consumption, and other server health conditions.

  • Predictive Analysis: The monitoring data is analyzed by vSphere to predict potential hardware failures that might happen soon. For example, if a memory module starts reporting errors or the temperature rises above the acceptable limit, Proactive HA predicts that the server will soon encounter a problem.

  • Remediation: Once Proactive HA detects an impending hardware failure, it takes proactive measures to limit its impact. For example, if a server is predicted to suffer a memory failure, Proactive HA can automatically migrate its VMs to healthy servers in the Cluster. This ensures the VMs continue running without disruption, even if the server fails.

  • Notification: Proactive HA sends notifications to administrators when it detects an issue and takes action. The notification includes details of the problem and what action was taken, allowing administrators to quickly respond if further action is necessary.

Proactive HA provides additional protection for VMs in clusters that use vSphere HA and vSphere DRS by proactively identifying and mitigating potential issues before they cause downtime or data loss. It helps to increase the reliability of vSphere environments by detecting and addressing hardware issues early on.
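The four stages can be illustrated with a short Python sketch. This is a simplified model for reasoning about the flow, not how vSphere implements it: the `Host` class, the `Health` values, and the round-robin placement in `run_proactive_ha` are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    MODERATE = "moderately degraded"
    SEVERE = "severely degraded"


@dataclass
class Host:
    name: str
    # Stage 1 (monitoring): the health value last reported for this host.
    health: Health = Health.HEALTHY
    vms: list = field(default_factory=list)


def run_proactive_ha(hosts):
    """Walk the four stages once and return the notifications produced."""
    notifications = []
    healthy = [h for h in hosts if h.health is Health.HEALTHY]
    for host in hosts:
        # Stage 2 (predictive analysis), reduced here to a health check.
        if host.health is Health.HEALTHY:
            continue
        # Stage 3 (remediation): move VMs to healthy hosts, round robin.
        moved = []
        while host.vms and healthy:
            target = healthy[len(moved) % len(healthy)]
            vm = host.vms.pop(0)
            target.vms.append(vm)
            moved.append(vm)
        # Stage 4 (notification): record what was detected and what was done.
        notifications.append(
            f"{host.name} is {host.health.value}; migrated {len(moved)} VM(s)"
        )
    return notifications


cluster = [
    Host("esxi-01", Health.MODERATE, ["vm-a", "vm-b"]),
    Host("esxi-02"),
    Host("esxi-03"),
]
for message in run_proactive_ha(cluster):
    print(message)
```

In this sketch, the two VMs on the degraded host end up on the two healthy hosts, and a single notification is produced. A real cluster would let DRS choose the target hosts rather than using round robin.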

20.3 How to Configure Proactive HA

To enable and configure Proactive HA, go to vCenter, select the Cluster for which you want to enable Proactive HA, click the Configure tab, select vSphere Availability, and click Edit (directly below the Edit button for vSphere HA), as shown in Figure 20-1.

Figure 20-1
A screenshot titled Vembu Cluster highlights the following options: vSphere Availability. Proactive HA is turned off.


In the Edit Proactive HA dialog box, enable Proactive HA and select options for Automation Level and Remediation. Figure 20-2 shows the expanded drop-down lists for each setting, described next.

Figure 20-2
A screenshot titled Edit Proactive HA highlights the following options: Proactive HA, Automation Level, and Remediation.


  • Automation Level: Determines whether host quarantine, maintenance mode, and VM migrations are recommendations or automatic.

    • Manual: DRS will make recommendations for VMs and hosts.

    • Automated: Virtual machines will be migrated to healthy hosts, and degraded hosts will be entered into quarantine or maintenance mode depending on the configured Proactive HA remediation mode.

  • Remediation: There are three types of Remediation mode in Proactive HA:

    • Quarantine Mode: This mode is applicable for all types of failures. It aims to strike a balance between performance and availability by limiting the use of partially degraded hosts. The system will continue to utilize these hosts as long as doing so does not impact VM performance. This approach is beneficial for ensuring availability while avoiding the potential overreaction of moving all operations from a host that is still capable of functioning adequately.

    • Mixed Mode: Designed for varying levels of failure, this mode uses a tiered approach. For moderate failures, it behaves like Quarantine Mode, avoiding the use of moderately degraded hosts as long as doing so does not impact VM performance. For severe failures, it escalates the response to Maintenance Mode, ensuring that VMs are not run on hosts that are considered severely compromised.

    • Maintenance Mode: The most conservative of the three, Maintenance Mode is triggered for any level of failure. It guarantees that no VMs will continue to operate on a failing host by evacuating all VMs from the affected host to ensure maximum protection and stability.
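The three remediation modes amount to a small decision table: the selected mode and the severity of the failure together determine what happens to the host. The following Python sketch expresses that table. The enum and function names are hypothetical and chosen for illustration only; they are not vSphere identifiers.

```python
from enum import Enum


class Severity(Enum):
    MODERATE = "moderate"
    SEVERE = "severe"


class Remediation(Enum):
    QUARANTINE = "Quarantine mode"
    MIXED = "Mixed mode"
    MAINTENANCE = "Maintenance mode"


class HostAction(Enum):
    QUARANTINE = "quarantine host"
    MAINTENANCE = "enter maintenance mode"


def host_action(mode, severity):
    """Return what happens to a degraded host under the given mode."""
    if mode is Remediation.MAINTENANCE:
        # Any level of failure evacuates the host.
        return HostAction.MAINTENANCE
    if mode is Remediation.MIXED and severity is Severity.SEVERE:
        # Mixed mode escalates only for severe failures.
        return HostAction.MAINTENANCE
    # Quarantine mode, or a moderate failure under Mixed mode.
    return HostAction.QUARANTINE


for mode in Remediation:
    for severity in Severity:
        action = host_action(mode, severity)
        print(f"{mode.value:17} {severity.value:9} {action.value}")
```

Reading the printed table row by row shows that Mixed mode differs from Quarantine mode only in the severe case, and from Maintenance mode only in the moderate case.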

For purposes of the test lab, set the options to fully automatic, as shown in Figure 20-3: Automation Level to Automated and Remediation to Mixed mode.

Figure 20-3
A screenshot contains the following options. Automation Level. Remediation. The automated and mixed mode drop-down menus are highlighted.

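If you keep a record of your lab settings, it can help to write the choices down as data and check them against the prerequisites this chapter mentions (DRS enabled on the Cluster and a hardware provider integration). The Python sketch below does that. It does not call vCenter, and `ProactiveHaSettings` and `check_settings` are hypothetical names invented for this example; the option strings simply mirror the labels in the Edit Proactive HA dialog box.

```python
from dataclasses import dataclass

AUTOMATION_LEVELS = ("Manual", "Automated")
REMEDIATION_MODES = ("Quarantine mode", "Mixed mode", "Maintenance mode")


@dataclass(frozen=True)
class ProactiveHaSettings:
    enabled: bool
    automation_level: str
    remediation: str


def check_settings(settings, drs_enabled, provider_registered):
    """Return a list of problems; an empty list means the settings look sane."""
    problems = []
    if settings.automation_level not in AUTOMATION_LEVELS:
        problems.append(f"unknown automation level: {settings.automation_level}")
    if settings.remediation not in REMEDIATION_MODES:
        problems.append(f"unknown remediation mode: {settings.remediation}")
    if settings.enabled and not drs_enabled:
        problems.append("DRS must be enabled on the Cluster")
    if settings.enabled and not provider_registered:
        problems.append("no hardware health provider is integrated")
    return problems


# The test lab choices from this section.
LAB_SETTINGS = ProactiveHaSettings(
    enabled=True, automation_level="Automated", remediation="Mixed mode"
)

print(check_settings(LAB_SETTINGS, drs_enabled=True, provider_registered=True))
print(check_settings(LAB_SETTINGS, drs_enabled=False, provider_registered=True))
```

The first call prints an empty list, and the second reports the missing DRS prerequisite.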

20.4 The Difference Between Proactive HA and DRS

Distributed Resource Scheduler (DRS), discussed in several previous chapters, is sometimes confused with Proactive HA. They are not the same thing.

Proactive HA and DRS serve different purposes, but as previously discussed, DRS must also be enabled to enable Proactive HA.

Proactive HA primarily focuses on identifying and mitigating potential hardware failures before they occur. It does this by predicting server component failures (for example, in a power supply, storage, disks, or network) and by monitoring for server temperature issues. Proactive HA then takes proactive actions to avoid this type of service disruption.

On the other hand, DRS is a feature that provides intelligent resource management for virtualized environments. DRS is designed to optimize resource utilization and balance workloads across ESXi hosts by automatically migrating VMs to hosts with more available resources, such as CPU and memory. DRS helps to ensure that resources are used efficiently and workloads are evenly distributed, which can help to improve performance and reduce the risk of performance bottlenecks.

In summary, both Proactive HA and DRS are designed to improve the performance and reliability of virtualized environments, but they act on different signals. Proactive HA focuses on identifying and mitigating hardware failures before they occur, while DRS focuses on optimizing resource utilization and balancing workloads across ESXi hosts inside the Cluster.
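One way to keep the two features apart is to look at what triggers a migration in each case. The Python sketch below is a deliberately simplified illustration: both function names are hypothetical, the health strings are invented for the example, and the 20-point tolerance is an arbitrary value, not a DRS setting.

```python
def drs_would_migrate(host_cpu_pct, cluster_avg_cpu_pct, tolerance_pct=20.0):
    """DRS-style trigger: the host is noticeably busier than the cluster average."""
    return host_cpu_pct - cluster_avg_cpu_pct > tolerance_pct


def proactive_ha_would_migrate(host_health):
    """Proactive HA-style trigger: the hardware is reported as degraded."""
    return host_health != "healthy"


# A busy host with healthy hardware: only the load-based trigger fires.
print(drs_would_migrate(90.0, 50.0), proactive_ha_would_migrate("healthy"))

# A lightly loaded host with a degraded power supply: only the health trigger fires.
print(drs_would_migrate(30.0, 50.0), proactive_ha_would_migrate("moderately degraded"))
```

The first line prints `True False` and the second prints `False True`: load drives the first decision, and hardware health drives the second.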

Proactive HA is a valuable feature in VMware’s vSphere virtualization platform that provides enhanced high-availability protection for virtual machines by monitoring the health of the underlying physical servers and taking proactive measures to prevent potential failures.

Proactive HA helps to ensure that VMs continue running without disruption, even if a server failure occurs. Proactive HA can automatically predict hardware failures in your VMware environment and take remedial actions. This feature provides an additional layer of protection alongside vSphere HA and vSphere DRS, helping to increase reliability and minimize the risk of downtime or data loss.

vSphere Proactive HA is a powerful feature for VMware administrators to ensure their virtual machines remain highly available, even when faced with hardware outages.

Note

For Proactive HA to operate correctly, it requires integration with a health provider from the server hardware vendor. This integration allows Proactive HA to receive detailed insights about the health and status of physical hardware components.

20.5 Summary

In this chapter, we explored vSphere Proactive HA, a feature that enhances the availability of virtual machines by monitoring the health of physical server components to predict and preempt potential failures. Proactive HA operates through stages of monitoring, predictive analysis, remediation, and notification to prevent server issues from disrupting virtual environments. It enables proactive VM migrations to healthier hosts, ensuring continuous operations.

The chapter also detailed how to configure Proactive HA, including setting automation levels and choosing remediation strategies to manage server health effectively. Additionally, it distinguished Proactive HA from vSphere DRS, emphasizing that Proactive HA focuses on preventing hardware failures, while vSphere DRS optimizes resource distribution and workload balance across ESXi hosts. This distinction underlines Proactive HA’s role in preemptively securing infrastructure, maintaining high availability, and minimizing downtime in your vSphere Cluster.

Chapter 21 introduces vSphere Fault Tolerance (FT). We’ll explore how it enhances the resilience of VMware environments by keeping a duplicate instance of a virtual machine so that the VM remains available. We will delve into the mechanism of FT, its advantages and limitations, and the requirements needed to implement it successfully. Finally, we will discuss how to configure FT.