Friday, July 24, 2009

In The Lab: vSphere DPM, Quirky but Functional

[caption id="attachment_821" align="alignright" width="278" caption="ESX Hosts in Standby Using DRS's DPM Extension"]ESX Hosts in Standby Using DRS's DPM Extension[/caption]

It would be hard to complain that virtualization is contributing to any kind of "global warming" hysteria. In fact, by its very consolidating nature, virtualization offers many advantages over "traditional" computing models that make it "green" even in its most basic form. VMware reinforces this argument with the claim that "every server that is virtualized saves 7,000 kWh of electricity and four tons of carbon dioxide emissions per year."

However, virtualization's promise was born of the recognition that x86 servers commonly operate with enormous "excess capacity." Even in typical virtualized deployments, hosts are driven to only 30-40% of capacity, and where excess capacity abounds there is an opportunity to take resources off-line and limit power consumption.

Enter VMware's power-saving extension to its Distributed Resource Scheduler (DRS), which scrubs VMware clusters for unused capacity, automatically consolidates stray virtual machines away from near-idle members, and shuts those members down to conserve power. This magic green genie is called Distributed Power Management (DPM) and, while simple to configure, it has a few quirks that not only hinder its effectiveness but, in some cases, simply make no sense (more on that later).

[caption id="attachment_833" align="aligncenter" width="350" caption="DRS's DPM In Action"]DRS's DPM In Action[/caption]
"VMware DPM monitors the cumulative demand of all virtual machines in the cluster for memory and CPU resources and compares this to the total available resource capacity of all hosts in the cluster. If sufficient excess capacity is found, VMware DPM places one or more hosts in standby mode and powers them off after migrating their virtual machines to other hosts. Conversely, when capacity is deemed to be inadequate, DRS brings hosts out of standby mode (powers them on) and migrates virtual machines, using VMotion, to them. When making these calculations, VMware DPM considers not only current demand, but it also honors any user-specified virtual machine resource reservations."

vSphere Resource Management Guide, Managing Power Resources

Put more simply, while vCenter and DRS monitor and manage virtual machine loads and the distribution of resources to ensure sound utilization, DPM identifies excess capacity, creates a DRS event to migrate virtual machines to consolidate resources within the limits of DRS rules, and puts the now-vacated and idle ESX hosts into "standby mode." This "standby mode" is actually a powered-off state, requiring the full boot time of a cold start, and - while the difference may seem semantic to some - others will appreciate the time implication when a sudden load spike appears.
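
To make the decision logic a little more concrete, here is a rough Python sketch of the kind of comparison DPM is performing. This is strictly an illustrative model - not VMware's actual algorithm - and the host names, capacities, demand figures and utilization target below are invented for the example:

[sourcecode language="python"]
# Illustrative model only - not VMware's actual DPM algorithm.
# Host names, capacities, demand and target are hypothetical.

hosts = {
    "esx01": {"cpu_mhz": 16000, "mem_mb": 24576},
    "esx02": {"cpu_mhz": 16000, "mem_mb": 24576},
    "esx03": {"cpu_mhz": 10000, "mem_mb": 16384},
}

# Cumulative demand of all virtual machines, reservations included
vm_demand = {"cpu_mhz": 9000, "mem_mb": 20000}

total_cpu = sum(h["cpu_mhz"] for h in hosts.values())
total_mem = sum(h["mem_mb"] for h in hosts.values())

print("cluster utilization: cpu %.0f%%, mem %.0f%%" % (
    100.0 * vm_demand["cpu_mhz"] / total_cpu,
    100.0 * vm_demand["mem_mb"] / total_mem))

# Could the remaining hosts still carry the load below a utilization
# target if one host were placed in standby?
TARGET_UTIL = 0.63  # hypothetical target
candidate = "esx03"
cpu_left = total_cpu - hosts[candidate]["cpu_mhz"]
mem_left = total_mem - hosts[candidate]["mem_mb"]
fits = (vm_demand["cpu_mhz"] <= TARGET_UTIL * cpu_left and
        vm_demand["mem_mb"] <= TARGET_UTIL * mem_left)
print("%s is a standby candidate: %s" % (candidate, fits))
[/sourcecode]

Run against these made-up numbers, the sketch reports roughly 21% CPU and 31% memory utilization and flags esx03 as a standby candidate - exactly the kind of headroom DPM is designed to reclaim.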

Configuring DPM


Once an ESX host is put into "standby," vSphere employs one of three methods for returning it to active service. These three methods, in order of vSphere's preference, are: Intelligent Platform Management Interface (IPMI), HP's Integrated Lights-Out (iLO) interface, and Wake-on-LAN (WOL).

Wake-on-LAN (WOL)


While experimentally supported in ESX 3.5, the Wake-on-LAN (WOL) protocol is "fully" supported in ESX 4 and vSphere. A word of caution here - use WOL only as a last resort: rumblings from VMware engineers about WOL failures suggest that Wake-on-LAN is not 100% reliable.

Enabling WOL will most likely require a BIOS setting change in the power management section. For PCI-E network adapters, the "Resume on PCI-E Wake" event must be enabled. Older motherboards with PCI or PCI-X network adapters (pre-PCI 2.2) may also need the adapter's WOL header cabled to the motherboard in addition to the BIOS setting.

[caption id="attachment_836" align="aligncenter" width="450" caption="Enable WOL in BIOS for PCI-E Adapters"]Enable WOL in BIOS for PCI-E Adapters[/caption]

It must be noted that the switch port(s) used for the WOL function will need to be placed in "auto-negotiation" mode - both in the vSwitch configuration and on the physical switch. While this is the default for most, the more "anal" administrators out there will need to watch out for this one, as it will interfere with their fixed port configuration strategies. VMware explains it this way:
"The switch port that each WOL-supporting VMotion NIC is plugged into should be set to auto negotiate the link speed, and not set to a fixed speed (for example, 1000 Mb/s). Many NICs support WOL only if they can switch to 100 Mb/s or less when the host is powered off."

Before spending a ton of time on WOL, test with wolcmd.exe or some other WOL testing utility. Some LOM network adapters will not support WOL, and those that do may require an adjustment in the adapter's boot agent settings. Using a WOL utility to check these functions will save a lot of time struggling with vCenter.
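
If you don't have wolcmd.exe handy, the magic packet itself is trivial to generate: six 0xFF bytes followed by sixteen copies of the target MAC address, sent as a UDP broadcast (port 9 is customary). Here is a minimal Python sketch - the MAC address shown is a placeholder for your host's WOL-capable (vMotion) NIC:

[sourcecode language="python"]
import binascii
import socket

def send_wol(mac, broadcast="255.255.255.255", port=9):
    """Broadcast a standard Wake-on-LAN magic packet for the given MAC."""
    mac_bytes = binascii.unhexlify(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    packet = b"\xff" * 6 + mac_bytes * 16   # magic packet format
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.sendto(packet, (broadcast, port))
    sock.close()

# Placeholder MAC - substitute the address of the host's vMotion NIC
send_wol("00:1b:21:aa:bb:cc")
[/sourcecode]

Remember that WOL is a layer-2 broadcast, so the packet must be sent from the same subnet/VLAN as the target NIC - the same limitation noted in the IPMI/iLO discussion below.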

IPMI/iLO Configuration


IPMI/iLO offers a "more reliable" mode of operation since it relies on out-of-band management to trigger the power-on event. IPMI/iLO requests can also span routed networks, whereas WOL ultimately works at layer 2. If your vCenter server is not directly connected to your WOL target network (i.e. the ESX hosts' network), you should opt for IPMI as the DPM management protocol for your DRS cluster.

The first step is to configure your ESX host's IPMI interface for the vCenter "user" that will be allowed to issue power-on events over the network. For KIRA100 systems (Supermicro, Tyan, etc.), administrators should have the ability to create a vCenter user that has minimal control over the IPMI interface specifically for the purpose of issuing power-on events. This user will be configured in vCenter to power-on your "standby" hosts.

For example, use the administrative login to the BMC and create a new control group called "DPM". Modify DPM's privileges such that none of the BMC functions are available to the group members and only the IPMI privilege level of "operator" is granted. Then, create a new BMC user named "vsphere", give it a strong password, and assign it to the group "DPM".
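
Before pointing vCenter at the new account, it's worth confirming from the vCenter server's network that the account can actually query and control chassis power. A quick check with ipmitool works well; the sketch below simply wraps it in Python, and the BMC address, username and password are placeholders for your own:

[sourcecode language="python"]
import subprocess

BMC_HOST = "10.0.0.50"   # placeholder: IP address of the host's BMC/iLO
BMC_USER = "vsphere"     # the restricted user created above
BMC_PASS = "secret"      # placeholder: the user's strong password

def ipmi(*args):
    """Run an ipmitool command against the BMC over the LAN interface."""
    # Older, IPMI 1.5-only BMCs may need "-I lan" instead of "-I lanplus".
    cmd = ["ipmitool", "-I", "lanplus", "-H", BMC_HOST,
           "-U", BMC_USER, "-P", BMC_PASS] + list(args)
    return subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0]

# The "operator" privilege level should be enough for these:
print(ipmi("chassis", "power", "status"))   # e.g. "Chassis Power is on"
# print(ipmi("chassis", "power", "on"))     # uncomment to test a power-on
[/sourcecode]

If the power status query succeeds but the power-on does not, revisit the privilege level granted to the "DPM" group.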

[caption id="attachment_823" align="alignleft" width="240" caption="Configuring IPMI for DPM Host Management"]Configuring IPMI for DPM Host Management[/caption]

From vCenter, select the host for DPM control from the host inventory and open the "Configuration" tab. From the "Software" menu block, click on "Power Management" and then on the "Properties..." link to open the IPMI configuration tab. In the "Edit IPMI/iLO Settings" requester, enter the username "vsphere" (or whatever you have chosen), its password, the IP address of the BMC for this ESX host, and the BMC's Ethernet MAC address.

Once the information is entered, click "OK" and vCenter will try to log in to the device identified in the BMC settings using the username and password supplied. If any of this information is incorrect, vCenter will generate an error directing you to make the appropriate changes. When the information is correct, vCenter will be configured for IPMI control of your ESX host's power state.

Testing Before Adding to DRS


[caption id="attachment_825" align="alignright" width="223" caption="Say Again? Testing Before Adding to Cluster is not Possible"]Say Again? Testing Before Adding to Cluster is not Possible[/caption]

Here's the first quirk: vCenter complains that you should not add an untested (DPM) host to a cluster, yet you can only test DPM function once the host is a member of a DRS cluster. Confusing? Not really.

The best way to test a prospective DPM host is to create a temporary cluster and add the host-under-test to it. When you create the cluster, do not enable DPM - just add the host, and the "Enter Standby Mode" option should no longer be "ghosted out" on the host's menu.

Select the newly active "Enter Standby Mode" option and wait. At this point it is helpful to have KVM or IP/KVM access to the ESX host to monitor the progression of the power events. Despite what the name implies, "Standby Mode" is not actually standby but a full power-down of the server.

vCenter will monitor the powering-down of the host and report its status with the familiar progress bar in the "recent tasks" window. Once complete, the host will appear in vCenter not as "disconnected" but as "standby mode," with a nice little blue crescent-moon icon over the host. At this point the host is completely off.

Right-click the now-standby host's icon in the inventory, select the "Power On" menu item and watch your KVM monitor for success. Within a few seconds of the status bar reaching 25% you should see the boot screen on your KVM. If not, call Houston because there's a problem. Once the host is fully rebooted, it will still take a few seconds for vCenter to become aware of it and re-enable the HA agent.

Why not simply put the host into maintenance mode to do this testing in a "live" HA cluster and save time? Sorry, Beavis, but only "Shutdown" and "Reboot" are available to hosts in maintenance mode. A host must be both active and in a DRS cluster for standby mode to be available.

Adding DPM to HA/DRS Clusters


Once all of your hosts have successfully been tested by this formula, it's time to enable the DPM features in your DRS cluster. This is fairly straightforward, though not much of DPM's mechanics is exposed to the vCenter administrator. To enable DPM, edit the cluster settings and turn on DRS if it is not already on.

Enabling DRS adds five more configuration options to the cluster settings control:

  • VMware DRS (manual, partially automated and fully automated),

  • DRS Rules (affinity and anti-affinity),

  • DRS Virtual Machine Options,

  • DRS Power Management (off, manual and automatic), and

    • DRS Power Management Host Options




The last two options are specific to DPM and will determine whether or not a host can go into standby mode and how likely that is to occur. If Power Management is off, no recommendations will be made to DRS concerning power events.



[caption id="attachment_819" align="alignright" width="300" caption="DPM Settings: Aggressive Power Savings Increases Chance of Standby"]DPM Settings: Aggressive Power Savings Increases Chance of Standby[/caption]

DRS Power Management


Two active modes of operation are available to vCenter administrators. In the "manual" mode, this is similar to the "manual" mode for DRS - DPM looks for opportunities to conserve power based on your configured DPM threshold and makes recommendations for action, however, no actions are taken by DRS/DPM. In the "automatic" mode, operation is similar to "fully automated" mode of DRS wherein the same recommendations offered by "manual mode" will be acted on by DRS/DPM and cause migration and standby events.

DPM's five levels of configuration - from conservative to aggressive - include:

  • Priority 1 recommendations based on HA requirements and/or user-specified capacity requirements only.

  • Priority 2 or higher recommendations as above with power-on recommendations only if host utilization becomes "much higher" than targeted; likewise power-off recommendations are applied only if host utilization becomes "extremely low" in comparison to the target.

  • Priority 3 or higher recommendations as above with power-on recommendations only if host utilization becomes "higher" than targeted; likewise power-off recommendations are applied only if host utilization becomes "very low" in comparison to the target.

  • Priority 4 or higher recommendations as above with power-on recommendations only if host utilization becomes "higher" than targeted; likewise power-off recommendations are applied only if host utilization becomes "moderately low" in comparison to the target.

  • Priority 5 or higher recommendations as above with power-on recommendations only if host utilization becomes "higher" than targeted; likewise power-off recommendations are applied only if host utilization becomes "lower" than the target.


You will notice that "aggressive" and "conservative" apply only to how far utilization must stray from the target at both extremes, NOT to how aggressively power is actually conserved. By this we mean that a truly aggressive power-conservation strategy would seek to turn off hosts any time resource utilization was "moderately lower" than the target, yet turn hosts back on only when resource utilization was "much higher" than the target. However, the available DPM settings allow only loose or tight coupling to demand in symmetric proportions.
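
A toy illustration of the point - again, strictly a conceptual model and not VMware's internal math, with made-up target and band values. The single aggressiveness setting only narrows or widens both trigger points around the target in lockstep, so the "power off eagerly, power back on reluctantly" policy described above cannot be expressed:

[sourcecode language="python"]
# Conceptual model only - not VMware's internal DPM arithmetic.
TARGET_UTIL = 0.63   # hypothetical demand/capacity target

# One aggressiveness setting widens or narrows BOTH bands together.
levels = [
    ("most conservative",  0.30),
    ("middle of the road", 0.18),
    ("most aggressive",    0.06),
]

def recommendation(utilization, band):
    if utilization > TARGET_UTIL + band:
        return "power-on recommendation"
    if utilization < TARGET_UTIL - band:
        return "power-off recommendation"
    return "no action"

for name, band in levels:
    print("%s: power off below %.0f%%, power on above %.0f%%" % (
        name, 100 * (TARGET_UTIL - band), 100 * (TARGET_UTIL + band)))

# "Moderately low" demand at the middle band -> power-off recommendation
print(recommendation(0.40, 0.18))
[/sourcecode]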



[caption id="attachment_820" align="alignright" width="300" caption="DPM Host Options: Until a Host is Tested, DPM Operation is Unverified"]DPM Host Options: Until a Host is Tested, DPM Operation is Unverified[/caption]

DPM Host Options


To make sure that actions taken will translate into hosts being moved into standby, the DPM Host Options settings can be adjusted after each new host is added to the DPM cluster. By default, each new host inherits the DPM Power Management setting as set above.

However, as long as either manual or automatic DPM is enabled for the cluster, each host can be set individually to disabled, manual or automatic. This allows administrators to mix-and-match DPM functionality so that one or more manually specified hosts can be "always on" even though the cluster is otherwise running with fully automated DRS and fully automated DPM.

Quirky Behavior


It's true that you cannot test DPM on a host without first adding the server to a DRS cluster. Guidance (in the form of a note on the "Power Management" cluster settings control) indicates that DPM should not be turned on until "exit standby" tests are completed for each host. This doesn't work very well for new hosts being added to existing clusters, and we resolved this catch-22 by testing in a temporary cluster, where we performed all standby/resume tests until a successful formula was reached.

We discovered that IPMI configuration is immutable - that is, once configured for IPMI you cannot go back to WOL. If your IPMI/iLO card should fail, there is no way to revert to WOL without removing the host from vCenter and re-adding it with the default (cleared) IPMI settings. This is inconvenient at best, and it imposes a significant penalty on administrators recovering from IPMI/iLO card failures or misconfiguration.

Here's one thing we really hate: "Reboot" does not offer any autopilot guidance and instead takes down the ESX host along with all associated templates and virtual machines. We'd like to see VMware react as it does for "maintenance mode" and follow the DRS process of migrating virtual machines before termination - at least for DRS members. What we found is that a shutdown of a DRS/DPM host traps any host-associated VMs in an unmanageable state while DRS tries to figure out what happened to the host. If the reboot is delayed, the host reverts to a "standby mode" status from which DPM then tries to "exit." This locks the VMs that had been attached to the host at shutdown - including those in a powered-off state - while DPM waits for the host to return. If no return is imminent, the DPM "exit" cannot be canceled and one must wait for the process to fail. VMware needs to fix this "feature."

Closing Thoughts


DPM does not appear to be smart from a host-resources perspective. Host core counts and memory configurations do not seem to play a (large enough) role in choosing which host to place into "standby" - DRS/DPM only seems to value socket count. For instance, in a 3-host DPM cluster, DPM placed a 6-core/24GB RAM node into standby over a 4-core/16GB host. Was more power saved? Maybe, but the cluster's ability to respond to load variations would have been better served by leaving the better-equipped host running alongside the remaining 8-core/24GB host. Minor quibble? Maybe, but we're looking at enterprise license features, and - coupled with the lack of tuning for "true power savings" - DPM 1.0 (if it was 0.9b in 3.5) still looks a bit nascent.

Even with these things said, DPM will work for enterprise customers looking to shave a few dollars off the power and cooling budget. At an average savings of 0.350 kW per host for 18 hours per day in standby, we're looking at a modest savings of about $400/host/year including cooling. We'll continue to play with DPM to determine how well a DPM cluster tuned for single-host viability can work even in the smallest of enterprises.
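
For anyone who wants to sanity-check that back-of-the-envelope figure, the arithmetic is below. The electricity rate and the cooling overhead are our assumptions - substitute your own utility rate and cooling factor:

[sourcecode language="python"]
# Assumptions - substitute your own figures.
KW_SAVED_PER_HOST = 0.350   # average draw avoided while a host is in standby
HOURS_PER_DAY     = 18      # average standby hours per day
RATE_PER_KWH      = 0.10    # assumed utility rate in $/kWh
COOLING_FACTOR    = 1.75    # assumed: cooling adds ~75% on top of the IT load

kwh_per_year = KW_SAVED_PER_HOST * HOURS_PER_DAY * 365
dollars_per_year = kwh_per_year * RATE_PER_KWH * COOLING_FACTOR

print("%.0f kWh/host/year avoided" % kwh_per_year)              # ~2,300 kWh
print("~$%.0f/host/year including cooling" % dollars_per_year)  # ~$400
[/sourcecode]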
