
Windows Server 2008 Failover Clusters: Networking (Part 1)


The Windows Server 2008 Failover Clustering feature provides high availability for services and applications. To keep those services and applications highly available, the cluster service running on each node must function at the highest level possible, and redundant, reliable communications connectivity among all the nodes plays a large role in that. Properly configured network connectivity not only gives clients access to highly available services, it also guarantees the connectivity the cluster requires for its own internal communications.

The following sections provide the information needed to understand failover cluster networking and to properly implement it.

Windows Server 2008 Failover Cluster networking features

Windows Server 2008 Failover Clustering introduces new networking capabilities that represent a major shift from the way things were done in legacy clusters (Windows NT 4.0, 2000, and 2003). Some of these take advantage of new networking features included in the operating system; others are the result of feedback received from customers. The new features include:

  • A new cluster network driver architecture
  • The ability to locate cluster nodes on different, routed networks in support of multi-site clusters
  • Support for DHCP assigned IP addresses
  • Improvements to the cluster health monitoring (heartbeat) mechanism
  • Support for IPv6

New cluster network driver architecture

The legacy cluster network driver (clusnet.sys) has been replaced with a new NDIS-level driver called the Microsoft Failover Cluster Virtual Adapter (netft.sys). Whereas the legacy cluster network driver was listed as a Non-Plug and Play Driver, the new fault-tolerant adapter actually appears as a network adapter when hidden devices are displayed in the Device Manager snap-in (Figure 1).

image

Figure 1: Device Manager Snap-in

The driver information is shown in Figure 2.

image

Figure 2: Microsoft Failover Cluster Virtual Adapter driver

The cluster adapter is also listed in the output of an ipconfig /all command on each node (Figure 3).

image

Figure 3: Microsoft Failover Cluster Virtual Adapter configuration information

The Failover Cluster Virtual Adapter is assigned a Media Access Control (MAC) address that is based on the MAC address of the first enumerated (by NDIS) physical NIC in the cluster node (Figure 4) and uses an APIPA (Automatic Private Internet Protocol Addressing) address.

image

Figure 4: Microsoft Failover Cluster Virtual Adapter MAC address

The goal of the new driver model is to sustain TCP/IP connectivity between two or more systems despite the failure of any component in the network path, provided at least one alternate physical path is available. In other words, a network component failure (NIC, router, switch, hub, etc.) should not cause inter-node cluster communications to break down; communication should continue to make progress in a timely manner (responses may be slower, but connectivity still exists) as long as an alternate physical route (link) is available. If cluster communications cannot proceed on one network, the switchover to another cluster-enabled network is automatic. This is one of the primary reasons each cluster node should have multiple network adapters available to support cluster communications, each connected to a different switch.

The failover cluster virtual adapter is implemented as an NDIS miniport adapter that pairs an internally constructed virtual route with each network found in a cluster node. The physical network adapters are exposed at the IP layer on each node. The NETFT driver transfers packets (cluster communications) on the virtual adapter by tunneling through the best available route in its internal routing table (Figure 5).

image

Figure 5: NetFT traffic flow diagram

Here is an example to illustrate this concept. A 2-Node cluster is connected to three networks that each node has in common (Public, Cluster and iSCSI). The output of an ipconfig /all command from one of the nodes is shown in Figure 6.

image

Figure 6: Example Cluster Node IP configuration

Note: Do not be concerned with the name ‘Microsoft Virtual Machine Bus Network Adapter’ as these examples were derived from cluster nodes running as Guests in Hyper-V.

The Microsoft Failover Cluster Virtual Adapter configuration information for each node is shown in Figure 7. Keep in mind that the default port for cluster communication is still TCP/UDP 3343.

image

Figure 7: Node Failover Cluster Virtual Adapter configuration information
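
As a quick check that a node is actually listening on the cluster port, you can run the following from an elevated command prompt (shown only as a sketch; the exact output varies by system):

c:\> netstat -ano | findstr ":3343"

Any entries returned should map back to the process ID of the cluster service (clussvc.exe) shown in Task Manager.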

When the cluster service starts, and a node either Forms or Joins a cluster, NETFT, along with other components, is responsible for determining the node’s network configuration and connectivity with other nodes in the cluster. One of the first actions is establishing connectivity with the Microsoft Failover Cluster Virtual Adapter on all nodes in the cluster. Figure 8 shows an example of this in the cluster log.

image

Figure 8: Microsoft Failover Cluster Virtual Adapter information exchange

Note: You can see in Figure 8 that the endpoint pairs consist of both IPv4 and IPv6 addresses. The NETFT adapter prefers IPv6 and will therefore choose the IPv6 address for each endpoint.

As the cluster service startup continues, and the node either Forms or Joins a cluster, routing information is added to NETFT. Using the three networks mentioned previously, Figure 9 shows each route being added to a cluster.

image

Route between 1.0.0.31 and 1.0.0.32

image

Route between 192.168.0.31 and 192.168.0.32

image

Route between 172.16.0.31 and 172.16.0.32

Figure 9: Routes discovered and added to NETFT

Each ‘real’ route is added to the ‘virtual’ routes associated with the virtual adapter (NETFT). Again, note the preference for NETFT to use IPv6 as the protocol of choice.
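
If you want to see which networks the cluster has discovered and which physical adapters back them, the Windows Server 2008 R2 Failover Cluster PowerShell cmdlets make this easy (on Windows Server 2008 RTM, the 'cluster network' and 'cluster netinterface' commands provide similar information). A minimal sketch; the output columns shown are simply the ones I find most useful:

PS C:\> Import-Module FailoverClusters
PS C:\> Get-ClusterNetwork | Format-Table Name, State, Address, Role -AutoSize
PS C:\> Get-ClusterNetworkInterface | Format-Table Node, Network, Name, Address -AutoSize

The Role column indicates whether the cluster is allowed to use each network for internal communication, client access, both, or neither.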

The capability to place cluster nodes on different, routed networks in support of Multi-Site Clusters

Beginning with Windows Server 2008 Failover Clustering, individual cluster nodes can be located on separate, routed networks. This requires that resources depending on IP Address resources (i.e., Network Name resources) implement OR logic, since it is unlikely that every cluster node will have a direct local connection to every network the cluster is aware of. This allows IP Address, and hence Network Name, resources to come online when services\applications fail over to remote nodes. Figure 10 shows an example of the dependencies for the cluster name on a machine connected to two different networks.

image

Figure 10: Cluster Network Name resource with an OR dependency

By default, only the IP addresses associated with a Network Name resource that actually come online are dynamically registered in DNS (if the zone is configured for dynamic updates). If the preferred behavior is to register all IP addresses the Network Name depends on, a private property of the Network Name resource must be modified. This private property is called RegisterAllProvidersIP (Figure 11). If this property is set to 1, all IP addresses are registered in DNS, and the DNS server will return the full list of IP addresses associated with the record to the client.

image

Figure 11: Parameters for a Network Name resource

Since cluster nodes can be located on different, routed networks, and the communication mechanisms have been changed to use reliable session protocols implemented over UDP (unicast), the networking requirements for Geographically Dispersed (Multi-Site) Clusters have changed. In previous versions of Microsoft clustering, all cluster nodes had to be located on the same network, which required that 'stretched' VLANs be implemented when configuring multi-site clusters. Beginning with Windows Server 2008, stretched VLANs are no longer required in all scenarios.

Support for DHCP assigned IP addresses

Beginning with Windows Server 2008 Failover Clustering, cluster IP Address resources can obtain their addressing from DHCP servers as well as via static entries. If the cluster nodes themselves have at least one NIC configured to obtain an IP address from a DHCP server, the default behavior is to obtain an IP address automatically for all cluster IP Address resources. The new wizard-based processes in Failover Clustering understand the network configuration and only ask for static addressing information when required. If the cluster node has statically assigned IP addresses, the cluster IP Address resources will have to be configured with static IP addresses as well. In other words, cluster IP Address resource assignment follows the configuration of the physical node and each specific interface on the node. Even if the nodes are configured to obtain their IP addresses from a DHCP server, individual IP Address resources can be changed to static addresses (Figure 12).

image

Figure 12: Changing DHCP assigned to Static IP address
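
If you want to confirm from the command line whether an existing IP Address resource is using DHCP, you can dump its private properties. A hedged example using cluster.exe; the resource name 'Cluster IP Address' is only a placeholder, so substitute the name of your own IP Address resource:

c:\> cluster res "Cluster IP Address" /priv

In the output, look at the EnableDhcp and Address values; a non-zero EnableDhcp indicates the resource is obtaining its address from DHCP.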


Improvements to the cluster ‘heartbeat’ mechanism

The cluster ‘heartbeat’, or health checking mechanism, has changed in Windows Server 2008. While still using port 3343, it is no longer a broadcast communication. It is now unicast in nature and uses a Request-Reply type process. This provides for higher security and more reliable packet accountability. Using the Microsoft Network Monitor protocol analyzer to capture communications between nodes in a cluster, the ‘heartbeat’ mechanism can be seen (Figure 13).

image

Figure 13: Network Monitor capture

A typical frame is shown in Figure 14.

image

Figure 14: Heartbeat frame from a Network Monitor capture

There are properties of the cluster that address the heartbeat mechanism; these include SameSubnetDelay, CrossSubnetDelay, SameSubnetThreshold, and CrossSubnetThreshold (Figure 16).

image

Figure 16: Properties affecting the cluster heartbeat mechanism

With the default configuration (shown above), the cluster service waits 5 seconds (one heartbeat sent every second, five missed heartbeats) before considering a cluster node unreachable and regrouping to update the view of the cluster. The limits on these settings are shown in Figure 17. Make changes to the appropriate settings depending on the scenario. The CrossSubnetDelay and CrossSubnetThreshold settings are typically used in multi-site scenarios where WAN links may exhibit higher than normal latency.

image

Figure 17: Heartbeat Configuration Settings

These settings allow for the heartbeat mechanism to be more ‘tolerant’ of networking delays. Modifying these settings, while a worthwhile test as part of a troubleshooting procedure (discussed later), should not be used as a substitute for identifying and correcting network connection delays.
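
If a change is warranted, the heartbeat properties can be viewed and modified with cluster.exe, or with PowerShell on Windows Server 2008 R2. The values below are only an illustration of the syntax, not a recommendation:

c:\> cluster /prop
c:\> cluster /prop CrossSubnetDelay=2000
c:\> cluster /prop CrossSubnetThreshold=10

PS C:\> Import-Module FailoverClusters
PS C:\> (Get-Cluster).CrossSubnetDelay = 2000
PS C:\> (Get-Cluster).CrossSubnetThreshold = 10

The delay values are expressed in milliseconds, and the thresholds are a count of missed heartbeats.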

Support for IPv6

Since Windows Server 2008 supports IPv6, the cluster service supports it as well. This includes supporting IPv6 IP Address resources and IPv4 IP Address resources, either alone or in combination, in a cluster. Clustering also supports IPv6 tunnel addresses. As previously noted, intra-node cluster communications use IPv6 by default. For more information on IPv6, please review the following:

Microsoft Internet Protocol Version 6

In the next segment, I will discuss Implementing networks in support of Failover Clusters (Part 2). See ya then.

Chuck Timon
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support


Cannot Save Recovery Information for Bitlocker in Windows 7


Hello, my name is Manoj Sehgal.  I am a Senior Support Engineer in the Windows group and today’s blog will cover how to enable BitLocker in Windows 7 and how to avoid one of the most common issues we see when enabling BitLocker using GPOs.

A common problem we have seen since the release of Windows 7 has been properly capturing the BitLocker recovery keys in Active Directory. This is most likely due to incorrect policy settings for BitLocker in the GPO.

How to enable BitLocker using GPO

1.       Open Group Policy Management Console and create a new Group Policy.

2.       Right click on the policy and click Edit; you will see a Group Policy Management Editor window.

3.       Expand Computer Configuration –> Policies –> Administrative Templates –> Windows Components –> BitLocker Drive Encryption.

You should see the below policy options for Bitlocker:

image

4.       The policy we need to configure is: Provide Unique Identifiers for your organization.

 

5.       Under the Fixed Data Drives section, enable the two policies shown below. For more information on each policy, refer to the Help tab for each policy.

image

6.       Under the Operating System Drives section, enable the three policies shown below. For more information on each policy, refer to the Help tab for each policy.

      image

·         Require additional authentication at startup – Set this policy as per your requirement.

Configure TPM Startup; Configure TPM Startup PIN; Configure TPM Startup Key; Configure TPM Startup Key and PIN.

If you want to use TPM + PIN as the startup type, see the screen shot below.

                       image

7.       Under the Removable Data Drives section, enable the three policies shown below. For more information on each policy, refer to the Help tab for each policy.

image

8.       Turn on TPM Backup to AD Domain Services.

In Group Policy Management Editor, expand Computer Configuration –> Policies –> Administrative Templates –> System –> Trusted Platform Module Services

                          image

Apply the policy to the specific OU or domain containing the computers on which you want to enable BitLocker.

Run gpupdate /force on the client machine and run rsop.msc to see if the policies are applied.

If you don’t see the msFVE-RecoveryInformation object in AD, the policies are most likely not set correctly. You can also use the BitLocker Active Directory Recovery Password Viewer to view the recovery password.
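
As a quick follow-up check on the client, the following commands can confirm both the policy application and the state of the protectors (the drive letter is just an example):

c:\> gpresult /r
c:\> manage-bde -status c:
c:\> manage-bde -protectors -get c:

gpresult /r shows whether the BitLocker GPO was applied, manage-bde -status shows the encryption state of the drive, and manage-bde -protectors -get lists the key protectors, including the numerical password that should be backed up to AD.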

For a video walkthrough of the steps in this blog, check out the following video. NOTE: It’s best viewed in full-screen high resolution.

 

Manoj Sehgal
Senior Support Engineer
Microsoft Enterprise Platforms Support

Understanding the 2 TB Limit in Windows Storage


“Why can’t I have a drive bigger than 2 TB?”

This is a common question.  And as storage has gotten bigger and cheaper, I see it more and more.  So let’s take a few minutes to talk about the mysterious 2 TB limit.

There are actually three different 2 TB limitations that are commonly hit...

  • Partition size

  • Number of clusters

  • SCSI goo

Partition Size Limitation

The partition size limit is pretty straightforward.  On an MBR (Master Boot Record) disk, the field where a partition's size is stored is only 4 bytes long.  The largest value we can stuff into 4 bytes is all F’s in hexadecimal, which is 4,294,967,295 in decimal.

FF FF FF FFh = 4294967295d

This maximum partition size is not in bytes; it is in sectors.  With the standard sector size of 512 bytes, the maximum size ends up being 2 TB.

4,294,967,295 sectors * 512 bytes/sector = 2,199,023,255,040 bytes, or 2TB.

Number of Clusters

The second limitation is harder to spot.  It is a limitation of NTFS.  NTFS is limited to 2^32 - 1 clusters, no matter what.  The smallest cluster size possible is 512 bytes (1 sector).  So again the math leaves us at 2,199,023,255,040 bytes, or 2TB.

(2^32) - 1 = 4,294,967,296 - 1 = 4,294,967,295 clusters

4,294,967,295 clusters * 512 bytes/cluster = 2,199,023,255,040 bytes, or 2TB

Here is a quick reference chart to help you see the maximum NTFS size for each cluster size.

Cluster size         NTFS Max Size
   512 bytes           2,199,023,255,040 (2TB)
  1024 bytes           4,398,046,510,080 (4TB)
  2048 bytes           8,796,093,020,160 (8TB)
  4096 bytes          17,592,186,040,320 (16TB)  Default cluster size
  8192 bytes          35,184,372,080,640 (32TB)
 16384 bytes          70,368,744,161,280 (64TB)
 32768 bytes         140,737,488,322,560 (128TB)
 65536 bytes         281,474,976,645,120 (256TB)
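
If you want to double-check the chart, the arithmetic is easy to script.  A small PowerShell sketch (it performs no disk queries; it just does the math):

$clusters = 4294967295   # (2^32) - 1, the NTFS cluster count limit
foreach ($size in 512,1024,2048,4096,8192,16384,32768,65536) {
    $maxBytes = $clusters * $size
    "{0,5} bytes/cluster : {1:N0} bytes ({2} TB)" -f $size, $maxBytes, [math]::Round($maxBytes / 1TB)
}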

 

Cluster size really depends on your needs.  While 512 bytes is fine if you just have a bunch of tiny files, it isn’t as efficient for, say, a volume holding nothing but SQL databases.  Also, a tiny cluster size can adversely affect VSS.  But that is a topic for another time.

SCSI Goo

This is by far the hardest to understand as it requires some basic SCSI knowledge.  Microsoft Windows operating systems support two different SCSI standards when it comes to reads and writes.  There is a third, but it is very old and is mostly just used on tape devices.  So let’s just forget about that one and stick to the two that are relevant.

These two standards are Read10/Write10 and Read16/Write16.  This all has to do with the way the CDB (Command Descriptor Block) is structured.

Read10/Write10 – This standard reserves bytes 2-5 to define the LBA (Logical Block Address).  Think of LBA as sector numbers….it makes it easier on your brain.  So we have 4 bytes that can define the addressable sectors.  Just like in the ‘partition size limitation’ we are back to dealing with a 4 byte number used to define all the addresses on the drive.

FF FF FF FFh = 4294967295d

And just like before, the above is just the possible number of addresses (number of sectors).  Multiply by the standard sector size of 512 bytes and we get…

4,294,967,295 sectors * 512 bytes/sector = 2,199,023,255,040 bytes or 2TB.

What this all means is that when Windows is using the Read10/Write10 standard, then the biggest drive that will be supported is 2TB. 

Read16/Write16 – Sometimes called LBA64 by some vendors, this standard reserves bytes 2-9 to define the LBA.  That would be 8 bytes, each byte being 8 bits in size, for a 64-bit address.  Now here is where we start getting into some really big numbers: 2^64 sectors * 512 bytes/sector = 2^73 bytes, which works out to roughly 8ZB (zettabytes) using the 1024-based units shown below.  Here’s a quick chart to put it in some perspective.

  • 1024KB = 1MB (megabyte)

  • 1024MB = 1GB (gigabyte)

  • 1024GB = 1TB (terabyte)

  • 1024TB = 1PB (petabyte)

  • 1024PB = 1EB (exabyte)

  • 1024EB = 1ZB (zettabyte)

So it is going to be a while before we have to worry about running into the limitation of Read16/Write16.

Exceeding the limitations

Each limitation has a way of getting around it.  Otherwise we’d still be stuck at 2TB. 

Partition size limitation – There are actually two ways around this.  The first way is to convert your disks to dynamic disks and create volume sets.  This functionality has been around since Windows 2000.  This doesn’t really allow you to increase the partition size.  What it does is give you the ability to chain your partitions together to form larger volumes.  So I could use two drives of 2TB and create a volume of roughly 4TB in size.

The second method to bypass the partition size limitation is to use a GPT (GUID Partition Table) configuration.  In Windows Server 2003 SP1, Microsoft introduced its implementation of GPT.  A disk configured as GPT rather than MBR has a 32-sector partition entry array instead of a tiny 64-byte partition table.

NOTE:  32 sectors is equal to 16,384 bytes

The partitions that can be defined on a GPT disk can be up to 16EB in size.       
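
If you decide to go the GPT route for a large data volume, diskpart can do the conversion.  This is only a sketch; the disk number is an example, and the clean command wipes the partition table, so use it only on a disk that holds no data you care about:

c:\> diskpart
DISKPART> list disk
DISKPART> select disk 1
DISKPART> clean
DISKPART> convert gpt
DISKPART> create partition primary
DISKPART> format fs=ntfs quick
DISKPART> assign
DISKPART> exit

Keep in mind that a BIOS-based system cannot boot from a GPT disk, so this is for data volumes rather than the boot drive.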

Number of clusters – There really isn’t a way around this limitation.  NTFS is currently still limited in its number of clusters.  However, you can get past 2TB by making sure your cluster size is larger than the minimum size of 512 bytes. 

It is important to note that for the most part, if you create a FAT partition and then convert it to NTFS the cluster size will come out as 512 bytes.  There are ways around this but most of them require that you know ahead of time that you are going to be converting your FAT partition to NTFS. 

SCSI goo – There isn’t anything extra you need to do in Windows to get around the limitation of Read10/Write10, as Windows is already able to use Read16/Write16.  However, the hardware MUST support it as well.  Windows will query the storage devices and negotiate whether Read10/Write10 or Read16/Write16 is to be used.  When in doubt, check with your storage vendor.

Robert Mitchell
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support

Windows Server 2008 R2: No Recording Tab for CD/DVD burner


Today’s blog will cover burning CDs/DVDs in Windows Server 2008 R2. When you are logged into a Windows Server 2008 R2 computer, you may notice that you cannot burn a CD/DVD and/or that the Recording tab is missing from the properties of your CD/DVD drive in Computer.

By default, accounts other than localsystem\administrator (regardless of the groups they belong to) on server SKUs are considered to be “remote desktop” and so have the same security restrictions that come with remote desktop sessions.

In order to burn CDs/DVDs you can do one of the following:

  • Log in as localsystem\administrator
  • Run a 3rd party burning utility elevated

Note: You must have the Desktop Experience feature installed to get the built-in Windows ISO burning applet.

Scott McArthur
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support

Windows Server 2008 Failover Clusters: Networking (Part 2)


In Part 1, I discussed Windows Server 2008 Failover Cluster networking features.  In this segment, I will discuss implementing networks in a Failover Cluster.

Implementing networks in support of Failover Clusters

The main consideration when designing Failover Cluster networks is to ensure there is built-in redundancy for cluster communications.  This is typically accomplished by having a minimum of two physical Network Interface Cards (NICs) installed in each node that will be part of the cluster, and those cards must be on two separate and distinct buses (e.g., two PCI NICs).  Many people think a single multi-port NIC meets this requirement – it does not, because that configuration creates a single point of failure for all cluster communications.  The best configuration is two multi-port NICs on separate buses, with fault tolerance implemented by way of NIC teaming software (provided by 3rd-party vendors), each physically connected to a separate network switch.

Note:  NIC Teaming is not supported on iSCSI connections.  Please review the iSCSI Cluster Support: Frequently Asked Questions.  The appropriate fault-tolerant mechanism for iSCSI connectivity would be multi-path software. Please review the Microsoft Multi-path I/O: Frequently Asked Questions.

There are two primary design scenarios when planning for Failover Cluster network connectivity.  In the first scenario (and the most common), all nodes in the cluster are located on the same networks.  In the second scenario, nodes in the cluster are located on separate and distinct routed networks (this is very common in multi-site cluster implementations).  Figure 18 shows an example of the second scenario.

clip_image002

Figure 18:  Multi-site cluster (network connectivity only)

Note:  Even though it is supported to locate cluster nodes on separate, routed networks, it is still supported to connect nodes in a multi-site cluster using stretched Virtual Local Area Networks (VLAN).  This configuration places the nodes on the same network(s).

It is important that no two NICs in the same node are configured on the same subnet.  The cluster network driver uses the subnet to identify networks; it will use the first NIC it detects and ignore any other NICs configured on the same subnet on that node.  The cluster validation process will register a Warning if any network interfaces in a cluster node are configured on the same network.  The only possible exception is iSCSI (Internet Small Computer System Interface) connections.  If iSCSI is implemented in a cluster, and MPIO (Multi-Path Input/Output) is being used for fault-tolerant connections to iSCSI storage, then it is possible the network interfaces will be on the same network.  In this configuration, the iSCSI network should be configured in Failover Cluster Manager so the cluster does not use it for any cluster communications.

Note:  Please consult the iSCSI Cluster Support: Frequently Asked Questions.
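
On Windows Server 2008 R2, that setting can be made from PowerShell as well as from the Failover Cluster Manager GUI.  A minimal sketch, assuming the iSCSI network is actually named ‘iSCSI’ in the cluster (substitute your own network name):

PS C:\> Import-Module FailoverClusters
PS C:\> (Get-ClusterNetwork "iSCSI").Role = 0
PS C:\> Get-ClusterNetwork | Format-Table Name, Role, Address -AutoSize

A Role of 0 prevents the cluster from using the network at all, 1 allows cluster communication only, and 3 allows both cluster and client traffic.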

As previously mentioned, Windows Server 2008 accommodates cluster nodes being located on separate, routed networks by introducing OR logic for IP Address resource dependencies.  Figure 19 illustrates this.

clip_image004

Figure 19:  IP Address Resource OR logic

When a Network Name resource is configured with an OR dependency on more than one IP Address resource, this means at least one of the IP Address resources must be able to come Online before the Network Name resource can come Online.  Since a Network Name resource can be associated with more than one IP Address, there is a property of a Network Name resource that can be modified so DNS registrations will occur for all of the IP Addresses.  The property is called RegisterAllProvidersIP (See Figure 20).

clip_image006

Figure 20:  Network Name resource properties

Note:  In Figure 20 above, Failover Cluster PowerShell cmdlets were used to access cluster configuration information.  This is new in Windows Server 2008 R2.  For more information, review the TechNet Cmdlet  Reference.

The default registration behavior is to register only the IP Address that comes Online on the node.  Changing the setting to 1 registers all of the addresses, which can assist name resolution in a multi-site cluster scenario.
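
For reference, the property shown in Figure 20 can be read and set with the same Windows Server 2008 R2 cmdlets, or with cluster.exe.  A hedged example, assuming the Network Name resource is called ‘Cluster Name’ (substitute your own resource name; the resource typically needs to be taken offline and brought back online for the change to take effect):

PS C:\> Get-ClusterResource "Cluster Name" | Get-ClusterParameter RegisterAllProvidersIP
PS C:\> Get-ClusterResource "Cluster Name" | Set-ClusterParameter RegisterAllProvidersIP 1

c:\> cluster res "Cluster Name" /priv RegisterAllProvidersIP=1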

Note:  Please review KB 947048 for other things to consider when deploying failover cluster nodes on different, routed subnets (multi-site cluster scenario).

While Failover Clusters require a minimum of two NICs to provide reliable cluster communications, there are scenarios where more NICs may be desired and\or required based on the services or applications that are running in the cluster.  One such scenario we already mentioned – iSCSI connectivity to storage.  The other scenario involves Microsoft’s virtualization technology – Hyper-V.

The integration of Failover Clustering with Hyper-V was introduced in Windows Server 2008 (RTM) in the form of making Virtual Machines highly available in a cluster by being able to move (Failover) the Virtual Machines between the nodes in the cluster using a process called Quick Migration.  In Windows Server 2008 R2, additional capabilities were introduced including Live Migration and Cluster Shared Volumes (CSV).  These features improved the high availability story for Virtual machines, but also introduced new networking requirements.   The inner workings of Hyper-V networking will not be discussed here.  For more information, please download this whitepaper (http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=3fac6d40-d6b5-4658-bc54-62b925ed7eea). 

The networking requirements in a Hyper-V Cluster supporting Live Migration and using Cluster Shared Volumes (CSV) can add up quickly as illustrated in Figure 21.

clip_image008

Figure 21: Hypothetical Networking Requirements

For more information on Live Migration and Cluster Shared Volumes in Windows Server 2008 R2, visit the Microsoft TechNet site.

Using Cluster Shared Volumes in a Failover Cluster in Windows Server 2008 R2

Hyper-V:  Using Live Migration with Cluster Shared Volumes in Windows Server 2008 R2

In the next segment I will discuss Troubleshooting cluster networking issues (Part 3).  See ya then.

Chuck Timon
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support

Windows Server 2008 Failover Clusters: Networking (Part 3)


In Part 2, I discussed implementing networks in a Failover Cluster.  In this final segment, I will discuss troubleshooting cluster networking issues.

Troubleshooting cluster networking issues

As previously stated, it is important that redundant and reliable cluster communications connectivity exist between all nodes in a cluster.  However, there may be times when communications connectivity within a cluster is disrupted, either because of actual network failures or because of misconfigured network connectivity.  A loss of communications connectivity with a node can result in that node being removed from cluster membership.  When a node is removed from cluster membership, it terminates its cluster service to avoid problems or conflicts as other nodes in the cluster take over the services, applications, and resources that were hosted on the removed node.  The node will attempt to rejoin the cluster when its cluster service restarts.  This problem can also have broader effects, because the loss of a node affects quorum.  Should the number of nodes participating in the cluster fall below a majority, all highly available services will be taken Offline until quorum is re-established (the No Majority: Disk Only quorum model is the one exception; however, that model is not recommended).

Here are some recommended troubleshooting procedures for cluster connectivity issues:

1.        Examine the system log on each cluster node and identify any errors reporting a loss of communications connectivity in the cluster or even broader network-related issues.  Here are some example cluster-related error messages you may encounter:

clip_image002

Figure 22:  Cluster Network Connectivity error messages

Source: http://technet.microsoft.com/en-us/library/cc773562(WS.10).aspx

clip_image004

Figure 23:  Network Connectivity and Configuration error messages

Source:  http://technet.microsoft.com/en-us/library/cc773417(WS.10).aspx

2.       If the system logs provide insufficient detail, generate the cluster logs and inspect the contents for more detailed information concerning the loss of network connectivity.

Note: Generate the cluster logs by running this PowerShell cmdlet –

clip_image006
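
The PowerShell cmdlet in question is Get-ClusterLog (available in Windows Server 2008 R2; on Windows Server 2008 RTM, ‘cluster log /gen’ is the equivalent).  For example, with a destination folder of your choosing:

PS C:\> Import-Module FailoverClusters
PS C:\> Get-ClusterLog -Destination C:\ClusterLogs

This writes a cluster log text file for each node into C:\ClusterLogs.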

3.       Verify the configuration of all networks in the cluster.

4.       Verify the configuration of network connectivity devices such as Ethernet switches.

5.       Run an abbreviated cluster validation process by selecting only the Network tests.

clip_image008

The tests that are executed are shown here:

clip_image010

The desired end result is this:

clip_image012

As an example, here is the section in the validation report that shows the results for the List Network Binding Order test –

clip_image014

Some of the common issues seen with respect to the network validation tests include, but are not limited to:

·         Multiple NICs on a cluster node configured to be on the same subnet.

·         Excessive latency (usually > 2 seconds) in ping tests between interfaces on cluster nodes.

·         Warning that the firewall has been disabled on one or more nodes.

6.       Conduct simple networking tests, such as a ‘ping’ test, across all networks enabled for cluster communications to verify connectivity between the nodes (a small scripted version is sketched after this list).  Use network monitoring tools such as Microsoft’s Network Monitor to analyze network traffic between the nodes in the cluster (refer to Figures 13 and 14).

7.       Evaluate hardware failures related to networking devices such as Network Interface Cards (NICs), network cabling, or network connectivity devices such as switches and routers as needed.

8.       Review the change management log (if one exists in your organization) to determine what, if any, changes were made to the nodes in the cluster that may be related to the disruption in communications connectivity.

9.       Consider opening a support incident with Microsoft.  If a node is removed from cluster membership, it means no network configured on that node could be used to communicate with the other nodes in the cluster.  If there are multiple networks configured for cluster use, as recommended, then the loss of cluster membership indicates a problem that affects all of the networks, or the node’s ability to send or receive heartbeat messages.
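
For the connectivity checks in step 6, a simple sweep can be scripted.  A minimal sketch, assuming two nodes named NODE1 and NODE2 and the sample addresses used earlier in this series (substitute your own node names and the addresses of each cluster-enabled network interface):

PS C:\> $targets = "NODE1","NODE2","1.0.0.31","1.0.0.32","192.168.0.31","192.168.0.32"
PS C:\> foreach ($t in $targets) { Test-Connection -ComputerName $t -Count 4 | Format-Table Address, ResponseTime -AutoSize }

The ResponseTime column gives a quick feel for the latency on each path; plain ping.exe works just as well if you prefer it.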

Note:  For additional information on Troubleshooting Windows Server 2008 consult TechNet.

Hopefully, the information provided in this three part blog was helpful and will assist in properly configuring network connectivity in Windows Server 2008 Failover Clusters.

Chuck Timon
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support

When should I evict a cluster node?



I thought I’d post a quick blog on this topic since we run into cases where evicting a cluster node is used as a troubleshooting step. That being said, evicting a node should NEVER be a primary troubleshooting step.

Evicting a node to try to resolve a cluster issue may get you deeper in the hole and ultimately make the issue more complex than it started out.  As an example, say you originally started with a failover issue.  You evict the node, but now you can’t get the node back into the cluster.  Since you can no longer add the node back, you have a secondary issue that must be resolved before you can address your original problem.

In my experience of working many cluster issues, I have never resolved an issue by evicting a node. The only times you should ever evict a node are under the following scenarios.

  • Replacing a node with different hardware.
  • Reinstalling the operating system.
  • Permanently removing a node from a cluster.
  • Renaming a node of a cluster.

Let’s take a look at some very common scenarios where I’ve seen evicting a node used improperly.

Cluster service won’t start on node 2 of a cluster. Node 2 is evicted from the cluster. The original problem with why the cluster service didn’t start is still there but now that same problem also prevents node 2 from coming back into the cluster.

Resources don’t failover to node 2. Every time a failover occurs, the disks don’t come online and fail back to node 1. One of the nodes is evicted and then added back to the cluster. None of this addresses the disk issue so problem still remains.

If the reason for the disk failure is an Error 2, then the drives are not being seen properly by the evicted node.  So when you try to add the evicted node back in and take the defaults, the join can fail with the following error in CLCFGSRV.LOG:

Major Task ID: {B8C4066E-0246-4358-9DE5-25603EDD0CA0}
Minor Task ID: {3BB53C9E-E14A-4196-9066-5400FB8860C9}
Progress (min, max, current): 0, 1, 1
Description:
Checking that all nodes have access to the quorum resource
Status: 0x800713de
The quorum disk could not be located by the cluster service.
Additional Information:
For more information, visit Help and Support Services at
http://go.microsoft.com/fwlink/?LinkId=4441.

I could go on and on but the point I am trying to make is that unless you fall into the four specific scenarios I mention, don’t evict your cluster nodes. Your Microsoft Support Engineers thank you and your users will thank you.

Jeff Hughes
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support

How To: Customize the Windows 7 Start Menu and Taskbar Using unattend.xml


Today’s blog will cover a scenario where you would like to customize the Start menu and Taskbar (also called the SuperBar) as part of your deployment.

In Windows 7 the start menu looks like this

clip_image002

The top 5 icons in the Start menu are not customizable. Note that over time these icons will be replaced by the user’s most frequently used programs. You do, however, have the option to replace the bottom 5 icons in the Start menu using an answer file.

Additionally the default Taskbar looks like this

clip_image004

The default icons in the Taskbar are not customizable, but you can add 3 additional icons to the Taskbar using unattend.xml.

This could be done during an initial install of Windows 7 or as part of running sysprep to create an image. The answer file components you are going to use include

  • Microsoft-Windows-Shell-Setup\StartPanelLinks
  • Microsoft-Windows-Shell-Setup\TaskbarLinks

Both components should be added to the OobeSystem phase of setup. Here is sample unattend.xml code

<TaskbarLinks>
  <Link0>%ALLUSERSPROFILE%\Microsoft\Windows\Start Menu\Programs\accessories\sound recorder.lnk</Link0>
  <Link1>%ALLUSERSPROFILE%\Microsoft\Windows\Start Menu\Programs\accessories\sync center.lnk</Link1>
  <Link2>%ALLUSERSPROFILE%\Microsoft\Windows\Start Menu\Programs\accessories\wordpad.lnk</Link2>
</TaskbarLinks>

<StartPanelLinks>
  <Link0>%ALLUSERSPROFILE%\Microsoft\Windows\Start Menu\Programs\accessories\system tools\disk cleanup.lnk</Link0>
  <Link1>%ALLUSERSPROFILE%\Microsoft\Windows\Start Menu\Programs\accessories\system tools\resource monitor.lnk</Link1>
  <Link2>%ALLUSERSPROFILE%\Microsoft\Windows\Start Menu\Programs\accessories\system tools\system restore.lnk</Link2>
  <Link3>%ALLUSERSPROFILE%\Microsoft\Windows\Start Menu\Programs\accessories\system tools\task scheduler.lnk</Link3>
  <Link4>%ALLUSERSPROFILE%\Microsoft\Windows\Start Menu\Programs\accessories\system tools\windows easy transfer.lnk</Link4>
</StartPanelLinks>

After running setup with autounattend.xml, the Start menu and Taskbar will look like this:

clip_image006

Notes:

  • If you are doing a new install of Windows 7 using autounattend.xml any new user should get these same icons.
  • If you are creating a sysprep image you should make sure your answer file used with sysprep contains the CopyProfile=true entry.
  • If you remove any of the default icons as part of creating a sysprep image they will be recreated when a new user logs in even if you use CopyProfile=true. There is no supported method to prevent this from occurring

If you would like to pin items outside of an answer file see the following blog post

http://blogs.technet.com/deploymentguys/archive/2009/04/08/pin-items-to-the-start-menu-or-windows-7-taskbar-via-script.aspx

Note: CSS provides no support for custom scripts.

Hope this helps with your deployments.

Scott McArthur
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support


Hyper-V Snapshots: Suggestions for Success


While working with customers, I have frequently run into situations where they need help with Hyper-V snapshots: help understanding what snapshots are, how to use them, and best practices for them, or, in worst-case scenarios, help recovering them in order to use a VM.

What I’d like to do in this post, is run down some quick and helpful topics that will keep you on track to being successful each time you use the Hyper-V Snapshot feature.

Do not use Snapshots as a backup strategy


Snapshots can be very useful during the ‘Test and Development’ stage in your environment. And in fact, that’s pretty much what they were designed for.
Snapshots should not be used in a production environment as a replacement for having a well thought out backup strategy.
There are multiple backup solutions and configurations available for Hyper-V at this time and I’ll have some links at the end of this post for more information.

We encourage our customers to use Snapshots in a production environment prior to patching servers. This creates a point in time to fall back to if something doesn’t go according to plan. Once the patch or other system update is applied successfully to the VM, I usually suggest letting the VM run a day or so, and if it’s operating as expected, remove the Snapshot.

Manually interacting with Snapshot files


Snapshot files have long file names made up of the GUID associated with the Virtual Machine they are from and have extensions of .AVHD, .XML, .BIN, and .VSV. I’ve run into a couple of instances where customers have found these strange files and, not knowing what they were, removed them because some were rather large and the server was low on disk space. In these instances, the customers ran into problems when rebooting the VMs that had snapshots configured, because the files were now gone.

If you need to use snapshots in your testing environment, and you are not using R2, I suggest the following configuration changes to avoid running into space concerns on the system volume:


  • Change the default location where Virtual Machines will reside.
  • Place this location on media other than the system volume to avoid running out of space.
  • Choose a location that is large enough to accommodate the number of snapshots you expect to have in your environment.

Fig.1a shows you where to make these location changes within Hyper-V Manager.
image

Fig.1a

If you are using R2, then as previously mentioned, the Snapshot location will  default to the same location as Virtual Hard Disks.

A key thing to remember with snapshots is that moving, renaming, deleting, or otherwise altering files associated with snapshots may result in the associated VM not working properly next time it’s restarted.
Once a snapshot, or multiple snapshots are created for a VM, all of the associated files must be present when the VM starts in order for the process to be successful.

The “Merge Rule”

Another area where customers can sometimes run into some trouble with snapshots is when they want to start pruning their snapshot tree and they aren’t sure which ones will be merged and which ones will be outright deleted.
This is where “The Rule” comes in:

1) If there are snapshots above the current running state (Now) that are deleted, then a merge will occur when the VM is shut down, turned off or saved.

2) If there are snapshots below the current running state (Now) that are deleted, then a merge will NOT occur when the VM is shut down, turned off or saved.

When a snapshot is deleted, saved state files (.bin and .vsv) get deleted immediately, even if a merge will occur when the VM is shutdown, turned off, or saved.
When you delete a snapshot, the .avhd file(s) that store the snapshot data remain in the storage location until the virtual machine is shut down, turned off, or put into a saved state

NOTE: What happens to the AVHD file(s) depends on the location of the deleted snapshot relative to the running state of the VM, as described in the two rules above.

Change Control


In the times where customers have ended up needing to call into support for help with recovering from deleted snapshots, or have needed some other type of assistance with a heavily nested snapshot tree, I have noticed that only rarely have they taken the little extra effort to rename the snapshots to something effective and descriptive.
Choosing the default naming convention is fine for 1 or 2 snapshots, but frequent snapshots result in a multi-branch, multi-level  listing; not having the snapshots named something effective can result in conversations having a lot of “I don’t know”’s in them.

Fig.3 shows an example of what it would look like with the default names being chosen.
image

Fig.3

As you can see by this example, this is not a very clear indication of the differences between the snapshots on the 20th, versus the one I took on the 21st.

How to rename your snapshots


One thing you’ll notice is that when you right-click a VM in Hyper-V Manager and choose Snapshot, you don’t get a choice of names. It just stamps the snapshot with the VM name, date and time. How boring!
There are two ways you can add some helpful text to these titles.

1.    You can right click on the snapshot of your choice and choose the rename option as shown in Fig.4
image

Fig.4


2.    From the VMConnect window, select Action and then Snapshot as shown in Fig.5 and Fig.6, and you will be prompted to choose a name.
image

Fig.5
image

Fig.6


Choosing descriptive snapshot names may take a few seconds extra time up front, but in my experience, this effort can save hours of labor on the other end should there be a need for troubleshooting of a problem.

In closing I wanted to summarize some quick Best Practices when dealing with snapshots.

Best Practices


  • Do not use them as a backup strategy.
  • DO use them as point-in-time “safety nets” during patching or VM system updates.
  • Do not store snapshots on the C:\ drive.
  • Do not move, delete, rename, or edit the associated snapshot files.
  • DO make sure you name your snapshots in a helpful and descriptive manner.

I hope you found this post helpful!

Sean Dwyer
Support Escalation Engineer
Microsoft Enterprise Platforms Support

Additional Information:
Ben Armstrong’s blog: Virtual Machine Snapshotting under Hyper-V
http://blogs.msdn.com/virtual_pc_guy/archive/2008/03/11/virtual-machine-snapshotting-under-hyper-v.aspx
Ben Armstrong’s blog: Managing Snapshots with Hyper-V
http://blogs.msdn.com/virtual_pc_guy/archive/2008/01/16/managing-snapshots-with-hyper-v.aspx
Hyper-V Snapshot FAQ
http://technet.microsoft.com/en-us/library/dd560637(WS.10).aspx
Video discussion with Ben Armstrong discussing Snapshot common issues.
http://www.microsoft.com/showcase/en/us/details/c47aee4d-b89a-47b5-8c38-3a1d6e1997cf

Backing up VMs:
How to backup Hyper-V vms using Windows Server Backup
http://support.microsoft.com/kb/958662

Access Denied Error 0x80070005 message when initializing TPM for Bitlocker


 

Hello, my name is Manoj Sehgal. I am a Senior Support Engineer in the Windows group and today’s blog will cover how to initialize the TPM successfully when you enable BitLocker in Windows 7.

A common problem we have seen since the release of Windows 7 is being unable to initialize the TPM so that BitLocker can be turned ON. This is most likely due to incorrect permissions for the SELF account in AD on the msTPM-OwnerInformation attribute.

When you try to turn on BitLocker on the Windows 7 operating system drive, you may get an Access Denied error message while initializing the TPM.

image

Additionally, when you open the TPM Management Console and you try to initialize TPM you get error message 0x80070005.

image

NOTE: If you are using SCCM to build Windows 7 machines and are using the BitLocker task sequence step, you may see the following error message(s) logged in smsts.log for OSDBitLocker.

pTpm->TakeOwnership( sOwnerAuth ), HRESULT=80070005 e:\nts_sms_fre\sms\client\osdeployment\bitlocker\bitlocker.cpp,480)OSDBitLocker 3032 (0x0BD8)
Failed to take ownership of TPM. Ensure that Active Directory permissions are properly configured.
Access is denied. (Error: 80070005; Source: Windows) OSDBitLocker 3032 (0x0BD8)

Resolution:

To set the correct permissions, follow the instructions below:

1. Open Active Directory Users and Computers.

2. Select the OU where you have all computers which will have Bitlocker turned ON.

3. Right Click on the OU and click Delegate Control.

image

4. Click Next and then click Add.

image

image

5. Type SELF as the Object Name.

image

6. Select create a custom task to delegate.

image

7. From the object in the folder, select Computer Objects.

image

8. Under “Show these permissions”, select all 3 checkboxes.

image

9. Scroll down in permissions and select the attribute Write msTPM-OwnerInformation.

image

10. Click Finish.

After you have done the above steps, you should be able to initialize TPM successfully.
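
If you prefer to script the permission change rather than use the Delegation of Control wizard, dsacls can apply the same SELF write permission. This is only a hedged sketch; the OU distinguished name is an example, and you should verify the exact syntax against your environment before running it:

c:\> dsacls "OU=BitLockerComputers,DC=contoso,DC=com" /I:S /G "SELF:WP;msTPM-OwnerInformation;computer"

The /I:S switch applies the permission to child objects, and WP grants Write Property on the msTPM-OwnerInformation attribute for computer objects.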

More Information:

Backing Up BitLocker and TPM Recovery Information to AD DS

http://technet.microsoft.com/en-us/library/dd875529(WS.10).aspx

 

 

Author:

Manoj Sehgal
Senior Support Engineer
Microsoft Corporation

How to Disable BitLocker Drive Encryption Fixed Data Drive Read-Only Policy Using GPO


Hello, my name is Kaushik Ainapure. I am a Support Engineer in the Windows group, and in today’s blog I am going to discuss an issue with the BitLocker Drive Preparation Tool. When you try to run the BitLocker Drive Preparation Tool, you may encounter the following error message:

The new active drive cannot be formatted. You may need to manually prepare your drive for BitLocker.

image

This can occur for the following reason:

  • If you have the “Fixed Data Drive read-only policy” called “Deny write access to fixed drives not protected by BitLocker” enabled

In order for BitLocker to operate, the hard disk requires at least two NTFS-formatted volumes: one for the operating system and another, with a minimum size of 100 MB, from which the computer boots. BitLocker requires this system volume to remain unencrypted, so it should not be used to store confidential information.

This configuration helps protect the operating system and the information in the encrypted drive. The system drive may also be used to store the Windows Recovery Environment (Windows RE) and other files that may be specific to setup or upgrade programs. For example, using the system drive to store Windows RE along with the BitLocker startup file will increase the size of the system drive to 300 MB. This drive is not assigned a drive letter.

For machines that do not have a System Reserved partition, the BitLocker Drive Preparation Tool will create one of around 300 MB on its own, either by shrinking the existing partition or by creating a partition from unallocated space, if available. During the creation of this partition, depending on which process occurs, you may see the following:

1. Shrink scenario: the tool successfully shrinks the existing partition and creates a RAW partition, but fails to format it. Error message:

“The new active drive cannot be formatted. You may need to manually prepare your drive for BitLocker.”

2. Unallocated space scenario: similar to the shrink case; the tool fails to format the newly created RAW partition, with the same error message.

In both the shrink and unallocated space cases, when drive preparation fails, the new partition is left unformatted.

In these cases of drive preparation failure, the machine is still able to boot, as there is no change to the boot files or the active partition.

I have seen this happen mostly for customers who have upgraded from XP or Vista to Windows 7 and do not have the system reserved partition.

If you have the “Fixed Data Drive read-only policy” called “Deny write access to fixed drives not protected by BitLocker” enabled:

In order to resolve this issue you need to disable the policy. This policy setting determines whether BitLocker protection is required for fixed data drives to be writable on a computer. This policy setting is applied when you turn on BitLocker. If you enable this policy setting, all fixed data drives that are not BitLocker-protected will be mounted as read-only. If the drive is protected by BitLocker, it will be mounted with read and write access. If you disable or do not configure this policy setting, all fixed data drives on the computer will be mounted with read and write access.

How to disable BitLocker Drive Encryption Fixed Data Drive read-only policy using GPO.

1. Open Group Policy Management Console and create a new Group Policy.

2. Right click on the policy and click Edit; you will see a Group Policy Management Editor window.

3. Expand Computer Configuration –> Policies –> Administrative Templates –> Windows Components –> BitLocker Drive Encryption.

You should see the below policy options for BitLocker:

image

4. To require BitLocker protection on fixed data drives, in the details pane, double-click Deny write access to fixed drives not protected by BitLocker to open the policy setting.

5. Click Not Configured, click Apply to apply the setting, and then close the dialog box.

image

6. Close the Local Group Policy Editor.

7. Restart the computer.

Kaushik Ainapure
Support Engineer
Microsoft Enterprise Platforms Support

Cluster Validation Storage Test ‘List All Disks’ Fails with Status 87


Greetings CORE blog fans!  It has been a while, so I thought it was time for another blog.  In recent weeks, we have seen an issue where the Windows Server 2008 R2 storage validation test List All Disks fails with a Status 87.  Figure 1 is an example of what is displayed in the cluster validation report.

clip_image002

Figure 1:  List All Disks failure in Cluster Validation Report.

This error is also reflected in the ValidateStorage log (Figure 2) located in the %systemroot%\Cluster\Reports directory.

000016f4.00001714::01:02:06.180  CreateNtFile: Path \Device\HarddiskVolume2, status 87
000016f4.00001714::01:02:06.180  GetNtldrDiskNumbers: Failed to open device \Device\HarddiskVolume2, status 87
000016f4.00001714::01:02:06.180  GetNtldrDiskNumbers: Exit GetNtldrDiskNumbers: status 87
000016f4.00001714::01:02:06.180  CprepPrepareNodePhase2: Failed to obtain boot disk list, status 87
000016f4.00001714::01:02:06.180  CprepPrepareNodePhase2: Exit CprepPrepareNodePhase2: hr 0x80070057, pulNumDisks 0

Figure 2: ValidateStorage log entry

The decode for these errors is shown in figure 3.

# for decimal 87 / hex 0x57 :
  ERROR_INVALID_PARAMETER                                   winerror.h
# The parameter is incorrect.

Figure 3:  Error decode

The cause for this failure is, to this point, unknown.  What we do know is that the path called out in the log (Figure 2 above) always points to the 100 megabyte partition that is created at the root of the system drive.  This partition is created by default and is in place to support BitLocker.  The approved workaround is to assign a drive letter to the 100 megabyte partition and re-run the validation process; the List All Disks storage test should pass at that point.  There is no adverse impact to assigning a drive letter to this partition.  As a reminder, BitLocker is not supported in a cluster environment.  This is documented in KB 947302. If an attempt is made to enable BitLocker on a cluster node, the error in Figure 4 is displayed.

clip_image006

Figure 4:  Error when trying to enable BitLocker on a cluster node
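
As for applying the workaround, the drive letter can be assigned from Disk Management or from diskpart.  A short sketch, assuming ‘list volume’ shows the 100 megabyte System Reserved partition as volume 1 and that R: is free (check the list volume output on your own node first):

c:\> diskpart
DISKPART> list volume
DISKPART> select volume 1
DISKPART> assign letter=R
DISKPART> exit

Once validation passes, the letter can be removed again with ‘remove letter=R’ if you prefer not to leave it assigned.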

I have an ‘ask’ of our readership.  If anyone reading this blog can ‘on demand’ repro this issue, we want to hear from you.  This goes beyond just telling us, “Yeah, I’ve had that issue myself.”  I am interested in hearing from anyone who has perhaps manipulated a setting in their controller card that can either cause validation to fail in this way or make it pass.  I am interested in hearing from someone who had this failure, changed a setting of some kind, either in software or hardware, and the error went away.  Be sure to provide the details (Make and model of controller, Firmware and driver versioning information, steps to reproduce the issue, etc…)

As always, we hope this has been informative for you.

Chuck Timon
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support

How to back up recovery information in AD after BitLocker is turned ON in Windows 7


Hello, my name is Manoj Sehgal. I am a Senior Support Engineer in the Windows group and today’s blog will cover how to back up recovery information in AD after BitLocker is turned ON in Windows 7.


A common question we are asked is how to save the recovery information in AD for a Windows 7 machine that already has BitLocker turned ON.


This situation can arise when any of the following conditions are true, but is also not limited to this list:


a)    The machine had BitLocker turned ON prior to joining the domain.
b)    The machine was not physically connected to the network when BitLocker was enabled.
c)    The GPO for saving recovery information for BitLocker was not set up correctly.


So when you open Active Directory Users and Computers, you do not see an msFVE-RecoveryInformation object for the machine that was encrypted.


In this situation we can use the manage-bde command from the client machine to save the recovery information to AD, instead of decrypting and re-encrypting the operating system drive just to store the recovery information in AD.
First verify that the client machine is in the correct OU in AD, where the BitLocker group policies are applied, and then follow the steps below:

Open an elevated command prompt on the client computer and run the command below.

Note: You require local admin rights to run manage-bde commands.

c:> manage-bde -protectors -get c:

Example:

Bitlocker Drive Encryption: Configuration Tool version 6.1.7600
Copyright (C) Microsoft Corporation. All rights reserved.
Volume C: [Old Win7]
All Key Protectors
    External Key:
      ID: {F12ADB2E-22D5-4420-980C-851407E9EB30}
      External Key File Name:
        F12ADB2E-22D5-4420-980C-851407E9EB30.BEK

    Numerical Password:
      ID: {DFB478E6-8B3F-4DCA-9576-C1905B49C71E}
      Password:
        224631-534171-438834-445973-130867-430507-680922-709896

    TPM And PIN:
      ID: {EBAFC4D6-D044-4AFB-84E3-26E435067AA5}


In the results above, note the ID and Password listed under Numerical Password.


Now run the command below, replacing the id value with the ID of the Numerical Password protector.


c:> manage-bde -protectors -adbackup c: -id {DFB478E6-8B3F-4DCA-9576-C1905B49C71E}

Bitlocker Drive Encryption: Configuration Tool version 6.1.7600
Copyright (C) Microsoft Corporation. All rights reserved.
Recovery information was successfully backed up to Active Directory.

Now if you go to AD and check the client computer, you should see an msFVE-RecoveryInformation object for this client computer.
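
If you have to do this on more than one machine, the two manage-bde steps can be combined in a small script.  The following is a minimal Windows PowerShell sketch, assuming PowerShell is installed on the Windows 7 client, the prompt is elevated, and there is a single Numerical Password protector on the drive; it parses the manage-bde output for the Numerical Password ID and passes it to -adbackup.

   # Minimal sketch: back up the Numerical Password protector for C: to Active Directory.
   # Assumes an elevated prompt and a single Numerical Password protector on the volume.
   $output = manage-bde -protectors -get C:
   # The ID line that follows the "Numerical Password:" line holds the GUID we need.
   $index  = ($output | Select-String 'Numerical Password:').LineNumber
   $id     = $output[$index] -replace '.*(\{[0-9A-F-]+\}).*', '$1'
   manage-bde -protectors -adbackup C: -id $id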

For more information on Group Policies for Bitlocker, see my blog below.
http://blogs.technet.com/askcore/archive/2010/02/16/cannot-save-recovery-information-for-Bitlocker-in-windows-7.aspx

Manoj Sehgal
Senior Support Engineer
Microsoft Enterprise Platforms Support

Understanding the Cluster Debug Log in 2008


One of the major changes from 2003 to 2008 is the way we handle logging of cluster debug events. I thought I’d do a quick write-up on how cluster debug logging works in 2008.

In Windows 2003 Failover Clustering, the cluster service on each node constantly writes to a live debug output file. These files are located in the %SystemRoot%\Cluster folder on each node in the cluster and the name of the file is CLUSTER.LOG. The cluster log is local and specific to each node’s cluster service. Each node has a unique log file that represents its views and actions. This file is in a basic text format and can be easily viewed with Word, Notepad, etc.

In Windows Server 2008, a change was made to make the cluster debug logging mechanism more in line with how the rest of Windows handles event logging. The Win 2003 legacy CLUSTER.LOG text file no longer exists. In Win 2008 the cluster log is handled by Event Tracing for Windows (ETW). This is the same logging infrastructure that handles events for other aspects you are already well familiar with, such as the System or Application Event logs you view in Event Viewer.

For more information on the topic of Windows Event Tracing, see the following MSDN articles:

Improve Debugging And Performance Tuning With ETW

Event Tracing

About Event Tracing

How to Generate a Cluster Log

First we need to look at the cluster logs in some user-friendly format. You can run the following commands to generate a text-readable version of the ETL files (we’ll talk more about ETL files in a bit). The trace sessions are dumped into a text file that looks very similar to the legacy CLUSTER.LOG.

If you are running Windows Server 2008, you can use the ‘cluster.exe’ command line. If you are running Windows Server 2008 R2, you can use either the ‘cluster.exe’ command line or the cluster PowerShell cmdlets.

Command Line

c:\>cluster log /gen

Some useful switches for ‘cluster log /gen’ are:

Switch: c:\>cluster log /gen /COPY:"directory"
Effect: Dumps the logs on all nodes in the entire cluster to a single directory

Switch: c:\>cluster log /gen /SPAN:min
Effect: Just dump the last X minutes of the log

Switch: c:\>cluster log /gen /NODE:"node-name"
Effect: Useful when the ClusSvc is down to dump a specific node’s logs

For more detailed information on the ‘cluster log /gen’ command, see the TechNet article here.
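
The switches can also be combined. For example, the following (with a hypothetical C:\ClusterLogs destination folder) dumps only the last 15 minutes of the log from every node into one directory:

c:\>cluster log /gen /COPY:"C:\ClusterLogs" /SPAN:15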

Powershell

C:\PS> Get-ClusterLog

Some useful switches for ‘Get-ClusterLog’ are:

Switch: C:\PS> Get-ClusterLog -Destination
Effect: Dumps the logs on all nodes in the entire cluster to a single directory

Switch: C:\PS> Get-ClusterLog -TimeSpan
Effect: Just dump the last X minutes of the log

Switch: C:\PS> Get-ClusterLog -Node
Effect: Useful when the ClusSvc is down to dump a specific node’s logs

For more detailed information on the ‘Get-ClusterLog’ cmdlet, see the TechNet article here.
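
Here is a comparable combined example for Windows Server 2008 R2 (C:\ClusterLogs is again a hypothetical folder; in a fresh PowerShell session you may need to import the FailoverClusters module first):

C:\PS> Import-Module FailoverClusters
C:\PS> Get-ClusterLog -Destination "C:\ClusterLogs" -TimeSpan 15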

Failover Cluster Tracing Session

You can see the FailoverClustering ETW trace session (Microsoft-Windows-FailoverClustering) in ‘Reliability and Performance Monitor’ under ‘Data Collector Sets’, ‘Event Trace Sessions’.

image

Cluster event tracing is enabled by default when you first configure the cluster and start the cluster service. The log files are stored in:

%WinDir%\System32\winevt\logs\

The log files are stored in an *.etl format.

Every time a server is rebooted, we take the previous *.etl file and append it with a 00X suffix.

By default, we keep the most recent three ETL files. Only one of these files is the active or “live” log at any given time.

Format in Windows Server 2008

ClusterLog.etl.001
ClusterLog.etl.002
ClusterLog.etl.003

Format in Windows Server 2008 R2

Microsoft-Windows-FailoverClustering Diagnostic.etl.001
Microsoft-Windows-FailoverClustering Diagnostic.etl.002
Microsoft-Windows-FailoverClustering Diagnostic.etl.003

The default size of these logs is 100MB each. Later in this post, I’ll explain how to determine if this size is adequate for your environment.

The important thing to understand about the cluster logs is that we use a circular logging mechanism. Since the logs are a finite 100MB in size, once we reach that limit, events from the beginning of the current or “live” ETL log will be truncated to make room for events at the end of the log.

image

Here’s an example of the timeframes captured in a series of ETL files.

In this example, every night the server is rebooted at midnight. The three most recent ETL logs would look like this:

image

REBOOT

image

REBOOT

image

The ETL.001 file is the active file being used by the live cluster service to write debug entries. Let’s say there were many entries being written to the current trace session file and we hit the 100MB limit at 3pm. At that point, events from 12am to 3am were truncated to make room for additional entries in the log. The ETL.001 log would then look like this:

image

Now I want to view the cluster logs in some text-readable format, so I run either the ‘cluster log /gen’ command or the ‘Get-ClusterLog’ cmdlet.

These commands take all the ETL.00X files and “glue” them together in the order they were created to make one contiguous text file.

image
The file that gets created is located in %WinDir%\Cluster\Reports\ and is called CLUSTER.LOG.

If you were looking at the debug entries in the CLUSTER.LOG file, you might notice that since we glued all three logs together, there is an apparent gap in the CLUSTER.LOG from 11:59pm on 1/2 to 3am on 1/3. The assumption may be that either there is information missing during that time or there was nothing being logged. The missing time is not in and of itself a problem, just a side effect of concatenating all three logs together.

image

How Large Should I Set My Cluster Log Size?

So now that you understand why there may be “gaps” in the cluster debug log, let’s discuss how this relates to understanding how large your cluster log size should be. The first thing you need to determine is “How much time is covered in each ETL file?”. If you were looking at the CLUSTER.LOG text file from the above examples, you could see that the most recent ETL.001 file contained about 12 hours of data before it was truncated. The other ETL files contain about 24 hours of data. When we are talking about the cluster log size, we are ONLY talking about the live ETL trace session.

The amount of data written to the ETL files is very dependent on what the cluster is doing. If the cluster is idle, there may be minimal cluster debug logging; if the cluster is recovering from a failure scenario, the logging will be very verbose. The more data being written to the ETL file, the sooner it will get truncated. There is no single “recommended” value for the size of the cluster log that will fit everyone.

Since the default cluster log size in Windows Server 2008 is 100MB, the above example shows that a 100MB ETL file covers about a 12-24 hour timeframe. This means if I am troubleshooting a cluster issue, I have 12 hours from the failure to generate a cluster log or risk information being truncated. If it is not feasible to generate a cluster log that soon after a failure, you should consider increasing the size of the cluster log. If you changed the size of the cluster log to 400MB, you would be accounting for the need to have at least 2 days (12 hours X 4) of data in the live ETL file.
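
As a quick sanity check of that arithmetic, here is a minimal PowerShell sketch. The 12 hours per 100MB figure is only the assumption taken from the example above; substitute whatever you observe in your own environment.

   # Sketch: estimate the cluster log size needed for a desired retention window.
   $hoursPerEtl  = 12    # assumption: roughly 12 hours of data per 100 MB ETL file (from the example above)
   $sizePerEtlMB = 100
   $targetHours  = 48    # two days of retention, as in the 400MB example above
   $requiredMB   = [math]::Ceiling($targetHours / $hoursPerEtl) * $sizePerEtlMB
   $requiredMB           # 400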

Why Should I Care About All This?

While reading cluster logs may not be something you generally do in your leisure time, it’s a critical log Microsoft Support uses to troubleshoot cluster issues. If there’s any other message I could convey here, it’s that in order for us to do a good job of supporting our customers, we need to have the data available. We often get customers who open a case to try and figure out why their cluster had problems when those problems occurred over a week ago. As hard a message as it is to deliver, without the complete cluster log, root cause analysis becomes extremely difficult. If you intend to open a support case with Microsoft, keep the following in mind.

  • The sooner after the failure we can capture the cluster logs, the better. The longer you wait to open a case and have us collect diagnostic information, the greater chance the cluster log will truncate the timeframe we need.
  • If you can’t open a case right away, at least run the ‘cluster log /gen’ or the ‘Get-ClusterLog’ command and save off the cluster log files for when you do have a chance to open a case.
  • It is generally recommended that your CLUSTER.LOG have at least 72 hours worth of continuous data retention. This is so that if you have a failure occur after you went home on Friday, you still have the data you need to troubleshoot the issue on Monday morning.

How To Change the Cluster Log Size

Changing the default cluster log size is fairly straightforward.

First, open an elevated command prompt and type the following:

C:\>cluster /prop

This will output the current properties of the cluster.

image

To change the cluster log size, open an elevated command prompt and type the command:

Powershell

C:\PS>Set-ClusterLog -Size X

Command Line

c:\>cluster log /Size:X

Where X is the size you want to set in MB. Note in the above screenshot, the default is 100.
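
For example, to raise the limit to 400MB as discussed earlier (400 is only a suggestion; pick whatever your retention needs dictate):

C:\PS>Set-ClusterLog -Size 400

c:\>cluster log /Size:400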

How to Change the Detail Verbosity of the Cluster Log

In the above screenshot, the default cluster log level (ClusterLogLevel) is 3. Generally, you shouldn’t need to change the default level unless directed by a Microsoft Support Professional. Using debug level ‘5’ will have a performance impact on the server.

Level          Error   Warning   Info   Verbose   Debug
0 (Disabled)
1              X
2              X       X
3 (Default)    X       X         X
4              X       X         X      X
5              X       X         X      X         X

To change the cluster log level, open an elevated command prompt and type the command:

Powershell

C:\PS>Set-ClusterLog -Level X

Command Line

c:\>cluster log /Level:X

Where X is the desired log level.
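
For example, if Microsoft Support asks for verbose data, you could temporarily raise the level to 5 and then set it back to the default of 3 with the same command when the capture is done:

C:\PS>Set-ClusterLog -Level 5

c:\>cluster log /Level:5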

Hope you found this information useful.

Jeff Hughes
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support

Windows Server 2008 Failover Clusters: Networking (Part 4)


The Windows Server 2008 Failover Clustering: Networking three-part blog series has been out for a little while now.  Hopefully, it has been helpful.  Little did I know there would be an opportunity to write another part.  This segment will be short, as it covers a very specific scenario, one that we rarely see but have encountered often enough that I felt it was worth writing about.

There are applications written to access resources that are hosted in Microsoft Failover Clusters running on Windows Server 2008 (RTM and R2).  The resource could be a File Server, a SQL database, or whatever.  The point is that the required resource is being hosted in a Failover Cluster.  It is hoped that applications that need to function in this manner are written properly to locate the required resource being hosted in a cluster.  By that I mean I would expect an application to first query a name server (DNS server) and then use the information obtained to make a proper connection to the required cluster resource.  In a Failover Cluster, that connection point is known as a Client Access Point (CAP).  A CAP consists of a Network Name (NetBIOS) resource and one or more IP Address resources.  The default behavior in a Windows Server 2008 cluster is to dynamically register CAP information in a DNS server, provided the DNS server is configured to support Dynamic Updates.  This occurs when the CAP is brought Online in the cluster.  Not all applications are written this way, however.  Some applications make a local connection on a cluster node by binding to the first network adapter in the binding order and then using the IP address configured for that adapter.  The problem is that, in a cluster, the first connection listed in the binding order by default is the Microsoft Failover Cluster Virtual Adapter.  This adapter uses an IP address drawn from the APIPA (Automatic Private IP Addressing) address range, which is non-routable and not registered in DNS.

To help make these types of applications work better, we can use a utility called ‘nvspbind’ that has been released for public download on the Microsoft MSDN site.  The first step is to download and install the utility on each cluster node.  The options we will be using are shown in Figure 1.

clip_image002

Figure 1:  Options for nvspbind

First we need to identify the adapter that is the Microsoft Failover Cluster Virtual Adapter by using the nvspbind /n command (Figure 2).  The adapter is ‘Local area connection* 9’. 

clip_image004

Figure 2:  Identify the Microsoft Failover Cluster Virtual Adapter

Next, we use the 'nvspbind /o ms_tcpip' command to determine the binding order for IPv4 (Figure 3).

clip_image005

Figure 3: Listing the bindings for IPv4

We can see here that the adapter is listed at the top of the binding order for IPv4, which is what causes the problem for some applications.  We need to move the adapter down in the binding order, so we will use the following command to accomplish that:

C:\>nvspbind /- “local area connection* 9” ms_tcpip (Figure 4).

clip_image007

Figure 4:  Moving the adapter down in the binding order for IPv4

Note:  The adapter can be moved further down by using /-- if desired.

Once the adapter has been positioned correctly in the binding order, the application can be tested to see if it now works as desired.

To further highlight the effect of this utility, we can inspect the registry.  First, we need to locate some information for the Microsoft Failover Cluster Virtual Adapter.  Navigate to the following registry key (Figure 5) and locate the adapter:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002BE10318}

clip_image009

Figure 5:  Microsoft Failover Cluster Virtual Adapter NetCfgInstanceId

The same information shown in Figure 5 is also displayed in Figure 2.

With the information in hand, navigate to the following registry key (Figure 6) to verify the adapter is no longer listed at the top of the binding order.

clip_image011

Figure 6: HKLM\SYSTEM\CurrentControlSet\services\Tcpip\Linkage
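
If you prefer to script the check, here is a minimal PowerShell sketch (assuming PowerShell 2.0, which ships with Windows Server 2008 R2) that reads the same Linkage key:

     # Sketch: list the IPv4 binding order; each entry is of the form \Device\{NetCfgInstanceId}.
     # Compare the GUIDs against the NetCfgInstanceId of the cluster virtual adapter from Figure 5.
     $linkage = 'HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Linkage'
     (Get-ItemProperty -Path $linkage).Bind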

That’s about it.  Thanks for your time and, as always, we hope the information here has been useful to you.

Chuck Timon
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support


Hyper-V Client Tracing – Tracing the User Interface (“UI”)


The purpose of this article is to explain a troubleshooting methodology with Hyper-V that goes a little deeper than the Event Logs.  I’m creating a simple problem, starting a Virtual Machine with a corrupt .vhd file, to show how to proceed. I’m simply starting my VM, z2008_VS1, from the Hyper-V Manager.  The following error pops up (shown here with “Details” expanded).

 

 image

 

image

The first place we should check is the Event Viewer logs under Microsoft, Windows, Hyper-V.  Note the many areas logged.

image

In the Admin folder, this error appears.

 

 image

 

In text format the error is:

Log Name:      Microsoft-Windows-Hyper-V-Worker-Admin
Source:        Microsoft-Windows-Hyper-V-Worker
Date:          4/5/2010 5:47:03 PM
Event ID:      12010
Task Category: None
Level:         Error
Keywords:     
User:          NETWORK SERVICE
Computer:      z2008.northamerica.corp.microsoft.com
Description:
'z2008_VS1' Microsoft Emulated IDE Controller (Instance ID {83F8638B-8DCA-4152-9EDA-2CA8B33039B4}): Failed to power on with Error 'The file or directory is corrupted and unreadable.' (0x80070570). (Virtual machine EC5C9061-A086-46A4-864C-AC3DAE1BD0FE)

In order to enable tracing, open a command prompt and navigate to the directory "%appdata%\Microsoft\Windows\Hyper-V\Client\1.0\".  In other words, at a command prompt type "CD %appdata%\Microsoft\Windows\Hyper-V\Client\1.0\" (without the quotes) and this will take you to the correct directory.  You can also use Start, Run, and enter "%appdata%\Microsoft\Windows\Hyper-V\Client\1.0\" (without the quotes) to open the directory in Explorer.  There, create a file called "VMClientTrace.config" (from now on, assume without the quotes).  Use Notepad or an XML editor to paste in the following.

<?xml version="1.0" encoding="utf-8"?>
<configuration>
    <Microsoft.Virtualization.Client.TraceConfigurationOptions>
        <setting name="TraceTagFormat" type="System.Int32">
            <value>3</value>
        </setting>
        <setting name="BrowserTraceLevel" type="System.Int32">
            <value>6</value>
        </setting>
        <setting name="VMConnectTraceLevel" type="System.Int32">
            <value>6</value>
        </setting>
        <setting name="VHDInspectTraceLevel" type="System.Int32">
            <value>6</value>
        </setting>
    </Microsoft.Virtualization.Client.TraceConfigurationOptions>
</configuration>
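
If you would rather script the file creation than browse to the folder, here is a minimal PowerShell sketch; it targets the same %appdata% location described above and then opens the new file so you can paste in the XML.

   # Sketch: create the Hyper-V client tracing folder and config file, then open it for editing.
   $dir = Join-Path $env:APPDATA 'Microsoft\Windows\Hyper-V\Client\1.0'
   New-Item -ItemType Directory -Path $dir -Force | Out-Null
   notepad (Join-Path $dir 'VMClientTrace.config')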

You can adjust the trace levels (more on this later), but for most purposes the values used above should be just right.  You’ll need to stop and restart the UI (close Hyper-V Manager and reopen it).  And then re-produce your issue.

Note that the UI must be restarted; we do not support dynamically turning on tracing. The tracing files are written to %temp%. Again, you can get to the correct location by typing "CD %TEMP%" at a command prompt or by using Start, Run and entering "%TEMP%".  The file will be named something like VMBrowser_Trace_20100404084318.log - the numbers in the filename reflect the timestamp of when the UI was started.  You can read the log in Notepad or Word; with the latter I find it useful to use large page sizes with very small type.  If you search the log for "ERROR" you'll come across something like the following.

2010-04-04 08:43:32.606, ERROR [7] VMBrowser ServerNodeViewControl:WaitForTaskCompletion() 'z2008_VS1' failed to start.

Microsoft Emulated IDE Controller (Instance ID {83F8638B-8DCA-4152-9EDA-2CA8B33039B4}): Failed to power on with Error 'The file or directory is corrupted and unreadable.'

Failed to open attachment 'M:\#VMM.library\VHDs\z2008_VS1_Disk0.vhd'. Error: 'The file or directory is corrupted and unreadable.'

Failed to open attachment 'M:\#VMM.library\VHDs\z2008_VS1_Disk0.vhd'. Error: 'The file or directory is corrupted and unreadable.'
       at Microsoft.Virtualization.Client.Management.ThrowHelper.ThrowVirtualizationOperationFailedException(String errorMsg, String errorDescriptionMsg, VirtualizationOperation operation, Int64 errorCode, ErrorCodeMapper mapper, Exception innerException)
   at Microsoft.Virtualization.Client.Management.View.EndMethodReturnInternal(IVMTask task, VirtualizationOperation operation, Boolean affectedElementExpected)
   at Microsoft.Virtualization.Client.Management.VMComputerSystemView.EndSetState(IVMTask setStateTask)
   at Microsoft.Virtualization.Client.VMBrowser.ServerNodeViewControl.WaitForTaskCompletion(Object taskWaitObject)

If you’re saying to yourself that this looks a lot like what we’ve already seen, keep in mind this is just to demonstrate how to enable UI tracing.  In a more serious situation (one that couldn’t be resolved from the Event Viewer log entries alone), a Microsoft Support Professional would be able to more definitively determine the issue.  Microsoft may also provide you with a custom VMClientTrace.config file for the specific issue at hand.

Configuring Tracing Levels
The tracing configuration file provides options to control which traces are gathered and how they are formatted. The first tag is TraceTagFormat. A value of 1 instructs the tracing system to tag each message with a time stamp.  Other configuration file tags specify what tracing level to use for each UI app. UI tracing supports six different levels of tracing. Tracing levels and their corresponding tag values are shown in the following table.

Each higher level includes all of the tracing from lower levels and additional information. For example, setting the value to 5 includes all exceptions, WMI calls, user actions, and WMI events, as well as general information.

 

Tag Value   Description
0           None
1           Exception (all caught exceptions will be traced)
2           WMI call (information about each WMI call made by the UI)
3           UserAction (each user action, such as the user is launching the New Virtual Machine Wizard)
4           Events (Turning on/off or receiving WMI events from the server)
5           Information (General information)
6           Verbose

 

Summary

This article is intended as a possible troubleshooting step, i.e. a “how to” as opposed to a specific solution.

In this case the vhd was intentionally corrupted by disconnecting storage while the virtual drive was being created.  We’ve seen corruption for this and other reasons, including 3rd party encryption and anti-virus programs.

 

Author
Thomas E. Acker
Microsoft Corporation

Reason for Access Denied error in PowerShell using Get-WMIObject calls to a remote Windows Server 2008 Failover Cluster node


What we will be covering in this blog is a reason for an Access Denied error message in PowerShell and the options for resolving it. The particular Windows PowerShell cmdlet we will be referencing is Get-WmiObject, used to make WMI calls against a remote Windows Server 2008 Failover Cluster node.

Before continuing, we want you to note some differences of Windows PowerShell between Windows Server 2008 and Windows Server 2008 R2 operating systems.

1. In Windows Server 2008 by default, Windows PowerShell v1.0 is not installed. It needs to be installed from the Features Wizard in Server Manager.

image

2. In Windows Server 2008 R2 by default, Windows PowerShell v2.0 is already installed.

image

 

NOTE: When you go to the Feature Wizard in Server Manager as shown, don’t confuse Windows PowerShell Integrated Scripting Environment (ISE) with Windows PowerShell v2.0. This new ISE is an add-on host application for Windows PowerShell that enables you to write, run, and test scripts and modules in a friendly environment. Key features such as syntax-coloring, tab completion, visual debugging, Unicode-compliance, and context-sensitive Help provide a rich scripting experience.

Now that we have the operating system differences out of the way, let's move on to the Get-WMIObject calls made between the two versions of Windows PowerShell. This cmdlet has a convenient alias, gwmi, which I’ll use for most of my examples. The local node name we will be referring to in these examples will be W2K8-CLUSNODE1 and the remote node name will be W2K8-CLUSNODE2.

Whenever you try executing the following syntax from Windows PowerShell v1.0 on a Windows Server 2008 Failover Cluster node, you will experience an Access Denied error message when making this specific Get-WMIObject query against a remote Failover Cluster node:

      gwmi mscluster_resourcegroup -computer W2K8-CLUSNODE2 -namespace ROOT\MSCluster

 

image

The same query, however, succeeds when it is run against the local node, or with no -computer parameter at all:

    gwmi mscluster_resourcegroup -computer W2K8-CLUSNODE1 -namespace ROOT\MSCluster

    gwmi mscluster_resourcegroup -namespace ROOT\MSCluster

image

There are also other Get-WMIObject queries (not Failover Cluster specific) that work just fine when you execute them against a remote node and provide the proper credentials, such as the following:

gwmi win32_service -credential domain\user_name -computer W2K8-CLUSNODE2

A dialog box will then appear so that credentials with sufficient privileges can be supplied for the Get-WMIObject call, as in the example below, which displays a list of services running on the remote node.

image

image

 

Now you may ask yourself, why are we getting Access Denied with some Get-WMIObject queries and not with others? The specific reason Get-WMIObject queries fail against a remote node of the Failover Cluster using Windows PowerShell v1.0 is that they require Authentication of Packet Privacy, which cannot be passed along in version 1.0.

This Authentication of Packet Privacy is part of a well-defined set of common API calls of the Microsoft® Security Support Provider Interface (SSPI), which is used for obtaining integrated security services for authentication, message integrity, message privacy, and security quality of service for any distributed application protocol.

This capability was built into Windows PowerShell v2.0, which ships with Windows Server 2008 R2. The switches that need to be passed are -Authentication and -Impersonation, using the following syntax as an example:

     GWMI MSCluster_ResourceGroup -Authentication PacketPrivacy -Impersonation Impersonate -Computer W2K8-CLUSNODE2 -Namespace ROOT\MSCluster

NOTE: In order for the new switches to work and be able to connect to a remote machine, the
-Impersonation must be set to Impersonate (or higher) and the -Authentication level must be set to PacketPrivacy (or higher). 
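
If you also need to supply alternate credentials for the remote node, the switches can be combined; W2K8-CLUSNODE2 and domain\user_name are the same placeholder names used earlier in this post:

     GWMI MSCluster_ResourceGroup -Authentication PacketPrivacy -Impersonation Impersonate -Computer W2K8-CLUSNODE2 -Namespace ROOT\MSCluster -Credential domain\user_name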

These are the same levels used in the WBEMTest tool. For more information on using the WBEMTest tool, see the Using WBEMTest user interface link.

 

image

 

 

A simple 'get-help get-wmiobject' will provide the exact syntax and options you need, as listed below.

Syntax

Get-WmiObject [-List] [-EnableAllPrivileges] [-Locale [<string>]] [-Authority [<string>]] [-Amended] [-Authentication {Default | None | Connect | Call | Packet | PacketIntegrity | PacketPrivacy | Unchanged}] [-ComputerName [<string[]>]] [-Namespace [<string>]] [-Impersonation {Default | Anonymous | Identify | Impersonate | Delegate}] [-Credential [<PSCredential>]] [<CommonParameters>]

Get-WmiObject [-Filter [<string>]] [-Class] [<string>] [[-Property] [<string[]>]] [-EnableAllPrivileges] [-Locale [<string>]] [-Authority [<string>]] [-Amended] [-Authentication {Default | None | Connect | Call | Packet | PacketIntegrity | PacketPrivacy | Unchanged}] [-ComputerName [<string[]>]] [-Namespace [<string>]] [-Impersonation {Default | Anonymous | Identify | Impersonate | Delegate}] [-Credential [<PSCredential>]] [<CommonParameters>]

Get-WmiObject [-DirectRead] -Query [<string>] [-EnableAllPrivileges] [-Locale [<string>]] [-Authority [<string>]] [-Amended] [-Authentication {Default | None | Connect | Call | Packet | PacketIntegrity | PacketPrivacy | Unchanged}] [-ComputerName [<string[]>]] [-Namespace [<string>]] [-Impersonation {Default | Anonymous | Identify | Impersonate | Delegate}] [-Credential [<PSCredential>]] [<CommonParameters>]

There are a few options to get around the Access Denied error message when using Get-WMIObject in Windows PowerShell:

1. Our strongest recommendation would be to upgrade the operating system to Windows Server 2008 R2, for a lot of reasons. To name a few, read the following references: “Top 10 Reasons to Upgrade to Windows Server 2008 R2” and “What's New in Windows Server 2008 R2”.

2. An alternative option, if you have a compelling reason not to upgrade to Windows Server 2008 R2, would be to download and install the latest Windows Management Framework as outlined in KB article 968930:
968930 Windows Management Framework package (Windows PowerShell 2.0 and WinRM 2.0)
NOTE: You need to first uninstall Windows PowerShell v1.0 before installing the Windows Management Framework.

3. Another alternative is to use the WBEMTEST tool to connect to the particular namespace, set the authentication and impersonation levels, and enter the credentials.

 

Author:
Mike Rosado
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support

Rights needed for user account to create a Cluster Name Object (CNO) on Windows Server 2008 R2 Failover Cluster


As discussed in previous blogs and articles, there is no longer a Cluster Service account in Failover Clustering. However, there are still some rights needed in Active Directory, both for the logged on user and for the Cluster Name Object. This blog is only going to cover the rights of the logged on user.

Cluster Validation is a set of tests that is run to verify that the Failover Cluster will be supported and to identify configuration issues that need to be corrected. For the purposes of this blog, I want to discuss the single test “Validate Active Directory Configuration” under System Configuration. What this test does is validate that the logged on user can create computer objects in Active Directory. When creating a Failover Cluster, it is going to use the currently logged on user to create the Cluster Name Object (CNO). Therefore, that user must have the rights to do it. If it does not, you will see the below in the Validation Report.

Validate Active Directory Configuration

Validate that all the nodes have the same domain, domain role, and organizational unit.

The user running validate, does not have permissions to create computer objects in the “x” domain.

To successfully create a cluster either, the installer must have the privileges needed to create computer objects in the default container for computers, or a computer object must be pre-created by a domain administrator.

The user creating the cluster requires the 'Create Computer Object' permission on the container where computer objects are created in the domain. If the default container has been modified, then this privilege will need to be granted to the user for the new container.

If a pre-existing computer object is used, please ensure that the computer object is in a Disabled state and that the user creating the cluster has 'Full Control' permission to that computer object using the Active Directory Users and Computers tool prior to creating the cluster.

 

If you were to not run Validation and try to create a Failover Cluster, you would receive this error if the account does not have the proper permissions.

clip_image002

 

As we discussed, we are using the logged on user to create the computer object, so we must look at the rights that the logged on user has. Our documentation on the rights needed says the following.

Steps for configuring the account for the person who installs the cluster

Membership in the Domain Admins group, or equivalent, is the minimum required to complete this procedure. In addition, your account must be in the local Administrators group on all of the servers that will be nodes in the failover cluster.

Now, there are numerous organizations that do not have Domain Administrators create Failover Clusters. So the question becomes: exactly what are the “equivalent” rights needed for this user? Below are the rights needed on the OU (or container) where the computer object will be created.

    o Create Computer Objects

    o Read All Properties

With the above rights, Cluster Validation will pass and the Cluster object can be created.
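
If you prefer to grant those two rights from the command line instead of through Active Directory Users and Computers, the following is a rough sketch using dsacls; the OU distinguished name and the CONTOSO\ClusterInstaller account are placeholders for your own values, so verify the syntax in a test environment before relying on it.

     rem OU=Clusters,DC=contoso,DC=com and CONTOSO\ClusterInstaller are placeholders
     dsacls "OU=Clusters,DC=contoso,DC=com" /G "CONTOSO\ClusterInstaller:CC;computer"
     dsacls "OU=Clusters,DC=contoso,DC=com" /G "CONTOSO\ClusterInstaller:RP"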

 

Author:

John Marlin
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support

 

 

 


Windows Server 2008 and 2008R2 Failover Cluster Startup Switches


 

I am here today to discuss the troubleshooting switches used to start a Windows 2008 and 2008 R2 Failover Cluster. From time to time, the Failover Cluster Service will not start on its own, and you need to start it with a diagnostic switch for troubleshooting purposes and/or to get it back into production.

In Windows 2003 Server Cluster, we had the following switches:

image

More detailed information on the above switches can be found in KB258078. However, the above switches have changed for Windows 2008 and 2008 R2 Failover Clusters. The only switch that is available for a Windows Server 2008 Failover Cluster is the FORCEQUORUM (or FQ for short) switch. Its behavior differs from the FORCEQUORUM (or FQ) switch that was used previously in Windows Server 2003.

So for our example, let’s say we have a 2-node Failover Cluster that is set for Node and Disk Majority. That means that we have a total of three votes. To achieve “quorum”, the cluster needs a majority of votes (two) to fully bring all resources online and make them available to users.

In Windows 2008 Failover Cluster, when you tell the Cluster Service to start, it just immediately starts. The next thing it does is send out notifications to all the nodes that it wants to join a Cluster. It is also going to calculate the number of votes needed to achieve “quorum”. As long as there is another node running or it can bring the Witness Disk online, it will join and merrily go on its way. If there is not another node up and it cannot bring the Witness Disk online, the Cluster Service will start; however, it will be in a “joining” type mode. This means it will be sitting idle waiting for another node to join and achieve “quorum”. If this is the case, you would see something like this:

image

As discussed, we need at least 2 votes to achieve “quorum”. We currently have one node up, so we have one vote. The other node is down and the Witness Disk is unavailable, which would account for the other two votes. But you can see that the Cluster Service itself is started. The reason it stays started is that it is sitting there listening for another node to join and give it a majority. Once one does, the Cluster resources will be made available for everyone to use. If you were to run the command to get the state of the nodes, you would see this:

image
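
As a side note, if you want to check node state yourself, it can be listed with either the ‘cluster node’ command or, on Windows Server 2008 R2, the Get-ClusterNode cmdlet:

     c:\>cluster node

     C:\PS>Get-ClusterNode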

This is where the FORCEQUORUM switch comes into play. When you use this switch, it will force the Cluster to become available even though there is no “quorum”. There are multiple ways of forcing the Cluster Service to start. However, please keep in mind that there are some implications when running this way. The implications are explained in this article.

     1.  Go into Service Control Manager and start the Cluster Service with /FORCEQUORUM (or /FQ)

     2.  Go to an Administrative Command Prompt and use:
          a.  net start clussvc /forcequorum
          b.  net start clussvc /fq

     3.  In Failover Cluster Management, highlight the name of the Cluster in the left pane, and on the far right pane in the Actions column, there is a FORCE CLUSTER START option that you can select, shown below.

image

This switch differs from Windows 2003. When you use it on Windows 2003 Server Clusters, you must also specify all other nodes that will be joining while in this state. If I were to just use the commands above and not specify the additional nodes, the other nodes would not be allowed to join the Cluster. I would need to basically fix the problem of the other nodes not being up, then stop the Cluster Service and start it again without the switch. This causes downtime, and no one wants that. In a Windows 2008 Failover Cluster, the switch will remain in effect until “quorum” is achieved. All you would need to do is start the Cluster Service on the other node and it will join. Once “quorum” is achieved, the mode of the Cluster dynamically changes.

In Windows Server 2008 R2 Failover Cluster, there is the same FORCEQUORUM (or FQ) switch as well as a new switch.

This new switch is /IPS, or /IgnorePersistentState. This switch is a little different in what it does. It starts the Cluster Service and makes the resources available, but all groups and resources will be left in an offline state.

Under normal circumstances, when the Cluster Service starts, the default behavior is to bring all the resources online. What this switch does is ignore the current PersistentState value of the resources and leave everything offline. When you go into Failover Cluster Management and look at the groups, you will see all resources offline.

image

I do need to bring up a couple of important notes about this switch.

     1. The Cluster Group will still be brought online. This switch will only affect the Services
        and Applications groups that you have in the Cluster.

     2. You must still be able to achieve “quorum.” In the case of a Node and Disk Majority,
         the Witness Disk must still be able to come online.

This switch is not one that would be used that often, but when you need it, it is a blessing. Here are a couple of scenarios where the /IPS switch would come in handy.

SCENARIO 1

I have a Failover Cluster that holds the limit of 1000 Hyper-V Virtual Machines. If you are trying to troubleshoot an issue, you can use the switch and then manually bring only a couple of them online. Do whatever troubleshooting you need to accomplish without the stress that all these machines coming online would put on the node. Once your troubleshooting is complete, you can then start the other nodes, bring the other virtual machines online, go about your business, etc.

SCENARIO 2

I am the administrator of the Failover Cluster and get called because the Cluster node that holds the John’s Cluster Application resource is in a pseudo-hung state. Both Explorer and Failover Cluster Management hang while the rest of the machine is really slow. If I try to move this group over to another node, that node experiences the same problems and errors. So I reboot them, and when the Cluster Service starts, the machine goes into this pseudo-hung state again. Looking through the event logs, I see that the Cluster Service starts fine. But I do see that John’s Cluster Application is throwing errors in the event log, and those were the last things listed. I do some research on the errors and see that they are caused by a corrupt log file this application uses. All I have to do is delete this file and the application will dynamically recreate it, start fine, and no longer hang the machine. That seems simple enough. But wait, I do not have access to the Clustered Drive that this application is on, as Explorer hangs and I also cannot get to it from a command prompt.

In the days before Windows 2008 R2 Failover Cluster, I would have to:

  • Power off all other nodes.
  • Set the Cluster Service to MANUAL or DISABLED
  • Disable the Cluster Disk Driver
  • Reboot this machine
  • Delete the file
  • Re-enable the Cluster Disk Driver
  • Set the Cluster Service to AUTOMATIC and start it
  • Power up all other nodes

The above was the only way I was going to be able to get access to the drives. Something like this can be painful and time consuming. If the nodes take about 15 minutes to boot because of the devices and the memory, it just adds to the frustrations.

This is where the /IPS Switch comes in. Your steps would now be:

  • Stop the Cluster Service on all other nodes
  • Reboot this one node since it is hung
  • While that node is rebooting, on the other node, start the Cluster Service with the IPS Switch:

             Net start clussvc /ips

  • Go to the group that has the disk
  • Bring the disk online
  • Delete the file
  • Bring the rest of the group online

For those who like to see stuff on MSDN, you can get a little more information on the /IPS switch here.

So as a recap, these are the only switches available for Windows Server 2008 and 2008 R2 Failover Clusters.

image

The switches can make things easier and less frustrating, and cause less downtime. This keeps lost production and dollars to a minimum, and that makes everyone happy.

 

Author:

John Marlin
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support

 

 

Cluster Validation Error with Fiber Adapter


Windows Server 2008 and 2008 R2 have a validation process that needs to be run against all nodes to confirm that your hardware and settings are compatible with Failover Clustering. For the purpose of this blog, I am not going to go into all the tests and what they do. For more information on Cluster Validation and the tests it runs, you can go here.

We have started seeing an error with the Failover Clustering Validation Report in regards to the “List Fibre Channel Host Bus Adapters” test. The error we have seen will appear similar to this:

image

The test makes a WMI query to MSFC_FCAdapterHBAAttributes for the following information:

     Manufacturer
     HBAStatus
     VendorSpecificID
     NumberOfPorts
     SerialNumber
     Model
     ModelDescription
     HardwareVersion
     DriverVersion
     DriverName

This call is made to the fiber adapter driver on the systems. If the driver does not respond, or does not respond with the information requested, the test will fail with the above Validation error.

This does not mean there is a problem with the Cluster Validation process or your system. All it means is that the driver did not respond with the information we requested. The driver and adapter may very well work just fine with Failover Clustering. In almost all the cases where we have seen this error, the Storage Validation tests pass with no warnings or errors.

I have created a script that you can run from a command prompt to see exactly what is returned. Create a text file with the information below.

   Set oWbemServices = GetObject("winmgmts:{impersonationLevel=impersonate}!root/wmi")

   Set enumAdapter = oWbemServices.InstancesOf("MSFC_FCAdapterHBAAttributes")

   For Each eAdapter in enumAdapter
         Wscript.Echo "Manufacturer : " & eAdapter.Manufacturer
         Wscript.Echo "HBAStatus : " & eAdapter.HBAStatus
         Wscript.Echo "VendorSpecificID : " & eAdapter.VendorSpecificID
         Wscript.Echo "NumberOfPorts : " & eAdapter.NumberOfPorts
         Wscript.Echo "SerialNumber : " & eAdapter.SerialNumber
         Wscript.Echo "Model : " & eAdapter.Model
         Wscript.Echo "ModelDescription : " & eAdapter.ModelDescription
         Wscript.Echo "HardwareVersion : " & eAdapter.HardwareVersion
         Wscript.Echo "DriverVersion : " & eAdapter.DriverVersion
         Wscript.Echo "DriverName : " & eAdapter.DriverName     
    Next

    WScript.Echo "Done"

As an example, you can save the file as FCATTRIBUTES.VBS. You would then open an administrative command prompt and run the CSCRIPT.EXE command from the directory you created the file in.

     C:\TEST>cscript fcattributes.vbs

The output will be displayed on the screen. What is normally returned when there are no errors would be similar to the below.

     Manufacturer : John Marlin Company
     HBAStatus : 0
     VendorSpecificID : 583882643
     NumberOfPorts : 1
     SerialNumber : J15109
     Model : JRM6950
     ModelDescription : Marlin JRM6950 Fibre Channel Adapter
     HardwareVersion : 7750206A
     DriverVersion : 16.1.0.63
     DriverName : marlinhba.sys

If the data returned is only, as an example, the HBAStatus, then you will receive the Cluster Validation error. As mentioned, this is a WMI Query to the driver itself. If it is not returning the information, you should consult with the vendor of the adapter driver for assistance.
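
On Windows Server 2008 R2, you can also run an equivalent query directly from PowerShell instead of using the VBScript above; it queries the same class and will come back incomplete in the same way if the driver does not return the information:

     Get-WmiObject -Namespace root\wmi -Class MSFC_FCAdapterHBAAttributes |
         Format-List Manufacturer, HBAStatus, VendorSpecificID, NumberOfPorts, SerialNumber, Model, ModelDescription, HardwareVersion, DriverVersion, DriverName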

This error is not something that Microsoft can help with from a troubleshooting perspective. It is possible the driver may be an older version and simply needs an update. The vendor of the adapter should be able to assist. If it is an older driver, keep in mind that an update may also require a firmware update as well. So keep this in mind when speaking with them.

Author:

John Marlin
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support

 
