
Understanding DiskRunChkdsk in Windows Server 2008

My name is Sean Dwyer and I am a Support Escalation Engineer with the Microsoft CORE team.

 

I’d like to share a quick tip for Windows Server 2008 Cluster admins.

 

There may come a time when, for whatever reason, a cluster-managed volume is flagged as dirty and you see an event indicating that CHKDSK needs to run against the volume.
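If you want to check the dirty bit yourself, `fsutil dirty query <volume>` reports it. The Python sketch below parses that output. It assumes the English-locale output looks like `Volume - D: is Dirty` or `Volume - D: is NOT Dirty`; the format may differ across Windows versions and locales, so treat the regex as an assumption to adjust. `parse_dirty_query` and `is_volume_dirty` are hypothetical helper names.

```python
import re
import subprocess
from typing import Dict

# Assumed output of "fsutil dirty query D:" (English locale), e.g.
#   Volume - D: is Dirty
#   Volume - D: is NOT Dirty
_DIRTY_RE = re.compile(
    r"Volume\s*-\s*(?P<volume>\S+)\s+is\s+(?P<negated>NOT\s+)?Dirty",
    re.IGNORECASE,
)


def parse_dirty_query(output: str) -> Dict[str, bool]:
    """Map each volume named in the fsutil output to True if its dirty bit is set."""
    return {
        m.group("volume"): m.group("negated") is None
        for m in _DIRTY_RE.finditer(output)
    }


def is_volume_dirty(volume: str) -> bool:
    """Run fsutil (needs an elevated prompt on Windows) and report the dirty bit."""
    result = subprocess.run(
        ["fsutil", "dirty", "query", volume],
        capture_output=True, text=True, check=True,
    )
    states = parse_dirty_query(result.stdout)
    if not states:
        raise RuntimeError(f"unrecognized fsutil output: {result.stdout!r}")
    return any(states.values())
```

For example, `parse_dirty_query("Volume - D: is NOT Dirty")` returns `{'D:': False}`.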

 

In a best-case scenario, you can take the volume out of production, run CHKDSK on it if needed (refer to: http://technet.microsoft.com/en-us/library/cc772587.aspx), and then put the volume back into production.

 

In most situations, though, the volume that needs attention is a heavily utilized production volume, and having it offline for any length of time will be extremely disruptive.

 

For example, a recent case I was involved with had a 14 TB* (see note 1 below) volume that was being flagged for CHKDSK about once a month. The volume held about 9 TB of data. Apart from the question of why the volume kept being flagged as corrupt, the time CHKDSK took to run on it was extremely painful for the customer’s business: the initial run took roughly 80 hours to complete.
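To put those numbers in perspective, here is the back-of-the-envelope arithmetic as a small Python sketch. It assumes decimal units (1 TB = 1000 GB) and that the full 9 TB was scanned during the 80-hour run. The extrapolation to a full 14 TB volume is deliberately naive: CHKDSK run time depends heavily on the number of files and directories, not just on bytes stored, so treat it as an order-of-magnitude figure only.

```python
def chkdsk_rate_gb_per_hour(data_tb: float, hours: float) -> float:
    """Average CHKDSK throughput in GB/h, assuming 1 TB = 1000 GB."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return data_tb * 1000 / hours


def naive_runtime_hours(data_tb: float, rate_gb_per_hour: float) -> float:
    """Linear extrapolation only; real CHKDSK time depends heavily on file count."""
    return data_tb * 1000 / rate_gb_per_hour


rate = chkdsk_rate_gb_per_hour(9, 80)
print(f"observed rate: {rate:.1f} GB/h")  # observed rate: 112.5 GB/h
print(f"naive estimate for 14 TB: {naive_runtime_hours(14, rate):.0f} h")  # naive estimate for 14 TB: 124 h
```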

 

It may be necessary to temporarily configure a problem volume to block CHKDSK from running against it while troubleshooting continues to determine why the volume is being flagged for CHKDSK to run.

 

I stress the word temporary here.

Turning off the health monitoring tool for the file system as a permanent solution will only lead to more downtime in the future, and you may end up on the phone with one of the File Systems experts on my team, such as Robert Mitchell.

 

Ok – so let’s talk specifics about temporarily blocking CHKDSK from doing work on a Cluster volume.

 

Say we’ve determined that we need to suspend CHKDSK from running on a problem volume. For you old-school Cluster admins, the first command that probably jumps to mind is SKIPCHKDSK=1.

 

This works just fine for 2003 Clusters, but will NOT work for 2008.

 

If SKIPCHKDSK is set on a 2008 volume, it is ignored when the disk is next brought online, and CHKDSK will run against the volume. With an 18 TB volume, that means the volume remains unavailable until CHKDSK finishes* (see note 2 below).

 

The correct way to configure a volume to block CHKDSK from running on it is to use the DiskRunChkdsk switch.

Keep in mind that both switches we’re discussing apply only to the Cluster environment.

If the machine itself is restarted, the OS will still prompt for CHKDSK to run on the affected volumes.

 

For information on how to configure the OS to ignore the dirty bit, refer to:

158675  How to Cancel CHKDSK After It Has Been Scheduled

http://support.microsoft.com/default.aspx?scid=kb;EN-US;158675
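On the OS side, one way to cancel a scheduled boot-time check is `chkntfs /x`, which excludes the listed volumes from the boot-time check; `chkntfs /d` restores the default behavior. Per `chkntfs /?`, exclusions are not accumulated between invocations, so every volume you want excluded must be named in a single call. The Python sketch below only builds the command; the helper names are hypothetical, and for simplicity it accepts only drive letters, although chkntfs also accepts mount points and volume names.

```python
import re
import subprocess
from typing import List

_DRIVE_RE = re.compile(r"^[A-Za-z]:$")


def chkntfs_exclude_cmd(drives: List[str]) -> List[str]:
    """Build 'chkntfs /x D: E: ...' to skip the boot-time check on those drives.

    chkntfs /x replaces any earlier exclusion list, so pass every drive at once.
    """
    if not drives:
        raise ValueError("at least one drive letter is required")
    for drive in drives:
        if not _DRIVE_RE.match(drive):
            raise ValueError(f"expected a drive letter like 'D:', got {drive!r}")
    return ["chkntfs", "/x", *[d.upper() for d in drives]]


def chkntfs_restore_default_cmd() -> List[str]:
    """Build 'chkntfs /d', which restores the default boot-time checking."""
    return ["chkntfs", "/d"]


print(subprocess.list2cmdline(chkntfs_exclude_cmd(["d:", "E:"])))  # chkntfs /x D: E:
```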

 

Let’s walk through an example of configuring this cluster-specific switch on a volume, so you’ll know how to do it should you ever need to.

 

Step 1: Determine which disk to work with


(I’ll pick Disk 8 for this example)

 

Step 2: Determine the resource name as seen by Cluster


 

Step 3: Open an Admin command prompt and run the command


Note: For the setting to take effect, the disk must be taken offline and then brought back online.
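Here is a hedged sketch of what the Step 3 command can look like with cluster.exe, which uses `/priv` to list or set a resource’s private properties. Two assumptions to verify before relying on it: the resource name `Cluster Disk 8` is hypothetical (use whatever name Step 2 shows you), and the value `4` is assumed to be the DiskRunChkdsk setting that skips CHKDSK when the disk comes online. The helpers only build the command strings; record the current value first so you can put it back once troubleshooting is done.

```python
import subprocess

# Assumption: 4 = do not run CHKDSK when the disk resource comes online.
# Check the Physical Disk private-property documentation for your build.
DISK_RUN_CHKDSK_SKIP = 4


def show_private_props_cmd(resource: str) -> str:
    """Build: cluster res "<resource>" /priv  (lists current private properties)."""
    return subprocess.list2cmdline(["cluster", "res", resource, "/priv"])


def set_disk_run_chkdsk_cmd(resource: str, value: int = DISK_RUN_CHKDSK_SKIP) -> str:
    """Build: cluster res "<resource>" /priv DiskRunChkdsk=<value>"""
    if not resource.strip():
        raise ValueError("resource name must not be empty")
    # list2cmdline applies Windows quoting, so names containing spaces get quoted.
    return subprocess.list2cmdline(
        ["cluster", "res", resource, "/priv", f"DiskRunChkdsk={value}"]
    )


print(show_private_props_cmd("Cluster Disk 8"))   # record the current value first
print(set_disk_run_chkdsk_cmd("Cluster Disk 8"))  # cluster res "Cluster Disk 8" /priv DiskRunChkdsk=4
```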

 

Step 4: Take the disk offline, then bring it online again.

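Step 4 can also be done from the same command prompt. A minimal sketch, assuming cluster.exe’s `/offline` and `/online` switches and the hypothetical resource name `Cluster Disk 8`; by default it only returns the commands (a dry run). Be aware that taking the disk offline may also take resources that depend on it offline, and those may need to be brought back online separately.

```python
import subprocess
from typing import List


def cycle_resource(resource: str, dry_run: bool = True) -> List[str]:
    """Take a cluster resource offline, then online, so new private properties apply."""
    commands = [
        ["cluster", "res", resource, "/offline"],
        ["cluster", "res", resource, "/online"],
    ]
    issued = []
    for cmd in commands:
        issued.append(subprocess.list2cmdline(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so a failed offline
            # step stops us before attempting the online step.
            subprocess.run(cmd, check=True)
    return issued


for line in cycle_resource("Cluster Disk 8"):
    print(line)
```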

 

 

Step 5: Verify the setting is applied

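To verify from the command line, `cluster res "<name>" /priv` lists the resource’s private properties, and you can confirm the DiskRunChkdsk value there. The sketch below pulls that value out of the listing. The sample text is an assumed, abbreviated layout for illustration, not captured output, so adjust the pattern if your output differs; `read_private_property` is a hypothetical helper.

```python
import re
from typing import Optional

# Illustrative only: an assumed layout for `cluster res "Cluster Disk 8" /priv`.
SAMPLE_OUTPUT = """\
Listing private properties for 'Cluster Disk 8':

T  Resource             Name                           Value
-- -------------------- ------------------------------ -----------------------
D  Cluster Disk 8       DiskRunChkdsk                  4 (0x4)
"""


def read_private_property(output: str, name: str = "DiskRunChkdsk") -> Optional[int]:
    """Return the first integer following the property name, or None if absent."""
    pattern = re.compile(rf"\b{re.escape(name)}\s+(\d+)")
    for line in output.splitlines():
        match = pattern.search(line)
        if match:
            return int(match.group(1))
    return None


value = read_private_property(SAMPLE_OUTPUT)
print(f"DiskRunChkdsk = {value}")  # DiskRunChkdsk = 4
```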

 

Step 6: Actively start troubleshooting what could cause the volume to end up flagged dirty and needing CHKDSK.

 

Footnotes:

Note 1: Running volumes this large is not recommended. In my experience, once they exceed 2 TB in size, they rapidly become an administrative liability, especially when CHKDSK has to run against them. We strongly suggest using mount points to carve larger volumes like this into more administratively friendly chunks. CHKDSK runs against mount points just fine, too.

Note 2: While it’s not suggested to interrupt CHKDSK while it’s running, an admin is not locked into having to let CHKDSK finish once it starts. The process can be terminated if absolutely required.

However, we cannot guarantee that the end result will be positive. If the process is interrupted during the “magic moment” when CHKDSK is making changes, the results may be worse than the initial reason for the volume being flagged as corrupt.
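Note 1 suggests carving large volumes into smaller mount-point volumes. As a rough planning aid, here is a Python sketch that splits a capacity into the fewest equal-sized volumes no larger than a chosen cap. The 2 TB default reflects the rule of thumb in Note 1, not a hard limit, and the helper name is hypothetical.

```python
import math
from typing import List


def plan_mount_point_volumes(total_tb: float, max_tb: float = 2.0) -> List[float]:
    """Split total_tb into the fewest equal-sized volumes of at most max_tb each."""
    if total_tb <= 0 or max_tb <= 0:
        raise ValueError("sizes must be positive")
    count = math.ceil(total_tb / max_tb)
    return [total_tb / count] * count


print(plan_mount_point_volumes(14))  # seven 2.0 TB volumes
print(plan_mount_point_volumes(9))   # five volumes of about 1.8 TB each
```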

 

Additional reading material related to the components and tools mentioned in this post:

300415  A Description of the Diskpart Command-Line Utility

http://support.microsoft.com/default.aspx?scid=kb;EN-US;300415

947021  How to configure volume mount points on a server cluster in Windows Server 2008

http://support.microsoft.com/default.aspx?scid=kb;EN-US;947021

2517696  The shared disk on Windows Server 2008 cluster fails to come online

http://support.microsoft.com/default.aspx?scid=kb;en-US;2517696

FSUTIL utility; marking a volume dirty for testing

http://technet.microsoft.com/en-us/library/bb490641.aspx

 

In summary: try to keep your production volumes’ size under control, be aware that command-line switches may not carry over between versions of a product, and continue being successful with Windows Server 2008!

 

I hope this post has been helpful!

 

Sean Dwyer

Support Escalation Engineer

Windows CORE Team
