Thursday, July 29, 2010

The Overheating EVA

I had an "interesting" experience recently: an EVA 4400 overheated due to environmental issues (fancy-talk for aircon failure).  The client phoned me, complaining that half of their Hyper-V VMs were not running.  Further investigation revealed that the CSVs were offline.  Hmmm, this was getting serious.  I logged into Command View and saw that most of my VDisks were faulted, due in no small part to the fact that all the drives in one of my shelves were faulted.

Event Logs

I had a look at the relevant EVA logs and discovered the following relevant entries:
  • Temperature within an HSV300 controller becoming too hot.
    View corrective actions.  Corrective action code: 2e
  • A drive enclosure temperature sensor out of range condition has been reported by one of the drive enclosure link modules.
  • A physical disk drive has disappeared.
    View corrective actions.  Corrective action code: 42
  • A Volume has transitioned to the MISSING state.
    View corrective actions.  Corrective action code: bf

What Happened

In retrospect it was a fairly simple sequence of events, as evidenced by the entries above.  The air conditioner failed, which caused the temperature within the HSV300 controller and the drive shelf to rise (the first two event log entries).  To prevent damage to themselves, the drives then switched themselves off, which prompted the log entry about the physical drive disappearing.

We then started seeing volumes transitioning to the missing state, i.e. our VDisks went missing.  Hardly surprising considering that the drives containing them switched themselves off.
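
To spot this pattern earlier next time, the sequence can be sketched as a small check over corrective action codes.  This is only an illustration: the three codes come straight from the log entries above, but how you get the codes out of Command View (the list passed in here) and the function names are my own assumptions.

```python
# Corrective action codes copied from the event log entries above.
# Treating them as ordered "stages" is an illustrative assumption,
# not an HP-defined classification.
CASCADE = [
    ("2e", "controller temperature too high"),
    ("42", "physical disk drive disappeared"),
    ("bf", "volume transitioned to MISSING"),
]


def cascade_stage(codes_seen):
    """Return how many stages of the cascade appear, in order, in codes_seen."""
    stage = 0
    for code in codes_seen:
        if stage < len(CASCADE) and code.lower() == CASCADE[stage][0]:
            stage += 1
    return stage


def describe(codes_seen):
    """Summarise how far the overheating cascade has progressed."""
    stage = cascade_stage(codes_seen)
    if stage == 0:
        return "no thermal cascade detected"
    return "cascade reached: " + CASCADE[stage - 1][1]
```

Calling describe() with the codes in the order they were logged tells you whether you are still at the "controller is hot" stage or already losing disks and volumes.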

Resolution

  1. Restored Air Conditioning (goes without saying I guess)
  2. Powered off the EVA and all attached disk shelves
  3. Powered on disk shelves and waited for the Numeric ID LEDs at the back to display the proper IDs.
  4. Powered up the Controller
  5. Lo and behold!  All the previously failed physical disks came on-line, meaning that my missing VDisks also made a most welcome return
  6. Unfortunately my Hyper-V Hosts still couldn't access the VDisks, so I had to unpresent and re-present them via Command View.  I assume the EVA assigned new WWNs to the LUNs.
  7. I re-scanned for storage from the Disk Management MMC on the Hyper-V Hosts
  8. Brought the Disks and CSVs online via cluster manager
  9. Started up the VMs
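
I did the re-scan in step 7 by hand from the Disk Management MMC.  If you have several hosts to get through, a scripted alternative might look like the sketch below, which uses diskpart's rescan command instead of the MMC.  The function names are made up for illustration, and it assumes you run it from an elevated prompt on each Hyper-V host.

```python
import os
import subprocess
import tempfile


def write_rescan_script(directory=None):
    """Write a one-line diskpart script that rescans for disks; return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt", dir=directory, text=True)
    with os.fdopen(fd, "w") as handle:
        handle.write("rescan\n")
    return path


def rescan_command(script_path):
    """Build the diskpart command line; /s runs the commands in a script file."""
    return ["diskpart", "/s", script_path]


def rescan_storage(run=False):
    """Return the command; only execute it when run=True (on the host itself)."""
    script_path = write_rescan_script()
    command = rescan_command(script_path)
    try:
        if run:
            subprocess.run(command, check=True)
    finally:
        # The temporary script is only needed while diskpart runs.
        os.remove(script_path)
    return command
```

Leaving run at its default lets you check the command line before anything touches the host; rescan_storage(run=True) performs the actual re-scan.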

Conclusion

This was quite a harrowing experience, obviously.  What struck me as ridiculous is that HP does not have *ANY* thermal shutdown logic / capabilities on the EVA controller itself.  It keeps on trucking till the drives themselves fail, causing a very ungraceful failure of the VDisks.  There is also no guarantee that your drives and VDisks will come back online.  In essence - if your EVA overheats there is a distinct possibility that you will lose your data.  Caveat Emptor...








Friday, July 2, 2010

Move WSUS server to a new server

Namibia is a third world....errr....developing country.  Apart from rampant (55%) unemployment, this also means that we are bandwidth starved.  Some businesses are running on capped accounts, so attempts need to be made to conserve bandwidth.  I've had to move WSUS servers to new hardware or VMs a couple of times, and needless to say it's a huge time-sink and waste of bandwidth to re-download all updates every time you move your WSUS server.  Thus I've come up with a way to move WSUS without downloading tons of patches - steps are outlined below for your enjoyment.

  1. Install WSUS on your new server, making sure to select the option to use the existing Windows Internal Database
  2. During the Choose Upstream Server step of the configuration wizard, be sure to select Synchronise from another Windows Server Update Services Server
  3. Ensure that the This is a replica of the upstream server check box is selected.  This ensures that existing approvals, settings, computers and groups are maintained
  4. Complete this Wizard
  5. Your new replica server will synchronise with your upstream server.  This is what we're talking about - no re-downloading many GBs worth of patches!  N.B.  Wait for this process to complete before carrying on with step 6.
  6. Now change your Update Source and Proxy Server settings to Synchronise from Microsoft Update
  7. Now for the magic bit.  Download and install the WSUS 3 API Samples and Tools on both your old and new WSUS servers
  8. Open up a CMD Prompt and navigate to "C:\Program Files\Update Services 3.0 API Samples and Tools\WsusMigrate\WsusMigrationExport" folder on your old WSUS server
  9. Run "wsusmigrationexport.exe WSUS_Settings.xml" to export the settings. This will back up your approvals and target groups to an XML file
  10. Transfer the WSUS_Settings.xml created above to your new WSUS server
  11. Again navigate to the "C:\Program Files\Update Services 3.0 API Samples and Tools\WsusMigrate\WsusMigrationImport" folder (on the new WSUS server). Run "wsusmigrationimport.exe WSUS_Settings.xml All None"
  12. Review and compare settings on your two WSUS servers, ensuring that they match each other
  13. Update the relevant GPOs to ensure clients are pointing to the new WSUS server

One last point: I've noticed that if you are not using GPOs to assign your computers to Computer Groups, all your clients will get stuck in the Unassigned Computers group.  You'll have to manually sort them into the appropriate groups again.  Bummer, but you really should be using Computer Groups....
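
If you do this move more than once, the export and import commands from steps 8 to 11 can be wrapped in a small helper.  The sketch below only assembles the command lines using the folder and arguments quoted in those steps; the function names and the execute switch are my own invention, and it assumes the tools are installed in the default location on each server.

```python
import subprocess
from pathlib import PureWindowsPath

# Install folder quoted in steps 8 and 11 (assumed to be the default location).
TOOLS_ROOT = PureWindowsPath(
    r"C:\Program Files\Update Services 3.0 API Samples and Tools\WsusMigrate"
)


def export_command(settings_file="WSUS_Settings.xml"):
    """Command for the OLD server: export approvals and target groups."""
    exe = TOOLS_ROOT / "WsusMigrationExport" / "wsusmigrationexport.exe"
    return [str(exe), settings_file]


def import_command(settings_file="WSUS_Settings.xml"):
    """Command for the NEW server, with the same arguments as step 11."""
    exe = TOOLS_ROOT / "WsusMigrationImport" / "wsusmigrationimport.exe"
    return [str(exe), settings_file, "All", "None"]


def run_tool(command, execute=False):
    """Return the command; run it from the tool's own folder if execute=True."""
    if execute:
        tool_folder = str(PureWindowsPath(command[0]).parent)
        subprocess.run(command, cwd=tool_folder, check=True)
    return command
```

Run run_tool(export_command(), execute=True) on the old server, copy WSUS_Settings.xml across, then run run_tool(import_command(), execute=True) on the new one.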