What is this?

This is basically where I write down stuff that I work with at my job as a GIS Technical Analyst (previously system administrator). I do it because it's practical for documentation purposes (although I remove anything that might be a security risk), and I hope it can be of use to someone out there. I frequently search the net for help myself, and this is my way of contributing.

Wednesday, January 19, 2011

VMware ESXi 4.1 NICs stop responding when copying a database in SQL Server 2008 R2

Had a strange issue today on some VMware ESXi 4.1 (build 260247) hosts running in an HP c7000 blade enclosure.

Blades:
3 BL460c G1 (Broadcom NC373i and NC382m NICs)
3 BL460c G6 (with Broadcom NetXtreme II 57711E/NC532i and NC326m NICs)
Running in EVC mode: "Intel Xeon 45nm Core2"

Interconnect bays:
2 GbE2c
2 GbE2c layer2/layer3

The problem happened on a clean install of Windows Server 2008 R2 with SQL Server 2008 R2. When I tried to copy a database in SQL Server Management Studio using the wizard, I got to the part where I choose "destination" and selected the network tab. This is when it's supposed to start broadcasting on the network to find other SQL Servers. In my case I just got thrown out of RDP and the VMware remote console. The issue could be reproduced every time.

At first I thought the VM had crashed, but it was still running (latest VMware Tools, just for the record). What actually happened was that the interfaces on the ESXi host connected to the two GbE2c interconnect bay switches (not the GbE2c layer2/layer3, which is a newer model) died. I could not ping the ESXi host, and I could not rescue the VMs since I was not able to connect to the management network interface. The only way to get back in touch with the ESXi host was to cold boot the blade (from HP Onboard Administrator).

This only happened if the VM was running on an ESXi host on a BL460c G6 blade; it did NOT happen on the BL460c G1 blades. I therefore updated the firmware on the BL460c G6 blade and the GbE2c switches, but the problem persisted.

/var/log/messages showed a lot of Broadcom-related kernel panic messages, so I looked around for new drivers for the NetXtreme II NICs, which I found here:

http://downloads.vmware.com/d/details/esx41_broadcom_netextremeii_dt/ZHcqYnRlaHRiZCVodw

Installing the 1.60.50 drivers on my BL460c G6 blades did the trick!
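
For anyone wanting to do the same log check, here is a rough Python sketch that picks the Broadcom-related lines out of a copy of /var/log/messages. It is not what I ran at the time, just an illustration. It assumes the relevant lines mention "broadcom" or a driver name starting with "bnx2" (bnx2 and bnx2x being the NetXtreme II drivers), and that you run it on a workstation against a copy of the log. The function names are made up for this example.

```python
import re
import sys

# Assumption: Broadcom NetXtreme II messages mention "broadcom" or a driver
# name starting with "bnx2" (e.g. bnx2, bnx2x). Adjust the pattern as needed.
BROADCOM_PATTERN = re.compile(r"bnx2|broadcom", re.IGNORECASE)


def broadcom_lines(lines):
    """Return the lines that mention a Broadcom NIC or driver."""
    return [line.rstrip("\n") for line in lines if BROADCOM_PATTERN.search(line)]


def scan_log(path):
    """Read a log file (hypothetical helper) and return its Broadcom-related lines."""
    with open(path, errors="replace") as handle:
        return broadcom_lines(handle)


if __name__ == "__main__":
    # Only scan when a log file path is given on the command line.
    if len(sys.argv) > 1:
        for hit in scan_log(sys.argv[1]):
            print(hit)
```

Run it with the path to the copied log file as the only argument. If the output is full of driver messages from around the time the NICs died, looking for a newer driver is probably worth a try.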

1 comment:

  1. Awesome. I experienced pretty much exactly the same scenario, but I was using a SUSE VM.
