Friday, February 11, 2011

VMware ESXi back to life

So after my computer (which I use as a VMware ESXi virtualization host) came back from the dead and started booting up, I got the following messages:

Failed to load tpm
Failed to load lvmdriver

TPM apparently stands for Trusted Platform Module, and since my new motherboard coincidentally had a header for a TPM module, for a moment I considered buying one. It turned out it wasn't necessary, because the TPM is not important for ESXi.

But the lvmdriver message is important: it means that ESXi could not find a compatible network card, and this stops the boot process completely. ESX and ESXi are targeted at high-end servers, so VMware has not added support for desktop NICs.

Solutions
There are two solutions to this problem. The easiest one, and probably the best one, is to get a compatible Intel card. Apparently Intel makes the best cards and most are fully compatible with ESXi. Mine is on the way.

The long way home
The other solution is to use a third-party driver for your card. I tried this solution first.
My new motherboard came with an onboard Realtek 8111C NIC. So now I had to find a custom driver for it and install it in my unbootable ESXi installation without destroying it in the process. Bear in mind that my Unix/Linux skills are not that great.

Google is your friend: with it you can find practically anything on the Internet, and if there is something you don't understand then Wikipedia is your other friend. With those friends I found this forum, http://www.vm-help.com/forum/, where they specialize in this sort of thing. It turns out that you can take a Linux driver compatible with your NIC, make some changes to adapt it to ESXi, compile it, copy some files here and there, and you're good to go. Easy.

Since I cannot do that, I used some precompiled and prepackaged solutions posted in the same forum. It basically involved booting up the computer from other media (Puppy Linux from a pen drive in my case) and replacing a file called oem.tgz in one of the ESXi partitions.
The first time I tried it I got a Pink Screen of Death, which was something new to me (unlike a BSOD, in this one you can still interact with a debugger).
So I just got a different oem.tgz (thank you geppi) and this time success!
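
The file swap itself can be sketched as a small shell function, run from the live Linux session. This is only a sketch under assumptions: the function name replace_oem, the mount point and the download path are hypothetical, and which partition holds oem.tgz depends on your installation, so check before copying. It keeps a backup of the existing file, which is handy when the first driver package gives you a pink screen.

```shell
#!/bin/sh
# Sketch: swap oem.tgz on an already-mounted ESXi partition, keeping a backup.
# replace_oem is a hypothetical helper name, not part of any ESXi tooling.

# replace_oem MOUNTPOINT NEW_TGZ
#   MOUNTPOINT: directory where the partition containing oem.tgz is mounted
#   NEW_TGZ:    path to the replacement oem.tgz downloaded from the forum
replace_oem() {
    target="$1/oem.tgz"
    # Keep the old file so a bad driver package can be rolled back
    if [ -f "$target" ]; then
        cp "$target" "$target.bak" || return 1
    fi
    cp "$2" "$target"
}

# Example usage (device name and paths are placeholders, adjust to your system):
#   mkdir -p /mnt/esxi
#   mount /dev/sdXN /mnt/esxi
#   replace_oem /mnt/esxi /root/oem.tgz
#   umount /mnt/esxi
```

To roll back, copy oem.tgz.bak over oem.tgz the same way.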

vSphere Client
At least partial success: ESXi booted up and worked OK, but since there was a change in hardware the VMs would not start right away. For each VM, ESXi needed me to answer a pending question: did you move or did you copy the VM? Until I answered that question the VMs would not start.

It had been a while since I had used the vSphere client and by then it was broken. Apparently an update in .NET made the previous version inoperative. I was getting this error message:

Error parsing the server "IP" "clients.xml" file

The solution was to download version 4.0U1 of the vSphere client. For some reason getting a direct link to this new version is not possible. The way the VMware website is constructed makes the link very difficult to find: just when you seem to be getting closer, it eludes you again.

Finally I realized you have to log in with your user account and get a license key to reach the download pages, and there you can find a link to the client. Why a much-needed update is hidden so deep in web bureaucracy is a mystery to me.

Changing the virtual NIC
Before starting the VMs, and following advice found in the vm-help forum, I replaced the virtual network card in each of the Windows VMs, changing its type from Flexible to VMXNET3. Without this change Remote Desktop Connection to the VM would be completely unusable. I was also experiencing great instability with NeoRouter (great tool, BTW). The vSphere client worked well, though.

Unfortunately, even after those changes, RDC still proved to be very unstable (much better than before but still not good, especially when browsing the web), so that's why I eventually decided to go for solution one (see above).

UPDATE: Afterwards I disabled the UDP & TCP Checksum Offload (IPv4) options in the Advanced properties of the vmxnet3 Ethernet Adapter in the Windows VMs. This seemed to fix all the stability issues.




Upgrading to ESXi 4.1
Since I was doing all this tinkering with ESXi I decided to also upgrade from 4.0 to 4.1.

- First I downloaded the upgrade-from-ESXi4.0-to-4.1.0-0.0.260247-release.zip package.
- Then I downloaded and installed the latest version of vSphere Host Update Utility for ESXi 4.0.
- I clicked on the button marked Upgrade Host and, boom! I got this fine message:

Failed to read the upgrade package metadata: Could not find file 'metadata.xml'

Back to Google: it turns out the Update Utility doesn't work for this upgrade.
I had to download the vSphere CLI, which is based on Perl. It opens a command prompt (in Windows).
According to the upgrade guide I had to issue the following commands:

vihostupdate --server <hostname or IP> -i -b "location of upgrade-from-ESXi4.0-to-4.1.0-0.0.260247-release.zip"

vihostupdate --server <hostname or IP> -i -b "location of upgrade-from-ESXi4.0-to-4.1.0-0.0.260247-release.zip" -B ESXi410-GA-esxupdate

Except that doesn't work either, for two reasons. First, the command prompt does not open in the correct directory, so we need to go there:
cd bin
Second, in Windows the commands need to have .pl appended (and here I also passed the username and password):

vihostupdate.pl --server <hostname or IP> --username <username> --password <password> -i -b "location of upgrade-from-ESXi4.0-to-4.1.0-0.0.260247-release.zip"

vihostupdate.pl --server <hostname or IP> --username <username> --password <password> -i -b "location of upgrade-from-ESXi4.0-to-4.1.0-0.0.260247-release.zip" -B ESXi410-GA-esxupdate

After that, reboot the ESXi host and install the new vSphere Client.
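
Since the two calls differ only in the trailing -B argument, they can be generated from a handful of variables. Here is a minimal dry-run sketch in POSIX shell that mirrors the two commands quoted above and only prints them. The host address, user name and helper name build_cmd are placeholders of mine, and on a non-Windows vSphere CLI install the command name would presumably be plain vihostupdate.

```shell
#!/bin/sh
# Dry-run sketch: prints the two vihostupdate calls instead of running them.
# Every value below is a placeholder (assumption), not a real setup.
VIHOSTUPDATE="vihostupdate.pl"   # the .pl suffix is the Windows form
ESX_HOST="192.0.2.10"            # documentation-range address, replace it
ESX_USER="root"
ESX_PASS="<password>"            # left as a literal placeholder on purpose
BUNDLE="upgrade-from-ESXi4.0-to-4.1.0-0.0.260247-release.zip"

# build_cmd [BULLETIN]: print one invocation; -B is added only if BULLETIN is given
build_cmd() {
    cmd="$VIHOSTUPDATE --server $ESX_HOST --username $ESX_USER --password $ESX_PASS -i -b \"$BUNDLE\""
    if [ -n "$1" ]; then
        cmd="$cmd -B $1"
    fi
    printf '%s\n' "$cmd"
}

build_cmd                        # first call, no bulletin argument
build_cmd ESXi410-GA-esxupdate   # second call, with the bulletin from the post
```

Printing first makes it easy to check the quoting of the bundle path before pasting the lines into the vSphere CLI prompt.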

Console
If you had the console activated in 4.0, upgrading to 4.1 will deactivate it. In 4.0 it used to be a hidden feature; in 4.1 it's called Tech Support Mode and you can activate it for local access and for remote connection through ssh. Fortunately it's very easy to turn it on.
Using the vSphere Client go to: Configuration > Security Profile > Properties,
click on Local Tech Support and/or Remote Tech Support,
click on Options, click on Start,
select Start and Stop with Host (Start Automatically probably works too).

Passthrough
One piece of good news with the new motherboard is that it supports VT-d, which the older motherboard (or its BIOS) didn't. So I decided to test it. Sure enough, in Configuration > Advanced Settings, where before I would get a:

 Warning: Host does not support passthrough configuration 

now it offered to configure some devices for VMDirectPath passthrough. I was this close to selecting every USB device for passthrough and clicking OK, but this message gave me pause:

Warning: configuring host hardware without special virtualization features for virtual machine passthrough will make it unavailable for use except dedicating it to a single virtual machine. In particular, configuring a device needed for normal host boot or operation can make normal host boot impossible and may require significant effort to undo. See the online help for more information.

Long message, isn't it? Even after reading it I was still tempted to ignore it, but considering for a moment how difficult normal things are with VMware I decided to google it, and sure enough there were horror stories of people who tried just that and ended up having to reinstall everything. I just couldn't afford that; maybe some other day.


Thinning disks

I wanted to convert some thick drives to thin drives. For that I followed the steps in Kent's blog. This is the key step for me:

vmkfstools -i <source file> -d thin <target file>

It clones a virtual disk, writing the copy in the specified disk format (thin in this case). The original disk is left untouched.
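
As a concrete illustration, here is a small sketch that fills in the placeholders and prints the resulting command so it can be reviewed before running it in Tech Support Mode. The datastore and VM names are hypothetical, as is the helper name thin_clone_cmd. The source file should be the disk's descriptor .vmdk rather than the -flat.vmdk data file, and I would only do this with the VM powered off.

```shell
#!/bin/sh
# Sketch only: these paths are hypothetical, adjust them to your datastore.
SRC="/vmfs/volumes/datastore1/myvm/myvm.vmdk"        # existing thick disk (descriptor file)
DST="/vmfs/volumes/datastore1/myvm/myvm-thin.vmdk"   # new thin copy

# thin_clone_cmd SRC DST: print the vmkfstools clone command for review
thin_clone_cmd() {
    printf 'vmkfstools -i "%s" -d thin "%s"\n' "$1" "$2"
}

thin_clone_cmd "$SRC" "$DST"

# Once the printed line looks right, run it on the host, for example:
#   eval "$(thin_clone_cmd "$SRC" "$DST")"
```

After the clone finishes, the VM still points at the old disk, so it has to be switched to the new file before the thick original can be deleted.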

Moving things (disks) around

Something else I wanted to do was to make things more efficient in terms of speed and space, but mostly speed. I have a modest installation with 3 Windows VMs constantly on (though not necessarily constantly in use) and some other Windows and Linux (Ubuntu and Puppy Linux) VMs, mostly for experimental use. The fact is that whenever two of the VMs are doing some work at the same time, performance suffers terribly. So I decided to move the disks around to test better configurations. I'll update as I get results.



In conclusion, things are never easy with VMware, but after much trial and error and a lot of googling you eventually get there.
