ESX/ESXi 4.1 with Broadcom bnx2x experiences a purple diagnostic screen

October 13, 2010

During an implementation of ESXi 4.1 at a customer site, I experienced random crashes of ESXi during normal operations. VMware support has been investigating this issue and has found a problem in the Broadcom bnx2x inbox driver, version 1.54.1.v41.1-1vmw. This issue can lead to a complete crash of the ESX/ESXi 4.1 host.

VMware has released a workaround for this bug and is working on a full fix. More information on this problem can be found in KB1029368.


Common VMkernel errors

February 10, 2010

In many ESX/ESXi implementations I noticed some common warning messages in the VMkernel log. I wanted more insight into these warnings, so I contacted VMware support for an explanation. Here are my results:

WARNING: UserCartel: 1820: Fork not supported for multithreaded parents
This message is normally caused by a known bug in the QLogic drivers. VMware cannot give an exact cause because they are still waiting for QLogic to come back with an updated driver, but there is no cause for concern: no impact on the system has been demonstrated.

WARNING: UserLinux: 1717: UNIMPLEMENTED! write-back of mmap regions unsupported
This message is caused by a problem in the CIM agent in combination with the OEM providers and some IHV information that is reported whether or not the hardware exists on the host. The message can be ignored for the moment and will be fixed in a future patch.

WARNING: VFAT: 154: File_Ioctl
These messages are not a cause for concern. They are created by busybox. Despite the ‘warning’ tag they are purely informational and are generated as busybox interacts with parts of the filesystem. An upcoming patch will stop the messages from being logged.
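To get a feel for how often these three warnings show up on a host, a small script can count them in a copy of the VMkernel log. The following is a minimal sketch, not a VMware tool: the function name `summarize_warnings` and the short notes are my own, and the signatures deliberately leave out the numeric source line (for example 1820), on the assumption that it may differ between builds.

```python
from collections import Counter

# Message fragments taken from the three warnings discussed above,
# each mapped to a short note based on the explanation from VMware support.
KNOWN_WARNINGS = {
    "Fork not supported for multithreaded parents": "QLogic driver bug, no demonstrated impact",
    "write-back of mmap regions unsupported": "CIM agent / OEM provider issue, can be ignored",
    "File_Ioctl": "informational, generated by busybox",
}


def summarize_warnings(lines):
    """Count how often each known warning fragment appears in WARNING lines."""
    counts = Counter()
    for line in lines:
        if "WARNING:" not in line:
            continue
        for signature in KNOWN_WARNINGS:
            if signature in line:
                counts[signature] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "WARNING: UserCartel: 1820: Fork not supported for multithreaded parents",
        "WARNING: VFAT: 154: File_Ioctl",
        "WARNING: VFAT: 154: File_Ioctl",
    ]
    for signature, count in summarize_warnings(sample).items():
        print(f"{count:4d}  {signature}  ({KNOWN_WARNINGS[signature]})")
```

In practice you would pass the lines of a log file copied off the host instead of the hard-coded sample.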


ESXi CIM Agent bug

August 7, 2009

The CIM agent allows ESX to monitor the hardware status of the physical server and provide this hardware status information back to the administrator either through vCenter Hardware status or Health Status views.

Unfortunately there is a problem in the CIM agent in combination with the OEM providers and some IHV information that is reported whether the Hardware exists or not on the host. This problem can result in intermittent lockups of the ESX host.

The problem can be detected by the following warning messages in the vmkernel log of ESXi 3.5 Update 4 or ESXi 4 hosts:

StorelibManager::createDefaultSelfCheckSettings – failed to get TopLevelSystem
Jul 29 19:53:57 vmkernel: 0:06:03:34.837 cpu2:5535)WARNING: UserThread: 402: Peer table full for sfcbd
Jul 29 19:53:57 vmkernel: 0:06:03:34.837 cpu2:5535)WARNING: World: vm99478: 1111: init fn user failed with: Out of resources!
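Checking a log for these symptoms can be scripted. The sketch below is only an illustration: the helper name `find_cim_symptoms` is hypothetical, and the regular expression assumes the syslog-style layout of the lines quoted above (month, day, time, then `vmkernel:` and the `WARNING:` text).

```python
import re

# Substrings taken from the two WARNING lines quoted in this post.
CIM_SYMPTOMS = (
    "Peer table full for sfcbd",
    "init fn user failed with: Out of resources",
)

# Assumed layout: "<Mon> <day> <hh:mm:ss> vmkernel: <uptime> <cpu:world>)WARNING: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+[\d:]+)\s+vmkernel:.*?WARNING:\s*(.*)$")


def find_cim_symptoms(lines):
    """Return (timestamp, message) pairs for warnings that match a CIM symptom."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        timestamp, message = match.groups()
        if any(symptom in message for symptom in CIM_SYMPTOMS):
            hits.append((timestamp, message))
    return hits
```

A non-empty result suggests the host is affected and is worth comparing against the KB article.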

VMware recommends disabling the CIM agent until this problem is resolved. While the service is disabled or otherwise unavailable, no updated hardware status information is received.

More information on: http://kb.vmware.com/kb/1012575


VMKernel: nmp_DeviceRequestFastDeviceProbe

August 6, 2009

I am currently testing ESXi 4 by adding one ESXi 4 host to a customer's VMware production cluster. The ESXi 4 host seems to run fine, but I noticed the following kernel warnings in the system log:

Jul 18 17:00:27 vmkernel: 2:07:08:24.308 cpu7:40478)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x4100021b8480) to NMP device "naa.600508b4000554df00007000034a0000" failed on physical path "vmhba1:C0:T0:L11" H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

Jul 18 17:00:27 vmkernel: 2:07:08:24.308 cpu7:40478)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.600508b4000554df00007000034a0000" state in doubt; requested fast path state update…

Jul 18 17:00:27 vmkernel: 2:07:08:24.308 cpu7:40478)ScsiDeviceIO: 747: Command 0x2a to device "naa.600508b4000554df00007000034a0000" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

The log message above contains the following codes:

failed H:0x2 D:0x0 P:0x0

The interesting section here is the code starting with "H" (H stands for "Host status"). Host status 0x2 means "HOST BUSY".
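Pulling these codes out of a log line by hand gets tedious, so here is a minimal Python sketch that extracts them. The function names are my own, and the lookup table contains only the one host status value discussed in this post; D and P are treated as the device and plugin status, which this post does not go into. The pattern also accepts the multiplication sign that some web pages substitute for the letter x.

```python
import re

# Matches e.g. "H:0x2 D:0x0 P:0x0"; "×" is accepted in place of "x"
# because copied log lines sometimes contain it.
STATUS_RE = re.compile(
    r"H:0[x×]([0-9a-fA-F]+)\s+D:0[x×]([0-9a-fA-F]+)\s+P:0[x×]([0-9a-fA-F]+)"
)

# Only the value explained in this post; other codes are reported as unknown.
HOST_STATUS_NAMES = {0x2: "HOST BUSY"}


def parse_status(line):
    """Return the H/D/P codes of a log line as integers, or None if absent."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    host, device, plugin = (int(group, 16) for group in match.groups())
    return {"host": host, "device": device, "plugin": plugin}


def describe_host_status(code):
    """Translate a host status code into a name where this post provides one."""
    return HOST_STATUS_NAMES.get(code, f"unknown (0x{code:x})")


if __name__ == "__main__":
    status = parse_status("failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.")
    print(status, describe_host_status(status["host"]))
```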

VMware support gives the following explanation for this:


I checked with our bug database and as I had thought previously, H:0x2 D:0x0 P:0x0 translates to hba busy. The driver for whatever reason failed the i/o with a busy status. These can occur for any number of reasons. These failures are automatically retried by ESX.

Jul 18 17:00:27 vmkernel: 2:07:08:24.308 cpu7:40478)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.600508b4000554df00007000034a0000" state in doubt; requested fast path state update…

This messaging will initially indicate that a NMP command was not responsive on a device, thus the NMP plugin ‘doubted’ the state of the LUN, i.e. was it busy, was it on a path, was it responsive. This could be a driver problem or spurious logging. A bug for this message has been logged, and as yet is not an issue, unless followed by failing I/O or VM failures.


So it looks like a bug, but as yet is not an issue. Hope this gives some clarification!

