Penguin XEN: Bug Reports

Last update : Thursday the 8th of January 2009
maintained by: Guillaume Thouvenin

Table of Contents

We use the following format:
    BugNumber. mmddyyyy - [opened/closed] description

  1. 09212006 - [opened] negative mapcount
  2. 09212006 - [opened] HVM Bug
  3. 05292006 - [closed] Compile error with block device tap driver
  4. 03302006 - [closed] New 2.6.16-xen doesn't boot
  5. 02162006 - [closed] [x86_64] eth0 e1000_clean_tx_irq hang
  6. 01192006 - [closed] CPU fatal trap with sEDF
  7. 01182006 - [closed] Process doesn't inherit time left
  8. 01182006 - [closed] Network doesn't work in domU



09212006 - negative mapcount

Synopsis: When I created a new domain "debian_steinitz" and then destroyed it, the following messages appeared on the screen:

Message from syslogd@localhost at Thu Sep 21 09:30:42 2006 ...
localhost kernel: Eeek! page_mapcount(page) went negative! (-1)
Message from syslogd@localhost at Thu Sep 21 09:30:42 2006 ...
localhost kernel:   page->flags = 4
Message from syslogd@localhost at Thu Sep 21 09:30:42 2006 ...
localhost kernel:   page->count = 0
Message from syslogd@localhost at Thu Sep 21 09:30:42 2006 ...
localhost kernel:   page->mapping = 0000000000000000
Message from syslogd@localhost at Thu Sep 21 09:30:42 2006 ...
localhost kernel: invalid opcode: 0000 [1] SMP

Message from syslogd@localhost at Thu Sep 21 09:35:09 2006 ...
localhost kernel: Bad page state in process 'python'
Message from syslogd@localhost at Thu Sep 21 09:35:09 2006 ...
localhost kernel: page:ffff8800062cd168 flags:0x0000000000000004 mapping:0000000000000000 mapcount:-1 count:0
Message from syslogd@localhost at Thu Sep 21 09:35:09 2006 ...
localhost kernel: Trying to fix it up, but a reboot is needed
Message from syslogd@localhost at Thu Sep 21 09:35:09 2006 ...
localhost kernel: Backtrace:

Running 'dmesg' shows:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/rmap.c:560
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: tun af_packet bridge ipv6 serverworks generic ide_generic ext3 jbd loop psmouse genrtc tsdev mousedev joydev sd_mod ide_cd cdrom ide_disk ide_floppy usb_storage ide_core tg3 usbserial usbhid usbkbd ehci_hcd ohci_hcd thermal processor fan aic94xx sas_class scsi_mod unix
Pid: 3538, comm: qemu-dm Not tainted 2.6.16.13-xen-sas-testing #3
RIP: e030:[] {page_remove_rmap+124}
RSP: e02b:ffff88010d04fc48  EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff8800056c3a38 RCX: ffffffff80421d08
RDX: ffffffff80421d08 RSI: 0000000000000000 RDI: ffffffff80421d00
RBP: 00002b66be88d000 R08: ffffffff80421d08 R09: 0000000000000009
R10: 0000000100000000 R11: 0000000000000000 R12: ffff88010e154468
R13: ffff8800010072c0 R14: 00000000e02ef027 R15: ffff88010f38bbc0
FS:  00002ae0f8b794a0(0000) GS:ffffffff8055c000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process qemu-dm (pid: 3538, threadinfo ffff88010d04e000, task ffff88010ecea890)
Stack: ffff8800056c3a38 ffffffff802573a1 00002b66be88e000 ffff88010e154470
       ffff88010f38bc38 ffffffff00000000 ffff88010d04fd68 00002b66bea00000
       ffff88010eaeace8 00002b66bea00000
Call Trace: {zap_pte_range+584} {unmap_page_range+1106}
       {unmap_vmas+238} {exit_mmap+119}
       {mmput+37} {do_exit+509}
       {sys_exit_group+0} {get_signal_to_deliver+769}
       {sysret_signal+56} {do_signal+108}
       {pipe_read+26} {vfs_read+210}
       {ptregscall_common+61}

Code: 0f 0b 68 f5 35 3e 80 c2 30 02 5b 48 83 ce ff bf 20 00 00 00
RIP {page_remove_rmap+124} RSP 
 <1>Fixing recursive fault but reboot is needed!
device vif1.0 left promiscuous mode
xenbr0: port 4(vif1.0) entering disabled state
Bad page state in process 'python'
page:ffff8800056c3a38 flags:0x0000000000000004 mapping:0000000000000000 mapcount:-1 count:0
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: {bad_page+83} {prep_new_page+75}
       {buffered_rmqueue+416} {get_page_from_freelist+132}
       {__alloc_pages+81} {do_no_page+229}
       {__handle_mm_fault+905} {do_page_fault+563}
       {do_mmap_pgoff+1533} {__up_write+20}
       {error_exit+0}

The Xen log contains:

(XEN) (GUEST: 2) Unknown opcode at F000:09E0=0xF09E0
(XEN) (GUEST: 2) Halt called from %eip 0xD39D1

09212006 - HVM Bug

Synopsis: This bug occurred when we tried to install Debian sarge from an ISO image. The problem appears only when all 32 processors are used. When we use only one frame (with 16 processors), the guest boots.
Here are the configuration used and the output produced by the crash:

#  -*- mode: python; -*-

import os, re
arch = os.uname()[4]
if re.search('64', arch):
    arch_libdir = 'lib64'
else:
    arch_libdir = 'lib'

# Kernel image file.
kernel="/usr/lib64/xen/boot/hvmloader"

# The domain build function. HVM domain uses 'hvm'.
builder='hvm'

# Initial memory allocation (in megabytes) for the new domain.
memory=2048

name="debian_steinitz.hvm"
vcpus = 16
vif=[ 'type=ioemu, bridge=xenbr0' ]
disk = [ 'file:/home/guill/file_image/debian_steinitz.img,ioemu:hda,w' ]

on_poweroff = 'destroy'
on_reboot   = 'destroy'
on_crash    = 'destroy'

device_model = '/usr/' + arch_libdir + '/xen/bin/qemu-dm'
cdrom = '/home/guill/iso/debian-31r2-i386-netinst.iso'
boot='d'
vnc=0
vncviewer=0
#============================================================================
(XEN) __hvm_bug at vmx.c:2280
(XEN) ----[ Xen-3.0.2-3    Not tainted ]----
(XEN) CPU:    31
(XEN) RIP:    0008:[<00000000000096b2>]
(XEN) RFLAGS: 0000000000010006   CONTEXT: hvm
(XEN) rax: 0000000000052400   rbx: 0000000000001000   rcx: 00000000000041d6
(XEN) rdx: 000000000000ffe0   rsi: 000000000001b002   rdi: 000000000000b001
(XEN) rbp: 00000000000d0003   rsp: 0000000000007bd6   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 0000000000050033   cr3: 000000001677d000
(XEN) ds: 0010   es: 0010   fs: 0018   gs: 0018   ss: 0018   cs: 0008
(XEN) domain_crash_sync called from vmx.c:2280
(XEN) Domain 1 (vcpu#0) crashed on cpu#31:
(XEN) ----[ Xen-3.0.2-3    Not tainted ]----
(XEN) CPU:    31
(XEN) RIP:    0008:[<00000000000096b2>]
(XEN) RFLAGS: 0000000000010006   CONTEXT: hvm
(XEN) rax: 0000000000052400   rbx: 0000000000001000   rcx: 00000000000041d6
(XEN) rdx: 000000000000ffe0   rsi: 000000000001b002   rdi: 000000000000b001
(XEN) rbp: 00000000000d0003   rsp: 0000000000007bd6   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 0000000000050033   cr3: 000000001677d000
(XEN) ds: 0010   es: 0010   fs: 0018   gs: 0018   ss: 0018   cs: 0008

05292006 - Compile error with block device tap driver

Synopsis: Selecting the block tap driver in linux-2.6.16.13-xen (Xen ---> Block device tap driver) causes a compilation error. I'm using xen-3.0-testing.hg.

<...>
CC      drivers/xen/blktap/xenbus.o
drivers/xen/blktap/xenbus.c: In function `frontend_changed':
drivers/xen/blktap/xenbus.c:71: warning: passing arg 1 of `xenbus_exists' makes integer from pointer without a cast
drivers/xen/blktap/xenbus.c:71: error: too few arguments to function `xenbus_exists' 
drivers/xen/blktap/xenbus.c:72: warning: passing arg 1 of `xenbus_rm' makes integer from pointer without a cast
drivers/xen/blktap/xenbus.c:72: error: too few arguments to function `xenbus_rm' 
drivers/xen/blktap/xenbus.c:80: warning: passing arg 1 of `xenbus_gather' makes integer from pointer without a cast
drivers/xen/blktap/xenbus.c:96: warning: implicit declaration of function `xenbus_dev_ok' 
drivers/xen/blktap/xenbus.c:101: error: too few arguments to function `xenbus_transaction_end'
drivers/xen/blktap/xenbus.c: In function `blkback_probe':
drivers/xen/blktap/xenbus.c:154: warning: passing arg 1 of `xenbus_gather' makes integer from pointer without a cast
drivers/xen/blktap/xenbus.c:163: warning: passing arg 1 of `xenbus_exists' makes integer from pointer without a cast
drivers/xen/blktap/xenbus.c:163: error: too few arguments to function `xenbus_exists' 
make[3]: *** [drivers/xen/blktap/xenbus.o] Error 1
make[2]: *** [drivers/xen/blktap] Error 2
make[1]: *** [drivers/xen] Error 2
make: *** [drivers] Error 2

Solution: Pass the right number of arguments to the functions named in the errors (xenbus_exists, xenbus_rm and xenbus_transaction_end).

Public report: Sent to the Xen mailing list (http://lists.xensource.com/archives/html/xen-devel/2006-05/msg01378.html) and filed in Bugzilla as bug 660, which received the following answer:

http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=660

stefanb@us.ibm.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #1 from stefanb@us.ibm.com  2006-08-25 07:56 -------
This bug has been fixed a while ago. In the meantime the code has also been
rewritten.

03302006 - New 2.6.16-xen doesn't boot

Synopsis: When booting the dom0, the 2.6.16-xen kernel produces a kernel panic. Here is the message we obtained:

...
TCP reno registered
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
VFS: Cannot open root device "sda1" or unknown-block(0,0)
Please append a correct "root=" boot option
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
 (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
 (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
 (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
 (XEN) Domain 0 crashed: rebooting machine in 5 seconds.

Solution: With the 2.6.16-xen kernel, everything is compiled as a module. Thus, we need to create an initrd image in order to be able to manage disks and mount the filesystem. With 2.6.16-xen0, many things are built-in.

To boot the new kernel, add the ramdisk path to the dom0 and domU configuration files as follows:

$ cat /boot/grub/menu.lst
title Xen-3.0-unstable  / XenLinux 2.6 (X86_64)
root (hd0,0)
kernel /boot/xen-3.gz dom0_mem=1048576 com1=9600,8n1
module /boot/vmlinuz-2.6.16-xen root=/dev/sda1 console=ttyS0
module /boot/initrd.img-2.6.16-xen

$ cat /etc/xen/capablanca
kernel="/boot/vmlinuz-2.6.16-xen"
ramdisk = "/boot/initrd.img-2.6.16-xen"
memory=1024
name="capablanca"
disk=['phy:sdc1,sdc1','phy:sdc5,sdc5','phy:sdc6,sdc6']
vif=[' ']
root="/dev/sdc1 ro"
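
Because Xen domain configuration files are plain Python, the presence of the ramdisk setting can be checked before starting a domain. The following is only a sketch: the helper name missing_settings and the list of required keys are assumptions made for illustration, and the embedded text is an excerpt of the capablanca configuration above.

```python
# Excerpt of a domU configuration like the one shown above.
CONFIG_TEXT = '''
kernel="/boot/vmlinuz-2.6.16-xen"
ramdisk = "/boot/initrd.img-2.6.16-xen"
memory=1024
name="capablanca"
'''


def missing_settings(config_text, required=("kernel", "ramdisk")):
    """Return the required settings that are absent or empty.

    Hypothetical helper: the config is executed as Python, so it must
    only be used on configuration files you trust.
    """
    settings = {}
    exec(config_text, {}, settings)  # top-level assignments land in settings
    return [key for key in required if not settings.get(key)]


if __name__ == "__main__":
    print(missing_settings(CONFIG_TEXT))
```

An empty list means both kernel and ramdisk are set; a configuration written for the older 2.6.16-xen0 kernel would typically report ramdisk as missing.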

Public report: Sent to the Xen mailing list (issue report)

02162006 - [x86_64] eth0 e1000_clean_tx_irq hang

http://lists.xensource.com/archives/html/xen-devel/2006-02/msg00582.html

01192006 - CPU fatal trap with sEDF

Synopsis: The sEDF scheduler provides weighted CPU sharing. Changing the scheduler parameters of a domain can trigger a CPU fatal trap.

Test: Create a domU and change the scheduling policy as follows:
# xm sched-sedf 1 20000000 5000000 0 0 0
# xm sched-sedf 1 20000000 0 0 1 0
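
To make the two commands easier to read, the sketch below labels their numeric arguments. The field order (domain, period, slice, latency hint, extratime flag, weight) is an assumption based on the xm sched-sedf usage of Xen 3.0 and should be checked against the installed version; the names SedfArgs and parse_sched_sedf are hypothetical.

```python
from collections import namedtuple

# Assumed argument order for "xm sched-sedf" in Xen 3.0.
SedfArgs = namedtuple("SedfArgs", "domain period slice latency extratime weight")


def parse_sched_sedf(command):
    """Split an 'xm sched-sedf ...' command line into named integer fields."""
    words = command.split()
    if words[:2] != ["xm", "sched-sedf"] or len(words) != 8:
        raise ValueError("expected 'xm sched-sedf' followed by six numbers")
    return SedfArgs(*(int(word) for word in words[2:]))


if __name__ == "__main__":
    for line in ("xm sched-sedf 1 20000000 5000000 0 0 0",
                 "xm sched-sedf 1 20000000 0 0 1 0"):
        print(parse_sched_sedf(line))
```

Under that assumed order, the second command drops the slice to 0 and sets the extratime flag, which is the change after which the trap was observed.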

The problem has been seen on:
    Changeset 8269 (xen-3.0.0-testing)
    Changeset 8571 (xen-unstable)
    Changesets 8612 and 8627

Solution: The problem no longer occurs with changeset 9441 (xen-unstable).

Public report: Sent to the Xen mailing list (issue report)

01182006 - Process doesn't inherit time left

Synopsis: When a process schedules a SIGALRM with alarm() and then calls execvp(), the pending alarm is not inherited by the new program, so the signal is never delivered.

Test:

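#include <unistd.h>

int main(void)
{
        /* A recursive listing of / takes much longer than 2 seconds. */
        char *args[] = {"/bin/ls", "-lR", "/", (char *)NULL};

        alarm(2);                /* schedule SIGALRM in 2 seconds */
        execvp("/bin/ls", args); /* a pending alarm should survive exec */
        return 1;                /* reached only if execvp() fails */
}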

Normally the program should stop after 2 seconds but it doesn't.
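
The same check can be automated. The sketch below is not the original test: it assumes a POSIX system with Python and a sleep command on the PATH, uses a 1-second alarm so it finishes quickly, and the name alarm_survives_exec is hypothetical.

```python
import signal
import subprocess
import sys

# Child program: arm a 1-second alarm, then replace itself with "sleep 5".
CHILD = "import os, signal; signal.alarm(1); os.execvp('sleep', ['sleep', '5'])"


def alarm_survives_exec():
    """Return True if the exec'd program is killed by the inherited SIGALRM."""
    proc = subprocess.run([sys.executable, "-c", CHILD])
    # subprocess reports death by signal N as a return code of -N.
    return proc.returncode == -signal.SIGALRM


if __name__ == "__main__":
    print("alarm inherited across exec:", alarm_survives_exec())
```

On a system showing the bug, sleep would run for the full 5 seconds and exit normally, so the function would return False.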

Solution: I don't remember which version of Xen had the problem, but it has been fixed since changeset 9441 (xen-unstable).

Public report: Sent to the Xen mailing list (issue report)

01182006 - Network doesn't work in domU

Synopsis: We can boot domU but the network doesn't work. We have the following error:

modprobe: FATAL: Could not load /lib/modules/2.6.12-xenU/modules.dep: No such file or directory

Solution: We tested xen-3.0.0-testing and the network works there. Comparing changesets, the problem was therefore introduced between changeset 8242 (RELEASE-3.0.0) and changeset 8627.

We tested every changeset in that range and found that the problem appears with changeset 8330. The cause is that the nics option is now obsolete and the vif variable must be used to configure the domU. We therefore added the line vif=[''] to /etc/xen/domU.
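
A search like this can also be done by bisection, which needs far fewer builds than testing every changeset. This is a sketch rather than the procedure actually used: first_bad_changeset and the is_bad callback are hypothetical, and it assumes that changesets are numbered consecutively and that the behaviour changes exactly once in the range.

```python
def first_bad_changeset(good, bad, is_bad):
    """Return the first failing changeset in the range (good, bad].

    good   -- a changeset known to work (8242 in this report)
    bad    -- a changeset known to fail (8627 in this report)
    is_bad -- callback that builds and tests one changeset, True on failure
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid    # the failure was introduced at mid or earlier
        else:
            good = mid   # mid still works, so look later
    return bad


if __name__ == "__main__":
    # Stand-in for a real build-and-boot test: fails from 8330 onwards.
    print(first_bad_changeset(8242, 8627, lambda cs: cs >= 8330))
```

For the range above this takes at most 9 test builds instead of several hundred.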

Public report: No report

