The Book of Xen

Chris Takemura - Luke S. Crawford

Part 19

# xm list
Name        ID   Mem  VCPUs  State    Time(s)
Domain-0     0  1024      8  r-----   76770.8
caliban     72   256      1  -b----    4768.3

Here we're going to demonstrate connectivity between the domain caliban (IP address 192.0.2.86) and the dom0 (at 192.0.2.67).

# arping 192.0.2.67
ARPING 192.0.2.67 from 192.168.42.86 eth0
Unicast reply from 192.0.2.67 [00:12:3F:AC:3D:BD]  0.752ms
Unicast reply from 192.0.2.67 [00:12:3F:AC:3D:BD]  0.671ms
Unicast reply from 192.0.2.67 [00:12:3F:AC:3D:BD]  2.561ms

Note that the dom0 replies with its MAC address when queried via ARP.

# tcpdump -i vif72.0
tcpdump: WARNING: vif72.0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vif1.0, link-type EN10MB (Ethernet), capture size 96 bytes
18:59:33.704649 arp who-has caliban (00:12:3f:ac:3d:bd (oui Unknown)) tell 192.168.42.86
18:59:33.707406 arp reply caliban is-at 00:12:3f:ac:3d:bd (oui Unknown)
18:59:34.714986 arp who-has caliban (00:12:3f:ac:3d:bd (oui Unknown)) tell 192.168.42.86

The ARP queries show up correctly in the dom0.

Now, most of the time, you will see appropriate output in tcpdump as shown. This tells you that Xen is moving packets from the domU to the dom0. Do you see a response to the ARP who-has? (It should be ARP is-at.) If not, it's possible your bridge in the dom0 isn't set up correctly. One easy way to check the bridge is to run brctl show:

# brctl show
bridge name   bridge id           STP enabled   interfaces
eth0          8000.00304867164c   no            caliban
                                                prospero
                                                ariel

Note: In Xen.org versions before Xen 3.2, the bridge name is, by default, xenbr0 for network-bridge. Xen 3.2 and later, however, named the bridge eth0 (0, in this case, is the number of the related network interface). RHEL/CentOS, by default, creates another bridge, virbr0, which is part of the libvirt stuff. In practical terms, it functions like network-nat, with a DHCP server handing out private addresses on the dom0.
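If you save a capture to a file, the who-has/is-at check described above can be done mechanically. The following is only a rough sketch: the function name arp_summary is ours, and it assumes the older tcpdump line format shown in the capture above (newer tcpdump releases word these lines differently).

```python
def arp_summary(capture_text):
    """Count ARP requests and replies in tcpdump text output.

    Assumes lines shaped like the capture shown in this section:
    '... arp who-has X tell Y' and '... arp reply X is-at MAC'.
    """
    requests = 0
    replies = 0
    for line in capture_text.splitlines():
        if " arp who-has " in line:
            requests += 1
        elif " arp reply " in line and " is-at " in line:
            replies += 1
    return requests, replies


if __name__ == "__main__":
    sample = (
        "18:59:33.704649 arp who-has caliban (00:12:3f:ac:3d:bd (oui Unknown)) tell 192.168.42.86\n"
        "18:59:33.707406 arp reply caliban is-at 00:12:3f:ac:3d:bd (oui Unknown)\n"
    )
    requests, replies = arp_summary(sample)
    if requests and not replies:
        print("who-has seen but no is-at: check the bridge with brctl show")
    else:
        print("requests=%d replies=%d" % (requests, replies))
```

Requests with no replies at all point back at the bridge setup; a mix of both suggests that Xen is moving packets as expected.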

Now, for troubleshooting purposes, a bridge is like a switch. Make sure the bridge (switch) your domU interface is connected to is also connected to an interface that touches the network you want the domU on, usually a pethX device. (As explained in Chapter 5, network-bridge renames ethX to pethX and creates a fake ethX device from vif0.x when it starts up.)

Check the easy stuff. Can anything else on the bridge see traffic from the outside world? Do tcpdump -n -i peth0. Are the packets flowing properly?

Check your routes. Don't forget higher-level stuff, like DNS servers.

The DomU Interface Number Increments with Every Reboot

When Xen creates a domain, it looks at the vif=[] statement. Each string within the [ ] characters (it's a Python array) is another network device. If I just say vif=['',''] it creates two network devices for me, with random MAC addresses. In the domU, they are (ideally) named eth0 and eth1. In the dom0, they are named vifX.0 and vifX.1, where X is the domain number.

Most modern Linux distros, by default, lock ethX to a particular MAC address on the first boot. In RHEL/CentOS, the setting is HWADDR= in /etc/sysconfig/network-scripts/ifcfg-ethX. Most other distros use udev to handle persistent MAC addresses, as described in Chapter 5. We circumvent the problem by specifying the MAC address on the vif= line in the xm config file:

vif=['mac=00:16:3E:AA:AA:AB','mac=00:16:3E:AA:AA:AC']

Here we're using the XenSource MAC prefix, 00:16:3E. If you start your MAC with that prefix, you know it won't conflict with any assigned hardware MAC addresses.

If you don't specify the MAC address, it'll be randomly generated every time the domU boots, which causes some inconvenience if your domU OS has locked down ethX to a particular MAC. For more on the possible effects and why it's a good idea to specify a MAC address, see Chapter 5.
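If you'd rather not invent addresses by hand, a few lines of Python can pick them once so that you can paste the result into the config file. This is a minimal sketch, not part of Xen's tools; the helper names random_xen_mac and vif_line are ours, and the only fact taken from the text is the 00:16:3E prefix.

```python
import random

# XenSource prefix named in the text above.
XEN_OUI = (0x00, 0x16, 0x3E)


def random_xen_mac(rng=random):
    """Return a random MAC address inside the 00:16:3E prefix."""
    tail = tuple(rng.randint(0x00, 0xFF) for _ in range(3))
    return ":".join("%02X" % octet for octet in XEN_OUI + tail)


def vif_line(macs):
    """Format a vif= line for an xm config file from a list of MACs."""
    entries = ", ".join("'mac=%s'" % mac for mac in macs)
    return "vif = [%s]" % entries


if __name__ == "__main__":
    # Generate once, then keep the line in the config so the MACs stay fixed.
    print(vif_line([random_xen_mac(), random_xen_mac()]))
```

The point is to run this once per domain and keep the output; regenerating the addresses on every boot would bring back the problem described above.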

iptables

The iptables rules can also be a source of trouble with Xen. As with any iptables setup, it's easy to mess up in subtle ways and break everything. The best way we've found to make sure that iptables rules are working is to send packets through and watch what happens to them. Run iptables -L -v to see counters for how many packets have hit each rule or have been affected by the chain policy.

Note: The interface counters for vifs that are examined from the dom0 end will be inverted; outgoing traffic will report as incoming, and vice versa. See Chapter 5 for more information about why that happens.

You may also have trouble getting antispoof to work. If you enable antispoof but find you can still spoof arbitrary IP addresses in the domU, add the following to your network startup:

echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables

This will cause packets sent through the bridges to traverse the FORWARD chain, where Xen puts the antispoof rules. We added the command to the end of /etc/xen/scripts/network-bridge.

Another problem can occur if you're using vifnames, as we suggest in Chapter 5. Make sure the names are short: eight characters or fewer. Longer names can get truncated, and different parts of the system truncate at different lengths (at least in CentOS 5.0). In our particular case, we saw problems where the actual vifnames were truncated at one length, and our firewall rules (for antispoof) were truncated at another length, blocking all packets from the domain in question. It is better to avoid the problem and keep the vifnames short.
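A quick check over the names you plan to use can catch this before it bites. The sketch below is ours rather than anything Xen ships; the function name long_vifnames is hypothetical, and the eight-character limit is simply the rule of thumb given above.

```python
def long_vifnames(names, limit=8):
    """Return (name, truncated_name) pairs for names longer than limit.

    The default limit of 8 is the rule of thumb from the text, not a
    value read from any particular tool.
    """
    problems = []
    for name in names:
        if len(name) > limit:
            problems.append((name, name[:limit]))
    return problems


if __name__ == "__main__":
    for name, cut in long_vifnames(["caliban", "prospero", "sebastian0"]):
        print("%s is too long; it could be cut to %s" % (name, cut))
```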

Memory Issues

Xen (or rather, the Linux driver domain) can act rather strangely when memory is running low. Because Xen and the dom0 require a certain amount of contiguous, unswappable memory, it's surprisingly easy (in our experience) to find the oom-killer snacking on processes like candy. This even happens when there's plenty of swap available.

The best solution we've found (and we freely admit that it's not perfect) is to give dom0 more memory. We also prefer to fix its memory allocation at something like 512MB so that it doesn't have to cope with Xen constantly adjusting its memory size.

The basic way of tuning dom0's memory allocation is by adjusting the dom0_mem kernel parameter, which sets an upper limit, and the dom0-min-mem parameter in /etc/xen/xend-config.sxp, which sets a lower limit. Again, we usually set both of these to the same value.

To set the maximum amount of memory available to the dom0, edit menu.lst and put the option after the kernel line, like this:

kernel /xen.gz dom0_mem=512M noreboot

In the absence of units, Xen will assume that the value is in KB.

Next, edit /etc/xen/xend-config.sxp and add a line that says:[85]

(dom0-min-mem 512)

We do this because we've seen the dom0 have problems with ballooning. Ballooning usually works, but, like taking backups from a nonquiescent filesystem, "usually works" is not good enough for something as important as the dom0.
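Because the two settings use different units, it's easy to get them out of step. The following sketch compares them; it is an illustration under stated assumptions, not a Xen utility. The function names are ours, the K/M/G suffix set is an assumption about what the hypervisor accepts, and we assume dom0-min-mem is given in megabytes, which matches the paired values used above.

```python
import re

# Multipliers to KiB; a bare number is treated as KB, as the text states.
UNIT_KIB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def dom0_mem_kib(kernel_line):
    """Extract dom0_mem from a menu.lst kernel line, in KiB (or None)."""
    match = re.search(r"\bdom0_mem=(\d+)([KkMmGg]?)\b", kernel_line)
    if match is None:
        return None
    number, unit = match.groups()
    return int(number) * UNIT_KIB[unit.upper() or "K"]


def dom0_min_mem_kib(sxp_text):
    """Extract (dom0-min-mem N) from xend-config.sxp text, in KiB (or None).

    Assumes N is in megabytes.
    """
    match = re.search(r"^\s*\(dom0-min-mem\s+(\d+)\)", sxp_text, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)) * 1024


if __name__ == "__main__":
    upper = dom0_mem_kib("kernel /xen.gz dom0_mem=512M noreboot")
    lower = dom0_min_mem_kib("(dom0-min-mem 512)\n")
    print("limits match" if upper == lower else "limits differ")
```

In practice you would feed it the real kernel line from menu.lst and the contents of /etc/xen/xend-config.sxp.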

[85] Recent versions of Xen also support the option (enable-dom0-ballooning no).

Other Messages

xenconsole: Could not read tty from store: No such file or directory

This message usually shows up in response to an attempt to connect to a domain's virtual console (especially when Xen's kernel doesn't match its userland; for example, if we've upgraded Xen's supporting tools without changing the hypervisor).

If this is a paravirtualized domain, first try killing and restarting the xenconsoled process. Make sure it dies. We have seen cases where xenconsoled hangs and must be killed with a -9.

# pkill xenconsoled && /usr/sbin/xenconsoled

Then reconnect with xm console.

If the problem persists, you're most likely trying to access a domain that doesn't have the necessary Xen frontend console device configured in. There are several possibilities: If this is a custom kernel, you may have simply forgotten to include it, for example. Check the configuration of the domain's kernel and the initrd for the xvc driver.

If you are accessing an HVM domain running a default (nonenlightened) kernel that doesn't include the console driver, try using the framebuffer or booting a different kernel. You might also be able to set serial=pty in the domain config file and set the domU OS to use com1 as the console. See Chapter 12 for details.

VmError: (22, 'Invalid argument')

This error can mean a number of things. Often the problem is a version mismatch between the tools and the running Xen hypervisor. Although the binaries installed in /usr/sbin may be correct, the underlying Python modules may be wrong. Check that they're correct using whatever evidence is available: dates, comments in the files themselves, output of xm info, and so on.

The error can also indicate a PAE mismatch. In this case xend-debug.log will give a succinct description of the problem:

# tail /var/log/xen/xend-debug.log
ERROR: Non PAE-kernel on PAE host.
ERROR: Error constructing guest OS

Incidentally, your dom0 (which is, after all, just a special Xen guest domain) can also suffer from this problem. If it happens, the hypervisor will report a PAE mismatch in a large boxed-off error message at boot time and immediately reboot.

"no version for struct_module found: kernel tainted"

We got this error while trying to install the binary Xen distribution on a Slackware machine. The binary distro comes with a very minimal kernel, so it needs an initrd with appropriate modules. For some reason, the default script loaded modules in the wrong order, causing some loads to fail with the preceding message.

We fixed the problem by changing the load order in the initrd; specific directions would depend on your distro.

A Constant Stream of 4GiB seg fixup Messages

Sometimes, on booting a newly installed i386 domain, you'll be greeted with screens full of messages like this:

4gb seg fixup, process init (pid 1), cs:ip 73:b7ec2fc5

These are related to the /lib/tls problem: Xen is complaining because it's having to emulate a 4GiB segment for the benefit of some process that's using negative offsets to access the stack. You may also see a giant message at boot, reminding you to address this issue.

To solve this problem, you want to use a glibc that does not do this. You can compile glibc with the -mno-tls-direct-seg-refs option or install the appropriate libc6-xen package for your distribution (both Red Hat-like and Debian-like distros have created packages to address this problem).

With Red Hat (and its derived distros), you can also run these commands:

# echo 'hwcap 0 nosegneg' > /etc/ld.so.conf.d/libc6-xen.conf
# ldconfig

This will instruct the dynamic loader to avoid that particular optimization.

For Debian-based distros (using the 2.6.18 kernel), you can simply run:

# apt-get install libc6-xen

If all else fails (or if you are just too lazy to find a version of gcc with no-tls-direct-seg-refs), you can do as the error message advises and move the TLS library out of the way:

# mv /lib/tls /lib/tls.disabled

In our experience, there isn't any problem with moving the library. Everything will continue to function as expected.

The Importance of Disk Drivers (initrd Problems)

Often when using a distro kernel, a Xen domU will boot but be unable to locate its root device. For example:

VFS: Cannot open root device "sda1" or unknown-block(0,0)
Please append a correct "root=" boot option
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

The underlying problem here, at least in this case, is that the domU kernel doesn't have the necessary drivers compiled in, and the ramdisk was not specified. A look at the boot output confirms this, with the messages:

XENBUS: Device with no driver: device/vbd/769
XENBUS: Device with no driver: device/vbd/770
XENBUS: Device with no driver: device/vif/0

Nearly all distro kernels come with a minimal kernel and require an initrd with the disk driver to finish booting. These messages may simply come from the kernel before the initrd has loaded, or they can indicate a serious problem if the initrd doesn't contain the necessary drivers.

If the kernel managed to load its initrd correctly and failed to switch to its real root, you'll find yourself stuck in the initrd with a very limited selection of files. In this case, make sure that your devices exist (/dev/sda1 in this example) and that you've got the Xen disk frontend kernel module.

We also commonly see this within PyGRUB domUs after a kernel upgrade (and new initrd) if the modules config (/etc/modules on Debian, /etc/modprobe.conf on Red Hat) didn't specify xenblk. For RHEL/CentOS domUs, you can solve this problem by running mkinitrd with the --preload xenblk switch.

If you use an external kernel and want to use a distro kernel, you must specify a ramdisk= line in the domain config file, and specify a ramdisk that includes the xenblk (and xennet, if you want network before boot) drivers.
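When a domU's boot log is long, the XENBUS complaints shown earlier are easy to miss. As a rough sketch, the following scans saved console output for them and suggests which frontend module to look for in the initrd. The function name and the vbd/vif mapping table are ours; the module names xenblk and xennet are the ones used in this section, and other kernels may name their frontend drivers differently.

```python
import re

# Device class reported by XENBUS -> frontend module named in this section.
DRIVER_FOR = {"vbd": "xenblk", "vif": "xennet"}

NO_DRIVER = re.compile(r"XENBUS: Device with no driver: device/(\w+)/\d+")


def missing_frontend_drivers(boot_log):
    """Return a sorted list of modules the boot log suggests are missing."""
    missing = set()
    for match in NO_DRIVER.finditer(boot_log):
        kind = match.group(1)
        # Fall back to the raw device class if we have no mapping for it.
        missing.add(DRIVER_FOR.get(kind, kind))
    return sorted(missing)


if __name__ == "__main__":
    log = (
        "XENBUS: Device with no driver: device/vbd/769\n"
        "XENBUS: Device with no driver: device/vif/0\n"
    )
    print("check the initrd for: " + ", ".join(missing_frontend_drivers(log)))
```

Remember that these messages can also appear harmlessly before the initrd loads, so treat the result as a hint rather than a diagnosis.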

Another solution to this problem would be to compile Xen from source and build a sufficiently generic domU kernel, with the xenblk and xennet drivers already compiled in. Even if you continue to boot the dom0 from the distro kernel (probably a good idea), this will sidestep the distro-specific issues found with both Red Hat and Debian kernels.

This may cause problems with some domU distros because the expected initrd won't be there. Sometimes it can be difficult to build an initrd against a kernel with disk drivers built in. However, the generic kernel will usually at least boot.

We often find it useful to keep these generic kernels as a secondary rescue boot option within the domU PyGRUB config because they work no matter how badly the initrd is messed up.

XenStore

Sometimes the XenStore gets corrupted, or xenstored dies, or for various other reasons the XenStore ceases to store and report information. For example, this may happen if the block device holding the XenStore database becomes full.

The most obvious symptom is that xm list will report domain names incorrectly, for example:

# xm list
Name        ID  Mem(MiB)  VCPUs  State    Time(s)
Domain-0     0      2554      2  r-----   16511.2
Domain-10   10       127      1  -b----    1671.5
Domain-11   11       255      1  -b----     442.0
Domain-14   14        63      1  -b----    1758.2
Domain-15   15        62      1  -b----    7507.7
Domain-16   16       127      1  -b----   11194.9
Domain-6     6        94      1  -b----    5454.2
Domain-7     7        62      1  -b----     270.8
Domain-9     9       127      1  -b----    1715.7

Obviously, this is problematic. For one thing, it means that all commands that can take a name or ID, such as xm console, will no longer recognize names.
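This symptom is regular enough to check for from a monitoring script. The sketch below is ours, not a Xen tool: it takes the text printed by xm list and reports domains other than Domain-0 whose name is just "Domain-" plus their ID. The function name is hypothetical, and it assumes the six-column layout shown above and names without spaces.

```python
def nameless_domains(xm_list_output):
    """Return IDs of domains whose name has degraded to 'Domain-<id>'.

    Domain-0 is skipped because that is its real name.
    """
    suspects = []
    for line in xm_list_output.splitlines():
        fields = line.split()
        # Skip the header, the prompt line, and anything oddly shaped.
        if len(fields) != 6 or not fields[1].isdigit():
            continue
        name, dom_id = fields[0], fields[1]
        if dom_id != "0" and name == "Domain-" + dom_id:
            suspects.append(int(dom_id))
    return suspects


if __name__ == "__main__":
    sample = (
        "Name        ID  Mem(MiB)  VCPUs  State    Time(s)\n"
        "Domain-0     0      2554      2  r-----   16511.2\n"
        "Domain-10   10       127      1  -b----    1671.5\n"
    )
    ids = nameless_domains(sample)
    if ids:
        print("XenStore may be unhealthy; unnamed domain IDs: %s" % ids)
```

A domain that someone really did name Domain-10 would produce a false alarm, so this is a heuristic only.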


Unfortunately, xenstored cannot be restarted, so you'll have to reboot. If you're running a version of Xen prior to 3.1 (including the RHEL 5.x version), you'll have to remove /var/lib/xenstored/tdb first, then reboot.

Armed with this information, you can do several things. To continue our earlier example, we'll open /usr/lib/python2.5/site-packages/xen/xend/XendAPI.py and add a line near the top of the file to import the debugger module, pdb.

import pdb

Having done that, you can set a breakpoint. Just add a line near line 672:

pdb.set_trace()

Then try rerunning the server (or redoing whatever other behavior you're concerned with) and note that xend starts the debugger when it hits your new breakpoint.

At this point you can do everything that you might expect in a debugger: change the values of variables, step through a function, step into subroutines, and so forth. In this case, we might backtrace, figure out why it's trying to call VM.get_auto_power_on, and maybe wrap it in an error-handling block.

Domain Stays in Blocked State

This heading is a bit of a misnomer. The reality is that the "blocked" state reported by tools like xm list simply means that the domain is idle. The true problem is that the domain seems unresponsive.

Usually we find that this problem is related to the console; for example:

# xm create -c sebastian.cfg
Using config file "/etc/xen/sebastian.cfg".
Going to boot Fedora Core (2.6.18-1.2798.fc6xen)
  kernel: /vmlinuz-2.6.18-1.2798.fc6xen
  initrd: /initrd-2.6.18-1.2798.fc6xen.img
Started domain sebastian
rtc: IRQ 8 is not free.
i8042.c: No controller found.

(and then an indefinite hang). Upon breaking out and looking at the output of xm list, we note that the domain stays in a blocked state and consumes very little CPU time.

# xm list
Name        ID  Mem(MiB)  VCPUs  State    Time(s)
Domain-0     0      3476      2  r-----     407.1
sebastian   13       499      1  -b----      19.9

A quick look at /var/log/xen/xend-debug.log suggested an answer:

10/09/2007 20:11:48 Autoprobing TCP port
10/09/2007 20:11:48 Autoprobing selected port 5900

Port 5900 is VNC. Aha! The problem was that Xen wasn't using the virtual console device that xm console connects to. In this case, we traced it to user error. We specified the framebuffer and forgot about it. The kernel, as instructed, used the framebuffer as console rather than the emulated serial console that we were expecting. When we started a VNC client and connected to port 5900, it gave us the expected graphical console.

Note: If we had put a getty on xvc0, even though we wouldn't have seen boot output, we'd at least get a login prompt when the machine booted.
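If you suspect a forgotten framebuffer, you can look for the same clue in the log mechanically. This is a small sketch under stated assumptions: the function names are ours, and the pattern matches only the "Autoprobing selected port" wording seen in the log excerpt above. The display number follows the usual VNC convention of port minus 5900.

```python
import re

VNC_BASE_PORT = 5900  # VNC display :0 listens here by convention


def vnc_ports(log_text):
    """Return the ports named in 'Autoprobing selected port N' log lines."""
    pattern = r"Autoprobing selected port (\d+)"
    return [int(m.group(1)) for m in re.finditer(pattern, log_text)]


def vnc_display(port):
    """Convert a VNC TCP port to its display number."""
    return port - VNC_BASE_PORT


if __name__ == "__main__":
    log = (
        "10/09/2007 20:11:48 Autoprobing TCP port\n"
        "10/09/2007 20:11:48 Autoprobing selected port 5900\n"
    )
    for port in vnc_ports(log):
        print("a VNC console is listening: try display :%d" % vnc_display(port))
```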

Debugging Hotplug

Xen makes extensive use of udev to create and destroy virtual devices, both in the dom0 and the domU. Most of its interaction with Linux's hotplug subsystem gets logged in /var/log/xen/xen-hotplug.log. (We're going to treat hotplug as synonymous with udev because we can't think of any system that still uses the pre-udev hotplug implementation.)

First, we examine the effects of the script. In this case, we use udevmonitor to see udev events. It should show an add event for each vif and vbd as well as an online event for the vif. These go through the rules in /etc/udev/rules.d/xen-backend.rules, which executes appropriate scripts in /etc/xen/scripts.

At this point you can add some extra logging. At the top of the script for the device you're interested in (e.g., blktap), put:

set -x
exec 2>>/var/log/xen-hotplug.log

This will cause the shell to expand the commands in the script and write them to xen-hotplug.log, enabling you (hopefully) to trace down the source of the problem and eliminate it.

Hotplug can also act as a bit of a catchall for any virtual device problem. Some hotplug-related errors take the form of the dreaded Hotplug scripts not working message, like the following:

Error: Device 0 (vkbd) could not be connected. Hotplug scripts not working.

This seems to be associated with messages like the following:

DEBUG (DevController:148) Waiting for devices irq.
DEBUG (DevController:148) Waiting for devices vkbd.
DEBUG (DevController:153) Waiting for 0.
DEBUG (DevController:539) hotplugStatusCallback /local/domain/0/backend/vkbd/4/0/hotplug-status

In this case, however, these messages turned out to be red herrings. The answer came out of xend-debug.log, which said:

/usr/lib/xen/bin/xen-vncfb: error while loading shared libraries: libvncserver.so.0: cannot open shared object file: No such file or directory

As it developed, libvncserver was installed in /usr/local, which the runtime linker had been ignoring. After adding /usr/local/lib to /etc/ld.so.conf, xen-vncfb started up happily.

strace

One important generic troubleshooting technique is to use strace to look at what the Xen control tools are really doing. For example, if Xen is failing to find an external binary (like xen-vncfb), strace can reveal that problem with a command like the following:

# strace -e trace=open -f xm create prospero 2>&1 | grep ENOENT | less

Unfortunately, it'll also give you a lot of other, entirely harmless output while Python proceeds to pull in the entirety of its runtime environment based on crude guesses about filenames.

Another example of strace's usefulness comes from when we were setting up PyGRUB:

# strace xm create -c prospero
(snipped)
mknod("/var/lib/xen/xenbl.4961", S_IFIFO|0600) = -1 ENOENT (No such file or directory)

As it turned out, we didn't have a directory required by PyGRUB's backend. Thus:

# mkdir -p /var/lib/xen/

and everything works fine.
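When grep and less aren't enough, a saved strace log can be reduced to just the system call and path for each ENOENT failure. The following is a minimal sketch; the function name is ours, and the regular expression assumes the usual strace line shape of call("path", ...) = -1 ENOENT, as in the mknod line above. The open() line in the demo is an invented example in the same shape.

```python
import re

# syscall("path", ...) = -1 ENOENT (...)
ENOENT_CALL = re.compile(r'(\w+)\("([^"]+)".*\)\s*=\s*-1 ENOENT')


def missing_paths(strace_output):
    """Return (syscall, path) pairs for calls that failed with ENOENT."""
    hits = []
    for line in strace_output.splitlines():
        match = ENOENT_CALL.search(line)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    log = (
        'mknod("/var/lib/xen/xenbl.4961", S_IFIFO|0600) = -1 ENOENT (No such file or directory)\n'
        # Invented line, for illustration only:
        'open("/etc/ld.so.cache", O_RDONLY) = 3\n'
    )
    for call, path in missing_paths(log):
        print("%s could not find %s" % (call, path))
```

Most of what it reports will still be Python's harmless filename guessing, but sorting or counting the paths makes the unusual one easier to spot.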
