The Book of Xen

Chris Takemura - Luke S. Crawford


No matter what, though, all storage backends look the same from within the Xen virtual domain. The hypervisor exports a Xen VBD (virtual block device) to the domU, which in turn presents the device to the guest OS with an administrator-defined mapping to traditional Unix device nodes. Usually this will be a device of the form hdx or sdx, although many distros now use xvdx for xen virtual disk. (The hd and sd devices generally work, as well.) We recommend blktap (a specialized form of file backend) and LVM for storage backends. These both work, offer good manageability, can be resized and moved freely, and support some mechanism for the sort of things we expect of filesystems now that we Live In The Future. blktap is easy to set up and good for testing, while LVM is scalable and good for production.

None of this is particularly Xen-specific. LVM is actually used (outside of Xen) by default for the root device on many distros, notably Red Hat, because of the management advantages that come with an abstracted storage layer. blktap is simply a Xen-specific mechanism for using a file as a block device, just like the traditional block loop driver. It's superior to the loop mechanism because it allows for vastly improved performance and more versatile filesystem formats, such as QCOW, but it's not fundamentally different from the administrator's perspective.

Let's get to it.

Basic Setup: Files

For people who don't want the hassle and overhead of LVM, Xen supports fast and efficient file-backed block devices using the blktap driver and library.

blktap (blk being the worn-down stub of "block" after being typed hundreds of times) includes a kernel driver and a userspace daemon. The kernel driver directly maps the blocks contained by the backing file, avoiding much of the indirection involved in mounting a file via loopback. It works with many file formats used for virtual block devices, including the basic "raw" image format obtainable by dd-ing a block device.

You can create a file using the dd command:

# dd if=/dev/zero of=/opt/xen/anthony.img bs=1M count=1024

Note: Your version of dd might require slightly different syntax; for example, it might require you to specify the block size in bytes.

Now dd will chug away for a bit, copying zeroes to a file. Eventually it'll finish:

1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 15.1442 seconds, 70.9 MB/s

Thus armed with a filesystem image, you can attach it using the tap driver, make a filesystem on it, and mount it as usual with the mount command.
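As an aside, if your filesystem supports sparse files, you can create the image near-instantly by seeking past the desired size instead of writing a gigabyte of zeroes. This variant is our own addition (the /tmp path is purely illustrative); the resulting file reports the full size but consumes almost no real disk space until blocks are actually written:

```shell
# Create a 1GB sparse image: seek 1,024 one-megabyte blocks in, write nothing.
dd if=/dev/zero of=/tmp/anthony-sparse.img bs=1M seek=1024 count=0
```

The trade-off is that a sparse image can run the underlying filesystem out of space later, at an inconvenient moment, as the domU fills in its blocks.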

# xm block-attach 0 tap:aio:/opt/xen/anthony.img /dev/xvda1 w 0
# mkfs /dev/xvda1
# mount /dev/xvda1 /mnt/

First, we use the xm(8) command to attach the block device to domain 0. In this case the xm command is followed by the block-attach subcommand, with the arguments <domain id> <backend device> <frontend device> <mode> and optionally [backend domain id]. To decompose our example, we are attaching anthony.img read/write using the tap:aio driver to /dev/xvda1 in domain 0, using domain 0 to mediate access (because we tend to avoid using non-dom0 driver domains). When the file is attached as /dev/xvda1, we can create a filesystem on it and mount it as with any block device.

Now that it's mounted, you can put something in it. (See Chapter 3 for details.) In this case, we'll just copy over a filesystem tree that we happen to have lying around:

# cp -a /opt/xen/images/centos-4.4/* /mnt/

Add a disk= line to the domU config (in our example, /etc/xen/anthony) to reflect the filesystem:

disk = ['tap:aio:/opt/xen/anthony.img,sda1,w']

Now you should be able to start the domain with its new root device:

# xm create -c anthony

Watch the console and bask in its soothing glow.

MOUNTING PARTITIONS WITHIN A FILE-BACKED VBD

There's nothing that keeps you from partitioning a virtual block device as if it were a hard drive. However, if something goes wrong and you need to mount the subpartitions from within dom0, it can be harder to recover. The standard mount -o loop filename /mnt won't work, and neither will something like mount /dev/xvda1 /mnt (even if the device is attached as /dev/xvda, Xen will not automatically scan for a partition table and create appropriate devices).

kpartx will solve this problem. It reads the partition table of a block device and adds mappings for the device mapper, which then provides device file-style interfaces to the partitions. After that, you can mount them as usual.

Let's say you've got an image with a partition table that describes two partitions:

# xm block-attach 0 tap:aio:/path/to/anthony.img /dev/xvda w 0
# kpartx -av /dev/xvda

kpartx will then find the two partitions and create /dev/mapper/xvda1 and /dev/mapper/xvda2. Now you should be able to mount and use the newly created device nodes as usual.
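When you're done, tear things down in reverse order. This sequence is our own addition, not from the text, and the device names are assumptions; it's written as a dry run that only prints each command, so you can check the names against your system before swapping echo for real execution:

```shell
# Tear down a kpartx-mapped, tap-attached image, in reverse order.
DEV=/dev/xvda
DOMID=0

run() { echo "$@"; }     # dry run; replace 'echo' with execution when ready

run umount /dev/mapper/xvda1          # unmount any mounted subpartitions
run kpartx -d "$DEV"                  # remove the device mapper mappings
run xm block-detach "$DOMID" "$DEV"   # detach the VBD from dom0
```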

LVM: Device-Independent Physical Devices

Flat files are well and good, but they're not as robust as simply providing each domain with its own physical volume (or volumes). The best way to use Xen's physical device support is, in our opinion, LVM.

LVM, short for logical volume management, is Linux's answer to VxFS's storage pools or Windows Dynamic Disks. It is what the marketing people call enterprise grade. In keeping with the software mantra that "all problems can be solved by adding another layer of abstraction," LVM aims to abstract away the idea of "disks" to improve manageability.

Instead, LVM (as one might guess from the name) operates on logical volumes. This higher-level view allows the administrator much more flexibility: storage can be moved around and reallocated with near impunity. Even better, from Xen's perspective, there's no difference between an LVM logical volume and a traditional partition.

Sure, setting up LVM is a bit more work up front, but it'll save you some headaches down the road when you have eight domUs on that box and you are trying to erase the partition for the third one. Using LVM and naming the logical volume to correspond to the domU name makes it quite a bit harder to embarrass yourself by erasing the wrong partition.[27]

QCOW

Up to this point, we've talked exclusively about the "raw" file format, but it's not the only option. One possible replacement is the QCOW format used by the QEMU project. It's got a lot to recommend it: a fast, robust format that supports sparse allocation, encryption, compression, and copy-on-write. We like it, but support isn't quite mature yet, so we're not recommending it as your primary storage option.

Nonetheless, it might be fun to try. To start working with QCOW, it'll be convenient to have QEMU. (While Xen includes some of the QEMU tools, the full package includes more functionality.) Download it from http://www.nongnu.org/qemu/download.html. As usual, we recommend the source install, especially because the QEMU folks eschew standard package management for their binary distribution.

Install QEMU via the standard process:

# tar zxvf <qemu-source>.tar.gz
# cd <qemu-source>
# ./configure
# make
# su
# make install

QEMU includes the qemu-img utility, which is used to create and manipulate the various sorts of image files that QEMU supports, including QCOW, vmdk, raw, and others.

# qemu-img create -f qcow enobarbus.qcow 1024M

This command creates an image in QCOW format (-f qcow) with a size of 1,024MB. Of course, you'll want to replace the filename and size with appropriate values for your application.

You can also convert a raw image to a QCOW image with the img2qcow utility, which is included as part of the Xen distribution:

# img2qcow enobarbus.qcow enobarbus.img

You can use the QCOW image directly as a domain's root disk with the tap driver. Configure the guest domain to use the QCOW image as its root filesystem. In the domain's config file under /etc/xen, add a disk= line similar to:

disk = ['tap:qcow:/opt/xen/enobarbus/enobarbus.qcow,sda1,w']

You can extend this line with another disk, thus:

disk = ['tap:qcow:/opt/xen/enobarbus/enobarbus.qcow,sda1,w',
        'tap:qcow:/opt/xen/enobarbus/enobarbus_disk2.qcow,sdb1,w']

Basic Setup: LVM

The high-level unit that LVM operates on is the volume group, or VG. Each group maps physical extents (disk regions of configurable size) to logical extents. The physical extents are hosted on what LVM refers to as physical volumes, or PVs. Each VG can contain one or more of these, and the PVs themselves can be any sort of block device supported by the kernel. The logical extents, reasonably enough, are on logical volumes, abbreviated LVs. These are the devices that LVM actually presents to the system as usable block devices.

As we're fond of saying, there really is no substitute for experience. Here's a five-minute illustrated tutorial in setting up logical volumes (see Figure 4-1).

Figure 4-1: This diagram shows a single VG with two PVs. From this VG, we've carved out three logical volumes, lv1, lv2, and lv3. lv1 and lv3 are being used by domUs, one of which treats the entire volume as a single partition and one of which breaks the LV into subpartitions for / and /var.

Begin with some hard drives. In this example, we'll use two SATA disks.

Note: Given that Xen is basically a server technology, it would probably be most sensible to use RAID-backed redundant storage, rather than actual hard drives. They could also be partitions on drives, network block devices, UFS-formatted optical media ... whatever sort of block device you care to mention. We're going to give instructions using a partition on two hard drives, however. These instructions will also hold if you're just using one drive.

Warning: Note that we are going to repartition and format these drives, which will destroy all data on them.

First, we partition the drives and set the type to Linux LVM. Although this isn't strictly necessary (you can use the entire drive as a PV, if desired), it's generally considered good Unix hygiene. Besides, you'll need to partition if you want to use only a portion of the disk for LVM, which is a fairly common scenario. (For example, if you want to boot from one of the physical disks that you're using with LVM, you will need a separate /boot partition.)

So, in this example, we have two disks, sda and sdb. We want the first 4GB of each drive to be used as LVM physical volumes, so we'll partition them with fdisk and set the type to 8e (Linux LVM).

If any partitions on the disk are in use, you will need to reboot to get the kernel to reread the partition table. (We think this is ridiculous, by the way. Isn't this supposed to be the future?) Next, make sure that you've got LVM and that it's LVM2, because LVM1 is deprecated.[28]

# vgscan --version
LVM version:     2.02.23 (2007-03-08)
Library version: 1.02.18 (2007-02-13)
Driver version:  4.5.0

You might need to load the driver. If vgscan complains that the driver is missing, run:

# modprobe dm_mod

In this case, dm stands for device mapper, which is a low-level volume manager that functions as the backend for LVM.

Having established that all three of these components are working, create physical volumes as illustrated in Figure 4-2.

# pvcreate /dev/sda1
# pvcreate /dev/sdb1

Figure 4-2: This diagram shows a single block device after pvcreate has been run on it. It's mostly empty, except for a small identifier on the front.

Bring these components together into a volume group by running vgcreate. Here we'll create a volume group named cleopatra on the devices sda1 and sdb1:

# vgcreate cleopatra /dev/sda1 /dev/sdb1

Finally, make volumes from the volume group using lvcreate, as shown in Figure 4-3. Think of it as a more powerful and versatile form of partitioning.

# lvcreate -L <size> -m1 --corelog -n menas cleopatra

Here we've created a mirrored logical volume that keeps its logs in core (rather than on a separate physical device). Note that this step takes a group name rather than a device node. Also, the mirror is purely for illustrative purposes; it's not required if you're using some sort of redundant device, such as hardware RAID or MD. Finally, it's an administrative convenience to give LVs human-readable names using the -n option. It's not required but quite recommended.

Figure 4-3: lvcreate creates a logical volume, /dev/vg/lvol, by chopping some space out of the VG, which is transparently mapped to possibly discontinuous physical extents on PVs.

Create a filesystem using your favorite filesystem-creation tool:

# mkfs /dev/cleopatra/menas

At this point, the LV is ready to mount and access, just as if it were a normal disk.

# mount /dev/cleopatra/menas /mnt/hd

To make the new device a suitable root for a Xen domain, copy a filesystem into it. We used one from http://stacklet.com/ -- we just mounted their root filesystem and copied it over to our new volume.

# mount -o loop gentoo.img /mnt/tmp/
# cp -a /mnt/tmp/* /mnt/hd

Finally, to use it with Xen, we can specify the logical volume to the guest domain just as we would any physical device. (Note that here we're back to the same example we started the chapter with.)

disk = ['phy:/dev/cleopatra/menas,sda1,w']
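For context, a minimal complete config file for this domain (say, /etc/xen/menas) might look like the following sketch. Only the disk line comes from the text; the kernel path, memory size, and vif line are our assumptions, so adjust them to your installation:

```python
# /etc/xen/menas -- minimal PV domU config; values other than 'disk'
# are hypothetical examples, not from the text.
kernel = "/boot/vmlinuz-2.6-xen"   # assumption: path to your domU kernel
memory = 256                        # assumption: RAM in MB
name = "menas"
vif = ['']                          # one network interface with defaults
disk = ['phy:/dev/cleopatra/menas,sda1,w']
root = "/dev/sda1 ro"
```

Xen domain config files are plain Python assignments, which is why the disk line uses list syntax.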

At this point, start the machine. Cross your fingers, wave a dead chicken, perform the accustomed ritual. In this case our deity is propitiated by an xm create. Standards have come down in the past few millennia.

# xm create menas

[27] This example is not purely academic.

[28] This is unlikely to be a problem unless you are using Slackware.

Enlarge Your Disk

Both file-backed images and LVM disks can be expanded transparently from the dom0. We're going to assume that disk space is so plentiful that you will never need to shrink an image.

Be sure to stop the domain before attempting to resize its underlying filesystem. For one thing, all of the userspace resize tools that we know of won't attempt to resize a mounted filesystem. For another, the Xen hypervisor won't pass along changes to the underlying block device's size without restarting the domain. Most important, even if you were able to resize the backing store with the domain running, data corruption would almost certainly result.

File-Backed Images

The principle behind augmenting file-backed images is simple: We append more bits to the file, then expand the filesystem.

First, make sure that nothing is using the file. Stop any domUs that have it mounted. Detach it from the dom0. Failure to do this will likely result in filesystem corruption.

Next, use dd to add some bits to the end. In this case we're directing 1GB from our /dev/zero bit hose to anthony.img. (Note that not specifying an output file causes dd to write to stdout.)

# dd if=/dev/zero bs=1M count=1024 >> /opt/xen/anthony.img

Use resize2fs to extend the filesystem (or the equivalent tool for your choice of filesystem).

# e2fsck -f /opt/xen/anthony.img
# resize2fs /opt/xen/anthony.img

resize2fs will default to making the filesystem the size of the underlying device if there's no partition table.

If the image contains partitions, you'll need to rearrange those before resizing the filesystem. Use fdisk to delete the partition that you wish to resize and recreate it, making sure that the starting cylinder remains the same.


LVM

It's just as easy, or perhaps even easier, to use LVM to expand storage. LVM was designed from the beginning to increase the flexibility of storage devices, so it includes an easy mechanism to extend a volume (as well as shrink and move).
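As a sketch of the usual extension sequence, using the volume names from earlier in the chapter (the +1G growth amount is our own arbitrary example, and the commands are shown as a dry run that merely prints them; swap echo for execution once you've checked the names):

```shell
# Dry-run sketch: grow an LV and the ext2/3 filesystem on it by 1GB.
LV=/dev/cleopatra/menas

run() { echo "$@"; }    # dry run; replace 'echo' with execution when ready

run lvextend -L +1G "$LV"   # grow the logical volume
run e2fsck -f "$LV"         # check the filesystem first
run resize2fs "$LV"         # then expand it to fill the enlarged LV
```

As with file-backed images, the domain should be stopped and the filesystem unmounted before you resize.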

[32] More properly, a device mapper snapshot, which LVM snapshots are based on. LVM snapshots are device mapper snapshots, but device mapper snapshots can be based on any pair of block devices, LVM or not. The LVM tools provide a convenient frontend to the arcane commands used by dmsetup.

Storage and Migration

These two storage techniques, flat files and LVM, lend themselves well to easy and automated cold migration, in which the administrator halts the domain, copies the domain's config file and backing storage to another physical machine, and restarts the domain.

Copying over a file-based backend is as simple as copying any file over the network. Just drop it onto the new box in its corresponding place in the filesystem, and start the machine.

Copying an LVM is a bit more involved, but it is still straightforward: Make the target device, mount it, and move the files in whatever fashion you care to.
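An alternative to the file-level copy just described is to stream the block device itself over ssh. This is our own sketch, not the book's procedure; the hostname, VG name, and size are all assumptions, and it's written as a dry run that prints the commands rather than executing them:

```shell
# Dry-run sketch of LVM cold migration; all names here are assumptions.
DOMU=menas
VG=cleopatra
SIZE=4G                       # must be at least the source LV's size
TARGET=newbox.example.com

run() { echo "$@"; }          # dry run; replace 'echo' with execution

run ssh "$TARGET" "lvcreate -L $SIZE -n $DOMU $VG"
run "dd if=/dev/$VG/$DOMU bs=1M | ssh $TARGET dd of=/dev/$VG/$DOMU bs=1M"
run scp "/etc/xen/$DOMU" "$TARGET:/etc/xen/$DOMU"
```

The block-level copy preserves the filesystem exactly, at the cost of copying free space too; the file-level copy the text describes is usually faster for mostly empty volumes.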

Check Chapter 9 for more details on this sort of migration.

Network Storage

These two storage methods only apply to locally accessible storage. Live migration, in which a domain is moved from one machine to another without being halted, requires one other piece of this puzzle: The filesystem must be accessible over the network to multiple machines. This is an area of active development, with several competing solutions. Here we'll discuss NFS-based storage. We will address other solutions, including ATA over Ethernet and iSCSI, in Chapter 9.

NFS

NFS is older than we are, and it is used by organizations of all sizes. It's easy to set up and relatively easy to administer. Most operating systems can interact with it. For these reasons, it's probably the easiest, cheapest, and fastest way to set up a live migration-capable Xen domain.

The idea is to marshal Xen's networking metaphor: The domains are connected (in the default setup) to a virtual network switch. Because the dom0 is also attached to this switch, it can act as an NFS server for the domUs.

In this case we're exporting a directory tree, neither a physical device nor a file. NFS server setup is quite simple, and it's cross-platform, so you can use any NFS device you like. (We prefer FreeBSD-based NFS servers, but NetApp and several other companies produce fine NFS appliances. As we might have mentioned, we've had poor luck using Linux as an NFS server.) Simply export your OS image. In our example, on the FreeBSD NFS server at 192.0.2.7, we have a full Slackware image at /usr/xen/images/slack. Our /etc/exports looks a bit like this:

/usr/xen/images/slack -maproot=0 192.0.2.222

We leave further server-side setup to your doubtless extensive experience with NFS. One easy refinement would be to make / read-only and shared, then export read-write VM-specific /var and /home partitions -- but in the simplest case, just export a full image.
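If you do serve from Linux despite our grumbling, the equivalent /etc/exports entry would look something like the sketch below; Linux's no_root_squash option plays the role of FreeBSD's -maproot=0, and the client address matches the example above (the rw and sync options are our additions, reflecting common practice for a writable root export):

```
/usr/xen/images/slack 192.0.2.222(rw,no_root_squash,sync)
```

Note that Linux export options attach to the client specification in parentheses, with no spaces, rather than standing as separate flags.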

Note: Although NFS does imply a performance hit, it's important to recall that Xen's network buffers and disk buffers are provided by the same paravirtualized device infrastructure, and so the actual network hardware is not involved. There is increased overhead in traversing the networking stack, but performance is usually better than gigabit Ethernet, so it is not as bad as you might think.

Now configure the client. First, you'll need to make some changes to the domU's kernel to enable root on NFS, starting with kernel-level IP autoconfiguration (CONFIG_IP_PNP=y):

Networking -> Networking options -> IP: kernel level autoconfiguration

If you want to do everything via DHCP (although you should probably still specify a MAC address in your domain config file), add DHCP support under that tree: CONFIG_IP_PNP_DHCP, or CONFIG_IP_PNP_BOOTP if you're old school. If you are okay specifying the IP in your domU config file, skip that step.

Now you need to enable support for root on NFS. Make sure NFS support is Y and not M; that is, CONFIG_NFS_FS=Y. Next, enable root over NFS: CONFIG_ROOT_NFS=Y. In menuconfig, you can find that option under:

Filesystems -> Network File Systems -> NFS filesystem support -> Root over NFS

Note that menuconfig won't give you the option of selecting root over NFS until you select kernel-level IP autoconfiguration.
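Taken together, the .config lines from the last two steps look like this (include the DHCP line only if you chose autoconfiguration via DHCP rather than a static IP in the domain config):

```
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_NFS_FS=y
CONFIG_ROOT_NFS=y
```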

Build the kernel as normal and install it somewhere where Xen can load it. Most likely this isn't what you want for a dom0 kernel, so make sure to avoid overwriting the boot kernel.

Now configure the domain that you're going to boot over NFS. Edit the domain's config file:

# Root device for NFS.
root = "/dev/nfs"

# The NFS server.
nfs_server = '192.0.2.7'

# Root directory on the NFS server.
nfs_root = '/usr/xen/images/slack'

netmask = "255.255.255.0"
