The night of 1000 jails

With FreeBSD 8.0 right around the corner, now is a good time to give it some more exposure. Just for kicks, I got the idea to stress the jails subsystem - the cheap (both in $$$ and in resource requirements) OS-level virtualization technology that has been present in FreeBSD for nearly 10 years now. Behold... the bootup of 1,000, count them - 1,000 virtual machines on a single host with 4 GB of RAM.


Introduction to jails

For newcomers more familiar with VMware and similar products, a FreeBSD jail is an operating-system-level partition of the userland. This means that the kernel, with all its functions and resources, is shared among the instantiated virtual machines, but the individual systems cannot directly influence one another's processes.

There are several consequences of this method. The first one is that, obviously, it is operating system-specific. The kernel is shared, so the individual "guest" virtual machines cannot choose to run some other kernel. A FreeBSD-specific edge case is that jails can run any userland the kernel supports; for example, an older FreeBSD userland (like FreeBSD 4, although the kernel underneath is still 8.x) or a Linux userland can be used.

The second major consequence is that all resources really are shared. Most importantly, the CPUs and memory are shared, which can be either a good thing or a bad thing depending on the usage scenario. Jails can be restricted to specific CPUs if needed, with the granularity of a logical CPU, but there are currently no such limits for memory (though some are in development). Disk caches are also shared among the guests, which can be exploited very nicely by using nullfs to mount the same directories into every jail (keeping only one physical copy of libraries and other binaries). The network stack is currently shared as well, though there is work under way to virtualize more of it. As a special feature that is ready right now, up to 16 separate forwarding information bases (FIBs) can exist.

I'd like to emphasize once again how cheap (with regard to resource consumption) this sort of virtualization is - in practice, it is not much different from starting the same set of basic processes N times, each of which behaves as it normally does.

The 1000 jails challenge

FreeBSD has jails integrated with the overall system and that extends to their startup and shutdown. By default, all common jail configuration can be done in exactly the same way as the rest of the system is configured - in /etc/rc.conf. Some additional files may be needed per-jail, like a jail-specific fstab.

To automate the creation of all that configuration I wrote a simple script, mkjails.py, which generates the configuration files for a set of 1000 identical light jails. Each individual jail will have these properties:

  • It will null-mount the relevant binary directories from the host (like /bin, /usr/bin, /lib, /usr/lib, etc.) but will have its own /etc, /var and /usr/local
  • It will have its own single IP address
  • It will start with a set of default FreeBSD processes like cron, syslog and sshd

Each jail will have its own configuration and, once created, is practically ready to be handed over to an independent administrator who will be root within the jail. This administrator will be able to do everything to the system except upgrade its kernel and base userland (e.g. the admin will be able to install apache and have complete control over it, but will not be able to upgrade /bin/ls).
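The null-mount layout described above can be sketched as a small per-jail fstab generator. This is not the original mkjails.py, only a minimal illustration: the function names are hypothetical, the directory list contains just the four directories named in the text (the real list was evidently longer), and mounting them read-only is an assumption.

```python
# Hypothetical sketch, not the original mkjails.py.
from pathlib import PurePosixPath

# Only the four directories named in the text; the real list was longer.
SHARED_DIRS = ["/bin", "/usr/bin", "/lib", "/usr/lib"]


def jail_name(index):
    """Return a zero-padded jail name such as 'j0001'."""
    return f"j{index:04d}"


def fstab_lines(index, jails_root="/jails", shared_dirs=SHARED_DIRS):
    """Build fstab entries that null-mount host directories into one jail.

    Each entry has the usual six fstab fields:
    device, mount point, fs type, options, dump, passno.
    The 'ro' option is an assumption, not taken from the post.
    """
    root = PurePosixPath(jails_root) / jail_name(index)
    lines = []
    for src in shared_dirs:
        target = root / src.lstrip("/")
        lines.append(f"{src}\t{target}\tnullfs\tro\t0\t0")
    return lines


if __name__ == "__main__":
    print("\n".join(fstab_lines(1)))
```

With one such entry per shared directory and per jail, the number of nullfs mounts on the host grows as (directories) x (jails), which is why it climbs into the thousands so quickly.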

A single section of rc.conf.jails (generated by mkjails.py for a single jail) looks something like this:

jail_j0001_rootdir='/jails/j0001'
jail_j0001_mount_enable='YES'
jail_j0001_fstab='/jails/fstab_j0001'
jail_j0001_hostname='j0001.cosmos'
jail_j0001_ip='10.0.1.3/16'
jail_j0001_interface='em0'
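A generator for stanzas like the one above might look roughly like the following. Again, this is a sketch and not the original mkjails.py: the function name and parameters are hypothetical, and the rule that jail N gets the address 10.0.1.2 + N is only an assumption chosen so that j0001 comes out as 10.0.1.3.

```python
# Hypothetical sketch, not the original mkjails.py.
import ipaddress


def jail_stanza(index, base_ip="10.0.1.2", prefix_len=16,
                interface="em0", domain="cosmos", jails_root="/jails"):
    """Return the rc.conf lines for one jail as a single string.

    Assumption: jail N is assigned base_ip + N.
    """
    name = f"j{index:04d}"
    ip = ipaddress.IPv4Address(base_ip) + index
    settings = {
        "rootdir": f"{jails_root}/{name}",
        "mount_enable": "YES",
        "fstab": f"{jails_root}/fstab_{name}",
        "hostname": f"{name}.{domain}",
        "ip": f"{ip}/{prefix_len}",
        "interface": interface,
    }
    return "\n".join(
        f"jail_{name}_{key}='{value}'" for key, value in settings.items()
    )


if __name__ == "__main__":
    # Print the stanzas for the first three jails.
    for i in range(1, 4):
        print(jail_stanza(i))
        print()
```

Calling jail_stanza(i) for i from 1 to 1000 and writing the results to one file would give a complete set of per-jail sections.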

The properties described above mean the host system ended up with 1001 IP addresses assigned to a single NIC and with more than 14,000 individual mount points for the nullfs mounts.

One particular technology not shown here is the integration between jails and ZFS, which lets the root user inside a jail administer ZFS properties, including the creation of file systems.

Without much more talk, here is what starting 1000 jails looks like:

[Video (Ogg Theora): starting 1000 jails]

(for the impatient, feel free to skip to 29:45 for a bit less boring view)

The machine which did all that is equipped with 2x quad-core CPUs and 4 GB of RAM. During the experiment I feared that the relatively low amount of RAM might prevent all 1000 jails from being created, but it appears that I could easily create twice as many without problems. In reality, memory is probably the only limiting factor here and I could have created an arbitrary number of VMs, but 1000 is a nice round number.

As can be seen in the frames with "top" running, CPU usage is almost 0. This is because the jails are not doing anything in particular once they are started. The most CPU intensive part was the ssh RSA key generation.

In retrospect, this is an awesome result. Everything from the kernel downwards was perfectly stable before and after the experiment and the jails ran flawlessly. This experiment was done without special kernel tuning - an out-of-the-box GENERIC kernel was used, with no tunables or sysctls set.

Some observations:

  • It gets really interesting when every minute 1000 crons wake up to do their work :) Load averages spike to > 100. Obviously, cron would need to be turned off where it is not needed.
  • To drive more than about 1000 jails, kern.maxproc will need to be increased (and probably kern.maxprocperuid).
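The kern.maxproc observation comes down to simple arithmetic, sketched below. All the numbers are placeholders rather than measurements: three processes per jail is just the cron, syslog and sshd set mentioned earlier (a lower bound, since cron jobs and logins add more), and the host process count and the limit value are hypothetical.

```python
# Back-of-the-envelope sketch; every number used here is a placeholder.

def processes_needed(jails, procs_per_jail, host_procs):
    """Total processes when every jail runs its baseline daemons."""
    return jails * procs_per_jail + host_procs


def max_jails(maxproc, procs_per_jail, host_procs):
    """Largest jail count that stays within a given process limit."""
    return max(0, (maxproc - host_procs) // procs_per_jail)


if __name__ == "__main__":
    # 1000 jails, 3 baseline daemons each, 50 host processes (assumed).
    print(processes_needed(1000, 3, 50))
    # Jails that fit under a hypothetical limit of 6000 processes.
    print(max_jails(6000, 3, 50))
```

Leaving generous headroom below the limit seems wise, since the once-a-minute cron wakeups briefly add many processes on top of the baseline.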

So there it is - cheap, easy, low-weight virtualization that can be quickly set up and destroyed.


#1 Re: The night of 1000 jails

Added on 2009-10-20T16:56 by bryan

How long does it take to bootup now that the SSH keys have been generated?

#2 Re: The night of 1000 jails

Added on 2009-10-20T18:07 by Ivan Voras

It's around 10 times faster without ssh key generation.

#3 Re: The night of 1000 jails

Added on 2009-10-21T08:36 by Christer Solskogen

Could you post your python script for creating those jails?

#4 Re: The night of 1000 jails

Added on 2009-10-21T11:30 by Francisco Cabrita

Awesome :) I always loved FreeBSD Jail!

Once I had some FreeBSD servers with no more than 20 jails on each doing email, samba, virtual domains with Apache and in some cases lighttpd to save resources... Jails are in fact a very, very awesome solution for many virtualization environments.

Thanks for your post/experiment and keep up the good work.

Regards,

Francisco

#5 Re: The night of 1000 jails

Added on 2009-10-21T11:42 by Ivan Voras

I've linked to the mkjails.py script in the text!

#6 Re: The night of 1000 jails

Added on 2009-10-21T11:46 by DES

I'd be more impressed if I hadn't already been there and done that... ten years ago, on FreeBSD 3.4 (if I recall correctly), on what must have been a PIII with 1 GB RAM—and in production, not just for show.

Kids these days... :)

#7 Re: The night of 1000 jails

Added on 2009-10-21T12:01 by Christer Solskogen

Sweet! Thanks for the script :)

#8 Re: The night of 1000 jails

Added on 2009-10-24T00:33 by Andrew

Thank you Ivan. FreeBSD Jails are perhaps the best response to today's optimization needs in the whole IT world, and I'm really committed to making as many IT pros as possible aware of this big opportunity to cut costs and increase efficiency while reducing complexity at the same time.

I think that both network stack virtualization and ZFS support are essential pieces which complete the technical picture, but we need a comprehensive management tool to convince IT staff to adopt this great solution. I feel the best candidate to become such a definitive tool for managing jails in production business environments is the ezjail framework (http://erdgeist.org/arts/software/ezjail/), but it should leverage the latest features (FIBs, vimage, ZFS, etc.) as soon as they are available for each -RELEASE.

With such tools in the hands, I'll definitely be able to make a bunch of sysadmins switch from the most famous hardware-virtualization tools to FreeBSD Jails, today!

#9 Re: The night of 1000 jails

Added on 2009-10-24T00:44 by Ivan Voras

I have tried using ezjail, and after years of building my own jail scripts and setup I find it a bit overcomplicated. It would be nice to have a unified jail admin tool with all the fancy trimmings, but I think that for it to be truly useful it needs to get into the base system, which complicates things a bit.

#10 Re: The night of 1000 jails

Added on 2009-10-27T20:48 by kace

You pointed out that you could run a linux userland in a jail.  Seems like it should be possible to use the work being done in Debian GNU/kFreeBSD or Gentoo/FreeBSD to make that happen.  Wow.

#11 Re: The night of 1000 jails

Added on 2009-11-03T15:18 by tom

kace, you can run linux apps in freebsd by installing the linux compatibility package.

#12 Re: The night of 1000 jails

Added on 2009-11-04T18:28 by kace

tom, what I'm saying/asking is could you run an entire linux instance in a jail?  The Debian and Gentoo projects I mentioned are intended to run their distros on the FreeBSD kernel, which is the common part of jails.  ...  So, linux virtual servers inside FreeBSD jails??

#13 Re: The night of 1000 jails

Added on 2009-11-10T20:59 by kelly martin

Cool, I've always wondered what kind of overhead jails have. I didn't realize the kernel was shared, so I learned something new today too.

#14 Debian inside freebsd

Added on 2009-11-17T12:54 by Jamie

Kace, we do that. A few of the debian startup scripts needed tweaking, and linprocfs is not 'jail safe' (in that it leaks the mountpoints of the parent) but other than that it works fine:

 

postfix:~# w
 11:54:30 up 20:29,  2 users,  load average: 0.06, 0.07, 0.07
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     ttyp0    xx.xx.xx.xx    11:51    0.00s  0.00s  0.44s -bash
jg       ttyp1    xx.xx.xx.xx   09:59   30:58   0.00s  1.43s /usr/sbin/sshd -R
postfix:~#
postfix:~# uname -a
GNU/kFreeBSD xx.xx.com 8.0-RC2 FreeBSD 8.0-RC2 #0: Sun Oct 25 08:55:51 UTC 2009     root@almeida.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC i386 i386 Intel(R) Celeron(R) CPU 2.40GHz GNU/kFreeBSD

#15 Re: Debian inside freebsd

Added on 2010-05-18T15:57 by asenchi

The link to your mkjails.py script is broken. Would you be able to update that?

#16 Re: Debian inside freebsd

Added on 2012-04-29T10:32 by

Try the -j and -J jitter options on cron to lessen the impact of that many crons running.
