Adventures with FOG project, aka "All software sucks"

[this post is basically me venting after an all-nighter; it's much harsher than it should have been]

An alternative title to this could be "if something looks too good to be true, it usually is."

We needed to bring 80+ desktop PCs from a pile of cheap parts to a usable state in about two weeks. Nothing to it, right? Mass installations and cloning are done every day, right? Well, no. I basically had two projects to pick from - Clonezilla and FOG, and picked FOG because Clonezilla is old and clunky, while FOG has a relatively nice and powerful web interface. As it turned out, either of them would have collapsed under the task...


The basic setup was to install a template computer with Windows 7 and Ubuntu 10.04 dual-boot and clone it to the 80-some other computers. To do this, we figured two weeks would be more than enough. But no cigar.

We spent four fifths of the planned time installing only the Windows part of the system, with a huge amount of that spent reconciling Windows 7 with our old Win2k3 Active Directory domain controller. Only then, two days before the deadline, did we install Ubuntu (thankfully, this was over in two afternoons - one for the OS and one to integrate it with the AD) and start working on imaging and distribution.

And this point - cloning the systems - is where we got such a hugely unexpected mess of problems that it left a "will not touch this in the next 10 years with a shitty stick" type of feeling for the whole process. I will simply skip the preamble dealing with configuring our (AD-controlled) network for PXE, reconfiguring firewalls to allow TFTP and NFS from the FOG server to the labs, hitting Samba and Winbind on the head until they actually accepted the AD, etc., and go straight for the meat.

Problem #1

Well, taking an image shouldn't be a problem, right? Erm, no. Not when it turns out we have a network problem with lost packets, which drags the streaming TCP imaging process on a 1 Gbit/s link down to sub-100 Mbit/s speeds. After some fiddling it was clear that we would spend more time debugging the network than living with the slow transfers, so we sucked it up and went with it.

Problem #2

Well, we now have the image - it shouldn't be too hard to deploy it? After all, a test run we did earlier in the setup of the template system showed absolutely no problems? No way.

What would you think of the error message "Bad partition table - invalid partition signature 0" when we attempted to deploy the image? FOG spews this out in the "Checking disks" phase and refuses to consider any other option but stopping loudly.

It turns out it's a bug in FOG. I haven't found any explanation of it except for a few mysterious "HELP ME! IT DOESN'T WORK!" messages in forums which went unanswered. Here is what is going on: FOG creates its image (of type "single drive, all partitions, non-resizable") in the following way:

  1. record the MBR
  2. for each partition: read the free/allocated space map from the file system, dump the allocated data

The deployment process goes similarly:

  1. restore the MBR
  2. for each partition: dump the recorded data to the partition

Looks good but contains two vital flaws. This section is about Flaw #1: if the image contains an extended partition (basically, a nested partition table inside an MBR partition), then when the MBR is restored (with dd) on the deployment computer, it will contain a record of the extended partition, but the nested partition table to which the MBR now points will contain garbage. In particular, on a freshly minted hard drive, it will contain All Zeroes, which, once the MBR is written and partprobe is invoked, will cause partprobe to drop dead of consternation with the above error.
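
To make the failure concrete, here is a minimal Python sketch of the check that trips. It is an illustration only, not FOG's or partprobe's code, and it assumes 512-byte sectors and that the "signature 0" in the error refers to the 0x55AA marker at the end of the nested partition table's sector. The helper names (`check_disk`, `make_demo_disk`) are made up for this example, and the demo works on a tiny in-memory disk rather than a real drive.

```python
import struct

SECTOR_SIZE = 512
EXTENDED_TYPES = {0x05, 0x0F, 0x85}  # CHS extended, LBA extended, Linux extended
BOOT_SIGNATURE = b"\x55\xaa"         # last two bytes of an MBR or nested table sector


def has_boot_signature(sector):
    """True if the sector ends with the 0x55AA partition table signature."""
    return sector[510:512] == BOOT_SIGNATURE


def extended_partitions(mbr):
    """Return the start LBA of every extended partition entry in an MBR sector."""
    starts = []
    for i in range(4):
        entry = mbr[446 + 16 * i: 446 + 16 * (i + 1)]
        ptype = entry[4]
        lba_start, _num_sectors = struct.unpack_from("<II", entry, 8)
        if ptype in EXTENDED_TYPES:
            starts.append(lba_start)
    return starts


def check_disk(disk):
    """Return (lba, signature_ok) for the nested table of each extended partition."""
    mbr = disk[:SECTOR_SIZE]
    if not has_boot_signature(mbr):
        raise ValueError("MBR itself has no 0x55AA signature")
    results = []
    for lba in extended_partitions(mbr):
        nested = disk[lba * SECTOR_SIZE:(lba + 1) * SECTOR_SIZE]
        results.append((lba, has_boot_signature(nested)))
    return results


def make_demo_disk(nested_signed):
    """Build an 8-sector in-memory disk with one extended partition at LBA 4."""
    disk = bytearray(8 * SECTOR_SIZE)
    entry = bytearray(16)
    entry[4] = 0x05                          # partition type: extended
    struct.pack_into("<II", entry, 8, 4, 4)  # start LBA 4, length 4 sectors
    disk[446:462] = entry
    disk[510:512] = BOOT_SIGNATURE
    if nested_signed:
        offset = 4 * SECTOR_SIZE + 510
        disk[offset:offset + 2] = BOOT_SIGNATURE
    return bytes(disk)


# Restored MBR on a factory-blank disk: the nested table sector is all zeroes.
print(check_disk(make_demo_disk(nested_signed=False)))
# Same MBR after something has written a signature into the nested table sector.
print(check_disk(make_demo_disk(nested_signed=True)))
```

The first case is the freshly minted drive: the MBR says "there is an extended partition here", but the sector it points to has no signature at all.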

The solution to this is as follows: PXE boot into the FOG menu, choose the "Debug" option, in the resulting debug console invoke "fdisk /dev/sda", watch gleefully as fdisk complains about the "Zero" problem, accept its offer to fix the problem for you and write the partition table back (with "w"). Now, when restoring the image, FOG will restore the MBR, which will again point to the same garbage but with one important difference: the signature of the nested partition table will Not Be Zero and all will be well, for now.

Apparently, no one has caught this problem before because no one images extended partitions onto empty disk drives?

Problem #3

Ok now, FOG restored the image. Surely the troubles are over, right? Of course not. As it turns out, the imaged computers are unbootable. No error messages, nothing from either GRUB or the Windows boot loader, just a lonely text mode cursor blinking on the screen. Changing the active partition on such systems has precisely no effect.

Which brings us to Flaw #2 of the FOG imaging process: it records *used blocks in the file system*. Unix boot loaders are actually written near the superblock (in front of or behind it), which is *not* counted by the used space bitmaps. So FOG, in all its wisdom, skipped recording GRUB, and since GRUB was responsible for the dual-boot magic, everything went "poof".
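
A toy model of that flaw, not FOG's actual code: treat a partition as a list of blocks, save only the blocks the file system's bitmap marks as allocated, and restore them onto a blank target. The block contents and the `take_image` / `restore_image` names are invented for the illustration.

```python
def take_image(blocks, allocated):
    """Save only the blocks marked as allocated, keyed by block index."""
    return {i: blocks[i] for i in sorted(allocated)}


def restore_image(image, num_blocks, blank=b"\x00"):
    """Write the saved blocks onto a blank target; everything else stays blank."""
    target = [blank] * num_blocks
    for i, data in image.items():
        target[i] = data
    return target


# Source partition: block 0 holds boot code, blocks 2 and 3 hold file data.
source = [b"BOOT", b"\x00", b"file-a", b"file-b"]
# The file system's allocation bitmap only knows about the file data.
allocated = {2, 3}

clone = restore_image(take_image(source, allocated), len(source))

print(clone[2:], "file data survived")
print(clone[0], "is all that is left of the boot code")
```

The files come across intact, which is why the restore looks successful right up to the first reboot. A verbatim copy of the leading blocks would carry the boot code along, at the cost of a bigger image.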

The solution is this: boot the imaged computer from a CD containing "Super GRUB2 CD", ask this recovery tool to do its magic and locate any possible GRUB2 configurations on the drive, watch happily as it does its job properly and reads the grub.cfg file from the imaged file system, presenting you with boot options. Next, boot the installed Linux from the hard drive into recovery mode, select "Fix GRUB" from the menu presented there, and *then*, in addition to that, activate the single-user root console (or netroot, if you need it), and do "grub-install /dev/sda".

NOW, reboot and all is well.

Apparently people have caught this problem earlier, but I could not find any details except a few forum posts saying "yeah, both FOG and clonezilla suck if you need to clone dual-boot systems". It looks like this problem only affects dual-boot images for some reason. Maybe FOG tries to be overly clever instead of dumping the first few MBs of active partitions verbatim to the image?

The waiting

Two things should make clear how badly all this went: 1) all production imaging was done unicast because we couldn't reconfigure our network for cross-IP-network multicast (we have a large network), with 1 Gbit/s uplinks to the labs and 100 Mbit/s switches inside the labs; and 2) the two recovery steps for FOG I've described above needed to be done for every single imaged machine, pre- and post-imaging. In addition to that, there were many broken runs: imaging 30-or-so computers over 100 Mbit/s unicast, only to find the imaging had failed because of a combination of the above factors. Every boot device change from PXE to CD to HDD required going into the BIOS, which was password protected (because we were trying to implement a security policy). Tedious. It would be easier to just buy Ghost or Acronis.
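
For a rough sense of the waiting, here is a back-of-the-envelope estimate of unicast deployment time. The 20 GB image size is a made-up figure for illustration only (the real size isn't stated here), all streams are assumed to share a single 1 Gbit/s link, and protocol overhead, packet loss and disk speed are ignored, so real runs would be slower.

```python
def unicast_hours(image_gb, clients, uplink_mbit=1000, client_mbit=100):
    """Hours to push one image to `clients` machines at once over unicast.

    Each client gets its own copy of the stream, so the streams share the
    uplink, and each stream is also capped by the client's own port speed.
    """
    per_client_mbit = min(client_mbit, uplink_mbit / clients)
    image_mbit = image_gb * 8 * 1000  # decimal gigabytes to megabits
    return image_mbit / per_client_mbit / 3600


HYPOTHETICAL_IMAGE_GB = 20  # assumption for illustration, not a figure from this post

ten = unicast_hours(HYPOTHETICAL_IMAGE_GB, clients=10)
thirty = unicast_hours(HYPOTHETICAL_IMAGE_GB, clients=30)

print(f"10 clients at once: {ten:.2f} h")
print(f"30 clients at once: {thirty:.2f} h")
```

Up to ten clients the 100 Mbit/s lab ports are the limit; beyond that the shared uplink is, so every extra machine in a batch slows down all the others. Multicast sends the image once regardless of client count, which is why not having it hurt.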

The sickness

I truly believe all this is making me mentally ill. I literally visually parsed MBRs, debugged bootcode in GRUB and other Linux code to find out what happened here and why it did not work. I know way too much about all this to keep my sanity.

I want to be a tourist guide in some nice scenic country.

Update: Although this post sounds pretty harsh, the FOG project is actually pretty nice, and once I learned what works and what doesn't, I have had no problems using it and even recommending it to friends. We originally chose it because, being Linux-based, it had a better chance of dealing with Linux, and (of course, no surprise here) to save per-seat money on a potentially large deployment of cloned machines (this is only the first wave). It ended up being pretty popular and even my colleagues like it, so it is probably here to stay.


#1 Re: Adventures with FOG project, aka "All software sucks"

Added on 2010-09-25T06:18 by Jonathan

Wow! Thanks for sharing. I really like the idea of FOG, but it's given us (small company, 20ish workstations) odd stuff too -- except I've still got my sanity, and can't debug it. :-)

#2 Re: Adventures with FOG project, aka "All software sucks"

Added on 2010-09-25T20:34 by sprewell

Perhaps you should have just paid for a real product for your work? ;) I'm all for tinkering with open-source solutions on your own time, but frankly I find your griping directed at this FOSS project tiresome.  "Oh no, someone gave away their software and it doesn't do everything a real, paid product does!"  I'm all for detailing the problem and it obviously frustrated you, but there's a sense of entitlement underlying it that's hard to take.

#3 Re: Adventures with FOG project, aka "All software sucks"

Added on 2010-09-26T00:35 by darq

What an adventure :) Really, thanks for sharing. I assume all this was happening at FER. The Ubuntu part sounds good, so I hope I'm not wrong about the location, because this would be great progress in using OSS compared to now and all this MS stuff we have :)

#4 Re: Entitlement

Added on 2010-09-26T04:26 by Ivan Voras

You are wrong if you think that I wouldn't write a post just as frustrated and pissed off if I had used a "real product" - I tend to use and promote software based on its technical merits and fitness for a purpose and try to avoid bringing the "Open" vs "Closed" philosophy into it. A consequence of this is that I do not feel the need to "help" FOSS by special promotion or by deliberately overlooking its flaws. I'm an evangelist, not an apologist, but I'm striving for the "pragmatist" badge. I've been known to recommend Active Directory where it made more sense than the alternative :) And that is probably a large part of why I'm a BSD guy.

That said, this post was my way of venting out frustration and 24 hours later it does seem a bit impulsive. I'm sticking by it of course.

FOSS has obvious benefits; for instance, as soon as I get some sleep I'll write up proper bug reports for FOG.

#5 Re: Where

Added on 2010-09-26T04:31 by Ivan Voras

Yup, it's at FER. But there will be problems that need to be solved during usage; for instance I have a feeling that I didn't quite succeed in convincing AD to allow Windows home directory access to ordinary users.

#6 Re: Entitlement

Added on 2010-09-27T06:51 by sprewell

Ivan, I think you read something into my comment that wasn't there: nowhere did I say you wouldn't have criticized a real product. I never argued that you need to take it easy on OSS products: rather, it's understandable to me that the OSS product is inferior, because it doesn't have any money behind it. :) Your criticism is fine, as I said; it's the seeming entitlement, that the OSS product must do everything you want it to, that I found annoying. My point is that OSS generally has extremely limited resources and the majority of OSS projects will not have the fit and finish to be used at work on real tasks like you were trying. I found the fact that you expected it to work surprising. ;)

#7 Re: Entitlement

Added on 2010-09-27T12:02 by Ivan Voras

From your reply it looks like I've understood your first message just fine; we just don't agree on the topic. You sort of have a point in this specific case if looked at from a different perspective - since FOG's self-labeled version is 0.29, it is "normal" for it not to be finished, and I was a willing tester for an incomplete product (and will report the bugs I found). But not in general: I will not agree that OSS products are inferior. I think they can be as good as or better than the closed-source alternatives, and as such I will not agree to "not expect it to work." Otherwise, it is equivalent to saying OSS is only fit for hobbyists and non-serious use.

If you think this is wrong then I'm perfectly content to agree that we disagree.

#8 Re: Entitlement

Added on 2010-09-27T20:54 by sprewell

Yes, that is exactly my point, that OSS is mostly only fit for hobbyists and non-serious use, because the vast majority of OSS products don't have the money to compete with closed-source products.  There are a few products like Chromium or Android that giant corporations like Google give away for free that work very well, but those are the exceptions, not the rule.  If I were you, I'd have tried FOG, then ditched it and paid for Ghost as soon as it didn't work, which would have been cheaper than all the time you spent getting FOG to work.  Again, no problem with criticism of FOG or other OSS, but when you say its deficiencies made you "mentally ill," combined with all your other over the top statements, it implies a sense of entitlement that the OSS, which its authors were nice enough to give away for free, should do everything Ghost does, which I find peculiar and tiresome.

#9 Re: Entitlement

Added on 2010-10-07T15:30 by Leonidas Tsampros

I have been doing what you described for Windows-only workstations with the commercial Norton Ghost. Imagine the hell of having to locate modern NIC drivers to load into the PXE DOS image that I had to use.

FOG was on my TODO list for a LONG time but I never managed to find the time to improve and FOSS the process I used. It would be great for FOG to get improved to the point where a couple clicks will do what you want hassle-free.

However, I won't be having to clone computers massively on maintenance windows for a long time. And you can't even imagine how happy I am about that.
P.S.: Another invaluable tool that I paid for which saved me lots of maintenance headaches is DeepFreeze (in case your workstations are public).

#10 Mandriva Pulse 2

Added on 2010-12-13T15:54 by Fredo

Hi there,

I found your article as I was encountering similar problems with Windows images, using FOG. I resorted to cloning my clients using FOG's DD method instead of partimage (an extremely long process, but it works!)

Also, Mandriva's Pulse2 v1.3.0 has been out for a few days, and now features bare-metal imaging. It's very similar to FOG but the cloning process seems to be more "solid" (I say "seems" because I haven't tried it yet, but I intend to)

see more at

http://pulse2.mandriva.org

#11 Re: Mandriva Pulse 2

Added on 2010-12-13T16:04 by Ivan Voras

thanks!

#12 What do we prepare the target PC before deployment?

Added on 2010-12-25T17:24 by Terry Hieu

Hi all,

I have some confusion on the target PC. Do we need to create partitions on the target PC the same as the source image before deploying the image to it?

Could anyone help me?

Many thanks,

Terry

#13 Re: What do we prepare the target PC before deployment?

Added on 2010-12-25T18:14 by Ivan Voras

Fog supports different modes of operation - you can save and restore whole disks (with all partitions).

#14 Re: What do we prepare the target PC before deployment?

Added on 2011-03-14T05:31 by Jason Toews

Fog has worked wonders for our business network... Fog is really meant for Windows-based imaging and not really recommended for Linux-based imaging.

Fog took some time for me to understand, but I wasn't trying to fly by the seat of my pants before using it... now that I've made some modifications to the php and sql, it is wonderful software for deployment... I use snapins to deploy all of our applications as well.

#15 Have you reported the bugs to the FOG Project?

Added on 2011-03-16T14:23 by Adrian Zaugg

Dear Ivan

Have you had the time to report the problems you encountered? It seems none of them were solved in the recent 0.30 release. I'm currently suffering from the same problems.

Regards, Adrian.

#16 Re: Have you reported the bugs to the FOG Project?

Added on 2011-03-16T14:31 by Ivan Voras

I have reported one problem and another one (I don't recall which one of them) was previously reported so I didn't create duplicates.

#17 Re: Have you reported the bugs to the FOG Project?

Added on 2011-03-17T01:09 by Adrian Zaugg

I didn't find the one concerning GRUB. It is now there: ID 3217804

Regards, Adrian.
