I have created a FreeBSD 8 port for MongoDB, but I won't be using it for anything, because of these things I found out about it:
- MongoDB has no (worth repeating: NO) provisions for on-disk consistency. It doesn't fsync, and it doesn't use any technique that would ensure some degree of safety (e.g. journalling). Nothing. It does lazy writes! This means that when the server or the process crashes (and sooner or later one of them will), the whole database could very easily be corrupted beyond repair. For someone like me who comes from the "real" database world, this makes it unusable.
- Its performance sucks. MongoDB uses mmapped files for its databases (i.e. it creates large files of 32 MB, 64 MB, etc., mmaps them and works directly on the memory region). This kind of architecture is, at least in theory, excellent for many reasons, but apparently MongoDB gets only mediocre performance out of it. Considering that, from the operating system's point of view, it is a memory database not unlike memcached, the performance I got from it (around 11,000 simple INSERTs per second with the Python client) is inadequate.
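To make the mmap architecture and the lazy-write problem concrete, here is a minimal Python sketch of an mmap-backed store. It is not MongoDB's code: the fixed 16-byte record layout and the function names (`create_data_file`, `put`, `get`, `make_durable`) are hypothetical. It only shows that a write into the mapped region changes memory, and that nothing is guaranteed to be on disk until the pages are explicitly flushed (Python's `mmap.flush()`, which uses `msync()` on Unix-like systems).

```python
import mmap
import struct

# Hypothetical fixed-size record: two little-endian signed 64-bit integers
# (a key and a value), 16 bytes in total.
RECORD = struct.Struct("<qq")


def create_data_file(path, size=32 * 1024 * 1024):
    """Preallocate a zero-filled data file (32 MB by default)."""
    with open(path, "wb") as f:
        f.truncate(size)


def put(mm, slot, key, value):
    """Store a record by writing directly into the mapped memory region.

    This only dirties pages in memory. The kernel writes them back to the
    file whenever it chooses, so an OS crash or power failure can lose
    them, or leave only some of them on disk.
    """
    RECORD.pack_into(mm, slot * RECORD.size, key, value)


def get(mm, slot):
    """Read the record stored in `slot` back from the mapped region."""
    return RECORD.unpack_from(mm, slot * RECORD.size)


def make_durable(mm):
    """Ask for dirty pages to be written to the file (msync() on Unix)."""
    mm.flush()


if __name__ == "__main__":
    import os
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "test.0")
    create_data_file(path, 1024 * 1024)
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)  # map the whole file read/write
        put(mm, 3, 42, 1000)
        print(get(mm, 3))              # so far the record lives only in memory
        make_durable(mm)               # now it has been flushed to the file
        mm.close()
```

Without the final flush the record will very likely reach the file eventually, and that is exactly the problem: "eventually" is the only guarantee there is.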
Apparently, the developers are willing to work on solving the data consistency issue some time in the future (which will probably mean they'll have to abandon the pure-mmap approach, or at least make very careful use of fsync()), but I am puzzled by the low performance. Maybe the BSON overhead is too large?
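For contrast, here is a minimal Python sketch of the kind of journalling a database can use to survive crashes: each update is appended to a log and fsync()ed before it is considered done, and on startup the log is replayed. This is not a description of what MongoDB will actually do; the class name, the JSON-lines journal format and the assumption that a crash can only leave a torn record at the end of the journal are hypothetical simplifications. A real database would also checkpoint the journal into its data files instead of keeping everything in memory.

```python
import json
import os


class JournaledStore:
    """Toy key-value store with a write-ahead journal (hypothetical design).

    Every update is appended to the journal and fsync()ed before it is
    applied, so after a crash the state can be rebuilt by replaying the
    journal. Assumes a crash can only leave a torn record at the tail.
    """

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.data = {}
        self._replay()
        self.journal = open(journal_path, "a", encoding="utf-8")

    def _replay(self):
        """Rebuild state from the journal and drop a torn final record."""
        if not os.path.exists(self.journal_path):
            return
        good_bytes = 0
        with open(self.journal_path, "rb") as f:
            for line in f:
                if not line.endswith(b"\n"):
                    break  # incomplete record from a crash
                try:
                    entry = json.loads(line)
                except ValueError:
                    break  # garbled record: stop replaying here
                self.data[entry["k"]] = entry["v"]
                good_bytes += len(line)
        # Cut off anything after the last complete record so that new
        # entries are not appended behind garbage.
        os.truncate(self.journal_path, good_bytes)

    def put(self, key, value):
        self.journal.write(json.dumps({"k": key, "v": value}) + "\n")
        self.journal.flush()             # Python buffer -> OS page cache
        os.fsync(self.journal.fileno())  # OS page cache -> disk
        self.data[key] = value           # apply only once it is durable

    def get(self, key):
        return self.data.get(key)

    def close(self):
        self.journal.close()
```

The fsync() on every `put()` is what makes this safe, and also what makes it slow; real databases usually amortize the cost by committing several transactions with a single fsync() (group commit).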
Replication (which MongoDB has built in) is not a replacement for on-disk consistency, for much the same reason that RAID is not a replacement for backups and vice versa: they simply solve different problems. To be fair, MongoDB's site does contain a page listing the uses it isn't well suited for, which will hopefully help users who are not aware of these problems.
Of course, MongoDB isn't the only NoSQL database. On the other hand, there is CouchDB, which apparently takes data integrity seriously but opts for the lazy approach: perpetual append-only journalling, which means old free space is never automatically reclaimed. And it uses a text protocol for exchanging data! Sigh... maybe I'll try it if it ever reaches version 1.0.
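The trade-off behind this append-only approach can be illustrated with a minimal Python sketch. The file format (one tab-separated line per record), the class and the method names are hypothetical and far simpler than CouchDB's actual on-disk structure; the point is only that updates never reuse space, so the file keeps growing until an explicit compaction pass rewrites the live data.

```python
import os


class AppendOnlyStore:
    """Toy append-only key-value file (hypothetical format, not CouchDB's).

    Records are never overwritten in place: every update is appended and an
    in-memory index points at the newest copy. Superseded copies stay in
    the file as dead space until compact() is called explicitly.
    Keys and values are assumed not to contain tabs or newlines.
    """

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of its latest record
        self.f = open(path, "a+b")
        self._load_index()

    def _load_index(self):
        """Scan the file once; later records for a key win."""
        self.f.seek(0)
        offset = 0
        for line in self.f:
            key = line.decode("utf-8").split("\t", 1)[0]
            self.index[key] = offset
            offset += len(line)

    def put(self, key, value):
        record = f"{key}\t{value}\n".encode("utf-8")
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(record)  # append mode: always lands at the end
        self.f.flush()
        os.fsync(self.f.fileno())
        self.index[key] = offset

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        self.f.seek(offset)
        line = self.f.readline().decode("utf-8").rstrip("\n")
        return line.split("\t", 1)[1]

    def compact(self):
        """Copy only the live records into a new file and swap it in."""
        tmp_path = self.path + ".compact"
        new_index = {}
        with open(tmp_path, "wb") as out:
            for key in self.index:
                new_index[key] = out.tell()
                out.write(f"{key}\t{self.get(key)}\n".encode("utf-8"))
            out.flush()
            os.fsync(out.fileno())
        self.f.close()
        os.replace(tmp_path, self.path)
        self.f = open(self.path, "a+b")
        self.index = new_index

    def close(self):
        self.f.close()
```

The upside of this design is that a crash can at worst damage the record being appended, never older data; the downside is exactly the one described above: without compaction the file only grows.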
The whole "document store" thing looks a little too rudimentary, considering that PostgreSQL has an arbitrary key-value "document" field type (the hstore module) and also an XML field type, both of which are "smart", searchable and indexable, while PostgreSQL itself offers impeccable data consistency, good performance and even some modes of replication.