A short time with MongoDB

I wanted to try something from the "NOSQL" camp and it looked like MongoDB is one of the darlings of the whole idea / "movement". If it is really representative of the whole group, it's very, very unimpressive. (Update: there are better ones ;))

Update: Here is a funny talk about NOSQL.

Update: New post available with updated information.

Update: Note that this post was written in 2009.

Update: Now, in 2013, MongoDB still has the same problems.


I have created a FreeBSD 8 port for MongoDB, but I won't be using it for anything because of these things I found out about it:

  • MongoDB has no, worth repeating: NO provisions for on-disk consistency. It doesn't fsync(), it doesn't use algorithms that would ensure some degree of safety (e.g. journalling), nothing. It does lazy writes! This means that in the case of a server or process crash (which will happen with 100% probability) the whole database can very trivially be corrupted beyond repair. For someone like me who comes from the "real" database world, this makes it unusable.
  • Its performance sucks. MongoDB uses mmapped files for databases (i.e. it creates large files of 32, 64, etc. MB which it mmaps and works on the resulting memory region). This kind of architecture is, at least in theory, excellent for many reasons, but apparently MongoDB gets only very mediocre performance out of it. Considering that it is, from the operating system's point of view, a memory database not unlike memcached, the performance I got from it - around 11,000 simple INSERTs per second (with the Python client; a rough sketch of the benchmark follows below the list) - is inadequate.
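
For reference, here is a minimal sketch of the kind of insert micro-benchmark I mean - not the exact script I used (that one is linked later in the comments), just an illustration assuming a locally running mongod and the pymongo 1.x API, with made-up database, collection and field names:

```python
# Rough sketch of a pymongo insert micro-benchmark - not the exact script,
# just the general shape of it. Assumes mongod is running locally.
import time
from pymongo import Connection  # pymongo 1.x API; newer versions use MongoClient

N = 100000
coll = Connection("localhost", 27017)["benchdb"]["benchcoll"]  # made-up names

start = time.time()
for i in range(N):
    coll.insert({"seq": i, "payload": "x" * 20})  # simple two-field documents
elapsed = time.time() - start
print("%d INSERTs in %.2f s: %.1f INSERTs/s" % (N, elapsed, N / elapsed))
```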

Apparently, the developers are willing to work on solving the data consistency issue (which will probably mean they'll have to abandon the pure-mmap approach or at least make very careful use of fsync()s) some time in the future, but I am puzzled by the low performance. Maybe the BSON overhead is too large?

Replication (which MongoDB has built-in) is not a replacement for on-disk consistency, for much the same reasons that RAID is not a replacement for backups and vice versa: they simply solve different problems. To be fair, MongoDB's site does contain a page which lists use cases it isn't well suited for - which will hopefully help users who are not aware of these problems.

Of course, MongoDB isn't the only NOSQL database - OTOH there is CouchDB which apparently takes data integrity seriously, but opts for the lazy approach - perpetual contiguous journalling, which means old free space is never automatically reclaimed. And it uses a text protocol for exchanging data! Sigh... maybe I'll try it if it ever reaches version 1.0.

The whole "document store" thing looks a little too rudimentary, considering that PostgreSQL has an arbitrary key-value "document" field and also an XML field type, both of which are "smart", searchable and indexable, with impeccable data consistency, performance and even some modes of replication.


#1 Re: A short time with MongoDB

Added on 2009-11-07T05:31 by Sam Baskinger

Hey Ivan,

 

I don't think your experience with Mongo is representative of where the NOSQL groups are going. The idea is to have fast access by having horizontal scaling and replication. That also "solves" the disk write-back consistency issues. Hope that gives you more "hope" in the movement. :) The magic is in the distribution of the system more than in the single node instance.

#2 Re: A short time with MongoDB

Added on 2009-11-07T11:39 by Ivan Voras

It makes sense on one level, but I think it just opens another question - who is it for? Today, it is easier (cheaper) to get more hard drives than to get more servers - not everyone is Google :)

What I'm saying basically comes down to the question: "what is the disaster recovery scenario?" - and the minimum "disaster" here is a room-scale power outage which will take out all the servers within the room, thus, in the case of MongoDB, corrupting all replicated instances. Since the on-disk state is never consistent during regular operation (no consistency algorithms are used, and neither is the OS memory image of the cached files consistent), you cannot create reliable backups.

It looks like the whole thing relies on a large number of possibly geographically distributed servers so that never ever will all of them be down at the same time. Again, maybe possible for Google but I feel it's absolutely not for 90% of current users of MongoDB.

#3 Re: A short time with MongoDB

Added on 2009-11-11T12:31 by Ivan Voras

Also, all users who think data consistency via memory-only (or lazily-written) databases is so easy should ask themselves: why doesn't Google (which for all practical purposes has nearly unlimited hardware resources for replication) do it this way? :)

#4 Re: A short time with MongoDB

Added on 2009-11-14T18:19 by Emil Eifrem

Neo4j (http://neo4j.org) is taking a different route. We have a robust transactional core, which has been running in 24/7 production since 2003. We support XA-protocol transactions (yes, even including, -gasp!-, 2PC!), deadlock detection, transaction recovery, JTA, online snapshot backups, MVCC, etc.

Now we're rolling out replication and next in line is auto sharding. It's the other way around from a lot of NOSQL projects, but we feel that it's better to start with a solid transactional core and then go from there.

--

Emil Eifrem

http://neo4j.org

http://twitter.com/emileifrem

#5 Robust Storage

Added on 2009-11-14T18:28 by Chris Anderson

Ivan,

Thanks for making the point that we call them databases because we want to get back what we put in.

CouchDB uses pure tail-append storage with compaction because it is the simplest approach that we can know is reliable. As long as your disk respects fsyncs (or doesn't reorder writes) there is no possibility of CouchDB crashing in an inconsistent state.

We could implement in-place compaction with a free-space map but I honestly don't think it's worth it in terms of complexity. We're aiming to be the Honda Accord of databases, so simplicity is a prime goal.

Also, don't underestimate the performance benefits of pure tail-append storage (especially as we write binary attachments in parallel, so we can handle a lot of concurrent uploads). Both SSDs and spinning platters have the highest throughput (and, I'd venture to say, longer MTBF) with contiguous writes.
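
To make the pattern concrete, here is a minimal sketch of tail-append-plus-fsync storage - this is not CouchDB's actual code or file format, just an illustration with made-up record framing:

```python
# Minimal sketch of the tail-append + fsync pattern (not CouchDB's actual code or
# file format). Each record is length-prefixed; a record "exists" only once it is
# fully on disk, so a crash can at worst leave a truncated tail that recovery ignores.
import os
import struct

def append_record(path, payload):
    with open(path, "ab") as f:
        f.write(struct.pack(">I", len(payload)))  # 4-byte big-endian length prefix
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # durable once this returns (assuming the disk honours fsync)

def read_records(path):
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break  # clean EOF or torn header from a crash: stop at the last complete record
            (length,) = struct.unpack(">I", header)
            payload = f.read(length)
            if len(payload) < length:
                break  # torn payload: ignore the incomplete tail
            records.append(payload)
    return records

# Example: append_record("data.log", b"some document"); read_records("data.log")
```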

#6 Re: Robust Storage

Added on 2009-11-14T19:05 by Ivan Voras

I'm aware of, and fine with, append storage as a reliable method of storing data, but most serious implementations choose to limit the use of this method to small journalling-style logs rather than using it for their core data set :)

Orthogonally to this: linearly written journals (of which append-only files are just a subtype) are not the *only* way of ensuring consistency, but they have become extremely popular with spinning disk media because it's the fastest way. The alternative is to do a lot of fsync-ed scattered writes all over the place, which, while exactly as safe (or actually safer in some regards), is slow. SSDs will of course change this.

As FreeBSD's "softupdates" technology demonstrates, you don't specifically need a journal-like structure to achieve either consistency or performance in real-life file systems, as long as the assumption of no write reordering by the drives (or the controller) holds. It's just noticeably easier to do journalling than proper algorithms to order scattered writes so that the structures used (metadata, indexes, etc.) maintain internal consistency at absolutely every point in time. Though this may sound insanely hard, it has been done :)

#7 Re: Robust Storage

Added on 2009-11-14T19:06 by Jonathan Ellis

Cassandra fsyncs, either every-N-ms or before acking a write, configurably.

#8 Re: Robust Storage

Added on 2009-11-14T19:26 by Jan Lehnardt

Hi Ivan,

Chris just said the append-only log is the easiest to get right. Wikipedia (heh) says getting softupdates right is hard (http://en.wikipedia.org/wiki/Soft_updates)* :)

And I'm not sure that just because nobody treats the whole DB as a log, it is a bad idea :) There are disadvantages (2x the disk space needed for compaction) that can be solved with more complexity. But then, simplicity rules. I know of a high-volume production CouchDB setup that happily compacts / garbage collects more than 2^15 times a day.

 

Cheers

Jan

--

* I'm a huge fan of the *BSDs.

#9 Re: Robust Storage

Added on 2009-11-14T19:37 by Ivan Voras

It is true that, universally, the simplest working solution is almost always good enough and worth pursuing, so I'm not going to nag the CouchDB developers to suddenly switch to another data model :)

On the other hand, most of the uses I have that could benefit from a document store like CouchDB also tend to have the property of data being often updated, so I'll just have to try and see if it starts to waste unacceptable amounts of disk space.

OTOH, there's still all this "we shall parse and construct JSON at every data query and entry" thing... you wouldn't believe how much performance can be gained from using a sane binary parsable format as opposed to a text format. I know, I did it for a (different) project.
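
A crude, made-up illustration of the kind of difference I mean - round-tripping a trivial record through Python's json module versus a fixed binary layout via struct; the exact numbers will of course vary with the data and the parser:

```python
# Crude micro-comparison of a text format (JSON) vs. a fixed binary layout (struct)
# for a trivial record. The record layout and iteration count are made up.
import json
import struct
import time

N = 200000
record = {"seq": 42, "value": 3.14, "flag": 1}

start = time.time()
for _ in range(N):
    json.loads(json.dumps(record))          # text round-trip
t_json = time.time() - start

fmt = ">idB"                                # int, double, unsigned byte: same fields, fixed layout
start = time.time()
for _ in range(N):
    packed = struct.pack(fmt, record["seq"], record["value"], record["flag"])
    struct.unpack(fmt, packed)              # binary round-trip
t_bin = time.time() - start

print("json: %.2f s, struct: %.2f s" % (t_json, t_bin))
```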

#10 Re: Robust Storage

Added on 2009-11-14T19:47 by Jan Lehnardt

Totally, JSON is not optimal, but it is, again, simple. Worth noting too is that a JSON parser can be faster than the much-hailed Protocol Buffers: http://blog.juma.me.uk/2009/03/25/json-serializationdeserialization-faster-than-protocol-buffers/

Other comparative benchmarks with CouchDB suggest that neither HTTP nor JSON (both perceived as "slow") is a real bottleneck in operation, as we become disk-bound soon enough.

We have a (different) production setup where we "shard by time" (a database a day). This keeps the actual GC process manageable (there are other reasons for the setup, this is just a nice side-effect). But that adds complexity one level up from CouchDB, and you might just not want to pay for it.

Anyway, I don't want to turn this into a CouchDB lecture. The take-away lesson is that NoSQL comes in a lot of flavours and one solution is not representative of any other :)

Thanks for the worthwhile discussion!

 

Cheers

Jan

--

 

#11 Re: Robust Storage

Added on 2009-11-14T19:55 by Ivan Voras

Thank you!

#12 Re: Robust Storage

Added on 2009-11-14T20:38 by Sammy

I've been using MongoDB in production and getting around 55k inserts per second. Not sure why your performance isn't great, but it might be more helpful to ask questions and try to help rather than just criticize. From what I've seen, the database often outperforms some of the drivers. So it depends on which driver, how many indexes, etc...

As for durability, server process crashes don't corrupt anything since the files are still in the OS's memory. On an OS crash or power loss, there is a risk of loss/corruption of course. However, in my experience this is the exception, and the much more common problem is disk failure, which is why I'm happy using replication (LAN and WAN) for durability.

 

#13 Re: Robust Storage

Added on 2009-11-14T20:44 by Sammy

Also - one of the reasons I switched to MongoDB was that I lost some MySQL data with InnoDB because of a power failure. I trusted InnoDB too much, and ended up losing a couple of hours of data. Obviously my fault for not having enough replication, etc., but the point is that single-failure durability is kind of a dangerous thing to rely on.

#14 Re: Robust Storage

Added on 2009-11-14T20:46 by Ivan Voras

Re: performance: I've added that I've used the Python driver, for what it's worth.

Re: what is a "common" problem and what isn't: once is enough :)

The comments on this post are public - if MongoDB developers can benefit from them, that's great!

#15 Re: Robust Storage

Added on 2009-11-14T20:48 by Sammy

I'm using Python as well. Wondering if it's a FreeBSD thing. Would love to see your benchmark code. Would you mind publishing it?

#16 Re: Performance

Added on 2009-11-14T20:56 by Ivan Voras

Sure, it's a slight modification of one of the pymongo examples. See here: http://ivoras.net/stuff/bigtest.py . Now that you mentioned how much better performance you get, I see that most of the CPU time is spent in Python, not in mongodb. I am using the C BSON extension (_cbson.so), so it really might be a bad client driver. What result do you get from my test?
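
If anyone wants to check where the client-side time goes, a profiler run over the insert loop would show it - the sketch below is illustrative only (made-up names, pymongo 1.x API), not taken from bigtest.py:

```python
# Illustrative sketch: profiling the client-side insert loop with cProfile to see
# whether the time goes into BSON encoding, the socket layer, or elsewhere.
# Names are made up; this is not bigtest.py.
import cProfile
import pstats
from pymongo import Connection  # pymongo 1.x API; newer versions use MongoClient

coll = Connection("localhost", 27017)["benchdb"]["benchcoll"]

def insert_many(n=100000):
    for i in range(n):
        coll.insert({"seq": i, "payload": "x" * 20})

cProfile.run("insert_many()", "insert.prof")
pstats.Stats("insert.prof").sort_stats("cumulative").print_stats(20)
```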

#17 Re: Performance

Added on 2009-11-14T21:09 by Sammy

On my desktop:

28101.45 inserts/sec

python: 100% cpu

mongod: 45% cpu

Have you run a similar test on another db that has gone faster?  If so, we should send the mongo python guys the source for that driver to figure out how to make it faster :)

#18 Re: Performance

Added on 2009-11-14T21:45 by Ivan Voras

It's curious how you got more than 2x the performance I did, but it's still kind of on the low end :)

I'm running Python 2.6, a 64-bit OS, a 2 GHz Core2 CPU (disk drives are not important since there is no disk IO during the test), MongoDB 1.1.3 and pymongo 1.1.

I haven't tested it on other workloads - there may be cases where it is faster, but currently I don't have the time to experiment more with it.

#19 Re: Robust storage

Added on 2009-11-14T22:01 by dwight_mongodb

MongoDB does not use a transaction log, but we have found in practice that this works just fine -- lots of sites are using it in production without problems. I think an analogy to MySQL and InnoDB/MyISAM is appropriate. The MySQL web site says MyISAM, which has similar durability to MongoDB, can be many times faster than InnoDB. That is the idea with Mongo. We think one-size-fits-all for databases is over: if one is building a bond trading system, I would use a different tool.


In the past I have also had several situations where I lost data with mirrored drives and a logging database when drives began to have hardware errors. My experience is that redo logs with fsync are not enough to achieve true durability.

#20 Re: performance

Added on 2009-11-14T22:06 by dwight_mongodb

In general we hear (and see ourselves) very good things on performance with MongoDB. It is very likely that in your benchmark the Python client was the bottleneck. Also, I have never tested it myself with FreeBSD; there could be some issue there.

It will not be as fast as memcached - it is not a pure key/value store, and the addition of ad hoc queries, secondary indexes, sorting, replication, etc. does have some overhead. However, for many problems it can be much faster than Postgres, depending on the problem.

Also, I think there are a lot of other interesting properties beyond performance, such as easy development from object-oriented programming languages, and horizontal scalability.

#21 Re: performance

Added on 2009-11-14T22:09 by dwight_mongodb

@Ivan: curious, what are you comparing to such that 28k inserts/second is slow ("low end")?

#22 Re: performance

Added on 2009-11-14T22:23 by Ivan Voras

@dwight_mongodb: "low end" - I've said it before: I'm comparing it to other memory-only databases, like memcached. The reason is that, since there is no disk IO, everything happens completely in memory. When the OS is not told to flush data to the drives, mmapped memory is, at the low level, no different from anonymously allocated memory.

I suppose you will not agree with me because with MongoDB there is still the *option* of having the data on the disk eventually, and MongoDB data can be more complex (BSON), but still - 28 kops/s for memory databases is what was achieved in the era of Pentium IIIs.

#23 Re: performance

Added on 2009-11-14T22:33 by dwight_mongodb

@ivan - yes, we disagree -- I feel like your argument is "mongodb is slower than memcached, therefore I will use postgres" :-)


I think better performance comparators would be Postgres, MySQL (MyISAM), CouchDB, ...

I'm actually a fan of memcached - if that's all you need, it is a great tool.

#24 Re: performance

Added on 2009-11-14T23:08 by Ivan Voras

I'm saying that, since it already abandoned on-disk consistency, MongoDB could be a lot faster than it already is :)

I'm not against databases such as MongoDB, but I feel that it, in particular, has missed some opportunities to be better. It could have gone for data consistency but didn't, yet at the same time it appears not to have taken advantage of that decision to bulk up on performance.

I just couldn't resist, now that this discussion is ongoing, and did another experiment: I've created an equivalent Python script that inserts the same records into a PostgreSQL database using the flexible ("document") key-value data type (hstore), and got the same performance as I did with MongoDB (around 11,000 INSERTs/s :) Though I cheated a bit: I used autocommit but disabled synchronous_commit, meaning the logging is still fully active and all consistency guarantees are there, but the last few seconds of committed transactions could be lost.
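
The script is essentially along these lines - a rough reconstruction rather than the actual ppg.py, with made-up table and column names; it assumes psycopg2 and the hstore module are available:

```python
# Rough reconstruction of the PostgreSQL/hstore insert test (not the actual ppg.py;
# table and column names are made up). Autocommit is on, synchronous_commit is off,
# so WAL logging and consistency guarantees stay intact but the final fsync is deferred.
import time
import psycopg2

N = 100000
conn = psycopg2.connect("dbname=test")
conn.autocommit = True  # older psycopg2: conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
cur.execute("SET synchronous_commit = off")
cur.execute("CREATE TABLE bench (id serial PRIMARY KEY, doc hstore)")

start = time.time()
for i in range(N):
    cur.execute("INSERT INTO bench (doc) VALUES (%s::hstore)",
                ("seq=>%d, payload=>xxxxxxxxxxxxxxxxxxxx" % i,))
elapsed = time.time() - start
print("%d INSERTs in %.1f s: %.1f INSERTs/s" % (N, elapsed, N / elapsed))
```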

Again - I'm not trying to say the whole concept of MongoDB is bad, but that the implementation could be better in some specific places :)

#25 Re: performance

Added on 2009-11-15T02:59 by Sammy

Just to be clear, there is a big difference between 28k ops and 28k inserts.  

I think you're maxing out the wrong thing with this test. The case is so simple that with just about any database you're probably maxing out CPU <-> RAM, and database architecture probably isn't even a factor. (This is why benchmarks are so hard to do well.)

Would be curious to try your postgres code on my box as well though.   Would you mind posting that as well?

#26 Re: performance

Added on 2009-11-15T03:16 by Ivan Voras

Have fun with it: http://ivoras.net/stuff/ppg.py . Instructions are in the file.

Don't make too big a deal of it - I'm just saying that, until it gets nearly as complex as PgSQL (which has a lot of overhead in this case: e.g. SQL parsing, transactions, etc.), MongoDB has lots of room to grow.

#27 Re: performance

Added on 2009-11-15T03:17 by Sammy

I tried the same test with Java btw, and got 51692 inserts/sec.

Then I also changed the test to use _id as the unique field.

Java: 55k Python: 32k

I got Postgres to 16k on my box, but can't seem to get it higher.

#28 Re: performance

Added on 2009-11-15T04:28 by Sammy

A little more color for those curious. The Mongo Java driver and the Postgres Python driver have the same CPU usage ratio for this test: 100% db, 25% driver.

Java: 53140.610054203426 inserts/s

Postgres: 100000 INSERTS in 5.7 seconds: 17520.0 INSERTS/s

That's Postgres with synchronous_commit off (it was 3.5k with it on).

 

#29 Re: performance

Added on 2009-11-15T08:52 by Chris Anderson

Ivan,

If you're curious about the effect the JSON/HTTP layer has on CouchDB, I'm guessing it's pretty significant. Speed-of-light from inside the Erlang layer is significantly faster (almost 2x faster than via HTTP). However, we tell people not to interface directly via Erlang calls, because nothing is going to scale and deploy as smoothly as HTTP.

I haven't run the benchmark scripts since Damien's latest optimizations but I'm guessing that right now our speed-of-light (w/o HTTP/JSON overhead) is in the same ballpark as PostgreSQL. Since we're optimized for concurrency over serial speed I'm feeling like that's "fast enough" but we'll see, and of course we'll continue to remove bottlenecks as they become apparent.

#30 Re: performance

Added on 2009-11-15T17:37 by Josh

I'm not sure the performance numbers here make sense. Remember that MongoDB is document-oriented as opposed to row-oriented. That means that you may be writing a lot more per insertion with MongoDB than with Postgres.

If you are not denormalizing your data, you are missing something with these data stores.

 

Just curious, can anyone run this benchmark against couchdb?

 

#31 Re: performance

Added on 2009-11-15T17:39 by Emil Eifrem

I threw together an equivalent test using Neo4j. On a standard Dell server with a dual 2.3 GHz Core2, a 64-bit OS and standard SATA disks, I inserted 100k "documents" with two properties at a throughput of ~30k / second.

I tried to emulate what I think Sammy mentioned in #27 by removing one of the properties and using the native ID for lookups, and then throughput rose to ~45k / second. I sized every transaction at 10,000 "documents." This is fully ACID transactional with guaranteed consistency, durability and recoverability.

But it's a huge microbenchmark. It may serve as an indication, but at the end of the day the only thing that makes sense is to benchmark real world use cases similar to whatever domain we want to model.

#32 Re: performance

Added on 2009-11-15T18:59 by Jan Lehnardt

At this point I feel it necessary to link to two of my blog posts that discuss benchmarks: 

http://jan.prima.de/plok/archives/175-Benchmarks-You-are-Doing-it-Wrong.html and http://jan.prima.de/~jan/plok/archives/176-Caveats-of-Evaluating-Databases.html

Cheers

Jan

--

#33 Re: reliability

Added on 2009-12-13T14:12 by Ivan Voras

It looks like the developers made a modification for inclusion into their 1.2.x branch: http://jira.mongodb.org/browse/SERVER-442 - "durability: fsync files to disk every minute." I hope they know that this doesn't help durability at all unless there are absolutely no dynamic structures stored in the mmapped region (i.e. no trees, hash arrays, anything).

#34 Re: reliability

Added on 2010-02-19T02:57 by Mathias Stearn

We just posted to our blog about this topic: http://blog.mongodb.org/post/381927266/what-about-durability


@Ivan, since this page is frequently linked to, would you mind adding that link to our blog to your main post?

#35 Re: reliability

Added on 2010-02-20T12:56 by Ivan Voras

@Mathias: thanks, I've written an updated post and linked it from the main post here.

#36 Re: Performance

Added on 2010-03-16T02:53 by Mark Smith

Just as an additional data point (the discussion has moved on a bit, I know): I'm getting 36-38K inserts/s on my laptop, using the C# drivers.

#37 Re: Performance

Added on 2010-12-05T15:04 by prakash patidar

Hi,

I use the C# driver and am getting only 12-15k inserts/second. It's done on a dual-core laptop, and I saw my C# client taking 45% CPU (one core completely) and the mongod process 25%. Is it the serialization from .NET to BSON objects that is hurting me?

```csharp
var call = new Document();   // "byt" and "calls" are defined elsewhere in my code
call["_id"] = i;             // int i
call["data"] = byt;          // byte array of 2k chunks; even a 1-byte array doesn't change performance much
calls.Insert(call);
```

Do we have ways to bulk upload into it using C#, or some other way?

In my use case I need to append to a document - can you share code for the same?

#38 Re: Performance

Added on 2011-03-07T18:28 by ASBai

POSIX systems use msync() to flush a file mapping, so it may be OK even if you simply could not find the fsync() call in its source code.

#39 Re: Performance

Added on 2011-08-19T11:55 by good luck

The idea of MongoDB is to use 32x more hardware to achieve the same as with traditional database systems. It just sucks.

#40 Re: Performance

Added on 2012-05-11T03:09 by iPhone guy
That was a QED response if I've ever seen one. Thanks good luck.

#41 Re: Performance

Added on 2014-03-03T17:59 by jbg

I wonder if MongoDB is a fashion or the real thing, against legacy DBMSes and even other NoSQL stores.

How will you manage consistency and recovery (DB process, server, power, or cluster crash)?

For the moment, and probably for the next 10 years, you need it - even if you are a web programmer and don't want to hear about it. Saying "Google" does it - being consistent, using fsync, increasing the number of disks and servers - yes. But what about experience with critical data (e.g. customer data), descriptions, links, procedures...?

make the old farmer get wrong !
