Evaluating persistent, replicated message queues (updated w/ Kafka)

An updated and extended version of this post is available on SoftwareMill’s website. Below you can find the original content containing benchmarks from 2014.

Update 17/07/2014: added Kafka benchmarks
Update 20/03/2015: updated HornetQ replication description
Update 4/05/2015: updated & extended version here

Message queues are useful in a number of situations; for example when we want to execute a task asynchronously, we enqueue it and some executor eventually completes it. Depending on the use case, the queues can give various guarantees of message persistence and delivery. For some use-cases, it is enough to have an in-memory message queue. For others, we want to be sure that once the message send completes, it is persistently enqueued and will be eventually delivered, despite node or system crashes.

To be really sure that messages are not lost, we will be looking at queues which:

  • persist messages to disk
  • replicate messages across the network

Ideally, we want to have 3 identical, replicated nodes containing the message queue.

There are a number of open-source messaging projects available, but only a handful support both persistence and replication. We’ll evaluate the performance and characteristics of 5 message queues:


While SQS isn’t an open-source messaging system, it matches the requirements and I’ve recently benchmarked it, so it will be interesting to compare self-hosted solutions with an as-a-service one.

MongoDB isn’t a queue, of course, but a document-based NoSQL database. However, using some of its mechanisms, it is very easy to implement a message queue on top of it.

By no means does this aim to be a comprehensive overview; it is just an evaluation of some of the projects. If you know of any other messaging systems which provide durable, replicated queues, let me know!

Testing methodology

All sources for the tests are available on GitHub. The tests run a variable number of nodes (1-8); each node either sends or receives messages, using a variable number of threads (1-25), depending on the concrete test setup.

Each Sender thread tries to send the given number of messages as fast as possible, in batches of random size between 1 and 10 messages. For some queues, we’ll also evaluate larger batches, up to 100 or 1000 messages. After sending all messages, the sender reports the number of messages sent per second.

The Receiver tries to receive messages (also in batches), and after receiving them, acknowledges their delivery (which should cause the message to be removed from the queue). When no messages are received for a minute, the receiver thread reports the number of messages received per second.


The queues have to implement the Mq interface. The methods should have the following characteristics:

  • send should be synchronous, that is, when it completes, we want to be sure (what “sure” means exactly may vary) that the messages are sent
  • receive should receive messages from the queue and block them; if the node crashes, the messages should be returned to the queue and re-delivered
  • ack should acknowledge the delivery and processing of the messages. Acknowledgments can be asynchronous, that is, we don’t have to be sure that the messages really got deleted.

The model above describes an at-least-once message delivery model. Some queues also offer other delivery models, but we’ll focus on this one so that we compare similar things.
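The contract above can be sketched as a toy in-memory model. This is only an illustration: the real Mq interface in the benchmark repository is Scala code, and the class and method bodies below are mine, not taken from it.

```python
import itertools

class InMemoryMq:
    """Toy in-memory model of the Mq contract described above.
    The real interface in the benchmark repo is a Scala trait;
    everything here is illustrative only."""

    def __init__(self):
        self._queue = []        # messages waiting to be delivered
        self._in_flight = {}    # id -> message: delivered but not yet acked
        self._ids = itertools.count()

    def send(self, msgs):
        # Synchronous send: when this returns, the messages are enqueued.
        self._queue.extend(msgs)

    def receive(self, max_msgs):
        # Deliver up to max_msgs messages and "block" them: they stay
        # in _in_flight until acked, so a crash can return them.
        batch = []
        while self._queue and len(batch) < max_msgs:
            mid = next(self._ids)
            self._in_flight[mid] = self._queue.pop(0)
            batch.append((mid, self._in_flight[mid]))
        return batch

    def ack(self, ids):
        # Acknowledge delivery; in a real queue this may be asynchronous.
        for mid in ids:
            self._in_flight.pop(mid, None)

    def requeue_unacked(self):
        # Simulates a node crash: unacked messages return to the queue.
        self._queue.extend(self._in_flight.values())
        self._in_flight.clear()
```

The requeue step is exactly what makes this at-least-once rather than exactly-once: a message received but not acked before a crash will be delivered again.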

We’ll be looking at how fast (in terms of throughput) we can send and receive messages using a single 2- or 3-node message queue cluster.


Mongo

Mongo has two main features which make it possible to easily implement a durable, replicated message queue on top of it: very simple replication setup (we’ll be using a 3-node replica set), and various document-level atomic operations, like find-and-modify. The implementation is just a handful of lines of code; take a look at MongoMq.

We are also able to control the guarantees which send gives us by using an appropriate write concern when writing new messages:

  • WriteConcern.ACKNOWLEDGED (previously SAFE) ensures that once a send completes, the messages have been written to disk (though it’s not a 100% durability guarantee, as the disk may have its own write caches)
  • WriteConcern.REPLICA_ACKNOWLEDGED ensures that a message is written to the majority of the nodes in the cluster

The main downsides of the Mongo-based queue are that:

  • messages can’t be received in bulk – the find-and-modify operation only works on a single document at a time
  • when there’s a lot of connections trying to receive messages, the collection will encounter a lot of contention, and all operations are serialised.
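The single-document claim that find-and-modify performs can be sketched like this. This is an in-memory stand-in: a plain lock plays the role of Mongo’s per-document atomicity, and the field names are made up, not those used by MongoMq.

```python
import threading

# In-memory stand-in for a Mongo collection (field names are invented).
collection = [{"_id": i, "payload": f"msg-{i}", "claimed": False}
              for i in range(3)]
_atomic = threading.Lock()  # stands in for Mongo's per-document atomicity

def receive_one():
    # The find-and-modify pattern: atomically find one unclaimed message
    # and mark it claimed. It operates on a single document per call,
    # which is why the Mongo-based queue can't receive in bulk.
    with _atomic:
        for doc in collection:
            if not doc["claimed"]:
                doc["claimed"] = True
                return doc
    return None
```

Since every receive funnels through one atomic single-document operation, many concurrent receivers end up contending on it, which matches the serialisation described above.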

And this shows in the results: sends are faster than receives. But overall, the performance is quite good!

A single-thread, single-node setup achieves 7 900 msgs/s sent and 1 900 msgs/s received. The maximum send throughput with multiple threads/nodes that I was able to achieve is about 10 500 msgs/s, while the maximum receive rate is 3 200 msgs/s, when using the “safe” write concern.

Threads  Nodes  Send msgs/s  Receive msgs/s
      1      1     7 968,60        1 914,05
      5      1     9 903,47        3 149,00
     25      1    10 903,00        3 266,83
      1      2     9 569,99        2 779,87
      5      2    10 078,65        3 112,55
     25      2     7 930,50        3 014,00

If we wait for the replica to acknowledge the writes (instead of just one node), the send throughput falls to 6 500 msgs/s, and the receive to about 2 900 msgs/s.

Threads  Nodes  Send msgs/s  Receive msgs/s
      1      1     1 489,21        1 483,69
      1      2     2 431,27        2 421,01
      5      2     6 333,10        2 913,90
     25      2     6 550,00        2 841,00

In my opinion, not bad for a very straightforward queue implementation on top of Mongo.


SQS

SQS is pretty well covered in my previous blog post, so here’s just a short recap.

SQS guarantees that if a send completes, the message is replicated to multiple nodes. It also provides at-least-once delivery guarantees.

We don’t really know how SQS is implemented, but it most probably spreads the load across many servers, so including it here is a bit of an unfair competition: the other systems use a single replicated cluster, while SQS can employ multiple replicated clusters and route/balance the messages between them. But since we have the results, let’s see how it compares.

A single thread on a single node achieves 430 msgs/s sent and the same number of msgs received.

These results are not impressive, but SQS scales nicely both when increasing the number of threads, and the number of nodes. On a single node, with 50 threads, we can send up to 14 500 msgs/s, and receive up to 4 200 msgs/s.

On an 8-node cluster, these numbers go up to 63 500 msgs/s sent, and 34 800 msgs/s received.



RabbitMQ

RabbitMQ is one of the leading open-source messaging systems. It is written in Erlang, implements AMQP and is a very popular choice when messaging is involved. It supports both message persistence and replication, with well documented behaviour in case of e.g. partitions.

We’ll be testing a 3-node Rabbit cluster. To be sure that sends complete successfully, we’ll be using publisher confirms, a Rabbit extension to AMQP. The confirmations are cluster-wide, so this gives us pretty strong guarantees: that messages will be both written to disk, and replicated to the cluster (see the docs).

Such strong guarantees are probably one of the reasons for poor performance. A single-thread, single-node setup gives us 310 msgs/s sent & received. This scales nicely as we add nodes, up to 1 600 msgs/s:


The RabbitMq implementation of the Mq interface is again pretty straightforward. We are using the mentioned publisher confirms, and setting the quality-of-service when receiving so that at most 10 messages are delivered unconfirmed.

Interestingly, increasing the number of threads on a node doesn’t impact the results. It may be because I’m incorrectly using the Rabbit API, or maybe it’s just the way Rabbit works. With 5 sending threads on a single node, the throughput increases just to 410 msgs/s.

Things improve if we send messages in batches of up to 100 or 1000, instead of 10. In both cases, we can get to 3 300 msgs/s sent & received, which seems to be the maximum that Rabbit can achieve. Results for batches of up to 100:

Threads  Nodes  Send msgs/s  Receive msgs/s
      1      1     1 829,63        1 811,14
      1      2     2 626,16        2 625,85
      1      4     3 158,46        3 124,92
      1      8     3 261,36        3 226,40

And for batches up to 1000:

Threads  Nodes  Send msgs/s  Receive msgs/s
      1      1     3 181,08        2 549,45
      1      2     3 307,10        3 278,29
      1      4     3 566,72        3 533,92
      1      8     3 406,72        3 377,68


HornetQ

HornetQ, written by JBoss and part of JBoss AS (it implements JMS), is a strong contender. For some time now it has supported over-the-network replication using live-backup pairs. I tried setting up a 3-node cluster, but it seems that data is replicated only to one node; hence we will be using a two-node cluster here.

This raises a question of how partitions are handled: if a node dies, that fact is detected automatically, but then we can end up with two live servers (unless we have more nodes in the cluster), which raises the question of what happens with the data on both primaries when the connection is re-established. Overall, the replication support and documentation is worse than for Mongo and Rabbit.

Although it is not clearly stated in the documentation (see send guarantees), replication is synchronous, hence when the transaction commits, we can be sure that messages are written to journals on both nodes. That is similar to Rabbit, and corresponds to Mongo’s replica-safe write concern.

The HornetMq implementation uses the core HornetQ API. For sends we use transactions; for receives we rely on the internal receive buffers and turn off blocking confirmations (making them asynchronous). Interestingly, we can only receive one message at a time before acknowledging it, otherwise we get exceptions on the server. But this doesn’t seem to impact performance.

Speaking of performance, it is very good! A single-node, single-thread setup achieves 1 100 msgs/s. With 25 threads, we are up to 12 800 msgs/s! And finally, with 25 threads and 4 nodes, we can achieve 17 000 msgs/s.

Threads  Nodes  Send msgs/s  Receive msgs/s
      1      1     1 108,38        1 106,68
      5      1     4 333,13        4 318,25
     25      1    12 791,83       12 802,42
      1      2     2 095,15        2 029,99
      5      2     7 855,75        7 759,40
     25      2    14 200,25       13 761,75
      1      4     3 768,28        3 627,02
      5      4    11 572,10       10 708,70
     25      4    17 402,50       16 160,50

One final note: when trying to send messages using 25 threads in batches of up to 1000, I once got into a situation where the backup considered the primary dead even though it was working, and another time the sending failed because the “address was blocked” (in other words, the queue was full and couldn’t fit in memory), even though the receivers were working all the time. Maybe that’s due to GC? Or just the very high load?


Kafka

Kafka takes a different approach to messaging. The server itself is a streaming publish-subscribe system. Each Kafka topic can have multiple partitions; by using more partitions, the consumers of the messages (and the throughput) may be scaled and concurrency of processing increased.

On top of publish-subscribe with partitions, a point-to-point messaging system is built, by putting a significant amount of logic into the consumers (in the other messaging systems we’ve looked at, it was the server that contained most of the message-consumed-by-one-consumer logic; here it’s the consumer).

Each consumer in a consumer group reads messages from a number of dedicated partitions; hence it doesn’t make sense to have more consumer threads than partitions. Messages aren’t acknowledged on the server (which is a very important design difference!); instead, the offsets of messages processed by consumers are written to Zookeeper, either automatically in the background or manually. This allows Kafka to achieve much better performance.

Such a design has a couple of consequences:

  • messages from each partition are processed in order. A custom partition-routing strategy can be defined
  • all consumers should consume messages at the same speed. Messages from a slow consumer won’t be “taken over” by a fast consumer
  • messages are acknowledged “up to” an offset, that is, messages can’t be selectively acknowledged.
  • no “advanced” messaging options are available, such as routing, delaying messages, re-delivering messages, etc.
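The offset-based model behind these points can be sketched as a minimal in-memory consumer. The class and method names below are mine, not Kafka’s API; it only illustrates the design.

```python
class PartitionConsumer:
    """Minimal model of Kafka's per-partition consumption: the broker
    keeps no per-message ack state; the consumer just advances an offset
    and periodically commits it (to Zookeeper, in the version benchmarked
    here). Names are illustrative, not Kafka's actual API."""

    def __init__(self, log):
        self.log = log          # the partition: an append-only message log
        self.position = 0       # next offset to read
        self.committed = 0      # last committed offset

    def poll(self, max_msgs):
        batch = self.log[self.position:self.position + max_msgs]
        self.position += len(batch)
        return batch

    def commit(self):
        # Acknowledges everything "up to" the current position;
        # individual messages cannot be selectively acknowledged.
        self.committed = self.position

    def restart(self):
        # After a crash, consumption resumes from the last committed
        # offset, re-delivering anything polled but not committed.
        self.position = self.committed
```

Replaying from the committed offset after a restart is what gives the at-least-once behaviour: anything consumed after the last commit is simply read again.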

You can read more about the design of the consumer in Kafka’s docs.

To achieve guaranteed sends and at-least-once delivery, I used the following configuration (see the KafkaMq class):

  • topic is created with a replication-factor of 3
  • for the sender, the request.required.acks option is set to 1 (a send request blocks until it is accepted by the partition leader – no guarantees on replication, though)
  • consumer offsets are committed manually every 10 seconds; during that time, message receiving is blocked (a read-write lock is used to ensure that). That way we can achieve at-least-once delivery (only committing when messages have been “observed”).
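The commit scheme from the last point might look roughly like this. It is only a sketch: a plain mutex stands in for the read-write lock used by the real KafkaMq class, and the Zookeeper write is simulated by copying a dict.

```python
import threading

class OffsetCommitter:
    """Sketch of periodic manual offset commits. In the real KafkaMq,
    receivers hold the read side of a read-write lock and the committer
    the write side; a single mutex here is a simplification."""

    def __init__(self, commit_interval=10.0):
        self.lock = threading.Lock()
        self.offsets = {}               # partition -> highest observed offset
        self.committed = {}
        self.commit_interval = commit_interval

    def on_message(self, partition, offset):
        # Receivers record the highest offset they have observed.
        with self.lock:
            self.offsets[partition] = max(self.offsets.get(partition, -1),
                                          offset)

    def commit(self):
        # Blocks receivers while the observed offsets are persisted
        # (in the real implementation: written to Zookeeper).
        with self.lock:
            self.committed = dict(self.offsets)

    def run_forever(self, stop: threading.Event):
        # Commit every commit_interval seconds until asked to stop.
        while not stop.wait(self.commit_interval):
            self.commit()
```

Committing only offsets that have already been observed means a crash between commits re-delivers at most the last interval’s worth of messages, never losing any.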

Now, to the results. Kafka’s performance is great. A single-node, single-thread setup achieves about 2 550 msgs/s, and the best result was 33 500 msgs/s, with 25 sending & receiving threads and 4 nodes.

Threads  Nodes  Send msgs/s  Receive msgs/s
      1      1     2 558,81        2 561,10
      5      1    11 804,39       11 354,90
     25      1    29 691,33       27 093,58
      1      2     5 879,21        5 847,03
      5      2    14 660,90       13 552,35
     25      2    29 373,50       27 822,50
      5      4    22 271,50       20 757,50
     25      4    33 587,00       31 891,00

Kafka has big scalability potential, by adding nodes and increasing the number of partitions; however, exactly how it scales is another topic and would have to be tested.

In summary

As always, which message queue you choose depends on specific project requirements. All of the above solutions have some good sides:

  • SQS is a service, so especially if you are using the AWS cloud, it’s an easy choice: good performance and no setup required
  • if you are using Mongo, it is easy to build a replicated message queue on top of it, without the need to create and maintain a separate messaging cluster
  • if you want to have high persistence guarantees, RabbitMQ ensures replication across the cluster and on disk on message send.
  • HornetQ has great performance with a very rich messaging interface and routing options
  • Kafka offers the best performance and scalability

When looking only at the throughput, Kafka is a clear winner (unless we include SQS with multiple nodes, but as mentioned, that would be unfair):


It is also interesting to see how sending more messages in a batch improves the throughput. As already mentioned, when increasing the batch size from 10 to 100, Rabbit gets a 2x speedup, HornetQ a 1.2x speedup, and Kafka a 2.5x speedup, achieving about 89 000 msgs/s!


There are of course many other aspects besides performance, which should be taken into account when choosing a message queue, such as administration overhead, partition tolerance, feature set regarding routing, etc.

Do you have any experiences with persistent, replicated queues? Or maybe you are using some other messaging solutions?

  • Gabriel Ciuloaica

    It would be interesting to also add Kafka to the table…

  • Definitely, my bad.

    Stay tuned, will be there in ~2 weeks (vacations next week ;) ). I updated the blog to mention Kafka.

  • Josh Kuhn

    You may also want to check out RethinkDB, the newest release adds change feeds: http://rethinkdb.com/docs/changefeeds/

  • Interesting, though I guess that would be kind of an event stream; for point-to-point messaging you would need consumers to somehow know which changes were consumed by which consumer

  • Otter3000

    Is it correct to conclude that Mongo does not scale up or out in this situation?

    Great read, btw! Seems very thorough work to me.

  • Mongo accepts writes only on a single node, so there’s certainly a limit on how many writes per second a node can accept.

    The scale-out strategy is sharding, which however wouldn’t work in case of a message queue (you’d want to receive messages from each shard).

    Instead, you can apply a simple scaling technique to all kinds of queues: set up multiple copies of the message queue cluster, and send/receive from random nodes (see also https://github.com/twitter/kestrel)

  • J-C Berthon

    An interesting “performance” measure to add on top of your research would be latency. How long does it take for a message (or batch of messages) to reach the other end?

  • True – maybe a topic for another blog post :)

  • Jacek Pospychala

    RE Rabbit scaling poorly on a single node: maybe that’s because it uses one thread per queue on a node. So splitting the traffic across more queues should give better results.

  • Tom Ellis

    Maybe give ActiveMQ a go while you’re playing with Kafka?

  • Gabriel Ciuloaica

    Thanks Adam. Kafka results are as expected.

  • John Lon

    If you perform the same test on a long fat network, this scheme will emphasise the cost of any handshaking going on. We found that we were able to achieve 20k msgs/s on a trivial single-threaded setup across the Atlantic, whereas we struggled to get 20 msgs/s between two US states when using JMS implementations.
    All due to handshaking, or the lack of it.
    Your graphs would look radically different if you increase the network latency; Kafka performs brilliantly in this scenario, we found.

  • Thanks for sharing, that’s valuable information! Which JMS implementations did you test? ActiveMQ (not covered here), HornetQ?

  • Yuancai Ye

    We have created a super performance persistent message queue at the site http://www.udaparts.com/document/articles/fastsocketpro.htm

    As described at the above site, our SocketPro (www.udaparts.com) persistent message queue is significantly faster than RabbitMQ especially for high volume of small messages.

    There are samples with source code available for you to study

  • Is it a replicated message queue? The article covers only such queues, otherwise in the non-replicated space there’s a lot of other contenders which would have to be taken into account.

  • Yuancai Ye

    Hi, Adam:

    Your work is great, as we can easily see how each persistent message queue implementation performs.

    We compared our SocketPro (www.udaparts.com) persistent message queue with RabbitMQ with one client thread and one server scenario for both same-machine and cross-machine cases. There is no replicated message queue involved at all as you can see at the site http://www.udaparts.com/document/articles/fastsocketpro.htm

    Our SocketPro supports requests batching and dequeue confirmation in batch. This particular feature makes our SocketPro very fast.

    Our SocketPro persistent message queue supports replication too as described at the site http://www.udaparts.com/document/Tutorial/replication.htm

  • Correct. Replica sets do NOT scale writes. Shards should be used instead.

  • Ankur Jain

    I see lot of places comparison of Kafka with Akka, is it possible to add Akka too?

  • Akka is not a messaging system, it’s rather a library using which you could build one. So until such a system is built, I don’t think these two are up for a comparison.

  • Clebert Suconic

    Hi, I’m the project lead for HornetQ, and one of the main developers at activemq6 now (there’s no figure of a project lead within apache.. a more democratic approach which I like more)

    >Also, as far as I understand the documentation (but I think it isn’t stated clearly anywhere), replication is asynchronous, meaning that even though we send messages in a transaction, once the transaction commits, we can be sure that messages are written only on the primary node’s journal. That is a weaker guarantee than in Rabbit, and corresponds to Mongo’s safe write concern.

    This is not true… we only answer clients when the data is replicated…
    We don’t need to make a sync at the backup but we guarantee a round trip.
    We will make syncs as soon as the backup is live. And that will always guarantee message survival in case on failures.

    > the operator must do that (turn the backup server into a live one).

    Where did you take that information? this makes your article a bit invalid.

    You should try this with activemq6 now. which is the hornetq donation where we updated it and fixed a lot of issues around performance.

    You also didn’t state what version you used. There were a lot of fixes.

  • Hey,

    As for sync/async replication, I was basing this mainly on “Guarantees of sends and commits” (http://docs.jboss.org/hornetq/2.4.0.Final/docs/user-manual/html_single/index.html#send-guarantees). It only mentions durably persisting to storage, nothing about replication, hence I assumed that it’s asynchronous (if it’s synchronous, even better, you should advertise that :) ). So if I have replication set up and I send a message, the call will only return when the data is persisted to storage on both nodes?

    As to the manual fail-over, I suppose I read it somewhere in the docs, but I can’t find it now (I did write the blog 9 months ago :) ). So the fail-over is always automatic?

    Great to hear that ActiveMQ and HornetQ are merging! (at least it seems so :) ). I definitely have to do an updated version some day as all of the systems involved had new releases, plus apart from ActiveMQ there’s also e.g. EventStore missing.

    (Btw. we actually met long time ago in the JBoss office in Austin :) )

  • Are there any official builds of ActiveMQ 6?

  • Clebert Suconic

    I thought I recognized your name.. good to know you’re doing great :)

    We do it asynchronously, but we keep guarantees. There’s an OperationContext class that controls responses asynchronously. The client will think it’s synchronous and will be waiting (blocking) until the whole process is done, but everything is asynchronous on the server.

    Besides I remember a version had a bug on the replication. this is probably very different now.

    as for the version we are currently in vote. It will probably be released the next days.

    This link is probably transient (I don’t think it will exist a few weeks from now, but this is the staged website):


  • Thanks for the info, I updated the post.

    I’ll get in touch once I get around doing the tests again to get a fresh build of ActiveMQ :) So there won’t be any more releases of HornetQ?

  • Clebert Suconic

    2.3.x will be kept alive for a long time.

    once 6.0 is out there won’t be any more need to release 2.4.x or 2.5.x

    2.3.x is part of EAP customers so it will be active for 10 years on bug fixes

  • Great post. I’m curious to see the effect of MongoDB’s wiredtiger engine.

  • Stay tuned, an update might be coming soon :)

  • sapenov

    Great article!
    Have you also considered to test nats?

  • Not until now, as I wasn’t aware of nats :) Can you share a link?
    By the way, that would probably go here: https://softwaremill.com/mqperf/

  • analytiq

    You could check out their library for pub-sub – http://rethinkdb.com/docs/publish-subscribe/javascript/

  • Thanks – I’ve added them for future revisions of the benchmark.

  • Aaron

    What is the best way to make sure you have all the messages in the proper order in your queue such as SQS or Kafka?

  • I don’t think there are any ordering guarantees in SQS. In Kafka, messages in each partition will be delivered in order. So if you have some requirements around that, implement a custom partitioner for your Kafka topic.
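Such a key-based partitioner boils down to a stable hash of the message key. The sketch below uses crc32 for determinism; Kafka’s actual default partitioner uses a murmur2-based hash, so this only illustrates the idea.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Stable key hash (crc32 here; Kafka's default is murmur2-based):
    # all messages with the same key land in the same partition and are
    # therefore consumed in order relative to each other.
    return zlib.crc32(key) % num_partitions
```

With this routing, per-key ordering holds as long as the number of partitions doesn’t change, since a resize remaps keys to different partitions.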

  • Rohit Yadav

    Which one should I go for, if I have to listen to a UDP listener with a 100 msgs/s load?

  • David Liu

    NATS streaming server supports this scalable and at-least-once delivery scenario. Check out https://github.com/nats-io/nats-streaming-server

  • Thanks! Does it replicate data across the cluster for fail-over?

  • WestInEast

    Would ZeroMQ (ZMQ) http://zguide.zeromq.org/page:all qualify to be tested as well?

  • Not really – despite the name, ZeroMQ is *not* a message broker in the sense you would expect. It’s more of a lower-level messaging layer, on top of which you can implement a message broker. As such, it has no persistence or replication built in.

  • Drissi Yazami

    Nice article! Is Redis better than Kafka, or does it depend on the use case? I am building a big data pipeline (logstash-kafka-sparkStreaming-es-kibana) and some mates are talking about Redis, the in-memory beast. Would it be a good thing to replace Kafka with Redis?

  • Definitely depends on the use-case :) Kafka’s quite the opposite of an in-memory store.

  • Jossarian

    What about benchmarking of ZeroMQ??

  • Sunil Singhal

    Nice article. Did a comparison recently on few messaging frameworks based on few parameters which can help in making a decision. Refer here https://yoursandmyideas.com/2017/11/08/messaging-frameworks-comparison-grid/