Akka vs Storm

I was recently working a bit with Twitter’s Storm, and it got me wondering, how does it compare to another high-performance, concurrent-data-processing framework, Akka.

What’s Akka and Storm?

Let’s start with a short description of both systems. Storm is a distributed, real-time computation system. On a Storm cluster, you execute topologies, which process streams of tuples (data). Each topology is a graph consisting of spouts (which produce tuples) and bolts (which transform tuples). Storm takes care of cluster communication, fail-over and distributing topologies across cluster nodes.

Akka is a toolkit for building distributed, concurrent, fault-tolerant applications. In an Akka application, the basic construct is an actor; actors process messages asynchronously, and each actor instance is guaranteed to be run using at most one thread at a time, making concurrency much easier. Actors can also be deployed remotely. There’s a clustering module coming, which will handle automatic fail-over and distribution of actors across cluster nodes.

Both systems scale very well and can handle large amounts of data. But when to use one, and when to use the other?

There’s another good blog post on the subject, but I wanted to take the comparison a bit further: let’s see how elementary constructs in Storm compare to elementary constructs in Akka.

Comparing the basics

Firstly, the basic unit of data in Storm is a tuple. A tuple can have any number of elements, and each tuple element can be any object, as long as there’s a serializer for it. In Akka, the basic unit is a message, which can be any object, but it should be serializable as well (for sending it to remote actors). So here the concepts are almost equivalent.

Let’s take a look at the basic unit of computation. In Storm, we have components: bolts and sprouts. A bolt can be any piece of code, which does arbitrary processing on the incoming tuples. It can also store some mutable data, e.g. to accumulate results. Moreover, bolts run in a single thread, so unless you start additional threads in your bolts, you don’t have to worry about concurrent access to the bolt’s data. This is very similar to an actor, isn’t it? Hence a Storm bolt/sprout corresponds to an Akka actor. How do these two compare in detail?

Actors can receive arbitrary messages; bolts can receive arbitrary tuples. Both are expected to do some processing basing on the data received.

Both have internal state, which is private and protected from concurrent thread access.

Actors & bolts: differences

One crucial difference is how actors and bolts communicate. An actor can send a message to any other actor, as long as it has the ActorRef (and if not, an actor can be looked up by-name). It can also send back a reply to the sender of the message that is being handled. Storm, on the other hand is one-way. You cannot send back messages; you also can’t send messages to arbitrary bolts. You can also send a tuple to a named channel (stream), which will cause the tuple (message) to be broadcast to all listeners, defined in the topology. (Bolts also ack messages, which is also a form of communication, to the ackers.)

In Storm, multiple copies of a bolt’s/sprout’s code can be run in parallel (depending on the parallelism setting). So this corresponds to a set of (potentially remote) actors, with a load-balancer actor in front of them; a concept well-known from Akka’s routing. There are a couple of choices on how tuples are routed to bolt instances in Storm (random, consistent hashing on a field), and this roughly corresponds to the various router options in Akka (round robin, consistent hashing on the message).

There’s also a difference in the “weight” of a bolt and an actor. In Akka, it is normal to have lots of actors (up to millions). In Storm, the expected number of bolts is significantly smaller; this isn’t in any case a downside of Storm, but rather a design decision. Also, Akka actors typically share threads, while each bolt instance tends to have a dedicated thread.

Other features

Storm also has one crucial feature which isn’t implemented in Akka out-of-the-box: guaranteed message delivery. Storm tracks the whole tree of tuples that originate from any tuple produced by a sprout. If all tuples aren’t acknowledged, the tuple will be replayed.

Also the cluster management of Storm is more advanced (automatic fail-over, automatic balancing of workers across the cluster; based on Zookeeper); however the upcoming Akka clustering module should address that.

Finally, the layout of the communication in Storm – the topology – is static and defined upfront. In Akka, the communication patterns can change over time and can be totally dynamic; actors can send messages to any other actors, or can even send addresses (ActorRefs).

So overall, Storm implements a specific range of usages very well, while Akka is more of a general-purpose toolkit. It would be possible to build a Storm-like system on top of Akka, but not the other way round (at least it would be very hard).


  • Michał Matłoka

    It is worth to say a few words from where those frameworks come. Generally Akka has more concrete ‘theoretical’ background – it implements Actor concurrency model which originates 1973 (!) and was used already e.g. by Erlang for creation of fault tolerant telecommunications systems. On the other side Storm as far as I remember was created for realtime processing of tweets, and it is considered as realtime alternative to Hadoop. Its also interesting that Storm is implemented in clojure what can be quite surprising ;) So like you’ve written, Akka is more general (actor concurrency model), comparing to Storm (realtime computation system).

  • zygm0nt

    Nicee write up, but lacks a lot of storm background. I wouldn’t compare it that easily to Akka – mainly because Storm offers distributed abstraction out of the box. Just look at Trident topologies and similar stuff.

  • True, though the blog post was never meant as a comprehensive description of either Storm or Akka (I’m sure there would be a couple of nice abstractions on top of Akka that would mean mentioning, similarly to Trident). My main goal was to compare the basic constructs in each framework and see if they are similar.

  • Pingback: Links & reads for 2013 Week 26 | Martin's Weekly Curations()

  • Davi

    And I got the point but perhaps some examples on what you can achieve with one and not with the other.

    And who is best suited from your point-of-view to a specific task (in which one of them excels or could excel the other).

  • That could be a very long post, but essentially as I write at the end of the post, Akka is a general purpose toolkit with which you can *also* implement data stream processing. Storm implements this single set of use cases well, and that’s probably where it excels.

  • Tomer Ben David
  • P Ghosh

    Comparison by an akka enthusiast

  • A. Steven Anderson

    Wouldn’t Apache Spark vs. Storm be a better comparison, especially since Spark is built using Akka?

  • Certainly, a very good topic for another blog post! Spark got quite well developed since this post was written, and there are certainly overlapping use-cases.

  • Robert

    I’m a committer at the Apache Flink (incubating) project. We recently got an external contribution of a streaming layer which has been included into the latest 0.7 release. See the docs: http://flink.incubator.apache.org/docs/0.7-incubating/streaming_guide.html
    Flink Streaming has a very nice API and uses a real streaming engine in the backend (instead of micro batches).

  • So the new version of this blog should be Storm vs Spark vs Flink :)

  • Robert

    Would love to see that.

  • dave

    Spark has nothing to do with the use cases of Akka and Storm. Or very little. It operates over big data in bulk, not over streams.

  • A. Steven Anderson

    Really? So, then what’s the purpose of https://spark.apache.org/streaming ?

  • dave

    Thanks for correcting my mistake. There’s already good comparisons between them at slideshare. I’d begin any blog post with a link to http://en.wikipedia.org/wiki/Stream_processing though, because stream processing is only useful under very careful assumptions. Being cool isn’t reason to use it. The nature of the compute tasks must match its assumptions.. otherwise you can and should just write regular distributed code, rather than force its semantics upon your problem domain.

  • A. Steven Anderson

    I’m not sure where the *being cool* reason comes from, but having a single solution that addresses both batch and stream processing sounds like a good reason and possibly a key differentiator. Obviously, *rolling your own* distributed solution is always an option, but writing less custom code is always the best option as far as I’m concerned.

  • A. Steven Anderson

    Also, considering that Spark is built upon Akka, one would hope there is some applicability. ;-)

  • andy sheldon

    One con for Storm — it is very hard to consistently type “spout” with the correct spelling. It want to come out “sprout” :)

  • Mani Megala

    Great article.. this comparison is very clear and helpful to learn more useful information about apache akka and storm..

    big data training institute in velachery

  • @adamw1pl:disqus Great article. See I have project where I scrape many e-commerce sites to track price and inventory changes and update prices and inventory changes into database.

    Due to application running into single node. Its slow to scrape , I am also using thread and async model but still its limitation due to single node and some time its throwing db concurrency issue. So, I am planning to move scraping and db update module into Akka or strom.

    Can you suggest that which one best for such type case ?

  • That’s a very general question, and without knowing more about the internals of the system will be difficult to answer. If you just want to parallelise execution of existing code, my first bet would be a higher-level framework like Storm (or many others), but again, this depends.

  • Nice Article. Thank you so much for explanation of the difference between Akka and Apache Storm. This post is much helpful to us. I am really enjoying reading your well written articles. I am waiting for your more posts like this or related to any other informative topic. Storm is a distributed, reliable, fault-tolerant system for processing streams of data. The work is delegated to different types of components that are each responsible for a simple specific processing task.