Akka vs Storm

I was recently working a bit with Twitter’s Storm, and it got me wondering, how does it compare to another high-performance, concurrent-data-processing framework, Akka.

What’s Akka and Storm?

Let’s start with a short description of both systems. Storm is a distributed, real-time computation system. On a Storm cluster, you execute topologies, which process streams of tuples (data). Each topology is a graph consisting of spouts (which produce tuples) and bolts (which transform tuples). Storm takes care of cluster communication, fail-over and distributing topologies across cluster nodes.

Akka is a toolkit for building distributed, concurrent, fault-tolerant applications. In an Akka application, the basic construct is an actor; actors process messages asynchronously, and each actor instance is guaranteed to be run using at most one thread at a time, making concurrency much easier. Actors can also be deployed remotely. There’s a clustering module coming, which will handle automatic fail-over and distribution of actors across cluster nodes.

Both systems scale very well and can handle large amounts of data. But when to use one, and when to use the other?

There’s another good blog post on the subject, but I wanted to take the comparison a bit further: let’s see how elementary constructs in Storm compare to elementary constructs in Akka.

Comparing the basics

Firstly, the basic unit of data in Storm is a tuple. A tuple can have any number of elements, and each tuple element can be any object, as long as there’s a serializer for it. In Akka, the basic unit is a message, which can be any object, but it should be serializable as well (for sending it to remote actors). So here the concepts are almost equivalent.

Let’s take a look at the basic unit of computation. In Storm, we have components: bolts and sprouts. A bolt can be any piece of code, which does arbitrary processing on the incoming tuples. It can also store some mutable data, e.g. to accumulate results. Moreover, bolts run in a single thread, so unless you start additional threads in your bolts, you don’t have to worry about concurrent access to the bolt’s data. This is very similar to an actor, isn’t it? Hence a Storm bolt/sprout corresponds to an Akka actor. How do these two compare in detail?

Actors can receive arbitrary messages; bolts can receive arbitrary tuples. Both are expected to do some processing basing on the data received.

Both have internal state, which is private and protected from concurrent thread access.

Actors & bolts: differences

One crucial difference is how actors and bolts communicate. An actor can send a message to any other actor, as long as it has the ActorRef (and if not, an actor can be looked up by-name). It can also send back a reply to the sender of the message that is being handled. Storm, on the other hand is one-way. You cannot send back messages; you also can’t send messages to arbitrary bolts. You can also send a tuple to a named channel (stream), which will cause the tuple (message) to be broadcast to all listeners, defined in the topology. (Bolts also ack messages, which is also a form of communication, to the ackers.)

In Storm, multiple copies of a bolt’s/sprout’s code can be run in parallel (depending on the parallelism setting). So this corresponds to a set of (potentially remote) actors, with a load-balancer actor in front of them; a concept well-known from Akka’s routing. There are a couple of choices on how tuples are routed to bolt instances in Storm (random, consistent hashing on a field), and this roughly corresponds to the various router options in Akka (round robin, consistent hashing on the message).

There’s also a difference in the “weight” of a bolt and an actor. In Akka, it is normal to have lots of actors (up to millions). In Storm, the expected number of bolts is significantly smaller; this isn’t in any case a downside of Storm, but rather a design decision. Also, Akka actors typically share threads, while each bolt instance tends to have a dedicated thread.

Other features

Storm also has one crucial feature which isn’t implemented in Akka out-of-the-box: guaranteed message delivery. Storm tracks the whole tree of tuples that originate from any tuple produced by a sprout. If all tuples aren’t acknowledged, the tuple will be replayed.

Also the cluster management of Storm is more advanced (automatic fail-over, automatic balancing of workers across the cluster; based on Zookeeper); however the upcoming Akka clustering module should address that.

Finally, the layout of the communication in Storm – the topology – is static and defined upfront. In Akka, the communication patterns can change over time and can be totally dynamic; actors can send messages to any other actors, or can even send addresses (ActorRefs).

So overall, Storm implements a specific range of usages very well, while Akka is more of a general-purpose toolkit. It would be possible to build a Storm-like system on top of Akka, but not the other way round (at least it would be very hard).

Adam

  • Michał Matłoka

    It is worth to say a few words from where those frameworks come. Generally Akka has more concrete ‘theoretical’ background – it implements Actor concurrency model which originates 1973 (!) and was used already e.g. by Erlang for creation of fault tolerant telecommunications systems. On the other side Storm as far as I remember was created for realtime processing of tweets, and it is considered as realtime alternative to Hadoop. Its also interesting that Storm is implemented in clojure what can be quite surprising ;) So like you’ve written, Akka is more general (actor concurrency model), comparing to Storm (realtime computation system).

  • zygm0nt

    Nicee write up, but lacks a lot of storm background. I wouldn’t compare it that easily to Akka – mainly because Storm offers distributed abstraction out of the box. Just look at Trident topologies and similar stuff.

  • http://www.warski.org/ Adam Warski

    True, though the blog post was never meant as a comprehensive description of either Storm or Akka (I’m sure there would be a couple of nice abstractions on top of Akka that would mean mentioning, similarly to Trident). My main goal was to compare the basic constructs in each framework and see if they are similar.

  • Pingback: Links & reads for 2013 Week 26 | Martin's Weekly Curations

  • Davi

    And I got the point but perhaps some examples on what you can achieve with one and not with the other.

    And who is best suited from your point-of-view to a specific task (in which one of them excels or could excel the other).

  • http://www.warski.org/ Adam Warski

    That could be a very long post, but essentially as I write at the end of the post, Akka is a general purpose toolkit with which you can *also* implement data stream processing. Storm implements this single set of use cases well, and that’s probably where it excels.

  • Tomer Ben David
  • P Ghosh

    Comparison by an akka enthusiast