Comments:"Distributed Computing at Airbnb - Airbnb Engineering"
URL:http://nerds.airbnb.com/distributed-computing-at-airbnb
Airbnb is obsessed with creating a great experience for hosts and guests alike. One of the ways we work on this is by analyzing the data we collect. We have a vast volume of it from various sources: logging, third-party analytics, and our own internally generated data sets.
To analyze this data further, we’ve leveraged a number of open source tools, which may come as no surprise. Some of these names may be familiar to those who have worked on similar data infrastructure projects, but we’re also doing things a bit differently. We run three frameworks on a cluster managed by Mesos. The frameworks we rely on most heavily are Chronos, Hadoop, and Storm. I’ll talk briefly about each of these and how we use them.
Mesos
Mesos can be thought of as an operating system for a cluster: it is to a cluster what the Linux kernel is to an individual computer. It manages the cluster’s resources and lets you allocate them to frameworks. Mesos is novel for a number of reasons, but the most important is that it makes it possible to run multiple frameworks on the same cluster while ensuring that resources are utilized properly. You can allocate just as much computing power as you need, when you need it. No more, no less. Ideally your cluster will be between 95% and 99% utilized at all times. Mesos has other great features too, such as taking advantage of Linux containers to keep runaway processes from consuming more than their share of resources.
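To make the idea of sharing one cluster between several frameworks concrete, here is a minimal toy sketch in Python. It is not the Mesos API and does not reflect how Mesos actually makes allocation decisions; the `Cluster` class, its methods, and the CPU numbers are all made up for illustration. It only shows the bookkeeping behind "allocate just enough, then measure utilization".

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of a shared cluster (hypothetical, not the Mesos API)."""
    total_cpus: int
    allocated: dict = field(default_factory=dict)  # framework name -> CPUs held

    def free_cpus(self) -> int:
        return self.total_cpus - sum(self.allocated.values())

    def allocate(self, framework: str, cpus: int) -> bool:
        """Grant the request only if enough CPUs are currently free."""
        if cpus > self.free_cpus():
            return False
        self.allocated[framework] = self.allocated.get(framework, 0) + cpus
        return True

    def release(self, framework: str, cpus: int) -> None:
        """Return CPUs to the shared pool when a framework is done with them."""
        held = self.allocated.get(framework, 0)
        self.allocated[framework] = max(0, held - cpus)

    def utilization(self) -> float:
        return sum(self.allocated.values()) / self.total_cpus


# Three frameworks sharing one 100-CPU cluster (numbers are invented).
cluster = Cluster(total_cpus=100)
cluster.allocate("hadoop", 60)
cluster.allocate("storm", 30)
cluster.allocate("chronos", 6)
print(f"utilization: {cluster.utilization():.0%}")
```

In this toy model, a request that does not fit is simply refused, and released CPUs become available to any other framework, which is the property that lets one cluster stay busy instead of several dedicated clusters sitting partly idle.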
Chronos
Chronos has been talked about on this blog before. It was written by engineers at Airbnb to solve a problem that seems easy to solve, but actually isn’t. Chronos allows us to schedule tasks on a cluster, such as running a daily or weekly Pig job on the Hadoop cluster. Chronos runs on Mesos alongside our other frameworks.
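As a rough illustration of the simplest part of scheduling a recurring job, the Python sketch below computes when a daily or weekly job should next run. It is not Chronos code and says nothing about how Chronos is implemented; the `next_run` function and the example dates are hypothetical. It also leaves out everything a real cluster scheduler has to handle beyond timing, such as actually launching the task on the cluster and dealing with failures.

```python
from datetime import datetime, timedelta


def next_run(start: datetime, interval: timedelta, now: datetime) -> datetime:
    """Return the first scheduled run at or after `now`.

    Runs are assumed to happen at start, start + interval,
    start + 2 * interval, and so on.
    """
    if now <= start:
        return start
    # Whole intervals elapsed since the first run, plus any leftover time.
    periods, remainder = divmod(now - start, interval)
    if remainder:
        periods += 1  # partway through an interval: round up to the next run
    return start + periods * interval


# A daily job and a weekly job that both first ran on 2013-01-01 (made-up dates).
first_run = datetime(2013, 1, 1)
now = datetime(2013, 1, 3, 12, 0)
print(next_run(first_run, timedelta(days=1), now))   # 2013-01-04 00:00:00
print(next_run(first_run, timedelta(weeks=1), now))  # 2013-01-08 00:00:00
```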
Hadoop
Another framework we run on Mesos is Hadoop. Hadoop has been around for a long time and will be familiar to most people. The newest release of Hadoop also includes a new resource manager, called YARN, which is in some ways similar to Mesos. YARN is specific to Hadoop and only works for MapReduce jobs at the moment, so we have decided to stick with MapReduce v1 on Mesos instead. The future of YARN remains unclear, but it will likely mature into a valuable framework.
Storm
Lastly, we use the Storm framework for running our real-time distributed computing tasks, such as doing real-time analytics or running jobs that require significant computing resources. Storm is stable, mature, and full-featured. You can run Storm on the same nodes that your MapReduce jobs and Chronos tasks run on, without them interfering with each other.
At the moment the barrier to entry for these projects is fairly high. None of them are easy to deploy, especially in conjunction with Mesos. We have been working with the developers of these projects to make them easier to use and deploy by contributing patches and submitting bug reports. We hope to help others take advantage of these tools, and we want to give back to the community that has given so much to us.
We’re going to post a follow-up in a few weeks with more details on how we’ve deployed these projects. Stay tuned for more.