IBiS' Journey Towards Quorum Queues

Last year at IT-Clouds, a group responsible for the development of Swisscom’s cloud products, we enabled RabbitMQ’s quorum queue [1] support in our internal PaaS solution. We then migrated one of our software systems to take advantage of them in production. This post describes our system, our motivation, and the steps we followed to migrate a number of Java/Spring-based applications entirely in application code.

About IBiS

IBiS stands for 'Integration of Business Services'. It is a key backend system of Swisscom's cloud offerings (Enterprise Service Cloud, Dynamic Computing Services), powering the billing and reporting capabilities of our products. The system:

  • tracks the lifecycle of resources running in the cloud (virtual machines, Kubernetes clusters etc.).
  • meters and rates resource usage (CPU, storage, memory etc.).
  • keeps third-party inventory and business systems in sync (configuration management systems, billing systems, etc.).
  • provides various reporting capabilities (such as billing reports or licensing reports for the operating systems in use) for the customer portal and downstream business systems.

To achieve its goals, IBiS listens for all sorts of events coming from cloud services and acts on them accordingly. It consists of a number of components, architected in a simple, event-driven manner:

IBiS is developed at Swisscom, written in modern Java, and powered by recent releases of the Spring Framework.

RabbitMQ at IBiS

Reliable messaging requirements

At the heart of IBiS lies RabbitMQ [2], an open-source message broker supporting the AMQP messaging protocol [3]. We use RabbitMQ as a service offered within our internal Application Cloud (iAPC), a PaaS solution based on Cloud Foundry [4]. It is responsible for the reliable transport of events within the system, from the components that ingest them (e.g. from Apache Kafka [5]) to the components that process them by applying the relevant business logic. One of the most important business requirements of IBiS is to accurately track what is happening in the Swisscom clouds. To achieve that, we have to ensure that events are delivered reliably. Losing an event can easily result in billing or reporting errors (e.g. a VM is deleted in the cloud, but IBiS never learns about it because the event was lost).

Originally, IBiS relied on durable mirrored queues to achieve the desired reliability. While this was a fine solution for a long time, recent RabbitMQ versions (3.8.0+) ship with quorum queue support. Quorum queues are considered the modern alternative to mirrored queues: they focus on data safety and were designed specifically for systems like IBiS, where reliability is key. Since our workload is exactly what quorum queues are built for, we decided to migrate away from mirrored queues. The rest of this article describes how we did it.

Our setup

Before we talk about the migration itself, let us briefly describe the RabbitMQ setup that IBiS uses. Our internal messaging architecture relies on several topic exchanges [6] to which messages (events) are sent. Each message carries a routing key which, in our case, is the event type. The exchange routes messages to a set of queues [7]. The queues are created (declared) by our backend components, with each queue belonging to exactly one component and bound to a subset of routing keys. This essentially forms a publish/subscribe system in which each backend component subscribes to certain routing keys (event types) and receives only the messages it is interested in. The architecture of one such exchange is depicted in the figure below:

When handling messages, we also have to deal with failures (e.g. when an event is malformed). For this purpose, we rely on dead letter queues. When a message cannot be processed by a backend component, it is sent to a dead letter exchange [8], which routes it to a dead letter queue owned by the component in which the problem was detected. The message stays there until we understand the problem and process it manually.
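The dead-lettering flow can be illustrated with a small in-memory model. The sketch below is plain Java, not RabbitMQ code, and the class name and event strings are hypothetical: a handler failure plays the role of a rejection without requeue, after which the broker would republish the message through the dead letter exchange into the component's dead letter queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/** In-memory model of one component's work queue and its dead letter queue (illustration only). */
final class ComponentQueues {
    private final Deque<String> work = new ArrayDeque<>();
    private final Deque<String> deadLetters = new ArrayDeque<>();

    /** The topic exchange delivers a matching event to this component's work queue. */
    void deliver(String event) {
        work.addLast(event);
    }

    /**
     * Takes the next event and hands it to the handler. If the handler throws, the event is treated
     * as rejected without requeue and is moved to the dead letter queue instead of being dropped.
     * Returns false when the work queue is empty.
     */
    boolean consumeNext(Consumer<String> handler) {
        String event = work.pollFirst();
        if (event == null) {
            return false;
        }
        try {
            handler.accept(event);
        } catch (RuntimeException e) {
            deadLetters.addLast(event); // the broker would republish it via the dead letter exchange
        }
        return true;
    }

    /** Manual follow-up: take the oldest dead-lettered event for inspection and re-processing. */
    String takeDeadLetter() {
        return deadLetters.pollFirst();
    }

    int workDepth() { return work.size(); }

    int deadLetterDepth() { return deadLetters.size(); }
}
```

The key property is that a failing event changes queues but never disappears, which is what allows the manual follow-up described above.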

In summary, our architecture looks as follows:

  • We use a topic exchange to which we send new messages.
  • We use a dead letter exchange through which erroneous messages are handled.
  • Each backend component declares two queues: one to process new messages coming from the topic exchange and one to re-process faulty messages coming from the dead letter exchange.
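A topic exchange decides which component queues receive a message by matching its routing key against each binding pattern: keys are dot-separated words, `*` matches exactly one word, and `#` matches zero or more words. The following JDK-only sketch illustrates these AMQP topic rules; it is not RabbitMQ's implementation, and the event types and component queue names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of topic-exchange routing: which queues receive a message with a given routing key. */
final class TopicRouting {
    private TopicRouting() {}

    /** Returns true if routingKey matches the binding pattern under AMQP topic rules. */
    static boolean matches(String pattern, String routingKey) {
        return match(pattern.split("\\."), 0, routingKey.split("\\."), 0);
    }

    private static boolean match(String[] p, int i, String[] k, int j) {
        if (i == p.length) {
            return j == k.length; // pattern exhausted: the key must be exhausted as well
        }
        if (p[i].equals("#")) {
            // '#' may swallow zero or more words: try every possible split point
            for (int next = j; next <= k.length; next++) {
                if (match(p, i + 1, k, next)) {
                    return true;
                }
            }
            return false;
        }
        if (j == k.length) {
            return false; // key exhausted but the pattern still expects a word
        }
        // '*' matches exactly one word; any other word must match literally
        return (p[i].equals("*") || p[i].equals(k[j])) && match(p, i + 1, k, j + 1);
    }

    /** bindings: queue name -> binding patterns. Returns every queue with at least one matching pattern. */
    static Set<String> route(Map<String, List<String>> bindings, String routingKey) {
        Set<String> targets = new LinkedHashSet<>();
        for (Map.Entry<String, List<String>> entry : bindings.entrySet()) {
            for (String pattern : entry.getValue()) {
                if (matches(pattern, routingKey)) {
                    targets.add(entry.getKey());
                    break;
                }
            }
        }
        return targets;
    }
}
```

For example, with a hypothetical `billing` queue bound to `vm.#` and an `inventory` queue bound to `*.deleted`, the key `vm.deleted` is routed to both, while `vm.created` only reaches `billing`.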

Migration to quorum queues with Spring AMQP

Our research into migrating from mirrored queues to quorum queues quickly revealed that RabbitMQ could not do it for us on its own. A RabbitMQ queue cannot simply change its type: the type is fixed when the queue is declared. To switch to quorum queues, we therefore have to declare new queues, move the messages over from the old queues, and remove the old queues. One option would be the Shovel Plugin [9]. However, this would require the RabbitMQ team at Swisscom to take action (plan maintenance, install the plugin, test it, etc.), with no guarantee that the approach would work correctly for our use case. We therefore took another route and handled everything as code, with Spring AMQP.

Spring AMQP

Spring AMQP [10] is a Spring project that supports the development of AMQP-based solutions. It provides handy abstractions that are simple to use, integrates well with the Spring Framework, and lets developers leverage the full potential of AMQP. While discussing Spring AMQP in depth is beyond the scope of this article, it is important to note that it provides classes to declare and manage various RabbitMQ objects. For example, we can declare a new topic exchange, routing rules (bindings), and a new queue as simply as:
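As a framework-independent illustration of what such a queue declaration contains, the helper below assembles the two queue arguments that matter for this article: `x-queue-type` set to `quorum` selects the queue type at declaration time, and `x-dead-letter-exchange` names the exchange that receives rejected messages. Both are standard RabbitMQ queue arguments; in Spring AMQP such a map can be passed to the `Queue(name, durable, exclusive, autoDelete, arguments)` constructor, while `QueueBuilder.durable(name).quorum()` sets the queue type directly. The helper class and the `ibis.dlx` exchange name are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that assembles RabbitMQ queue declaration arguments. */
final class QueueDeclarations {
    private QueueDeclarations() {}

    /**
     * Arguments for a quorum queue whose rejected messages go to the given dead letter exchange.
     * RabbitMQ only accepts quorum queues that are durable and non-exclusive, so the flags are checked here.
     */
    static Map<String, Object> quorumQueueArgs(boolean durable, boolean exclusive, String deadLetterExchange) {
        if (!durable || exclusive) {
            throw new IllegalArgumentException("quorum queues must be durable and non-exclusive");
        }
        Map<String, Object> args = new HashMap<>();
        args.put("x-queue-type", "quorum");                     // type is fixed once the queue is declared
        args.put("x-dead-letter-exchange", deadLetterExchange); // where rejected messages are republished
        return args;
    }
}
```

Routing rules are declared separately, e.g. with Spring AMQP's `BindingBuilder.bind(queue).to(topicExchange).with(routingKey)`.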

Once we declare our setup (which is then automatically provisioned at application startup, in an idempotent way), we can access RabbitMQ using the RabbitTemplate class. Let us try to put these pieces together to migrate mirrored queues to quorum queues.

Transparent migration

Let us first list what needs to happen in order to migrate transparently and safely from mirrored queues to quorum queues. We need to:

  1. create new queues of quorum type.
  2. stop routing to the existing mirrored queues.
  3. process the messages left in the existing mirrored queues (new messages may still arrive while routing is being stopped).
  4. remove the mirrored queues, leaving only quorum queues in place.

By leveraging our messaging architecture and Spring AMQP, we can translate these steps into the ones below. For each backend component we want to:

  1. declare a new quorum queue, provisioned at application startup, and remove the previous queue from the configuration. This creates the new queue that the component listens on and stops it from listening on the old one.
  2. unbind the old queue from the topic exchange, i.e., remove the routing rules causing the topic exchange to still forward messages to the old queue.
  3. drain all messages from the mirrored queue by rejecting them (i.e., signalling failure), thereby moving them to the dead letter queue.
  4. consume the messages from the dead letter queue.
  5. remove the mirrored queue.
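Steps 2 to 5 can be expressed as a small orchestration method; step 1 is purely declarative and happens at startup. In the sketch below, `MigrationSteps` is a hypothetical interface, not part of Spring AMQP: each method stands for one of the operations described in the rest of this section, and the early return on a missing old queue is what makes repeated runs harmless.

```java
/** Hypothetical decomposition of the migration; step 1 (declaring the quorum queue) is configuration. */
interface MigrationSteps {
    boolean oldQueueExists();
    void unbindOldQueue();                // step 2: stop routing to the mirrored queue
    void awaitInFlightMessages();         // let messages already on their way settle
    int rejectAllToDeadLetter();          // step 3: drain by rejecting without requeue
    void consumeDeadLetters(int atLeast); // step 4: reuse the existing dead letter handling
    void deleteOldQueue();                // step 5
}

final class QueueMigration {
    private QueueMigration() {}

    /** Runs the steps in order. Returns false, doing nothing, if there is nothing left to migrate. */
    static boolean migrate(MigrationSteps steps) {
        if (!steps.oldQueueExists()) {
            return false; // already migrated: repeated runs become no-ops
        }
        steps.unbindOldQueue();
        steps.awaitInFlightMessages();
        int drained = steps.rejectAllToDeadLetter();
        steps.consumeDeadLetters(drained);
        steps.deleteOldQueue();
        return true;
    }
}
```

Keeping the order in one place makes it easy to test with a fake that records each call, without a running broker.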

The steps described above are depicted in the sequence diagram below:

One might wonder why we drained the old queues by sending the messages to dead letter queues instead of consuming them directly. We chose this approach for maintainability reasons. Since we heavily rely on dead letter queues to handle erroneous messages, we already have robust, battle-tested code in place that we can reuse. Consuming from the mirrored queues directly would be possible, but carries the risk of missing some edge cases and hence losing messages.

We can now go through the code that we used to migrate our queues.

First, let us declare the names of the queues. We assume that the new quorum queue was declared under the name `my-quorum-queue` and that the previous classic mirrored queue `my-classic-mirrored-queue` is still in place:
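A minimal sketch of these names as constants. The `requireDistinct` check is an added safeguard, not part of the original code: because a queue's type cannot be changed, redeclaring the existing queue name with `x-queue-type` set to `quorum` would be refused by RabbitMQ with a precondition error, so the quorum queue needs a name of its own. The class name `MigrationQueues` is hypothetical.

```java
/** Names used in this walkthrough: the new quorum queue and the classic mirrored queue it replaces. */
final class MigrationQueues {
    static final String NEW_QUEUE = "my-quorum-queue";
    static final String OLD_QUEUE = "my-classic-mirrored-queue";

    private MigrationQueues() {}

    /** A queue's type is immutable, so the quorum queue must not reuse the old queue's name. */
    static void requireDistinct(String newQueue, String oldQueue) {
        if (newQueue.equals(oldQueue)) {
            throw new IllegalArgumentException("the quorum queue must be declared under a new name");
        }
    }
}
```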

Then, we can use the autowired RabbitTemplate object (see Spring AMQP docs for more details) to obtain an administration client:

We first check whether the classic queue exists. If it does not, there is nothing to do.
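Assuming the administration client is Spring AMQP's `RabbitAdmin` (which can be created from a `RabbitTemplate`), the existence check can rely on `getQueueProperties`, which returns `null` for a queue that does not exist. To keep the sketch runnable without a broker, the lookup is passed in as a `Function`; in a Spring application it could be `rabbitAdmin::getQueueProperties`. The class name `MigrationGuard` is hypothetical.

```java
import java.util.Properties;
import java.util.function.Function;

final class MigrationGuard {
    private MigrationGuard() {}

    /**
     * True when the old classic queue still exists. queueProperties is expected to behave like
     * RabbitAdmin.getQueueProperties: it returns null when the queue is missing.
     */
    static boolean oldQueueExists(Function<String, Properties> queueProperties, String oldQueue) {
        return queueProperties.apply(oldQueue) != null;
    }
}
```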

Next, we leverage the bindings declared in our application’s configuration. If they are created similarly to the example in the Spring AMQP section above, we can autowire a list of `Binding` beans and use it to remove the bindings from the old queue. The key assumption here is that the bindings are the same, i.e., the new queue has the same set of bindings as the old one.
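The retargeting idea can be sketched as follows: every binding declared for the new queue is copied with the old queue as its destination, and each copy is then removed. `BindingSpec` is a hypothetical, simplified stand-in for Spring AMQP's `Binding`; in a Spring application one would instead build `new Binding(oldQueue, Binding.DestinationType.QUEUE, b.getExchange(), b.getRoutingKey(), b.getArguments())` and pass it to `RabbitAdmin.removeBinding`. The exchange name `events` used in the usage example is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Consumer;

/** Hypothetical, simplified stand-in for Spring AMQP's Binding (queue destinations only). */
final class BindingSpec {
    final String queue;
    final String exchange;
    final String routingKey;

    BindingSpec(String queue, String exchange, String routingKey) {
        this.queue = queue;
        this.exchange = exchange;
        this.routingKey = routingKey;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof BindingSpec)) return false;
        BindingSpec b = (BindingSpec) o;
        return queue.equals(b.queue) && exchange.equals(b.exchange) && routingKey.equals(b.routingKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(queue, exchange, routingKey);
    }
}

final class Unbinder {
    private Unbinder() {}

    /** Derives the old queue's bindings from those declared for the new queue (assumed identical). */
    static List<BindingSpec> oldQueueBindings(List<BindingSpec> declared, String newQueue, String oldQueue) {
        List<BindingSpec> result = new ArrayList<>();
        for (BindingSpec b : declared) {
            if (b.queue.equals(newQueue)) {
                result.add(new BindingSpec(oldQueue, b.exchange, b.routingKey));
            }
        }
        return result;
    }

    /** Removes each binding; in Spring AMQP the equivalent call is RabbitAdmin.removeBinding(Binding). */
    static void unbind(List<BindingSpec> bindings, Consumer<BindingSpec> remover) {
        bindings.forEach(remover);
    }
}
```

For example, a declared binding `my-quorum-queue` ← `events` / `vm.#` yields a binding to remove for `my-classic-mirrored-queue` with the same exchange and routing key, while bindings of other queues are left alone.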

Removing the bindings may take some time to take effect, and some messages might still arrive in the meantime. Therefore, we wait briefly.
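A fixed pause is the simplest option. A hedged alternative, sketched below, polls the old queue's message count until it has stopped changing for a quiet period or a timeout expires. The class name and all durations are hypothetical; with Spring AMQP the count could come from the `RabbitAdmin.QUEUE_MESSAGE_COUNT` entry returned by `getQueueProperties`. A stable count is a heuristic, not a proof that nothing is still in flight.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

final class QuietPeriod {
    private QuietPeriod() {}

    /**
     * Polls messageCount until it has not changed for `quiet`, or until `timeout` elapses.
     * Returns true if the count settled, false on timeout or interruption.
     */
    static boolean awaitStableCount(IntSupplier messageCount, Duration quiet, Duration poll, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        int last = messageCount.getAsInt();
        long stableSince = System.nanoTime();
        while (System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(poll.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
            int current = messageCount.getAsInt();
            if (current != last) {
                last = current;                  // count still moving: restart the quiet period
                stableSince = System.nanoTime();
            } else if (System.nanoTime() - stableSince >= quiet.toNanos()) {
                return true;                     // unchanged for the whole quiet period
            }
        }
        return false;
    }
}
```

Because the method returns false on timeout, the caller can abort instead of draining a queue that is still receiving messages.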

At this point, the old queue should have reached a consistent state: all messages are assumed to have arrived, and no new messages can arrive because there are no bindings left. We can therefore reject all of them, thereby transferring them to the dead letter queue:
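Draining by rejection can be sketched with two channel operations from the RabbitMQ Java client: `basicGet(queue, false)` fetches one message without auto-acknowledging it, and `basicReject(deliveryTag, false)` rejects it without requeueing, which makes the broker dead-letter it. To keep the example runnable without a broker, `DrainChannel` is a hypothetical interface reduced to these two operations; in Spring AMQP the real channel is available inside `rabbitTemplate.execute(channel -> ...)`. The `maxMessages` cap is an added safety limit.

```java
/** Hypothetical minimal view of an AMQP channel, reduced to what draining needs. */
interface DrainChannel {
    /** Fetches one message without auto-ack and returns its delivery tag, or null if the queue is empty.
     *  (RabbitMQ Java client: channel.basicGet(queue, false).getEnvelope().getDeliveryTag()) */
    Long basicGetTag(String queue);

    /** Rejects a delivery; requeue=false makes RabbitMQ dead-letter it (channel.basicReject). */
    void basicReject(long deliveryTag, boolean requeue);
}

final class QueueDrainer {
    private QueueDrainer() {}

    /** Rejects messages from `queue` until it is empty or maxMessages is reached; returns the count. */
    static int drainToDeadLetter(DrainChannel channel, String queue, int maxMessages) {
        int drained = 0;
        Long tag;
        while (drained < maxMessages && (tag = channel.basicGetTag(queue)) != null) {
            channel.basicReject(tag, false); // requeue=false: dead-letter instead of redelivering
            drained++;
        }
        return drained;
    }
}
```

Passing `requeue = false` is essential: with `true`, the broker would put each message back into the same queue, and only the cap would stop the loop.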

At this point, the old queue should be drained, with all outstanding messages moved to the dead letter queue. As explained previously, the IBiS codebase contains logic to handle the dead letter queue. It is encapsulated in the `deadLetterQueueService` object (which, under the hood, simply provides a couple of utility methods to read from the dead letter queue or get some statistics). We use it to verify that the migration succeeded and to consume the messages:
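The verification step can be sketched independently of the IBiS-internal `deadLetterQueueService`, whose API is not shown in this article. `DeadLetters` below is a hypothetical interface with invented method names: the check compares the dead letter queue depth with the number of drained messages before re-processing everything. It uses `>=` rather than `==` because the dead letter queue may already hold unrelated faulty messages.

```java
/** Hypothetical view of a dead letter queue service; method names are invented for this sketch. */
interface DeadLetters {
    int messageCount();

    /** Re-processes one dead-lettered message; returns false when the queue is empty. */
    boolean reprocessNext();
}

final class DeadLetterCheck {
    private DeadLetterCheck() {}

    /** Fails if fewer messages reached the dead letter queue than were drained, then re-processes them all. */
    static int verifyAndConsume(DeadLetters dlq, int drained) {
        int available = dlq.messageCount();
        if (available < drained) {
            throw new IllegalStateException(
                "expected at least " + drained + " dead-lettered messages, found " + available);
        }
        int processed = 0;
        while (dlq.reprocessNext()) {
            processed++;
        }
        return processed;
    }
}
```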

Finally, we remove the old queue:
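Deleting the old queue is safest when the broker refuses to remove a queue that still holds messages. AMQP's `queue.delete` supports an if-empty flag for this, exposed in Spring AMQP as `rabbitAdmin.deleteQueue(name, false, true)` (the two flags are if-unused and if-empty). The in-memory `QueueTable` below is a hypothetical stand-in that mirrors that contract and additionally treats deleting a missing queue as a no-op, so the step can be repeated.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the broker's queue table, used to illustrate a guarded delete. */
final class QueueTable {
    private final Map<String, Integer> depths = new HashMap<>();

    void declare(String queue, int depth) {
        depths.put(queue, depth);
    }

    boolean exists(String queue) {
        return depths.containsKey(queue);
    }

    /** Deletes the queue only if it is empty; deleting a queue that no longer exists does nothing. */
    void deleteIfEmpty(String queue) {
        Integer depth = depths.get(queue);
        if (depth == null) {
            return; // already deleted by an earlier run
        }
        if (depth > 0) {
            throw new IllegalStateException(queue + " still holds " + depth + " messages");
        }
        depths.remove(queue);
    }
}
```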

We exposed this migration as a REST endpoint, which is triggered automatically (after application startup) or manually (in case we need to repeat it after an error). The migration is idempotent, i.e., it can be triggered multiple times and always yields the same result.
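Whether the trigger is an HTTP request or an application-startup event, both paths can funnel into one guarded entry point. The sketch below uses hypothetical names; in a Spring application the two callers would typically be a REST controller method and a startup listener. Its `AtomicBoolean` guard is an added assumption: idempotency makes repeated runs safe, but a manual call arriving while the startup run is still in progress is refused rather than executed concurrently.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Single entry point shared by the startup hook and the manual REST trigger (hypothetical). */
final class MigrationTrigger {
    enum Outcome { MIGRATED, NOTHING_TO_DO, ALREADY_RUNNING }

    private final AtomicBoolean running = new AtomicBoolean(false);
    private final BooleanSupplier migration; // returns true if it migrated, false if nothing was left

    MigrationTrigger(BooleanSupplier migration) {
        this.migration = migration;
    }

    Outcome trigger() {
        if (!running.compareAndSet(false, true)) {
            return Outcome.ALREADY_RUNNING; // refuse overlapping runs instead of queueing them
        }
        try {
            return migration.getAsBoolean() ? Outcome.MIGRATED : Outcome.NOTHING_TO_DO;
        } finally {
            running.set(false);
        }
    }
}
```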

The full code can be found here.

Conclusion

In this article, we described how we moved IBiS, a software platform that we develop to support billing and reporting for Swisscom's cloud offerings, to RabbitMQ's quorum queues. We discussed how we use RabbitMQ internally, what our requirements are, and how we decided to proceed with the migration. Finally, we went through real-world code samples to give a glimpse of how such a migration can be achieved with Java and the Spring AMQP project.

References

Adam Krajewski

Software Engineer

More getIT-articles
