Migrating to Multi Cluster Managed Kafka - 0 Downtime | Natan Silnitsky | Conf42 Cloud Native 2022
As Wix's Kafka usage grew to 1.5B messages per day, over 10K topics and over 100K leader partitions serving 2000 microservices, we decided to migrate from running our own cluster in each data center to a managed cloud service (Confluent Cloud) with a multi-cluster setup. This talk is about how we successfully migrated with 0 downtime under full traffic, and the lessons we learned along the way.

These lessons include:
1. Automation, Automation, Automation - at such scale, the entire process has to be fully automated.
2. Prefer a gradual approach - e.g. migrate topics in small chunks rather than all at once. This reduces the risk if things go wrong.
3. First migrate test topics with relayed real traffic - the data is real, but production is not affected.
4. Clean up first - avoid migrating unused topics or topics with too many unnecessary partitions.
5. Adapt to Confluent Cloud APIs - e.g. for lag monitoring.

Other talks at this conference 🚀🪐 https://www.conf42.com/cloud2022

0:00 Intro
1:27 Talk