Bitovi and Coinbase migrated 300+ namespaces across 300+ services off a custom self-hosted Temporal cluster and onto Temporal Cloud, with minimal planned downtime on critical trading and payment paths and a rollback strategy available at every stage.
This content is adapted from a joint talk given at Temporal Replay.
    manager := NewMultiClientManager(
        WithPrimaryClient(cloud),
        WithSecondaryClient(legacy),
        WithCustomExecutionStrategy(
            PercentRouting(cfg.cutoverPct),
        ),
    )
    manager.ExecuteWorkflow(ctx, opts, MyWorkflow)

Coinbase is a secure global crypto platform for buying, selling, transferring, and storing digital assets. Security and infrastructure sit at the center of its mission: protecting users' assets, data, and privacy depends directly on the reliability and security of the Temporal-based workflows running behind the scenes.

Temporal sits underneath a wide range of Coinbase systems. It powers core business workflows like trading, payments and payout flows, settlements, and back-office jobs that need to be reliable and auditable.
On the compliance and regulatory side, it runs workflows for GDPR deletion, travel rule checks, KYC refresh, and other audit-heavy flows where correctness and traceability are non-negotiable. It is also used heavily for internal platform and data workflows, including ETLs and backfills, ML-driven notifications, and automation around infrastructure provisioning and operations.
Coinbase originally ran a single shared Temporal cluster backed by a custom, in-house persistence layer. Over time, more independent teams and workloads piled into the one cluster, amplifying the usual problems.
P95 latency became unpredictable under mixed workloads, and noisy neighbors across teams were a constant risk.
Running a single Temporal cluster at this scale on-prem was on track to become its own Coinbase product, and upgrades to custom components were difficult.
One cluster to rule them all, and one cluster to be very nervous about, with no easy way to tune the server for the unique traffic patterns of specific namespaces.
Coinbase needed stronger service identity, access control, auditability, and encrypted workflow data, and the custom stack made each of those changes difficult and risky.
The security work broke down into three categories: how clients and clusters talk to each other, how developers interact with the system, and how the payloads themselves are handled.
Traffic between Coinbase services and Temporal Cloud should never flow over the public internet, and the connection should be one-way from services to Temporal Cloud. Only an allow-list of services should be able to talk to each namespace.
All services communicate with Temporal Cloud over a private link, establishing a private connection between Coinbase's VPC and Temporal Cloud so no data flows over the public internet. Temporal Cloud authenticates with mTLS out of the box, and Coinbase layered certificate filters on top so only approved services could reach a given namespace.
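The per-namespace allow-list can be pictured with a small Go sketch. It is only an illustration under assumptions: the type name, the namespace and service names, and the idea of keying on a certificate-derived service identity are hypothetical, and the source does not describe how Coinbase's certificate filters are implemented.

```go
package main

import "fmt"

// NamespaceAllowList maps a namespace to the service identities allowed to
// reach it. In a real setup the identity would come from the client
// certificate presented during mTLS; here it is a plain string (assumption).
type NamespaceAllowList map[string]map[string]bool

// Allowed reports whether serviceID may talk to namespace.
// Unknown namespaces and unknown services are denied by default.
func (a NamespaceAllowList) Allowed(namespace, serviceID string) bool {
	// Indexing a missing namespace yields a nil map, and indexing a nil
	// map yields false, so the default is deny.
	return a[namespace][serviceID]
}

func main() {
	// Hypothetical namespaces and services, for illustration only.
	allow := NamespaceAllowList{
		"payments-prod": {"payments-api": true, "payout-worker": true},
	}
	fmt.Println(allow.Allowed("payments-prod", "payments-api"))
	fmt.Println(allow.Allowed("payments-prod", "etl-backfill"))
	fmt.Println(allow.Allowed("unknown-namespace", "payments-api"))
}
```

Denying by default keeps a newly provisioned namespace unreachable until someone explicitly lists the services that need it.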
Developers need to interact with the cluster securely: provisioning namespaces with the right certificates, and starting, stopping, resetting, or otherwise operating on workflows during incident response. In production, the default Temporal CLI is not enabled for security reasons.
Namespace provisioning was integrated with LDAP groups and an access portal, so getting access to Temporal Cloud required approval to join the right group. For interacting with workflows, the team built and maintains a Temporal admin service with a consensus model: a developer raises a request to perform an operation and another developer approves it, so no single person can unilaterally stop or reset a workflow.
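The consensus rule described above can be sketched in a few lines of Go. This is a simplified model, not the admin service itself: `OperationRequest`, its fields, and the error values are hypothetical names, and the actual workflow operation is passed in as a callback rather than calling any Temporal API.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrApproverRequired = errors.New("approver is required")
	ErrSelfApproval     = errors.New("requester cannot approve their own request")
	ErrNotApproved      = errors.New("request has not been approved")
)

// OperationRequest is a hypothetical record of a requested workflow operation.
type OperationRequest struct {
	WorkflowID string
	Operation  string // for example "terminate" or "reset"
	Requester  string
	Approver   string // empty until a second developer approves
}

// Approve records the approval, rejecting empty approvers and self-approval.
func (r *OperationRequest) Approve(approver string) error {
	if approver == "" {
		return ErrApproverRequired
	}
	if approver == r.Requester {
		return ErrSelfApproval
	}
	r.Approver = approver
	return nil
}

// Execute runs the operation only once a second person has approved it.
func (r *OperationRequest) Execute(run func(workflowID, operation string) error) error {
	if r.Approver == "" {
		return ErrNotApproved
	}
	return run(r.WorkflowID, r.Operation)
}

func main() {
	req := &OperationRequest{WorkflowID: "wf-123", Operation: "reset", Requester: "alice"}
	run := func(id, op string) error {
		fmt.Printf("running %s on %s\n", op, id)
		return nil
	}
	fmt.Println(req.Execute(run))     // rejected: nobody has approved yet
	fmt.Println(req.Approve("alice")) // rejected: self-approval
	fmt.Println(req.Approve("bob"))   // accepted
	fmt.Println(req.Execute(run))     // runs the operation
}
```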
Temporal Cloud persists payloads in order to maintain things like event history. Coinbase uses the data converter option in the Temporal client to encrypt payloads as they flow in and out of the cluster.
Traditionally, that encoding runs in memory inside the Worker pod, but to ease the initial migration the team stood up a centralized codec server. That added a small amount of latency that was acceptable for most teams; latency-sensitive teams, or teams that flagged the centralized codec server as a single point of failure, were offered a more traditional codec model where the certs are loaded into the Worker pod and the encoding runs entirely in memory there.
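To make the in-memory model concrete, here is a minimal Go sketch of a codec that encrypts payload bytes before they leave the process and decrypts them on the way back. It is an assumption-laden illustration: it uses AES-GCM from the standard library on raw byte slices, while the source does not say which cipher or key management Coinbase uses, and a real integration would wrap the Temporal SDK's payload types rather than `[]byte`.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// PayloadCodec is a hypothetical in-memory codec using AES-GCM.
type PayloadCodec struct {
	aead cipher.AEAD
}

// NewPayloadCodec builds a codec from a 16-, 24-, or 32-byte AES key.
func NewPayloadCodec(key []byte) (*PayloadCodec, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return &PayloadCodec{aead: aead}, nil
}

// Encode encrypts plaintext and prepends the random nonce to the result.
func (c *PayloadCodec) Encode(plaintext []byte) ([]byte, error) {
	nonce := make([]byte, c.aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return c.aead.Seal(nonce, nonce, plaintext, nil), nil
}

// Decode splits off the nonce, then decrypts and authenticates the rest.
func (c *PayloadCodec) Decode(data []byte) ([]byte, error) {
	n := c.aead.NonceSize()
	if len(data) < n {
		return nil, errors.New("payload shorter than nonce")
	}
	return c.aead.Open(nil, data[:n], data[n:], nil)
}

func main() {
	key := make([]byte, 32) // in practice the key would come from a secret store
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	codec, err := NewPayloadCodec(key)
	if err != nil {
		panic(err)
	}
	enc, err := codec.Encode([]byte(`{"amount":42}`))
	if err != nil {
		panic(err)
	}
	dec, err := codec.Decode(enc)
	if err != nil {
		panic(err)
	}
	fmt.Printf("round trip: %s\n", dec)
}
```

A centralized codec server would expose the same two operations over the network, which is where the extra latency and the single-point-of-failure concern come from.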
A migration at this scale comes down to coordinating with teams, building trust, and earning the room to make the changes you need to make. Coinbase and Bitovi ran what they call white-glove migrations: an engineer was embedded in each service team. That engineer was responsible for communicating what the migration entailed, standing up dedicated Slack channels for the work, and using Jira and Confluence for project management and documentation.
Rollout plans were templatized and the team aligned on patterns up front, so engineers could step in and out of migrations without losing momentum. Each team got a shared set of runbooks, dashboards, and incident playbooks, agreed upon before the migration even began. The team also deliberately chose early high-visibility wins, which proved out the tooling and gave them concrete successes to point to whenever a service team got nervous later on.
Observability equals transparency, and transparency is what builds trust in a migration process.
Coinbase didn't go straight from the custom self-hosted setup to Temporal Cloud. The journey ran through a staging ground, with real migrations at each step to prove out the patterns before moving to Cloud.
The original shared cluster delivered durable execution, but its scaling, reliability, and security problems were getting worse.
In Phase 1, multiple Coinbase-operated clusters acted as a staging ground to validate capacity, isolation, and runbooks.
The move to Temporal Cloud reused the same drain-and-switch and dual-Worker patterns proven in Phase 1, now pointed at Cloud.
Temporal Cloud, the end state, delivered managed infrastructure, stronger isolation between namespaces, and scaling and security wins.
Drain and switch is the strategy most people picture when they imagine switching Temporal clusters. The strengths are exactly what they sound like: a very simple mental model, very little infrastructure change, and very little impact on customer code, often just a couple of config values. It works well when workflows are short-lived or can drain inside a maintenance window.

The application starts the day running workflows on the old cluster. New workflows route there as they always have, and existing ones keep executing.
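The overall shape of a drain-and-switch cutover can be sketched as a small state machine in Go. The ordering here (pause new starts, wait for the old cluster to drain, then flip the target and resume) is an assumption inferred from the description of a brief pause inside a maintenance window. `Cluster`, `Cutover`, and the open-workflow counter are hypothetical stand-ins for real clusters and a real visibility query.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrPaused     = errors.New("new workflow starts are paused")
	ErrNotPaused  = errors.New("pause new starts before switching")
	ErrNotDrained = errors.New("old cluster still has open workflows")
)

// Cluster is a stand-in for a Temporal cluster; Open counts running workflows.
type Cluster struct {
	Name string
	Open int
}

// Complete marks one running workflow as finished.
func (c *Cluster) Complete() {
	if c.Open > 0 {
		c.Open--
	}
}

// Cutover tracks which cluster receives new workflow starts.
type Cutover struct {
	old, next, active *Cluster
	paused            bool
}

// NewCutover begins with all new starts routed to the old cluster.
func NewCutover(old, next *Cluster) *Cutover {
	return &Cutover{old: old, next: next, active: old}
}

// StartWorkflow starts a workflow on the active cluster unless paused.
func (c *Cutover) StartWorkflow() error {
	if c.paused {
		return ErrPaused
	}
	c.active.Open++
	return nil
}

// Pause stops new starts so the old cluster can drain.
func (c *Cutover) Pause() { c.paused = true }

// Switch flips the target to the new cluster once the old one is empty.
func (c *Cutover) Switch() error {
	if !c.paused {
		return ErrNotPaused
	}
	if c.old.Open > 0 {
		return ErrNotDrained
	}
	c.active = c.next
	c.paused = false
	return nil
}

func main() {
	old, next := &Cluster{Name: "legacy"}, &Cluster{Name: "cloud"}
	c := NewCutover(old, next)
	_ = c.StartWorkflow()   // runs on the old cluster
	c.Pause()               // maintenance window begins
	fmt.Println(c.Switch()) // refused: one workflow is still open
	old.Complete()          // the workflow drains
	fmt.Println(c.Switch()) // succeeds and resumes new starts
	_ = c.StartWorkflow()
	fmt.Println(old.Open, next.Open)
}
```

Until `Switch` succeeds, rolling back is just a matter of resuming starts against the old cluster, which is part of what makes the strategy easy to reason about.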
For namespaces running critical, long-lived workflows, Coinbase couldn't pause new workflow creation for long, and some workflows take a very long time to drain naturally. The dual-Worker strategy handles those cases by letting Temporal Clients and Workers talk to both the old cluster and the new one.

From the application's point of view, nothing really changes. It still makes a single API call, because the multi-client manager presents the same interface as a native Temporal client.
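One way to picture percent-based routing behind a single client interface is the Go sketch below. It is not the multi-client manager from the talk: `WorkflowStarter`, `PercentRouter`, and `namedClient` are hypothetical, the interface is reduced to one method, and hashing the workflow ID into a bucket is an assumed strategy chosen so that a given ID always routes the same way.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// WorkflowStarter is the narrow slice of a client the application calls.
type WorkflowStarter interface {
	ExecuteWorkflow(workflowID string) (cluster string, err error)
}

// namedClient is a fake client that reports which cluster handled the start.
type namedClient struct{ name string }

func (c namedClient) ExecuteWorkflow(workflowID string) (string, error) {
	return c.name, nil
}

// PercentRouter sends cutoverPct percent of new workflows to primary and
// the rest to secondary, while exposing the same method as a single client.
type PercentRouter struct {
	primary, secondary WorkflowStarter
	cutoverPct         uint32 // 0 to 100
}

// NewPercentRouter wires the two clients and the cutover percentage together.
func NewPercentRouter(primary, secondary WorkflowStarter, cutoverPct uint32) *PercentRouter {
	return &PercentRouter{primary: primary, secondary: secondary, cutoverPct: cutoverPct}
}

// bucket maps a workflow ID to a stable value in [0, 100).
func bucket(workflowID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(workflowID))
	return h.Sum32() % 100
}

// ExecuteWorkflow routes by bucket, so the caller still makes one call.
func (r *PercentRouter) ExecuteWorkflow(workflowID string) (string, error) {
	if bucket(workflowID) < r.cutoverPct {
		return r.primary.ExecuteWorkflow(workflowID)
	}
	return r.secondary.ExecuteWorkflow(workflowID)
}

func main() {
	router := NewPercentRouter(namedClient{"cloud"}, namedClient{"legacy"}, 25)
	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		cluster, _ := router.ExecuteWorkflow(fmt.Sprintf("order-%d", i))
		counts[cluster]++
	}
	fmt.Println(counts) // roughly a quarter of the starts should land on cloud
}
```

Setting the percentage back to 0 sends every new workflow to the legacy cluster again, which is the kind of rollback lever a dual-Worker setup depends on.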
Plan for minimal downtime on critical paths and an easy rollback from day one. Even when a migration looks simple on paper, good canaries, routing controls, and rollback paths are invaluable under real production conditions.
For Coinbase, meeting its security requirements meant strong identity, private network paths, encrypted payloads, and controlled operator access by way of an admin service with consensus-based reviews.
Drain and switch works well where you can drain quickly and accept a brief pause. Reach for dual Workers and more advanced routing only where you genuinely need them.
Invest in runbooks, dashboards, and dedicated migration time. Those shared assets pay off on every migration and build trust with the product teams.
Being transparent, data-driven, and present with the service teams during cutover matters just as much as the mechanics of any strategy.
Bitovi's infrastructure and platform engineering teams have deep experience with Temporal, cloud migrations, and the operational discipline it takes to move critical systems safely. Let's talk about your migration.