Coinbase × Temporal Cloud: Bitovi Case Study
Case Study · Coinbase

Migrating 300+ namespaces to Temporal Cloud

Bitovi and Coinbase migrated 300+ namespaces across 300+ services off a custom self-hosted Temporal cluster and onto Temporal Cloud, with minimal planned downtime on critical trading and payment paths and a rollback strategy available at every stage.

This content is adapted from a joint talk given at Temporal Replay.

Temporal · Cloud Migration · Infrastructure · Security
// migration.config.ts (live)
const manager = NewMultiClientManager(
  WithPrimaryClient(cloud),    // Temporal Cloud
  WithSecondaryClient(legacy), // self-hosted cluster
  WithCustomExecutionStrategy(
    PercentRouting(cfg.cutoverPct), // cutover percentage
  ),
);

manager.ExecuteWorkflow(ctx, opts, MyWorkflow);

cutoverPct: 72%
errors: 0
stranded: 0
300+ namespaces migrated: across compliance, trading, and ETL workloads
300+ services touched: each with its own SLO and risk profile
0 planned downtime: on critical trading and payment paths
1-year embedded engagement: white-glove team, dedicated per service
Coinbase's Mission

Increasing economic freedom in the world

Coinbase is a secure global crypto platform for buying, selling, transferring, and storing digital assets. Security and infrastructure sit at the center of its mission: protecting users' assets, data, and privacy depends directly on the reliability and security of the Temporal-based workflows running behind the scenes.

Coinbase Web Trade interface
Consumer & Wallet: Buy, sell, transfer, store digital assets.
Coinbase Prime: Trading and custody for institutions.
Developer Tools: APIs and SDKs for the platform.
Base: Coinbase's Ethereum L2 network.
Temporal at Coinbase

The shared layer underneath the platform

Temporal sits underneath a wide range of Coinbase systems. It powers core business workflows like trading, payments and payout flows, settlements, and back-office jobs that need to be reliable and auditable.

On the compliance and regulatory side, it runs workflows for GDPR deletion, travel rule checks, KYC refresh, and other audit-heavy flows where correctness and traceability are non-negotiable. It is also used heavily for internal platform and data workflows, including ETLs and backfills, ML-driven notifications, and automation around infrastructure provisioning and operations.

| Workflow class | Shape | Risk |
| --- | --- | --- |
| Trading & payments | Short-lived, bursty | Critical |
| Settlements & payouts | Hours to days | Critical |
| KYC / Travel rule | Audit-heavy | Regulatory |
| GDPR deletion | Long tail | Regulatory |
| ETL & backfills | Hours to weeks | Recoverable |
| ML notifications | Seconds | Recoverable |
| Infra provisioning | Minutes | Operational |
Why Temporal Cloud

Four pressure points the custom stack couldn't keep absorbing

Coinbase originally ran a single shared Temporal cluster backed by a custom, in-house persistence layer. Over time, more independent teams and workloads piled into that one cluster, amplifying the usual shared-cluster problems.

01 · Scaling & Performance

P95 latency became unpredictable under mixed workloads, and noisy neighbors across teams were a constant risk.

02 · Operational Overhead

Running a single Temporal cluster at this scale on-prem was on track to become its own Coinbase product, and upgrades to custom components were difficult.

03 · Blast Radius

One cluster to rule them all, and one cluster to be very nervous about, with no easy way to tune the server for the unique traffic patterns of specific namespaces.

04 · Security & Compliance

Coinbase needed stronger service identity, access control, auditability, and encrypted workflow data, and the custom stack made each of those changes difficult and risky.

What Needs Securing

Security was a first-class requirement, not an afterthought

The work broke down into three categories: how clients and clusters talk to each other, how developers interact with the system, and how the payloads themselves are handled.

Traffic between Coinbase services and Temporal Cloud should never flow over the public internet, and the connection should be one-way from services to Temporal Cloud. Only an allow-list of services should be able to talk to each namespace.

All services communicate with Temporal Cloud over a private link, establishing a private connection between Coinbase's VPC and Temporal Cloud so no data flows over the public internet. Temporal Cloud authenticates with mTLS out of the box, and Coinbase layered certificate filters on top so only approved services could reach a given namespace.
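The allow-list decision described above can be modeled in a few lines. This is only a sketch of the rule, not of how the filter is enforced: in practice the check is applied by Temporal Cloud's certificate filters, not by application code. The names `CertIdentity`, `NamespaceAllowList`, and `isServiceAllowed`, the example namespace and service names, and the choice to match on the certificate's common name are all assumptions made for illustration.

```typescript
// Hypothetical model of a per-namespace certificate allow-list.
// The real filter is enforced on the Temporal Cloud side, not in application code.
interface CertIdentity {
  commonName: string; // assumed here to carry the service identity
}

// namespace -> common names of services approved to connect (illustrative values)
type NamespaceAllowList = Record<string, string[]>;

function isServiceAllowed(
  allowList: NamespaceAllowList,
  namespace: string,
  cert: CertIdentity,
): boolean {
  // Unknown namespaces are denied by default.
  if (!Object.prototype.hasOwnProperty.call(allowList, namespace)) {
    return false;
  }
  return allowList[namespace].indexOf(cert.commonName) !== -1;
}

const exampleAllowList: NamespaceAllowList = {
  "payments-prod": ["payments-api", "payout-worker"],
};
```

The deny-by-default branch is the important design choice: a namespace with no entry admits nobody, so a missing configuration fails closed.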

Developers need to interact with the cluster securely: provisioning namespaces with the right certificates, and starting, stopping, resetting, or otherwise operating on workflows during incident response. In production, the default Temporal CLI is not enabled for security reasons.

Namespace provisioning was integrated with LDAP groups and an access portal, so getting access to Temporal Cloud required approval to join the right group. For interacting with workflows, the team built and maintains a Temporal admin service with a consensus model: a developer raises a request to perform an operation and another developer approves it, so no single person can unilaterally stop or reset a workflow.
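The consensus rule can be sketched as a small two-person approval check. The admin service itself is internal to Coinbase and its API is not described in the source, so every name below (`OperationRequest`, `raiseRequest`, `approveRequest`, `canExecute`) is hypothetical. The sketch only shows the invariant: the approver must be someone other than the requester.

```typescript
// Hypothetical sketch of a two-person approval rule for workflow operations.
type WorkflowOperation = "start" | "stop" | "reset";

interface OperationRequest {
  workflowId: string;
  operation: WorkflowOperation;
  requestedBy: string;
  approvedBy?: string; // set only after a second developer approves
}

function raiseRequest(
  workflowId: string,
  operation: WorkflowOperation,
  requestedBy: string,
): OperationRequest {
  return { workflowId, operation, requestedBy };
}

// Returns an approved copy, or throws if the requester tries to self-approve.
function approveRequest(request: OperationRequest, approver: string): OperationRequest {
  if (approver === request.requestedBy) {
    throw new Error("requester cannot approve their own request");
  }
  return { ...request, approvedBy: approver };
}

// Only requests approved by a different person may be executed.
function canExecute(request: OperationRequest): boolean {
  return request.approvedBy !== undefined && request.approvedBy !== request.requestedBy;
}
```

For example, `canExecute(approveRequest(raiseRequest("wf-123", "reset", "alice"), "bob"))` is true, while an unapproved request is never executable.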

Temporal Cloud persists payloads as part of each workflow's event history. Coinbase uses the data converter option in the Temporal client to encrypt payloads before they reach the cluster and decrypt them on the way back.

Traditionally, that encoding runs in memory inside the Worker pod, but to ease the initial migration the team stood up a centralized codec server. That added a small amount of latency that was acceptable for most teams; latency-sensitive teams, or teams that flagged the centralized codec server as a single point of failure, were offered a more traditional codec model where the certs are loaded into the Worker pod and the encoding runs entirely in memory there.
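Both codec models share the same encode and decode logic; only where it runs differs. The sketch below shows that logic with an injected cipher. It is loosely modeled on the encode/decode shape of a payload codec, but `SketchPayload`, `ByteCipher`, `encodePayload`, `decodePayload`, and the `binary/encrypted` marker are local stand-ins chosen for illustration, not Coinbase's implementation. The `placeholderCipher` is explicitly not encryption. It exists only so the sketch runs without a crypto dependency.

```typescript
// Simplified stand-in for a Temporal payload: string metadata plus raw bytes.
interface SketchPayload {
  metadata: Record<string, string>;
  data: Uint8Array;
}

// Injected cipher, so the same logic can run in a codec server or inside a Worker.
interface ByteCipher {
  seal(plain: Uint8Array): Uint8Array;
  open(sealed: Uint8Array): Uint8Array;
}

const ENCRYPTED_ENCODING = "binary/encrypted"; // illustrative marker value

function encodePayload(p: SketchPayload, cipher: ByteCipher): SketchPayload {
  return {
    metadata: {
      encoding: ENCRYPTED_ENCODING,
      originalEncoding: p.metadata["encoding"] || "",
    },
    data: cipher.seal(p.data),
  };
}

function decodePayload(p: SketchPayload, cipher: ByteCipher): SketchPayload {
  // Pass through anything this codec did not produce.
  if (p.metadata["encoding"] !== ENCRYPTED_ENCODING) {
    return p;
  }
  return {
    metadata: { encoding: p.metadata["originalEncoding"] || "" },
    data: cipher.open(p.data),
  };
}

// PLACEHOLDER ONLY: XOR is not encryption. A real cipher (for example AES-GCM)
// with properly managed keys belongs behind this interface.
const placeholderCipher: ByteCipher = {
  seal: (bytes) => bytes.map((b) => b ^ 0x5a),
  open: (bytes) => bytes.map((b) => b ^ 0x5a),
};
```

Keeping the cipher behind an interface is what would let a team move from a centralized codec server to in-Worker encoding without changing the encode and decode logic.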

Building Trust

A lot of the difficulty is operational rather than technical

It comes down to coordinating with teams, building trust, and earning the room to make the changes you need to make. Coinbase and Bitovi ran what they call white-glove migrations: an engineer was embedded in each service team. That engineer was responsible for communicating what the migration entailed, standing up dedicated Slack channels for the work, and using Jira and Confluence for project management and documentation.

Rollout plans were templatized and the team aligned on patterns up front, so engineers could step in and out of migrations without losing momentum. Each team got a shared set of runbooks, dashboards, and incident playbooks, agreed upon before the migration even began. The team also deliberately chose early high-visibility wins, which proved out the tooling and gave them concrete successes to point to whenever a service team got nervous later on.

Observability equals transparency, and transparency is what builds trust in a migration process.
The goal was to make the answer to "is it safe to move to the next step?" a data-driven decision rather than a gut call.
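One way to turn "is it safe to move to the next step?" into a data-driven decision is an explicit gate over dashboard signals. The source does not say which metrics or thresholds Coinbase used, so the fields in `CutoverMetrics` and `GateThresholds`, and the `evaluateGate` function itself, are assumptions for illustration.

```typescript
// Hypothetical snapshot of the signals a migration dashboard might expose.
interface CutoverMetrics {
  errorRate: number;         // failed workflow starts as a fraction, 0..1
  p95LatencyMs: number;      // illustrative latency signal
  taskQueueBacklog: number;  // pending tasks on the new cluster
  strandedWorkflows: number; // count reported by reconcile scripts
}

// Thresholds agreed with the service team before cutover (illustrative fields).
interface GateThresholds {
  maxErrorRate: number;
  maxP95LatencyMs: number;
  maxTaskQueueBacklog: number;
}

interface GateDecision {
  proceed: boolean;
  reasons: string[]; // why the gate is closed; empty when it is safe to proceed
}

function evaluateGate(m: CutoverMetrics, t: GateThresholds): GateDecision {
  const reasons: string[] = [];
  if (m.errorRate > t.maxErrorRate) {
    reasons.push("error rate above threshold");
  }
  if (m.p95LatencyMs > t.maxP95LatencyMs) {
    reasons.push("p95 latency above threshold");
  }
  if (m.taskQueueBacklog > t.maxTaskQueueBacklog) {
    reasons.push("task queue backlog above threshold");
  }
  if (m.strandedWorkflows > 0) {
    reasons.push("stranded workflows detected");
  }
  return { proceed: reasons.length === 0, reasons };
}
```

Returning the reasons alongside the decision keeps the gate transparent: a blocked step says why it is blocked, which is the property the quote above is asking for.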
Embedded engineer: on every service team
Runbook & playbook: agreed before cutover
Dashboards: latency, errors, queues
Reconcile scripts: no duplicates, no stranded workflows
Trust earned: then accelerate
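A reconcile check for duplicates and stranded workflows can be sketched as a comparison of workflow ID lists. The source does not define either term precisely, so this sketch assumes that a duplicate is a workflow ID open on both clusters at once, and a stranded workflow is one the application expects to be running that is open on neither. `reconcileWorkflows` and `ReconcileReport` are hypothetical names, and fetching the ID lists from each cluster is left out.

```typescript
interface ReconcileReport {
  duplicates: string[]; // open on both clusters at once (assumed definition)
  stranded: string[];   // expected to be running, but open on neither cluster (assumed definition)
}

// Compares the IDs the application expects against what each cluster reports as open.
function reconcileWorkflows(
  expectedIds: string[],
  openOnOld: string[],
  openOnNew: string[],
): ReconcileReport {
  const oldSet = new Set(openOnOld);
  const newSet = new Set(openOnNew);
  return {
    duplicates: openOnOld.filter((id) => newSet.has(id)),
    stranded: expectedIds.filter((id) => !oldSet.has(id) && !newSet.has(id)),
  };
}
```

With expected IDs `a, b, c, d`, old cluster `a, b`, and new cluster `b, c`, the report lists `b` as a duplicate and `d` as stranded.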
The Migration Journey

A deliberate staging phase

Coinbase didn't go straight from the custom self-hosted setup to Temporal Cloud. The journey ran through a staging ground, with real migrations at each step to prove out the patterns before moving to Cloud.

Start · Custom self-hosted Temporal

Durable execution, but scaling, reliability, and security problems were getting worse.

Phase 1 · Aurora-backed staging clusters

Multiple Coinbase-operated clusters acted as a staging ground to validate capacity, isolation, and runbooks.

Phase 2 · Migration to Temporal Cloud

Same drain-and-switch and dual-Worker patterns proven in Phase 1, now pointed at Cloud.

Phase 3 · Temporal Cloud

Managed infrastructure, stronger isolation between namespaces, scaling and security wins delivered.

Migration Approach 01 · Drain and Switch

The strategy most people picture

Drain and switch is the strategy most people picture when they imagine switching Temporal clusters. The strengths are exactly what they sound like: a very simple mental model, very little infrastructure change, and very little impact on customer code, often just a couple of config values. It works well when workflows are short-lived or can drain inside a maintenance window.

Step 01 of 04 · Day-1 steady state

Diagram: the application starts workflows on the old cluster; the new cluster is idle.

The application starts the day running workflows on the old cluster. New workflows route there as they always have, and existing ones keep executing.

The new cluster is provisioned but receives no traffic.
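Only the first of the four steps is described here, so the sketch below fills in the rest from the general description of the strategy: pause new starts inside a maintenance window, let the old cluster drain, then flip the config and resume. That sequence, and the names `ClusterRouting` and `drainAndSwitchStep`, are assumptions for illustration rather than Coinbase's actual procedure. Rollback after the switch is out of scope.

```typescript
// Hypothetical routing state for one namespace during drain and switch.
interface ClusterRouting {
  target: "old" | "new";    // which cluster new workflows start on
  acceptingStarts: boolean; // false while the old cluster drains
}

// Advances the migration one step, based on how many workflows are still
// open on the old cluster and whether the maintenance window is open.
function drainAndSwitchStep(
  current: ClusterRouting,
  openOnOld: number,
  windowOpen: boolean,
): ClusterRouting {
  if (current.target === "new") {
    return current; // already switched; nothing more to do in this sketch
  }
  if (!windowOpen) {
    return { target: "old", acceptingStarts: true }; // steady state on the old cluster
  }
  if (openOnOld > 0) {
    return { target: "old", acceptingStarts: false }; // pause starts and let the old cluster drain
  }
  return { target: "new", acceptingStarts: true }; // drained: flip the config and resume
}
```

Called repeatedly during the window, the function holds new starts until the open count on the old cluster reaches zero, which is why the strategy fits short-lived workflows best.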
Migration Approach 02 · Dual Workers

When you can't pause, run both clusters at once

For namespaces running critical, long-lived workflows, Coinbase couldn't pause new workflow creation for long, and some workflows take a very long time to drain naturally. The dual-Worker strategy handles those cases by letting Temporal Clients and Workers talk to both the old cluster and the new one.

Part 01 of 07 · One API call, on the surface

Diagram: the application calls StartWorkflowExecution into the SDK.

From the application's point of view, nothing really changes. It still makes a single API call. The multi-client manager presents the same interface as a native Temporal client.

Adopting it is mostly a matter of constructing it with WithPrimaryClient and WithSecondaryClient, configuring the execution strategy, and calling ExecuteWorkflow the normal way.
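The shape of such a manager can be sketched in a self-contained way. The option names (`NewMultiClientManager`, `WithPrimaryClient`, `WithSecondaryClient`, `WithCustomExecutionStrategy`, `PercentRouting`) come from the snippet at the top of this page, but their internals are not public, so everything below is a hypothetical reimplementation. In particular, the `WorkflowStarter` interface is a stand-in for a real Temporal client, the synchronous return value is a simplification, and hashing the workflow ID into a stable bucket is an assumed design choice.

```typescript
// Minimal client surface for the sketch; a real Temporal client is much larger and asynchronous.
interface WorkflowStarter {
  readonly cluster: string;
  executeWorkflow(workflowId: string, workflowType: string): string;
}

type ExecutionStrategy = (workflowId: string) => "primary" | "secondary";

interface ManagerConfig {
  primary?: WorkflowStarter;
  secondary?: WorkflowStarter;
  strategy?: ExecutionStrategy;
}
type ManagerOption = (cfg: ManagerConfig) => void;

const WithPrimaryClient = (c: WorkflowStarter): ManagerOption => (cfg) => { cfg.primary = c; };
const WithSecondaryClient = (c: WorkflowStarter): ManagerOption => (cfg) => { cfg.secondary = c; };
const WithCustomExecutionStrategy = (s: ExecutionStrategy): ManagerOption => (cfg) => { cfg.strategy = s; };

// Stable bucket in [0, 100), so a given workflow ID always routes the same way (assumed design).
function bucketOf(workflowId: string): number {
  let h = 0;
  for (let i = 0; i < workflowId.length; i++) {
    h = (h * 31 + workflowId.charCodeAt(i)) >>> 0;
  }
  return h % 100;
}

// Sends roughly cutoverPct percent of workflow IDs to the primary client.
const PercentRouting = (cutoverPct: number): ExecutionStrategy =>
  (workflowId) => (bucketOf(workflowId) < cutoverPct ? "primary" : "secondary");

// Returns an object with the same surface as a single client.
function NewMultiClientManager(...options: ManagerOption[]): WorkflowStarter {
  const cfg: ManagerConfig = {};
  options.forEach((apply) => apply(cfg));
  const { primary, secondary, strategy } = cfg;
  if (!primary || !secondary || !strategy) {
    throw new Error("primary client, secondary client, and strategy are all required");
  }
  return {
    cluster: "multi",
    executeWorkflow: (workflowId, workflowType) =>
      (strategy(workflowId) === "primary" ? primary : secondary).executeWorkflow(workflowId, workflowType),
  };
}
```

Because the manager returns the same interface it wraps, calling code does not change as the cutover percentage moves from 0 to 100. Setting it back to 0 routes all new starts to the secondary client.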
Lessons & Takeaways

Five things to bring with you on the next migration

01 · Design for no planned downtime

Design for no planned downtime on critical paths and an easy rollback from day one. Even when a migration looks simple on paper, good canaries, routing controls, and rollback paths are invaluable under real production conditions.

02 · Treat security as a first-class requirement

For Coinbase that meant strong identity, private network paths, encrypted payloads, and controlled operator access by way of an admin service with consensus-based reviews.

03 · Start with the simplest strategy

Drain and switch works well where you can drain quickly and accept a brief pause. Reach for dual Workers and more advanced routing only where you genuinely need them.

04 · Invest early in tooling

Invest in runbooks, dashboards, and dedicated migration time. Those shared assets pay off on every migration and build trust with the product teams.

05 · Migrations are about trust as much as technology

Being transparent, data-driven, and present with the service teams during cutover matters just as much as the mechanics of any strategy.

Need help with a large-scale migration?

Kevin Phillips, Director of Systems Engineering at Bitovi

Bitovi's infrastructure and platform engineering teams have deep experience with Temporal, cloud migrations, and the operational discipline it takes to move critical systems safely. Let's talk about your migration.