Sub-500ms co-editing used to be a Google Docs flex. In 2026 it's a baseline expectation — if your collaborative editor takes a second to show another user's keystroke, users assume the product is broken. We hit and held a sub-500ms p95 across global users in Vero. This is what we learned getting there, and the trade-offs nobody warns you about.
## The latency budget
500ms total, end to end, between user A typing a character and user B seeing it. Where the budget actually goes:
| Stage | Typical budget |
|---|---|
| Client A: keystroke → operation | 5–15 ms |
| Client A → server (network) | 30–120 ms |
| Server: receive, transform/merge, persist | 20–80 ms |
| Server → Client B (network + fanout) | 30–120 ms |
| Client B: apply operation → render | 10–30 ms |
| Browser paint | 10–20 ms |
| Total p95 | ~150–400 ms |
Notice what isn't on that list: a database round-trip on the hot path. If you write the document state to Postgres before fan-out, you've already lost. The hot path goes through memory. Persistence happens asynchronously.
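One caveat when validating that budget: per-stage p95s don't simply add up to an end-to-end p95, so measure the full path — timestamp the keystroke on client A and record when client B renders it. A minimal percentile sketch (the nearest-rank method; the sample data is synthetic and illustrative):

```typescript
// Nearest-rank p95 over end-to-end latency samples (keystroke on A → render on B).
function p95(samplesMs: number[]): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Index of the 95th-percentile sample, clamped to the last element.
  const idx = Math.min(sorted.length - 1, Math.ceil(0.95 * sorted.length) - 1);
  return sorted[idx];
}

// 100 synthetic samples: 95 fast ones plus a 5-sample slow tail.
const samples = [
  ...Array.from({ length: 95 }, (_, i) => 120 + i), // 120..214 ms
  ...Array.from({ length: 5 }, (_, i) => 450 + i),  // 450..454 ms tail
];
console.log(p95(samples)); // 214
```

The point of measuring end to end is that the slow tail only has to stay below 5% of samples — individual stages can occasionally blow their budget without breaking the p95.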
## CRDTs vs OT: pick CRDT, here's why
The Operational Transform (OT) vs CRDT (Conflict-free Replicated Data Type) debate is mostly settled in 2026: CRDTs win for most teams.
OT is mathematically elegant and can be very efficient, but its correctness depends on the central server applying operations in a globally consistent order. That makes it brittle to any kind of partition, multi-region replication, or offline support.
CRDTs are designed for eventual consistency. Two clients can apply each other's edits in any order and converge to the same state. The trade-off is metadata overhead — CRDTs carry per-character or per-block metadata, which adds up.
We use a Yjs-style CRDT for Vero's document layer. The metadata cost is real but manageable; the correctness guarantee under network partition is worth the bytes.
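The convergence property is easiest to see in the simplest possible CRDT. This toy last-writer-wins map is not Vero's document CRDT (sequence CRDTs like Yjs are far more involved), but it shows the essential guarantee: apply the same ops in any order, end up in the same state.

```typescript
// Toy last-writer-wins (LWW) map: timestamps order writes; replica id
// breaks ties deterministically, so every replica resolves conflicts identically.
type Op = { key: string; value: string; timestamp: number; replica: string };

class LwwMap {
  private entries = new Map<string, Op>();

  apply(op: Op): void {
    const cur = this.entries.get(op.key);
    if (!cur || op.timestamp > cur.timestamp ||
        (op.timestamp === cur.timestamp && op.replica > cur.replica)) {
      this.entries.set(op.key, op);
    }
  }

  get(key: string): string | undefined {
    return this.entries.get(key)?.value;
  }
}

// Two clients apply the same ops in opposite orders...
const ops: Op[] = [
  { key: "title", value: "Draft", timestamp: 1, replica: "A" },
  { key: "title", value: "Final", timestamp: 2, replica: "B" },
];
const clientA = new LwwMap();
const clientB = new LwwMap();
ops.forEach(op => clientA.apply(op));
[...ops].reverse().forEach(op => clientB.apply(op));

// ...and converge to the same state, with no central ordering authority.
console.log(clientA.get("title"), clientB.get("title")); // Final Final
```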
> When OT might still make sense: extremely large documents (>10 MB) where CRDT metadata becomes painful, and you control the server topology tightly. For most products, take the CRDT.
>
> — Internal architecture review, Vero v2
## The hot-path architecture
```
                  ┌──────────────┐
Client A ◀──────▶ │   Edge WS    │
                  │   Gateway    │ ◀──────▶ Client B, C, D...
                  └──────┬───────┘
                         │  (in-memory CRDT state per doc)
                         │
                         ▼
              ┌────────────────────┐
              │ Async persistence  │ → Postgres
              │ (batched, every    │ → Object storage
              │  1–5 sec)          │    for snapshots
              └────────────────────┘
```
Two principles:
- The hot path stays in memory. The WebSocket gateway holds the live CRDT state per doc and fans out ops to subscribers immediately.
- Persistence is async and batched. Writes go to Postgres on a 1–5 second cadence, and full snapshots to object storage every minute or on document close.
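The split looks roughly like this in code. A minimal sketch, under assumptions: `fanout` and `persistBatch` are hypothetical hooks standing in for the real broadcast and Postgres paths, and the flush cadence is illustrative.

```typescript
// Hot path: apply + fan out in memory. Cold path: batch ops and persist later.
type DocOp = { docId: string; payload: string };

class DocGateway {
  private pending: DocOp[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private fanout: (op: DocOp) => void,                  // in-memory broadcast
    private persistBatch: (ops: DocOp[]) => Promise<void>, // async durable write
    private flushMs = 2000,                                // within the 1–5 s cadence
  ) {}

  receive(op: DocOp): void {
    this.fanout(op);       // hot path: no disk I/O before fan-out
    this.pending.push(op); // cold path: queue for batched persistence
    if (!this.timer) {
      this.timer = setTimeout(() => { void this.flush(); }, this.flushMs);
    }
  }

  private async flush(): Promise<void> {
    const batch = this.pending;
    this.pending = [];
    this.timer = null;
    await this.persistBatch(batch); // one durable write per batch
  }
}
```

The design choice worth noting: a crash between fan-out and flush can lose a few seconds of edits, which is the price of keeping the database off the hot path. Snapshots to object storage bound the blast radius.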
This is roughly the architecture every serious collaborative editor in production uses (Figma, Notion, Google Docs). The details differ, but the pattern doesn't.
## The presence layer is its own thing
Live cursors, "Alice is editing", typing indicators — don't put these through your document CRDT. They're high-frequency, low-importance, ephemeral. Treat them as a separate stream.
We use a presence-only channel that:
- Doesn't persist to disk.
- Doesn't transform through the CRDT.
- Drops messages older than 200ms — there's no point delivering a stale cursor position.
- Coalesces updates per user (last-wins).
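The last two rules fit in a few lines. A sketch assuming a hypothetical `{user, cursor, sentAt}` message shape; any ephemeral presence payload works the same way:

```typescript
// Presence channel: keep only the newest update per user, drop anything stale.
type Presence = { user: string; cursor: number; sentAt: number };

const MAX_AGE_MS = 200; // a stale cursor position is worthless — drop it

class PresenceChannel {
  private latest = new Map<string, Presence>(); // last-wins per user

  ingest(msg: Presence, now: number = Date.now()): void {
    if (now - msg.sentAt > MAX_AGE_MS) return; // too old: never queue it
    const cur = this.latest.get(msg.user);
    if (!cur || msg.sentAt >= cur.sentAt) {
      this.latest.set(msg.user, msg); // coalesce: newer replaces older
    }
  }

  snapshot(): Presence[] {
    return [...this.latest.values()];
  }
}
```

Because the map holds one entry per user, a burst of 50 cursor moves from one user costs subscribers exactly one render, not 50.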
## Backpressure: the thing that breaks at scale
A document with 50 simultaneous editors and a heavy paste event can produce more ops than a single client can render. Without backpressure, the client's WebSocket buffer grows, the UI stalls, and eventually the tab crashes.
What works:
- Server-side coalescing. If multiple ops on the same character range arrive within a small window, merge them before fan-out.
- Client-side debouncing of expensive renders. Apply ops to the CRDT immediately; debounce the layout/render pass at ~16ms.
- Drop-and-resync threshold. If a client falls more than N ops behind, send it a fresh snapshot and reset its op cursor. Don't try to catch up the whole queue.
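The drop-and-resync rule is a one-branch decision. A sketch where `sendOps` and `sendSnapshot` are hypothetical transport hooks and the threshold is illustrative:

```typescript
// If a client is too far behind, replaying the backlog costs more than a
// fresh snapshot — so stop replaying and resync.
const RESYNC_THRESHOLD = 500; // ops behind before we give up on catch-up

function catchUpClient(
  clientSeq: number,                  // last op the client acknowledged
  serverSeq: number,                  // latest op on the server
  sendOps: (fromSeq: number) => void, // replay the missing ops
  sendSnapshot: () => void,           // ship a fresh document snapshot
): "ops" | "snapshot" {
  const behind = serverSeq - clientSeq;
  if (behind > RESYNC_THRESHOLD) {
    sendSnapshot(); // reset the client's op cursor to serverSeq
    return "snapshot";
  }
  sendOps(clientSeq + 1);
  return "ops";
}
```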
Without backpressure, "real-time" works for two users and falls over at 20.
## The global-latency problem
A user in Sydney editing a document hosted in Frankfurt has 250–300ms of wire latency each way. Sub-500ms p95 across that geography requires either:
- Edge gateways close to users. Run the WebSocket gateway in 6–10 regions; route by user geography. The CRDT state is replicated between gateways with tolerable delay.
- Move-the-room. When a doc opens, pick the gateway closest to the majority of active editors; route everyone there.
We use move-the-room for most workloads. Edge gateways everywhere are more work but pay off if your active documents have truly global editors. Most don't.
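The room-placement decision itself is just a plurality vote over active editors' regions. A minimal sketch (region ids are illustrative; a real placement policy would also weigh inter-region latency and migration cost):

```typescript
// Pick the gateway region where the plurality of active editors sits.
function pickRoomRegion(editorRegions: string[]): string | null {
  if (editorRegions.length === 0) return null;
  const counts = new Map<string, number>();
  for (const r of editorRegions) counts.set(r, (counts.get(r) ?? 0) + 1);

  // Deterministic tie-break (lexicographic region id), so every gateway
  // that runs this computation picks the same home for the room.
  let best: string | null = null;
  for (const [region, n] of counts) {
    if (best === null) { best = region; continue; }
    const bestN = counts.get(best)!;
    if (n > bestN || (n === bestN && region < best)) best = region;
  }
  return best;
}

// Three editors in Sydney, one in Frankfurt: host the room in Sydney.
console.log(pickRoomRegion(["syd", "syd", "fra", "syd"])); // syd
```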
## What we'd skip if we built it again
A few decisions we'd change:
- Don't try to support offline editing in v1. It's a much harder problem (mobile-style sync, conflict UX, queueing) and most B2B products don't actually need it. Add it later if customers ask.
- Don't use a generic message broker (Kafka, NATS) on the hot path. They add per-hop latency. Direct gateway-to-gateway connections are simpler and faster.
- Don't expose the raw CRDT to plugins or integrations. Wrap it. Future-you will want to swap CRDT libraries.
## What you get when it works
The downstream effect of getting under 500ms isn't just "users like it." It's the disappearance of an entire class of behaviours:
- The "let me know when you're done editing so I can take a look" Slack message.
- The "merge conflict" email.
- The "wait, who changed this?" follow-up.
- The 15-minute review meeting that would have been a comment thread.
In one Vero customer org, internal email volume dropped 60% in the six months after rollout. That's not the editor doing it — it's the latency. Below a certain threshold, collaboration stops feeling asynchronous and starts feeling like sitting at the same desk.
That's the threshold worth engineering for.