Architectures for Central Server Collaboration
This blog post records some thoughts on how to architect a real-time collaborative app when you do have a central server.
The architectural techniques below come from a variety of collaborative apps and devtools, summarized in the Classification Table at the end. Additionally, the following sources serve as theoretical inspiration:
• A Framework for Convergence, Matt Wonlaw (2023).
• Collaborative Editing in ProseMirror, Marijn Haverbeke (2015).
• Operational Version Control, Jonathan Edwards (2024).
Here is a simple abstract model that describes many apps with a central server and multiple users:
• The server stores some state that changes over time. (User status, task list, multiplayer game world, etc.).
• Users interact with the state by submitting operations to the server. (“set status”, “mark task as completed”, “pick up loot”, etc.) Each time the server receives an operation, it uses app-specific business logic to validate the operation and possibly update the state.
• Occasionally, the server sends the latest state to users, so that they can update their local views.
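As a minimal sketch of this model (the types and the “set status” operation are illustrative, not from any particular app):

```typescript
// Illustrative sketch of the abstract model: a server that validates
// operations with app-specific business logic and updates its state.
type Operation = { type: "setStatus"; user: string; status: string };
type State = Map<string, string>; // user -> current status

class Server {
  state: State = new Map();

  // Returns true if the operation was accepted.
  receive(op: Operation): boolean {
    // App-specific validation: reject overly long statuses.
    if (op.status.length > 100) return false;
    this.state.set(op.user, op.status);
    return true;
  }

  // "Occasionally, the server sends the latest state to users."
  snapshot(): State {
    return new Map(this.state);
  }
}
```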
In early web apps, updating the user’s local view of the state required refreshing the page. This was true even for the user’s own operations: to perform an operation, the user’s browser would submit a form and display the returned HTML.
Later, AJAX let users submit operations and receive updates without reloading the whole page, but they often still needed to wait for a response from the server before seeing the effects of their own operations.
Real-time collaborative apps like Google Docs, Figma, and many multiplayer games instead let users see their own operations immediately, without waiting for a response from the server. These instant local updates are merely optimistic—the operation might be rejected by the server, or modified by concurrent user actions—but they nevertheless improve the user experience, by reducing user-perceived latency. Real-time collaborative apps can even choose to allow offline work, by storing operations for later submission while still updating the local state immediately.
Even before adding real-time collaboration, our simple abstract model hides an important challenge: What should the server do in situations like Figure 1, where Bob submits an operation that arrives after a concurrent operation from Alice?
Figure 1. Starting in state S, Bob submits operation B to the server. But before it arrives, the server processes Alice’s concurrent operation A, changing the state to S + A.
The issue here is that Bob expected his operation B to apply to some state S, but it was instead applied to the state S + A. I call this server-side rebasing, by analogy with git rebasing. Depending on the details of the specific situation, the original operation B might no longer make sense in state S + A.
Here are three common solutions to the server-side rebasing challenge.

# 1. Serialization

In situations like Figure 1, the server rejects Bob’s operation. His client must download the new state S + A and then retry with an updated operation B'.

I call this approach “Serialization” by analogy with SQL’s serializable isolation level.

Note that serialization does not actually solve the rebasing problem; it just defers it to clients, who must rebase the operations themselves. Also, in apps with many active users, frequent rejections can lead to performance issues and even lock out users—see StepWise’s ProseMirror blog post.
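A minimal sketch of serialization, assuming a simple version-number scheme (all names are illustrative):

```typescript
// Serialization sketch: each operation carries the state version it was
// based on; the server rejects any operation based on a stale version,
// and the client must fetch the new state and retry.
type VersionedOp<Op> = { basedOnVersion: number; op: Op };

class SerializingServer<Op> {
  version = 0;
  constructor(private apply: (op: Op) => void) {}

  // Returns true if applied, false if the client must rebase and retry.
  receive(vop: VersionedOp<Op>): boolean {
    if (vop.basedOnVersion !== this.version) return false;
    this.apply(vop.op);
    this.version++;
    return true;
  }
}
```

In Figure 1’s terms: Bob’s operation, based on the old version, is rejected because Alice’s operation already advanced the server’s version; his retry, based on the new version, succeeds.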
# 2. CRDT-ish
Operations are designed to “make sense” as-is, even when applied to a slightly newer state. That is, in situations like Figure 1, the server applies operation B directly to the state S + A and hopes that the result looks reasonable to users.
I call this approach “CRDT-ish” because CRDTs also use operations that can be applied as-is to newer states. Thus studying CRDT techniques can help you design operations that “make sense”.
Note that you don’t need to use literal CRDTs, which are often overkill in a central server app. In particular, your operations don’t need to satisfy CRDTs’ strict algebraic requirements (commutativity / lattice axioms). Whether operations “make sense” in new states is instead a fuzzy, app-specific question: it depends on your app’s business logic and what users expect to happen, and it’s okay to be imperfect.
Performance-wise, literal CRDTs were historically criticized for storing lots of metadata. This is less of a problem with modern implementations, and the central server can help if needed.
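One common CRDT-ish trick is to make operations reference stable IDs rather than indices or counts. A sketch, using the earlier “mark task as completed” operation (the task-list types are illustrative):

```typescript
// CRDT-ish sketch: operations reference tasks by stable ID, so they still
// "make sense" even if concurrent operations reordered or retitled tasks.
type Task = { id: string; title: string; completed: boolean };
type TaskState = Map<string, Task>;

function markCompleted(state: TaskState, taskId: string): void {
  const task = state.get(taskId);
  if (task !== undefined) task.completed = true;
  // If a concurrent operation deleted the task, ignoring the op is a
  // reasonable, app-specific answer; no algebraic laws required.
}
```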
# 3. OT-ish
In situations like Figure 1, the server transforms B against A, computing a new operation B' that is actually applied to the state. More generally, the server applies a transformation function T to each intervening concurrent operation in order.
I call this approach “OT-ish” because Operational Transformation (OT) systems also transform each operation against intervening concurrent operations. As in the previous section, literal OT algorithms are probably overkill. In particular, your transformation function doesn’t need to satisfy OT’s strict algebraic requirements (the Transformation Properties).
OT-ish systems are well established in practice. In particular, Google Docs is a literal OT system. However, most published info about OT focuses on text editing, which is difficult but only a small part of many apps.
Performance-wise, OT-ish systems can struggle in the face of many simultaneously active users: the server ends up doing O(# active users) transformations per operation, hence O((# active users)^2) transformations per second. Also, the server needs to store a log of past operations for transformations, not just the current state. You can garbage collect this log, but then you lose the ability to process delayed operations (e.g., users’ offline edits).
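The server’s transform-and-apply loop can be sketched as follows (the transformation function T is app-specific; the types are illustrative):

```typescript
// OT-ish server sketch: each incoming operation is transformed against
// every logged operation it did not yet observe, then applied and logged.
class TransformingServer<Op> {
  readonly log: Op[] = [];
  constructor(
    private T: (op: Op, intervening: Op) => Op,
    private apply: (op: Op) => void
  ) {}

  // basedOnLength: the log length the client had seen when it created op.
  receive(op: Op, basedOnLength: number): void {
    for (let i = basedOnLength; i < this.log.length; i++) {
      op = this.T(op, this.log[i]); // transform against each intervening op, in order
    }
    this.apply(op);
    this.log.push(op);
  }
}
```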
# Challenge: Text and Lists
There is one case where server-side rebasing is especially tricky, no matter which of the three above solutions you use: editing text and lists. (Extensions to text, like rich text and wiki pages, are likewise hard.)
Figure 2 illustrates the challenge:
Figure 2. Bob submits the operation “insert ‘ the’ at index 17” to the central server. But before his edit arrives, the server applies Alice’s concurrent operation “insert ‘ gray’ at index 3”. So it no longer makes sense to apply Bob’s operation at index 17; the server must “rebase” it to index 22.
This “index rebasing” challenge is best known for real-time collaborative apps like Google Docs, but technically, it can also affect non-real-time apps—e.g., a web form that inserts items into a list. The problem can even appear in single-threaded local apps, which need to transform text/list indices for features like annotations and edit histories.
Solutions to the index-rebasing problem fall into two camps:
• Immutable Positions (CRDT-ish). Assign each character / list element an immutable ID that is ordered and moves around together with the character. I call these positions.
Fractional indexing is a simple example: assign each character a real number like 0.5, 0.75, 0.8, …; to insert a character between those at 0.5 and 0.75, use an operation like “add character 'x' at position 0.625”, which will end up in the right place even if the array index changes. Text/list CRDTs implement advanced versions of fractional indexing that avoid its main issues.
Immutable positions are a good fit for CRDT-ish server-side rebasing, because the positions “make sense” as-is in newer states: you can always ask “what array index corresponds to this position right now?”.
• Index Transformations (OT-ish). Directly transform index 17 to 22 in situations like Figure 2, by noticing that a concurrent operation inserted 5 characters in front of it. This is normal OT-ish server-side rebasing, but it is hard because there are a lot of edge cases—e.g., how do you transform an insert operation against a delete operation at the same index?
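Both camps can be sketched in a few lines. These are illustrative only; real systems must also handle ties, deletions, and (for fractional indexing) precision limits:

```typescript
// Immutable Positions camp: positions are real numbers; insert between two
// characters by taking the midpoint of their positions.
function positionBetween(left: number, right: number): number {
  return (left + right) / 2;
}

// Index Transformations camp: rebase an insertion index past a concurrent
// insertion of `length` characters at `otherIndex`.
function transformInsertIndex(
  index: number,
  otherIndex: number,
  length: number
): number {
  return otherIndex <= index ? index + length : index;
}
```

With Figure 2’s numbers, `transformInsertIndex(17, 3, 5)` rebases Bob’s index 17 past Alice’s 5-character insertion at index 3, yielding 22.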
# 1. Server Reconciliation
One implementation technique is to take the abstract model literally: when a client receives a batch of remote operations from the server,
1. Undo all pending local operations. This rewinds the state to the client’s previous view of the server’s state.
2. Apply the remote operations. This brings the client up-to-date with the server’s state.
3. Redo the local operations that are still pending, i.e., those that were not acknowledged as part of the remote batch.
This technique is called server reconciliation and has been used in multiplayer games since the 1990s. It is the most flexible way to implement optimistic local updates, since it works for arbitrary operations and business logic.
You do need special support from your local data structures to implement step 1—e.g., “undoable” or persistent data structures—but these are easier to implement than CRDT or OT data structures. Indeed, their special properties are purely local, so your data structures don’t need to reason about concurrency or eventual consistency.
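The three steps can be sketched as follows, assuming an “undoable” local data structure (the Op type and interfaces are illustrative):

```typescript
// Server-reconciliation sketch: undo pending local ops, apply remote ops,
// redo the ops that are still pending.
interface UndoableState<Op> {
  apply(op: Op): void;
  undo(op: Op): void; // exact inverse of apply
}

function reconcile<Op>(
  state: UndoableState<Op>,
  pending: Op[], // local ops not yet acknowledged, oldest first
  remoteBatch: Op[], // new operations from the server
  acked: (op: Op) => boolean // acknowledged as part of remoteBatch?
): Op[] {
  // 1. Undo all pending local operations (most recent first).
  for (let i = pending.length - 1; i >= 0; i--) state.undo(pending[i]);
  // 2. Apply the remote operations.
  for (const op of remoteBatch) state.apply(op);
  // 3. Redo the operations that are still pending.
  const stillPending = pending.filter((op) => !acked(op));
  for (const op of stillPending) state.apply(op);
  return stillPending;
}
```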
# 2. CRDT
The CRDT way to process a remote operation is: just process it directly, as if there were no pending local operations. This works because a new remote operation is always concurrent to all pending local operations, and (op-based) CRDTs guarantee that concurrent operations commute:
The CRDT way to process a remote operation R in the presence of pending local operations L1, L2, L3. The reordered operations are equivalent (same final states) because concurrent operations commute by assumption.
CRDTs can be made quite efficient. For rich-text editing, which is a hard case, Yjs and Collabs can each support ~100 simultaneously active users with a modest server and clients, vs <32 for Google Docs (an OT system).
The downside is that you must restrict yourself to commuting concurrent operations, which are hard to come by. In practice, this means that apps often only use CRDTs provided by an expert-made library—though my blog tries to change that.
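For illustration, one of the simplest commuting designs is a last-writer-wins register: concurrent “set” operations commute because, in either order, the operation with the largest (timestamp, clientId) pair wins.

```typescript
// Last-writer-wins register sketch: apply() is order-independent for
// concurrent ops, so remote operations can be applied directly.
type SetOp = { value: string; timestamp: number; clientId: string };

class LwwRegister {
  private current: SetOp | null = null;

  apply(op: SetOp): void {
    const cur = this.current;
    if (
      cur === null ||
      op.timestamp > cur.timestamp ||
      (op.timestamp === cur.timestamp && op.clientId > cur.clientId)
    ) {
      this.current = op;
    }
  }

  get value(): string | null {
    return this.current === null ? null : this.current.value;
  }
}
```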
# 3. OT
The OT way to process a remote operation is: transform it into a new operation that has the same effect as server reconciliation, then apply the transformed operation to the optimistic local state.
The OT way to process a remote operation R in the presence of pending local operations L1, L2, L3. The reordered operations are equivalent (same final states) by a property of T.
The downside is again that you must restrict yourself to specific operations: those possessing a transformation T that satisfies the algebraic rule Transformation Property 1 (TP1).
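As a concrete (illustrative) instance, here is a transformation for concurrent text insertions. TP1 demands that applying A then T(B, A) yields the same state as applying B then T(A, B):

```typescript
// Insert operations on a string, with a transformation satisfying TP1.
type Insert = { index: number; text: string; clientId: string };

function applyInsert(s: string, op: Insert): string {
  return s.slice(0, op.index) + op.text + s.slice(op.index);
}

// Shift op right if the other insertion lands before it; break ties at
// equal indices with clientId so both sides agree on the final order.
function T(op: Insert, other: Insert): Insert {
  if (
    other.index < op.index ||
    (other.index === op.index && other.clientId < op.clientId)
  ) {
    return { ...op, index: op.index + other.text.length };
  }
  return op;
}
```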
Much work has gone into designing CRDT and OT algorithms, which are challenging because they must satisfy algebraic rules. I and others have also spent a lot of time thinking about how to structure apps so that they can tolerate CRDTs’ limited room for custom business logic.
Thus it is ironic to see that, in the centralized model used by this blog post, CRDTs and OT are merely optimizations over server reconciliation, which is straightforward and completely flexible. Moreover, CRDT/OTs’ usage by clients is unrelated to the app’s core collaboration logic, which instead takes place during server-side rebasing—a setting with no strict algebraic rules.
That said, the difficult parts of CRDTs/OT are still useful for decentralized collaboration.
We now have three high-level architectural choices when implementing this post’s abstract model of central server collaboration:
• Server-Side Rebasing. How the server makes sense of operations that arrive after concurrent operations.
• Optimistic Local Updates. How clients process remote operations when they have pending local operations.
• Form of Operations. The level and specificity of operations sent from clients → server and server → clients.
It’s possible to send mutations from the client to the server, but state changes from the server back to clients. Many multiplayer games work this way: clients send high-level user actions to the server (“walk-forward key is pressed”), which the server processes (applying collision detection etc.) and then translates into low-level world updates (“entity E moved to coordinates (x, y, z)”).
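A sketch of this mixed form (the mutation and state-change types are illustrative):

```typescript
// Clients send high-level mutations; the server runs its business logic
// and broadcasts low-level state changes.
type Mutation = { type: "walkForward"; entityId: string };
type StateChange = { type: "moved"; entityId: string; x: number; y: number };

const positions = new Map<string, { x: number; y: number }>();

// Returns the state change to broadcast, or null if the mutation is rejected.
function processMutation(m: Mutation): StateChange | null {
  const pos = positions.get(m.entityId);
  if (pos === undefined) return null; // unknown entity: reject
  // A real game would run collision detection etc. here.
  const next = { x: pos.x, y: pos.y + 1 };
  positions.set(m.entityId, next);
  return { type: "moved", entityId: m.entityId, x: next.x, y: next.y };
}
```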
CRDT and OT algorithms are the classic approaches to real-time collaboration (in practice, mostly OT). When it comes to collaborative text and lists, this is understandable, since you will need some way to solve the index rebasing challenge. However, we’ve seen that literal CRDT/OT architectures are not the only option, and in fact they are likely overkill for a central server app.