Introducing the Concurrent Compacting Clock List for collaborative editing
Collaborative editing is a tremendously useful feature in our connected world and there has historically been two different approaches to implement it: Operational Transformation (OT) or Conflict-free Replicated Data Types (CRDTs). Recently, the paper "Collaborative Text Editing with Eg-walker: Better, Faster, Smaller" by Gentle & Kleppmann introduced a slightly different approach of tackling the problem by combining the ideas of OT and CRDT.
The Concurrent Compacting Clock List (shortened as 3C-list) builds further on top of these ideas by adding logical clocks into the mix. This enables us to support sensible semantics of collaborative editing while still being efficient in the common case of no concurrent changes. In addition, it supports both selective undo and minimizing memory usage by throwing away old history.
Introduction
This first section will both introduce the example which we will be using throughout the article, and explain in very high-level terms how 3C-list relates to other techniques.
Example: A collaborative editing session
Our running example will be document which contains, among many other paragraphs, the text "Hi!" (at position 316). One user, Alice, changes it to "Hey!". Another user, Bob, sits across the Atlantic and he changes it to "Hi Sam!" before any of Alice’s changes reaches his machine. For a split second we have two different worlds: One of "Hey!" and one of "Hi Sam!". How can we reconcile this?
The natural way of representing these changes is by having events which are tied directly to a parent event (or potentially multiple of them).
When Alice deletes the "i" the system creates an event which says "the character at position 317 was deleted" together with "this event data only makes sense on server’s v14". This fundamental information makes it possible to reason about that there’s a concurrent change which has happened. We will refer to this graph of events as, unsurprisingly, the event graph.
Operational Transformation: "Events all the way"
Imagine that we’re Alice: We’ve just changed the next to say "Hey!" and then we receive the changes from Bob. These events all include the "parent event" so we can immediately tell that we can’t apply them directly in our document. In the world of operational transformation we will then attempt to rewrite the incoming events so that they make sense.
Let’s go through the processing of Bob’s first event: "Insert ' ' at 318".
-
We know that it was applied to "Hi!".
-
We first compare it with our own event applied at that state: "Delete 'i' at 317". We see here that we deleted a character that was placed before Bob inserted something. We can therefore change Bob’s event to "Insert ' ' at 317" to make it valid after this event.
-
Next we compare it with our next event: "Insert 'e' at 317". Now we actually have a choice to make: Do we want Bob’s insertion to be after or before ours? In this case it seems more sensible that it should be after so we’ll rewrite the event back to "Insert ' ' at 318".
-
Next we compare it with our next event: "Insert 'y' at 318". Once again, it’s not obvious if this belongs before or after.
This example highlights the trickiness of operational transformation: In many cases it’s obvious how to transform an event, but often we’ve lead to multiple choices. However, we can’t just pick a random choice since we need to make sure all clients eventually see the same result. This also means that Bob transforming Alice’s events would need to reconstruct the same result.
In addition, this was the work done for processing a single event from Bob. We have N more events! This leads to quadratic complexity (in terms of concurrent changes) which is famously "fast enough until it’s definitely not".
Conflict-free Replicated Data Types: "Nodes all the way"
CRDTs take a slightly different approach:
Instead of working on events such as "inserted at position X" we prepare this into an operation which has a more direct reference to what it’s working on.
For instance, the character "i" could have an identifier such as 6816@5
(where the first number is the client which created it and the second number is a local index given by that client).
When Alice deletes the "i" the system creates an operation which says "6816@5 has now been deleted" and this is what’s being shared among all of the clients.
Working on these operations is much simpler than reasoning about "what to do with position X at event Y". You can typically just put all of the operations in a list, order them based on some consistent property and then interpret the list.
However, a major headache of CRDTs is that we need keep maintaining these identifiers. Every single character becomes its own "node" and needs additional metadata. These nodes also lasts for ever: Their identifier will be used by other nodes and everything forms one big tree of dependencies. While it’s useful to keep detailed history for every action, there are many situation where we’d like to avoid this to keep storage and memory usage low (e.g. a lightweight client). CRDTs often make it non-trivial to safely do compaction.
Eg-walker and 3C-list: "Events externally, nodes internally"
Eg-walker, which 3C-list builds on top of, introduces a combination of these two approaches:
-
The information that we send between the different clients is based on events forming a graph (like OT).
-
Internally to deal with concurrency we process it in terms of nodes (like CRDT).
One of the major advantages of this approach is that if we’re not dealing with concurrency (e.g there’s actually changes from just a single user) we can fall back to applying the events in a much simpler manner.
2C-list, the primary building block
Before we describe the 3C-list in full we’ll first consider a variant (2C-list) which doesn’t do the compaction. The 2C-list takes inspiration for CRDTs: It consists of an ordered list of nodes. Each node represent an atomic piece of data (e.g. a character in a string) and is never deleted. To represent deletion we instead mark it as such while keeping it in the list.
The data structure itself is a list, but the messages that clients communicate are events. For now we’ll focus on two types of events: "Insert X at position N" and "Delete X at position N". Each event also contains a list of parent events, which then forms an event graph of every concurrent change.
Nodes, in contrast to in a CRDT, are not created with a unique identifier. Instead each event, when received by a client, gets assigned a version vector and each node is annotated with the version vector it was inserted and/or deleted at. The nodes themselves are purely internal to each client as any modification to it is represented by an event with a concrete position in the represented text.
This might be a bit easier to understand by looking at the 2C-list after Alice has done its changes:
The version vector at the beginning of this example was [14, 0, 0]. The first number is for the server, the second for Alice, and the third for Bob. Every event gets assigned its own version vector by taking its parents' event version and then incrementing the client’s entry by one.
-
Alice’s first event gets assigned [14, 1, 0], and we attach this to the "i" node as a deletion.
-
Alice’s second event gets assigned [14, 2, 0], and we attach this to the "e" node as an insertion.
-
Alice’s third event gets assigned [14, 3, 0], and we attach this to the "y" node as an insertion.
Version vectors are compared element-wise: We say that version vector x is greater or equal than version vector y if all elements of x is greater or equal than y. It’s possible for one version vector to be neither greater, equal, nor less than another, and this represents that the events were done concurrently.
The interesting thing about the 2C-list is that it’s representing every single concurrent version at the same time and we can recover any earlier version trivially thanks to the version vector.
def visible(node, version):
(node.inserted <= version) and \
((not deleted <= version) for deleted in node.deletions)
Given a specific version vector (e.g. [14, 2, 0]) we can iterate over the nodes and follow some simple rules:
-
If the node’s insertion is less than the version vector, we ignore this node since it means it was inserted after this time.
-
If one of the node’s deletion is greater or equal than the version vector, we also ignore this node since it means it’s been deleted.
-
Otherwise, we know that the node’s value was present.
This becomes quite useful when we receive the "Insert ' ' at 318" from Bob. This event has a parent version vector [14, 0, 0] and we can easily figure out what part of the list "position 318" actually refers to. Using the visibility rule we discover that both the deletion of "i" and the insertion of "e" and "y" nodes will not be observed by this event. We can then ignore these nodes when finding position 318.
However, it turns out that there’s actually multiple places which represents 318 for this version vector: Since the nodes "e" and "y" are invisible to this event it would be valid to place it anywhere around these. Following the principles of CRDT we know that there’s one rule we must follow: Where we place it has to be consistent regardless of which order the events are being processed. Another client might process the events in a different order and we want to end up with the same result. We decide to order them based on the client ID which created the event. In this case, Bob has a higher client ID so we will order it after the nodes which were created from Alice’s events. Now our 2C-list contains both changes and the text on the screen (for both users) will show "Hey Sam!".
In addition, the 2C-list separately maintains a list of "current parent events".
In this case it will be Alice: v3
and Bob: v4
.
The next event created will have these two events as its parents.
Indexing the positions
The 2C-list must be iterated from the beginning in order to correctly determine what a position refers to. However, it is possible to create an index on top of the list of nodes to speed up this operation.
For any subset of the list we can compute what the total size would be if all the insertions/deletions would be considered visible, and store it together with the minimum and maximum version vector.
For instance, in our example, after Alice’s and Bob’s changes have all been combined, we could split the list in two (before and after the space) and we would have the following entries:
-
In the first half we have the text "Hey " (of size 4) and a maximum version vector [14, 3, 1].
-
In the second half we have the text "Sam!" (of size 4) and a maximum version vector [14, 0, 4].
(Both of them would have a minimum version vector of [14, 0, 0].)
Instead of iterating over the nodes directly we can iterate of this index. If the version we’re interested in is greater than the maximum version vector we know that all nodes inside should be considered visible and we can use the pre-computed size. This might enable us to jump over a huge chunk of the list. If the version we’re interested in is in the middle of the version vectors we would have to look more deeply inside those nodes.
This index can also have multiple levels, which basically turns the whole list into a B-tree.
Collapsed nodes
An important optimization is that it’s possible to store more than a single character in a node:
These nodes are purely internal and have no identity. If we have two consecutive regular nodes with the exact same version vector we can internally store it as a collapsed node containing all the data with just a single version vector. An incoming event might arrive and point to the middle of a collapsed node, in which case we’ll have to expand/split it.
Pruning the version vector
After all of these changes then Bob stop making any further changes. His value in the version vector is now 4, and Alice has received all of this events. In addition, Alice learns that every other client has received Bob’s final change at v4. This implies that every new event that will happen in the system is guaranteed to have a version vector where Bob’s version is greater or equal to 4. This empowers us to remove Bob from our version vectors completely:
This is because this value now no longer gives us any extra information. Every future event will compare greater or equal for this value against every existing event. It doesn’t have any impact on whether we consider an internal node visible or not. Together with collapsing, this enables us to forget about the history and reclaim memory:
We can actually do something more interesting than just removing it: For our existing version vectors we can set Bob’s value to any number in the range 0-4. This enables us to do partial pruning of history. We could for instance round down every value to the closest number of ten. A lot of nodes would then be collapsable, but we still keep some information about the order in which changes were done.
What’s powerful here is that pruning the version vector is a choice for each client. One client could decide to never prune any version vectors and keep a full history of every client. Another client might prune version vectors that come from other clients, but keep their own version vectors as-is so they know about all of their own changes.
3C-list, the 2C-list with compaction
The 2C-list shares a disadvantage with a classical CRDT: It’s stored as a list of nodes, one per character, and the insertion of an event can be costly: If we’re working on a text with 100k characters we might have to loop over a significant number of nodes to find the right position. The 3C-list builds on top of the 2C-list to optimize for the common case of there not being any concurrent events.
A 3C-list is a list of (ordered) compacted nodes, where each compacted node consists of:
-
A mutable text field.
-
A C2-List.
-
A list of events which has been applied to the text (but is not part of the C2-List).
For Alice, the C3-List starts with having just "Hi!" as the initial C2-List and no events. When she changes the text to "Hey!" we end up with the following C3-List:
The key observation here is that we’re never invoking any of the C2-List logic. Instead we’re applying the events directly on the mutable text. In addition, we just need references to the events. They might already have been stored somewhere else (e.g. on disk).
Each compacted node has two version vectors:
-
The meet is the smallest version vector of the C2-List.
-
The join is the version vector of the last event.
When we have an incoming event with a given version vector we need to determine how it’s related compared to each compacted node in order to resolve its position value. There’s three possibilities:
-
The incoming version vector is greater than the join. In this case we know that the full contents of the compacted node was visible for this event. If the event touches on something in the middle of the compacted node we can then apply it directly on the text and append the event to compacted node. This will extend the join of the compacted node further.
-
The incoming version vector is less than the meet. In this case we know that the whole contents of the compacted node (both the C2-List and all events) should be invisible to the incoming event.
-
Otherwise, we know that the event touches on something which happened concurrently with the compacted node.
In the third case we have a bit more work to do. First we need to flatten the compacted node: If there’s any events currently stored in the compacted node we apply them all (from the top to bottom) to the C2-List, and then clear the list. Now this compacted node is just a regular C2-List. Once we’ve done this we can apply the incoming event directly on the internal C2-List and then update the mutable text.
There’s also one optimization to be aware of: When flattening an event into the C2-List we need to know the local position it’s targeting. This is dependent on the version vector of the event and finding the position would normally involve traversing the list from the beginning. However, we’re already calculating this index the first time we discovered that this event belongs to this node. If we store both a reference to the event and the local position it should be applied to we don’t have to re-calculate this.
Merging, splitting, flattening compacted nodes
We actually have quite a lot of flexibility around the compacted nodes: Any flattened compacted node can easily be split up and merged with other flattened nodes around it. These only contain C2-List (which again are just a list of regular nodes) and we can always draw a different set of boundaries around these nodes.
Pruning version vectors with compacted nodes
It’s possible to prune version vectors in C3-List as well, but there’s now one additional condition: You can only reset a value of a version vector in a compacted node’s C2-List if it’s less or equal than the value in all pending events. In reality, this is actually the same condition as before. We care about "all the future events that can be applied towards the C2-List", it’s just that now we have two source of future events: Events from other clients and the pending events being stored internally.
It’s only delaying the processing of events
It’s important to remember that when the C3-List decides to apply an event directly on the mutable text, we are essentially just delaying the computation of the concurrency-aware C2-List. If we apply thousands events in this manner (because they are all non-concurrent) and then later a concurrent event is received, we still have to apply all of these thousands events from scratch.
With this in mind, if we have a big flattened compacted node and there’s an incoming event that targets the middle of the list, it might actually be preferable to split up compacted node into smaller ones. Imagine that the thousand events were actually done in ten different places (with hundred changes in each place). If we split up that into ten different compacted node first, it means that once a concurrent change happen in one place we don’t have to replay the events in the other places.
Taking advantage of a server
In most practical use cases of collaborative editing we have a server. Users want their content to be stored remotely in a safe location. 3C-list can take advantage of this in a few ways.
Sync events
The server acts a special client which only produces sync events. This is an event which doesn’t have any content in itself, but has N parent events: The latest event per client. This serves as a way for the server to tell the clients what has been reliably stored on the server.
After Alice and Bob has both exchanged all of their changes to the server, the following happens:
-
The server makes sure that all the changes have been safely stored.
-
The server sends a sync event (v15) with
Alice: v3
andBob: v4
as its parents. This gets assigned the version vector [15, 3, 4]. -
Alice’s and Bob’s "current parent event" list gets replaced with
Server: v15
. This is because the existing events in that list (Alice: v3
andBob: v4
) are both smaller (i.e. contained) than the latest server event. -
The UI can now show that their changes has been persisted (no spinner).
Using the server for safely pruning version vectors
The server is also capable of sharing the information which enables us to prune version vector:
-
Every client, when receiving a sync event will acknowledge back to the server that it’s been received. This implies that all future events from this client will be greater than the version vector for this event.
-
Once the server has received acknowledgements from all connected clients it can now broadcast that this sync event has reached consensus.
-
Clients can now prune their lists based on the version vector of that sync event.
Selective undo
Looking at the final 2C-list after Alice’s and Bob’s changes have been applied we notice that we can also do a selective undo. Instead of processing the 2C-list with a version vector, we’re going to do a selective undo for Alice at version N. Forget everything you know about version vectors, now the only condition we care about is (1) was the insertion/deletion done by Alice and (2) was it done before the given version N. If so, we consider it "visible".
Applying this process with Alice: v2
would end up constructing "H Sam!" which is indeed the sensible result of doing a partial undo of Alice’s changes while keeping Bob’s.
Comparison to Eg-walker
Eg-walker also uses the concept of sending events with parents and positions, and then interprets it locally using CRDT-like semantics.
However, the internal data structure is slightly different.
The 2C-list uses version vectors so that it can determine which part of the document should be visible for any incoming event.
This "visibility" information is used for resolving the position that’s part of the event.
Eg-walker on the other hand uses a single boolean state (i.e. visible: true/false
) which represents that state for a specific point in the event graph, and will update this state by walking the graph backwards and forwards.
The visibility state is annotated directly on top of an existing CRDT.
There’s three different operations which Eg-walker invokes internally:
-
We can apply an event. This looks up the position using the current visibility flags, applies the event on the CRDT while marking any changes as being visible.
-
We can retreat an event. This reverts the visibility of the nodes in the CRDT for the given event.
-
We can advance an event. This re-applies the visibility of the nodes in the CRDT for the given event.
In our example, the following will happen on Alice’s computer:
-
Alice’s events are all applied directly.
-
When Alice receive Bob’s first event she discovers it’s not the parent of the current visible version. It therefore retreats Alice’s events all the way back to the parent.
-
At this point the CRDT still contains all of Alice’s changes, but the visibility flags represents the document "Hi!".
-
It’s now possible to interpret the positions in Bob’s event to determine where in the CRDT the changes were done.
-
Eg-walker can then apply all of Bob’s events.
-
At the end we need to advance Alice’s events again so that the visibility flags are in sync with the CRDT.