CRDT Sensibility

By David VanDusen.

In their first week with Commit, each new Engineering Partner takes on a hackathon onboarding project. They build a project, present it to the Commit community, and write a short summary of their experience. There are no restrictions, no limits, no joke.

I decided to spend my first week at Commit developing a sensibility for working with CRDTs (commutative/conflict-free replicated data types) in the context of a collaborative document editing application.

Why CRDTs

Around 2018/19, there was growing buzz about technologies that enable near real-time collaborative editing of documents in web apps without some of the drawbacks of other approaches, like Operational Transformation (does anybody remember Google Wave?).

I recall watching "CRDTs and the Quest for Distributed Consistency," reading "How Figma’s multiplayer technology works," and playing with Room.sh at around the same time.

The experience of building with CRDTs

I test-drove a popular CRDT library for JavaScript called Yjs by writing a simple app where users can modify a document whose data is represented as a tree. If the app is still online, it can be seen at hop.figureandsound.com.

After a couple of snags and gotchas, I wrapped my head around the concepts pretty quickly. Here are my takeaways about working with Yjs.

There are limitations on the operations that can be performed

You can only do inserts and deletes. Moving a node is a delete plus an insert. Each operation gets a unique ID (in the form of a Lamport timestamp), but nodes are not uniquely identifiable, so there aren’t operations for moving them around the data structure. If the data structure is a tree, moving a node deletes the whole subtree under it and reinserts a clone of that subtree at the new parent.
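To make the "move is a delete plus an insert" point concrete, here is a minimal sketch in TypeScript. It is a toy model, not the Yjs API: `Replica`, `OpId`, `TreeNode`, `cloneTree`, and the path-based addressing are all hypothetical names chosen for illustration. The IDs are a client number paired with a counter, in the spirit of a Lamport timestamp; a real Lamport clock also advances when remote operations arrive, which this single-replica sketch leaves out.

```typescript
// Toy model for illustration only; none of these names come from Yjs.
type OpId = { client: number; clock: number }; // Lamport-style operation ID
type TreeNode = { label: string; children: TreeNode[] };
type LoggedOp = { id: OpId; kind: "insert" | "delete"; parentPath: number[]; index: number };

// Deep copy of a subtree; the copy shares no objects with the original.
function cloneTree(node: TreeNode): TreeNode {
  return { label: node.label, children: node.children.map((child) => cloneTree(child)) };
}

class Replica {
  private clock = 0;
  readonly client: number;
  readonly log: LoggedOp[] = [];
  readonly root: TreeNode = { label: "root", children: [] };

  constructor(client: number) {
    this.client = client;
  }

  // Every operation gets a fresh (client, clock) pair.
  private nextId(): OpId {
    this.clock += 1;
    return { client: this.client, clock: this.clock };
  }

  // Walk child indexes from the root to find a node.
  private resolve(path: number[]): TreeNode {
    let node = this.root;
    for (const i of path) node = node.children[i];
    return node;
  }

  insert(parentPath: number[], index: number, node: TreeNode): void {
    this.resolve(parentPath).children.splice(index, 0, node);
    this.log.push({ id: this.nextId(), kind: "insert", parentPath, index });
  }

  remove(parentPath: number[], index: number): TreeNode {
    const removed = this.resolve(parentPath).children.splice(index, 1)[0];
    this.log.push({ id: this.nextId(), kind: "delete", parentPath, index });
    return removed;
  }

  // There is no move primitive: delete the subtree, then insert a clone of it.
  // The target path is resolved after the delete, so callers must allow for shifts.
  move(fromPath: number[], fromIndex: number, toPath: number[], toIndex: number): TreeNode {
    const copy = cloneTree(this.remove(fromPath, fromIndex));
    this.insert(toPath, toIndex, copy);
    return copy;
  }
}

const replica = new Replica(1);
replica.insert([], 0, { label: "A", children: [] });
replica.insert([], 1, { label: "B", children: [] });
const original: TreeNode = { label: "A1", children: [{ label: "A1a", children: [] }] };
replica.insert([0], 0, original);

// Move A1 (and its child A1a) from under A to under B.
const moved = replica.move([0], 0, [1], 0);

console.log(replica.log.map((op) => `${op.kind}@${op.id.client}:${op.id.clock}`));
// → ["insert@1:1", "insert@1:2", "insert@1:3", "delete@1:4", "insert@1:5"]
```

The log ends with a delete and an insert rather than a single move, and `moved` is a different object from `original`: as far as the data structure is concerned, the node under B is brand new.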

Think through the lifecycle of a document

Depending on how many users are likely to edit a document concurrently, whether they’ll be doing so offline and syncing later, and how deeply nested the data structure is likely to get, there may be risks of (perceived) data loss. Of course, since the CRDT preserves all operations, no data is truly lost, but without careful design of the user experience, I can imagine scenarios that would cause users to panic upon seeing their changes disappear during a sync.
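One such scenario can be replayed with a toy operation log. The sketch below is an assumption about the shape of the problem, not a description of how Yjs is implemented: the `Op` type, the `"client:clock"` ID strings, and the `materialize` and `visibleLabels` helpers are hypothetical. Alice moves a node (delete plus insert of a clone with a new ID) while Bob, offline, renames the original.

```typescript
// Toy operation log for illustration only; none of these names come from Yjs.
type Id = string; // hypothetical "client:clock" encoding of an operation ID

type Op =
  | { kind: "insert"; id: Id; parent: Id | null; label: string }
  | { kind: "delete"; target: Id }
  | { kind: "rename"; target: Id; label: string };

interface Entry { parent: Id | null; label: string; deleted: boolean }

// Replay a set of operations into node entries. Inserts are applied first,
// then renames, then deletes, so the result does not depend on arrival order.
// (Concurrent renames of one node would need a tiebreak; omitted here.)
function materialize(ops: Op[]): Record<Id, Entry> {
  const nodes: Record<Id, Entry> = {};
  for (const op of ops) {
    if (op.kind === "insert") nodes[op.id] = { parent: op.parent, label: op.label, deleted: false };
  }
  for (const op of ops) {
    if (op.kind === "rename" && nodes[op.target]) nodes[op.target].label = op.label;
  }
  for (const op of ops) {
    if (op.kind === "delete" && nodes[op.target]) nodes[op.target].deleted = true;
  }
  return nodes;
}

// Labels a user would see: nodes that are not deleted and have no deleted ancestor.
function visibleLabels(ops: Op[]): string[] {
  const nodes = materialize(ops);
  const isVisible = (id: Id | null): boolean => {
    if (id === null) return true; // reached the top level
    const node = nodes[id];
    return node !== undefined && !node.deleted && isVisible(node.parent);
  };
  return Object.keys(nodes).filter((id) => isVisible(id)).map((id) => nodes[id].label).sort();
}

// History both users already share.
const shared: Op[] = [
  { kind: "insert", id: "s:1", parent: null, label: "Drafts" },
  { kind: "insert", id: "s:2", parent: null, label: "Archive" },
  { kind: "insert", id: "s:3", parent: "s:1", label: "Notes" },
];

// Alice moves "Notes" into "Archive": delete the original, insert a clone with a new ID.
const alice: Op[] = [
  { kind: "delete", target: "s:3" },
  { kind: "insert", id: "a:1", parent: "s:2", label: "Notes" },
];

// Bob, offline, renames the original node.
const bob: Op[] = [{ kind: "rename", target: "s:3", label: "Notes (Bob's edit)" }];

const bobBeforeSync = visibleLabels(shared.concat(bob));
const merged = visibleLabels(shared.concat(alice, bob));
const mergedOtherOrder = visibleLabels(shared.concat(bob, alice));

console.log(bobBeforeSync); // → ["Archive", "Drafts", "Notes (Bob's edit)"]
console.log(merged);        // → ["Archive", "Drafts", "Notes"]
```

Bob's rename is still in the log, and both merge orders converge on the same result, but his edit is attached to the deleted original rather than to Alice's clone. From his side of the screen, the change simply vanishes on sync, which is the kind of moment the user experience needs to anticipate.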

Understand the options and the ecosystem

It’s great to see that there is ongoing research into these technologies. The algorithms and data structures are continually being optimized. New network and persistence plugins are being added to the Yjs project. Choosing any CRDT implementation means joining an active community of researchers who are working on making this technology more mainstream.

Another tool in my tool belt

I would encourage any web developer to give Yjs a spin. Reading the paper and other supporting materials doesn’t take a big commitment, and making a simple app is all it takes to get a feel for how to use it. I look forward to getting a chance to apply the technology to a more ambitious project.