0x018 - CRDTs 🔃

I know it's been a while since my last post. I’ve just relocated to Dallas for a little break from life on the road. If you're around, I'm always up for a coffee chat 😄

CRDTs

Synonyms: Conflict-Free Replicated Data Types. Related term: (strong) eventual consistency.

Who is this issue for?
- Developers working on collaborative applications and distributed databases/storage.

TL;DR:

  • Problem: It’s hard to keep replicated data consistent in a distributed system - and it’s even harder to do so without a centralized server.
  • Solution: A family of data structures and algorithms that merge conflicting data into a consistent result, without a centralized server.
  • In Sum: Think of it like a version-controlled data structure, widely used in collaborative apps and databases you're familiar with.

How does it work? 💡

Imagine that you and your colleague are building your own Google Docs-like product. At some point, some users collaborate together and change the same word. Maybe they transform something colloquial to something more formal, or just fix a typo. How do you resolve this inconsistency? What if they both lose connectivity at the same time, or a third colleague joins the party and introduces a third edit? What do you do now?

The answer is simple: CRDTs.

Each (state-based) CRDT comes with a merge function that, when faced with two different states, merges them so that every party ends up with the same result. For the mathematically inclined, this merge function is always commutative, associative, and idempotent, so updates can arrive in any order, in any grouping, and more than once without changing the outcome.
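To make the three merge properties concrete, here is a minimal sketch of a grow-only counter (often called a G-Counter), written in Python. The function names and the dict-based state are assumptions chosen for illustration; they are not taken from any particular library.

```python
def increment(state: dict, replica_id: str, amount: int = 1) -> dict:
    """Return a new state in which `replica_id` has counted `amount` more."""
    new_state = dict(state)
    new_state[replica_id] = new_state.get(replica_id, 0) + amount
    return new_state


def merge(a: dict, b: dict) -> dict:
    """Merge two states by taking the per-replica maximum."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def value(state: dict) -> int:
    """The counter's value is the sum of every replica's count."""
    return sum(state.values())


alice = increment(increment({}, "alice"), "alice")  # alice counted 2
bob = increment({}, "bob", 3)                        # bob counted 3
carol = increment({}, "carol")                       # carol counted 1

# Commutative: the order of the two arguments does not matter.
assert merge(alice, bob) == merge(bob, alice)
# Associative: the grouping of merges does not matter.
assert merge(merge(alice, bob), carol) == merge(alice, merge(bob, carol))
# Idempotent: merging the same state twice changes nothing.
assert merge(alice, alice) == alice

print(value(merge(alice, bob)))  # 5
```

Because each replica only ever writes to its own slot, taking the maximum per slot never loses an increment, no matter how often or in which order states are exchanged.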

Consider the Last-Writer-Wins (LWW) CRDT as one example (there are many). In this CRDT, the most recent update to a piece of data takes precedence. In our Google Docs example, when both people edit the same word, the system timestamps each edit. The LWW approach then keeps the edit with the later timestamp and discards the other, resolving the conflict in favor of the most recent update. “Ok”, you might say, “that’s pretty trivial, what’s the big deal?” - Great question.
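As a rough illustration of LWW, here is a small Python sketch of a last-writer-wins register. The `LWWRegister` type, its field names, and the use of a replica ID to break timestamp ties are assumptions made for this example; the sketch also assumes a single replica never issues two writes with the same timestamp.

```python
from typing import NamedTuple


class LWWRegister(NamedTuple):
    timestamp: float  # when the write happened (supplied by the writer)
    replica_id: str   # hypothetical writer ID, used only to break ties
    value: str


def merge(a: LWWRegister, b: LWWRegister) -> LWWRegister:
    """Keep the write with the later (timestamp, replica_id) pair."""
    if (a.timestamp, a.replica_id) >= (b.timestamp, b.replica_id):
        return a
    return b


alice_edit = LWWRegister(timestamp=100.0, replica_id="alice", value="gonna")
bob_edit = LWWRegister(timestamp=105.0, replica_id="bob", value="going to")

# Both replicas reach the same answer regardless of merge order.
assert merge(alice_edit, bob_edit) == merge(bob_edit, alice_edit)

print(merge(alice_edit, bob_edit).value)  # going to
```

Note that Alice's edit is silently dropped. That is the trade-off of LWW: it always converges, but it does so by discarding the losing write.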

Some CRDTs are simple in theory, but this simplicity is precisely their strength: they provide an elegant solution for data consistency in distributed systems, without the need for complex locking mechanisms or centralized control.

Keep in mind that CRDTs were only formalized in 2011, so implementations and papers are still coming out.

PS: If you want a deeper (and interactive) dive into CRDT mechanics, I wholeheartedly recommend you read “An interactive intro to CRDTs”.

Questions ❔

  • CRDTs sound a lot like OT (Operational Transformation). What’s the difference?
    • OT-based systems only seem to work well when there is a centralized server. If you want a P2P system, CRDTs work much better - read more here, and see a HN discussion here too.
  • What happens when there is a tie (i.e. two LWW timestamps are the same)?
    • There must be some kind of tie-breaker, which can be arbitrary as long as every replica applies the same deterministic rule (for example, comparing replica IDs). Logical clocks are also often used in this context.
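The tie-breaking answer above can be sketched with a Lamport-style logical clock. This is a minimal Python illustration, not a production design: the `LamportClock` class and its method names are hypothetical, and the replica ID is used as an arbitrary but deterministic tie-breaker.

```python
class LamportClock:
    """A logical clock: a counter that only moves forward."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance the clock for a local event and return the new time."""
        self.time += 1
        return self.time

    def observe(self, remote_time: int) -> int:
        """Fold in a timestamp received from another replica."""
        self.time = max(self.time, remote_time) + 1
        return self.time


alice, bob = LamportClock(), LamportClock()

a_stamp = (alice.tick(), "alice")  # (1, "alice")
b_stamp = (bob.tick(), "bob")      # (1, "bob"): same counter value, a tie

# Tuple comparison falls back to the replica ID when the counters match,
# so every replica picks the same winner.
winner = max(a_stamp, b_stamp)
print(winner)  # (1, 'bob')

bob.observe(a_stamp[0])            # bob has now seen alice's write
later = (bob.tick(), "bob")        # ordered after both earlier writes
assert later > a_stamp and later > b_stamp
```

Unlike wall-clock time, the counter cannot jump backwards or drift between machines, which is one reason logical clocks are a common choice here.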

Why? 🤔

  • Collaborative Applications: Allowing everyone to see the same eventual result even in the case of offline scenarios.
  • Security & Privacy: When a central server may pose security or privacy risks, CRDTs offer a conflict resolution mechanism without relying on centralization. The state is localized to all clients and can be transmitted peer-to-peer.
  • Real-time Applications: Like in chat applications, where message order is important.
  • Scalability: Not having a centralized server (bottleneck) as a requirement to enforce consistency means that you can scale much more easily with CRDTs. By contrast, server-coordinated editors such as Google Docs cap the number of simultaneous collaborators.
  • Fault-tolerant: CRDTs enable maintaining functionality during temporary node disconnects as well as achieving consistency automatically when connectivity is restored.

Why not? 🙅

  • Storage Overhead: CRDTs typically need extra metadata to function, which can add 1.5x to 2x the size of the original contents (in memory!).
  • Centralization: If you already rely on a centralized server or database, a solution like OT might be more suitable.
  • Eventually Consistent: In CAP terms, CRDTs favor availability and partition tolerance (AP) over strong consistency (CP). So they might not be well-suited for applications like banking, where an account must never drop below $0.
  • Cheating: I haven’t found any serious multiplayer gaming implementations that use CRDTs, because it’s hard to prevent cheating without a centralized server.
  • Validation: Complex, custom data structures sometimes need specialized CRDTs. Writing your own CRDT is a pretty complex task. You can't simply take a text CRDT and put JSON or XML data inside it - you risk ending up with corrupted data.
  • CRDTs Are New: There aren't many mature CRDT libraries used by a lot of projects. The ones that exist today don't integrate well with databases, and many have poor performance.
  • Memory consumption: CRDTs usually need the entire CRDT state to be loaded in memory while editing occurs. And the entire history of a document often ends up being stored indefinitely. This can scale poorly. You might need to experiment a bit before deploying a CRDT to make sure the library you choose will work well in production!
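The “Eventually Consistent” point above is easier to see with a small example. The Python sketch below models an account as a PN-Counter (two grow-only counters, one for deposits and one for withdrawals). The `Account` class and the ATM scenario are hypothetical; the point is only that a check made locally cannot see concurrent updates on other replicas.

```python
def merge(a: dict, b: dict) -> dict:
    """Element-wise max of two grow-only maps (replica ID -> running total)."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


class Account:
    """PN-Counter: deposits and withdrawals are two grow-only counters."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.deposits: dict = {}
        self.withdrawals: dict = {}

    def balance(self) -> int:
        return sum(self.deposits.values()) - sum(self.withdrawals.values())

    def deposit(self, amount: int) -> None:
        mine = self.deposits.get(self.replica_id, 0)
        self.deposits[self.replica_id] = mine + amount

    def withdraw(self, amount: int) -> bool:
        # Local check only: this replica cannot see concurrent withdrawals.
        if amount > self.balance():
            return False
        mine = self.withdrawals.get(self.replica_id, 0)
        self.withdrawals[self.replica_id] = mine + amount
        return True

    def sync(self, other: "Account") -> None:
        self.deposits = merge(self.deposits, other.deposits)
        self.withdrawals = merge(self.withdrawals, other.withdrawals)


atm_a, atm_b = Account("atm_a"), Account("atm_b")
atm_a.deposit(100)
atm_b.sync(atm_a)            # both replicas now see a balance of 100

ok_a = atm_a.withdraw(80)    # offline, passes the local check
ok_b = atm_b.withdraw(80)    # offline, also passes the local check

atm_a.sync(atm_b)
atm_b.sync(atm_a)
print(ok_a, ok_b, atm_a.balance(), atm_b.balance())  # True True -60 -60
```

Both replicas converge to the same balance, so the CRDT did its job. The business rule "never below $0" was still violated, because enforcing it requires coordination between replicas before the withdrawal is accepted.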

Tools & players 🛠️

🤠
My opinion: Remember that CRDTs are a concept, and there are many different types and implementations. You will need to choose the one that best fits your use case. Automerge seems like an industry standard at this point and supports several languages. Remember too that CRDTs aren’t only for collaborative text editing: they also cover structured data such as counters, sets, maps, and JSON-like documents.

Forecast 🧞

  • Education: With more and more collaborative applications coming out, the need for strong elegant solutions for conflict resolution is on the rise. This means that more developers, like you, need to understand and know about them. I think we’ve seen a boom in such real-time collaborative applications already, and we’ll see this trend continue (to see CRDTs in the wild, refer to the “Who uses it?” section).
  • Privacy and security: CRDTs enable more use cases for peer-to-peer applications that don’t require a central server. I expect to see more technology built around them that pushes the envelope in privacy- and security-focused applications.
  • IoT (Internet of things): CRDTs appear well-suited for IoT devices with unstable connections, presenting a promising solution.
  • A Blockchain Complement: Because CRDTs provide consistent conflict resolution without heavy computational requirements, we might see more hybrid applications that combine CRDTs with blockchains. I’ve already seen some papers on the topic.
  • Offline applications: Considering an increasing sensitivity to privacy among consumers and a potential trend towards “Local-first software”, I can see how CRDTs can play a major role as a technological backbone to many of those offline systems.
  • Other buzzwords that pop to mind: Healthcare data synchronization, and Decentralized Supply Chain Management - you’re welcome to think about them yourself ;)

Examples ⚗️

Try it out yourself?

Who uses it? 🎡

Extra ✨

Thanks 🙏

I want to thank Seph Gentle (creator of diamond-types, ShareDB and ShareJS), who was super kind and reviewed the heck out of this issue, and Tom Granot (the best technical product marketer I know), our regular here on Unzip.

EOF

(Where I tend to share unrelated things).

  • If you haven't checked out 'Up for Grabs' yet, Tom and I have shifted our focus towards creating a YouTube channel aimed specifically at developers - that's right, you! We'll be sharing all our digital business ideas (which we can't pursue due to time constraints), completely free. All episodes will be much more technical than before and hopefully will help some of y’all to create some amazing products.
  • I'm personally stoked for the next issue; hopefully I'll have it ready in 2–3 weeks.
❤️
(Helping out a friend) @hunter is a solo maker who created UserSketch, which lets you chat with your own SaaS data using AI - show him some love.