February 28, 2024

#4 – Martin Kleppmann: CRDTs, Automerge, generic syncing servers & Bluesky

Transcript

0:00:00 Intro

0:00:00 So much work in building a web app goes into reinventing this backend

0:00:04 infrastructure that every single company has to reinvent again.

0:00:07 And so if we can make the data sync protocol and the data storage on the

0:00:12 servers efficient, like loading and synchronization of large collections of

0:00:16 documents, all of that can be generic.

0:00:18 So if one person is then building a graphics app and another person is

0:00:21 building a spreadsheet and another person is building a document

0:00:24 editor, they can all use the same syncing service as the backend.

0:00:29 That I think is part of the economic value proposition of local-first software.

0:00:34 Welcome to the localfirst.fm podcast.

0:00:37 I'm your host, Johannes Schickling, and I'm a web developer, a startup founder, and

0:00:41 love the craft of software engineering.

0:00:43 For the past few years, I've been on a journey to build a modern, high quality

0:00:47 music app using web technologies.

0:00:49 And in doing so, I've been falling down the rabbit hole of local-first software.

0:00:54 This podcast is your invitation to join me on that journey.

0:00:57 In this episode, I'm speaking to Martin Kleppmann, who is one of the authors

0:01:01 of the original local-first essay.

0:01:04 Martin has been exploring local-first software and CRDTs for over 10 years, which

0:01:09 has led to the creation of Automerge, which we discuss in depth in this episode.

0:01:14 We are also exploring the ideas of a generic sync server and the

0:01:17 impact this technology could have on local-first software in the future.

0:01:21 I also have a very special announcement today as I'm co-organizing the

0:01:25 world's first local-first Conference.

0:01:27 It will happen on May 30th in Berlin, and I would love to see you there in person.

0:01:32 Go ahead and grab your tickets on localfirstconf.com.

0:01:35 Before getting started, also a big thank you to Expo and Crab

0:01:39 Nebula for supporting this podcast.

0:01:41 And now my interview with Martin.

0:01:45 Hello, welcome, Martin.

0:01:46 Thank you so much for coming to the podcast.

0:01:49 Hi, Johannes.

0:01:49 Thank you for having me.

0:01:51 CRDTs

0:01:51 I'm super excited to have you on the show.

0:01:54 You're obviously no stranger in the local-first world.

0:01:57 Um, but would you briefly mind introducing yourself?

0:02:01 Uh, yeah, sure.

0:02:01 So I'm just recently an associate professor at the University of Cambridge.

0:02:07 I've been at Cambridge for quite a long time, but for a long time that was

0:02:10 like on fixed term academic contracts.

0:02:12 So this is my first permanent university position, which is nice.

0:02:16 It means I can keep doing this stuff long term.

0:02:19 Um, yeah, previous to that, I did in some past life work on startups.

0:02:24 I sold a startup to LinkedIn back in 2012, but then shifted over to academia.

0:02:29 Amazing.

0:02:30 And so you're one of the coauthors of the local-first paper that was

0:02:35 published on the Ink & Switch site.

0:02:38 So, I think most people in the local-first space are familiar

0:02:42 with that, but I'm very curious, what is your personal story behind you sort

0:02:49 of finding your way to local-first?

0:02:52 Yeah, there is, it probably starts about 2013 or so, fairly shortly after we had

0:02:57 sold the startup to LinkedIn and I was at LinkedIn, but our project got canceled.

0:03:03 And so I was kind of looking around for new things and I came across this

0:03:08 paper from Marc Shapiro and colleagues on conflict-free

0:03:12 replicated data types, or CRDTs.

0:03:14 I can't remember how I came across it.

0:03:15 Maybe somebody put it on Twitter or something like that.

0:03:17 And I read this thing and I was really intrigued by it because I felt that, you

0:03:21 know, this seemed like a way of making the software a bit less cloud dominated.

0:03:27 I had got a bit frustrated with the whole startup world.

0:03:30 You know, as I was doing web based stuff, social media stuff, it's all very much

0:03:35 like centralized services, which put all of the user's data in one big database.

0:03:41 And I was just a bit uncomfortable with that.

0:03:43 I felt like, it's not really in the user's interest.

0:03:46 Obviously it's in the company's interests to try to collect as much data as they can

0:03:50 and monetize it in whatever way they can.

0:03:52 But for users, it's not really great.

0:03:54 And so I was sort of trying to overcome my unease with this by looking at

0:03:58 technological solutions that might help.

0:04:01 And so then I came across these CRDTs, which seemed like it could be

0:04:04 a part of the answer to the problem.

0:04:06 I felt like it was a way you could make software that would run on the user's

0:04:11 device and store the data locally on the user's device where

0:04:14 nobody can take it away from you.

0:04:16 And at the same time have all the conveniences of cloud software with

0:04:20 like real time collaboration and sync across all of your devices

0:04:23 and being able to easily share data with other users and so on.

0:04:27 So that was kind of the core of the local-first idea there already,

0:04:31 but it then took us several more years before we really were able to

0:04:35 articulate it clearly enough ourselves.

0:04:37 And so then.

0:04:39 I can't remember when the local-first paper came out.

0:04:41 Was it 2019 or something like that?

0:04:43 So yeah, about 2014 I left LinkedIn for a year.

0:04:48 I spent on sabbatical writing my book and then 2015 I joined the university

0:04:52 and started working on CRDTs myself and then started like gradually

0:04:56 building up the technical foundations.

0:04:59 It then still took us quite a long time before we had like really articulated it.

0:05:03 but all that time was gradually working towards what we now call local-first.

0:05:08 I'm curious whether there were like any particular milestones you've

0:05:12 reached during those early research years, or were there moments

0:05:16 where you thought you hit some walls and you thought this was a dead end.

0:05:20 I mean, with any kind of research, it's always like lots of little

0:05:24 dead ends and then getting out of them and trying other things again.

0:05:27 But I think that's just part of the normal process.

0:05:30 So I'm not going to pretend it was smooth in any way.

0:05:33 Obviously, there's lots of things that didn't work along the way, but also

0:05:36 most of them are sort of in retrospect, kind of don't matter too much.

0:05:40 So like, you know, we have very detailed discussions about

0:05:44 how a particular merge behavior should work, for example.

0:05:46 So if one user makes one change to a document, another user on a different

0:05:50 device makes a different change, we need to merge those things together.

0:05:52 You know, you can have hours and hours of debate about how

0:05:55 precisely that should work.

0:05:56 But then in the end, once you've settled on an answer and that answer seems to be

0:06:00 broadly okay, then, you know, then the question just becomes uninteresting and

0:06:04 we move on to more interesting things.

0:06:06 So yeah, there's, there's lots of that kind of things along the way.

0:06:10 And like a lot of changes we made to the implementations of these things,

0:06:13 like the software evolved a lot.

0:06:16 Like my first CRDT implementation was in Ruby.

0:06:19 And then that later turned into a JavaScript implementation, which was the

0:06:23 beginnings of what is now Automerge, and then that later got ported to Rust.

0:06:27 And so now the Rust implementation is our primary one.

0:06:29 So, you know, we've really gone through three languages there and God

0:06:33 knows how many orders of magnitude improvement in performance, the early

0:06:36 versions were extremely, extremely slow, but you know, it's, it gradually

0:06:40 gets better as we keep working on it.

0:06:42 I'm really eager to dive in deeper on Automerge and hearing

0:06:45 your side of the story on how Automerge came to where it is today.

0:06:50 Before going into Automerge, Automerge is a library to deal

0:06:54 with CRDTs, but not everyone might be super familiar with CRDTs.

0:06:59 I don't think there's a better person to explain what CRDTs are than you.

0:07:04 Could you give a quick summary and introduction to CRDTs?

0:07:08 Yeah, so the basic idea is that you've got some data on multiple

0:07:12 devices, the user on each device can independently update that data,

0:07:16 possibly while the device is offline.

0:07:18 And then at some point later, the devices sync their updates.

0:07:20 And ideally, we just want them to merge their states together in some way.

0:07:24 And CRDTs are just algorithms that perform this kind of merging, plus

0:07:29 the data synchronization and so on.

0:07:30 So the idea is that,

0:07:32 you know, often the changes made on two different devices will affect

0:07:36 different parts of a document.

0:07:37 One person is updating one item in the to do list and another person is

0:07:40 updating a different item, and so it's fairly easy to merge those together.

0:07:44 In principle, you can end up with conflict cases where, like, it's

0:07:47 a graphics software, one user makes the rectangle red, another person

0:07:51 makes the same rectangle green.

0:07:52 Well, what do you do?

0:07:53 Well, I mean, you probably just choose one of the two and then if the user doesn't

0:07:57 like it, they can change the color again.

0:07:58 So it's algorithms just for automating that kind of thing.
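The merge idea described here can be sketched in a few lines of Python. This is only an illustration, not how Automerge works internally: it assumes a flat map in which every value carries a logical counter and a device name (both hypothetical names), and it resolves a conflict by keeping the write with the larger stamp.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    value: str
    counter: int   # logical clock, bumped on every local write
    device: str    # tie-breaker when two writes share a counter

def set_key(doc: dict, key: str, value: str, device: str) -> dict:
    """Return a copy of doc with key set, stamped newer than any entry in doc."""
    counter = max((e.counter for e in doc.values()), default=0) + 1
    return {**doc, key: Entry(value, counter, device)}

def merge(a: dict, b: dict) -> dict:
    """Per-key merge: keep the entry with the larger (counter, device) stamp."""
    merged = dict(a)
    for key, entry in b.items():
        mine = merged.get(key)
        if mine is None or (entry.counter, entry.device) > (mine.counter, mine.device):
            merged[key] = entry
    return merged

base = set_key({}, "rect.color", "blue", "laptop")
base = set_key(base, "rect.width", "10", "laptop")

# Two devices edit offline, starting from the same state.
laptop = set_key(base, "rect.color", "red", "laptop")
phone = set_key(base, "rect.color", "green", "phone")
phone = set_key(phone, "rect.width", "20", "phone")

# Merging in either order gives the same document on both devices.
assert merge(laptop, phone) == merge(phone, laptop)
result = {key: entry.value for key, entry in merge(laptop, phone).items()}
```

The width edit merges cleanly because only one device touched it. For the color, one of the two writes is chosen by an arbitrary but deterministic rule, so both devices end up agreeing without showing a pop-up.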

0:08:02 Because what we don't want is for the user to be shown like a pop up saying,

0:08:06 Hey, this file was changed on two devices.

0:08:08 Please pick which one you want to keep and which one you want to throw away.

0:08:11 I think that would be bad.

0:08:12 And like previous versions of Apple's Pages also did

0:08:17 that kind of thing, I guess.

0:08:18 I think if I remember correctly, but fortunately now we have better algorithms

0:08:22 which, which just allow changes to be merged together with minimal ceremony.

0:08:26 So that's really all CRDTs are about.

0:08:29 A huge amount of research has gone into like figuring out how

0:08:33 to make the merge behavior good.

0:08:35 So that depending on what types of edits people make, the end result is hopefully

0:08:40 something that was more or less what they expect, what the users expected and

0:08:44 also in making these algorithms fast.

0:08:46 Because you can implement these algorithms in a very simple way, but the

0:08:49 simple way tends to be very inefficient.

0:08:51 And so making it so that it doesn't take too much disk space, doesn't take

0:08:55 too much memory and it's generally fast, that actually requires quite a

0:08:58 lot of sophistication on the algorithm.

0:09:01 So that's where a lot of the investment has gone over the last few years,

0:09:04 but yeah, but that's broadly what CRDTs are and Automerge is just a

0:09:08 library that implements this stuff.

0:09:09 So there are other CRDT libraries out there, but, Automerge is the

0:09:16 Automerge

0:09:16 Yeah, I think Automerge is probably one of the most advanced

0:09:20 CRDT implementations right now.

0:09:22 And as you've mentioned, you built your first versions, not in

0:09:26 Rust as it is written today, but there were predecessors to this.

0:09:31 So given that this is now such a

0:09:33 long journey (I think

0:09:35 it's fair to say that you've been working on this for 10

0:09:38 years), I'd be very interested in hearing your reflections on the history and

0:09:44 the process of taking Automerge from the beginnings to where it is today.

0:09:50 Yeah.

0:09:50 So when I started working on CRDTs, there was no CRDT for JSON data, for example.

0:09:55 So there were existing data types for sets and maps and counters and

0:10:00 registers and things like that.

0:10:01 So just these kind of little atomic data types, but nothing

0:10:05 that really composed them together.

0:10:07 Uh, oh, and lists as well.

0:10:08 I mentioned that there were data types for lists.

0:10:11 And so in a way, JSON is simple, you know, it's just, you can put maps

0:10:14 inside lists and maps and lists inside maps and compose them arbitrarily.

0:10:19 But there's still interesting questions you have to answer, which

0:10:22 is like, for example, what if one user deletes an object while another user

0:10:26 makes an update inside that object?

0:10:28 How do you merge those things?

0:10:29 And so one of the first research papers I wrote was an algorithm for doing

0:10:34 a CRDT for JSON data, which answered exactly this kind of questions.

0:10:38 And then Automerge started out sort of conceptually as an implementation of

0:10:43 this paper, although we ended up actually choosing different behavior for Automerge

0:10:47 than the paper chose, but you know, after examining a bunch of applications and

0:10:52 what sort of behavior they would want, we came to the conclusion that a different

0:10:55 behavior was better, but that was basically the genesis of the whole thing.

0:11:00 So I can't remember which year that JSON CRDT

0:11:03 paper came out, but yeah, I was working on it like in 2015, 2016 ish.

0:11:08 And then, I think it was about 2017, Peter van Hardenberg got in touch with me.

0:11:12 So I knew Peter from back in my startup days because he was running

0:11:17 the Heroku Postgres team at the time.

0:11:20 And our company, which was called Rapportive, was one of the bigger customers

0:11:25 of, uh, Heroku Postgres at the time.

0:11:28 And so we had, like, talked to Peter as part of, like, just scaling our database.

0:11:33 Years later, I hear from Peter again, because he had read my JSON CRDT paper

0:11:38 and went like, Hey, we want to try actually building some apps with this.

0:11:41 Have you tried actually building some apps?

0:11:43 And I went, Oh, no, no, no.

0:11:45 I just do theory.

0:11:46 You know, I just write a paper and I have this extremely janky Ruby implementation

0:11:51 that actually only does half of what it says in the paper.

0:11:55 So then I got together with Peter and Ink & Switch.

0:11:58 And I think Ink & Switch was quite new still at the time.

0:12:01 And we did this project together in which we essentially

0:12:05 built a JSON CRDT

0:12:07 that actually worked in JavaScript.

0:12:09 In fact, Orion Henry wrote the first version of that and brought it to me.

0:12:13 And I went like, yeah, nice API, but no, those algorithms are totally wrong.

0:12:17 And so then we worked together to make the algorithms right as well.

0:12:21 And it was a great collaboration because, you know, the Ink & Switch folks were

0:12:25 just much better at, like, API design and also UI design and general app development

0:12:32 than I was, whereas I sort of brought the like more mathematical style of

0:12:35 thinking of analyzing the algorithms and making sure that they were correct,

0:12:39 and that was just a great collaboration.

0:12:41 So yeah, we've, we first wrote this library.

0:12:43 We originally called it Tesseract, but then there was already a

0:12:46 JavaScript library of that name.

0:12:47 So we renamed it to Automerge and that name has stuck since.

0:12:51 So yeah, I think Automerge started around 2017.

0:12:54 And then a few Ink & Switch projects used it, but it was very

0:12:58 much research quality software.

0:13:00 You know, it was extremely slow.

0:13:02 It had bugs.

0:13:03 The file format was extremely inefficient.

0:13:05 So it was kind of impractical to use for most things.

0:13:10 As a vehicle for doing research, it worked quite well.

0:13:13 But then at some point, like it became clear that, okay, we

0:13:16 actually want to start building more ambitious software on it.

0:13:19 And it's not really acceptable if it takes three minutes to

0:13:22 load your document off disk.

0:13:24 So, you know, okay, we have to

0:13:26 figure out a new file format to make the files smaller and figure out new

0:13:31 algorithms to make the whole thing faster.

0:13:34 And then also we decided that the Rust implementation would be better.

0:13:37 Um, not so much because Rust is faster than JavaScript, but rather

0:13:40 because it's more cross platform.

0:13:42 And so we can compile Rust to WebAssembly for the web, but we can

0:13:45 also compile it to native libraries for iOS and Android, for example.

0:13:49 And so Orion did a lot of work on the port to Rust.

0:13:53 Uh, again, and a few others contributed to that, and Alex

0:13:57 Good got involved with that too.

0:13:59 But then at some point, two years ago or so, we then made the call to

0:14:03 make the Rust implementation, the primary implementation of Automerge.

0:14:07 So all of that JavaScript, I had, I'd been maintaining the JavaScript implementation

0:14:10 as this research code over the years, but we decided to just completely deprecate

0:14:15 that, throw away all of my old code.

0:14:16 And I've actually done no work on the Rust code of the implementation.

0:14:20 So that's all been done by Alex and Orion and other people now.

0:14:24 And I've just moved into more of an advisory role, which suits me really well.

0:14:28 You know, I'm very happy to be the one not writing the code.

0:14:32 Other people are much better at writing the code than I am, but I know I can

0:14:34 think about the algorithms and the protocols and the data structures.

0:14:38 And that's what I find fun.

0:14:39 And so then, about a year ago or so, we then declared

0:14:43 Automerge to be production ready.

0:14:44 So at that point, then, you know, the Rust implementation was mature and fast.

0:14:50 And we got a sponsorship thing going with GitHub Sponsors, which allowed

0:14:56 people who were commercially using, or companies that were commercially using

0:14:59 Automerge, to sponsor its development.

0:15:01 And that is now supporting the work of Alex Good, who's now

0:15:04 professionally maintaining Automerge.

0:15:06 And that is just such a good arrangement now.

0:15:07 I'm really pleased with how that's working because it means that we have

0:15:11 high quality software that's being professionally maintained, but at the

0:15:14 same time, you know, we haven't had to go out and raise venture capital, which

0:15:18 we feared that that's, you know, might be at odds with the values of local-first.

0:15:23 And so this way, by essentially bootstrapping it off of

0:15:26 the sponsorship revenue.

0:15:28 I think that aligns everybody's interests very well.

0:15:30 And so that has allowed the project to do very well.

0:15:33 That is an incredible journey.

0:15:35 And I mean, this is for an open source project.

0:15:38 Particularly, I think most people use right now, Automerge still in a JavaScript

0:15:43 context for a JavaScript library, where I think you're thinking more in terms of dog

0:15:49 years, Automerge is really a monumental project and it has come incredibly far.

0:15:54 So I'm super excited for that.

0:15:57 So where's the project today?

0:15:58 You've mentioned that it's reached production readiness

0:16:02 around about last year.

0:16:03 Does that mean it's the APIs are final, the research behind it is

0:16:08 concluded and now it's just performance optimizations or what is left to do?

0:16:13 And I just, there's so much, so much we still want to do with it.

0:16:17 So what we mean with production ready is like, there are no egregious

0:16:20 bugs that we know about and the performance is good enough that,

0:16:24 you know, it's plausibly usable in real software, which some of the

0:16:28 research code definitely was not, but it's got much, much better, but

0:16:31 in terms of features, like it, I think we've only really just started.

0:16:36 So what Automerge started with is a basic JSON model, so you can have maps

0:16:41 where the keys are strings and the values can be either nested maps, or they can

0:16:45 be nested lists, or arbitrary recursion of those things, or primitive values

0:16:50 like strings and numbers and booleans.

0:16:53 And that's it.
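The basic document model described here (maps with string keys, lists, arbitrary nesting of both, and primitive values) can be written down as a small recursive check. This is a sketch in plain Python with a hypothetical function name, not part of the Automerge API, and it leaves out the counters and rich text that are mentioned next.

```python
def is_document_value(value) -> bool:
    """Check a value against the JSON-style model: maps, lists, primitives."""
    if isinstance(value, (str, int, float, bool)):
        return True  # primitive values: strings, numbers, booleans
    if isinstance(value, list):
        return all(is_document_value(item) for item in value)
    if isinstance(value, dict):
        # Map keys must be strings; values may nest arbitrarily.
        return all(
            isinstance(key, str) and is_document_value(item)
            for key, item in value.items()
        )
    return False

drawing = {
    "title": "Floor plan",
    "shapes": [
        {"kind": "rect", "width": 10, "height": 4, "filled": True},
        {"kind": "line", "points": [[0, 0], [10, 4]]},
    ],
}

assert is_document_value(drawing)
assert not is_document_value({"bad": {1, 2, 3}})     # sets are not in the model
assert not is_document_value({1: "non-string key"})  # keys must be strings
```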

0:16:53 Then, okay, we, we added counters because actually counters are

0:16:56 actually not very useful, but everyone seems to use them for demos.

0:17:00 So we include the counters so that we can have the demo as well.

0:17:03 Then, a big thing we added was rich text.

0:17:06 So that's something that a lot of applications need:

0:17:10 text with formatting.

0:17:11 And the first version of that is released and implemented, though the

0:17:16 first version only supported inline formatting, such as bold and italic

0:17:20 but not block elements like headings or bullet points or things like that.

0:17:24 And so there's an updated version of that coming soon, which adds

0:17:28 support for block elements too.

0:17:29 So this is now nice.

0:17:31 You can put rich text anywhere inside a document.

0:17:33 So,

0:17:33 you know, if you want to make a Google Docs equivalent thing,

0:17:36 you can do that, but you could also have, for example, a vector

0:17:39 graphics software that has some rich text just inside the text boxes.

0:17:42 And the rest is a drawing consisting of like arrows and lines and

0:17:48 freeform, whatever you want.

0:17:50 And so the JSON type document model has allowed extension in those

0:17:54 directions very well, but there's so much more we still want to do.

0:17:58 So,

0:17:58 like, an obvious missing thing is undo. Undo in collaborative software is actually quite

0:18:02 subtle in terms of the behavior you want.

0:18:06 And so in particular, it's not generally the case that you want to undo the

0:18:09 most recent operation, the most recent change to the document, because the most

0:18:13 recent change to the document might've been made by somebody else in a part of the

0:18:16 document that you're not looking at.

0:18:17 And so

0:18:18 undoing somebody else's change in a completely different part of the

0:18:20 document is definitely not what you intended when you hit command Z.
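A minimal sketch of why undo needs the editing history: instead of reverting the globally most recent change, look back through the history for the most recent change made by the user who pressed undo. The `Change` record and the function names are hypothetical, and the sketch ignores redo and the case where someone else has since edited the same key.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Change:
    actor: str  # who made the change
    key: str    # which part of the document it touched
    old: str    # value before the change
    new: str    # value after the change

def apply_change(doc: dict, change: Change) -> dict:
    return {**doc, change.key: change.new}

def undo_for(actor: str, doc: dict, history: list) -> tuple:
    """Undo the most recent change made by `actor`, skipping other users' changes."""
    for change in reversed(history):
        if change.actor == actor:
            inverse = Change(actor, change.key, old=change.new, new=change.old)
            return apply_change(doc, inverse), history + [inverse]
    return doc, history  # nothing of ours to undo

doc = {"title": "Draft", "body": "Hello"}
history = []
for change in [
    Change("alice", "title", old="Draft", new="Report"),
    Change("bob", "body", old="Hello", new="Hello world"),  # most recent overall
]:
    doc = apply_change(doc, change)
    history.append(change)

# Alice presses undo: her title edit is reverted, Bob's later edit is kept.
doc_after, history_after = undo_for("alice", doc, history)
```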

0:18:24 So actually doing undo well requires, inspecting the editing history of

0:18:30 the document, which we can do because Automerge keeps the editing history

0:18:33 anyway, but actually surfacing that and making the right APIs

0:18:37 and the right underlying algorithms, that's still some work in progress.

0:18:40 Another thing that we've long tried to add is a move operation so that,

0:18:44 for example, you could reorder items in lists or if you have a, say a

0:18:49 file system tree, you could drag a directory from one location to another.

0:18:54 That is also quite subtle to implement because you have to answer

0:18:57 questions like, what happens if two users concurrently move the

0:19:01 same item to two different places?

0:19:03 You don't want to duplicate it.

0:19:04 In that case, you want to just

0:19:06 pick one of the destinations.

0:19:07 Or you get weird things where like you have A and B which are siblings and one

0:19:12 user moves A to be a child of B while concurrently another user moves B to be a

0:19:16 child of A and now if you're not careful you could end up with a loop between A

0:19:20 and B, and that would be a mess as well.

0:19:22 So the move operation has to handle those kinds of cases very carefully.
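The two cases mentioned here can be sketched with a tree stored as a child-to-parent map. This is a simplification and not the algorithm from the research paper or from Automerge: it assumes every device eventually sees the full list of moves and applies them in one agreed order (sorted by a hypothetical timestamp and actor name), skipping any move that would create a loop.

```python
def is_ancestor(parent_of: dict, ancestor: str, node: str) -> bool:
    """True if `ancestor` is `node` or lies on the path from `node` to the root."""
    while node is not None:
        if node == ancestor:
            return True
        node = parent_of.get(node)
    return False

def apply_moves(parent_of: dict, moves: list) -> dict:
    """Apply (timestamp, actor, node, new_parent) moves in one agreed order."""
    tree = dict(parent_of)
    for _timestamp, _actor, node, new_parent in sorted(moves):
        if is_ancestor(tree, node, new_parent):
            continue  # would place node underneath itself: skip to avoid a loop
        tree[node] = new_parent  # one parent per node, so nothing is duplicated
    return tree

tree = {"root": None, "A": "root", "B": "root", "file": "root"}
moves = [
    (1, "alice", "A", "B"),     # Alice: make A a child of B
    (1, "bob", "B", "A"),       # Bob, concurrently: make B a child of A
    (2, "alice", "file", "A"),  # Alice moves the file into A ...
    (2, "bob", "file", "B"),    # ... while Bob moves it into B
]

result = apply_moves(tree, moves)
# Every device sorts the same moves the same way, so arrival order is irrelevant.
assert result == apply_moves(tree, list(reversed(moves)))
```

In this run Bob's move of B under A is skipped because A is already under B, and the file ends up in exactly one of the two destinations.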

0:19:26 You know, we wrote the research paper about it several years ago, but

0:19:29 actually turning that into the kind of production quality code as part of

0:19:33 Automerge is still an ongoing project.

0:19:35 And so those are kind of the near-term

0:19:39 features, examples of features that we want to add to Automerge.

0:19:42 Other stuff we want to do better are, for example, synchronizing

0:19:45 large collections of documents.

0:19:47 So at the moment, Automerge really just deals with one document at a time.

0:19:50 But in many apps, you know, you might have a collection of 100,000 documents and

0:19:54 most of them don't change very much.

0:19:57 So we need a protocol for efficiently figuring out which of those many documents

0:20:00 have changed and then synchronize only those which have changed and

0:20:07 Collections vs Databases

0:20:07 So you're mentioning, uh, collections and that right now Automerge is only working

0:20:13 on sort of a single document level, but you want to go further into collections.

0:20:18 So collections makes me think of databases.

0:20:21 Can you contrast a little bit of how someone who thinks about data

0:20:26 primarily in terms of databases, how your brain needs to change to think

0:20:32 primarily in terms of Automerge? And

0:20:35 in a future where someone uses Automerge, do they still use databases?

0:20:40 Do you think about the data that Automerge just manages sort of

0:20:44 like as an implicit database?

0:20:46 How should I think about that in the future?

0:20:48 Yeah, I think there's, there's a lot of similarities between

0:20:51 Automerge and a database.

0:20:53 And we've sort of like internally joke that, you know, we're not writing a

0:20:57 database because writing a database is a crazy thing to do that nobody should

0:21:01 like try to write their own database, but it looks like we are writing a database.

0:21:05 And shh, don't tell anybody.

0:21:07 So like, yeah, a collection of documents definitely starts smelling

0:21:11 quite a lot like a document database.

0:21:13 There's sort of differences in data model and sort of a usage

0:21:17 pattern compared to like how

0:21:20 mainstream databases are built, you know, you can take MongoDB or even

0:21:24 the JSON support in Postgres and they give you a JSON data model.

0:21:27 And so in that sense, it's similar ish, but they don't really have

0:21:31 the conflict resolution aspect.

0:21:32 So they assume that all of your writes go to a single leader server and that

0:21:38 server just serializes all of the updates.

0:21:40 And therefore you never end up in a situation where you have to

0:21:43 merge two diverged versions of the document.

0:21:46 Whereas in local-first, I mean, the whole point of local-first is that you

0:21:49 have the data locally on your own device.

0:21:51 So that means you inevitably end up in having to do this

0:21:53 kind of conflict resolution.

0:21:54 So even though the data model is maybe on a high level, similar to something

0:22:00 like MongoDB, the data synchronization and the conflict resolution aspects

0:22:04 are something that's very different from a server oriented database.

0:22:08 So you could say it's a more client oriented database where it is intended

0:22:12 to be embedded into client software.

0:22:15 And that would get us quite close to, I think, what Automerge wants to be.

0:22:19 So we have the beginnings of something like that in a library called

0:22:23 Automerge repo, which is sort of a wrapper library around Automerge.

0:22:28 Automerge itself is basically just an in memory data structure library.

0:22:31 It does

0:22:32 nothing with disk or network, it's just purely an in memory data structure, but

0:22:37 Automerge repo adds the IO layer to it.

0:22:40 And so it provides adapters for like storing data, storing documents on

0:22:44 disk and loading them again, and for synchronizing things over the network.

0:22:48 And it also manages a collection of documents.

0:22:51 And so this is how the whole thing starts looking a bit more like a database.

0:22:55 Another difference I would also note compared to something like MongoDB is

0:22:59 that a lot of these server side databases assume that a single document actually

0:23:04 doesn't get updated all that often.

0:23:06 So, you know, you might do 100 writes to a document over

0:23:09 the lifetime of a document.

0:23:10 Whereas with the types of documents we're thinking about, every keystroke when you're

0:23:15 writing a text is a write to the document.

0:23:17 And so you can easily accumulate hundreds of thousands of writes to a

0:23:20 document over the lifetime of a document.

0:23:22 And if you have that sort of high rate of updates, that forces entirely different

0:23:28 data structures and data formats.

0:23:30 So I think if you try to use MongoDB or Postgres and, you know, write a

0:23:34 new version of a document on every keystroke, they would not perform very

0:23:37 well because they actually write an entire new copy of the document to disk

0:23:41 every time you update the document.

0:23:43 And, you know, that's just not going to work if you're making hundreds of

0:23:46 thousands of updates to a single document.

0:23:48 And so that's why Automerge then has got a whole bunch of clever data

0:23:53 structures and file formats in order to deal with those very frequent, very

0:23:58 frequent, but small updates to documents.
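A back-of-the-envelope calculation shows why per-keystroke writes push toward a different storage design. The sketch below is not a model of how MongoDB, Postgres, or Automerge actually store data. It assumes a document that grows by one byte per keystroke and compares rewriting the whole document on every keystroke with appending one small operation record per keystroke, where the record size of 16 bytes is an arbitrary assumption.

```python
def bytes_written_full_copy(keystrokes: int) -> int:
    """Each keystroke rewrites the whole document, which grows by one byte each time."""
    return sum(range(1, keystrokes + 1))

def bytes_written_append_log(keystrokes: int, bytes_per_op: int = 16) -> int:
    """Each keystroke appends one small, fixed-size operation record."""
    return keystrokes * bytes_per_op

keystrokes = 100_000
full = bytes_written_full_copy(keystrokes)  # 1 + 2 + ... + 100000 bytes
log = bytes_written_append_log(keystrokes)  # 100000 * 16 bytes

print(f"full copies: {full:,} bytes")
print(f"append log:  {log:,} bytes")
print(f"ratio:       {full // log}x")
```

Under these assumptions the full-copy approach writes about 5 GB in total for a document that ends up at 100 KB, while the append-only log writes 1.6 MB. The total cost of full copies grows with the square of the number of edits.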

0:24:00 Automerge as an app data-layer

0:24:00 So I'm personally, as an app developer, I try to think about like, what is

0:24:06 the best foundation to build an app?

0:24:08 And so you've mentioned that the early prototypes that you've built for what

0:24:12 later became Automerge, you built with Ruby, maybe you probably built apps with

0:24:18 Rails in the past and Rails was really a great foundation to build a new app.

0:24:22 I'm wondering.

0:24:24 In the next five to 10 years, when you put on your local-first lens, like what

0:24:30 is the Rails equivalent for local-first?

0:24:34 Should I think about Automerge becoming more and more like a new

0:24:40 kind of Rails that's less of an app framework, but more of like a

0:24:44 data framework that takes over more and more app framework responsibilities.

0:24:50 Can you paint a bit of that picture for me?

0:24:52 I think the analogy is really good, actually.

0:24:54 I, yeah, I would think of Automerge maybe like as the Active Record component of

0:24:59 Rails or so; it's the data component.

0:25:01 It's not a whole app framework by itself, but you could definitely

0:25:04 imagine building an app framework where it's an important part of it.

0:25:08 And the rest of the app framework would have to do stuff like reactivity of

0:25:12 updating the user interface in response to edits that have happened and figuring

0:25:17 out how to handle user inputs, blah, blah, blah, all that sort of things.

0:25:21 So I think that framework doesn't exist yet, but I would really love

0:25:26 to see somebody build the equivalent of Rails for local-first software.

0:25:32 So what are the missing pieces for that?

0:25:34 So you've mentioned that the way how you and Peter have met is through

0:25:39 Peter's previous work on Heroku.

0:25:42 So Heroku, I think, played a major role in making Rails

0:25:46 so

0:25:47 easy for developers since it's not just easy to work with it locally, but it's

0:25:50 also easy to roll it out into production.

0:25:53 So what does it mean for me right now, if I'm building my first little app, my first

0:25:58 little prototype with Automerge locally, what does it mean for me to roll that out,

0:26:03 that I can share it with my friends and use it sort of in a, in a bigger scale?

0:26:07 Yeah.

0:26:07 So at the moment it still requires a fair amount of

0:26:12 stuff you have to write yourself.

0:26:13 So for example, you know, we provide as part of Automerge repo some

0:26:17 integrations with, like, React or

0:26:19 Svelte too, as examples of how you can build user interfaces

0:26:23 on top of Automerge, but you know, it's very just basic example code.

0:26:27 I think it's not like an entire framework, but it's something that hopefully

0:26:31 people can use to start building apps.

0:26:34 Likewise for like the network synchronization.

0:26:37 We have a sync server, it's open source and quite simple, and you can just

0:26:41 deploy it yourself, but it lacks all of the features that you might want.

0:26:44 So there's no authentication, for example, which is something

0:26:48 probably most apps will want.

0:26:50 Really, we would like end to end encryption for the data synchronization

0:26:53 for many applications as well, so the server doesn't have to store

0:26:57 the plain text of your documents and a whole bunch of other things

0:27:01 related to synchronization.

0:27:03 So I think we will always want

0:27:05 the option for people to develop these things themselves and run

0:27:08 it themselves if they want to.

0:27:10 But at the same time, I think there's a lot that could happen around having

0:27:14 it kind of packaged up in a nicer way, where maybe there's a hosted cloud

0:27:17 service that just provides a syncing service for local-first apps, and if you

0:27:23 choose a certain framework, which might be Automerge-based, and a certain

0:27:27 networking layer, then you can just use this synchronization service and

0:27:30 you don't have to run your own servers.

0:27:32 And that would be the sort of Heroku equivalent I would see for this world.

0:27:37 So I really hope somebody builds that.

0:27:39 And a part of the vision of local-first is that, you know, we'll probably

0:27:44 have to have cloud services involved in this data synchronization, but

0:27:48 if we can make the synchronization protocols an open standard,

0:27:52 then hopefully there can be multiple different providers that can interoperate.

0:27:55 And so if you decide that one particular provider has changed their pricing

0:28:00 in a way that's too expensive, or they're too unreliable or whatever,

0:28:03 you should be able to just point your app at a different provider

0:28:05 and just continue working.

0:28:07 And in some way, Heroku had this as well, in that, you know, you didn't

0:28:10 have to use custom Heroku APIs to write your

0:28:15 app or anything. You know, you just write a standard Rails app and you

0:28:18 deploy it by pushing to a Git repo.

0:28:20 And there was just a small amount of Heroku specific configuration.

0:28:23 And if you wanted to, you would always be able to take your app and run it on

0:28:26 a different hosting provider as well.

0:28:28 And so again, I think that's the sort of style I would like for local-first software

0:28:33 too: that we have this interoperability and we have multiple companies, could

0:28:38 be startups, could be big companies,

0:28:40 I don't really mind, providing this kind of cloud syncing service for

0:28:44 local-first software in such a way that it can interop and you can easily

0:28:48 switch from one provider to another.

0:28:54 Thoughts on P2P

0:28:54 That sounds incredible.

0:28:56 And I'd love to see that.

0:28:58 It kind of reminds me a bit of the days of

0:29:02 torrenting, et cetera, peer to peer.

0:29:04 We talked to Peter in a previous episode about peer to peer, and

0:29:09 there are some real technical challenges that need to be overcome and

0:29:14 maybe can't be overcome in the shorter term, but I'm wondering

0:29:19 how that sort of more abstract syncing service would compare to

0:29:23 some of the existing technologies.

0:29:25 I mentioned peer to peer there because what was so interesting about

0:29:29 it is that you formed this sort of ad hoc network where

0:29:33 there was no server that something needed to be

0:29:37 deployed to, but things just started working together.

0:29:40 So with that syncing service that you're mentioning, that could be kind

0:29:44 of platform-agnostic, would that be similar to peer to peer in that regard?

0:29:50 Or would you still need to kind of deploy a quote unquote backend

0:29:54 app to that syncing service so that it actually performs the work

0:29:58 you want to have performed for your particular local-first app?

0:30:02 I think

0:30:03 the best results for, you know, for user quality of software would be for it to

0:30:09 use peer to peer when it's available and use a cloud service when not.

0:30:13 I think doing only peer to peer is really difficult because, for example,

0:30:17 you can only talk to another peer while it's online at the same time.

0:30:20 And if you've got two devices that are never online at the

0:30:22 same time, then you can never synchronize data between them.

0:30:25 So

0:30:26 that sucks pretty badly, because people do just close their laptop from time to time

0:30:29 or turn off their smartphone or whatever.

0:30:32 So I think pure peer to peer just doesn't work reliably enough.

0:30:36 Plus there are all of the problems with NAT traversal, and the networking

0:30:40 infrastructure just doesn't work well enough.

0:30:42 However, when peer to peer does work, it's amazing.

0:30:44 And so if you've got two devices on the same network in the same building, it

0:30:48 seems outrageous to send all of your data via AWS US East One in Virginia,

0:30:54 if you could just send it via the local wifi from one device to another, right?

0:30:59 So then opportunistically using peer to peer when it happens to

0:31:03 be available is an amazing thing.

0:31:05 And it's, you know, it provides a lot of robustness and

0:31:08 independence from the network.

0:31:10 So that, for example, if you've got your laptop and your phone, and you're

0:31:13 in some remote location where you don't have internet access, you can still

0:31:16 sync data between the two of them.

0:31:18 And, you know, we have a sort of rudimentary version of that with

0:31:21 say, AirDrop on Apple devices, but that's for one-off file transfers.

0:31:25 We really should be able to just do that for live synchronization as well.

0:31:29 So I feel like the combination of cloud and peer to peer just

0:31:33 gives you capabilities that

0:31:35 only cloud or only peer to peer doesn't.

0:31:37 And so that really seems to me like the most promising

0:31:40 direction is to combine the two.

0:31:42 And the nice thing with CRDTs is that they just don't care what

0:31:46 your networking layer is, right?

0:31:47 All you need is some way of getting some bytes from one device to another.

0:31:51 That's all they need.

0:31:52 And whether that goes via a local network, or peer to peer over the internet via

0:31:56 a distributed hash table, or via a cloud service, or via multiple cloud services,

0:32:01 the CRDT doesn't care; any communication channel will do.
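
To make the point about CRDTs being indifferent to the network concrete, here is a minimal sketch. It assumes a toy grow-only counter, not Automerge's data model, and the class and method names are made up for illustration. The only thing exchanged between devices is a byte string, and the result is the same whether those bytes arrive in order, out of order, or more than once.

```python
import json
import random


class GCounter:
    """Toy grow-only counter CRDT: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def to_bytes(self):
        # The transport only ever sees these opaque bytes.
        return json.dumps(self.counts, sort_keys=True).encode("utf-8")

    def merge_bytes(self, payload):
        # Taking the max per replica makes merging commutative and idempotent.
        for replica, count in json.loads(payload.decode("utf-8")).items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)


laptop, phone = GCounter("laptop"), GCounter("phone")
laptop.increment(3)
phone.increment(2)

# Pretend these bytes travelled over local wifi, a DHT, or a cloud relay:
# they arrive duplicated and in an arbitrary order.
messages = [laptop.to_bytes(), phone.to_bytes(), laptop.to_bytes()]
random.shuffle(messages)
for payload in messages:
    laptop.merge_bytes(payload)
    phone.merge_bytes(payload)

print(laptop.value(), phone.value())
```

Both replicas end up with the same state regardless of the shuffle, which is why the choice of channel can be left to whatever happens to be available.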

0:32:06 That makes a lot of sense.

0:32:08 And this sort of hybrid nature where it optimistically uses the close peer

0:32:13 connection, where if that works, then the experience is even better, but it kind of

0:32:17 falls back to the cloud where it needs to.

0:32:20 And it also will give you some benefits maybe such as backup, et cetera.
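
The hybrid approach described here, peer to peer when it is available and cloud otherwise, could be sketched roughly as follows. This is an illustration under stated assumptions: the `Transport` class, the transport names, and `send_with_fallback` are hypothetical stand-ins, not part of Automerge or any real networking library.

```python
class Transport:
    """Hypothetical stand-in for a network channel (local peer, cloud relay, ...)."""

    def __init__(self, name, available):
        self.name = name
        self.available = available
        self.sent = []

    def send(self, payload):
        if not self.available:
            raise ConnectionError(f"{self.name} unavailable")
        self.sent.append(payload)


def send_with_fallback(payload, transports):
    """Try transports in preference order; return the name of the one that worked."""
    for transport in transports:
        try:
            transport.send(payload)
            return transport.name
        except ConnectionError:
            continue
    return None  # nothing reachable: the change stays queued locally


local_peer = Transport("local-wifi-peer", available=False)
cloud = Transport("cloud-sync", available=True)

# The peer on the local network is offline, so the change goes via the cloud.
used = send_with_fallback(b"change-1", [local_peer, cloud])
print(used)
```

Because the CRDT tolerates duplicate and reordered delivery, a real client could also send over several transports at once rather than picking just one.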

0:32:27 Generic sync servers

0:32:27 So one thing with these cloud services

0:32:30 is that, you know, in the traditional way of building web apps, a lot of your

0:32:34 application logic lives in the backend.

0:32:36 You know, you have a backend database running on a server and then you wrap it

0:32:40 with some server side code written using some server side web framework, and then

0:32:46 you put it all behind a load balancer.

0:32:47 And so you've got all this huge infrastructure on the backend.

0:32:50 And one of the promises I see of local-first

0:32:54 is that actually, because we've moved all of the interesting application

0:32:58 logic to the client app, to the end user device, the server side that remains

0:33:03 can be really simple and actually not contain any app specific code at all.

0:33:07 So my vision for these syncing services for local-first software

0:33:13 is that there's virtually no application code on the server.

0:33:16 The server is just this generic piece of software where you just

0:33:19 take it off the shelf and run it.

0:33:21 And, you know, you can just use a hosted cloud service.

0:33:24 Maybe AWS will run a local-first backend service and charge you a

0:33:29 few cents per gigabyte to use it.

0:33:31 And that would be amazing.

0:33:32 It can be, you know, this generic thing.

0:33:34 So you don't have every single app reinventing its own backend service.

0:33:38 You know, so much work in building a web app goes into reinventing this

0:33:43 backend infrastructure that every single company has to reinvent again.

0:33:47 And so if we can make the data sync protocol and the data storage on the

0:33:51 servers efficient, like loading and synchronization of large collections of

0:33:55 documents, all of that can be generic.

0:33:57 So if one person is then building a graphics app and another person

0:34:00 is building a spreadsheet and another person is building a

0:34:03 document editor, they can all use

0:34:05 the same syncing service as the backend. That, I think, is part of the economic

0:34:10 value proposition of local-first software: that actually, you know,

0:34:14 we can just save ourselves a huge amount of software engineering work

0:34:17 by making these backends generic.
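
As a rough illustration of what "no app-specific code on the server" could mean, here is a minimal in-memory sketch. It assumes the server only stores and relays opaque change blobs keyed by a document ID. The class name, method names, and document IDs are hypothetical, and this is not how the Automerge sync server is actually implemented.

```python
class GenericSyncServer:
    """Stores opaque change blobs per document ID and never inspects their contents."""

    def __init__(self):
        self._log = {}

    def push(self, doc_id, blob):
        # Append the blob as-is; the server has no idea what kind of app produced it.
        self._log.setdefault(doc_id, []).append(bytes(blob))
        return len(self._log[doc_id])  # cursor after this write

    def pull(self, doc_id, cursor=0):
        # Return everything the client has not seen yet, plus its new cursor.
        blobs = self._log.get(doc_id, [])[cursor:]
        return blobs, cursor + len(blobs)


server = GenericSyncServer()

# A graphics app and a spreadsheet app share the same backend.
server.push("drawing-42", b"\x01stroke-data")
server.push("sheet-7", b'{"cell":"A1","value":3}')
server.push("drawing-42", b"\x02more-strokes")

blobs, cursor = server.pull("drawing-42")
print(len(blobs), cursor)
```

A production service would still need the pieces mentioned earlier in the conversation, such as authentication and ideally end-to-end encryption, but none of those require knowing what the documents mean.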

0:34:20 I couldn't agree more with that vision.

0:34:22 I totally want that.

0:34:24 Do you think Automerge will be the foundation for that?

0:34:27 Or is there something more generic, something more abstract, like an open syncing

0:34:33 protocol, whatever that might be,

0:34:35 and Automerge would be one of multiple implementations compatible with that?

0:34:41 If someone is interested in that vision right now, is there anything that

0:34:45 someone can take a look at and maybe deploy an early version of that already?

0:34:50 Yeah, I think Automerge is trying to be a solution for that, and I would

0:34:55 love for the Automerge protocols to be open standards one day.

0:35:00 I think, you know, we've thought about engaging with the IETF, for example,

0:35:04 for standardization, although I think right now is just too early because

0:35:08 it's all still very much work in progress and it hasn't settled enough

0:35:11 yet to be ready for standardization.

0:35:13 But in the long term, that's something we would definitely like.

0:35:15 And we would like there to be multiple interoperable implementations that

0:35:19 can all talk to each other and which are compatible with each other.

0:35:22 So yes, whether that ends up being exactly the Automerge wire

0:35:25 protocol or something a bit more abstract, I'm not entirely sure.

0:35:29 I mean, other people are working on similar things.

0:35:32 So one project that comes to mind is Braid, for example; they are

0:35:36 engaging with the IETF and they're trying to build some standards or extensions

0:35:42 to HTTP to enable data synchronization.

0:35:45 And they're trying to do it in a way which is not specific to any particular CRDT

0:35:49 library or even using other approaches such as operational transformation.

0:35:53 So they're trying to be generic.

0:35:55 What I'm not sure yet is whether you can be generic and still

0:35:58 get good enough performance.

0:35:59 There's a trade-off there.

0:36:00 So in the Automerge sync protocol, we're able to make a lot of optimizations

0:36:04 because we know a lot about the types of data and how they're exchanged and

0:36:09 we can control the data compression and the data formats and so on.

0:36:13 Because we control the stack, we can do a lot of interesting optimizations

0:36:18 there, which are more difficult if you have a generic protocol.

0:36:21 So I think we'll have to wait and see how

0:36:26 that develops in the future.

0:36:27 And I certainly believe some kind of protocol will become a widely

0:36:31 used open standard for data sync in local-first apps.

0:36:36 It might be Automerge or it might be something else, but that's

0:36:38 generally the direction we're heading.

0:36:41 I'm really looking forward to that point.

0:36:43 I mean, local-first already today

0:36:47 is providing so much value, both to developers and to end users, by

0:36:52 simplifying the developer experience by making apps faster, giving

0:36:56 you data ownership, et cetera.

0:36:58 But I think once we've reached that point where there's a more

0:37:02 general-purpose, generic syncing service that possibly also works across apps,

0:37:07 where people can put a little node of that, for example, on a Raspberry Pi

0:37:12 running next to their home router.

0:37:14 I'm really looking forward to that.

0:37:16 So I can't wait for that.

0:37:17 Looking forward to maybe having you back in a year from now to hear

0:37:21 a progress update on where things are at in that regard, but I'm

0:37:25 really looking forward to that.

0:37:26 Yeah, it's going to be very exciting to see what people build.

0:37:30 Bluesky

0:37:30 So besides your work on Automerge, you're also involved in a new project called

0:37:37 Bluesky, which came out of Twitter, now called X, and I think it was sort of

0:37:44 a research project inside of Twitter

0:37:46 that has now taken its own path.

0:37:49 And you're involved there as an advisor.

0:37:52 I'm wondering whether there's any connection to your interest

0:37:56 in local-first as well, or whether those are separate paths.

0:38:00 There is a sort of high-level connection,

0:38:03 I would say. You know, Bluesky is a social network; it's decentralized, and it aims

0:38:08 to provide a bunch of features which just don't exist on, like, Twitter and

0:38:14 Facebook and other centralized social networks.

0:38:16 So in particular, it's built on an open protocol and there are multiple

0:38:20 different implementations, interoperable implementations of that protocol.

0:38:24 And moreover, multiple hosting providers that can run

0:38:29 different parts of the system.

0:38:30 And Bluesky is designed in such a way that it's very easy to move your account

0:38:35 from one provider to another, for example.

0:38:37 So for example, if you don't agree with one provider's moderation policies,

0:38:42 it's fine, you can go to a different one, who's more aligned with you, or

0:38:45 you could even run your own if you're technically enthusiastic enough.

0:38:49 So on a technical level, a lot of the implementation of

0:38:52 Bluesky looks quite different from something like Automerge.

0:38:55 There's no CRDTs in Bluesky, for example, but the sort of philosophy and the

0:38:59 values that it embeds in the software are actually quite similar to local-first.

0:39:04 This idea that users should control their own data, you know, you should

0:39:09 always be able to have a copy of your own data that you can just take with

0:39:12 you or move to a different provider.

0:39:14 That concept

0:39:15 exists very much across both local-first and Bluesky. In the case of Bluesky, of

0:39:20 course, you know, it's a social network.

0:39:21 So the entire social network consists of the data from many different people, the

0:39:25 posts, the likes, the follows and so on.

0:39:27 But the way it works is that all of the data from a particular user goes

0:39:31 into a repository, which you can think of as a bit like a Git repository.

0:39:35 And so every post that you make, every user you follow, every like you make,

0:39:41 every user action of your own goes into your own repository, and that is your

0:39:45 own, and you can download a copy of it, and on the server, it's literally just

0:39:48 a SQLite database, there's a separate SQLite database for every single user,

0:39:52 and you can just get a copy of it, and even if your provider just suddenly

0:39:55 disappears, you can upload a copy of that

0:39:58 to a different provider, change your user ID to point to the new provider

0:40:02 and everything just continues working.

0:40:03 And so that idea of having easy interoperability and easy migration

0:40:08 paths from one provider to another, that's something that I think both

0:40:13 Bluesky and local-first share.

0:40:15 But otherwise the implementations end up being different.

0:40:18 Like it doesn't really make sense to have a local-first social network, because for

0:40:21 example, working offline makes sense if you're talking about a document editor.

0:40:25 It doesn't really make sense in a social network because the whole point

0:40:28 is communicating with other people so that the offline aspects, for example,

0:40:31 don't really feature in Bluesky, but sort of the data ownership aspects do.

0:40:37 A social network with local-first approach

0:40:37 I agree that there is a big difference between a social network like Bluesky

0:40:42 and more like productivity or personal apps. I'm still curious,

0:40:47 given that they share a bunch of similar values and some technical

0:40:51 similarities, to better understand what would happen if you were to try to build Bluesky

0:40:57 with a more local-first approach.

0:40:59 There's a few technologies that leverage syncing behavior for

0:41:02 SQLite or maybe replacing SQLite with Automerge just in theory.

0:41:08 I'd be very curious to understand, is there a certain impedance mismatch

0:41:13 that you'd be running into by trying to build something like a social

0:41:17 network with a local-first approach?

0:41:20 I'd be curious to understand where you really run into troubles there.

0:41:24 Yeah, so the data for one individual user, you could easily put in

0:41:28 an Automerge document just as well as you put it in SQLite.

0:41:32 I think that would make fairly little difference; you

0:41:34 could certainly use Automerge to synchronize the data for a given user.

0:41:38 What's different in a social network is that you have these global

0:41:42 views, which are aggregated over everybody, which is just not something

0:41:46 that exists in a document editor.

0:41:47 So like in a social network, you know, you want to know all of

0:41:49 the likes on a particular post.

0:41:51 And if each user writes their like to their own repository, that means you

0:41:56 have to index all of the repositories, look for all of the repositories that

0:41:59 contain a like of a particular piece of content, and then add them up.

0:42:02 And that gives you your number of likes.

0:42:04 Or if you want to get all of the replies on a particular thread, again, you

0:42:07 have to look at all of the posts that have been made by any user anywhere

0:42:10 in the network and find all the ones that reply to a particular piece of content.

0:42:14 That just requires this kind of global view of everything if you want to do

0:42:18 it properly. You can kind of do it in a somewhat local version, which is kind of

0:42:24 what ActivityPub and Mastodon try to do.

0:42:26 So there's no global index with Mastodon.

0:42:29 You know, nobody really maintains a copy of the entire network,

0:42:33 but if user A replies to user B, then user A's server sends a notification

0:42:39 to user B's server, and therefore user B's server finds out about this reply,

0:42:43 just adds it to its local database.

0:42:45 But that way you can end up with a problem of different servers seeing

0:42:49 different reply threads, because not every reply is notified to every server.

0:42:54 And so then you get weird inconsistencies where, depending on which server you're on,

0:42:59 you see a different set of replies to a particular post, which is a

0:43:02 bit strange, but that's just a part of the way that Mastodon works.

0:43:06 And that's something we try to avoid in Bluesky by instead saying,

0:43:10 okay, each individual repo is just a single user's data.

0:43:14 And then in order to do something like a reply thread, we actually have a big

0:43:18 indexing service that works a bit like a web search engine, which crawls the

0:43:22 content of all of the individual user repositories and aggregates it all

0:43:26 and assembles the reply threads.

0:43:28 And so that's something where there's no equivalent to that in local-first

0:43:31 software, I think, because that's just something that like document editing

0:43:35 style apps just don't need to do.

0:43:37 They just don't need to actually do aggregations across many apps.

0:43:40 I would say that maybe an exception to that is if you want to do search across

0:43:44 many documents, for example, in that case, you do need to build a search index.

0:43:48 But it's still a search index containing only the documents for a particular

0:43:52 user, or maybe all of the documents for a particular company, but it's not all

0:43:56 of the documents in the entire world.
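
The difference between per-user repositories and a global aggregated view can be sketched in a few lines. This is a toy illustration under stated assumptions: the record shapes, field names, and `build_index` function are invented for the example and are not the AT Protocol schema or Bluesky's actual indexer.

```python
from collections import Counter

# Each user's repository holds only that user's own records.
repos = {
    "alice": [{"type": "post", "id": "alice/1", "text": "hello"}],
    "bob": [
        {"type": "like", "subject": "alice/1"},
        {"type": "post", "id": "bob/1", "reply_to": "alice/1", "text": "hi"},
    ],
    "carol": [{"type": "like", "subject": "alice/1"}],
}


def build_index(repos):
    """Crawl every repository and aggregate like counts and reply threads."""
    likes = Counter()
    replies = {}
    for records in repos.values():
        for record in records:
            if record["type"] == "like":
                likes[record["subject"]] += 1
            elif record["type"] == "post" and "reply_to" in record:
                replies.setdefault(record["reply_to"], []).append(record["id"])
    return likes, replies


likes, replies = build_index(repos)
print(likes["alice/1"], replies["alice/1"])
```

No single repository can answer "how many likes does this post have?"; only something that has read all of them can, which is the part with no counterpart in a document-editing app.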

0:43:57 That makes a lot of sense.

0:43:59 And I think it's sort of intuitive where like local-first starts out really dense

0:44:03 about like your own documents, maybe the documents just on your other device

0:44:07 or on the device of a friend of yours.

0:44:10 So the network it's spanning is still pretty dense, and this is what

0:44:15 makes all of those technologies work almost trivially. But the more you go

0:44:20 global with this, to a sort of social network level, this is where that

0:44:25 is really put to the test, and it's probably not the best starting point.

0:44:30 That being said, I think this might still also be an interesting project

0:44:33 for some folks who might want to rebuild an app in a local-first way, but

0:44:41 there might still be some more global nature to some parts of the data that

0:44:45 maybe could be complemented in some way.

0:44:48 Maybe there's some new architectural patterns that are emerging.

0:44:52 For Overtone, for example, I'm trying to build the app in a local-first way,

0:44:57 where really all of your music metadata and your actual

0:45:01 music data is locally available if possible, but music as such also has

0:45:08 a very global aspect to it, right? In the world of Spotify, you have

0:45:13 practically infinite amounts of music that you can't just

0:45:17 download locally; there's too much, and also other people have other kinds of music.

0:45:23 So I'm also trying to explore sort of hybrid solutions there, which are

0:45:28 really interesting design challenges.

0:45:30 I'm eager to share more of that on a separate occasion.

0:45:34 And you've actually already provided me some great feedback in some

0:45:37 personal conversations before.

0:45:39 So, yeah, this is a really interesting case study.

0:45:42 And I love exploring pushing local-first a little bit to its

0:45:46 limits through various app use cases.

0:45:50 So your involvement in Bluesky is a very interesting, at least

0:45:54 theoretical case study at this point.

0:45:56 So you've mentioned working offline for Bluesky,

0:46:01 and that it might not be the primary use case.

0:46:05 I want to use this as a segue, as I see a little bit of confusion sometimes

0:46:09 on Twitter, where people synonymously talk about local-first and offline

0:46:15 first, and there is a difference and I want to share a little bit

0:46:20 more broadly what that difference is. What is an offline-first app?

0:46:24 What is a local-first app?

0:46:26 Where are they different?

0:46:27 So maybe you can share your perspective on that topic.

0:46:31 Yeah, I would say that local-first includes offline first, but it tries

0:46:34 to be a lot more than that as well.

0:46:37 So the term offline first existed long before local-first, and

0:46:40 obviously we were aware of it.

0:46:41 And in fact, we modeled the term local-first after offline first to some

0:46:46 degree, because we thought it was a good term, and it captured something that we

0:46:50 wanted, but it was not really sufficient.

0:46:53 Because yes, having users being able to work offline is

0:46:55 obviously a good idea.

0:46:57 It seems ridiculous if people can't work offline, but we wanted to also

0:47:02 capture this idea of personal data ownership so that the data is yours

0:47:07 and it can't be taken away from you.

0:47:09 So in particular, for example, if there's some software that stops working

0:47:14 if the company that made the software goes out of business, then I would

0:47:18 argue that's not local-first.

0:47:19 It could be offline first.

0:47:21 So it could be that, you know, it's a nice Google Docs style document

0:47:25 editor. Just take Google Docs as an example: okay, you know, it

0:47:29 works fine.

0:47:30 You can even, if you choose the right settings, make it work

0:47:33 offline, and you can edit your docs in whatever way you want.

0:47:37 But if Google decides to just discontinue the service, hypothetically, or if

0:47:42 Google just decides to block your account because some automated system has flagged

0:47:46 you as violating the terms of service, whether you did or not doesn't matter.

0:47:50 You basically have no recourse.

0:47:52 And at that point, you're just locked out and you lose all of your data.

0:47:54 And so the fact that the app allowed you to work offline is kind of beside

0:48:00 the point then, because you still don't have ownership of the data.

0:48:03 And so it's this idea that you should never be

0:48:07 locked out of your own data.

0:48:09 That's really something that we wanted to capture in the idea of local-first.

0:48:13 And so now if you can't be locked out of your data, that

0:48:16 kind of implies that you must have the data on your own device.

0:48:19 Which then also implies that you can probably edit it offline, because if

0:48:23 you've got it locally anyway, then why not just enable offline editing?

0:48:28 But the chain of reasoning goes in a different direction.

0:48:31 We would start with the data ownership and then offline editing

0:48:36 Local-first vs Offline-first

0:48:36 That makes a lot of sense, and I think that makes it really clear.

0:48:40 I see a lot of people referring to offline first almost synonymously with

0:48:44 some glorified version of aggressive caching, but the way you laid it

0:48:50 out here makes that a lot more clear.

0:48:52 And I suppose this is not just having access

0:48:54 to some form of the data, like being able to download a CSV of all of your user

0:49:00 data, but that the software is actually still fully functional or as functional

0:49:05 as somehow possible, even in the worst case where the folks who are building the

0:49:10 software are no longer able to work on it.

0:49:13 And to really provide a better alternative to: SaaS software X shuts

0:49:18 down and the entire app is just

0:49:22 gone, with probably all of your data.

0:49:24 So I think that's a really clear alternative.

0:49:28 Yeah, exactly.

0:49:29 Like, you know, you do get this thing all the time when some SaaS startup

0:49:33 shuts down and they give you two weeks to download a zip file of JSON.

0:49:39 You know, what can you do with that zip file of JSON?

0:49:41 You can't re upload it into any other software.

0:49:43 So basically it's just a big fat middle finger to the users.

0:49:46 So really local-first is an attempt to overcome that in a way that, you

0:49:52 know, at the very least, you know, for example, if the software can operate

0:49:55 peer to peer, that could mean then at least you have a peer to peer fallback.

0:49:59 So even if all of the cloud services go away, it could still operate.

0:50:02 Or if it uses a backend service that's interoperable, so you can

0:50:06 switch it to a different provider.

0:50:07 That means then you could still use the software that, you know, maybe you

0:50:11 purchase a license to the software in sort of the traditional non subscription type

0:50:16 business model, and then you could use it in perpetuity, perhaps by pointing it at a

0:50:21 different syncing backend, or in the worst case, running your own syncing backend,

0:50:24 if you really must, but ideally just switching it over to a different provider.

0:50:28 And I'm hoping that's what the local-first term should try to encapsulate.

0:50:35 What does local-first need to really succeed

0:50:35 I fully agree.

0:50:36 I'm curious now that you've been thinking about local-first now for more than 10

0:50:42 years, and we've come really far in that period of time when it comes to CRDTs

0:50:49 and Automerge is production ready to use.

0:50:52 At the same time, given the ambitions that you've outlined for it, it feels

0:50:57 like we're just getting started.

0:50:59 I do think that now is already a good time to really switch your default: instead

0:51:05 of going cloud first, go local-first for app use cases where it's possible.

0:51:10 But I think it's still very much the minority of developers

0:51:14 who build this way.

0:51:16 And given that you've seen such a broad spectrum of different data

0:51:20 architectures that you've also outlined brilliantly in the book Designing Data-Intensive

0:51:24 Applications, I'm curious what you see as things that still hold

0:51:32 local-first back from becoming more mainstream.

0:51:34 Is it just a matter of time until there's more progress around

0:51:38 Automerge and around other technologies?

0:51:41 Are there some other things that you would like to see?

0:51:44 Yeah, I mean, it's such a big conceptual shift, I think, which is a

0:51:48 challenge, you know, because there's a huge amount of say, educational

0:51:53 materials on how to build web apps, you know, entire university courses

0:51:57 are built around the idea of teaching people how to do this thing, coding boot

0:52:01 camps, documentation for a huge number of software projects, books, videos, you

0:52:07 name it. You know, there's just so much infrastructure

0:52:10 for teaching people how to build

0:52:13 apps in the centralized cloud way, and local-first is just much newer.

0:52:17 And so it hasn't had the benefits of decades of investment.

0:52:21 Moreover, you know, the cloud providers have a strong

0:52:23 commercial incentive to produce good quality documentation

0:52:26 on how to use their services.

0:52:28 So it's not surprising that there's good documentation available for those things.

0:52:32 And I'm hoping that at some point there will be big companies built

0:52:36 on the local-first paradigm as well, which then are similarly able to

0:52:40 fund the development of this sort of documentation and learning

0:52:44 materials and so on, but it's just going to take a while.

0:52:47 So I would see that as probably one of the biggest challenges.

0:52:51 It's just a new way of thinking and people are not familiar with it.

0:52:55 I think once people get it, then a lot of people seem to get excited

0:53:00 about it and buy into it as well.

0:53:02 And, you know, sometimes

0:53:04 there are concerns that, you know, this is not for all apps.

0:53:07 And I'm the first to acknowledge, yes, local-first is not for every single app.

0:53:10 There's some apps which are best to build in a sort of centralized cloud way.

0:53:15 That's totally fine.

0:53:16 So I think part of it is also helping people understand for which

0:53:19 types of apps would you pick a local-first approach versus for which

0:53:22 do you pick a centralized approach.

0:53:24 And then of course, like, just the general ecosystem needs a lot more work.

0:53:29 So, you know, the software libraries that we use, things like Automerge,

0:53:33 they're pretty robust already, but it's still fairly new software compared

0:53:37 to, you know, a web framework that has been around for 20 years or more.

0:53:42 Uh, so one thing that I find encouraging is just within the

0:53:46 last year or so, it seems that

0:53:48 a whole bunch of startups have started using the local-first term just on

0:53:54 their product marketing pages as just something they assume readers

0:53:59 of the page will be familiar with.

0:54:01 And that I find very encouraging.

0:54:02 It sort of shows that, you know, people are buying into the idea

0:54:06 enough that they are willing to, you know, base their product

0:54:09 foundation on it and their marketing around it, explaining to users why

0:54:14 it's valuable to have local-first.

0:54:16 And I think this is the way it will succeed.

0:54:18 You know, local-first will succeed only if many, many people in

0:54:21 many, many different companies are able to use it to their advantage in

0:54:25 order to provide a better experience to their users and their customers

0:54:29 and build sustainable businesses on top of the idea and so on.

0:54:32 So it has to work for everybody.

0:54:35 And I think it will work for everybody because it's, you know, it's a win-win.

0:54:38 It's good for the app developers.

0:54:40 It's good for the users.

0:54:42 I think there are questions still to be answered about exactly what the business models look

0:54:45 like, but I think that can probably also be figured out and then that

0:54:49 way it works well across the board.

0:54:51 A business-model for local-first applications

0:54:51 Yeah, I love that observation and I agree, I think some of the favorite

0:54:56 tools that I'm using, they are all like, maybe not adhering to all seven

0:55:02 local-first principles, but directionally, they are going in the direction of

0:55:06 local-first, and it's almost like a quality badge that some products associate

0:55:12 themselves with, to say, like, "Hey, we're trying to build this app local-first."

0:55:16 And I, as a user know, Oh, this means it's probably one of the

0:55:20 fastest app experiences that I get.

0:55:22 I feel much better about the data that I'm putting into it.

0:55:26 So it's just, it gives me a much better baseline in terms

0:55:29 of my expectations as a user.

0:55:32 And I'm happy for the developers building it since they probably

0:55:35 also have a much more fun time.

0:55:37 So, but you've also mentioned the question marks around the

0:55:41 business model of local-first.

0:55:42 And I remember from like the good old days when you downloaded software and

0:55:48 you needed to buy it, you needed a serial number, but then there were also

0:55:53 a large group of people who would just crack software and use it illegally.

0:55:58 And I think at that point, it was really seen as a solution that SaaS would just

0:56:04 rent out your software on a monthly basis.

0:56:07 And that sort of solved the entire pirated software problem.

0:56:12 So I'm wondering, is local-first pointing in a direction to go

0:56:16 back towards downloaded software and,

0:56:18 hopefully,

0:56:19 paying for that serial number? Is there a best of both worlds, something that's not

0:56:24 quite you rent your software AKA cloud and has all the problems, but maybe as

0:56:30 a business, you don't need to worry about pirated software and you get paid

0:56:36 if you choose to have a paid plan as well.

0:56:40 Do you have thoughts on what a business model in the local-first

0:56:43 world looks like?

0:56:44 Yeah, I personally wouldn't mind going back to the model of license

0:56:48 keys and perpetual licenses.

0:56:50 I personally quite liked it, but I do totally understand that for the companies

0:56:54 making the software, like having recurring revenue is really, really nice.

0:56:58 Even besides the piracy things you mentioned.

0:57:01 And to some extent, I think there's nothing stopping people just

0:57:05 doing subscription apps, if they're local-first as well, you know, just

0:57:08 the fact that we've moved some of the logic from a server backend into

0:57:12 the client doesn't stop you from being able to do a subscription.

0:57:16 We can just tell people it's SaaS and sell it in the same way.

0:57:19 And maybe that will work just fine.

0:57:21 I mean, it is true that because we have this idea of user data ownership

0:57:27 in local-first, you can't quite hold a gun to the user's head in the same

0:57:32 way and say, like, if you don't pay your subscription, we will delete

0:57:35 all your data, which is something that cloud software can very much do.

0:57:39 And so it's possible that that means that then, you know, more people will drop

0:57:43 off and stop paying the subscription.

0:57:45 You know, you could make

0:57:46 the software simply not work anymore if the user hasn't paid their subscription.

0:57:50 And of course, people could go in with a hex editor and change

0:57:54 it so that it removes that check.

0:57:57 But to be honest, not many people are going to do that probably.

0:57:59 If they did, they would be in the same category as the people

0:58:01 who pirated license keys in the old software model.

0:58:05 Like, there's no way you can extract any money from them anyway.

0:58:09 Basically, it's probably not worth worrying about them too much and

0:58:12 instead focus on those users

0:58:14 you can monetize, who will pay their bills.

0:58:16 And you know, as long as a reasonable percentage of the

0:58:19 people pay, that's still fine.

0:58:21 Peter van Hardenberg likes to say that back in the day of pirated software,

0:58:25 people would worry that, you know, 95 percent of software is pirated

0:58:29 and only 5 percent of users pay.

0:58:31 But actually with freemium software, a lot of startups would be very happy with a 5

0:58:35 percent conversion rate of free to paid.

0:58:37 That's a really good conversion rate.

0:58:39 So actually if you view it through that angle, you know, just

0:58:44 not worrying too much about the people who are not going to pay anyway, and making

0:58:48 sure that you provide a good experience for those customers who do want to pay,

0:58:52 I think it should be fine to build solid businesses that way.

0:58:56 I agree.

0:58:57 And I'm looking forward to see which sort of models do emerge.

0:59:02 And if anything, I think the cloud has really rewarded

0:59:07 a very small number of huge, kind of monopoly-like companies.

0:59:13 And I'm kind of nostalgic about the days where you had a lot more smaller

0:59:18 software vendors who really put a lot of care into it and, for a particular audience,

0:59:23 which might be a niche audience, built the best possible software for them.

0:59:26 And those are then probably also the people who would pay for software.

0:59:29 So I'm optimistic and I'm looking forward to seeing

0:59:32 which sort of business models will emerge and yeah, can't

0:59:36 wait to see where this is going.

0:59:38 Yeah.

0:59:38 That's one of the things that makes me excited about local-first as well, as hopefully it

0:59:43 should just become a lot cheaper to build and run software because cloud

0:59:47 software is just ridiculously expensive because like you need a backend team

0:59:51 and the front end team and the backend team needs to be on call 24/7 in case

0:59:55 the servers go down and then, you know, suddenly you've got a huge team and it costs

0:59:59 a lot of money just to pay all those developers.

1:00:01 And then you have to have a mainstream app for a big audience in

1:00:05 order to have a big enough market.

1:00:07 And so that then cuts out all of these kinds of indie software developers

1:00:10 that you were talking about.

1:00:11 And so we're hoping with local-first software, if we can just commoditize

1:00:15 the whole backend so that app developers don't have to write their own backend.

1:00:18 If all you're doing is pulling some local-first framework off the shelf

1:00:23 and writing your custom app logic in your front end, it just becomes

1:00:26 so much cheaper to develop the app.

1:00:28 You don't have to worry about the whole 24/7 on call rotation.

1:00:31 And then that makes it economically feasible again, to have these niche apps

1:00:35 that are built by one or two people.

1:00:37 And they only have a small customer base, but that's fine.

1:00:39 All you need to do is provide a decent income for those two people.

1:00:42 And then you can have these niche apps that

1:00:44 really perfectly serve a particular audience and just

1:00:47 do that one thing really well.

1:00:49 That's something I would really like to see.

1:00:51 And we're starting to see beginnings of this. For example,

1:00:54 one of our big contributors to Automerge works on an app for

1:00:59 assistant directors of movie shoots

1:01:02 to plan their schedule of when they're going to shoot what and which actor

1:01:06 they need for which scene on which set, with which props, et cetera.

1:01:10 And, you know, it's a super niche piece of software, but I really, really

1:01:13 want him to succeed because I think it's just a great example of if we

1:01:17 can make it easy for him to build this kind of software for his particular

1:01:22 use case, then we can do the same thing for 10,000 other niches as well.

1:01:27 Yeah, I fully agree.

1:01:29 This is something I'm super excited about.

1:01:31 With local-first as a whole, if you go through life, you realize how little in some

1:01:38 ways software has penetrated our real

1:01:41 life, where you interact with something and then you think

1:01:44 about, wait, we have computers, we have technologies to solve this.

1:01:48 Why hasn't it arrived in these parts of our life yet,

1:01:52 where it would make life better?

1:01:54 And I think the answer is typically incentive models of the cloud.

1:01:58 If you build something for the cloud, you need to

1:02:01 build it for a huge audience, et cetera.

1:02:03 Otherwise it's not worth it.

1:02:04 Particularly if you go venture capital based.

1:02:08 So I think this is where local-first really completely flips the math and

1:02:12 allows people who are passionate about a particular use case, a particular

1:02:17 niche to go for that niche and that you don't need to worry about reaching

1:02:23 a giant audience if you don't want to.

1:02:25 And I think local-first can really change the economics there.

1:02:28 So I'm super excited about that.

1:02:31 That's almost like a second order effect.

1:02:33 And I'm sure there will be others that I can't really think about right now.

1:02:37 But I have a gut feeling that it will be a good one.

1:02:41 So yeah, Martin, this has been a real pleasure to have you on the show today

1:02:46 and sharing all of those anecdotes, the thoughts on where things are

1:02:51 coming from, where things are going.

1:02:54 So do you have anything else that you want to share with the

1:02:57 audience before wrapping up?

1:02:58 Not really.

1:02:59 I'm just very happy if people are interested in local-first.

1:03:04 So, I mean, thank you to you for running this podcast, for helping

1:03:08 popularize the idea further.

1:03:10 And thank you to everyone who's listening and for being interested in it.

1:03:13 And I hope the community will continue growing further as we get more people,

1:03:18 you know, just building it in the direction of what they want it to be.

1:03:23 So I think, you know, we can just provide a set of starting values and

1:03:27 some technical tooling, but in the end, it'll all depend on what the

1:03:32 community decides to build around it.

1:03:34 And so I'm really excited to see what will come when, as people

1:03:39 Outro

1:03:39 Awesome.

1:03:40 Yeah.

1:03:40 Whenever we do our next show together, I'm sure there will be a lot more apps

1:03:44 being built in local-first that we can point to that do not exist today.

1:03:49 So I'm really looking forward to that.

1:03:51 Martin, thank you so much for coming on.

1:03:53 Thank you, Johannes.

1:03:54 It's been great.

1:03:55 Thank you for listening to the localfirst.fm podcast.

1:03:57 If you've enjoyed this episode and haven't done so already, please subscribe and

1:04:01 leave a review wherever you're listening.

1:04:03 Please also tell your friends about it

1:04:05 if you think they could be interested in local-first. If you have feedback,

1:04:08 questions or ideas for the podcast, please get in touch via hello at

1:04:12 localfirst.fm or use the feedback form on our website. Special thanks to Expo and

1:04:18 Crab Nebula for supporting this podcast.

1:04:20 See you next time.