localfirst.fm
February 28, 2024

#4 – Martin Kleppmann: CRDTs, Automerge, generic syncing servers & Bluesky

Sponsored by Expo and CrabNebula
Transcript

0:00:00 Intro
0:00:00 So much work in building a web app goes into reinventing this backend
0:00:04 infrastructure that every single company has to reinvent again.
0:00:07 And so if we can make the data sync protocol and the data storage on the
0:00:12 servers efficient, like loading and synchronization of large collections of
0:00:16 documents, all of that can be generic.
0:00:18 So if one person is then building a graphics app and another person is
0:00:21 building a spreadsheet and another person is building a document
0:00:24 editor, they can all use the same syncing service as the backend.
0:00:29 That I think is part of the economic value proposition of local-first software.
0:00:34 Welcome to the localfirst.fm podcast.
0:00:37 I'm your host, Johannes Schickling, and I'm a web developer, a startup founder, and
0:00:41 love the craft of software engineering.
0:00:43 For the past few years, I've been on a journey to build a modern, high quality
0:00:47 music app using web technologies.
0:00:49 And in doing so, I've been falling down the rabbit hole of local-first software.
0:00:54 This podcast is your invitation to join me on that journey.
0:00:57 In this episode, I'm speaking to Martin Kleppmann, who is one of the authors
0:01:01 of the original local-first essay.
0:01:04 Martin has been exploring local-first software and CRDTs for over 10 years, which
0:01:09 has led to the creation of Automerge, which we discuss in depth in this episode.
0:01:14 We are also exploring the ideas of a generic sync server and the
0:01:17 impact this technology could have on local-first software in the future.
0:01:21 I also have a very special announcement today as I'm co-organizing the
0:01:25 world's first local-first conference.
0:01:27 It will happen on May 30th in Berlin, and I would love to see you there in person.
0:01:32 Go ahead and grab your tickets on localfirstconf.com.
0:01:35 Before getting started, also a big thank you to Expo and CrabNebula
0:01:39 for supporting this podcast.
0:01:41 And now my interview with Martin.
0:01:45 Hello, welcome, Martin.
0:01:46 Thank you so much for coming to the podcast.
0:01:49 Hi, Johannes.
0:01:49 Thank you for having me.
0:01:51 CRDTs
0:01:51 I'm super excited to have you on the show.
0:01:54 You're obviously no stranger in the local-first world.
0:01:57 Um, but would you briefly mind introducing yourself?
0:02:01 Uh, yeah, sure.
0:02:01 So I'm just recently an associate professor at the University of Cambridge.
0:02:07 I've been at Cambridge for quite a long time, but for a long time that was
0:02:10 like on fixed term academic contracts.
0:02:12 So this is my first permanent university position, which is nice.
0:02:16 It means I can keep doing this stuff long term.
0:02:19 Um, yeah, previous to that, I did in some past life work on startups.
0:02:24 I sold a startup to LinkedIn back in 2012, but then shifted over to academia.
0:02:29 Amazing.
0:02:30 And so you're one of the coauthors of the local-first paper that was
0:02:35 published on the Ink & Switch site.
0:02:38 So, I think most people who are in the local-first space are familiar
0:02:42 with that, but I'm very curious, what is your personal story behind you sort
0:02:49 of finding your way to local-first?
0:02:52 Yeah, there is, it probably starts about 2013 or so, fairly shortly after we had
0:02:57 sold the startup to LinkedIn and I was at LinkedIn, but our project got canceled.
0:03:03 And so I was kind of looking around for new things and I came across this
0:03:08 paper from Marc Shapiro and colleagues on conflict-free
0:03:12 replicated data types, or CRDTs.
0:03:14 I can't remember how I came across it.
0:03:15 Maybe somebody put it on Twitter or something like that.
0:03:17 And I read this thing and I was really intrigued by it because I felt that, you
0:03:21 know, this seemed like a way of making the software a bit less cloud dominated.
0:03:27 I had got a bit frustrated with the whole startup world.
0:03:30 You know, as I was doing web based stuff, social media stuff, it's all very much
0:03:35 like centralized services, which put all of the user's data in one big database.
0:03:41 And I was just a bit uncomfortable with that.
0:03:43 I felt like, it's not really in the user's interest.
0:03:46 Obviously it's in the company's interests to try to collect as much data as they can
0:03:50 and monetize it in whatever way they can.
0:03:52 But for users, it's not really great.
0:03:54 And so I was sort of trying to overcome my unease with this by looking at
0:03:58 technological solutions that might help.
0:04:01 And so then I came across these CRDTs, which seemed like it could be
0:04:04 a part of the answer to the problem.
0:04:06 I felt like it was a way you could make software that would run on the user's
0:04:11 device and store the data locally on the user's device where
0:04:14 nobody can take it away from you.
0:04:16 And at the same time have all the conveniences of cloud software with
0:04:20 like real time collaboration and sync across all of your devices
0:04:23 and being able to easily share data with other users and so on.
0:04:27 So that was kind of the core of the local-first idea there already,
0:04:31 but it then took us several more years before we really were able to
0:04:35 articulate it clearly enough ourselves.
0:04:37 And so then.
0:04:39 I can't remember when the local-first paper came out.
0:04:41 Was it 2019 or something like that?
0:04:43 So yeah, about 2014 I left LinkedIn, and for a year
0:04:48 I was on sabbatical writing my book, and then in 2015 I joined the university
0:04:52 and started working on CRDTs myself and then started like gradually
0:04:56 building up the technical foundations.
0:04:59 It then still took us quite a long time before we had really articulated it,
0:05:03 but all that time we were gradually working towards what we now call local-first.
0:05:08 I'm curious whether there were any particular milestones you
0:05:12 reached during those early research years. Were there moments
0:05:16 where you thought you hit some walls and you thought this was a dead end?
0:05:20 I mean, with any kind of research, it's always like lots of little
0:05:24 dead ends and then getting out of them and trying other things again.
0:05:27 But I think that's just part of the normal process.
0:05:30 So I'm not going to pretend it was smooth in any way.
0:05:33 Obviously, there's lots of things that didn't work along the way, but also
0:05:36 most of them, in retrospect, kind of don't matter too much.
0:05:40 So like, you know, we have very detailed discussions about
0:05:44 how a particular merge behavior should work, for example.
0:05:46 So if one user makes one change to a document, another user on a different
0:05:50 device makes a different change, we need to merge those things together.
0:05:52 You know, you can have hours and hours of debate about how
0:05:55 precisely that should work.
0:05:56 But then in the end, once you've settled on an answer and that answer seems to be
0:06:00 broadly okay, then, you know, then the question just becomes uninteresting and
0:06:04 we move on to more interesting things.
0:06:06 So yeah, there's lots of that kind of thing along the way.
0:06:10 And like a lot of changes we made to the implementations of these things,
0:06:13 like the software evolved a lot.
0:06:16 Like my first CRDT implementation was in Ruby.
0:06:19 And then that later turned into a JavaScript implementation, which was the
0:06:23 beginnings of what is now Automerge, and then that later got ported to Rust.
0:06:27 And so now the Rust implementation is our primary one.
0:06:29 So, you know, we've really gone through three languages there and God
0:06:33 knows how many orders of magnitude improvement in performance, the early
0:06:36 versions were extremely, extremely slow, but you know, it's, it gradually
0:06:40 gets better as we keep working on it.
0:06:42 I'm really eager to dive in deeper on Automerge and hearing
0:06:45 your side of the story on how Automerge came to where it is today.
0:06:50 Before going into Automerge, Automerge is a library to deal
0:06:54 with CRDTs, but not everyone might be super familiar with CRDTs.
0:06:59 I don't think there's a better person to explain what CRDTs are than you.
0:07:04 Could you give a quick summary and introduction to CRDTs?
0:07:08 Yeah, so the basic idea is that you've got some data on multiple
0:07:12 devices, the user on each device can independently update that data,
0:07:16 possibly while the device is offline.
0:07:18 And then at some point later, the devices sync their updates.
0:07:20 And ideally, we just want them to merge their states together in some way.
0:07:24 And CRDTs are just algorithms that perform this kind of merging, plus
0:07:29 the data synchronization and so on.
0:07:30 So the idea is that,
0:07:32 you know, often the changes made on two different devices will affect
0:07:36 different parts of a document.
0:07:37 One person is updating one item in the to do list and another person is
0:07:40 updating a different item, and so it's fairly easy to merge those together.
0:07:44 In principle, you can end up with conflict cases where, say, in
0:07:47 graphics software, one user makes the rectangle red, another person
0:07:51 makes the same rectangle green.
0:07:52 Well, what do you do?
0:07:53 Well, I mean, you probably just choose one of the two and then if the user doesn't
0:07:57 like it, they can change the color again.
0:07:58 So it's algorithms just for automating that kind of thing.
0:08:02 Because what we don't want is for the user to be shown like a pop up saying,
0:08:06 Hey, this file was changed on two devices.
0:08:08 Please pick which one you want to keep and which one you want to throw away.
0:08:11 I think that would be bad.
0:08:12 And like previous versions of Apple's Pages also did
0:08:17 that kind of thing, I guess,
0:08:18 if I remember correctly. But fortunately now we have better algorithms
0:08:22 which just allow changes to be merged together with minimal ceremony.
0:08:26 So that's really all CRDTs are about.
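The red/green rectangle case can be sketched as a last-writer-wins register. This is only an illustrative sketch, not Automerge's implementation: the `Register` class, the `merge` function, and the `(counter, actor)` tag used to break ties are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Register:
    """A last-writer-wins register: a value tagged with (counter, actor)."""
    value: str
    counter: int  # logical clock, bumped on every local write
    actor: str    # unique device id, used only to break ties

    def set(self, value: str, actor: str) -> "Register":
        # A local write always gets a tag newer than the one it overwrites.
        return Register(value, self.counter + 1, actor)


def merge(a: Register, b: Register) -> Register:
    """Deterministically keep the write with the larger (counter, actor) tag."""
    return a if (a.counter, a.actor) >= (b.counter, b.actor) else b


base = Register("blue", 0, "init")
alice = base.set("red", "alice")   # edit made on one device
bob = base.set("green", "bob")     # concurrent edit on another device

# Both devices reach the same result no matter which side they merge from.
merged_on_alice = merge(alice, bob)
merged_on_bob = merge(bob, alice)
```

The choice of winner is arbitrary (here the tie on `counter` is broken by comparing actor names), which matches the point made above: one colour is picked automatically, and a user who dislikes it can simply change it again.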
0:08:29 A huge amount of research has gone into like figuring out how
0:08:33 to make the merge behavior good.
0:08:35 So that depending on what types of edits people make, the end result is hopefully
0:08:40 more or less what the users expected, and
0:08:44 also into making these algorithms fast.
0:08:46 Because you can implement these algorithms in a very simple way, but the
0:08:49 simple way tends to be very inefficient.
0:08:51 And so making it so that it doesn't take too much disk space, doesn't take
0:08:55 too much memory and it's generally fast, that actually requires quite a
0:08:58 lot of sophistication on the algorithm.
0:09:01 So that's where a lot of the investment has gone over the last few years,
0:09:04 but yeah, but that's broadly what CRDTs are and Automerge is just a
0:09:08 library that implements this stuff.
0:09:09 So there are other CRDT libraries out there, but, Automerge is the
0:09:16 Automerge
0:09:16 Yeah, I think Automerge is probably one of the most advanced
0:09:20 CRDT implementations right now.
0:09:22 And as you've mentioned, you built your first versions, not in
0:09:26 Rust as it is written today, but there were predecessors to this.
0:09:31 So given that this has now been such a
0:09:33 long journey (I think
0:09:35 it's fair to say that you've been working on this for 10
0:09:38 years), I'd be very interested in hearing your reflections on the history and
0:09:44 the process of taking Automerge from the beginnings to where it is today.
0:09:50 Yeah.
0:09:50 So when I started working on CRDTs, there was no CRDT for JSON data, for example.
0:09:55 So there were existing data types for sets and maps and counters and
0:10:00 registers and things like that.
0:10:01 So just these kind of little atomic data types, but nothing
0:10:05 that really composed them together.
0:10:07 Uh, oh, and lists as well.
0:10:08 I should mention that there were data types for lists.
0:10:11 And so in a way, JSON is simple, you know, it's just, you can put maps
0:10:14 inside lists and maps and lists inside maps and compose them arbitrarily.
0:10:19 But there's still interesting questions you have to answer, which
0:10:22 is like, for example, what if one user deletes an object while another user
0:10:26 makes an update inside that object?
0:10:28 How do you merge those things?
0:10:29 And so one of the first research papers I wrote was an algorithm for doing
0:10:34 a CRDT for JSON data, which answered exactly these kinds of questions.
0:10:38 And then Automerge started out sort of conceptually as an implementation of
0:10:43 this paper, although we ended up actually choosing different behavior for Automerge
0:10:47 than the paper chose, but you know, after examining a bunch of applications and
0:10:52 what sort of behavior they would want, we came to the conclusion that a different
0:10:55 behavior was better, but that was basically the genesis of the whole thing.
0:11:00 So I can't remember which year that JSON CRDT
0:11:03 paper came out, but yeah, I was working on it like in 2015, 2016-ish.
0:11:08 And then, I think it was about 2017, Peter van Hardenberg got in touch with me.
0:11:12 So I knew Peter from back in my startup days because he was running
0:11:17 the Heroku Postgres team at the time.
0:11:20 And our company, which was called Rapportive, was one of the bigger customers
0:11:25 of, uh, Heroku Postgres at the time.
0:11:28 And so we had, like, talked to Peter as part of, like, just scaling our database.
0:11:33 Years later, I hear from Peter again, because he had read my JSON CRDT paper
0:11:38 and went like, Hey, we want to try actually building some apps with this.
0:11:41 Have you tried actually building some apps?
0:11:43 And I went, Oh, no, no, no.
0:11:45 I just do theory.
0:11:46 You know, I just write a paper and I have this extremely janky Ruby implementation
0:11:51 that actually only does half of what it says in the paper.
0:11:55 So then I got together with Peter and Ink & Switch.
0:11:58 And I think Ink & Switch was quite new still at the time.
0:12:01 And we did this project together in which we essentially
0:12:05 built a JSON CRDT
0:12:07 that actually worked in JavaScript.
0:12:09 In fact, Orion Henry wrote the first version of that and brought it to me.
0:12:13 And I went like, yeah, nice API, but no, those algorithms are totally wrong.
0:12:17 And so then we worked together to make the algorithms right as well.
0:12:21 And it was a great collaboration because, you know, the Ink & Switch folks were
0:12:25 just much better at, like, API design and also UI design and general app development
0:12:32 than I was, whereas I sort of brought the like more mathematical style of
0:12:35 thinking of analyzing the algorithms and making sure that they were correct,
0:12:39 and that was just a great collaboration.
0:12:41 So yeah, we first wrote this library.
0:12:43 We originally called it Tesseract, but then there was already a
0:12:46 JavaScript library of that name.
0:12:47 So we renamed it to Automerge and that name has stuck since.
0:12:51 So yeah, I think Automerge started around 2017.
0:12:54 And then a few Ink & Switch projects used it, but it was very
0:12:58 much research quality software.
0:13:00 You know, it was extremely slow.
0:13:02 It had bugs.
0:13:03 The file format was extremely inefficient.
0:13:05 So it was kind of impractical to use for most things.
0:13:10 As a vehicle for doing research, it worked quite well.
0:13:13 But then at some point, like it became clear that, okay, we
0:13:16 actually want to start building more ambitious software on it.
0:13:19 And it's not really acceptable if it takes three minutes to
0:13:22 load your document off disk.
0:13:24 So, you know, okay, we have to
0:13:26 figure out a new file format to make the files smaller and figure out new
0:13:31 algorithms to make the whole thing faster.
0:13:34 And then also we decided that the Rust implementation would be better.
0:13:37 Um, not so much because Rust is faster than JavaScript, but rather
0:13:40 because it's more cross platform.
0:13:42 And so we can compile Rust to WebAssembly for the web, but we can
0:13:45 also compile it to native libraries for iOS and Android, for example.
0:13:49 And so Orion did a lot of work on the port to Rust.
0:13:53 Uh, again, and a few others contributed to that, and Alex
0:13:57 Good got involved with that too.
0:13:59 But then at some point, two years ago or so, we then made the call to
0:14:03 make the Rust implementation, the primary implementation of Automerge.
0:14:07 So all of that JavaScript... I'd been maintaining the JavaScript implementation
0:14:10 as this research code over the years, but we decided to just completely deprecate
0:14:15 that, throw away all of my old code.
0:14:16 And I've done actually no work on the Rust implementation.
0:14:20 So that's all been done by Alex and Orion and other people now.
0:14:24 And I've just moved into more of an advisory role, which suits me really well.
0:14:28 You know, I'm very happy to be the one not writing the code.
0:14:32 Other people are much better at writing the code than I am, but I know I can
0:14:34 think about the algorithms and the protocols and the data structures.
0:14:38 And that's what I find fun.
0:14:39 And so then, about a year ago or so, we declared
0:14:43 Automerge to be production ready.
0:14:44 So at that point, then, you know, the Rust implementation was mature and fast.
0:14:50 And we got a sponsorship thing going with GitHub Sponsors, which allowed
0:14:56 people who were commercially using, or companies that were commercially using
0:14:59 Automerge, to sponsor its development.
0:15:01 And that is now supporting the work of Alex Good, who's now
0:15:04 professionally maintaining Automerge.
0:15:06 And that is just such a good arrangement now.
0:15:07 I'm really pleased with how that's working because it means that we have
0:15:11 high quality software that's being professionally maintained, but at the
0:15:14 same time, you know, we haven't had to go out and raise venture capital, which
0:15:18 we feared, you know, might be at odds with the values of local-first.
0:15:23 And so this way, by essentially bootstrapping it off of
0:15:26 the sponsorship revenue.
0:15:28 I think that aligns everybody's interests very well.
0:15:30 And so that has allowed the project to do very well.
0:15:33 That is an incredible journey.
0:15:35 And I mean, this is an open source project.
0:15:38 In particular, I think most people right now still use Automerge in a JavaScript
0:15:43 context, and for a JavaScript library, where I think you're thinking more in terms of dog
0:15:49 years, Automerge is really a monumental project and it has come incredibly far.
0:15:54 So I'm super excited for that.
0:15:57 So where's the project today?
0:15:58 You've mentioned that it's reached production readiness
0:16:02 around about last year.
0:16:03 Does that mean the APIs are final, the research behind it is
0:16:08 concluded and now it's just performance optimizations or what is left to do?
0:16:13 And I just, there's so much, so much we still want to do with it.
0:16:17 So what we mean with production ready is like, there are no egregious
0:16:20 bugs that we know about and the performance is good enough that,
0:16:24 you know, it's plausibly usable in real software, which some of the
0:16:28 research code definitely was not. It's got much, much better, but
0:16:31 in terms of features, I think we've only really just started.
0:16:36 So what Automerge started with is a basic JSON model, so you can have maps
0:16:41 where the keys are strings and the values can be either nested maps, or they can
0:16:45 be nested lists, or arbitrary recursion of those things, or primitive values
0:16:50 like strings and numbers and booleans.
0:16:53 And that's it.
0:16:53 Then, okay, we added counters. Counters are
0:16:56 actually not very useful, but everyone seems to use them for demos.
0:17:00 So we include the counters so that we can have the demo as well.
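The classic counter demo can be sketched as a grow-only counter in which every device keeps its own running total. This is a textbook construction used for illustration, not a description of how Automerge stores counters; the function names `increment`, `merge`, and `value` are assumptions made for this example.

```python
def increment(state: dict[str, int], actor: str, by: int = 1) -> dict[str, int]:
    """Return a new state in which `actor` has added `by` to its own total."""
    if by < 0:
        raise ValueError("a grow-only counter cannot be decremented")
    new_state = dict(state)
    new_state[actor] = new_state.get(actor, 0) + by
    return new_state


def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Per actor, keep the larger total; no increment is lost or double counted."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


def value(state: dict[str, int]) -> int:
    """The visible counter value is the sum of every actor's total."""
    return sum(state.values())


shared = increment({}, "alice", 2)      # both devices start from value 2
on_alice = increment(shared, "alice")   # alice adds 1 while offline
on_bob = increment(shared, "bob", 5)    # bob adds 5 while offline

merged = merge(on_alice, on_bob)
total = value(merged)                   # 2 + 1 + 5
```

Merging is commutative and idempotent, so the devices can exchange states in any order, any number of times, and still agree on the total.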
0:17:03 Then, a big thing we added was rich text.
0:17:06 So that's something that a lot of applications need:
0:17:10 text with formatting.
0:17:11 And the first version of that is released and implemented, though the
0:17:16 first version only supported inline formatting, such as bold and italic
0:17:20 but not block elements like headings or bullet points or things like that.
0:17:24 And so there's an updated version of that coming soon, which adds
0:17:28 support for block elements too.
0:17:29 So this is now nice.
0:17:31 You can put rich text anywhere inside a document.
0:17:33 So,
0:17:33 you know, if you want to make a Google Docs equivalent thing,
0:17:36 you can do that, but you could also have, for example, a vector
0:17:39 graphics software that has some rich text just inside the text boxes.
0:17:42 And the rest is a drawing consisting of like arrows and lines and
0:17:48 freeform, whatever you want.
0:17:50 And so the JSON type document model has allowed extension in those
0:17:54 directions very well, but there's so much more we still want to do.
0:17:58 So,
0:17:58 like, an obvious missing thing is undo, which in collaborative software is actually quite
0:18:02 subtle in terms of the behavior you want.
0:18:06 And so in particular, it's not generally the case that you want to undo the
0:18:09 most recent operation, the most recent change to the document, because the most
0:18:13 recent change to the document might've been made by somebody else in a part of the
0:18:16 document that you're not looking at.
0:18:17 And so
0:18:18 undoing somebody else's change in a completely different part of the
0:18:20 document is definitely not what you intended when you hit command Z.
0:18:24 So actually doing undo well requires inspecting the editing history of
0:18:30 the document, which we can do because Automerge keeps the editing history
0:18:33 anyway, but actually surfacing that and making the right APIs
0:18:37 and the right underlying algorithms, that's still a work in progress.
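The point about undo can be sketched as follows: rather than reverting the globally latest change, look through the editing history for the latest change made by the user who pressed undo. This is a deliberately naive sketch under stated assumptions. The `Change` record, `undo_target`, and `inverse` are hypothetical names, not Automerge APIs, and a real design also has to handle changes that were already undone or since overwritten by someone else.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    actor: str          # who made the change
    key: str            # which part of the document it touched
    old: Optional[str]  # value before the change (None if the key was absent)
    new: Optional[str]  # value after the change


def undo_target(history: list[Change], actor: str) -> Optional[Change]:
    """Find the most recent change by `actor`, skipping other users' changes."""
    for change in reversed(history):
        if change.actor == actor:
            return change
    return None


def inverse(change: Change) -> Change:
    """Build the change that restores the previous value of the same key."""
    return Change(change.actor, change.key, old=change.new, new=change.old)


history = [
    Change("alice", "title", None, "Draft"),
    Change("bob", "footer", None, "v1"),  # globally the most recent change
]

target = undo_target(history, "alice")    # alice's own edit, not bob's
undo_change = inverse(target)
```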
0:18:40 Another thing that we've long tried to add is a move operation so that,
0:18:44 for example, you could reorder items in lists or if you have a, say a
0:18:49 file system tree, you could drag a directory from one location to another.
0:18:54 That is also quite subtle to implement because you have to answer
0:18:57 questions like, what happens if two users concurrently move the
0:19:01 same item to two different places?
0:19:03 You don't want to duplicate it.
0:19:04 In that case, you want to just
0:19:06 pick one of the destinations.
0:19:07 Or you get weird things where like you have A and B which are siblings and one
0:19:12 user moves A to be a child of B while concurrently another user moves B to be a
0:19:16 child of A and now if you're not careful you could end up with a loop between A
0:19:20 and B, and that would be a mess as well.
0:19:22 So the move operation has to handle those kinds of cases very carefully.
0:19:26 You know, we wrote the research paper about it several years ago, but
0:19:29 actually turning that into the kind of production quality code as part of
0:19:33 Automerge is still an ongoing project.
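The two move conflicts described here (one item moved to two places, and A and B each moved under the other) can be sketched by applying all moves in one agreed order and skipping any move that would create a loop. This is a simplified illustration, not the algorithm from the research paper or Automerge's code. It assumes every replica already knows the full set of moves and that the starting tree has no cycles; `Move`, `is_ancestor`, and `apply_moves` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    timestamp: tuple[int, str]  # (logical counter, actor id): a total order
    child: str
    new_parent: str


def is_ancestor(parent_of: dict[str, str], ancestor: str, node: str) -> bool:
    """True if `ancestor` lies on the path from `node` up to the root."""
    current = parent_of.get(node)
    while current is not None:
        if current == ancestor:
            return True
        current = parent_of.get(current)
    return False


def apply_moves(parent_of: dict[str, str], moves: list[Move]) -> dict[str, str]:
    """Apply moves in timestamp order, skipping any that would create a cycle."""
    tree = dict(parent_of)
    for move in sorted(moves, key=lambda m: m.timestamp):
        if move.child == move.new_parent or is_ancestor(tree, move.child, move.new_parent):
            continue  # the new parent is inside the moved subtree: skip
        tree[move.child] = move.new_parent  # one parent per node, so no duplicates
    return tree


start = {"A": "root", "B": "root", "C": "root"}
moves = [
    Move((1, "alice"), "A", "B"),  # alice moves A under B
    Move((1, "bob"), "B", "A"),    # concurrently, bob moves B under A
    Move((2, "alice"), "C", "A"),  # both users move C...
    Move((2, "bob"), "C", "B"),    # ...to different destinations
]

result = apply_moves(start, moves)
same_result = apply_moves(start, list(reversed(moves)))
```

Because each replica sorts by the same timestamps, both end up with the same tree: bob's move of B is skipped because it would form a loop, and C ends up in exactly one place.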
0:19:35 And so those are kind of the near-term
0:19:39 examples of features that we want to add to Automerge.
0:19:42 Other stuff we want to do better are, for example, synchronizing
0:19:45 large collections of documents.
0:19:47 So at the moment, Automerge really just deals with one document at a time.
0:19:50 But in many apps, you know, you might have a collection of 100,000 documents and
0:19:54 most of them don't change very much.
0:19:57 So we need a protocol for efficiently figuring out which of those many documents
0:20:00 have changed and then synchronize only those which have changed and
0:20:07 Collections vs Databases
0:20:07 So you're mentioning collections, and that right now Automerge is only working
0:20:13 on sort of a single document level, but you want to go further into collections.
0:20:18 So collections makes me think of databases.
0:20:21 Can you contrast a little bit, for someone who thinks about data
0:20:26 primarily in terms of databases, how
0:20:32 their brain needs to change to think primarily in terms of Automerge?
0:20:35 And in a future where someone uses Automerge, do they still use databases?
0:20:40 Do you think about the data that Automerge manages as sort of
0:20:44 an implicit database?
0:20:46 How should I think about that in the future?
0:20:48 Yeah, I think there's, there's a lot of similarities between
0:20:51 Automerge and a database.
0:20:53 And we sort of internally joke that, you know, we're not writing a
0:20:57 database, because writing a database is a crazy thing to do, nobody should
0:21:01 try to write their own database, but it looks like we are writing a database.
0:21:05 And shh, don't tell anybody.
0:21:07 So like, yeah, a collection of documents definitely starts smelling
0:21:11 quite a lot like a document database.
0:21:13 There are sort of differences in data model and sort of the usage
0:21:17 pattern compared to how
0:21:20 mainstream databases are built. You know, you can take MongoDB or even
0:21:24 the JSON support in Postgres and they give you a JSON data model.
0:21:27 And so in that sense, it's similar ish, but they don't really have
0:21:31 the conflict resolution aspect.
0:21:32 So they assume that all of your writes go to a single leader server and that
0:21:38 server just serializes all of the updates.
0:21:40 And therefore you never end up in a situation where you have to
0:21:43 merge two diverged versions of the document.
0:21:46 Whereas in local-first, I mean, the whole point of local-first is that you
0:21:49 have the data locally on your own device.
0:21:51 So that means you inevitably end up in having to do this
0:21:53 kind of conflict resolution.
0:21:54 So even though the data model is maybe on a high level, similar to something
0:22:00 like MongoDB, the data synchronization and the conflict resolution aspects
0:22:04 are something that's very different from a server oriented database.
0:22:08 So you could say it's a more client oriented database where it is intended
0:22:12 to be embedded into client software.
0:22:15 And that would get us quite close to, I think, what Automerge wants to be.
0:22:19 So we have the beginnings of something like that in a library called
0:22:23 Automerge Repo, which is sort of a wrapper library around Automerge.
0:22:28 Automerge itself is basically just an in-memory data structure library.
0:22:31 It does
0:22:32 nothing with disk or network, it's just purely an in-memory data structure, but
0:22:37 Automerge Repo adds the I/O layer to it.
0:22:40 And so it provides adapters for like storing data, storing documents on
0:22:44 disk and loading them again, and for synchronizing things over the network.
0:22:48 And it also manages a collection of documents.
0:22:51 And so this is how the whole thing starts looking a bit more like a database.
0:22:55 Another difference I would also note compared to something like MongoDB is
0:22:59 that a lot of these server side databases assume that a single document actually
0:23:04 doesn't get updated all that often.
0:23:06 So, you know, you might do 100 writes to a document over
0:23:09 the lifetime of a document.
0:23:10 Whereas with the types of documents we're thinking about, every keystroke when you're
0:23:15 writing a text is a write to the document.
0:23:17 And so you can easily accumulate hundreds of thousands of writes to a
0:23:20 document over the lifetime of a document.
0:23:22 And if you have that sort of high rate of updates, that forces entirely different
0:23:28 data structures and data formats.
0:23:30 So I think if you try to use MongoDB or Postgres and, you know, write a
0:23:34 new version of a document on every keystroke, they would not perform very
0:23:37 well because they actually write an entire new copy of the document to disk
0:23:41 every time you update the document.
0:23:43 And, you know, that's just not going to work if you're making hundreds of
0:23:46 thousands of updates to a single document.
0:23:48 And so that's why Automerge then has got a whole bunch of clever data
0:23:53 structures and file formats in order to deal with those very frequent, very
0:23:58 frequent, but small updates to documents.
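A rough back-of-the-envelope calculation shows why rewriting the whole document on every keystroke does not scale. The numbers are assumptions chosen for illustration only: the document starts empty and grows by one byte per keystroke, and each logged operation is taken to cost 16 bytes, which is not Automerge's actual encoding.

```python
def bytes_written_full_copy(initial_size: int, keystrokes: int) -> int:
    """Total bytes written if each keystroke rewrites the whole document.

    The document is initial_size + i bytes long after the i-th keystroke,
    so the total is the sum of those sizes for i = 1..keystrokes.
    """
    return keystrokes * initial_size + keystrokes * (keystrokes + 1) // 2


def bytes_written_append_log(keystrokes: int, bytes_per_op: int) -> int:
    """Total bytes written if each keystroke appends one small operation."""
    return keystrokes * bytes_per_op


keystrokes = 100_000
full_copy = bytes_written_full_copy(initial_size=0, keystrokes=keystrokes)
append_log = bytes_written_append_log(keystrokes, bytes_per_op=16)

print(f"full copy per keystroke: {full_copy:,} bytes")   # about 5 GB in total
print(f"append-only operations:  {append_log:,} bytes")  # about 1.6 MB in total
```

The full-copy cost grows quadratically with the number of edits while the append-only cost grows linearly, which is the gap that formats designed for many small updates aim to close.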
0:24:00 Automerge as an app data-layer
0:24:00 So I'm personally, as an app developer, I try to think about like, what is
0:24:06 the best foundation to build an app?
0:24:08 And so you've mentioned that the early prototypes that you've built for what
0:24:12 later became Automerge, you built with Ruby, maybe you probably built apps with
0:24:18 Rails in the past and Rails was really a great foundation to build a new app.
0:24:22 I'm wondering.
0:24:24 In the next five to 10 years, when you put on your local-first lens, like what
0:24:30 is the Rails equivalent for local-first?
0:24:34 Should I think about Automerge becoming more and more like a new
0:24:40 kind of Rails that's less of an app framework, but more of like a
0:24:44 data framework that takes over more and more app framework responsibilities?
0:24:52 I think the analogy is really good, actually.
0:24:54 I, yeah, I would think of Automerge maybe like as the active record component of
0:24:59 Rails or so it's the data component.
0:25:01 it's not a whole app framework by itself, but you could definitely
0:25:04 imagine building an app framework where it's an important part of it.
0:25:08 And the rest of the app framework would have to do stuff like reactivity of
0:25:12 updating the user interface in response to edits that have happened and figuring
0:25:17 out how to handle user inputs, blah, blah, blah, all that sort of things.
0:25:21 So I think that framework doesn't exist yet, but I would really love
0:25:26 to see somebody build the equivalent of Rails for local-first software.
0:25:32 So what are the missing pieces for that?
0:25:34 So you've mentioned that the way how you and Peter have met is through
0:25:39 Peter's previous work on Heroku.
0:25:42 So Heroku, I think, played a major role in making Rails
0:25:46 so
0:25:47 easy for developers since it's not just easy to work with it locally, but it's
0:25:50 also easy to roll it out into production.
0:25:53 So what does it mean for me right now, if I'm building my first little app, my first
0:25:58 little prototype with Automerge locally, what does it mean for me to roll that out,
0:26:03 that I can share it with my friends and use it sort of in a, in a bigger scale?
0:26:07 Yeah.
0:26:07 So at the moment it still requires a fair amount of
0:26:12 stuff you have to write yourself.
0:26:13 So for example, you know, we provide as part of Automerge Repo some
0:26:17 integrations with, like, React or
0:26:19 Svelte too, as examples of how you can build user interfaces
0:26:23 on top of Automerge, but you know, it's very just basic example code.
0:26:27 I think it's not like an entire framework, but it's something that hopefully
0:26:31 people can use to start building apps.
0:26:34 Likewise for like the network synchronization.
0:26:37 We have a sync server, it's open source and quite simple, and you can just
0:26:41 deploy it yourself, but it lacks all of the features that you might want.
0:26:44 So there's no authentication, for example, which is something
0:26:48 probably most apps will want.
0:26:50 Really, we would like end to end encryption for the data synchronization
0:26:53 for many applications as well, so the server doesn't have to store
0:26:57 the plain text of your documents and a whole bunch of other things
0:27:01 related to synchronization.
0:27:03 So I think we will always want
0:27:05 the option for people to develop these things themselves and run
0:27:08 it themselves if they want to.
0:27:10 But at the same time, I think there's a lot that could happen around having
0:27:14 it kind of packaged up in a nicer way, where maybe there's a hosted cloud
0:27:17 service that just provides a syncing service for local-first apps, and if you
0:27:23 choose a certain framework which might be Automerge based and a certain
0:27:27 networking layer, then you can just use this synchronization service and
0:27:30 you don't have to run your own servers.
0:27:32 And that would be the sort of Heroku equivalent I would see of, of this world.
0:27:37 So I really hope somebody builds that.
0:27:39 And a part of the vision of local-first is that, you know, we'll probably
0:27:44 have to have cloud services involved in this data synchronization, but
0:27:48 if we can make the synchronization protocols an open standard,
0:27:52 then hopefully there can be multiple different providers that can interoperate.
0:27:55 And so if you decide that one particular provider has changed their pricing
0:28:00 in a way that's too expensive, or they're too unreliable or whatever,
0:28:03 you should be able to just point your app at a different provider
0:28:05 and just continue working.
0:28:07 And in some way, like Heroku had this as well, in that, you know, you didn't
0:28:10 have to write custom, you didn't have to use custom Heroku APIs to write your
0:28:15 app or anything, you know, you just write a standard Rails app and you
0:28:18 deploy it by pushing to a Git repo.
0:28:20 And there was just a small amount of Heroku specific configuration.
0:28:23 And if you wanted to, you would always be able to take your app and run it on
0:28:26 a different hosting provider as well.
0:28:28 And so again, I think that's the sort of style I would like for local-first software
0:28:33 too, that we have this interoperability and we have multiple companies, could
0:28:38 be startups, could be big companies,
0:28:40 I don't really mind, providing this kind of cloud syncing service for
0:28:44 local-first software in such a way that it can interop and you can easily
0:28:48 switch from one provider to another.
0:28:54 Thoughts on P2P
0:28:54 That sounds incredible.
0:28:56 And I'd love to see that.
0:28:58 It kind of makes me a bit reminiscent of the days of like
0:29:02 torrenting, et cetera, peer to peer.
0:29:04 We've talked to Peter in, uh, in a previous episode about peer to peer and
0:29:09 there's some real technical challenges that need to be overcome and
0:29:14 maybe can't be overcome in the shorter term, but I'm wondering
0:29:19 how that sort of more abstract syncing service would compare to
0:29:23 some of the existing technologies.
0:29:25 I've mentioned peer to peer there because what was so interesting about
0:29:29 it is that you formed this sort of ad hoc network where
0:29:33 there was no server that something needed to be
0:29:37 deployed to, but things just started working together.
0:29:40 So with that syncing service that you're mentioning, that could be kind
0:29:44 of a platform agnostic, would that be similar to peer to peer in that regard?
0:29:50 Or would you still need to kind of deploy a quote unquote backend
0:29:54 app to that syncing service that it actually does perform the work
0:29:58 you want to have performed for your particular local-first app?
0:30:02 I think
0:30:03 the best results for, you know, for user quality of software would be for it to
0:30:09 use peer to peer when it's available and use a cloud service when not.
0:30:13 I think doing only peer to peer is really difficult because, for example,
0:30:17 you can only talk to another peer while it's online at the same time.
0:30:20 And if you've got two devices that are never online at the
0:30:22 same time, then you can never synchronize data between them.
0:30:25 So
0:30:26 that sucks pretty badly because people do just close their laptop from time to time
0:30:29 or turn off their smartphone or whatever.
0:30:32 So I think pure peer to peer just doesn't work reliably enough.
0:30:36 Plus there's all of the problems with like NAT traversal and just the networking
0:30:40 infrastructure doesn't work well enough.
0:30:42 However, when peer to peer does work, it's amazing.
0:30:44 And so if you've got two devices on the same network in the same building, it
0:30:48 seems outrageous to send all of your data via AWS US East One in Virginia,
0:30:54 if you could just send it via the local wifi from one device to another, right?
0:30:59 So then opportunistically using peer to peer when it happens to
0:31:03 be available is an amazing thing.
0:31:05 And it's, you know, it provides a lot of robustness and
0:31:08 independence from the network.
0:31:10 So that, for example, if you've got your laptop and your phone, and you're
0:31:13 in some remote location where you don't have internet access, you can still
0:31:16 sync data between the two of them.
0:31:18 And, you know, we have a sort of rudimentary version of that with
0:31:21 say, AirDrop on Apple devices, but that's like one-off file transfers.
0:31:25 We really should be able to just do that for live synchronization as well.
0:31:29 So I feel like the combination of cloud and peer to peer just
0:31:33 gives you capabilities that
0:31:35 only cloud or only peer to peer doesn't.
0:31:37 And so that really seems to me like the most promising
0:31:40 direction is to combine the two.
0:31:42 And the nice thing with CRDTs is that they just don't care what
0:31:46 your networking layer is, right?
0:31:47 All you need is some way of getting some bytes from one device to another.
0:31:51 That's all they need.
0:31:52 And whether that goes via a local network or peer to peer over the internet via
0:31:56 a distributed hash table or via a cloud service or via multiple cloud services.
0:32:01 The CRDT doesn't care; any communication channel will do.
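As a rough illustration of why the transport does not matter, here is a minimal sketch of a state-based grow-only counter, one of the simplest CRDTs. This is not Automerge's algorithm or API; the class `GCounter` and its method names are made up for this example. The only thing the replicas exchange is a blob of bytes, and merging gives the same result regardless of delivery order or duplicate delivery.

```python
import json
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Hypothetical grow-only counter: one slot per replica, merged by max."""

    replica_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def to_bytes(self) -> bytes:
        # The whole state is serialized; any channel that moves bytes will do.
        return json.dumps(self.counts, sort_keys=True).encode("utf-8")

    def merge_bytes(self, payload: bytes) -> None:
        # Taking the per-replica maximum makes merging commutative,
        # associative and idempotent, so order and duplicates do not matter.
        for replica, count in json.loads(payload.decode("utf-8")).items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)


laptop, phone = GCounter("laptop"), GCounter("phone")
laptop.increment(2)
phone.increment(3)

# These bytes could travel over local wifi, a cloud relay, or a USB stick.
from_laptop, from_phone = laptop.to_bytes(), phone.to_bytes()
phone.merge_bytes(from_laptop)
phone.merge_bytes(from_laptop)  # duplicate delivery is harmless
laptop.merge_bytes(from_phone)

print(laptop.value(), phone.value())
```

Both replicas end up with the same state even though neither knows how the bytes arrived.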
0:32:06 That makes a lot of sense.
0:32:08 And this sort of hybrid nature where it optimistically uses the close peer
0:32:13 connection, where if that works, then the experience is even better, but it kind of
0:32:17 falls back to the cloud where it needs to.
0:32:20 And it also will give you some benefits maybe such as backup, et cetera.
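The hybrid idea described above, using peer to peer opportunistically and falling back to a cloud service, can be sketched roughly as follows. Everything here is an assumption for illustration: `HybridSync`, the `Channel` type and the convention that a channel raises `ConnectionError` when unreachable are invented names, not part of Automerge or any real networking library.

```python
from typing import Callable, List, Optional

# Hypothetical channel: delivers bytes or raises ConnectionError.
Channel = Callable[[bytes], None]


class HybridSync:
    """Try the local peer first, then the cloud, else queue for later."""

    def __init__(self, cloud: Channel, local_peer: Optional[Channel] = None) -> None:
        self.cloud = cloud
        self.local_peer = local_peer
        self.outbox: List[bytes] = []

    def send(self, payload: bytes) -> str:
        for name, channel in (("peer-to-peer", self.local_peer), ("cloud", self.cloud)):
            if channel is None:
                continue
            try:
                channel(payload)
                return name
            except ConnectionError:
                continue  # this route is down, try the next one
        self.outbox.append(payload)  # nothing reachable: keep it for a retry
        return "queued"


received = []


def wifi_peer(payload: bytes) -> None:
    received.append(("wifi", payload))


def cloud_relay(payload: bytes) -> None:
    received.append(("cloud", payload))


def unreachable(payload: bytes) -> None:
    raise ConnectionError("not reachable")


same_room = HybridSync(cloud=cloud_relay, local_peer=wifi_peer)
away = HybridSync(cloud=cloud_relay, local_peer=unreachable)
offline = HybridSync(cloud=unreachable, local_peer=unreachable)

results = [same_room.send(b"edit-1"), away.send(b"edit-2"), offline.send(b"edit-3")]
print(results)
```

Because a CRDT tolerates reordering and duplicates, the queued payload can simply be re-sent over whichever route comes back first.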
0:32:27 Generic sync servers
0:32:27 So one thing with these cloud services
0:32:30 is that, you know, in the traditional way of building web apps, a lot of your
0:32:34 application logic lives in the backend.
0:32:36 You know, you have a backend database running on a server and then you wrap it
0:32:40 with some server side code written using some server side web framework, and then
0:32:46 you put it all behind a load balancer.
0:32:47 And so you've got this, all this huge infrastructure on the backend.
0:32:50 And one of the promises I see of local-first
0:32:54 is that actually, because we've moved all of the interesting application
0:32:58 logic to the client app, to the end user device, the server side that remains
0:33:03 can be really simple and actually not contain any app specific code at all.
0:33:07 So my vision for these syncing services for local-first software
0:33:13 is that there's virtually no application code on the server.
0:33:16 The server is just this generic piece of software where you just
0:33:19 take it off the shelf and run it.
0:33:21 And, you know, you can just use a hosted cloud service.
0:33:24 Maybe AWS will run a local-first backend service and charge you a
0:33:29 few cents per gigabyte to use it.
0:33:31 And that would be amazing.
0:33:32 It can be, you know, this generic thing.
0:33:34 So you don't have every single app reinventing its own backend service.
0:33:38 You know, so much work in building a web app goes into reinventing this
0:33:43 backend infrastructure that every single company has to reinvent again.
0:33:47 And so if we can make the data sync protocol and the data storage on the
0:33:51 servers efficient, like loading and synchronization of large collections of
0:33:55 documents, all of that can be generic.
0:33:57 So if one person is then building a graphics app and another person
0:34:00 is building a spreadsheet and another person is building a
0:34:03 document editor, they can all use
0:34:05 the same syncing service as the backend. That, I think, is part of the economic
0:34:10 value proposition of local-first software: that actually, you know,
0:34:14 we can just save ourselves a huge amount of software engineering work
0:34:17 by making these backends generic.
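To make the "no application code on the server" idea concrete, here is a deliberately tiny sketch of a generic store that only ever sees document IDs and opaque bytes. It is an assumption-laden toy, not the Automerge sync server: `GenericSyncServer`, `push` and `pull` are invented names, and a real service would also need the authentication, encryption and efficient sync discussed earlier.

```python
from typing import Dict, List


class GenericSyncServer:
    """Hypothetical app-agnostic store: it never inspects the bytes it holds."""

    def __init__(self) -> None:
        self._changes: Dict[str, List[bytes]] = {}

    def push(self, doc_id: str, change: bytes) -> None:
        log = self._changes.setdefault(doc_id, [])
        if change not in log:  # ignore changes that were re-sent
            log.append(change)

    def pull(self, doc_id: str, already_have: int = 0) -> List[bytes]:
        # Return only the changes the client has not seen yet.
        return self._changes.get(doc_id, [])[already_have:]


server = GenericSyncServer()

# A spreadsheet app and a drawing app share the same server unchanged.
server.push("spreadsheet/budget", b'{"cell": "A1", "value": 42}')
server.push("drawing/logo", b"\x89STROKES-1")
server.push("drawing/logo", b"\x89STROKES-1")  # a retry, stored only once
server.push("drawing/logo", b"\x89STROKES-2")

print(server.pull("drawing/logo"))
print(server.pull("drawing/logo", already_have=1))
```

The server code contains nothing about cells or strokes, which is what would let one hosted service back many different apps.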
0:34:20 I couldn't agree more with that vision.
0:34:22 I totally want that.
0:34:24 Do you think Automerge will be the foundation for that?
0:34:27 Is there something more generic, something more abstract, like an open syncing
0:34:33 protocol, whatever that might be,
0:34:35 and Automerge would be one of multiple that implement compatibility with that?
0:34:41 If someone is interested in that vision right now, is there anything that
0:34:45 someone can take a look at and maybe deploy an early version of that already?
0:34:50 Yeah, I think Automerge is trying to be a solution for that, and I would
0:34:55 love for the Automerge protocols to be open standards one day.
0:35:00 I think, you know, we've thought about engaging with the IETF, for example,
0:35:04 for standardization, although I think right now is just too early because
0:35:08 it's all still very much work in progress and it hasn't settled enough
0:35:11 yet to be ready for standardization.
0:35:13 But in the long term, that's something we would definitely like.
0:35:15 And we would like there to be multiple interoperable implementations that
0:35:19 can all talk to each other and which are compatible with each other.
0:35:22 So yes, whether that ends up being exactly the Automerge wire
0:35:25 protocol or something a bit more abstract, I'm not entirely sure.
0:35:29 I mean, other people are working on similar things.
0:35:32 So one project that comes to mind is Braid, for example, where they are
0:35:36 engaging with the IETF and they're trying to build some standards or extensions
0:35:42 to HTTP to enable data synchronization.
0:35:45 And they're trying to do it in a way which is not specific to any particular CRDT
0:35:49 library or even using other approaches such as operational transformation.
0:35:53 So they're trying to be generic.
0:35:55 What I'm not sure yet is whether you can be generic and still
0:35:58 get good enough performance.
0:35:59 There's a trade-off there.
0:36:00 So in the Automerge sync protocol, we're able to make a lot of optimizations
0:36:04 because we know a lot about the types of data and how they're exchanged and
0:36:09 we can control the data compression and the data formats and so on.
0:36:13 Because we control the stack, we can do a lot of interesting optimizations
0:36:18 there, which are more difficult if you have a generic protocol.
0:36:21 So I think we'll have to wait and see how
0:36:26 that develops in the future.
0:36:27 And I certainly believe some kind of protocol will become a widely
0:36:31 used open standard for data sync in local-first apps.
0:36:36 It might be Automerge or it might be something else, but that's
0:36:38 generally the direction we're heading.
0:36:41 I'm really looking forward to that point.
0:36:43 I mean, local-first already today
0:36:47 is providing so much value, both to developers and to end users, by
0:36:52 simplifying the developer experience by making apps faster, giving
0:36:56 you data ownership, et cetera.
0:36:58 But I think once we've reached that point where there's a more
0:37:02 general purpose, generic syncing service that works possibly also across apps,
0:37:07 that people can put a little node of that, for example, on a Raspberry Pi
0:37:12 running next to their home router.
0:37:14 I'm really looking forward to that.
0:37:16 So I can't wait for that.
0:37:17 Looking forward to maybe having you back in a year from now to hear
0:37:21 some more progress updates on where things are at in that regard, but I'm
0:37:25 really looking forward to that.
0:37:26 Yeah, it's going to be very exciting to see what people build.
0:37:30 Bluesky
0:37:30 So besides your work on Automerge, you're also involved in a new project called
0:37:37 Bluesky, which came out of Twitter, now called X, and I think was sort of also like
0:37:44 a research project inside of Twitter.
0:37:46 And that has now taken its own path.
0:37:49 So, and you're involved there as an advisor.
0:37:52 I'm wondering whether there's any connection to your interest
0:37:56 in local-first as well, or whether those are separate paths.
0:38:00 There is a sort of, um, high-level connection,
0:38:03 I would say. You know, Bluesky is a social network, it's decentralized, and it aims
0:38:08 to provide a bunch of features which just don't exist on like Twitter and
0:38:14 Facebook and centralized social networks.
0:38:16 So in particular, it's built on an open protocol and there are multiple
0:38:20 different implementations, interoperable implementations of that protocol.
0:38:24 And moreover, multiple hosting providers that can run
0:38:29 different parts of the system.
0:38:30 And Bluesky is designed in such a way that it's very easy to move your account
0:38:35 from one provider to another, for example.
0:38:37 So for example, if you don't agree with one provider's moderation policies,
0:38:42 it's fine, you can go to a different one, who's more aligned with you, or
0:38:45 you could even run your own if you're technically enthusiastic enough.
0:38:49 So on a technical level, a lot of the implementation of
0:38:52 Bluesky looks quite different from something like Automerge.
0:38:55 There's no CRDTs in Bluesky, for example, but the sort of philosophy and the
0:38:59 values that it embeds in the software are actually quite similar to local-first.
0:39:04 This idea that users should control their own data, you know, you should
0:39:09 always be able to have a copy of your own data that you can just take with
0:39:12 you or move to a different provider.
0:39:14 That concept
0:39:15 exists very much across both local-first and Bluesky. In the case of Bluesky, of
0:39:20 course, you know, it's a social network.
0:39:21 So the entire social network consists of the data from many different people, the
0:39:25 posts, the likes, the follows and so on.
0:39:27 But the way it works is that all of the data from a particular user goes
0:39:31 into a repository, which you can think of a bit like as a git repository.
0:39:35 And so every post that you make, every user you follow, every like you make,
0:39:41 every user action of your own goes into your own repository, and that is your
0:39:45 own, and you can download a copy of it, and on the server, it's literally just
0:39:48 a SQLite database, there's a separate SQLite database for every single user,
0:39:52 and you can just get a copy of it, and even if your provider just suddenly
0:39:55 disappears, you can upload a copy of that
0:39:58 to a different provider, change your user ID to point to the new provider
0:40:02 and everything just continues working.
0:40:03 And so that idea of having easy interoperability and easy migration
0:40:08 paths from one provider to another, that's something that I think both
0:40:13 Bluesky and local-first share.
0:40:15 But otherwise the implementations end up being different.
0:40:18 Like it doesn't really make sense to have a local-first social network, because for
0:40:21 example, working offline makes sense if you're talking about a document editor.
0:40:25 It doesn't really make sense in a social network because the whole point
0:40:28 is communicating with other people so that the offline aspects, for example,
0:40:31 don't really feature in Bluesky, but sort of the data ownership aspects do.
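The "one SQLite database per user that you can copy to another provider" idea can be sketched with Python's built-in `sqlite3` module. This is only an illustration of the migration flow described above: the `records` table, the `directory` mapping and the provider names are invented for the example and do not reflect Bluesky's actual schema, identity system or APIs.

```python
import sqlite3


def open_repo(dump: str = "") -> sqlite3.Connection:
    """Create an in-memory per-user repository, optionally restored from a dump."""
    repo = sqlite3.connect(":memory:")
    if dump:
        repo.executescript(dump)
    else:
        repo.execute("CREATE TABLE records (kind TEXT, body TEXT)")
    return repo


# Hypothetical directory: which provider currently hosts each user.
directory = {"alice": "provider-a"}
providers = {"provider-a": {"alice": open_repo()}, "provider-b": {}}

repo = providers["provider-a"]["alice"]
repo.execute("INSERT INTO records VALUES ('post', 'hello world')")
repo.execute("INSERT INTO records VALUES ('follow', 'bob')")
repo.commit()

# The user keeps their own copy of the whole repository.
my_copy = "\n".join(repo.iterdump())

# Provider A disappears; the copy is uploaded to provider B and the
# directory entry is repointed at the new provider.
del providers["provider-a"]
providers["provider-b"]["alice"] = open_repo(my_copy)
directory["alice"] = "provider-b"

rows = providers["provider-b"]["alice"].execute(
    "SELECT kind, body FROM records ORDER BY rowid"
).fetchall()
print(directory["alice"], rows)
```

The point of the sketch is that the user's copy is complete on its own, so the old provider does not need to cooperate in the move.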
0:40:37 A social network with local-first approach
0:40:37 I agree that there is a big difference between a social network like Bluesky
0:40:42 and more like productivity or personal apps. I'm still curious,
0:40:47 given that they share a bunch of similar values and some technical
0:40:51 similarities to better understand what if you were to try to build Bluesky
0:40:57 with a more local-first approach.
0:40:59 There's a few technologies that leverage syncing behavior for
0:41:02 SQLite or maybe replacing SQLite with Automerge just in theory.
0:41:08 I'd be very curious to understand, is there a certain impedance mismatch
0:41:13 that you'd be running into by trying to build something like a social
0:41:17 network with a local-first approach?
0:41:20 I'd be curious to understand where you really run into troubles there.
0:41:24 Yeah, so the data for one individual user, you could easily put in
0:41:28 an Automerge document just as well as you put it in SQLite.
0:41:32 I think that that would make fairly little difference that you
0:41:34 could certainly use Automerge to synchronize the data for a given user.
0:41:38 What's different in a social network is that you have these global
0:41:42 views, which are aggregated over everybody, which is just not something
0:41:46 that exists in a document editor.
0:41:47 So like in a social network, you know, you want to know all of
0:41:49 the likes on a particular post.
0:41:51 And if each user writes their like to their own repository, that means you
0:41:56 have to index all of the repositories, look for all of the repositories that
0:41:59 contain a like of a particular piece of content, and then add them up.
0:42:02 And that gives you your number of likes.
0:42:04 Or if you want to get all of the replies on a particular thread, again, you
0:42:07 have to look at all of the posts that have been made by any user anywhere
0:42:10 in the network and find all the ones that reply to a particular piece of content.
0:42:14 That just requires this kind of global view of everything, if you want to do
0:42:18 it properly. You can kind of do it in a somewhat local version, which is kind of
0:42:24 what ActivityPub and Mastodon try to do.
0:42:26 So there's no global index with Mastodon.
0:42:29 There's, you know, nobody really maintains a copy of the entire network,
0:42:33 but if user A replies to user B, then user A's server sends a notification
0:42:39 to user B's server, and therefore user B's server finds out about this reply,
0:42:43 just adds it to its local database.
0:42:45 But that way you can end up with a problem of different servers seeing
0:42:49 different reply threads, because not every reply is notified to every server.
0:42:54 And so then you get weird inconsistencies where, depending on which server you're on,
0:42:59 you see a different set of replies to a particular post, which is a
0:43:02 bit strange, but that's just a part of the way that Mastodon works.
0:43:06 And that's something we try to avoid in Bluesky by instead saying,
0:43:10 okay, like the individual repo is just a single user's data.
0:43:14 And then in order to do something like a reply thread, actually, we have a big
0:43:18 indexing service that works a bit like a web search engine, which crawls the
0:43:22 content of all of the individual user repositories and aggregates it all
0:43:26 and assembles the reply threads.
0:43:28 And so that's something where there's no equivalent to that in local-first
0:43:31 software, I think, because that's just something that like document editing
0:43:35 style apps just don't need to do.
0:43:37 They just don't need to actually do aggregations across many apps.
0:43:40 I would say that maybe an exception to that is if you want to do search across
0:43:44 many documents, for example, in that case, you do need to build a search index.
0:43:48 But it's still a search index containing only the documents for a particular
0:43:52 user, or maybe all of the documents for a particular company, but it's not all
0:43:56 of the documents in the entire world.
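The global aggregation described above, crawling every user's repository to count likes and assemble reply threads, can be sketched as a small indexer. The record shapes (`type`, `subject`, `reply_to`) and the function `build_index` are invented for this example and are not Bluesky's real data model; the sketch only shows why the index needs to see all repositories rather than just one.

```python
from collections import Counter, defaultdict

# Hypothetical per-user repositories; each user only writes to their own.
repos = {
    "alice": [{"type": "post", "id": "alice/1", "text": "Local-first is fun"}],
    "bob": [
        {"type": "like", "subject": "alice/1"},
        {"type": "post", "id": "bob/1", "reply_to": "alice/1", "text": "Agreed"},
    ],
    "carol": [
        {"type": "like", "subject": "alice/1"},
        {"type": "post", "id": "carol/1", "reply_to": "alice/1", "text": "Same"},
    ],
}


def build_index(repos):
    """Crawl every repository and aggregate likes and replies per post."""
    like_counts = Counter()
    replies = defaultdict(list)
    for user in sorted(repos):  # fixed crawl order keeps the output stable
        for record in repos[user]:
            if record["type"] == "like":
                like_counts[record["subject"]] += 1
            elif record["type"] == "post" and "reply_to" in record:
                replies[record["reply_to"]].append(record["id"])
    return like_counts, replies


like_counts, replies = build_index(repos)
print(like_counts["alice/1"], replies["alice/1"])
```

Alice's own repository contains neither the likes nor the replies, so no single-user view could answer these queries.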
0:43:57 That makes a lot of sense.
0:43:59 And I think it's sort of intuitive where like local-first starts out really dense
0:44:03 about like your own documents, maybe the documents just on your other device
0:44:07 or on the device of a friend of yours.
0:44:10 So the network it's spanning is like still pretty dense, and this is what
0:44:15 makes all of those technologies work almost trivially, but the more you go
0:44:20 global with this, to sort of like social network level, this is where that, uh,
0:44:25 is really put to the test, and it's probably not the best starting point.
0:44:30 That being said, I think this might still also be an interesting project
0:44:33 for some, some folks who might want to rebuild an app in a local-first way, but
0:44:41 there might still be some more global nature to some parts of the data that
0:44:45 maybe could be complemented in some way.
0:44:48 Maybe there's some new architectural patterns that are emerging.
0:44:52 For Overtone, for example, I'm trying to build the app in a local-first way,
0:44:57 where really all of your music metadata and actually your
0:45:01 music data is locally available if possible, but music as such has
0:45:08 also a very global aspect to it, right? In the world of Spotify, you have
0:45:13 practically like infinite amounts of music that you can't just all
0:45:17 locally download. There's too much, and also other people have other kinds of music.
0:45:23 So I'm also trying to explore sort of hybrid solutions there, which are
0:45:28 really interesting design challenges.
0:45:30 I'm eager to share more of that on a separate occasion.
0:45:34 And you've actually already provided me some great feedback in some
0:45:37 personal conversations before.
0:45:39 So, yeah, this is a really interesting case study.
0:45:42 And I love exploring pushing local-first a little bit to its
0:45:46 limits through various app use cases.
0:45:50 So your involvement in Bluesky is a very interesting, at least
0:45:54 theoretical case study at this point.
0:45:56 So you've mentioned working offline for Bluesky
0:46:01 and that it might not be the primary use case.
0:46:05 I want to use this as a segue, as I see a little bit of confusion sometimes
0:46:09 on Twitter, where people synonymously talk about local-first and offline
0:46:15 first, and there is a difference and I want to share a little bit
0:46:20 more broadly what that difference is. What is an offline-first app?
0:46:24 What is a local-first app?
0:46:26 Where are they different?
0:46:27 So maybe you can share your perspective on that topic.
0:46:31 Yeah, I would say that local-first includes offline first, but it tries
0:46:34 to be a lot more than that as well.
0:46:37 So the term offline first existed long before local-first, and
0:46:40 obviously we were aware of it.
0:46:41 And in fact, we modeled the term local-first after offline first to some
0:46:46 degree, because we thought it was a good term, and it captured something that we
0:46:50 wanted, but it was not really sufficient.
0:46:53 Because yes, having users being able to work offline is
0:46:55 obviously a good idea.
0:46:57 It seems ridiculous if people can't work offline, but we wanted to also
0:47:02 capture this idea of personal data ownership so that the data is yours
0:47:07 and it can't be taken away from you.
0:47:09 So in particular, for example, if there's some software that stops working
0:47:14 if the company that made the software goes out of business, then I would
0:47:18 argue that's not local-first.
0:47:19 So it could be offline first.
0:47:21 So it could be that, you know, it's a nice Google Docs style document
0:47:25 editor, just take Google Docs as an example, like, okay, you know,
0:47:29 it works fine.
0:47:30 You can even, if you choose the right settings, make it work
0:47:33 offline, and you can, you can edit your docs in whatever way you want.
0:47:37 But if Google decides to just discontinue the service, hypothetically, or if
0:47:42 Google just decides to block your account because some automated system has flagged
0:47:46 you as violating the terms of service, whether you did or not doesn't matter.
0:47:50 You basically have no recourse.
0:47:52 And at that point, you're just locked out and you lose all of your data.
0:47:54 And so the fact that the app allowed you to work offline is kind of beside
0:48:00 the point then because you still don't have ownership of the data.
0:48:03 And so it's this idea that you should never be
0:48:07 locked out of your own data.
0:48:09 That's really something that we wanted to capture in the idea of local-first.
0:48:13 And so now if you can't be locked out of your data, that
0:48:16 kind of implies that you must have the data on your own device.
0:48:19 Which then also implies that you can probably edit it offline, because if
0:48:23 you've got it locally anyway, then why not just enable offline editing?
0:48:28 But the kind of the chain of reasoning goes in a different direction.
0:48:31 We would start with the data ownership and then offline editing
0:48:36 Local-first vs Offline-first
0:48:36 That makes a lot of sense, and I think that makes it really clear.
0:48:40 I see a lot of people referring to offline first, almost synonymously as
0:48:44 to some glorified version of aggressive caching, but the way you laid it
0:48:50 out here makes that a lot more clear.
0:48:52 And I suppose this is not just having access
0:48:54 to some form of the data, that you can like download a CSV from all of your user
0:49:00 data, but that the software is actually still fully functional or as functional
0:49:05 as somehow possible, even in the worst case where the folks who are building the
0:49:10 software are no longer able to work on it.
0:49:13 And to really provide a better alternative to: SaaS software X shuts
0:49:18 down and the entire app is just
0:49:22 gone with probably all of your data.
0:49:24 So I think that's a really clear alternative.
0:49:28 Yeah, exactly.
0:49:29 Like, you know, you do get this thing all the time when some SaaS startup
0:49:33 shuts down and they give you two weeks to download a zip file of JSON.
0:49:39 You know, what can you do with that zip file of JSON?
0:49:41 You can't re upload it into any other software.
0:49:43 So basically it's just a big fat middle finger to the users.
0:49:46 So really local-first is an attempt to overcome that in a way that, you
0:49:52 know, at the very least, you know, for example, if the software can operate
0:49:55 peer to peer, that could mean then at least you have a peer to peer fallback.
0:49:59 So even if all of the cloud services go away, it could still operate.
0:50:02 Or if it uses a backend service that's interoperable, so you can
0:50:06 switch it to a different provider.
0:50:07 That means then you could still use the software that, you know, maybe you
0:50:11 purchase a license to the software in sort of the traditional non subscription type
0:50:16 business model, and then you could use it in perpetuity, perhaps by pointing it at a
0:50:21 different syncing backend, or in the worst case, running your own syncing backend,
0:50:24 if you really must, but ideally just switching it over to a different provider.
0:50:28 And I'm hoping that's what the local-first term should try to encapsulate.
0:50:35 What does local-first need to really succeed
0:50:35 I fully agree.
0:50:36 I'm curious now that you've been thinking about local-first now for more than 10
0:50:42 years, and we've come really far in that period of time when it comes to CRDTs
0:50:49 and Automerge is production ready to use.
0:50:52 At the same time, given the ambitions that you've outlined, it feels
0:50:57 like we're just getting started.
0:50:59 I do think that already is a good time to really switch your default instead
0:51:05 of going cloud first, go local-first for app use cases where it's possible.
0:51:10 But I think it's still very much the minority of developers
0:51:14 who build this way.
0:51:16 And given that you've seen such a broad spectrum of different data
0:51:20 architectures that you've also outlined brilliantly in the book, Designing Data-Intensive
0:51:24 Applications, I'm curious what you see as things that still hold back
0:51:32 local-first from becoming more mainstream.
0:51:34 Is it just a matter of time that there's more progress around
0:51:38 Automerge around other technologies?
0:51:41 Are there some other things that you would like to see?
0:51:44 Yeah, I mean, there's, it's such a big conceptual shift, I think, which is a
0:51:48 challenge, you know, because there's a huge amount of say, educational
0:51:53 materials on how to build web apps, you know, entire university courses
0:51:57 are built around the idea of teaching people how to do this thing, coding boot
0:52:01 camps, documentation for huge amount of software projects, books, videos, you
0:52:07 name it, you know, everything. There's just so much infrastructure
0:52:10 on teaching people how to build
0:52:13 apps in the centralized cloud way, and local-first is just much newer.
0:52:17 And so it hasn't had the benefits of decades of investment.
0:52:21 Moreover, you know, the cloud providers have a strong
0:52:23 commercial incentive to produce good quality documentation
0:52:26 on how to use their services.
0:52:28 So it's not surprising that there's good documentation available for those things.
0:52:32 And I'm hoping that at some point there will be big companies built
0:52:36 on the local-first paradigm as well, which then are similarly able to
0:52:40 fund the development of this sort of documentation and learning
0:52:44 materials and so on, but it's just going to take a while.
0:52:47 So I would see that as probably one of the biggest challenges.
0:52:51 It's just a new way of thinking and people are not familiar with it.
0:52:55 I think once people get it, then a lot of people seem to get excited
0:53:00 about it and buy into it as well.
0:53:02 And, you know, sometimes
0:53:04 there's concerns that, you know, this is not for all apps.
0:53:07 And I'm the first to acknowledge, yes, local-first is not for every single app.
0:53:10 There's some apps which are best to build in a sort of centralized cloud way.
0:53:15 That's totally fine.
0:53:16 So I think part of it is also helping people understand for which
0:53:19 types of apps would you pick a local-first approach versus for which
0:53:22 do you pick a centralized approach.
0:53:24 And then of course, like just the general ecosystem needs a lot more work.
0:53:29 So, you know, the software libraries that we use, things like Automerge
0:53:33 are pretty robust already, but it's still fairly new software compared
0:53:37 to, you know, a web framework that has been around for 20 years or more.
0:53:42 Uh, so one thing that I find encouraging is just within the
0:53:46 last year or so, it seems that
0:53:48 a whole bunch of startups have started using the local-first term just on
0:53:54 their product marketing pages as just something they assume readers
0:53:59 of the page will be familiar with.
0:54:01 And that I find very encouraging.
0:54:02 It sort of shows that, you know, people are buying into the idea
0:54:06 enough that they are willing to, you know, build their product
0:54:09 foundation on it and their marketing around it, explaining to users why
0:54:14 it's valuable to have local-first.
0:54:16 And I think this is the way it will succeed.
0:54:18 You know, local-first will succeed only if many, many people in
0:54:21 many, many different companies are able to use it to their advantage in
0:54:25 order to provide a better experience to their users and their customers,
0:54:29 and build sustainable businesses on top of the idea and so on.
0:54:32 So it has to work for everybody.
0:54:35 And I think it will work for everybody because, you know, it's a win-win.
0:54:38 It's good for the app developers.
0:54:40 It's good for the users.
0:54:42 I think there are still questions to be had about exactly what the business models look
0:54:45 like, but I think that can probably also be figured out and then that
0:54:49 way it works well across the board.
0:54:51 A business-model for local-first applications
0:54:51 Yeah, I love that observation and I agree, I think some of the favorite
0:54:56 tools that I'm using, they are all like, maybe not adhering to all seven
0:55:02 local-first principles, but directionally, they are going in the direction of
0:55:06 local-first, and it's almost like a quality badge that some products associate
0:55:12 themselves with, saying, like, Hey, we're trying to build this app local-first.
0:55:16 And I, as a user know, Oh, this means it's probably one of the
0:55:20 fastest app experiences that I get.
0:55:22 I feel much better about the data that I'm putting into it.
0:55:26 So it's just, it gives me a much better baseline in terms
0:55:29 of my expectations as a user.
0:55:32 And I'm happy for the developers building it since they probably
0:55:35 also have a much more fun time.
0:55:37 So, but you've also mentioned the question marks around the
0:55:41 business model of local-first.
0:55:42 And I remember from like the good old days when you downloaded software and
0:55:48 you needed to buy it, you needed a serial number, but then there were also
0:55:53 a large group of people who would just crack software and use it illegally.
0:55:58 And I think at that point, it was really seen as a solution that SaaS would just
0:56:04 rent out your software on a monthly basis.
0:56:07 And that sort of solved the entire pirated software problem.
0:56:12 So I'm wondering, is local-first pointing in a direction to go
0:56:16 back towards downloaded software,
0:56:18 hopefully
0:56:19 paying for that serial number? Is there a best of both worlds, something that's not
0:56:24 quite renting your software, AKA cloud, with all its problems, but where maybe as
0:56:30 a business, you don't need to worry about pirated software and you get paid
0:56:36 if you choose to have a paid plan as well.
0:56:40 Do you have thoughts on what a business model in the local-first
0:56:43 world looks like?
0:56:44 Yeah, I personally wouldn't mind going back to the model of license
0:56:48 keys and perpetual licenses.
0:56:50 I personally quite liked it, but I do totally understand that for the companies
0:56:54 making the software, like having recurring revenue is really, really nice.
0:56:58 Even besides the piracy things you mentioned.
0:57:01 And to some extent, I think there's nothing stopping people just
0:57:05 doing subscription apps, if they're local-first as well. You know, just
0:57:08 the fact that we've moved some of the logic from a server backend into
0:57:12 the client doesn't stop you from being able to do a subscription.
0:57:16 We can just tell people it's SaaS and sell it in the same way.
0:57:19 And maybe that will work just fine.
0:57:21 I mean, it is true that because we have this idea of user data ownership
0:57:27 in local-first, you can't quite hold a gun to the user's head in the same
0:57:32 way and say, like, if you don't pay your subscription, we will delete
0:57:35 all your data, which is something that cloud software can very much do.
0:57:39 And so it's possible that that means that then, you know, more people will drop
0:57:43 off and stop paying the subscription.
0:57:45 You know, you could make
0:57:46 the software simply not work anymore if the user hasn't paid their subscription.
0:57:50 And of course, people could go in with a hex editor and change
0:57:54 it so that it removes that check.
0:57:57 But to be honest, not many people are going to do that probably.
0:57:59 If they did, they would be in the same category as the people
0:58:01 who pirated license keys in the old software model.
0:58:05 Like there's no way you can extract any money from them anyway.
0:58:09 Basically, it's probably not worth worrying about them too much and
0:58:12 instead focus on those users
0:58:14 you can monetize, who will pay their bills.
0:58:16 And you know, as long as a reasonable percentage of the
0:58:19 people pay, that's still fine.
0:58:21 Peter van Hardenberg likes to say that back in the day of pirated software,
0:58:25 people would worry that, you know, 95 percent of software is pirated
0:58:29 and only 5 percent of users pay.
0:58:31 But actually with freemium software, a lot of startups would be very happy with a 5
0:58:35 percent conversion rate of free to paid.
0:58:37 That's a really good conversion rate.
0:58:39 So actually if you view it through that angle, you know, just
0:58:44 not worrying too much about the people who are not going to pay anyway, and making
0:58:48 sure that you provide a good experience for those customers who do want to pay,
0:58:52 I think it should be fine to build solid businesses that way.
0:58:56 I agree.
0:58:57 And I'm looking forward to see which sort of models do emerge.
0:59:02 And if anything, I think the cloud has really rewarded
0:59:07 a very small number of huge, kind of monopoly-like companies.
0:59:13 And I'm kind of nostalgic about the days where you had a lot more smaller
0:59:18 software vendors who really put in a lot of care and, for a particular audience, which
0:59:23 might be a niche audience, built the best possible software for them.
0:59:26 And those are then probably also the people who would pay for software.
0:59:29 So I'm optimistic and I'm looking forward to seeing
0:59:32 which sort of business models will emerge and yeah, can't
0:59:36 wait to see where this is going.
0:59:38 Yeah.
0:59:38 That's one of the things that makes me excited about local-first as well: hopefully it
0:59:43 should just become a lot cheaper to build and run software because cloud
0:59:47 software is just ridiculously expensive, because you need a backend team
0:59:51 and a front end team, and the backend team needs to be on call 24/7 in case
0:59:55 the servers go down, and then, you know, suddenly you've got a huge team and it costs
0:59:59 a lot of money just to pay all those developers.
1:00:01 And then you have to have a mainstream app for a big audience in
1:00:05 order to have a big enough market.
1:00:07 And so that then cuts out all of these kinds of indie software developers
1:00:10 that you were talking about.
1:00:11 And so we're hoping that with local-first software, we can just commoditize
1:00:15 the whole backend so that app developers don't have to write their own backend.
1:00:18 If all you're doing is pulling some local-first framework off the shelf
1:00:23 and writing your custom app logic in your front end, it just becomes
1:00:26 so much cheaper to develop the app.
1:00:28 You don't have to worry about the whole 24/7 on call rotation.
1:00:31 And then that makes it economically feasible again, to have these niche apps
1:00:35 that are built by one or two people.
1:00:37 And they only have a small customer base, but that's fine.
1:00:39 All you need to do is provide a decent income for those two people.
1:00:42 And then you can have these niche apps that
1:00:44 really perfectly serve a particular audience and just
1:00:47 do that one thing really well.
1:00:49 That's something I would, I would really like to see.
1:00:51 And we're starting to see the beginnings of this. For example,
1:00:54 one of our big contributors to Automerge works on an app for
1:00:59 assistant directors of movie shoots
1:01:02 to plan their schedule of when they're going to shoot what and which actor
1:01:06 they need for which scene on which set, with which props, et cetera.
1:01:10 And, you know, it's a super niche piece of software, but I really, really
1:01:13 want him to succeed because I think it's just a great example of if we
1:01:17 can make it easy for him to build this kind of software for his particular
1:01:22 use case, then we can do the same thing for 10,000 other niches as well.
1:01:27 Yeah, I fully agree.
1:01:29 This is something I'm super excited about with
1:01:31 local-first as a whole. If you go through life, you realize how little in some
1:01:38 ways software has penetrated our real
1:01:41 life, where you interact with something and then you think
1:01:44 about, wait, we have computers, we have technologies to solve this.
1:01:48 Why hasn't it arrived in these parts of our life yet,
1:01:52 where it would make life better?
1:01:54 And I think the answer is typically incentive models of the cloud.
1:01:58 If you build something for the cloud, you need to
1:02:01 build it for a huge audience, et cetera.
1:02:03 Otherwise it's not worth it.
1:02:04 Particularly if you go venture capital based.
1:02:08 So I think this is where local-first really completely flips the math: it
1:02:12 allows people who are passionate about a particular use case, a particular
1:02:17 niche to go for that niche, and you don't need to worry about reaching
1:02:23 a giant audience if you don't want to.
1:02:25 And I think local-first can really change the economics there.
1:02:28 So I'm super excited about that.
1:02:31 That's almost like a second order effect.
1:02:33 And I'm sure there will be others that I can't really think about right now.
1:02:37 But I have a gut feeling that it will be a good one.
1:02:41 So yeah, Martin, this has been a real pleasure to have you on the show today
1:02:46 and sharing all of those anecdotes, the thoughts on where things are
1:02:51 coming from, where things are going.
1:02:54 So do you have anything else that you want to share with the
1:02:57 audience before wrapping up?
1:02:58 Not really.
1:02:59 I'm just very happy if people are interested in local-first.
1:03:04 So, I mean, thank you to you for running this podcast and helping
1:03:08 popularize the idea further.
1:03:10 And thank you to everyone who's listening and for being interested in it.
1:03:13 And I hope the community will continue growing further as we get more people,
1:03:18 you know, just building it in the direction of what they want it to be.
1:03:23 So I think, you know, we can just provide a set of starting values and
1:03:27 some technical tooling, but in the end, it'll all depend on what the
1:03:32 community decides to build around it.
1:03:34 And so I'm really excited to see what will come when, as people
1:03:39 Outro
1:03:39 Awesome.
1:03:40 Yeah.
1:03:40 Whenever we do our next show together, I'm sure there will be a lot more apps
1:03:44 being built in local-first that we can point to that do not exist today.
1:03:49 So I'm really looking forward to that.
1:03:51 Martin, thank you so much for coming on.
1:03:53 Thank you, Johannes.
1:03:54 It's been great.
1:03:55 Thank you for listening to the localfirst.fm podcast.
1:03:57 If you've enjoyed this episode and haven't done so already, please subscribe and
1:04:01 leave a review wherever you're listening.
1:04:03 Please also tell your friends about it
1:04:05 if you think they could be interested in local-first. If you have feedback,
1:04:08 questions or ideas for the podcast, please get in touch via hello at
1:04:12 localfirst.fm or use the feedback form on our website. Special thanks to Expo and
1:04:18 Crab Nebula for supporting this podcast.
1:04:20 See you next time.