1 00:00:00,000 --> 00:00:02,669 And this also feeds into features that you might want to give to your 2 00:00:02,669 --> 00:00:04,380 users, especially in productivity apps. 3 00:00:04,400 --> 00:00:07,540 You want to have that change history where you can see what was everyone doing. 4 00:00:07,870 --> 00:00:09,300 You also want to have undo's. 5 00:00:09,640 --> 00:00:13,610 basically what you can do for undo is when you create this action or this mutation 6 00:00:13,610 --> 00:00:17,750 describing, the high level intent, you can also tag along with it, a mutation saying, 7 00:00:17,760 --> 00:00:19,650 here's how to undo this operation later. 8 00:00:19,945 --> 00:00:22,815 And then you store that somewhere and then you just have a queue somewhere in 9 00:00:22,815 --> 00:00:24,225 your app that's like the action queue. 10 00:00:24,365 --> 00:00:27,475 You can go through that and undo things in a nice way. 11 00:00:27,776 --> 00:00:31,032 Hopefully, users will be happier with this than if you just, you 12 00:00:31,032 --> 00:00:34,682 know, revert states exactly, ignoring collaborators updates. 13 00:00:35,215 --> 00:00:37,315 Welcome to the local-first FM podcast. 14 00:00:37,505 --> 00:00:40,465 I'm your host, Johannes Schickling, and I'm a web developer, a 15 00:00:40,465 --> 00:00:43,435 startup founder, and love the craft of software engineering. 16 00:00:43,905 --> 00:00:47,735 For the past few years, I've been on a journey to build a modern, high quality 17 00:00:47,735 --> 00:00:49,505 music app using web technologies. 18 00:00:49,885 --> 00:00:53,665 And in doing so, I've been falling down the rabbit hole of local-first software. 19 00:00:54,237 --> 00:00:57,277 This podcast is your invitation to join me on that journey. 
20 00:00:57,680 --> 00:01:01,130 In this episode, I'm speaking to Matthew Weidner, a computer science 21 00:01:01,150 --> 00:01:05,550 PhD student at Carnegie Mellon University, focusing on distributed 22 00:01:05,550 --> 00:01:07,550 systems and local-first software. 23 00:01:07,907 --> 00:01:12,277 Matthew has recently published an extensive blog post about architectures 24 00:01:12,277 --> 00:01:16,847 for central server collaboration, which we explore in depth in this conversation, 25 00:01:17,117 --> 00:01:21,147 comparing different approaches, such as CRDTs and event sourcing. 26 00:01:21,418 --> 00:01:24,998 Before getting started, also a big thank you to Rocicorp and 27 00:01:24,998 --> 00:01:26,878 Expo for supporting this podcast. 28 00:01:27,278 --> 00:01:29,148 And now my interview with Matthew. 29 00:01:29,901 --> 00:01:30,641 Hey, Matthew. 30 00:01:30,861 --> 00:01:32,531 Thank you so much for coming to the show. 31 00:01:32,541 --> 00:01:33,181 How are you doing? 32 00:01:33,771 --> 00:01:34,211 I'm good. 33 00:01:34,251 --> 00:01:34,461 Yeah. 34 00:01:34,461 --> 00:01:35,211 Thanks for inviting me. 35 00:01:36,141 --> 00:01:36,511 Yeah. 36 00:01:36,511 --> 00:01:38,401 Super excited to have you here. 37 00:01:38,411 --> 00:01:44,171 I think our shared friend Geoffrey Litt introduced us, and he, Matt Wonlaw, 38 00:01:44,181 --> 00:01:48,761 and a few others, when you were writing this blog post, the architectures 39 00:01:48,771 --> 00:01:53,111 for central collaboration, all of my friends shared this blog post with me. 40 00:01:53,476 --> 00:01:57,766 And it has since, like, served as a really, really reliable and 41 00:01:57,776 --> 00:02:03,506 good foundation to just provide an orientation around, yeah, how do syncing 42 00:02:03,506 --> 00:02:05,876 systems, et cetera, how do they work? 
43 00:02:06,136 --> 00:02:10,396 So this has been the, the initial touch point for me, but would you 44 00:02:10,406 --> 00:02:11,916 mind briefly introducing yourself? 45 00:02:12,616 --> 00:02:12,976 sure. 46 00:02:13,066 --> 00:02:13,256 Yeah. 47 00:02:13,256 --> 00:02:14,006 So I'm Matthew. 48 00:02:14,086 --> 00:02:15,766 I'm a researcher and developer. 49 00:02:16,046 --> 00:02:19,463 I've been thinking about, local-first software, more generally the problem 50 00:02:19,463 --> 00:02:22,823 of how do we make collaborative software easier to program. 51 00:02:23,228 --> 00:02:28,028 So that's been, I guess, five years of PhD work and now working full time on a 52 00:02:28,258 --> 00:02:30,118 collaborative app, at a small company. 53 00:02:30,638 --> 00:02:33,478 And yeah, the, the question for me has always been, how can we make 54 00:02:33,558 --> 00:02:36,668 building a collaborative app in the style of Google Docs or Figma 55 00:02:36,968 --> 00:02:41,028 as easy as making a smartphone app or a local only desktop app? 56 00:02:41,501 --> 00:02:42,081 Amazing. 57 00:02:42,131 --> 00:02:46,041 I'm curious, what led you, like when you say five years ago, you started working 58 00:02:46,041 --> 00:02:48,641 on this, what led you to, to that point? 59 00:02:48,661 --> 00:02:50,811 What motivated you to, to look into this? 60 00:02:51,571 --> 00:02:53,341 yeah, so it actually started a little earlier. 61 00:02:53,341 --> 00:02:56,871 So six years ago, I was doing a master's degree at the University of Cambridge. 62 00:02:57,211 --> 00:02:59,931 I had to pick a master's thesis project, and some of the Ph. 63 00:02:59,951 --> 00:03:00,111 D. 64 00:03:00,111 --> 00:03:03,231 students talked about what their lab group was doing, the TrueData group, 65 00:03:03,241 --> 00:03:05,911 where they were working on an end to end encrypted version of Google Docs. 
66 00:03:06,371 --> 00:03:09,291 The idea is that some professions, like lawyers or journalists, they want the 67 00:03:09,291 --> 00:03:12,951 collaboration of Google Docs, but they don't trust their data to a third party. 68 00:03:13,266 --> 00:03:14,836 where the, you know, the employees can look at it or 69 00:03:14,836 --> 00:03:16,066 it's on someone else's servers. 70 00:03:16,366 --> 00:03:18,816 So they wanted this end to end encryption where you say only 71 00:03:18,816 --> 00:03:21,426 you and your collaborators can read the unencrypted data. 72 00:03:22,116 --> 00:03:23,966 So I thought this sounded like a really interesting project. 73 00:03:23,966 --> 00:03:25,596 I just joined them for my master's thesis. 74 00:03:25,666 --> 00:03:29,986 Turned out to be working with Alastair Beresford and Martin Kleppmann, um, 75 00:03:30,076 --> 00:03:31,506 mostly on the cryptography side. 76 00:03:31,776 --> 00:03:34,516 Then after that, I decided that actually the collaboration side 77 00:03:34,726 --> 00:03:37,296 sounded more interesting, and I wanted to work on that for my PhD. 78 00:03:37,723 --> 00:03:38,633 Very interesting. 79 00:03:38,833 --> 00:03:41,823 What did the technology landscape at that point look like? 80 00:03:41,823 --> 00:03:45,313 I mean, today there's like Automerge and quite a few other technologies 81 00:03:45,583 --> 00:03:47,323 that already try to attempt this. 82 00:03:47,663 --> 00:03:50,123 what did the technology landscape back then look like? 83 00:03:50,683 --> 00:03:52,813 So this was before the local-first essay. 84 00:03:52,813 --> 00:03:56,003 I think I actually saw a draft of the local-first essay that 85 00:03:56,003 --> 00:03:57,553 year, now as a master's student. 86 00:03:57,963 --> 00:04:01,853 Automerge I believe had started, YJS had started, but I hadn't 87 00:04:01,853 --> 00:04:03,123 heard of people using it yet. 88 00:04:03,653 --> 00:04:06,703 but yes, people were just getting started to use this idea of. 
89 00:04:07,083 --> 00:04:10,573 collaborative data structures for the web, not necessarily with central 90 00:04:10,593 --> 00:04:15,253 servers like these CRDT libraries were just getting started, and I don't 91 00:04:15,263 --> 00:04:18,663 know if the local-first world had really even started yet at that point. 92 00:04:19,149 --> 00:04:19,649 Right. 93 00:04:19,689 --> 00:04:20,059 Yeah. 94 00:04:20,069 --> 00:04:23,389 I think there are so many people who thought about similar problems 95 00:04:23,419 --> 00:04:25,799 over like decades before then. 96 00:04:25,809 --> 00:04:29,689 There was like CouchDB and PouchDB and like a lot of great minds 97 00:04:29,699 --> 00:04:32,899 already thought about this, but I feel like the real momentum 98 00:04:32,909 --> 00:04:34,989 started with the local-first essay. 99 00:04:35,389 --> 00:04:37,969 So I'm curious, take me through a little bit of like the, the 100 00:04:37,969 --> 00:04:40,159 five years working on that. 101 00:04:40,379 --> 00:04:42,129 What were some of the milestones? 102 00:04:42,129 --> 00:04:44,729 How did you go about starting this in the first place? 103 00:04:45,159 --> 00:04:45,579 Sure. 104 00:04:45,909 --> 00:04:49,689 So the, the main things I was coming at it from a more academic perspective, like 105 00:04:49,689 --> 00:04:51,589 I really have a theory math background. 106 00:04:51,849 --> 00:04:54,849 So I was looking at the, the theory of CRDTs, these conflict 107 00:04:54,849 --> 00:04:55,909 free replicated data types. 108 00:04:56,404 --> 00:04:59,614 Which, sort of, the idea is that it's a data structure that's 109 00:04:59,624 --> 00:05:00,974 copied on multiple devices. 110 00:05:01,164 --> 00:05:04,874 You put your data in it, like your app's data, and then one user can change their 111 00:05:04,874 --> 00:05:06,274 copy of the data whenever they want. 
112 00:05:06,524 --> 00:05:09,684 At some point later, you'll sync up in the background and come to 113 00:05:09,684 --> 00:05:12,214 a convergent copy where everyone's looking at the same document again. 114 00:05:12,611 --> 00:05:15,491 This is really designed for the sort of peer to peer model where you don't 115 00:05:15,491 --> 00:05:18,711 necessarily have central authority, it's just everyone updating their own data. 116 00:05:19,141 --> 00:05:21,861 And also this local-first spirit, where you always update the local copy of 117 00:05:21,861 --> 00:05:25,351 your data first, and then you talk to everyone else and say, here's my changes. 118 00:05:25,941 --> 00:05:29,471 So I spent the first year really just reading the papers in that field. 119 00:05:29,521 --> 00:05:34,521 So there's a classic paper by Marc Shapiro et al. from 2011, a lot of papers by 120 00:05:34,531 --> 00:05:38,831 Carlos Baquero and his collaborators, yeah, just trying to learn what are these 121 00:05:38,831 --> 00:05:40,411 data structures, what can we do with them. 122 00:05:41,009 --> 00:05:41,339 Got it. 123 00:05:41,399 --> 00:05:46,686 And so after that, you started your own implementations of CRDTs. 124 00:05:46,766 --> 00:05:50,156 And was there any sort of reference app that you oriented this around? 125 00:05:50,526 --> 00:05:51,186 Not really. 126 00:05:51,186 --> 00:05:53,216 So there's actually, there's a reference CRDT. 127 00:05:53,226 --> 00:05:56,316 So we started with this paper, which is very theoretical about this way 128 00:05:56,316 --> 00:05:58,906 that you could maybe combine two CRDTs. 129 00:05:59,266 --> 00:06:02,146 So the example we use, which is a bit silly, is if you have 130 00:06:02,386 --> 00:06:06,316 a number that you can add things to, like maybe a bank account balance 131 00:06:06,316 --> 00:06:09,106 you can add to, you can also multiply it if you're applying the interest. 
132 00:06:09,396 --> 00:06:12,756 How do you combine these two operations in a single CRDT that can 133 00:06:12,756 --> 00:06:14,736 be updated with either add or multiply? 134 00:06:15,416 --> 00:06:18,526 So then my advisor had this idea, let's implement this in a library. 135 00:06:18,894 --> 00:06:23,354 And that already set some sort of unique design principles, 136 00:06:23,354 --> 00:06:26,284 which is that we're going to assume you're making your own CRDTs. 137 00:06:26,294 --> 00:06:29,914 It's not just a collection of CRDTs we give to you, like map, list, et cetera, 138 00:06:30,214 --> 00:06:33,694 it's actually going to be whatever, and then some way to combine them together. 139 00:06:34,494 --> 00:06:38,014 So that was really the starting point, is that we want to make a place where you 140 00:06:38,014 --> 00:06:40,344 can make your own CRDTs and compose them. 141 00:06:40,664 --> 00:06:43,184 I don't think we really had a specific application in mind at the beginning. 142 00:06:44,033 --> 00:06:49,443 Was that technology ever released or open source or talked about in some way? 143 00:06:50,023 --> 00:06:52,413 So we did make an open source library about it. 144 00:06:52,563 --> 00:06:53,673 It's called Collabs. 145 00:06:53,683 --> 00:06:54,793 So it's written in TypeScript. 146 00:06:55,183 --> 00:06:56,713 We have a documentation site. 147 00:06:56,713 --> 00:06:58,893 I think it's collabs.readthedocs.io. 148 00:06:59,413 --> 00:07:01,263 It's definitely still an academic project. 149 00:07:01,313 --> 00:07:04,973 So it's really about, here are these data structures that you can play 150 00:07:04,973 --> 00:07:06,443 with and you can make your own things. 151 00:07:06,833 --> 00:07:11,263 We do have some basic demo apps, like your basic, you know, text editor. 152 00:07:11,573 --> 00:07:13,453 There's a to-do list sort of thing somewhere. 
153 00:07:13,453 --> 00:07:16,743 And then there is an arXiv paper about it that you can read, which goes into 154 00:07:16,743 --> 00:07:19,633 more detail about the system design and why we did things the way we did. 155 00:07:20,233 --> 00:07:20,613 Got it. 156 00:07:20,683 --> 00:07:25,763 And so it sounds like you've really gone super deep on this, mostly 157 00:07:25,773 --> 00:07:28,263 oriented from the CRDT side of things. 158 00:07:28,653 --> 00:07:33,213 But, as you read the papers, as you were working on this, you also got 159 00:07:33,213 --> 00:07:37,313 a better understanding of the larger space and the other approaches. 160 00:07:37,633 --> 00:07:40,393 And I think you got more curious about the other approaches and this 161 00:07:40,393 --> 00:07:45,428 is what you've laid out so clearly and brilliantly in this blog post that will 162 00:07:45,428 --> 00:07:47,098 be linked in the show notes. 163 00:07:47,118 --> 00:07:51,278 And I highly recommend anyone who's listening to read it in depth, if 164 00:07:51,278 --> 00:07:52,818 you're curious about those topics. 165 00:07:53,205 --> 00:07:58,075 So the blog post is called Architectures for Central Server Collaboration, and 166 00:07:58,075 --> 00:08:02,970 it provides a really nice way to think about this, like, it provides a 167 00:08:03,060 --> 00:08:07,210 hierarchical structure of what are the design decisions? 168 00:08:07,220 --> 00:08:08,170 What are the trade offs? 169 00:08:08,170 --> 00:08:11,040 What are the concerns about the different approaches? 170 00:08:11,530 --> 00:08:15,910 And so I'd love to just go through that step by step. 171 00:08:16,260 --> 00:08:17,580 Maybe you want to walk us through it. 172 00:08:17,940 --> 00:08:18,380 Sure. 173 00:08:18,710 --> 00:08:19,490 Let's see. 174 00:08:19,620 --> 00:08:19,780 Yeah. 175 00:08:19,780 --> 00:08:23,910 So the idea of this blog post is we're thinking about 
176 00:08:24,345 --> 00:08:25,785 Real time collaborative apps. 177 00:08:25,905 --> 00:08:29,285 So these are apps like Google Docs, Figma, Notion, that sort of thing. 178 00:08:29,695 --> 00:08:32,755 And sort of the distinguishing feature of these apps compared to 179 00:08:32,825 --> 00:08:36,875 more traditional web apps is that, you know, when you make a change, it 180 00:08:36,875 --> 00:08:38,525 updates your local copy immediately. 181 00:08:38,765 --> 00:08:41,955 It's not just click a button, go back to the server, get a 182 00:08:41,955 --> 00:08:43,155 new web page and show it to you. 183 00:08:43,165 --> 00:08:45,355 It's click a button and something updates on your own screen 184 00:08:45,655 --> 00:08:47,885 immediately and eventually it'll tell the server what you did. 185 00:08:48,175 --> 00:08:51,525 So this blog post was trying to think about, in general, with these real 186 00:08:51,545 --> 00:08:54,615 time collaborative apps, like, what are we doing in a semantic sense? 187 00:08:54,655 --> 00:08:56,655 Like, what does it mean to be real time collaborative? 188 00:08:57,015 --> 00:09:00,900 And then, what sort of, you know, the high level of how you can implement 189 00:09:00,900 --> 00:09:02,920 that in the most flexible way possible. 190 00:09:03,190 --> 00:09:08,020 And so you've derived a couple of like really nice ways to, to think about 191 00:09:08,020 --> 00:09:12,980 that, like in terms of dimensions and later on you, you can nicely 192 00:09:12,980 --> 00:09:15,140 summarize it in a nice overview table. 193 00:09:15,577 --> 00:09:19,627 would you mind motivating some of the dimensions that you come up with here? 194 00:09:20,257 --> 00:09:20,647 Sure. 195 00:09:21,117 --> 00:09:21,597 Let's see. 196 00:09:21,597 --> 00:09:24,397 So I guess just for context, my own background is, as I said, thinking 197 00:09:24,397 --> 00:09:26,007 about it from a CRDT perspective. 
198 00:09:26,397 --> 00:09:30,327 This is very much the perspective if you have some data structures, 199 00:09:30,367 --> 00:09:33,777 which are usually pretty low level, like maps and lists, and you have 200 00:09:33,777 --> 00:09:37,037 some prescribed operations that you can perform on them, and then it'll 201 00:09:37,047 --> 00:09:39,167 sync it for you under the hood. 202 00:09:39,537 --> 00:09:43,887 And then also in the CRDT model, it's usually not really assuming a central 203 00:09:43,887 --> 00:09:47,087 server, where the central server is doing basically the same thing as the clients. 204 00:09:47,377 --> 00:09:51,387 So what the dimensions are thinking about is, okay, what can we do that's 205 00:09:51,387 --> 00:09:52,887 different from just the CRDT model? 206 00:09:53,317 --> 00:09:56,007 And there is Yeah, there's really three dimensions. 207 00:09:56,077 --> 00:10:00,197 I guess maybe the most interesting one is the is how you describe 208 00:10:00,217 --> 00:10:02,807 operations on the collaborative state. 209 00:10:03,397 --> 00:10:06,977 So you have sort of the, the database or key value store model, which is, 210 00:10:06,977 --> 00:10:09,597 you have these low level state changes. 211 00:10:09,617 --> 00:10:14,027 Like when I check a box in to do list, that's creating a row in a database 212 00:10:14,087 --> 00:10:17,947 that says, you know, to do list checked, true, that sort of thing. 213 00:10:18,227 --> 00:10:21,947 And then there's also this opposite model, which is sort of the more event sourcing 214 00:10:21,947 --> 00:10:25,927 approach where you have these high level operations, sometimes called mutations. 215 00:10:26,457 --> 00:10:29,647 And this is where, when you change the data, you're actually telling the server 216 00:10:29,677 --> 00:10:31,277 exactly what the user's intent was. 
217 00:10:31,287 --> 00:10:34,767 You say, the user wants to check this box and make it true, and 218 00:10:34,767 --> 00:10:38,217 then you broadcast that high level intent back to the other users. 219 00:10:38,532 --> 00:10:40,832 And tell them what to do and how to update their state. 220 00:10:41,552 --> 00:10:44,892 And I think this is also like this distinction between the 221 00:10:45,082 --> 00:10:49,822 intent of a mutation and the, the change more directly. 222 00:10:50,112 --> 00:10:55,022 I think this can be, a little bit of a subtle difference for 223 00:10:55,042 --> 00:10:58,752 people who haven't built something with either approaches yet. 224 00:10:59,232 --> 00:11:05,287 But, uh, I think to draw an analogy from the web world, When you're working 225 00:11:05,307 --> 00:11:09,027 with something like Redux, this is where I'm not sure whether you ever 226 00:11:09,027 --> 00:11:12,477 built some, some front end apps with Redux, but this is where you have, for 227 00:11:12,477 --> 00:11:17,567 example, if I remember correctly, the, the concept of an, of an action, which 228 00:11:17,577 --> 00:11:22,747 is basically the idea of an event where you declaratively say like, okay, there 229 00:11:22,797 --> 00:11:29,057 is an action or there is an event for, someone wants to complete this to do. 230 00:11:29,687 --> 00:11:34,517 Then further down the road, there's like a reducer, which then in, for example, 231 00:11:34,517 --> 00:11:41,417 maintains a list of to-dos and maybe kicks it out or maybe, overrides a property 232 00:11:41,417 --> 00:11:46,177 in the to-dos array and says something is done as opposed to the other approach 233 00:11:46,497 --> 00:11:49,767 where you directly mutate the, the state. 234 00:11:50,237 --> 00:11:55,567 Which is, for example, in the web world, we're using something like MobX, etc. 
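The distinction between a low-level state change and a high-level mutation can be sketched in TypeScript. This is only an illustration under stated assumptions: the `Todo`, `StateChange`, and `Mutation` types and both functions are hypothetical names made up for this sketch, not the API of Redux, MobX, or any CRDT library.

```typescript
// Hypothetical to-do state; these names do not come from a real library.
type Todo = { id: string; text: string; done: boolean };
type State = { todos: Todo[] };

// Low-level state change: "patch this row", with no notion of what a to-do is.
type StateChange = { table: "todos"; id: string; patch: Partial<Todo> };

// High-level mutation (comparable to a Redux-style action): records the intent.
type Mutation = { type: "completeTodo"; id: string };

// Applies the patch blindly; the meaning lives entirely in the data.
function applyStateChange(state: State, change: StateChange): State {
  return {
    todos: state.todos.map((t) =>
      t.id === change.id ? { ...t, ...change.patch } : t
    ),
  };
}

// Reducer: decides what the intent means for the current state.
function reduce(state: State, mutation: Mutation): State {
  if (mutation.type === "completeTodo") {
    return {
      todos: state.todos.map((t) =>
        t.id === mutation.id ? { ...t, done: true } : t
      ),
    };
  }
  return state;
}

const initial: State = {
  todos: [{ id: "a", text: "Write post", done: false }],
};
const viaChange = applyStateChange(initial, {
  table: "todos",
  id: "a",
  patch: { done: true },
});
const viaMutation = reduce(initial, { type: "completeTodo", id: "a" });
```

Both paths end in the same state here. The difference is where the logic lives: with a mutation, whoever runs `reduce` can reinterpret the intent against a newer state, while a state change carries only the result.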
235 00:11:55,767 --> 00:11:59,937 And so now we're talking here about the equivalent for distributed state, 236 00:12:00,307 --> 00:12:05,637 and where CRDTs, I think, give us more the analogy, this might be a stretch, 237 00:12:05,637 --> 00:12:09,577 but give us more of like an equivalent of like something like MobX, where 238 00:12:09,577 --> 00:12:11,557 you mutate the state more directly. 239 00:12:11,962 --> 00:12:16,832 And the CRDT underpinnings nicely make that principled, constrain 240 00:12:16,832 --> 00:12:20,692 you in the right way, and then also distribute the state. 241 00:12:20,742 --> 00:12:23,102 Did I summarize this in the right way? 242 00:12:23,672 --> 00:12:24,662 Yes, good description. 243 00:12:25,032 --> 00:12:28,112 Maybe another way to think about it that's more illustrative 244 00:12:28,132 --> 00:12:31,392 than the to-do list is to think about the video game example. 245 00:12:31,882 --> 00:12:35,912 So for example in a video game if you press an arrow key on your keyboard, 246 00:12:35,912 --> 00:12:39,762 sort of the high level intent is I want my character to move forward. 247 00:12:40,092 --> 00:12:41,952 And then your game server will interpret that intent. 248 00:12:42,062 --> 00:12:44,342 It'll try to move your character forward, but if there's a wall 249 00:12:44,342 --> 00:12:45,562 in the way, it'll stop you. 250 00:12:45,782 --> 00:12:47,872 And if you step on a pressure plate, it'll do something. 251 00:12:48,405 --> 00:12:51,865 And then ultimately compute the actual state changes, which are 252 00:12:51,865 --> 00:12:54,845 the low level things of like, what coordinates are my player at now? 253 00:12:55,075 --> 00:12:57,065 What is the state of the world in terms of, you know, 254 00:12:57,085 --> 00:12:58,495 doors that are open or closed? 255 00:12:58,735 --> 00:13:01,515 And it'll send those low level state changes back to clients. 
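The video game example can be sketched as a server function that takes a high-level intent and returns low-level state changes. Everything here is an assumption made for illustration: the `World` shape, the single door opened by a pressure plate, and the `handleIntent` name are hypothetical and not taken from the blog post or from any game engine.

```typescript
type Position = { x: number; y: number };

// Hypothetical world state: walls and pressure plates are stored as "x,y" keys.
type World = {
  player: Position;
  walls: string[];
  plates: string[];
  doorOpen: boolean;
};

// High-level intent sent by the client.
type Intent = { type: "move"; dx: number; dy: number };

// Low-level state changes sent back to clients.
type StateChange =
  | { key: "player"; value: Position }
  | { key: "doorOpen"; value: boolean };

function cell(p: Position): string {
  return p.x + "," + p.y;
}

// The server interprets the intent against its own authoritative state.
function handleIntent(world: World, intent: Intent): StateChange[] {
  const target = {
    x: world.player.x + intent.dx,
    y: world.player.y + intent.dy,
  };
  // A wall in the way: the intent has no effect.
  if (world.walls.indexOf(cell(target)) !== -1) return [];
  const changes: StateChange[] = [{ key: "player", value: target }];
  // Stepping on a pressure plate opens the door as a side effect.
  if (world.plates.indexOf(cell(target)) !== -1 && !world.doorOpen) {
    changes.push({ key: "doorOpen", value: true });
  }
  return changes;
}

const world: World = {
  player: { x: 0, y: 0 },
  walls: ["0,2"],
  plates: ["0,1"],
  doorOpen: false,
};
const step = handleIntent(world, { type: "move", dx: 0, dy: 1 });
const blocked = handleIntent(
  { ...world, player: { x: 0, y: 1 }, doorOpen: true },
  { type: "move", dx: 0, dy: 1 }
);
```

In this sketch the first move yields two state changes (a new position and an opened door), while the second move runs into the wall and yields none.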
256 00:13:02,090 --> 00:13:04,020 So that's another example of this distinction between high 257 00:13:04,020 --> 00:13:05,680 level versus low level intent. 258 00:13:06,133 --> 00:13:11,623 Right, and I think this is now also a really important distinction because in 259 00:13:11,623 --> 00:13:18,403 the Redux or MobX example, it's, like, all of that is happening on the local device. 260 00:13:18,603 --> 00:13:22,143 There's no cheating in that regard, but when you're talking about games, 261 00:13:22,433 --> 00:13:23,823 there can actually be cheating. 262 00:13:23,823 --> 00:13:27,573 And how do you prevent that particularly in a multiplayer context? 263 00:13:27,913 --> 00:13:32,693 And this is where what you do on the client and what you do on the 264 00:13:32,693 --> 00:13:37,723 server maybe need to be different things, where the server acts more as an authority. 265 00:13:38,063 --> 00:13:42,410 And the client rather provides instructions as opposed to providing 266 00:13:42,410 --> 00:13:47,480 the authoritative source of truth for the actual state of a world. 267 00:13:47,970 --> 00:13:53,820 And so this is where the intent is not equal to the reality 268 00:13:53,820 --> 00:13:55,570 that is coming out of it. 269 00:13:55,910 --> 00:14:00,820 And I think this is nicely illustrated in your article through this game example, 270 00:14:01,060 --> 00:14:05,100 where you can basically send to the server, like, Hey, I want to move forward. 271 00:14:05,345 --> 00:14:09,075 The server knows where you were before, and the server tells you 272 00:14:09,085 --> 00:14:10,785 afterwards, like, now you're here. 273 00:14:11,215 --> 00:14:16,285 The client locally, if everything is in an okay state, has 274 00:14:16,305 --> 00:14:20,972 probably already arrived at the same conclusion, but at least this way the 275 00:14:21,012 --> 00:14:26,922 client can't override to say the player position is somewhere in an illegal state. 
276 00:14:27,748 --> 00:14:30,898 Maybe this sort of transitions into the next point or another 277 00:14:30,898 --> 00:14:31,978 dimension in the article. 278 00:14:32,263 --> 00:14:35,463 Which is, what does the server actually do when it receives 279 00:14:35,463 --> 00:14:38,333 an operation, in particular an operation that's out of date? 280 00:14:38,783 --> 00:14:42,053 So the classic example is if you have a like counter, like a post has 281 00:14:42,053 --> 00:14:45,033 some number of likes on it, if it has six likes and I send a command 282 00:14:45,033 --> 00:14:48,153 to the server that says, I like it, change the number of likes to seven. 283 00:14:48,583 --> 00:14:52,123 But what if someone else also liked the post in the meantime, and their 284 00:14:52,173 --> 00:14:53,953 like made it to the server first? 285 00:14:54,403 --> 00:14:56,783 So now the like count's already seven, I don't want to set it to seven 286 00:14:56,783 --> 00:14:58,233 again, I want to increase it to eight. 287 00:14:58,463 --> 00:15:01,525 And there's a, yeah, so basically there's a few philosophies in how the 288 00:15:01,525 --> 00:15:04,405 server should process this operation so that it still makes sense. 289 00:15:04,405 --> 00:15:08,275 I mean, technically it's legal to keep the original operation as 290 00:15:08,275 --> 00:15:10,985 just set the count to 7, but that's not really what the users expect. 291 00:15:11,218 --> 00:15:15,748 So the one philosophy, sort of the CRDT way, is to say, I'm going to phrase 292 00:15:15,748 --> 00:15:19,198 my operations in such a way that the server will know what I want it to 293 00:15:19,198 --> 00:15:21,173 do, and it'll do the correct thing. 294 00:15:21,523 --> 00:15:26,173 So for a like counter, the classic way is you say, increase the like count by one. 
295 00:15:26,513 --> 00:15:29,163 The server can get that, and even if the count has gone up since what you 296 00:15:29,163 --> 00:15:32,273 originally thought it was, it's still going to add one and do the proper thing. 297 00:15:32,343 --> 00:15:34,483 So you're going to end up with eight likes instead of seven. 298 00:15:34,767 --> 00:15:37,867 And sort of the other spirit is the operational transformation spirit. 299 00:15:38,177 --> 00:15:41,267 So this is an older technique for collaborative apps that's used by 300 00:15:41,317 --> 00:15:44,777 Google Docs and was developed in the 90s for the Jupiter collaboration system. 301 00:15:45,337 --> 00:15:48,697 And here the spirit is, the server is going to look at your operation, it's 302 00:15:48,697 --> 00:15:52,237 going to look at all the intervening operations that you didn't know about but 303 00:15:52,267 --> 00:15:56,667 the server has received already, and it's going to use those to sort of compute 304 00:15:56,717 --> 00:15:58,277 what your new intent is supposed to be. 305 00:15:58,627 --> 00:16:01,477 So in this example, you would tell the server, change the like count 306 00:16:01,487 --> 00:16:04,747 to seven, but the server would see that there was an intervening "change 307 00:16:04,747 --> 00:16:06,007 the like count" operation already. 308 00:16:06,087 --> 00:16:09,747 It's going to rewrite your operation as change the like count to eight, and 309 00:16:09,747 --> 00:16:13,067 actually apply that to its state and send that operation to the other users. 310 00:16:13,635 --> 00:16:14,035 Got it. 311 00:16:14,045 --> 00:16:18,518 So, and this is basically about the convergence aspect. And I 312 00:16:18,528 --> 00:16:22,878 suppose where this code is running, this can equally work on the 313 00:16:22,878 --> 00:16:24,708 client as well as on the server. 
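The two philosophies for the like counter can be sketched side by side. This is a deliberately simplified illustration, not real operational transformation: the `LikeServer` class, the `baseVersion` field, and the log of deltas are hypothetical choices made for this sketch, and the count starts at six as in the example.

```typescript
// CRDT-style operation: phrased so that it stays correct on any newer state.
type Increment = { type: "increment"; by: number };

// OT-style operation: a plain "set", tagged with the server version
// the client had seen when it issued the operation.
type SetLikes = { type: "setLikes"; value: number; baseVersion: number };

class LikeServer {
  likes = 6;
  // log[i] is the delta applied by the operation that produced version i + 1.
  private log: number[] = [];

  version(): number {
    return this.log.length;
  }

  // CRDT way: just apply the increment, however stale the client was.
  applyIncrement(op: Increment): void {
    this.likes += op.by;
    this.log.push(op.by);
  }

  // OT way: rewrite a stale "set" against the operations the client missed,
  // apply the rewritten operation, and return it for broadcasting.
  applySet(op: SetLikes): SetLikes {
    const missed = this.log
      .slice(op.baseVersion)
      .reduce((sum, delta) => sum + delta, 0);
    const rewritten: SetLikes = {
      type: "setLikes",
      value: op.value + missed,
      baseVersion: this.version(),
    };
    this.log.push(rewritten.value - this.likes);
    this.likes = rewritten.value;
    return rewritten;
  }
}

// Two users both saw six likes (version 0) and both ask for seven.
const otServer = new LikeServer();
otServer.applySet({ type: "setLikes", value: 7, baseVersion: 0 });
const second = otServer.applySet({ type: "setLikes", value: 7, baseVersion: 0 });

// The same scenario with increments.
const crdtServer = new LikeServer();
crdtServer.applyIncrement({ type: "increment", by: 1 });
crdtServer.applyIncrement({ type: "increment", by: 1 });
```

In both cases the server ends up at eight likes. With increments the client phrases the operation so that no rewriting is needed; with the set operation the server does the rewriting, and `second` is the rewritten "set to eight" that it would broadcast.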
314 00:16:25,078 --> 00:16:30,328 So this is sort of orthogonal to the game example case that we talked 315 00:16:30,338 --> 00:16:32,708 about, which is more about the authority. 316 00:16:33,388 --> 00:16:33,918 Yeah. 317 00:16:34,108 --> 00:16:34,308 Yeah. 318 00:16:34,308 --> 00:16:36,348 So this isn't about how does the server 319 00:16:36,698 --> 00:16:40,038 interpret operations from, like, a correctness or permissions perspective. 320 00:16:40,258 --> 00:16:43,288 It's just how does the server handle operations that are sort of 321 00:16:43,298 --> 00:16:46,328 stale, in the sense that the client originally applied them to one state, 322 00:16:46,678 --> 00:16:49,358 but by the time they arrived at the server, the state had updated because 323 00:16:49,358 --> 00:16:50,528 other people were doing things. 324 00:16:50,738 --> 00:16:52,238 Now the server has to figure out what to do. 325 00:16:52,578 --> 00:16:55,138 Yes, this is the server side rebasing. 326 00:16:55,453 --> 00:16:58,883 This is where the server has to rebase your operation, or 327 00:16:58,883 --> 00:17:01,823 the incoming operations, on top of whatever its new state is. 328 00:17:02,203 --> 00:17:05,623 And sort of the analogy is to git rebasing, where you might try to apply 329 00:17:05,623 --> 00:17:09,833 a commit on top of some new commits that weren't there when you first tried it. 330 00:17:10,310 --> 00:17:10,730 Got it. 331 00:17:11,030 --> 00:17:15,060 Okay, so that is one dimension that you've nicely dissected 332 00:17:15,130 --> 00:17:16,990 here in this blog post. 333 00:17:17,320 --> 00:17:18,300 What is the next one? 334 00:17:19,324 --> 00:17:23,184 So the next one is the optimistic local updates on the client. 
335 00:17:23,594 --> 00:17:26,364 So now if we assume there's a central server, everyone's taking 336 00:17:26,364 --> 00:17:29,264 these updates, they're sending these operations to the server, the server 337 00:17:29,264 --> 00:17:30,614 knows what the state's supposed to be. 338 00:17:31,064 --> 00:17:34,554 And what you could say is just the traditional web app model. 339 00:17:34,554 --> 00:17:38,194 If I submit an operation to the server, it processes it, it sends me 340 00:17:38,194 --> 00:17:39,764 back the result, and now I get to see it. 341 00:17:40,284 --> 00:17:43,844 So if you think like, um, you know, traditional HTML form, you submit your 342 00:17:43,844 --> 00:17:46,674 operation to the server, it gives you a new page back saying what it is. 343 00:17:46,931 --> 00:17:48,601 But with modern apps, we want to do better than that. 344 00:17:48,621 --> 00:17:52,111 We want to say that when I perform an operation on the client, it's going 345 00:17:52,111 --> 00:17:54,251 to update my own state immediately. 346 00:17:54,691 --> 00:17:57,551 And that's an optimistic update because I'm sort of optimistically 347 00:17:57,551 --> 00:18:00,141 assuming that the server is actually going to receive my update. 348 00:18:00,141 --> 00:18:01,981 It's going to process it in the way I expected. 349 00:18:02,281 --> 00:18:03,561 No one else is going to interfere. 350 00:18:04,111 --> 00:18:07,121 This is just a nice property in terms of making the app feel more responsive. 351 00:18:07,151 --> 00:18:08,831 You want to see your key presses immediately. 352 00:18:08,861 --> 00:18:10,761 You want to see that button get checked immediately. 353 00:18:10,924 --> 00:18:13,424 So the question is then, how do we actually do that? 354 00:18:13,654 --> 00:18:17,014 Or, I guess the first question is even, what is the correct answer? 355 00:18:17,084 --> 00:18:19,754 What does it mean to optimistically update my state? 
356 00:18:20,234 --> 00:18:23,774 And I guess, yeah, sort of the conclusion I came to that, you know, 357 00:18:23,774 --> 00:18:27,504 people have come to in computer games as well, is that you want to take 358 00:18:27,724 --> 00:18:32,584 the latest state you've received from the server, plus your own optimistic 359 00:18:32,584 --> 00:18:34,044 local operations on top of that. 360 00:18:34,364 --> 00:18:36,154 And that's always what the correct state is. 361 00:18:36,474 --> 00:18:38,564 And even as you receive or perform new operations, you're 362 00:18:38,564 --> 00:18:40,064 just maintaining that state. 363 00:18:40,541 --> 00:18:45,271 Like from your first dimension, which is about server side rebasing, now it's 364 00:18:45,271 --> 00:18:50,911 a lot of the same ideas, but applied on the client where you need to make 365 00:18:50,911 --> 00:18:56,151 the same trade off decisions again, you might come up with different conclusions 366 00:18:56,361 --> 00:18:59,771 based on the server and based on the client, depending on your use cases. 367 00:18:59,821 --> 00:19:02,717 So that, that is the second dimension. 368 00:19:03,087 --> 00:19:07,677 And, then you're, you talk about the, the form of operations. 369 00:19:07,717 --> 00:19:14,877 So how, a state is changing based on mutations, based on state changes. 370 00:19:15,137 --> 00:19:17,407 Can you go a little bit more into, into detail here? 371 00:19:18,077 --> 00:19:18,487 Sure. 372 00:19:18,847 --> 00:19:19,117 Yes. 
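The rule described above (the latest state received from the server, plus your own optimistic local operations on top) can be sketched roughly as follows. This is only an illustration under assumptions: the `Todos` and `Mutation` types, the `OptimisticClient` class, and the idea that the server acknowledges mutations by id are all hypothetical and not taken from the blog post or from any particular library.

```typescript
// Hypothetical to-do state and mutation shapes, for illustration only.
type Todos = Record<string, boolean>;
type Mutation = { id: number; todoId: string; done: boolean };

// Apply one mutation to a state without modifying the input state.
function applyMutation(state: Todos, m: Mutation): Todos {
  return { ...state, [m.todoId]: m.done };
}

class OptimisticClient {
  private serverState: Todos = {};
  private pending: Mutation[] = [];

  // A local action: queue it so that view() reflects it right away.
  mutate(m: Mutation): void {
    this.pending.push(m);
  }

  // A server update: take the new authoritative state and drop the
  // pending mutations that the server says it has already processed.
  receive(serverState: Todos, ackedIds: number[]): void {
    this.serverState = serverState;
    this.pending = this.pending.filter((m) => !ackedIds.includes(m.id));
  }

  // What the user sees: latest server state plus pending local mutations.
  view(): Todos {
    return this.pending.reduce(applyMutation, this.serverState);
  }
}
```

The displayed state is always recomputed from the server state and the pending queue, so a collaborator's change that arrives in the meantime is picked up without losing the local edit.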
373 00:19:19,147 --> 00:19:21,997 This is what we were talking about at the beginning, where when you, you check 374 00:19:22,007 --> 00:19:25,722 a box in a to do list, you want to say, Am I updating a row in a database that 375 00:19:25,722 --> 00:19:28,942 doesn't know anything about to do lists, or am I sending a high level mutation 376 00:19:28,942 --> 00:19:32,292 that says, like, this user wants to check the to do list and, you know, 377 00:19:32,542 --> 00:19:36,002 do that action or maybe do something else if that's not valid anymore. 378 00:19:36,202 --> 00:19:38,852 So here we get to choose which form of operations we want. 379 00:19:38,902 --> 00:19:42,757 We want to send these high or low level from the client to the server. 380 00:19:42,997 --> 00:19:45,537 Then once the server updates its state, does it want to send high or 381 00:19:45,537 --> 00:19:47,487 low level changes back to the clients? 382 00:19:48,000 --> 00:19:50,320 yeah, so the video game example is an interesting one where you actually 383 00:19:50,320 --> 00:19:51,890 make different choices usually. 384 00:19:52,140 --> 00:19:55,050 So usually you'll send the high level operations from clients to the server. 385 00:19:55,130 --> 00:19:57,810 You say, I want to move forward, I want to shoot my crossbow. 386 00:19:58,260 --> 00:20:00,990 And then on the way back from the server to the client, usually it 387 00:20:01,000 --> 00:20:02,290 won't send those actual actions. 388 00:20:02,360 --> 00:20:06,300 It'll just send the results, which are changes to some basic key value store. 389 00:20:06,644 --> 00:20:10,264 But you can also make different choices, like you can say, you 390 00:20:10,264 --> 00:20:14,004 know, Git is an example where it's sort of high level mutations. 391 00:20:14,264 --> 00:20:17,474 You're saying, like, I want to, you know, change this text paragraph in 392 00:20:17,474 --> 00:20:21,764 a specific file, and Git will send those exact operations to every client. 
393 00:20:21,824 --> 00:20:24,134 It's not going to interpret them at all on the server and change 394 00:20:24,134 --> 00:20:25,344 them into a low level change. 395 00:20:26,034 --> 00:20:30,054 Whereas if you use something like the Firebase database, that's all low level. 396 00:20:30,234 --> 00:20:32,614 You send low level changes to Google servers. 397 00:20:32,829 --> 00:20:35,879 Where you say, I want to, you know, set this key to this value or I want 398 00:20:35,899 --> 00:20:37,969 to delete this object in the database. 399 00:20:38,169 --> 00:20:41,569 And it's going to send that change back to clients without having any idea what 400 00:20:41,569 --> 00:20:42,979 the keys and values actually represent. 401 00:20:43,279 --> 00:20:44,019 That makes sense. 402 00:20:44,269 --> 00:20:51,209 And so I think this is also nicely drawing a boundary between the more declarative 403 00:20:51,229 --> 00:20:56,389 approaches that you have in mutations that you can reason more clearly about, 404 00:20:56,399 --> 00:20:58,239 like in the context of your domain. 405 00:20:58,684 --> 00:21:02,154 But it also only makes sense in the context of your domain. 406 00:21:02,414 --> 00:21:05,654 Whereas with state changes, this is the appeal of CRDTs. 407 00:21:06,004 --> 00:21:12,104 This is you just mutate a document and, the, the underlying mechanics, make 408 00:21:12,104 --> 00:21:17,859 sure that the state changes are behaving in, in a useful way since I, I suppose 409 00:21:17,869 --> 00:21:22,759 like listening to the state changes yourself in your app, that's no fun. 410 00:21:22,769 --> 00:21:26,699 So you really want, a system like CRDTs to make sense of that 411 00:21:26,789 --> 00:21:30,759 . 
So now with those three dimensions and I go through them again, the 412 00:21:30,759 --> 00:21:34,939 server side rebasing, the optimistic updates and the form of operations 413 00:21:34,939 --> 00:21:39,595 like declarative versus state based, now you've combined all of that in a 414 00:21:39,595 --> 00:21:45,462 really nice, classification table where we get a whole bunch of like matrix 415 00:21:45,642 --> 00:21:48,272 cells here with different technologies. 416 00:21:48,592 --> 00:21:52,645 So, Again, highly recommend, actually reading this and looking at the 417 00:21:52,645 --> 00:21:56,605 beautiful table for yourself, but in the different cells, you've 418 00:21:56,615 --> 00:22:01,519 also filled in a couple of existing technologies and see where they slot in. 419 00:22:01,829 --> 00:22:05,909 So would you mind going through the different technologies and maybe 420 00:22:05,909 --> 00:22:07,239 sharing what's interesting about it? 421 00:22:07,582 --> 00:22:07,982 Sure. 422 00:22:08,342 --> 00:22:11,522 So I guess first I can talk about the one cell just near the bottom, right 423 00:22:11,522 --> 00:22:12,572 in the table, if you're looking at it. 424 00:22:12,947 --> 00:22:17,437 Which is the CRDTxCRDT cell. 425 00:22:18,317 --> 00:22:21,077 So this is basically the place where I spent my most time reading 426 00:22:21,077 --> 00:22:24,077 about CRDTs, working on this academic open source library. 
427 00:22:24,587 --> 00:22:29,147 And that's where the operations that users send are really these low level state 428 00:22:29,147 --> 00:22:33,727 changes to some sort of magical replicated database, where you update the database, 429 00:22:33,967 --> 00:22:37,417 like normally on your local device, and it promises to do this synchronization in 430 00:22:37,417 --> 00:22:40,297 the background and make sure that everyone converges to the same state immediately 431 00:22:40,617 --> 00:22:43,847 without really caring about what specifically your data or operations are. 432 00:22:44,147 --> 00:22:45,757 So, some prominent examples. 433 00:22:45,757 --> 00:22:49,067 So Firebase Realtime Database, I think of as an example, also 434 00:22:49,067 --> 00:22:51,867 the CRDT ish libraries, like Yjs. 435 00:22:52,205 --> 00:22:56,185 Also, yeah, Triplit, InstantDB, those are all sort of in this quadrant 436 00:22:56,405 --> 00:23:00,462 or in this cell, saying that we're going to replicate low level changes 437 00:23:00,462 --> 00:23:02,499 for you, just like as they are. 438 00:23:02,749 --> 00:23:05,879 Another cell on this table, which is sort of near the bottom left, we 439 00:23:05,879 --> 00:23:07,379 mentioned in the computer game example. 440 00:23:07,769 --> 00:23:10,809 In a computer game, you're going to send these high level actions to the 441 00:23:10,809 --> 00:23:14,929 server, which is going to figure out what to do with them, and then communicate 442 00:23:14,929 --> 00:23:16,329 the state changes back to clients. 443 00:23:16,790 --> 00:23:20,300 That's another interesting cell, both because it's sort of old, like, you know, 444 00:23:20,300 --> 00:23:23,480 this is, starts with the Half Life game engine in the 1990s, so people have been 445 00:23:23,480 --> 00:23:27,610 using this technique forever, just not in web apps, it's in computer games.
446 00:23:28,250 --> 00:23:32,570 But more recently, Replicache implements this model as a data sync 447 00:23:32,570 --> 00:23:36,250 layer for web applications, which I know a number of companies are using. 448 00:23:36,660 --> 00:23:39,520 and I found that really inspirational reading about how Replicache works. 449 00:23:39,850 --> 00:23:41,170 I'm glad to have learned about it. 450 00:23:41,500 --> 00:23:41,840 Right. 451 00:23:41,840 --> 00:23:44,590 And I love like how you compare those technologies. 452 00:23:44,610 --> 00:23:45,570 Both technologies. 453 00:23:45,600 --> 00:23:49,230 I love, love like the Half Life game engine spent way too much 454 00:23:49,230 --> 00:23:54,427 time, playing various Half Life game engine games, where it's very, very 455 00:23:54,437 --> 00:23:58,827 intuitive that if you play, press the W key, which moves you forward. 456 00:23:59,087 --> 00:24:01,057 That's like communicating the intent. 457 00:24:01,277 --> 00:24:04,457 To the server, you don't tell the server like, Oh, I'm at these coordinates. 458 00:24:04,457 --> 00:24:08,527 You just give it like a history of like which keys you pressed 459 00:24:08,557 --> 00:24:10,117 and therefore like how you moved. 460 00:24:10,604 --> 00:24:13,564 and it does some validation of like whether all of that is okay. 461 00:24:13,804 --> 00:24:16,044 And it sends you back the location. 462 00:24:16,284 --> 00:24:20,610 And it's the same about Replicache where you send it a few mutations And on the 463 00:24:20,610 --> 00:24:25,600 Replicache server, it interprets all of that and sends back to you the state 464 00:24:25,980 --> 00:24:29,680 using the server side knowledge, which might be different than the client side 465 00:24:29,710 --> 00:24:31,490 implementation, so it's the authority. 
466 00:24:31,820 --> 00:24:36,040 So that is very clear and very nicely laid out here, where you send the intent, 467 00:24:36,050 --> 00:24:40,344 you send the declarative mutations, and the server sends you back some state 468 00:24:40,344 --> 00:24:45,984 changes, as opposed to what you mentioned before, with a CRDT times CRDT, 469 00:24:46,314 --> 00:24:51,874 where both on the client and on the server, you run the same CRDT convergence. 470 00:24:52,204 --> 00:24:55,514 And, uh, so those two, those two cells are very clear. 471 00:24:55,930 --> 00:24:56,670 Yes, exactly. 472 00:24:56,870 --> 00:25:00,570 And then, yeah, so I guess the remaining cells of the table, they 473 00:25:00,570 --> 00:25:04,480 mostly, they either use state changes in both directions or they use high 474 00:25:04,480 --> 00:25:05,980 level mutations in both directions. 475 00:25:06,450 --> 00:25:07,030 So, let's see. 476 00:25:07,030 --> 00:25:07,940 Two interesting ones. 477 00:25:08,200 --> 00:25:09,810 Automerge and ShareDB. 478 00:25:10,130 --> 00:25:15,010 They're both doing a similar idea to the CRDT libraries, like Yjs, where they're 479 00:25:15,020 --> 00:25:18,180 sending these low level state changes around and making sure everyone converges 480 00:25:18,180 --> 00:25:21,800 to the same state, but they have a different way of doing this internally. 481 00:25:22,310 --> 00:25:26,620 So with Automerge, what you're actually doing is you're performing these state 482 00:25:26,620 --> 00:25:30,850 based changes. Automerge as a library is basically a JSON CRDT, but the way 483 00:25:30,850 --> 00:25:35,250 it works is more of an, like an event sourcing model, where you have this 484 00:25:35,270 --> 00:25:38,260 total order of CRDT style operations. 485 00:25:38,610 --> 00:25:40,560 All clients are going to make sure that they eventually 486 00:25:40,560 --> 00:25:42,190 converge to the same total order.
487 00:25:42,200 --> 00:25:45,870 So everyone will agree what operation 1, operation 2, operation 3, etc. 488 00:25:46,210 --> 00:25:50,410 The state is the result of applying all of these operations in that fixed order. 489 00:25:50,740 --> 00:25:54,270 And if, you know, people do operations concurrently on their different devices 490 00:25:54,270 --> 00:25:57,670 because the network's not working, then we'll just sort those operations 491 00:25:57,670 --> 00:26:00,600 into some order later, make sure everyone agrees on the same order, 492 00:26:00,640 --> 00:26:01,590 and that's giving you your state. 493 00:26:01,905 --> 00:26:02,565 Interesting. 494 00:26:02,565 --> 00:26:07,355 So given that Yjs and Automerge, which I think are in the web ecosystem, the, the 495 00:26:07,355 --> 00:26:12,935 two most popular CRDT implementations, they actually do differ in this dimension 496 00:26:12,935 --> 00:26:15,635 of like how state changes are implemented. 497 00:26:15,925 --> 00:26:17,622 again, Firebase, as well as Yjs. 498 00:26:17,977 --> 00:26:22,837 following more strictly the CRDT approach and Automerge using server reconciliation. 499 00:26:23,114 --> 00:26:27,794 is there an example that comes to mind where this, in a example app 500 00:26:27,824 --> 00:26:32,644 use case would differ and where you would use Automerge or Yjs, 501 00:26:32,694 --> 00:26:34,354 intentionally because of this? 502 00:26:34,724 --> 00:26:37,204 I think in terms of the, the external. 503 00:26:37,475 --> 00:26:41,215 the API, or what you see as a user of these libraries, it doesn't really differ. 504 00:26:41,505 --> 00:26:45,165 It's more just in terms of the implementation, I guess, in, in this 505 00:26:45,175 --> 00:26:48,705 totally ordered model like Automerge uses, you don't have to worry as much 506 00:26:48,705 --> 00:26:50,605 about getting the math exactly right. 
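The totally ordered model described above (everyone agrees on operation 1, operation 2, operation 3, and the state is the result of replaying them in that order) can be sketched as follows. This is a simplified illustration, not how Automerge is actually implemented: the `Op` shape and the ordering by a counter with the client id as tie-breaker are assumptions made for the example.

```typescript
// Hypothetical operation shape: a key/value write tagged with an
// ordering counter and the id of the client that produced it.
type Op = { counter: number; clientId: string; key: string; value: string };

// A deterministic total order: by counter, then by client id to break ties
// between operations that were produced concurrently.
function compareOps(a: Op, b: Op): number {
  if (a.counter !== b.counter) return a.counter - b.counter;
  return a.clientId < b.clientId ? -1 : a.clientId > b.clientId ? 1 : 0;
}

// The state is defined as the result of replaying every known operation
// in the agreed order, regardless of the order in which they arrived.
function materialize(ops: Op[]): Record<string, string> {
  const state: Record<string, string> = {};
  for (const op of [...ops].sort(compareOps)) {
    state[op.key] = op.value;
  }
  return state;
}
```

Two devices that received the same set of operations in different arrival orders get the same result from `materialize`, which is the convergence property being discussed. A real implementation would avoid replaying the whole log on every change.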
507 00:26:50,665 --> 00:26:53,985 Like, am I sure that these two operations actually do the same thing 508 00:26:53,985 --> 00:26:57,365 if I apply them in different orders, which is this mathematical requirement 509 00:26:57,405 --> 00:26:59,415 that you have to satisfy for CRDTs. 510 00:26:59,995 --> 00:27:04,525 So that makes it a bit easier in terms of, like, the correctness and 511 00:27:04,525 --> 00:27:06,295 sureness of the implementation. 512 00:27:06,935 --> 00:27:10,805 Whereas with the Yjs or CRDT style, if I'm just going to apply my operations 513 00:27:10,805 --> 00:27:14,485 directly, in principle that can be a bit faster because you don't have to 514 00:27:14,485 --> 00:27:18,915 worry about rewinding your total order of operations and then applying a new 515 00:27:18,915 --> 00:27:20,515 thing and walking it forward again. 516 00:27:21,145 --> 00:27:24,895 That said, usually if you're making a collaborative application with CRDTs, 517 00:27:24,905 --> 00:27:28,365 you don't really need to process more than a handful of operations 518 00:27:28,375 --> 00:27:31,325 every second, so it doesn't matter if it takes a little bit longer. 519 00:27:31,742 --> 00:27:32,092 Got it. 520 00:27:32,182 --> 00:27:32,522 Okay. 521 00:27:32,522 --> 00:27:33,302 That, that makes sense. 522 00:27:33,312 --> 00:27:38,152 So in the CRDT approach, wherever I am currently in my state, I can just apply 523 00:27:38,152 --> 00:27:40,812 on top the existing or the new events. 524 00:27:41,132 --> 00:27:45,959 And, with a server side reconciliation approach, this is where depending 525 00:27:45,969 --> 00:27:49,659 on what the new events are, where they sit in terms of the timeline, 526 00:27:49,949 --> 00:27:54,309 I might need to, uh, wind back, apply them, and that might take a little 527 00:27:54,309 --> 00:27:57,939 bit longer, but possibly also makes the implementation a bit easier. 528 00:27:58,182 --> 00:27:59,242 Yeah, I guess just one note.
529 00:27:59,242 --> 00:28:01,712 So, you've been saying server side reconciliation. 530 00:28:01,852 --> 00:28:03,612 Automerge does not actually require a server. 531 00:28:03,612 --> 00:28:05,202 It's a completely decentralized model. 532 00:28:05,442 --> 00:28:07,882 The name is just sort of by analogy to what you would do 533 00:28:07,882 --> 00:28:08,962 if you had a server. 534 00:28:09,082 --> 00:28:11,352 You would put all the things in the order that the server receives them. 535 00:28:11,842 --> 00:28:15,572 Automerge instead infers a sort of order in a decentralized way. 536 00:28:15,829 --> 00:28:16,619 That makes sense. 537 00:28:16,649 --> 00:28:21,209 So, we've now mostly talked about the state changes side of it. 538 00:28:21,559 --> 00:28:26,529 And we talked about how, optimistically, locally, 539 00:28:26,529 --> 00:28:28,129 the state changes are applied. 540 00:28:28,449 --> 00:28:32,929 But we didn't talk too much about the mutations times mutations quadrant, which 541 00:28:32,929 --> 00:28:35,999 also has a couple of, like, subsections. 542 00:28:36,309 --> 00:28:38,339 So let's dig a little bit into this one. 543 00:28:38,869 --> 00:28:42,569 Yeah, so this, this mutations, mutations quadrant, this is sort of the event 544 00:28:42,569 --> 00:28:45,929 sourcing idea where instead of sending around low level changes, we're going 545 00:28:45,929 --> 00:28:49,959 to send around the actual user actions, both from users to the server and 546 00:28:49,959 --> 00:28:51,329 from the server back to other users. 547 00:28:51,899 --> 00:28:56,059 So an example would be like, if you do a find and replace operation, or maybe 548 00:28:56,059 --> 00:28:59,409 you rename a variable in VS Code, the operation that you're going to send 549 00:28:59,409 --> 00:29:03,459 to the server actually says, you know, rename this variable from foo to bar.
550 00:29:03,909 --> 00:29:06,719 As opposed to a bunch of low level edits where you go through and change 551 00:29:06,719 --> 00:29:10,449 the actual characters, F O O to B A R, in every place they happen to exist. 552 00:29:10,849 --> 00:29:15,695 So this quadrant is interesting because it gives you a lot more flexibility 553 00:29:15,705 --> 00:29:20,235 in terms of what you can communicate: this really high level intent, like 554 00:29:20,235 --> 00:29:25,885 code refactors or actions in a computer game, and then the server can interpret 555 00:29:25,885 --> 00:29:27,155 that intent in a reasonable way. 556 00:29:27,155 --> 00:29:30,755 You know, applying permissions, maybe it can see that someone else has also, 557 00:29:31,102 --> 00:29:33,952 you know, added a new reference to that variable. 558 00:29:33,952 --> 00:29:35,682 So it's going to rename that reference as well. 559 00:29:36,112 --> 00:29:38,882 And you can do this a lot more flexibly as opposed to if you just see the low 560 00:29:38,882 --> 00:29:42,402 level intent and have to sort of, or the low level operations and sort of have to 561 00:29:42,652 --> 00:29:44,542 guess what intent that corresponded to. 562 00:29:44,935 --> 00:29:47,475 So there's a few systems along these lines. 563 00:29:47,745 --> 00:29:51,735 So one of them, which I link here, which is not as well known is called Actyx. 564 00:29:51,975 --> 00:29:58,110 It's actually a company in Europe, which does, like, IoT coordination in factories. 565 00:29:58,290 --> 00:30:01,500 So if you have some, you know, robots moving around a factory floor, they're 566 00:30:01,500 --> 00:30:05,070 talking to each other over the local network and they might say things like, 567 00:30:05,340 --> 00:30:09,060 oh, someone needs to go pick up this box and move it from point A to point B. 568 00:30:09,390 --> 00:30:11,550 One of the robots can say, okay, I'm going to go pick up, 569 00:30:11,550 --> 00:30:12,750 pick up this box and move it.
570 00:30:13,020 --> 00:30:15,510 And that way the other robots know not to move it themselves. 571 00:30:15,900 --> 00:30:19,530 And these, these actions or messages, they just get put into a log that 572 00:30:19,530 --> 00:30:21,510 all the devices in the factory see. 573 00:30:21,870 --> 00:30:24,820 And that way they sort of know what's going on, what tasks are 574 00:30:24,820 --> 00:30:25,980 outstanding, that sort of thing. 575 00:30:26,344 --> 00:30:31,624 Right, and I think one very nice benefit of that as well, is that if there's 576 00:30:31,634 --> 00:30:37,844 some real world stuff happening, and whether in a factory a robot has moved, 577 00:30:37,874 --> 00:30:43,189 or you've now like manufactured a new part, or destroyed a certain thing. 578 00:30:43,509 --> 00:30:46,619 Now you have like a real log of those events. 579 00:30:46,629 --> 00:30:50,289 So in case something goes wrong or in case there's an audit, now you have 580 00:30:50,289 --> 00:30:52,129 some hard facts that you can look at. 581 00:30:52,369 --> 00:30:57,549 So it's not just useful for an app and a machine, but it's also useful for human 582 00:30:57,549 --> 00:31:00,179 purposes to understand what has happened. 583 00:31:00,879 --> 00:31:01,349 Exactly. 584 00:31:01,619 --> 00:31:01,789 Yeah. 585 00:31:01,789 --> 00:31:04,679 And this really feeds into the idea of business logic. 586 00:31:04,689 --> 00:31:06,709 You know, in a lot of applications, we have this. 587 00:31:07,039 --> 00:31:10,739 Business logic that we want to do in terms of, you know, what happens 588 00:31:10,749 --> 00:31:12,019 when a user clicks this button. 589 00:31:12,309 --> 00:31:15,809 And it can often be more complicated than you can express 590 00:31:15,819 --> 00:31:17,259 with simple database changes. 
591 00:31:17,539 --> 00:31:20,859 And keeping these actions around lets you really first look at what the, the 592 00:31:20,859 --> 00:31:25,369 business logic was supposed to do and also have the server customize its response. 593 00:31:25,399 --> 00:31:27,889 Like you can check permissions at a very fine grained level. 594 00:31:28,229 --> 00:31:31,259 You can make decisions about, you know, bank balances going below 595 00:31:31,259 --> 00:31:32,289 zero and that sort of thing. 596 00:31:32,919 --> 00:31:35,309 Yeah, sort of tossing to some of Pat Helland's articles, if you've 597 00:31:35,309 --> 00:31:38,779 seen like Building on Quicksand or Immutability Changes Everything, 598 00:31:38,829 --> 00:31:42,689 this idea of, you know, accountants don't use erasers, all those ideas. 599 00:31:43,389 --> 00:31:44,199 Yeah, exactly. 600 00:31:44,259 --> 00:31:48,899 And I think for web developers, this is also very intuitive, where if 601 00:31:48,899 --> 00:31:54,029 you build a React app, for example, and you have some complex state 602 00:31:54,089 --> 00:31:56,449 that you express in React useState. 603 00:31:56,869 --> 00:32:02,229 And now you try to somehow do the right thing based on how the state changes 604 00:32:02,509 --> 00:32:05,559 using some React useEffect, for example. 605 00:32:05,729 --> 00:32:10,119 There, like, you should use better, better mechanisms and better foundations 606 00:32:10,119 --> 00:32:15,429 for that, for example, using XState for like some, some state machines, et cetera. 607 00:32:15,439 --> 00:32:16,199 This is where you 608 00:32:16,729 --> 00:32:21,729 very explicitly and declaratively deal with the state changes as opposed to 609 00:32:21,915 --> 00:32:27,105 like, trying to somehow reinterpret how some, like, nitty gritty state 610 00:32:27,115 --> 00:32:30,865 things have changed, whereas, like, if you just have a beautiful, simple 611 00:32:30,895 --> 00:32:33,105 event that is easy to understand, okay.
612 00:32:33,365 --> 00:32:34,475 That thing has changed. 613 00:32:34,485 --> 00:32:36,975 The robot has entered this room. 614 00:32:37,355 --> 00:32:40,445 that's much easier to understand than interpreting the 615 00:32:40,475 --> 00:32:42,745 coordinates of a certain thing. 616 00:32:43,319 --> 00:32:45,989 And this also feeds into features that you might want to give to your 617 00:32:45,989 --> 00:32:47,699 users, especially in productivity apps. 618 00:32:47,719 --> 00:32:50,859 You want to have that change history where you can see what was everyone doing. 619 00:32:51,189 --> 00:32:52,619 You also want to have undo's. 620 00:32:52,959 --> 00:32:56,929 basically what you can do for undo is when you create this action or this mutation 621 00:32:56,929 --> 00:33:01,069 describing, the high level intent, you can also tag along with it, a mutation saying, 622 00:33:01,079 --> 00:33:02,969 here's how to undo this operation later. 623 00:33:03,264 --> 00:33:06,134 And then you store that somewhere and then you just have a queue somewhere in 624 00:33:06,134 --> 00:33:07,544 your app that's like the action queue. 625 00:33:07,684 --> 00:33:10,794 You can go through that and undo things in a nice way. 626 00:33:11,095 --> 00:33:14,945 Hopefully, you know, the users will be happier with this than if you 627 00:33:14,945 --> 00:33:19,205 just, you know, revert states exactly, ignoring collaborators updates. 628 00:33:19,645 --> 00:33:20,035 Right. 629 00:33:20,285 --> 00:33:24,925 So, in this quadrant of the event sourcing quadrant here, there's still 630 00:33:24,935 --> 00:33:30,905 a couple of like sub cells, um, how the mutations are applied, namely the 631 00:33:31,272 --> 00:33:33,762 serializable, CRDT ish, and OT ish. 632 00:33:34,382 --> 00:33:39,152 Can you give a little bit of an intuition how they differ in the implementation and 633 00:33:39,152 --> 00:33:40,892 when you would choose one or the other? 634 00:33:41,232 --> 00:33:41,392 Yeah. 
635 00:33:41,392 --> 00:33:45,232 So the examples here mostly concern text editing, which is not a coincidence. 636 00:33:45,482 --> 00:33:48,652 So in text editing, when you're doing any sort of collaborative text editing, 637 00:33:48,662 --> 00:33:52,872 like in Google Docs, you have this problem that your operation might say, I 638 00:33:52,872 --> 00:33:54,702 want to type, you know, the word hello. 639 00:33:55,192 --> 00:33:58,402 After, you know, maybe I want to type the word world after hello. 640 00:33:58,702 --> 00:34:01,302 So the, this message that you're going to send to the server might 641 00:34:01,302 --> 00:34:04,512 say something like insert world at index five, because you know, hello 642 00:34:04,512 --> 00:34:07,932 is five characters long, but someone else might also edit this world, 643 00:34:07,982 --> 00:34:09,052 hello, or this word, 644 00:34:09,052 --> 00:34:09,562 hello, 645 00:34:09,972 --> 00:34:12,282 before your change makes it to the server. 646 00:34:12,652 --> 00:34:15,042 So maybe now it's like, hello there world 647 00:34:15,102 --> 00:34:16,192 is what you want to happen. 648 00:34:16,532 --> 00:34:19,242 But your edit is still trying to target index 5, so it's going to 649 00:34:19,252 --> 00:34:21,052 go in sort of the wrong place. 650 00:34:21,062 --> 00:34:24,372 You want it to shift over to accommodate edits that have been made 651 00:34:24,462 --> 00:34:25,712 before yours in the document. 652 00:34:26,812 --> 00:34:28,932 And of course, this gets worse if you're, like, editing the bottom 653 00:34:28,932 --> 00:34:31,062 of the document, someone else is editing the paragraph on top. 654 00:34:31,322 --> 00:34:34,252 All of your array indices are going to get horribly messed up 655 00:34:34,252 --> 00:34:35,562 by the time they reach the server. 656 00:34:35,862 --> 00:34:37,102 Like, they're not going to be accurate anymore.
657 00:34:37,402 --> 00:34:41,582 So the three choices here are basically different ways to patch up those 658 00:34:41,612 --> 00:34:43,312 indices so that they make sense again. 659 00:34:43,850 --> 00:34:44,570 That makes sense. 660 00:34:44,610 --> 00:34:50,000 And, I think a common theme for local-first software is that 661 00:34:50,000 --> 00:34:55,080 there are a couple of like special buckets that deserve special treatment, 662 00:34:55,370 --> 00:34:58,860 namely text editing and also lists. 663 00:34:58,900 --> 00:35:02,740 And those, the, the latter two are, I think also like closely related. 664 00:35:03,250 --> 00:35:08,257 So on that note, in the article, you also went, went a bit more in depth 665 00:35:08,777 --> 00:35:13,897 on possible approaches to tame lists in this distributed setting. 666 00:35:14,377 --> 00:35:16,937 Would you mind sharing a little more context about that? 667 00:35:17,427 --> 00:35:17,837 Sure. 668 00:35:18,387 --> 00:35:22,257 Yeah, so lists, as you said, are hard, and they're hard specifically 669 00:35:22,257 --> 00:35:25,677 because of this index problem where your obvious choice for what operations 670 00:35:25,677 --> 00:35:28,517 you're going to send over the network often doesn't make sense anymore by 671 00:35:28,517 --> 00:35:29,627 the time they reach the server. 672 00:35:30,197 --> 00:35:32,727 Um, and the solutions really fall into two camps. 673 00:35:33,437 --> 00:35:36,687 There's the operational transformation camp, which is used by Google Docs, 674 00:35:37,007 --> 00:35:41,167 which is where you're going to send, you know, index five, that sort of 675 00:35:41,167 --> 00:35:44,627 thing, a raw number, and the server is going to look at these, this index. 676 00:35:44,717 --> 00:35:48,427 It's going to look at all the intervening operations that arrived 677 00:35:48,477 --> 00:35:50,517 that you didn't know about, but have already reached the server.
678 00:35:50,867 --> 00:35:54,267 And it's going to sort of like walk through those one by one to try to 679 00:35:54,267 --> 00:35:56,247 figure out what index you actually meant. 680 00:35:56,247 --> 00:36:00,857 Because it's going to see, okay, if you inserted something at index five and 681 00:36:00,857 --> 00:36:04,187 three other characters have been inserted before that, I'm going to change it from 682 00:36:04,187 --> 00:36:06,167 five to eight, just adding five and three. 683 00:36:06,872 --> 00:36:07,182 Got it. 684 00:36:07,182 --> 00:36:11,939 So a very common app use case for this is, let's imagine Notion where 685 00:36:11,959 --> 00:36:16,995 on the left sidebar, you can have your, your favorite pages pinned 686 00:36:17,285 --> 00:36:19,545 and for those you control the order. 687 00:36:19,545 --> 00:36:23,585 So you can move them around or also on a Notion page, all the 688 00:36:23,585 --> 00:36:25,945 blocks you can reorder yourself. 689 00:36:26,405 --> 00:36:32,735 And a very naive approach would be, whenever you reorder something, you send 690 00:36:32,735 --> 00:36:37,054 to the server a full copy of the entire document, and that contains the order. 691 00:36:37,054 --> 00:36:41,104 But that is not very useful in the collaborative setting where 692 00:36:41,134 --> 00:36:44,634 now the merge radius of the entire thing is the document. 693 00:36:44,694 --> 00:36:48,204 And it doesn't really allow for collaboration on a per block level. 694 00:36:48,744 --> 00:36:54,414 And another naive approach would be to send the block and say 695 00:36:54,414 --> 00:36:56,904 like, oh, now I'm at position three. 696 00:36:57,329 --> 00:37:00,705 But something else might've already moved and it's no 697 00:37:00,705 --> 00:37:02,245 longer in reality position three. 698 00:37:02,245 --> 00:37:06,535 So this is what this is all about and, uh, the different approaches for this.
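The index rewriting mentioned earlier in this exchange (an insert at index five becomes index eight once three characters have been inserted before it) can be sketched like this. It is a deliberately minimal illustration of the operational transformation idea for inserts only; the `Insert` type and both function names are made up for the example, and a production OT system also has to handle deletes and consistent tie-breaking.

```typescript
// Hypothetical insert operation: put `text` at position `index`.
type Insert = { index: number; text: string };

// Rebase an incoming insert over an insert the server has already applied.
// If the applied insert landed at or before our position, shift right by
// the length of the inserted text; otherwise our index is still valid.
function transformInsert(incoming: Insert, applied: Insert): Insert {
  if (applied.index <= incoming.index) {
    return { ...incoming, index: incoming.index + applied.text.length };
  }
  return incoming;
}

// Apply an insert to a plain string document.
function applyInsert(doc: string, op: Insert): string {
  return doc.slice(0, op.index) + op.text + doc.slice(op.index);
}
```

With the document "hello", if " there" has already been inserted at index 5, a stale insert of " world" at index 5 is rewritten to index 11, so the result reads "hello there world". Here the server's arrival order decides ties at the same index, which is an assumption of this sketch.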
699 00:37:06,902 --> 00:37:10,902 Figma has also written a really nice blog post about this, how they 700 00:37:11,135 --> 00:37:14,435 tamed this problem, where I think they call it fractional indexing. 701 00:37:14,995 --> 00:37:16,955 And I think you connected the dots here. 702 00:37:17,350 --> 00:37:22,240 Can you draw a line between the different approaches here, the CRDT ish approach 703 00:37:22,240 --> 00:37:27,130 and the OT ish approach, and how that relates to the, the list indexing problem? 704 00:37:27,640 --> 00:37:27,790 Yeah. 705 00:37:27,790 --> 00:37:32,500 So the, the OT ish approach, that's what I was describing with, you know, you send 706 00:37:32,500 --> 00:37:35,740 index five to the server, but the server's going to rewrite it to index eight. 707 00:37:36,150 --> 00:37:38,670 So this is really this idea that the server is going to 708 00:37:39,192 --> 00:37:42,582 mutate your operation to try to make it still make sense. 709 00:37:43,162 --> 00:37:46,762 Then the CRDT ish approach, which is used by fractional indexing and Yjs and 710 00:37:46,762 --> 00:37:50,962 those sort of things, is actually the clients, instead of sending, you know, 711 00:37:50,962 --> 00:37:54,142 index 5 to the server, they're going to rewrite this message in a way so 712 00:37:54,142 --> 00:37:57,482 that it still makes sense, even if it reaches the server a little bit late. 713 00:37:58,072 --> 00:38:02,130 So, for example, you could have, in fractional indexing, you might label 714 00:38:02,130 --> 00:38:05,460 your characters with these decimal numbers instead, where you say, like, 715 00:38:05,780 --> 00:38:09,140 the characters are at 0.1, 0.2, 0.3, etc. 716 00:38:09,520 --> 00:38:12,810 And then if you want to add a new character in between 717 00:38:12,860 --> 00:38:13,660 0.4 and 0.5, 718 00:38:13,660 --> 00:38:14,670 you give it the label 719 00:38:14,670 --> 00:38:15,550 0.45.
720 00:38:16,160 --> 00:38:19,120 So this isn't really a list index, it's what they call a fractional index. 721 00:38:19,465 --> 00:38:22,905 And the idea is that this will still go in between the characters at 722 00:38:22,905 --> 00:38:22,995 0.4 and 723 00:38:22,995 --> 00:38:27,015 0.5, even if some other changes happen elsewhere in the list. 724 00:38:27,275 --> 00:38:29,825 Because those other changes don't actually change your fractional index. 725 00:38:29,935 --> 00:38:33,435 You're keeping the characters at the same 0.4, 0.5, 0.6, etc. 726 00:38:34,019 --> 00:38:34,389 Right. 727 00:38:34,419 --> 00:38:35,499 And now the 728 00:38:35,569 --> 00:38:40,409 0.45, this is what you use to derive the real 729 00:38:40,594 --> 00:38:44,834 integer indexes from, by lexicographically ordering it. 730 00:38:45,254 --> 00:38:45,564 Got it. 731 00:38:45,564 --> 00:38:45,924 Yeah. 732 00:38:45,944 --> 00:38:51,044 So I'm using the same mechanism, inspired by the ideas of the Figma 733 00:38:51,044 --> 00:38:53,584 blog posts, et cetera, for Overtone. 734 00:38:53,584 --> 00:38:57,675 And I was even using it before I started implementing syncing, just because 735 00:38:57,745 --> 00:39:03,420 I found it to be the easiest way to keep a list ordered in an 736 00:39:03,420 --> 00:39:07,910 event-sourced system, since this is what I'm also already using to circumvent schema 737 00:39:07,910 --> 00:39:10,490 migrations for the app I'm building. 738 00:39:10,510 --> 00:39:15,460 So I think it's actually a very simple, self-contained concept that can 739 00:39:15,460 --> 00:39:20,470 be applied even outside of the scope of a full-blown local-first data stack. 740 00:39:20,790 --> 00:39:21,480 Yes, exactly. 741 00:39:21,780 --> 00:39:22,000 Yeah. 742 00:39:22,000 --> 00:39:22,940 So it turns out that what
743 00:39:23,385 --> 00:39:27,055 text editing CRDTs are doing is very similar to fractional indexing, 744 00:39:27,155 --> 00:39:30,545 just with some extra changes to solve some bugs, basically, like 745 00:39:30,545 --> 00:39:33,005 what happens if two people try to insert a character at the same place. 746 00:39:33,375 --> 00:39:36,895 Fractional indexing breaks down there; CRDTs just have the smallest change 747 00:39:36,895 --> 00:39:38,335 needed to make this not break down. 748 00:39:38,671 --> 00:39:41,581 I agree with your point that this isn't really a collaborative thing. 749 00:39:41,581 --> 00:39:43,531 This is just a general data structures thing. 750 00:39:43,681 --> 00:39:48,771 It's like the way we describe text as an array is sort of flawed, because 751 00:39:48,905 --> 00:39:52,385 array indexes are changing all the time, even though the character is staying 752 00:39:52,385 --> 00:39:54,605 the same and staying in the same place 753 00:39:54,785 --> 00:39:55,445 in an intuitive sense. 754 00:39:56,355 --> 00:39:59,265 So what we really want is an abstraction where the characters keep the same 755 00:39:59,345 --> 00:40:03,275 identifier at all times, whether that's a fractional index or whether it's part of 756 00:40:03,285 --> 00:40:07,675 the list CRDT internals, and then that's how we should represent sequences that 757 00:40:07,675 --> 00:40:11,385 can move around, which is basically any list in a GUI where you can drag something 758 00:40:11,385 --> 00:40:12,705 in between two existing elements. 759 00:40:13,293 --> 00:40:14,433 That makes a lot of sense. 760 00:40:14,703 --> 00:40:19,023 And what's also so cool about seeing all of the different options in this 761 00:40:19,023 --> 00:40:23,743 classification table is that you don't have to choose exactly one for your app.
762 00:40:24,080 --> 00:40:28,190 what I'm planning to do for, for Overtone is mostly follow the event 763 00:40:28,190 --> 00:40:30,500 sourcing idea for collaborative state. 764 00:40:30,870 --> 00:40:33,610 However, in the places where I have. 765 00:40:33,835 --> 00:40:39,488 complex, particular problems such as a description text or like a document 766 00:40:39,498 --> 00:40:43,628 text, this is where I most likely will resort to something like Automerge 767 00:40:43,638 --> 00:40:49,038 or Yjs to let those technologies deal with the text editing, the 768 00:40:49,088 --> 00:40:50,998 collaborative text editing use case. 769 00:40:51,398 --> 00:40:57,023 But, and with that, I'm gonna I think I get the best of both worlds where I get 770 00:40:57,033 --> 00:41:02,263 all the benefits from event sourcing for the, the more high level data structure 771 00:41:02,263 --> 00:41:08,803 of my app and for the specificness of the text editing, I embed a little CRDT 772 00:41:08,813 --> 00:41:13,743 use case in the broader document use case that I tame with event sourcing. 773 00:41:14,043 --> 00:41:16,243 Do you think that general approach makes sense? 774 00:41:16,601 --> 00:41:18,261 Yes, that's exactly the way to do it. 775 00:41:18,581 --> 00:41:20,781 Yeah, if you look, there's a lot of, you know, blog posts saying 776 00:41:20,791 --> 00:41:23,641 about how CRDTs are complicated or they're hard to implement. 777 00:41:23,841 --> 00:41:27,061 Usually these blog posts are talking specifically about the text editing part. 778 00:41:27,431 --> 00:41:29,561 That's sort of the hard part where you want to let someone else do 779 00:41:29,561 --> 00:41:32,401 it and have their nice battle tested, fuzz tested implementation. 
780 00:41:32,401 --> 00:41:37,265 But for other data structures, like if you have, you know, sort of a database table 781 00:41:37,265 --> 00:41:42,005 sort of structure or a map structure, it's easier to make your own sync engine for 782 00:41:42,005 --> 00:41:45,845 that and just drop in an existing library to handle the lists and text editing. 783 00:41:46,168 --> 00:41:46,648 Right. 784 00:41:46,748 --> 00:41:52,728 So it's funny that you came from going super deep on CRDTs to spanning 785 00:41:52,738 --> 00:41:55,755 this broader table of possibilities. 786 00:41:56,055 --> 00:41:58,975 And it seems like now you're actually much more drawn 787 00:41:59,145 --> 00:42:02,805 to the first quadrant around event sourcing. What led 788 00:42:02,945 --> 00:42:04,735 to this interest for you? 789 00:42:05,088 --> 00:42:05,688 Let's see. 790 00:42:05,688 --> 00:42:08,278 So it might just be, you know, the grass is greener on the other side. 791 00:42:08,308 --> 00:42:11,398 I haven't tried to make an app or a library using the 792 00:42:11,398 --> 00:42:12,508 event sourcing approach yet. 793 00:42:12,518 --> 00:42:13,818 So maybe I just don't know what's wrong with it. 794 00:42:14,268 --> 00:42:16,568 But it really started out about a year ago. 795 00:42:16,618 --> 00:42:19,018 I was thinking about version control. 796 00:42:19,138 --> 00:42:21,628 This was around the same time that Ink & Switch was thinking about version 797 00:42:21,628 --> 00:42:24,051 control with their Upwelling essay. 798 00:42:24,431 --> 00:42:28,191 And the idea was like, what if we could do this Git-style model where 799 00:42:28,321 --> 00:42:31,581 you make changes to an app, like a text document or a spreadsheet. 800 00:42:31,721 --> 00:42:35,731 We just put these into linear branches, and then when we merge them, you 801 00:42:35,731 --> 00:42:37,331 copy from one branch to another.
802 00:42:37,891 --> 00:42:40,741 And originally, the idea is we're going to put CRDT operations in these 803 00:42:40,741 --> 00:42:43,751 branches, because that's what I'm familiar with, but I eventually realized like, 804 00:42:43,841 --> 00:42:47,931 actually, because the branches put the operations in a total order anyway, we 805 00:42:47,931 --> 00:42:51,371 don't care about the CRDT correctness properties that say that you can 806 00:42:51,421 --> 00:42:53,031 apply operations in different orders. 807 00:42:53,531 --> 00:42:55,711 So we might as well just use arbitrary operations. 808 00:42:55,871 --> 00:42:59,041 And that unlocks a whole lot of possibilities that would have been 809 00:42:59,051 --> 00:43:03,171 hard to do in a CRDT system, like you can do these rename variable or find 810 00:43:03,171 --> 00:43:07,401 and replace operations, maybe even like a change tone with AI operation. 811 00:43:07,731 --> 00:43:11,061 Just put these in a log, have the log be in a fixed order, and 812 00:43:11,061 --> 00:43:12,401 run the operations in that order. 813 00:43:12,916 --> 00:43:13,676 That makes sense. 814 00:43:13,940 --> 00:43:20,280 so aside from the versioning use case, can you think how, using a CRDT approach 815 00:43:20,290 --> 00:43:26,420 versus an event sourcing approach might be a good or a bad fit for different 816 00:43:26,590 --> 00:43:28,660 categories of apps that you can think of? 817 00:43:29,220 --> 00:43:29,650 Sure. 818 00:43:29,995 --> 00:43:33,105 Yeah, so I think the advantages of a CRDT approach, well first off, 819 00:43:33,125 --> 00:43:34,845 you can do this more database model. 820 00:43:34,865 --> 00:43:37,975 If I'm going to put my data in a magic box that says database, and 821 00:43:37,975 --> 00:43:39,995 it's going to synchronize it for me, I don't have to worry about it. 
822 00:43:39,995 --> 00:43:42,631 Whereas if you're doing an event sourcing approach, you have to think 823 00:43:42,631 --> 00:43:46,021 more carefully about what are my mutations that I'm sending around? 824 00:43:46,021 --> 00:43:47,131 How do I process them? 825 00:43:47,221 --> 00:43:50,321 How do I make sure that they still make sense, even if someone else's 826 00:43:50,321 --> 00:43:51,671 mutation reaches the server first? 827 00:43:52,291 --> 00:43:53,671 So that's a bit harder. 828 00:43:54,031 --> 00:43:56,941 The other advantage of CRDTs is the efficiency perspective. 829 00:43:57,001 --> 00:44:01,591 CRDTs can implement operations in a very efficient way so 830 00:44:01,591 --> 00:44:06,121 that you're not going to accidentally say, you know, I'm sending this mutation 831 00:44:06,121 --> 00:44:07,176 to the server that's going to take 832 00:44:07,638 --> 00:44:10,248 an entire second to process and is going to slow everyone down. 833 00:44:10,598 --> 00:44:14,198 It's sort of the general trade-off that CRDTs behave more like a database. 834 00:44:14,418 --> 00:44:17,438 They just work and they're optimized to be fast. 835 00:44:17,753 --> 00:44:21,343 Whereas with an event sourcing model, you get flexibility. 836 00:44:21,843 --> 00:44:25,213 You can send arbitrary mutations around, you can have arbitrary business 837 00:44:25,213 --> 00:44:28,873 logic on the server, it can even differ from the logic on the clients. 838 00:44:29,123 --> 00:44:32,592 Just coming back to the video game example, you have a lot of logic that 839 00:44:32,593 --> 00:44:34,943 the server needs to step through, checking permissions, checking 840 00:44:34,943 --> 00:44:35,963 collisions, that sort of thing. 841 00:44:36,448 --> 00:44:39,688 Which would be hard to do with a CRDT or with a database model.
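To make the event sourcing side of this trade-off concrete, here is a small sketch of a log of arbitrary mutations that is replayed in one fixed order. The event names (`insert`, `findAndReplace`) and the helpers `applyEvent` and `replay` are assumptions for illustration; they are not the API of any tool mentioned in the conversation.

```typescript
// Hypothetical high-level mutations on a text document.
type TextEvent =
  | { type: "insert"; index: number; text: string }
  | { type: "findAndReplace"; find: string; replaceWith: string };

// Pure reducer: every replica that applies the same log in the same order
// ends up with the same document.
function applyEvent(doc: string, event: TextEvent): string {
  switch (event.type) {
    case "insert": {
      // Clamp so a stale index still produces a valid document.
      const at = Math.min(Math.max(event.index, 0), doc.length);
      return doc.slice(0, at) + event.text + doc.slice(at);
    }
    case "findAndReplace": {
      if (event.find === "") return doc;
      return doc.split(event.find).join(event.replaceWith);
    }
  }
}

// Run the whole log from an initial state.
function replay(log: readonly TextEvent[], initial: string = ""): string {
  return log.reduce(applyEvent, initial);
}

const log: TextEvent[] = [
  { type: "insert", index: 0, text: "hello world" },
  { type: "findAndReplace", find: "world", replaceWith: "there" },
  { type: "insert", index: 0, text: "oh, " },
];
console.log(replay(log)); // oh, hello there
```

Because the log fixes the order, an operation such as find-and-replace does not need to commute with the others, which is the flexibility described above.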
842 00:44:40,198 --> 00:44:45,098 So you mentioned that you haven't yet built larger systems with the 843 00:44:45,108 --> 00:44:48,818 event sourcing approach, but I think you've still done a little 844 00:44:48,818 --> 00:44:52,998 bit of research on what might await you in the event sourcing world. 845 00:44:53,418 --> 00:44:57,268 So could you outline a little bit of like the potential concerns 846 00:44:57,268 --> 00:45:01,118 you see on the horizon when going all in on event sourcing? 847 00:45:01,534 --> 00:45:06,374 Yeah, so I guess the main concern always is if you're Sending around 848 00:45:06,374 --> 00:45:08,924 this log of events to clients. 849 00:45:09,204 --> 00:45:13,138 And if you're storing this as your single source of truth, then 850 00:45:13,208 --> 00:45:15,538 storing all these events forever, it might take up a lot of space. 851 00:45:15,988 --> 00:45:20,298 If you could imagine a text document, if each text character corresponds to 100 852 00:45:20,308 --> 00:45:25,168 bytes of JSON, then the history of all the events is going to be a hundred times 853 00:45:25,168 --> 00:45:26,628 bigger than the actual text document. 854 00:45:26,858 --> 00:45:29,608 Even if you've since cleared out the entire text document, now it's empty. 855 00:45:29,718 --> 00:45:30,438 You still have all this state. 856 00:45:30,641 --> 00:45:35,421 So that's the main challenge is just how do we store the events efficiently, how 857 00:45:35,421 --> 00:45:38,471 do we maybe compact them, say I don't need these events anymore, I'm going to 858 00:45:38,471 --> 00:45:41,831 throw them away and replace the state, while still making that play nicely 859 00:45:42,071 --> 00:45:45,191 with, you know, clients who have been offline for a month, that sort of thing. 860 00:45:45,571 --> 00:45:49,831 Which sort of mechanisms do you think will mostly help to 861 00:45:49,871 --> 00:45:51,391 overcome some of those issues? 
862 00:45:51,954 --> 00:45:56,634 I'm hoping the main mechanism is just To give up, basically say text is 863 00:45:56,634 --> 00:46:01,414 very small for any, the main sources of lots of data in your app are 864 00:46:01,454 --> 00:46:04,984 blobs like images or videos, which you can put somewhere else anyway. 865 00:46:05,214 --> 00:46:08,424 And then for the actual event describing the fine grained changes, just store 866 00:46:08,424 --> 00:46:11,274 them all and it's only going to be a few megabytes per document anyway. 867 00:46:11,334 --> 00:46:13,604 Got it. 868 00:46:13,644 --> 00:46:13,894 Yeah. 869 00:46:13,894 --> 00:46:17,584 And I think on top of that, there's also the compaction use case. 870 00:46:17,904 --> 00:46:21,564 Now that I have a little bit more, insight on, on that 871 00:46:21,564 --> 00:46:23,414 approach with building Overtone. 872 00:46:23,744 --> 00:46:28,234 for example, given that everything you do within Overtone, whether it's playing 873 00:46:28,234 --> 00:46:31,951 a track, whether it's navigating within the app, whether it's adding a track 874 00:46:31,951 --> 00:46:39,024 to your playlist or follow an artist, all of those are an event and Adding 875 00:46:39,114 --> 00:46:45,864 a track to a playlist, there you do a lot less of those than, for example, 876 00:46:45,894 --> 00:46:51,614 in the background, the app auto playing the next track, which is also an event. 877 00:46:52,024 --> 00:46:58,954 And another kind of event is if the app tries to authenticate with a music service 878 00:46:58,974 --> 00:47:04,739 such as Spotify to exchange tokens, which it needs to do at least Once an hour. 879 00:47:05,019 --> 00:47:07,519 So it does so a little bit ahead of time. 880 00:47:07,829 --> 00:47:11,163 So, also when you reload the app, it needs to do that. 
881 00:47:11,783 --> 00:47:18,501 So just by the fact of the app running in the background over time, it racks 882 00:47:18,511 --> 00:47:20,591 up quite a lot of different events. 883 00:47:21,061 --> 00:47:25,128 And I think there the interesting part is the nature of the events, 884 00:47:25,158 --> 00:47:28,548 and the nature of those events also allows for different trade-offs. 885 00:47:28,948 --> 00:47:33,328 So me putting a track into a playlist, there's going to be 886 00:47:33,328 --> 00:47:35,178 way fewer of those events, 887 00:47:35,558 --> 00:47:38,278 and it's fine to keep the entire history of this around. 888 00:47:38,298 --> 00:47:40,648 What's so cool about this also is that the fact 889 00:47:41,008 --> 00:47:46,481 that I have this event allows me to trivially implement a feature like this: 890 00:47:46,481 --> 00:47:51,351 I can hover over the track and see when it was added and by 891 00:47:51,351 --> 00:47:53,441 whom it was added to the playlist. 892 00:47:53,811 --> 00:47:59,721 It also makes implementing things such as undo much easier. But for the other kind 893 00:47:59,721 --> 00:48:05,681 of events, which might be implicit or which might just come in much higher 894 00:48:05,681 --> 00:48:11,348 quantity, what I've seen is that it's not as crucial to keep those events 895 00:48:11,368 --> 00:48:17,158 around for eternity, and some of those events are then also made irrelevant by 896 00:48:17,228 --> 00:48:19,788 follow-up events of the same type. 897 00:48:20,138 --> 00:48:24,508 So for example, if your app has authenticated and overwrites sort of like 898 00:48:24,508 --> 00:48:27,618 an auth state in the database, and 899 00:48:27,901 --> 00:48:31,611 two hours later, it has already done so 10 more times,
900 00:48:31,651 --> 00:48:35,921 I don't need to keep the entire history before that, maybe besides auditing 901 00:48:35,921 --> 00:48:41,871 reasons, so I can just at some point remove the old events, which keeps 902 00:48:41,871 --> 00:48:47,391 an otherwise always growing event log at a, for this given event type 903 00:48:47,671 --> 00:48:51,779 at a much more like constant size, which makes it much more feasible. 904 00:48:52,329 --> 00:48:57,056 Another thing that I, started thinking about is like, what if you have not 905 00:48:57,056 --> 00:49:01,406 just like one event log, but what if you have multiple event logs? 906 00:49:01,426 --> 00:49:04,339 And what if you have, a hierarchy of event logs? 907 00:49:04,599 --> 00:49:08,032 This is something that I also want to think a little bit more about, Let's 908 00:49:08,032 --> 00:49:13,102 say you have a, a tree of, playlists, like a, a folder of playlists. 909 00:49:13,102 --> 00:49:14,902 So you have a, a playlist. 910 00:49:15,232 --> 00:49:19,642 And that playlist could also, possibly be a folder of other playlists. 911 00:49:20,002 --> 00:49:23,152 So now what does the event log exist for? 912 00:49:23,422 --> 00:49:26,452 Does it exist for like, everything in my library? 913 00:49:26,722 --> 00:49:30,322 Does it exist for a broken down to. 914 00:49:30,782 --> 00:49:34,332 only giving information about which playlists I have, and then I need to 915 00:49:34,332 --> 00:49:38,032 subscribe to another playlist, but what if that playlist is a folder? 916 00:49:38,292 --> 00:49:42,192 So this hierarchical aspect of it, I think this will keep me busy 917 00:49:42,282 --> 00:49:43,562 for, for a little bit as well. 918 00:49:43,982 --> 00:49:46,162 Do you have thoughts on those problems? 919 00:49:46,472 --> 00:49:48,622 Yeah, I mean, this, the, what you're saying is really interesting. 920 00:49:48,682 --> 00:49:51,892 It makes me think of the problem of ephemeral presence. 
921 00:49:52,212 --> 00:49:54,902 So, you know, in Figma, when your collaborators are moving their 922 00:49:54,902 --> 00:49:57,542 mouse cursors around, you can see where they're at it every time. 923 00:49:58,032 --> 00:50:01,172 I would imagine Figma is not actually persisting those mouse movements, 924 00:50:01,182 --> 00:50:04,382 it's just sending them over the usual channels so that you can see them live, 925 00:50:04,662 --> 00:50:07,282 but then you forget about these events because they don't matter anymore. 926 00:50:07,282 --> 00:50:11,232 So I wonder if you could maybe do that for a lot of the events that don't 927 00:50:11,232 --> 00:50:13,409 matter as much, or even in a text editor. 928 00:50:13,429 --> 00:50:17,919 So one thing that's really hard with a collaborative text editor is you'd like it 929 00:50:17,919 --> 00:50:21,479 so that whenever you press a key, that key is immediately sent to your collaborators. 930 00:50:21,834 --> 00:50:24,664 But if that actually creates an event that's persisted in the log, then you have 931 00:50:24,664 --> 00:50:28,604 this issue of, you know, 100 times as much storage as key presses, but maybe what 932 00:50:28,604 --> 00:50:32,054 you could say is when you press a key, that's like an ephemeral presence message. 933 00:50:32,074 --> 00:50:34,134 It's not actually stored, it's just sent over the same 934 00:50:34,144 --> 00:50:35,424 channel as the mouse movements. 935 00:50:36,029 --> 00:50:40,169 And this is sort of like an ephemeral mini log that's stacked on top of the actual 936 00:50:40,169 --> 00:50:44,199 event log, and then every 10 seconds or so you send a compacted version of 937 00:50:44,199 --> 00:50:47,709 like the entire sentence that the person typed as a single event, and that's 938 00:50:47,719 --> 00:50:49,329 what's actually stored on the backend. 939 00:50:49,859 --> 00:50:52,579 I wonder if that could help at all, or if this is even possible to implement. 
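The idea floated here, live keystrokes as ephemeral messages with a periodic compacted event going to the persisted log, could look roughly like the sketch below. Everything in it is hypothetical: `createKeystrokeBuffer`, the `textTyped` event, and the two callbacks stand in for whatever presence channel and event log an app actually has.

```typescript
// Hypothetical persisted event: one compacted chunk of typing.
interface PersistedEvent {
  type: "textTyped";
  text: string;
}

// `broadcast` is the ephemeral channel (nothing is stored),
// `persist` appends to the durable event log.
function createKeystrokeBuffer(
  broadcast: (key: string) => void,
  persist: (event: PersistedEvent) => void,
) {
  let pending = "";
  return {
    // Every key press goes out live, but is only buffered locally.
    press(key: string): void {
      broadcast(key);
      pending += key;
    },
    // Turn everything typed since the last flush into a single stored event.
    flush(): void {
      if (pending === "") return;
      persist({ type: "textTyped", text: pending });
      pending = "";
    },
  };
}

const live: string[] = [];
const stored: PersistedEvent[] = [];
const buffer = createKeystrokeBuffer(
  (key) => { live.push(key); },
  (event) => { stored.push(event); },
);

"hi!".split("").forEach((key) => buffer.press(key));
buffer.flush();
console.log(live.length, stored.length); // 3 1
```

In an app, `flush` would presumably run on a timer (every ten seconds or so, as suggested above) and when the editor loses focus. How collaborators reconcile the ephemeral keystrokes with the compacted event when it arrives is the open question raised here, and this sketch does not address it.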
940 00:50:52,989 --> 00:50:53,329 Right. 941 00:50:53,369 --> 00:50:57,009 I've actually implemented a small version of that already, 942 00:50:57,039 --> 00:50:59,349 which I call local-only events. 943 00:50:59,729 --> 00:51:05,489 The idea is that there are kind of hierarchies of syncing as well. 944 00:51:05,489 --> 00:51:11,309 There's syncing just from the main thread to the worker thread, which is 945 00:51:11,319 --> 00:51:18,119 responsible for persisting the data, but also from one tab to another tab. 946 00:51:18,619 --> 00:51:23,259 And those two tabs should in some regards converge, and in 947 00:51:23,259 --> 00:51:25,599 some regards allow divergence. 948 00:51:25,956 --> 00:51:32,516 So for example, if you have Notion open in two tabs, you want to be able to navigate 949 00:51:32,586 --> 00:51:36,896 to different documents in those different tabs, but if you're in the same document, 950 00:51:36,906 --> 00:51:38,556 you probably want to see the same thing. 951 00:51:38,566 --> 00:51:41,106 So the same applies to a music app. 952 00:51:41,106 --> 00:51:43,246 Maybe in one tab you want to have 953 00:51:43,521 --> 00:51:48,481 the playback of one track, and in the other one you don't want the same 954 00:51:48,481 --> 00:51:50,031 playback, otherwise you hear it twice, 955 00:51:50,404 --> 00:51:53,044 but you maybe want to work on a playlist. 956 00:51:53,504 --> 00:51:58,424 And so keeping things in sync is important, but I don't want to 957 00:51:58,634 --> 00:52:02,864 constantly, as the playback progresses, have persistent events for this. 958 00:52:02,864 --> 00:52:07,764 So I try to, A, have very deliberately small events. 959 00:52:08,104 --> 00:52:12,294 And the other thing is where I have events that are broadcast around, 960 00:52:12,734 --> 00:52:16,764 but if the app reloads, it doesn't rehydrate from those.
961 00:52:16,964 --> 00:52:21,224 It either catches them midway or it's not important enough. 962 00:52:21,516 --> 00:52:25,206 that it shows it so very similar to the presence feature in Figma. 963 00:52:25,206 --> 00:52:29,076 So I have implemented a first version of this, but I think there can be 964 00:52:29,076 --> 00:52:34,296 use cases where you might want to keep them around for like 10 minutes 965 00:52:34,296 --> 00:52:37,746 or 10 seconds, like you say, and then have a version of compaction. 966 00:52:37,776 --> 00:52:39,766 I think that that's really interesting. 967 00:52:40,314 --> 00:52:41,594 What you're describing sounds really cool. 968 00:52:41,604 --> 00:52:43,154 I'll be interested to see this code someday. 969 00:52:43,701 --> 00:52:47,011 I'm planning to open source it a little bit further down the road. 970 00:52:47,314 --> 00:52:50,984 So you've now been in the local-first space for over five 971 00:52:50,984 --> 00:52:56,154 years, and I'm sure you've seen many technologies come along over time. 972 00:52:56,224 --> 00:53:00,164 I'm curious whether you have certain strong opinions about the local-first 973 00:53:00,184 --> 00:53:02,594 space or the web ecosystem more broadly. 974 00:53:02,981 --> 00:53:03,961 Yes, I guess one. 975 00:53:04,176 --> 00:53:06,956 Well, this isn't really an opinion, but just I'll make an observation that the 976 00:53:06,956 --> 00:53:11,306 local-first movement has really exploded just within the past 12 or 18 months. 977 00:53:11,726 --> 00:53:15,896 Like, starting out five years ago reading CRDT papers and going to CRDT 978 00:53:15,896 --> 00:53:19,236 conferences, it was much more, you know, mellow academic atmosphere. 979 00:53:19,526 --> 00:53:21,716 But now there's just so many tools popping up, I can't keep 980 00:53:21,716 --> 00:53:23,276 track of them in my browser tabs. 981 00:53:23,542 --> 00:53:25,182 you know, the local-first discord, all that stuff. 
982 00:53:25,492 --> 00:53:26,482 Just a lot more activity. 983 00:53:26,802 --> 00:53:29,482 So it's both exciting and also a bit scary, because now I can't read all 984 00:53:29,482 --> 00:53:30,802 the papers that come out anymore. 985 00:53:31,112 --> 00:53:35,042 Yeah, in terms of opinions, I guess the strong opinion I've had in the 986 00:53:35,042 --> 00:53:41,002 past year or so is that the local-first ideal, I think, is too hard right now. 987 00:53:41,072 --> 00:53:43,842 There are just too many problems we'd have to solve to actually make 988 00:53:43,842 --> 00:53:47,332 a local-first app where the hosting provider can go away and you'll still be 989 00:53:47,332 --> 00:53:48,792 able to collaborate and keep your data. 990 00:53:49,232 --> 00:53:53,877 So the problem that I've been focusing on for the past year is the narrow 991 00:53:53,917 --> 00:53:58,087 goal, like the baby step, of how do we make traditional central server SaaS 992 00:53:58,127 --> 00:54:01,724 collaboration easier to implement, and maybe a bit easier to deploy. 993 00:54:02,464 --> 00:54:05,404 So that's working on primitives like what you were describing with LiveStore. 994 00:54:05,434 --> 00:54:10,082 We want some way to have events that you send around and persist in IndexedDB, 995 00:54:10,364 --> 00:54:13,314 send over a broadcast channel between different tabs, and then eventually send 996 00:54:13,314 --> 00:54:15,814 to a server that stores them and broadcasts them back to the clients. 997 00:54:16,154 --> 00:54:19,674 Just make some really good implementation of that that people can reuse so they 998 00:54:19,704 --> 00:54:21,554 don't have to reinvent it every time. 999 00:54:22,097 --> 00:54:22,967 And I think that'll be
1000 00:54:23,232 --> 00:54:26,552 Both useful for, you know, developers and also a good stepping stone 1001 00:54:26,592 --> 00:54:29,242 towards the eventual goal of we want to get rid of this server and 1002 00:54:29,242 --> 00:54:30,982 have our, have our data forever. 1003 00:54:31,632 --> 00:54:34,212 I love that observation, and that opinion. 1004 00:54:34,212 --> 00:54:38,072 I think that's also one of my key takeaways from talking to many folks 1005 00:54:38,372 --> 00:54:42,982 at the local-first conference we had this year in Berlin, where Everyone 1006 00:54:42,992 --> 00:54:48,322 gets excited about all the goals and all the ideals of local-first, but 1007 00:54:48,532 --> 00:54:54,352 going after a few of those already is technically very complicated. 1008 00:54:54,582 --> 00:54:58,982 And then going like all the way to making sure that the software still 1009 00:54:58,982 --> 00:55:01,462 works if the vendor goes away, etc. 1010 00:55:01,982 --> 00:55:06,972 That is, I think, right now achieved by only a very, very few 1011 00:55:07,032 --> 00:55:09,572 set of products and technologies. 1012 00:55:09,852 --> 00:55:13,422 I hope that in five years from now, it will be table stakes. 1013 00:55:13,772 --> 00:55:17,572 But, I think it's a little bit like Maslow's hierarchy of needs. 1014 00:55:17,832 --> 00:55:21,732 And like we, here we have like the hierarchy of ideals and we haven't, Yet 1015 00:55:21,802 --> 00:55:26,992 quite made it as easy to achieve all of it, hopefully we'll, we'll get closer 1016 00:55:26,992 --> 00:55:29,012 to that over the next couple of years. 1017 00:55:29,522 --> 00:55:33,382 So those technologies that you've, now mentioned, is there anything 1018 00:55:33,382 --> 00:55:37,122 that you're working on that can be looked at by other people? 1019 00:55:37,459 --> 00:55:37,899 Let's see. 1020 00:55:37,899 --> 00:55:42,049 So the main project I've had recently is it's a library called list-positions. 
1021 00:55:42,069 --> 00:55:44,924 So you can read about it in my blog post or look at the docs on GitHub. 1022 00:55:45,214 --> 00:55:49,824 But it's basically trying to solve this fractional index generalization problem. 1023 00:55:49,984 --> 00:55:52,984 You can think of it like a fractional index library that also 1024 00:55:52,984 --> 00:55:56,924 implements the extra features that CRDTs have to prevent some bugs. 1025 00:55:57,226 --> 00:56:01,476 The idea is that you can use this as a drop-in part to do just the text and 1026 00:56:01,516 --> 00:56:06,096 list collaboration in some arbitrary data structure. So I built examples on top 1027 00:56:06,106 --> 00:56:09,252 of Triplit, ElectricSQL, Replicache. 1028 00:56:09,402 --> 00:56:11,802 So these are collaborative data stores that don't talk 1029 00:56:11,812 --> 00:56:12,857 about lists or text at all. 1030 00:56:13,137 --> 00:56:15,257 They're basically syncing maps or database tables. 1031 00:56:15,547 --> 00:56:18,081 And I said, here, if we just stick these souped-up fractional 1032 00:56:18,081 --> 00:56:21,491 indices on top, we can actually do text and rich text collaboration. 1033 00:56:21,831 --> 00:56:23,021 So that's been my focus. 1034 00:56:23,384 --> 00:56:24,414 Very interesting. 1035 00:56:24,484 --> 00:56:25,734 I will check this out. 1036 00:56:25,754 --> 00:56:27,884 Maybe I can use it for Overtone. 1037 00:56:27,894 --> 00:56:30,124 Maybe I could even integrate it with LiveStore. 1038 00:56:30,494 --> 00:56:34,024 I will certainly check this out and we'll put the link in the show notes. 1039 00:56:34,196 --> 00:56:34,606 Great. 1040 00:56:34,666 --> 00:56:37,666 Matthew, is there anything else you want to share with the audience? 1041 00:56:38,319 --> 00:56:38,979 No, I don't think so. 1042 00:56:39,179 --> 00:56:40,259 It's been a really good chat.
1043 00:56:40,472 --> 00:56:44,022 Thank you so much for sharing all of your knowledge about different 1044 00:56:44,022 --> 00:56:46,202 approaches to syncing state. 1045 00:56:46,486 --> 00:56:49,676 I think this is the most in depth we've gone on those topics so 1046 00:56:49,676 --> 00:56:53,406 far, and it provided a brilliant overview for future conversations. 1047 00:56:53,906 --> 00:56:57,346 It has helped me a ton to better understand this, both your blog 1048 00:56:57,346 --> 00:56:58,926 post as well as this conversation. 1049 00:56:59,306 --> 00:57:02,776 So thank you so much for taking time today and coming on to chat. 1050 00:57:03,087 --> 00:57:04,137 Yeah, thanks so much for having me. 1051 00:57:04,670 --> 00:57:07,080 Thank you for listening to the local-first FM podcast. 1052 00:57:07,310 --> 00:57:10,820 If you've enjoyed this episode and haven't done so already, please subscribe and 1053 00:57:10,820 --> 00:57:12,380 leave a review wherever you're listening. 1054 00:57:12,750 --> 00:57:14,750 Please also share this episode with others. 1055 00:57:15,050 --> 00:57:17,850 Spreading the word about the podcast is a great way to 1056 00:57:17,850 --> 00:57:19,450 support it and to keep it going. 1057 00:57:19,920 --> 00:57:23,930 A special thanks again to Rocicorp and Expo for supporting this podcast. 1058 00:57:24,100 --> 00:57:24,890 See you next time.