1 00:00:00,000 --> 00:00:03,770 we've restricted ourselves down to making things look like access 2 00:00:03,770 --> 00:00:06,134 control lists, on the outside. 3 00:00:06,411 --> 00:00:10,191 and so it should feel very, very similar to doing things with role 4 00:00:10,191 --> 00:00:13,231 based access control using, say, OAuth. 5 00:00:14,971 --> 00:00:17,208 That should all feel totally normal. 6 00:00:17,408 --> 00:00:20,088 You shouldn't really have to think about it in any special way. 7 00:00:20,088 --> 00:00:22,788 In the same way that, you know, if you have a sync server, other than having 8 00:00:22,788 --> 00:00:26,818 to set up the sync server, or maybe you pointed at an existing one, knowing that 9 00:00:26,818 --> 00:00:30,138 it's there doesn't mean that you have to, like, design it from first principles. 10 00:00:30,755 --> 00:00:32,905 Welcome to the localfirst.fm podcast. 11 00:00:33,135 --> 00:00:35,945 I'm your host, Johannes Schickling, and I'm a web developer, a 12 00:00:35,945 --> 00:00:38,845 startup founder, and love the craft of software engineering. 13 00:00:39,295 --> 00:00:42,935 For the past few years, I've been on a journey to build a modern, high quality 14 00:00:42,935 --> 00:00:44,905 music app using web technologies. 15 00:00:45,115 --> 00:00:48,835 And in doing so, I've fallen down the rabbit hole of local-first software. 16 00:00:49,580 --> 00:00:52,620 This podcast is your invitation to join me on that journey. 17 00:00:53,290 --> 00:00:57,030 In this episode, I'm speaking to Brooklyn Zelenka, a local-first 18 00:00:57,050 --> 00:01:01,430 researcher and creator of various projects, including UCAN and Beehive. 19 00:01:01,950 --> 00:01:05,680 In this conversation, we go deep on authorization and access control 20 00:01:05,950 --> 00:01:10,080 in a local-first decentralized environment and explore this topic 21 00:01:10,110 --> 00:01:12,150 by learning about UCAN and Beehive. 
22 00:01:12,755 --> 00:01:17,265 Later, we are also diving into Beelay, a new generic sync server implementation 23 00:01:17,485 --> 00:01:18,875 developed by Ink and Switch. 24 00:01:19,505 --> 00:01:23,545 Before getting started, also a big thank you to Convex and Electric 25 00:01:23,545 --> 00:01:25,645 SQL for supporting this podcast. 26 00:01:26,065 --> 00:01:28,035 And now my interview with Brooklyn. 27 00:01:28,964 --> 00:01:31,004 Hey Brooke, so nice to have you on the show. 28 00:01:31,014 --> 00:01:31,774 How are you doing? 29 00:01:32,166 --> 00:01:32,726 I'm doing great. 30 00:01:32,726 --> 00:01:33,856 Super excited to be here. 31 00:01:33,876 --> 00:01:35,621 I'm glad that we, made this happen. 32 00:01:35,946 --> 00:01:36,856 Thanks so much for having me. 33 00:01:37,508 --> 00:01:42,430 I was really looking forward to this episode and honestly, I was quite nervous 34 00:01:42,450 --> 00:01:47,150 because this is certainly bringing me to an aspect of local-first where I have 35 00:01:47,160 --> 00:01:49,435 much less first hand experience myself. 36 00:01:49,845 --> 00:01:54,973 I think overall local-first is a big frontier of pushing the boundaries, what's 37 00:01:54,983 --> 00:01:57,113 possible technologically, et cetera. 38 00:01:57,433 --> 00:02:03,243 And you're pushing forward even a further frontier here all around local-first auth. 39 00:02:03,663 --> 00:02:07,863 So the people in the audience who are already familiar with your work, 40 00:02:08,086 --> 00:02:11,329 I'm sure they're very thrilled for you to be here, but for the folks 41 00:02:11,329 --> 00:02:14,609 who don't know who you are, would you mind giving a brief background? 42 00:02:15,479 --> 00:02:16,759 Yeah, absolutely. 43 00:02:16,986 --> 00:02:19,736 I'll maybe do in slightly reverse chronological order. 
44 00:02:20,026 --> 00:02:25,051 So, these days I'm working on an auth system for local-first, mostly 45 00:02:25,051 --> 00:02:29,944 focused on Automerge, called Beehive, which does both read controls with 46 00:02:29,944 --> 00:02:33,484 encryption and mutation controls with something called capabilities. 47 00:02:33,484 --> 00:02:34,644 I'm sure we'll get into that. 48 00:02:35,121 --> 00:02:40,274 Prior to this, for a little over five years, I was the CTO 49 00:02:40,304 --> 00:02:41,374 at a company called Fission. 50 00:02:41,524 --> 00:02:44,198 So in 2019, we started doing local-first there. 51 00:02:44,198 --> 00:02:48,531 And we worked on the stack we always called auth, data, and compute, and so 52 00:02:48,771 --> 00:02:53,398 we ranged out way ahead on a variety of things, trying local-first, you know, 53 00:02:53,398 --> 00:02:58,328 encrypted-at-rest databases, a file system, an auth system that has 54 00:02:58,531 --> 00:03:03,968 gotten some adoption called UCAN, and a compute layer, IPVM. And prior to that, 55 00:03:04,028 --> 00:03:08,328 I did a lot of web and, temporarily, did work with the Ethereum core 56 00:03:08,358 --> 00:03:11,411 development community, mostly working on the Ethereum Virtual Machine. 57 00:03:11,818 --> 00:03:13,288 That is super impressive. 58 00:03:13,418 --> 00:03:18,218 I am very curious to dig into all of the parts really around 59 00:03:18,468 --> 00:03:20,038 auth, data, and compute. 60 00:03:20,354 --> 00:03:23,914 However, in this episode, I think we should keep it a bit more 61 00:03:23,914 --> 00:03:26,624 focused particularly on auth. 62 00:03:26,714 --> 00:03:29,764 Maybe towards the end, we can also talk a bit more about compute. 63 00:03:29,961 --> 00:03:34,291 Most of the episodes we've done so far have been very centric around data.
64 00:03:34,736 --> 00:03:39,886 Only a few have been more, also exploring what auth in a local-first setting 65 00:03:39,886 --> 00:03:44,046 could look like, but I think there is no better person in the local-first space 66 00:03:44,236 --> 00:03:46,816 to really go deep on, on all things auth. 67 00:03:47,356 --> 00:03:53,196 So through your work on Fission, and previous backgrounds, et cetera, you've, 68 00:03:53,244 --> 00:03:57,983 both participated in, contributed to, and started a whole myriad of different 69 00:03:58,418 --> 00:04:02,988 projects, which are now really like on the forefront on those various fields. 70 00:04:03,628 --> 00:04:04,898 One of it is UCAN. 71 00:04:04,908 --> 00:04:08,378 You've also mentioned Beehive at Ink & Switch. 72 00:04:08,378 --> 00:04:13,998 Maybe starting with UCAN, for those of us who have no idea what UCAN, that four 73 00:04:13,998 --> 00:04:18,008 letter acronym, stands for and what it means, could you give us an introduction? 74 00:04:18,528 --> 00:04:19,248 Yeah, absolutely. 75 00:04:19,858 --> 00:04:25,496 So UCAN, U C A N, User Controlled Authorization Networks, is A way of 76 00:04:25,496 --> 00:04:30,176 doing authorization, so granting the ability to somebody else to perform 77 00:04:30,786 --> 00:04:36,036 some action on a resource, in a totally peer to peer, local-first way. 78 00:04:36,286 --> 00:04:39,316 It uses a model called Capabilities. 79 00:04:39,396 --> 00:04:43,036 So instead of having a database that lists all of the users and what 80 00:04:43,036 --> 00:04:47,673 they can do, you get certificates that are cryptographically provable. 81 00:04:48,048 --> 00:04:52,258 And so if I wanted to give you access to some resource I controlled, I 82 00:04:52,258 --> 00:04:53,698 would sign a certificate to you. 83 00:04:54,228 --> 00:04:56,828 And then if you wanted to give access to someone else, you 84 00:04:56,828 --> 00:04:58,388 would sign a certificate to them. 
85 00:04:58,518 --> 00:05:01,918 And then when it came back to me, I could check that that whole chain was correct. 86 00:05:02,358 --> 00:05:05,241 And so people have used this to, do all kinds of things. 87 00:05:05,251 --> 00:05:08,601 So at Fission, we were using it for CRDTs. 88 00:05:08,641 --> 00:05:12,611 For example, there's a CRDT based file system that we had developed, 89 00:05:12,854 --> 00:05:15,154 to guard whether or not you were allowed to write into it. 90 00:05:15,616 --> 00:05:19,386 There's a bunch of teams now using it for, managing resources. 91 00:05:19,456 --> 00:05:20,376 So, storage quotas. 92 00:05:20,396 --> 00:05:23,199 How much are you allowed to store inside of some data volume? 93 00:05:23,463 --> 00:05:26,183 and for them, it's really helpful because then they can say, Okay. 94 00:05:26,548 --> 00:05:30,538 Here's a certificate from us to, you know, say a developer, and then they can 95 00:05:31,108 --> 00:05:35,438 portion that out to all of their users without having to always register all of 96 00:05:35,438 --> 00:05:37,781 their users back to, the storage company. 97 00:05:37,977 --> 00:05:41,681 and so it can, both lower the amount of interaction that they have to do 98 00:05:41,681 --> 00:05:44,794 with, you know, registering all of these different people, but it also 99 00:05:44,804 --> 00:05:48,401 means that they can scale up, really nicely their service so as long as 100 00:05:48,401 --> 00:05:49,961 they know about the root signature. 101 00:05:50,481 --> 00:05:55,068 They can scale horizontally, very, very easily or interact with other teams very 102 00:05:55,068 --> 00:05:56,308 easily by just issuing them certificates. 103 00:05:56,468 --> 00:05:58,348 So, like, people are doing that kind of thing, 104 00:05:58,561 --> 00:06:01,911 So, you've mentioned the term capabilities before, and I think 105 00:06:01,911 --> 00:06:04,031 that's also a central part in UCAN. 
106 00:06:04,553 --> 00:06:08,543 I'm most familiar with this from my more traditional background of, like, building 107 00:06:08,543 --> 00:06:13,313 more centralized server applications, et cetera, and how you implement auth is 108 00:06:13,323 --> 00:06:17,583 always very, very dependent on the kind of application that you want to build. 109 00:06:17,833 --> 00:06:21,303 If you want to start out a bit more easily, then you could maybe lean 110 00:06:21,303 --> 00:06:25,893 on some of the primitives that a certain technology or platform is 111 00:06:25,893 --> 00:06:29,576 giving you, maybe using Postgres and using sort of like the role-based 112 00:06:29,586 --> 00:06:33,736 access control patterns that you have in Postgres, or maybe something 113 00:06:33,746 --> 00:06:36,146 even as off the shelf as Firebase. 114 00:06:36,376 --> 00:06:40,116 Is this sort of like a useful mental model to think about it, that 115 00:06:40,136 --> 00:06:45,269 UCAN gives me similar building blocks, or how much more fine granular can 116 00:06:45,269 --> 00:06:47,589 I get with what UCAN offers to me? 117 00:06:48,549 --> 00:06:49,919 Yes, it's a great question. 118 00:06:50,529 --> 00:06:55,813 So, in role-based access control or any of these access control 119 00:06:55,813 --> 00:06:58,823 list based systems, right? 120 00:06:59,019 --> 00:07:05,829 You have a database that has, you know, a list of users and what they're able to do. 121 00:07:05,869 --> 00:07:07,539 So often their role: are they an admin? 122 00:07:07,549 --> 00:07:08,419 Are they a writer? 123 00:07:08,419 --> 00:07:08,999 Are they a reader? 124 00:07:08,999 --> 00:07:10,229 You know, all of these things. 125 00:07:10,689 --> 00:07:15,266 And to update that list, you have to go to that database, update that 126 00:07:15,266 --> 00:07:19,986 database, and on every request that you make, you have to check the list. 127 00:07:20,136 --> 00:07:23,276 So sometimes we call this like, it's like having a bouncer at a club.
128 00:07:23,406 --> 00:07:25,126 You know, you show up, you show them your ID. 129 00:07:25,746 --> 00:07:27,376 They check, are you on the VIP list? 130 00:07:27,416 --> 00:07:31,742 And then you're allowed into the club or not, And what those rules are, are set by 131 00:07:31,892 --> 00:07:34,352 that, you know, by that bouncer, right? 132 00:07:34,692 --> 00:07:36,432 These are the only rules, no others. 133 00:07:36,712 --> 00:07:40,842 in a capabilities world, the analogy is, is often to having like a ticket to go 134 00:07:40,842 --> 00:07:45,046 see a movie, So, this last weekend, I went to go see Wicked, it was awesome. 135 00:07:45,269 --> 00:07:49,379 but I bought my ticket online, it showed up in my email, they didn't ID me on 136 00:07:49,379 --> 00:07:52,049 the way in, I just showed them my ticket and they're like, Oh, great, yeah. 137 00:07:52,404 --> 00:07:53,774 Theater 4, you can go in. 138 00:07:54,181 --> 00:07:56,511 so as long as I had that proof with me. 139 00:07:57,061 --> 00:07:57,841 I'm allowed in. 140 00:07:58,211 --> 00:07:59,611 They didn't have to check a list. 141 00:07:59,681 --> 00:08:01,961 There was no central place to look. 142 00:08:02,551 --> 00:08:05,001 Capabilities, are not a new model. 143 00:08:05,001 --> 00:08:07,451 They've existed for some time. 144 00:08:07,921 --> 00:08:11,891 In fact, a big part of the internet infrastructure runs on top of 145 00:08:11,891 --> 00:08:15,031 capabilities as well, or a subset of them. 146 00:08:15,098 --> 00:08:18,402 But it hasn't found its way as much into applications because we're 147 00:08:18,402 --> 00:08:19,712 so used to access control lists. 
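[Editor's sketch] The bouncer and movie-ticket analogy above can be put side by side in a few lines of Python. This is only an illustration under stated assumptions: the names `VIP_LIST`, `THEATER_KEY`, `issue_ticket`, and `ticket_admits` are hypothetical, and the ticket is protected with an HMAC from the standard library rather than the public-key signatures a system like UCAN relies on, so only the issuer can verify it here.

```python
import hashlib
import hmac

# --- Bouncer model: an access control list kept in one central place ---
VIP_LIST = {"alice": "admin", "bob": "reader"}  # hypothetical users and roles


def bouncer_allows(user_id: str, required_role: str) -> bool:
    """Every request looks the caller up in the central list."""
    return VIP_LIST.get(user_id) == required_role


# --- Ticket model: the proof travels with whoever holds it ---
THEATER_KEY = b"demo-only-secret"  # hypothetical issuer key, for illustration only


def _tag(claim: str) -> str:
    """HMAC stand-in for a signature (assumption: real systems sign with a private key)."""
    return hmac.new(THEATER_KEY, claim.encode(), hashlib.sha256).hexdigest()


def issue_ticket(screen: str, show: str) -> str:
    """Issue a ticket of the form 'screen|show|tag'."""
    claim = f"{screen}|{show}"
    return f"{claim}|{_tag(claim)}"


def ticket_admits(ticket: str, screen: str) -> bool:
    """Check the ticket itself: no identity check and no list lookup."""
    parts = ticket.split("|")
    if len(parts) != 3:
        return False
    claimed_screen, show, tag = parts
    expected = _tag(f"{claimed_screen}|{show}")
    genuine = hmac.compare_digest(tag.encode(), expected.encode())
    return genuine and claimed_screen == screen


ticket = issue_ticket("Theater 4", "Wicked")
print(ticket_admits(ticket, "Theater 4"))  # the holder is admitted without being ID'd
```

The design point is that `ticket_admits` never consults `VIP_LIST`: whoever presents a genuine ticket gets in, which is what lets the check run without a central place to look.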
148 00:08:20,782 --> 00:08:24,552 The granularity that you mentioned before is really interesting because, 149 00:08:24,706 --> 00:08:28,006 in the capability system, anytime I make that delegation to somebody else, 150 00:08:28,006 --> 00:08:31,286 I say, you're allowed to use this thing, or then you go to somebody else 151 00:08:31,286 --> 00:08:32,656 and say, you can also use this thing. 152 00:08:33,386 --> 00:08:36,746 You can grant them the ability to see or to use that. 153 00:08:37,214 --> 00:08:38,274 or fewer capabilities. 154 00:08:38,284 --> 00:08:42,874 So if it was like, here's a terabyte of storage, you could turn around and say, 155 00:08:42,874 --> 00:08:44,724 well, here's only 50 MBs to somebody. 156 00:08:44,764 --> 00:08:47,757 And so you can get as granular as you want, with it. 157 00:08:47,937 --> 00:08:54,671 And, there's never any confusion about who's acting in what way, right? 158 00:08:54,671 --> 00:08:59,361 So in a traditional system, if we had, you know, with, with access control lists, 159 00:08:59,941 --> 00:09:01,901 you sat, you know, you ran a service. 160 00:09:02,536 --> 00:09:05,956 between the user and me, and they made a request to you. 161 00:09:05,996 --> 00:09:08,416 Well, they only have a link to you and you only have a link to me. 162 00:09:08,656 --> 00:09:12,376 So when you'd make the request to me, you'd be using your terabyte of storage. 163 00:09:12,746 --> 00:09:16,496 And so there are some cases where that can confuse the resource. 164 00:09:16,556 --> 00:09:19,316 So it's like, oh yeah, you can totally store it, you know, use a terabyte 165 00:09:19,316 --> 00:09:21,916 of storage, even though the actual user shouldn't be able to do that. 166 00:09:22,546 --> 00:09:25,016 With capabilities, we get rid of that completely. 167 00:09:25,466 --> 00:09:28,766 We have this entire chain of custody, basically, of this. 
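[Editor's sketch] The "terabyte down to 50 MBs" delegation described above can be modeled as a chain in which every link may only narrow what the previous link granted. The sketch below checks only that narrowing rule and the issuer-to-audience linkage. Signature verification, which a real capability system performs on every link, is deliberately left out, and `Delegation`, `chain_is_valid`, and `allowed_bytes` are hypothetical names rather than any library's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Delegation:
    issuer: str      # who grants
    audience: str    # who receives
    resource: str    # what the grant is about
    max_bytes: int   # storage quota being granted


def chain_is_valid(chain: List[Delegation], root_owner: str) -> bool:
    """Return True if the chain starts at the owner and only ever narrows."""
    if not chain or chain[0].issuer != root_owner:
        return False
    for parent, child in zip(chain, chain[1:]):
        if child.issuer != parent.audience:     # only the recipient may re-delegate
            return False
        if child.resource != parent.resource:   # same resource all the way down
            return False
        if child.max_bytes > parent.max_bytes:  # may shrink the quota, never grow it
            return False
    return True


def allowed_bytes(chain: List[Delegation], root_owner: str) -> int:
    """Quota the final holder may use, or 0 if the chain does not check out."""
    return chain[-1].max_bytes if chain_is_valid(chain, root_owner) else 0


chain = [
    Delegation("storage-co", "developer", "volume-1", 10**12),    # a terabyte
    Delegation("developer", "end-user", "volume-1", 50 * 10**6),  # only 50 MB
]
print(allowed_bytes(chain, "storage-co"))
```

Because the request carries the whole chain, the service sees the end user's 50 MB limit directly instead of acting on the intermediary's full terabyte.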
168 00:09:28,816 --> 00:09:32,026 As granular as you want to get, it's very clear on every request, 169 00:09:32,026 --> 00:09:34,086 what that request is allowed to do. 170 00:09:34,336 --> 00:09:38,326 so I think this is going to become really important for things like, LLMs and other 171 00:09:38,326 --> 00:09:43,526 sort of automated agents where you can tell it, Hey, go do things for me, but 172 00:09:43,526 --> 00:09:45,426 not with all of my rights, not as sudo. 173 00:09:46,016 --> 00:09:50,336 Only with, in this scenario for the next five minutes, these things 174 00:09:50,346 --> 00:09:51,266 are what you're allowed to do. 175 00:09:51,416 --> 00:09:54,246 And even if it hallucinates some other intention, those are the 176 00:09:54,246 --> 00:09:55,376 only things it's able to do. 177 00:09:55,859 --> 00:09:59,099 Yeah, I think this is, such an important aspect. 178 00:09:59,256 --> 00:10:06,031 since I think you don't even need to reach as far as giving agency to a An 179 00:10:06,041 --> 00:10:11,411 agent to an AI, but even if you want to go a bit more dumb and a bit more 180 00:10:11,581 --> 00:10:17,937 traditional, if you want to use some off the shelf SaaS service, and, maybe that 181 00:10:17,937 --> 00:10:20,187 thing integrates with your Google account. 182 00:10:20,532 --> 00:10:23,712 Then you also like, you need to give the thing somehow access. 183 00:10:23,722 --> 00:10:27,579 So you do like the, OAuth flow with Google and then it asks you 184 00:10:27,579 --> 00:10:30,919 like, Hey, is it okay that we have access to all of those things, 185 00:10:30,919 --> 00:10:32,639 that we can do all of those things? 186 00:10:33,049 --> 00:10:37,689 And even though Google's already offers some pretty fine granular things there, 187 00:10:37,719 --> 00:10:41,639 often I feel like, Oh, actually I want to make it even more fine granular. 188 00:10:42,019 --> 00:10:44,929 Wait, you're going to have like access to all of my emails. 
189 00:10:44,949 --> 00:10:48,139 Can I maybe just give you access to my invoice emails 190 00:10:48,159 --> 00:10:49,689 if this is an invoicing thing? 191 00:10:50,089 --> 00:10:54,233 So I feel like it's both a bit overwhelming to make all of those 192 00:10:54,233 --> 00:10:59,308 decisions upfront, like what should be allowed, Both from a application end 193 00:10:59,318 --> 00:11:03,488 user perspective, me using the thing, but then particularly also from like 194 00:11:03,488 --> 00:11:05,538 an application developer perspective. 195 00:11:05,898 --> 00:11:10,641 And, yeah, it feels like a really, really important aspect of using the 196 00:11:10,641 --> 00:11:12,451 app and building, designing the app. 197 00:11:12,841 --> 00:11:18,064 And if that is not, intuitive and ergonomic, then I feel it's going 198 00:11:18,064 --> 00:11:19,604 to, everyone's going to suffer. 199 00:11:19,604 --> 00:11:23,888 The application developer, they're Probably just going to wing it, and 200 00:11:23,958 --> 00:11:30,444 that will mean probably too coarse of a, granularity for application users, etc. 201 00:11:30,444 --> 00:11:33,224 So I'm really excited that you're pushing forward on this. 202 00:11:33,404 --> 00:11:38,408 maybe also to draw the analogy, between more traditional OAuth 203 00:11:38,418 --> 00:11:40,628 flows and what UCAN is providing. 204 00:11:40,648 --> 00:11:46,081 It's, should I think about UCAN as a replacement for OAuth from like both, 205 00:11:46,209 --> 00:11:49,929 end user perspective, as well as from an application developer perspective? 206 00:11:50,551 --> 00:11:51,921 Yeah exactly. 207 00:11:52,046 --> 00:11:56,011 so the, the underlying mechanism is different, But we really wanted 208 00:11:56,011 --> 00:11:58,601 it to feel as familiar as possible. 209 00:11:59,051 --> 00:12:02,771 So even the early versions of UCAN used the same token 210 00:12:02,771 --> 00:12:04,481 format and things like this. 
211 00:12:04,961 --> 00:12:08,344 We've since switched over to some more modern formats. 212 00:12:08,744 --> 00:12:10,384 There are problems with JWTs. 213 00:12:10,744 --> 00:12:11,664 But yeah, exactly. 214 00:12:11,664 --> 00:12:15,984 You can think of it as local-first OAuth; that's one way of thinking about it, exactly. 215 00:12:16,438 --> 00:12:16,958 Right. 216 00:12:17,158 --> 00:12:22,448 So as an application developer, I need to make up my mind once to 217 00:12:22,448 --> 00:12:23,958 say like, this is what's possible. 218 00:12:23,958 --> 00:12:28,128 This is what is allowed, and, like, define it, and then the system 219 00:12:28,128 --> 00:12:32,848 enforces those rules, but often I, as an application developer, get it 220 00:12:32,848 --> 00:12:38,108 wrong and I need to, like, either make the rules more permissive 221 00:12:38,108 --> 00:12:40,463 or less permissive over time. 222 00:12:40,743 --> 00:12:46,013 And similar to how I might get a database schema wrong and then later need 223 00:12:46,013 --> 00:12:49,999 to do those dreaded database schema migrations, what is the equivalent 224 00:12:49,999 --> 00:12:56,519 of a schema migration, but for UCAN capability definitions, etc.? 225 00:12:56,847 --> 00:13:00,577 So all of the information that you need to fulfill a request in UCAN 226 00:13:00,637 --> 00:13:02,037 is contained in the token itself. 227 00:13:02,311 --> 00:13:06,447 So, these days we have a little policy language, think of it a little bit 228 00:13:06,447 --> 00:13:08,294 like, like SAML, inside the token. 229 00:13:08,434 --> 00:13:14,664 And it says, okay, when you go to actually do something with this token, the action 230 00:13:14,674 --> 00:13:16,314 has to match the following criteria. 231 00:13:16,354 --> 00:13:17,287 Say you're sending an email. 232 00:13:17,327 --> 00:13:22,507 So the "to" field has to only be to people inside of the company.
233 00:13:22,867 --> 00:13:28,331 Or, you can only send newsletters on Mondays or whatever it is. 234 00:13:28,331 --> 00:13:28,501 Right. 235 00:13:28,511 --> 00:13:30,801 And you can scope that down arbitrarily, syntactically. 236 00:13:32,080 --> 00:13:35,236 So updating those policies is just issuing a new certificate to say 237 00:13:35,236 --> 00:13:36,326 this is what you're allowed to do now. 238 00:13:36,606 --> 00:13:40,213 And, you know, you can revoke the old ones if that's needed. 239 00:13:40,413 --> 00:13:44,223 But I think the more interesting part of this actually is on the far other end. 240 00:13:44,323 --> 00:13:47,003 So we were talking about, you know, the developer sets these policies. 241 00:13:47,693 --> 00:13:50,733 And that's true, I would say, the majority of the time. 242 00:13:50,793 --> 00:13:55,560 But it doesn't respect user agency, right? 243 00:13:55,580 --> 00:13:58,770 You're giving the developer all of the agency, but the user's the one 244 00:13:58,780 --> 00:14:02,700 who owns whatever, let's say that it's a text editing app, right? 245 00:14:02,720 --> 00:14:04,250 You know, so they own the document. 246 00:14:04,496 --> 00:14:07,970 Why can't they decide, you know, when they share with somebody else, what they 247 00:14:07,970 --> 00:14:09,240 should be able to do with that document? 248 00:14:09,666 --> 00:14:12,976 So in, say, you know, Google Docs, you've got that little share button in the top 249 00:14:12,986 --> 00:14:15,986 corner and it says, you know, invite people, and then you can say, well, they're 250 00:14:15,986 --> 00:14:19,626 an editor and this person's, you know, another admin and this is another viewer. 251 00:14:19,866 --> 00:14:21,206 This person can only comment. 252 00:14:21,690 --> 00:14:22,649 I think the UIs,
253 00:14:22,940 --> 00:14:26,630 you know, will usually stay like that, but you could add whatever 254 00:14:26,630 --> 00:14:28,220 options you wanted in there, right? 255 00:14:28,260 --> 00:14:28,750 Why not? 256 00:14:29,220 --> 00:14:33,186 So when we were doing, back at Fission, the file system work, you could scope 257 00:14:33,186 --> 00:14:37,126 down to say, like, well, you're allowed to write into only this directory, for 258 00:14:37,126 --> 00:14:39,470 example, and that was very, very flexible. 259 00:14:39,480 --> 00:14:43,900 Or, you're allowed to write files under a certain size limit, right? 260 00:14:43,900 --> 00:14:46,930 And so the user now can make these decisions of like, I'm giving 261 00:14:46,930 --> 00:14:48,200 you access to my file system. 262 00:14:48,770 --> 00:14:52,010 I only want you, you know, maybe, you know, I'm thinking back to my school days, 263 00:14:52,320 --> 00:14:55,810 you know, a teacher, and they're having students submit assignments to them. 264 00:14:55,840 --> 00:14:59,300 Well, you can only submit them to this one directory and I don't 265 00:14:59,300 --> 00:15:00,630 want you filling up my entire disk. 266 00:15:00,630 --> 00:15:04,100 So they have to be under a gigabyte or whatever, right? 267 00:15:04,610 --> 00:15:07,970 And so you can imagine scenarios like this, where we're now inviting 268 00:15:07,970 --> 00:15:11,365 the end user to participate in what the policy should be. 269 00:15:11,625 --> 00:15:13,705 It's not all set completely. 270 00:15:13,915 --> 00:15:16,165 The developer can absolutely set it in advance, but you can 271 00:15:16,185 --> 00:15:19,248 also then refine it further and further for the user's intention. 272 00:15:19,548 --> 00:15:19,918 Right. 273 00:15:19,978 --> 00:15:20,618 I love that.
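[Editor's sketch] The teacher example above (submissions go into one directory only, and each file stays under a gigabyte) amounts to a small predicate that is checked against every write. The Python below is a hedged illustration of that idea, not the Fission file system's actual policy format: `WritePolicy` and `permits` are hypothetical names, and paths are assumed to be POSIX-style with an absolute policy directory.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class WritePolicy:
    directory: str   # the only directory the holder may write into (assumed absolute)
    max_bytes: int   # upper bound on the size of a single file

    def permits(self, path: str, size: int) -> bool:
        """True if writing `size` bytes to `path` stays inside the policy."""
        root = posixpath.normpath(self.directory)
        # Resolve the target relative to the allowed directory and collapse
        # any ".." segments, so "../private/notes.txt" cannot escape it.
        target = posixpath.normpath(posixpath.join(root, path))
        try:
            inside = posixpath.commonpath([root, target]) == root and target != root
        except ValueError:
            # Raised when absolute and relative paths are mixed; treat as a denial.
            return False
        return inside and 0 <= size <= self.max_bytes


# The teacher's rule: one directory, files of at most a gigabyte.
policy = WritePolicy(directory="/submissions", max_bytes=10**9)
print(policy.permits("essay.pdf", 2_000_000))
print(policy.permits("../private/notes.txt", 10))
```

Under this sketch a student's write to `essay.pdf` is accepted, while a write that tries to climb out of `/submissions` or exceeds the size limit is refused.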
274 00:15:20,628 --> 00:15:26,415 Since particularly now with, like, LLMs and AIs in general, a non-technical 275 00:15:26,435 --> 00:15:30,795 user can now just, in the way they would say it to another person, say, like, 276 00:15:30,795 --> 00:15:36,128 Hey, I want to give Alice access to this file, but Alice is only allowed 277 00:15:36,128 --> 00:15:38,978 to like read the first page here. 278 00:15:38,988 --> 00:15:42,178 The second two pages, those are like my private notes. 279 00:15:42,188 --> 00:15:43,878 Please don't give anyone access to this. 280 00:15:44,108 --> 00:15:44,548 You know what? 281 00:15:44,558 --> 00:15:47,798 Like actually Alice is allowed to also like comment on it. 282 00:15:48,078 --> 00:15:52,605 Like just from, like, a very colloquial sentence like that, 283 00:15:52,831 --> 00:15:56,795 a computer can now derive those capabilities very accurately, 284 00:15:56,990 --> 00:15:59,920 present them back to the user, like, Hey, does this look right to you? 285 00:16:00,310 --> 00:16:04,010 And that levels up the entire application user experience. 286 00:16:04,343 --> 00:16:10,173 So it's very reassuring to me that all of this is built on top of very sound 287 00:16:10,408 --> 00:16:14,910 cryptography. However, even though I've studied computer science and, like, 288 00:16:14,910 --> 00:16:17,000 I have done my cryptography classes, 289 00:16:17,280 --> 00:16:20,340 that being said, that's not my day-to-day thing. 290 00:16:20,350 --> 00:16:25,393 And as an application developer, I'm trying to steer away from like low 291 00:16:25,393 --> 00:16:28,853 level cryptography things as much as possible, just because I don't 292 00:16:28,853 --> 00:16:30,763 consider myself an expert in this.
293 00:16:31,123 --> 00:16:36,598 So it's very good to know that everything on that is built on top of very solid 294 00:16:36,708 --> 00:16:40,168 cryptography, but how much as an application developer, how much do I 295 00:16:40,168 --> 00:16:45,508 need to deal with like signing things, et cetera, or how much of that is 296 00:16:45,538 --> 00:16:47,408 abstracted from what I'm dealing with? 297 00:16:47,848 --> 00:16:48,388 Yeah. 298 00:16:48,508 --> 00:16:52,198 so I would say that there's two layers here that people find. 299 00:16:52,506 --> 00:16:55,906 correctly find scary, myself included, right? 300 00:16:56,140 --> 00:16:59,610 cryptography and auth in general, both super scary topics. 301 00:16:59,710 --> 00:17:04,373 I remember, you know, as a web dev, whatever, 10 years ago adding, in a 302 00:17:04,373 --> 00:17:09,083 web app, the, You know, the Auth plugin and kind of going, and if I don't 303 00:17:09,103 --> 00:17:10,743 touch it, hopefully it'll work, right? 304 00:17:11,106 --> 00:17:15,766 really the goal with all these projects was to hide as much of the scary 305 00:17:15,886 --> 00:17:17,946 complexities in there as possible. 306 00:17:18,016 --> 00:17:21,356 So we handle all of the encryption and signing and all of this 307 00:17:21,356 --> 00:17:24,306 stuff in a way that should make it, if we do our job well. 308 00:17:24,506 --> 00:17:26,900 Completely invisible, to the developer. 309 00:17:27,290 --> 00:17:29,940 So even, you know, we haven't talked about Beehive very much. 310 00:17:29,960 --> 00:17:33,526 Beehive has both a, which is this, project I'm doing, at Ink & Switch 311 00:17:33,526 --> 00:17:35,936 to add access control to Automerge. 312 00:17:36,240 --> 00:17:41,310 It has both a encryption side, so that's read controls, and then capabilities for 313 00:17:41,310 --> 00:17:43,330 these mutations or, or write controls. 314 00:17:44,243 --> 00:17:48,050 and for encryption, there's a bunch of things that have to happen. 
315 00:17:48,050 --> 00:17:51,590 We have to serialize things in an efficient way. 316 00:17:51,700 --> 00:17:52,630 We have to chunk them up. 317 00:17:52,660 --> 00:17:57,280 We have to make sure that we share the encryption key with everyone, 318 00:17:57,496 --> 00:17:59,166 but nobody else, right? 319 00:17:59,416 --> 00:18:02,476 And that could be thousands of people, potentially, and we've set 320 00:18:02,476 --> 00:18:05,226 ourselves these goals of, you know, you should be able to run 321 00:18:05,226 --> 00:18:08,166 this inside of a large organization or a medium sized organization. 322 00:18:08,386 --> 00:18:09,786 How do you do all that stuff efficiently? 323 00:18:09,826 --> 00:18:15,976 And our goal is you should be able to say, add these people, and it just works. 324 00:18:16,326 --> 00:18:19,716 You do all your normal Automerge stuff, and, you know, when you 325 00:18:19,716 --> 00:18:22,866 persist to disk, or when you send it out to the network, then it gets 326 00:18:22,866 --> 00:18:25,846 encrypted, then it gets secured, then it gets signed, all of this stuff. 327 00:18:25,866 --> 00:18:27,426 And you don't have to worry about any of it. 328 00:18:27,800 --> 00:18:31,900 When you set up Beehive, it generates keys, it does all the key 329 00:18:31,900 --> 00:18:35,693 management for you, it does all of the key rotation, all of this stuff. 330 00:18:35,826 --> 00:18:39,653 So, again, it's one of these things where it's like, I'm really excited about this, 331 00:18:40,090 --> 00:18:42,450 and it's like super cool to get to work on. 332 00:18:42,650 --> 00:18:47,130 And there's a lot of interesting detail on the inside, but in an ideal 333 00:18:47,130 --> 00:18:50,940 world, nobody has to think about this other than I want to grant these 334 00:18:50,940 --> 00:18:54,390 rights to these people and everything else is taken care of automatically. 335 00:18:54,795 --> 00:18:55,485 I love that.
336 00:18:55,555 --> 00:19:01,878 So you've motivated initially that UCAN happened as a project while you've been 337 00:19:01,878 --> 00:19:04,838 working on various projects at Fission, 338 00:19:05,005 --> 00:19:07,995 and right now you're mostly focused on Beehive. 339 00:19:08,235 --> 00:19:13,995 So can you share a bit more, what was the impetus for Beehive coming 340 00:19:14,005 --> 00:19:18,698 into existence and then going into what Beehive is exactly? 341 00:19:19,231 --> 00:19:19,971 Absolutely. 342 00:19:20,211 --> 00:19:25,771 So, you know, we started UCAN very, very early in 2020. It came out of 343 00:19:26,851 --> 00:19:30,501 normal, regular product requirements of like, oh, well, we probably want 344 00:19:30,631 --> 00:19:31,971 everyone to read this document. 345 00:19:32,271 --> 00:19:32,991 How do we do that? 346 00:19:33,011 --> 00:19:35,571 Or I don't want somebody to fill up my entire disk. 347 00:19:36,211 --> 00:19:37,141 How do we prevent that? 348 00:19:37,413 --> 00:19:40,436 And that went through a bunch of iterations and we had a lot 349 00:19:40,436 --> 00:19:41,756 of learnings come out of that. 350 00:19:42,026 --> 00:19:47,206 I'd say that really the big one was: in a traditional app stack, you have data 351 00:19:47,206 --> 00:19:49,676 at the bottom, you know, you have, say, Postgres and that's your source of truth. 352 00:19:49,976 --> 00:19:52,516 And then above that, you have some compute, maybe you're running, 353 00:19:52,841 --> 00:19:53,411 whatever, Express.js, 354 00:19:53,521 --> 00:19:56,471 or Rails, or Phoenix, or you know, one of these. 355 00:19:57,091 --> 00:20:02,501 And then on top of that, you put in an auth plugin, right, that uses all 356 00:20:02,501 --> 00:20:04,001 the facilities of everything below it. 357 00:20:04,648 --> 00:20:10,328 But that requires that you have a database that has all this information 358 00:20:10,328 --> 00:20:11,608 in it that lives at a location.
359 00:20:11,608 --> 00:20:14,878 We call this, internally at Ink & Switch, auth-as-place. 360 00:20:15,628 --> 00:20:15,838 Right? 361 00:20:15,858 --> 00:20:18,098 Because your auth goes to somewhere, right? 362 00:20:18,308 --> 00:20:22,808 And on every request, you present your ID, they go, okay, sure, you know, here's 363 00:20:22,808 --> 00:20:25,688 a temporary token, then you hand that to the application, the application 364 00:20:25,688 --> 00:20:28,298 checks with the auth, you know, server again, and you do this whole loop. 365 00:20:28,738 --> 00:20:32,598 And that has, you know, problems with latency, if you go offline, 366 00:20:32,598 --> 00:20:34,948 this doesn't work, and it doesn't scale very well, right? 367 00:20:34,948 --> 00:20:36,908 Like, even Google ran into problems with this and started 368 00:20:37,011 --> 00:20:38,318 adjusting their auth system. 369 00:20:38,541 --> 00:20:42,716 We found at Fission, and I think this very much holds true, like 370 00:20:42,716 --> 00:20:47,656 we just kept learning this over and over again, is you can't rely on that system. 371 00:20:47,656 --> 00:20:49,806 In fact, auth has to go at the bottom of the stack. 372 00:20:50,113 --> 00:20:55,103 Your auth logic, and the thing that actually does the guarding of your 373 00:20:55,103 --> 00:20:57,683 data, has to move with the data itself. 374 00:20:57,793 --> 00:20:59,463 So we call this "auth as data". 375 00:20:59,780 --> 00:21:04,330 So for read control, it's no longer, oh, I'm making a request to a web server and 376 00:21:04,330 --> 00:21:05,640 they may or may not send something to me. 377 00:21:05,640 --> 00:21:06,710 It's, I've encrypted it. 378 00:21:06,910 --> 00:21:07,910 Do you have the key? 379 00:21:08,040 --> 00:21:08,670 Yes or no. 380 00:21:09,400 --> 00:21:09,700 If you 381 00:21:10,000 --> 00:21:10,530 have the key, 382 00:21:10,530 --> 00:21:11,020 you can read it.
383 00:21:11,020 --> 00:21:11,910 If you don't, you can't. 384 00:21:11,950 --> 00:21:13,620 And it doesn't matter where you are. 385 00:21:14,140 --> 00:21:16,160 You could be on a plane disconnected from the internet. 386 00:21:16,730 --> 00:21:18,876 You can decrypt the data, right? 387 00:21:19,056 --> 00:21:23,730 So we developed these ideas with, with UCAN and, the web native file system, 388 00:21:23,751 --> 00:21:27,370 in particular, Fission unfortunately didn't make it, earlier this year, 389 00:21:27,490 --> 00:21:30,760 or I, I'm not sure when this will be released in, early in 2024. 390 00:21:31,360 --> 00:21:32,867 and, Ink & Switch reached out. 391 00:21:32,867 --> 00:21:34,947 So we, we've, we've known those folks for a while, cause we've been, you 392 00:21:34,947 --> 00:21:39,177 know, obviously working in the same space for a while and, PVH, the lab 393 00:21:39,177 --> 00:21:42,847 director was actually an advisor at Fission and said, Hey, we have a bunch 394 00:21:42,847 --> 00:21:48,160 of people that are interested in getting, auth for Automerge in particular. 395 00:21:48,640 --> 00:21:53,110 could you apply UCAN and WNFS to Automerge? 396 00:21:53,470 --> 00:21:55,720 And I said, I don't see why not. 397 00:21:55,790 --> 00:21:56,100 Right. 398 00:21:56,158 --> 00:21:58,660 and so we, we looked at it, a little bit deeper and went, well, 399 00:21:59,050 --> 00:22:02,480 yes, like we, we could use these things directly, but they're tuned 400 00:22:02,490 --> 00:22:03,880 for slightly different use cases. 401 00:22:04,660 --> 00:22:06,180 UCAN is extremely powerful. 402 00:22:06,250 --> 00:22:07,420 It's very flexible. 403 00:22:07,520 --> 00:22:10,270 and it has a bunch of stuff in it for this, you know, network 404 00:22:10,440 --> 00:22:12,780 layer, in addition to CRDTs. 405 00:22:13,820 --> 00:22:15,780 You pay for that in space, right? 406 00:22:15,780 --> 00:22:17,570 The certificates get a little bit bigger. 
407 00:22:17,660 --> 00:22:20,700 And so we said, well, okay, maybe, you know, we want these 408 00:22:20,866 --> 00:22:22,750 documents to be as small as possible. 409 00:22:23,180 --> 00:22:26,500 You know, there's been a lot of work in Automerge to do compression, right? 410 00:22:26,520 --> 00:22:28,750 Really, really, really good compression on them. 411 00:22:28,750 --> 00:22:32,317 So the documents are tiny and, you know, you're not going to get that with UCAN. 412 00:22:32,327 --> 00:22:35,779 So could we take the principles and the learnings from UCAN and 413 00:22:35,779 --> 00:22:37,860 WNFS and apply them to Automerge? 414 00:22:37,860 --> 00:22:40,750 And so ultimately that's what we've done. 415 00:22:41,500 --> 00:22:43,743 And there are a couple of different requirements that 416 00:22:43,743 --> 00:22:44,573 have come out of it as well. 417 00:22:44,603 --> 00:22:46,443 So it's tuned for a slightly different thing. 418 00:22:46,793 --> 00:22:50,753 But essentially, Beehive says, what if we had end-to-end encryption? 419 00:22:50,783 --> 00:22:54,673 So in the same way that, you know, say, Signal end-to-end encrypts your chats, 420 00:22:55,233 --> 00:22:57,483 what if I had end-to-end encrypted documents 421 00:22:58,218 --> 00:23:01,998 that only certain people could write into, and I can control who can write into them? 422 00:23:03,618 --> 00:23:09,328 Has there been any prior art in regards to CRDTs to fulfill those sort of 423 00:23:09,338 --> 00:23:14,378 like end user driven authentication and authorization requirements? 424 00:23:14,947 --> 00:23:18,587 There's some nearer-term stuff that was also exploring things with CRDTs.
425 00:23:19,127 --> 00:23:22,810 But, you know, if you go really, really, you know, further back, 426 00:23:22,837 --> 00:23:27,564 there's the Tahoe Least-Authority File System, for example, 427 00:23:27,684 --> 00:23:30,777 which was, you know, this encrypted-at-rest file system, capabilities 428 00:23:30,777 --> 00:23:31,987 model, you know, whole thing. 429 00:23:32,330 --> 00:23:38,350 Mark Miller was doing capabilities-based auth going back into, you know, the 430 00:23:38,350 --> 00:23:41,490 late 90s. There's capability stuff that goes even further back, but he, 431 00:23:41,530 --> 00:23:44,554 you know, really did the work that everybody points at in this space. 432 00:23:44,824 --> 00:23:48,964 But for CRDTs and for a local-first context where we don't assume at 433 00:23:48,964 --> 00:23:53,917 all, like there's no server in the middle whatsoever, we may have been 434 00:23:54,137 --> 00:23:55,877 the first to do this at Fission. 435 00:23:55,877 --> 00:23:56,727 It's possible. 436 00:23:56,727 --> 00:23:59,597 I mean, when we got started, the local-first essay hadn't 437 00:23:59,597 --> 00:24:00,607 even been published, right? 438 00:24:00,607 --> 00:24:02,267 We were doing local-first without the term. 439 00:24:02,455 --> 00:24:04,125 But there were a bunch of others in the space. 440 00:24:04,135 --> 00:24:09,252 So, Serenity Notes has done related work, Matrix, Signal, obviously, has done 441 00:24:09,312 --> 00:24:13,549 a bunch of the end-to-end encryption stuff, and local-first auth is 442 00:24:13,659 --> 00:24:17,305 a project that has also worked with Automerge to do similar things. 443 00:24:17,505 --> 00:24:20,325 So most of these projects showed up after the fact.
444 00:24:20,439 --> 00:24:23,649 but yeah, so we're drawing from, in fact, we've talked to, all these 445 00:24:23,649 --> 00:24:26,419 people and all of the fantastic work that they've done over the past few 446 00:24:26,419 --> 00:24:31,032 years, and, collected the learnings, from them into, into Beehive. 447 00:24:31,315 --> 00:24:32,025 That's awesome. 448 00:24:32,125 --> 00:24:35,585 I would love to get a better feeling for what it would mean 449 00:24:35,615 --> 00:24:37,765 to build an app with Beehive. 450 00:24:38,005 --> 00:24:42,535 My understanding is that Beehive right now is very centric around Automerge. 451 00:24:42,735 --> 00:24:47,985 However, it is designed in a way that over time, other CRDT systems, 452 00:24:48,015 --> 00:24:52,095 other sync engines, et cetera could actually embrace it and integrate 453 00:24:52,095 --> 00:24:54,229 it into their specific system. 454 00:24:54,269 --> 00:24:58,309 I would like to get into that in a moment as well, but zooming into 455 00:24:58,309 --> 00:25:02,349 the Automerge use case right now, let's say I have already built a 456 00:25:02,349 --> 00:25:04,249 little side project with Automerge. 457 00:25:04,529 --> 00:25:09,449 I have like some Automerge documents that are happily syncing the 458 00:25:09,449 --> 00:25:11,139 data between my different apps. 459 00:25:11,482 --> 00:25:13,192 so far I've maybe. 460 00:25:13,447 --> 00:25:18,960 Put the entire thing, maybe I don't even, have any auth fences around it at all. 461 00:25:19,117 --> 00:25:22,237 hopefully no one knows the end point where all of my data lives. 462 00:25:22,487 --> 00:25:24,257 And if so, okay. 463 00:25:24,257 --> 00:25:25,907 It's like not very sensitive data. 
464 00:25:26,134 --> 00:25:29,994 Or maybe I'm running all of that behind like a Tailscale network or something 465 00:25:30,004 --> 00:25:34,601 like that, which I think in a lot of use cases, simpler use cases, this can also 466 00:25:34,601 --> 00:25:37,100 be a very pragmatic approach, by the way. 467 00:25:37,400 --> 00:25:44,067 When you can run the entire thing already, like in a fully secured frame of like 468 00:25:44,067 --> 00:25:47,810 a guarded network, and you're just going to run this for yourself 469 00:25:47,830 --> 00:25:51,350 or like in your home network or for your family and you're all on like the 470 00:25:51,350 --> 00:25:53,847 same Tailscale WireGuard network, 471 00:25:54,012 --> 00:25:56,132 I think that's also a very pragmatic approach. 472 00:25:56,439 --> 00:26:01,572 But let's say I want to build an app that I can share more publicly on the 473 00:26:01,572 --> 00:26:06,825 internet, where maybe I want to build a tldraw-like thing where I can send over 474 00:26:06,825 --> 00:26:11,915 a link where people can read it, but they need to have special permissions to 475 00:26:11,915 --> 00:26:13,715 actually also write something into it. 476 00:26:14,022 --> 00:26:16,042 I want to build the thing with Automerge. 477 00:26:16,312 --> 00:26:18,222 What does my experience look like? 478 00:26:18,619 --> 00:26:18,979 Yeah. 479 00:26:19,441 --> 00:26:22,592 There are, I would say, two parts to that question, right? 480 00:26:22,622 --> 00:26:24,832 One is, I have an existing document. 481 00:26:25,165 --> 00:26:26,645 How do I migrate it in? 482 00:26:27,105 --> 00:26:30,689 And, you know, could I use it with something, you know, you alluded to 483 00:26:30,709 --> 00:26:33,269 other systems in the future. 484 00:26:33,372 --> 00:26:36,556 And what does the actual experience of building something 485 00:26:36,582 --> 00:26:37,702 with Beehive look like?
486 00:26:38,482 --> 00:26:40,202 So Beehive is still in progress. 487 00:26:40,252 --> 00:26:44,465 We're planning to have a first release of it in Q1. 488 00:26:44,952 --> 00:26:47,836 And, you know, we're currently going at this with the viewpoint 489 00:26:47,836 --> 00:26:50,736 that like adding any auth is better than not having auth right now. 490 00:26:50,736 --> 00:26:54,016 So like there's definitely like further work where we want to like really 491 00:26:54,116 --> 00:26:57,846 polish off the edges of this thing, but getting anything into people's hands is 492 00:26:57,846 --> 00:26:59,806 better than not having it, right? 493 00:27:00,232 --> 00:27:04,392 And there are some changes that we need to make to Automerge because, 494 00:27:04,432 --> 00:27:08,292 as I mentioned before, you know, auth lives at the bottom of the stack, so 495 00:27:08,402 --> 00:27:11,522 anything above in a stack needs to know something about the things below. 496 00:27:12,077 --> 00:27:15,167 Auth being at the bottom means that if you want to do, in particular, mutation 497 00:27:15,167 --> 00:27:18,587 control, Automerge needs to know about how to ingest that mutation. 498 00:27:18,647 --> 00:27:21,857 So we do need to make some small changes to Automerge to make this work. 499 00:27:22,167 --> 00:27:26,801 But the actual experience is, we're bundling it directly into Automerge, 500 00:27:26,821 --> 00:27:30,371 or the current plan, at least, is we're bundling it directly into the Automerge 501 00:27:30,371 --> 00:27:36,894 Wasm, and then exposing a handful of functions on that, which is add 502 00:27:36,914 --> 00:27:39,894 member at a certain authority level. 503 00:27:40,464 --> 00:27:41,094 Remove member. 504 00:27:41,554 --> 00:27:42,134 And that's it. 505 00:27:42,461 --> 00:27:45,941 So your experience will be, we're going to do all the key management for you, 506 00:27:46,361 --> 00:27:48,261 behind the scenes, under the hood.
507 00:27:48,631 --> 00:27:52,417 If you have an existing document, it'll get serialized and encrypted 508 00:27:53,217 --> 00:27:55,497 and put, you know, into storage. 509 00:27:56,117 --> 00:27:58,507 And you can add other people to the document 510 00:27:58,999 --> 00:28:03,059 by inviting them using add member, or remove them from that document using remove member. 511 00:28:03,476 --> 00:28:08,012 Maybe also worth noting, this gives you a couple of extra concepts to work with. 512 00:28:08,787 --> 00:28:11,601 So today we have documents, and you can have a whole bunch of them, and 513 00:28:11,601 --> 00:28:14,721 they're really independent pieces, right? 514 00:28:14,721 --> 00:28:17,511 And maybe they can refer to each other by, you know, an Automerge URL. 515 00:28:17,987 --> 00:28:22,687 Instead, or in addition, I should say, not instead, you want to be able 516 00:28:22,687 --> 00:28:24,217 to say, I'm building a file system. 517 00:28:24,247 --> 00:28:27,357 If I give you access to the root of the file system, you should have access to 518 00:28:27,662 --> 00:28:28,592 the entire file system. 519 00:28:28,592 --> 00:28:31,562 I don't want to have to share with you every individual thing. 520 00:28:32,482 --> 00:28:34,162 So we have this concept of a group. 521 00:28:34,466 --> 00:28:37,876 So you have your individual device, you have groups, and you have documents. 522 00:28:39,176 --> 00:28:42,956 Each individual device has its own, under the hood, you don't have to worry about 523 00:28:43,016 --> 00:28:45,346 this specific detail, but has its own key. 524 00:28:45,616 --> 00:28:48,211 So it's uniquely identifiable. 525 00:28:48,451 --> 00:28:52,471 Somebody steals your phone, you can kick your phone out of the group, right? 526 00:28:52,471 --> 00:28:54,131 Or out of the document, and that's fine. 527 00:28:54,624 --> 00:28:55,454 Then we have groups. 528 00:28:55,524 --> 00:28:58,874 So let's say that I have a group for everyone at Ink & Switch.
529 00:28:59,247 --> 00:29:01,987 And then I can add everybody to that, but it doesn't have 530 00:29:01,987 --> 00:29:03,227 a document associated with it. 531 00:29:03,227 --> 00:29:07,187 It's purely just a way of managing people and saying, I want to add 532 00:29:07,727 --> 00:29:10,507 everybody in this group to this document. 533 00:29:10,922 --> 00:29:11,162 Right? 534 00:29:11,172 --> 00:29:15,442 And so you can have groups contain users and other groups. 535 00:29:15,782 --> 00:29:18,002 Then you have documents, which are groups that have some 536 00:29:18,002 --> 00:29:18,792 content associated with them. 537 00:29:19,482 --> 00:29:21,952 So I say on this document, here's who's allowed to see it. 538 00:29:21,972 --> 00:29:25,132 So it could be individuals or other groups or other documents. 539 00:29:25,832 --> 00:29:28,482 Other documents is interesting because I can say then you have 540 00:29:28,482 --> 00:29:31,052 access to this document, this document represents a directory. 541 00:29:31,082 --> 00:29:33,912 And so you also have access to all of its children, right? 542 00:29:33,912 --> 00:29:36,522 In a file system, you can do things like this. 543 00:29:36,856 --> 00:29:40,272 So add member, remove member becomes very, very powerful because now you can 544 00:29:40,272 --> 00:29:45,972 have groups and, you know, set up these hierarchies of, here's all of my devices. 545 00:29:46,042 --> 00:29:49,092 All of my devices sit in a group of Brooke's devices. 546 00:29:49,542 --> 00:29:53,142 All of Brooke's devices should be added to Ink & Switch, and Ink 547 00:29:53,252 --> 00:29:54,802 & Switch has the following documents. 548 00:29:54,822 --> 00:29:57,332 And then, you know, whenever my contract finishes and I get 549 00:29:57,332 --> 00:30:00,896 kicked out of Ink & Switch, then they can kick all of my devices out 550 00:30:00,896 --> 00:30:03,736 by revoking that group, right?
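The device, group, and document hierarchy described here can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the `Registry` class, its method names, and the agent names are all hypothetical and are not Beehive's API, and the sketch leaves out the keys, signatures, and encryption that the real system relies on.

```python
# Hypothetical model of nested membership; not Beehive's actual API.
class Registry:
    def __init__(self):
        self.kind = {}     # name -> "device" | "group" | "document"
        self.members = {}  # name -> set of direct member names

    def create(self, name, kind):
        self.kind[name] = kind
        self.members[name] = set()

    def add_member(self, container, member):
        self.members[container].add(member)

    def remove_member(self, container, member):
        self.members[container].discard(member)

    def devices_with_access(self, name):
        """Walk nested membership and collect every device that is reachable."""
        seen, stack, devices = set(), [name], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if self.kind[current] == "device":
                devices.add(current)
            stack.extend(self.members[current])
        return devices


reg = Registry()
reg.create("laptop", "device")
reg.create("phone", "device")
reg.create("brooke-devices", "group")
reg.create("ink-and-switch", "group")
reg.create("design-doc", "document")

reg.add_member("brooke-devices", "laptop")
reg.add_member("brooke-devices", "phone")
reg.add_member("ink-and-switch", "brooke-devices")
reg.add_member("design-doc", "ink-and-switch")

before = reg.devices_with_access("design-doc")

# Contract finishes: one removal cuts off every device in the nested group.
reg.remove_member("ink-and-switch", "brooke-devices")
after = reg.devices_with_access("design-doc")
```

The point of the nesting is that a single `remove_member` call on the group is enough; nobody has to enumerate individual devices.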
551 00:30:04,256 --> 00:30:07,431 So using Beehive is going to feel like that. 552 00:30:07,461 --> 00:30:11,211 It's going to say, yeah, I know about the ID for Brooke's devices. 553 00:30:11,411 --> 00:30:14,941 Please add her or, you know, contract finishes, please remove her. 554 00:30:15,414 --> 00:30:19,314 All of the rest of the stuff should be completely invisible to you. 555 00:30:19,684 --> 00:30:22,784 So when you persist things to disk or you send them to a sync server, 556 00:30:23,224 --> 00:30:24,534 that all gets encrypted first. 557 00:30:24,919 --> 00:30:28,459 And even the sync servers have permission. 558 00:30:28,699 --> 00:30:33,159 There's a permission level in here of, you're allowed to ask for 559 00:30:33,399 --> 00:30:35,492 the bytes from another node. 560 00:30:35,792 --> 00:30:39,932 And they can prove that because you have these certificates under the hood, right? 561 00:30:40,319 --> 00:30:44,162 Because, and this is an uncomfortable truth, all cryptography is breakable. 562 00:30:44,472 --> 00:30:47,832 So in 10 years, maybe they break all of our current ciphers. 563 00:30:48,637 --> 00:30:48,887 Right? 564 00:30:48,957 --> 00:30:49,697 It could happen. 565 00:30:49,697 --> 00:30:52,447 In fact, older ciphers are already, you know, broken. 566 00:30:52,737 --> 00:30:56,077 Or maybe quantum computing gets very, very advanced, and it becomes 567 00:30:56,077 --> 00:30:58,137 practical to break keys, right? 568 00:30:58,417 --> 00:30:58,977 Whatever it is. 569 00:30:58,977 --> 00:31:03,241 Or there's an advancement in the discrete log problem, or whatever the thing is, right? 570 00:31:03,251 --> 00:31:05,551 You know, we have some mathematical advance, and it gets broken. 571 00:31:05,981 --> 00:31:09,741 The best thing to do, then, is to just not make those bytes available. 572 00:31:10,341 --> 00:31:13,664 Make the encrypted content only pullable by people that you trust.
573 00:31:13,957 --> 00:31:17,127 And yes, somebody could break into the sync server, let's 574 00:31:17,127 --> 00:31:18,207 say, and download everything. 575 00:31:18,477 --> 00:31:20,937 But that's a much higher bar than anybody can download. 576 00:31:21,227 --> 00:31:23,497 Anybody on the internet can download whatever chunk they want, right? 577 00:31:23,789 --> 00:31:26,999 But all of that is handled really for the developer to say, this is 578 00:31:26,999 --> 00:31:30,099 the sync server, sync server has the ability to pull down these documents. 579 00:31:30,659 --> 00:31:34,239 Or even the user could say, I want to sync to this sync server, I'm going 580 00:31:34,239 --> 00:31:37,309 to grant that sync server access to my documents to replicate them. 581 00:31:37,666 --> 00:31:41,136 But really, we're trying to keep the top level API for this 582 00:31:41,266 --> 00:31:43,316 as boring as possible, right? 583 00:31:43,516 --> 00:31:45,286 That is a top line goal. 584 00:31:45,886 --> 00:31:48,836 Add member, remove member, and the sync server is just 585 00:31:48,836 --> 00:31:50,956 another member in the system. 586 00:31:51,464 --> 00:31:51,844 Got it. 587 00:31:52,244 --> 00:31:58,584 So in terms of the auth as data, that, that mental model, that's very intuitive. 588 00:31:58,614 --> 00:32:02,736 And, as you're like rewiring your brain as an application developer, like how 589 00:32:02,746 --> 00:32:07,066 data flows through the system, now to understand that, like everything that's 590 00:32:07,566 --> 00:32:12,996 necessary to make those auth decisions, should someone have access to, to read 591 00:32:12,996 --> 00:32:17,896 this, to like write this, et cetera, that this is just data that's also being 592 00:32:17,896 --> 00:32:20,293 synchronized, across the different nodes. 593 00:32:20,473 --> 00:32:21,833 That is very intuitive. 
594 00:32:22,063 --> 00:32:27,013 is this something that in this particular case, at least with Beehive and Automerge, 595 00:32:27,023 --> 00:32:29,233 is this purely an implementation detail? 596 00:32:29,568 --> 00:32:34,238 And this is like your internal mental model of this data, or is this actually 597 00:32:34,258 --> 00:32:38,628 data that is available somehow to the application developer that the application 598 00:32:38,628 --> 00:32:43,238 developer would work with that as they work with the normal Automerge documents? 599 00:32:43,735 --> 00:32:44,165 Yeah. 600 00:32:44,245 --> 00:32:48,360 So, Again, we're trying to hide these details as much as possible. 601 00:32:48,400 --> 00:32:52,583 So, you'll hear me talking about things like add member or groups, right? 602 00:32:52,603 --> 00:32:56,023 And that sounds very access control list like. 603 00:32:56,260 --> 00:33:00,646 capabilities are, like there's a formal proof of this, are more powerful. 604 00:33:00,656 --> 00:33:03,086 Like they can express more things than access control lists. 605 00:33:03,836 --> 00:33:06,556 So at least for this first revision, we've restricted ourselves down 606 00:33:06,656 --> 00:33:11,491 to making things look like access control lists, on the outside. 607 00:33:11,768 --> 00:33:15,548 and so it should feel very, very similar to doing things with role 608 00:33:15,548 --> 00:33:18,588 based access control using, say, OAuth. 609 00:33:20,328 --> 00:33:22,915 That should all feel totally normal. 610 00:33:23,115 --> 00:33:25,795 You shouldn't really have to think about it in any special way. 611 00:33:25,795 --> 00:33:28,495 In the same way that, you know, if you have a sync server, other than having 612 00:33:28,495 --> 00:33:32,525 to set up the sync server, or maybe you pointed at an existing one, knowing that 613 00:33:32,525 --> 00:33:35,845 it's there doesn't mean that you have to, like, design it from first principles. 
614 00:33:35,895 --> 00:33:37,835 Or, you know, same thing with Automerge. 615 00:33:38,445 --> 00:33:41,345 Technically, you have access to all of the events. 616 00:33:41,600 --> 00:33:44,390 But really you're going to materialize a view and treat it like it's JSON. 617 00:33:45,120 --> 00:33:50,133 And so we're saying the same thing here with Beehive is you will automatically 618 00:33:50,450 --> 00:33:54,960 get only the data that you can decrypt and that you're allowed to receive from 619 00:33:54,960 --> 00:34:00,350 others and So, essentially, Beehive takes things off the wire, decrypts 620 00:34:00,350 --> 00:34:03,300 it, and hands it to Automerge, and then Automerge does its normal Automerge stuff. 621 00:34:03,850 --> 00:34:07,620 The one wrinkle is if an old write has been revoked, so it turns out 622 00:34:07,620 --> 00:34:10,590 that somebody was, like, defacing the document and doing all this horrible 623 00:34:10,590 --> 00:34:13,243 stuff, and we had to kick them out, we have to send it to Automerge, 624 00:34:13,253 --> 00:34:15,373 Hey, ignore this run of changes. 625 00:34:15,923 --> 00:34:17,123 And then it has to recalculate. 626 00:34:17,363 --> 00:34:19,513 So that's the one change that we have to make inside of Automerge. 627 00:34:19,716 --> 00:34:22,186 but really you will use Automerge as normal. 628 00:34:22,396 --> 00:34:25,586 you will have an extra API that is add this person to this document or 629 00:34:25,586 --> 00:34:28,396 to this group, and remove them, right? 630 00:34:28,436 --> 00:34:29,226 As needed. 631 00:34:29,696 --> 00:34:31,816 And you shouldn't have to think about any of these other 632 00:34:31,816 --> 00:34:33,716 parts, even the sync server. 633 00:34:33,736 --> 00:34:37,663 Like, Alex Good, who's the, the main maintainer of, of Automerge. 634 00:34:37,946 --> 00:34:41,426 has been working on, on sync and improving sync. 
635 00:34:41,448 --> 00:34:44,036 and that project started around the same time as Beehive and we realized, 636 00:34:44,096 --> 00:34:47,636 Oh, there's actually this challenge because we're, you know, on the 637 00:34:47,636 --> 00:34:50,926 security side, trying to hide as much information from the network as possible, 638 00:34:50,926 --> 00:34:52,306 including from the sync server, right? 639 00:34:52,346 --> 00:34:53,916 Sync server shouldn't be able to read your documents. 640 00:34:54,316 --> 00:34:56,886 To do efficient sync, you want to have like a lot of information about the 641 00:34:56,886 --> 00:34:59,716 structure of the thing that you're syncing so that you have no redundancy. 642 00:34:59,926 --> 00:35:00,126 Right? 643 00:35:00,126 --> 00:35:02,026 And you can do it in a few round trips, all of this stuff. 644 00:35:02,406 --> 00:35:06,346 So we ended up having to co design and essentially, like, negotiate 645 00:35:06,346 --> 00:35:09,406 between the two systems, like, how, how much information can we 646 00:35:09,406 --> 00:35:11,400 reveal, and still have it be secure? 647 00:35:11,710 --> 00:35:15,170 And given that you can't read inside the documents, like, how do we 648 00:35:15,170 --> 00:35:17,180 package things up in an efficient way? 649 00:35:17,460 --> 00:35:21,988 But again, none of that information should be a concern for a developer 650 00:35:22,038 --> 00:35:24,558 in the same way that the sync system right now, you don't really interact 651 00:35:24,558 --> 00:35:26,718 with the sync system, other than you say, that's my sync server over 652 00:35:26,718 --> 00:35:28,038 there and the bytes go over there. 653 00:35:28,538 --> 00:35:30,858 There's an extra layer now of, it gets encrypted first 654 00:35:31,008 --> 00:35:32,178 before it goes over the wire. 655 00:35:32,545 --> 00:35:33,245 That makes sense. 
656 00:35:33,295 --> 00:35:36,375 I think as an application developer, there's typically sort 657 00:35:36,375 --> 00:35:39,045 of this two pronged approach. 658 00:35:39,045 --> 00:35:43,625 There is like, You, on the one hand, you ideally, you want to embrace 659 00:35:43,655 --> 00:35:45,135 that things are hidden from you. 660 00:35:45,385 --> 00:35:48,755 That you don't need to understand them to use it correctly, et cetera. 661 00:35:49,105 --> 00:35:52,355 But particularly if something's new, some, maybe you're like an 662 00:35:52,365 --> 00:35:54,035 early adopter of the technology. 663 00:35:54,378 --> 00:35:57,648 you would like to figure out like, what are the worst case scenarios? 664 00:35:57,668 --> 00:35:59,488 Maybe the thing is no longer being developed. 665 00:35:59,498 --> 00:36:03,966 Could I take it over and like, can I become a contributor or maintainer 666 00:36:03,976 --> 00:36:08,926 of, of that, or you'd still like to understand it for the sake of like 667 00:36:08,926 --> 00:36:11,326 figuring, really understanding, is this. 668 00:36:11,641 --> 00:36:12,921 The thing that I want. 669 00:36:13,158 --> 00:36:16,888 and just by like understanding how it works, you can come to the right 670 00:36:16,898 --> 00:36:19,928 conclusion, like, is this for me or not, particularly if it's not 671 00:36:19,928 --> 00:36:21,858 yet as well documented, et cetera. 672 00:36:21,858 --> 00:36:27,321 So channeling our like inner understanding application developer. 673 00:36:27,628 --> 00:36:32,301 I'd like to understand a bit better of like how, Beehive and in that regard, 674 00:36:32,311 --> 00:36:35,111 also the sync server works under the hood. 675 00:36:35,111 --> 00:36:37,941 Like, it's hard enough to build a syncing system. 676 00:36:38,345 --> 00:36:41,825 and now, you build an authorization layer on top of it. 677 00:36:42,015 --> 00:36:46,345 What sort of implications does this have for the sync server? 
678 00:36:46,535 --> 00:36:50,415 And my understanding is that Alex Good is working on this and I think 679 00:36:50,415 --> 00:36:52,205 this has been semi public so far. 680 00:36:52,205 --> 00:36:56,150 And that there's like a, you know, like a sibling product or a sibling 681 00:36:56,150 --> 00:37:00,990 project, next to Beehive called Beelay, which I guess like relays 682 00:37:01,020 --> 00:37:03,480 messages in the Beehive system. 683 00:37:03,860 --> 00:37:09,340 And I think that's a step towards what eventually, we're all dreaming about as 684 00:37:09,340 --> 00:37:14,700 like a generic sync server that ideally is compatible with like as many things 685 00:37:14,700 --> 00:37:18,660 as possible, I guess, at the beginning for Automerge, but also beyond that. 686 00:37:19,450 --> 00:37:21,270 So what is Beelay? 687 00:37:21,280 --> 00:37:24,730 What are its design goals and how does it work? 688 00:37:25,425 --> 00:37:30,501 So Beelay, has a requirement that it has to work with, encrypted chunks. 689 00:37:30,853 --> 00:37:34,650 So, you know, we do this compression and then encryption, on top of it, 690 00:37:34,881 --> 00:37:36,190 and then send that to the Sync Server. 691 00:37:36,200 --> 00:37:39,700 The Sync Server can see, because it has to know who it can send these 692 00:37:39,700 --> 00:37:41,880 chunks around to, the membership. 693 00:37:41,940 --> 00:37:44,150 So Sync Server does have access to the membership. 694 00:37:44,640 --> 00:37:47,200 of each doc, but not the content of the document. 695 00:37:47,590 --> 00:37:50,610 so if you make a request, it checks, you know, okay, are you somebody 696 00:37:50,610 --> 00:37:53,750 that, has the, the rights to, to have this sent to you, yes or no, 697 00:37:53,830 --> 00:37:55,360 and then it'll send it to you or not. 
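The check described here, where the server knows each document's membership but only ever holds ciphertext, can be sketched as follows. This is a minimal illustration with hypothetical names (`Relay`, `store`, `pull`, the member IDs); it is not Beelay's interface, and it replaces the certificate-based proof of access with a plain set lookup.

```python
# Hypothetical relay: it stores opaque encrypted chunks and a membership list,
# and never sees plaintext. Not Beelay's actual API.
class Relay:
    def __init__(self):
        self.chunks = {}      # doc_id -> list of encrypted byte strings
        self.membership = {}  # doc_id -> set of agent ids allowed to pull

    def grant(self, doc_id, agent_id):
        self.membership.setdefault(doc_id, set()).add(agent_id)

    def store(self, doc_id, ciphertext):
        self.chunks.setdefault(doc_id, []).append(ciphertext)

    def pull(self, doc_id, requester):
        """Return the encrypted chunks only if the requester is a member."""
        if requester not in self.membership.get(doc_id, set()):
            return None  # not a member: do not hand out even the ciphertext
        return list(self.chunks.get(doc_id, []))


relay = Relay()
relay.grant("doc-1", "brooke-laptop")
relay.store("doc-1", b"\x8f\x02opaque-ciphertext")

allowed = relay.pull("doc-1", "brooke-laptop")
denied = relay.pull("doc-1", "stranger")
```

Refusing to serve ciphertext to non-members is the second line of defense mentioned earlier in the conversation: even if a cipher is broken later, the bytes were never widely available.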
698 00:37:55,390 --> 00:37:58,800 And this isn't only for sync servers, you know, if you connect to somebody, 699 00:37:58,810 --> 00:38:01,740 you know, directly over Bluetooth, you know, you'd do the same thing, right? 700 00:38:01,760 --> 00:38:03,580 Even if, you know, you can both see the document. 701 00:38:04,080 --> 00:38:05,520 There's nothing special here about sync servers. 702 00:38:06,116 --> 00:38:10,936 To do this sync, well, we're no longer syncing individual ops, right? 703 00:38:10,936 --> 00:38:13,176 Like, we could do that, but then we lose the compression. 704 00:38:13,906 --> 00:38:15,136 It's not great, right? 705 00:38:15,656 --> 00:38:19,456 And ideally, we don't want people to know, you know, if somebody were to 706 00:38:19,456 --> 00:38:22,946 break into your server, hey, here's how everything's related to each other, right? 707 00:38:22,946 --> 00:38:25,476 Like, that compression and encryption, you know, also hides 708 00:38:25,476 --> 00:38:26,786 a little bit more of this data. 709 00:38:27,186 --> 00:38:30,596 We do show the links between these, you know, compressed chunks, but 710 00:38:30,656 --> 00:38:31,786 we'll get to that in a second. 711 00:38:32,156 --> 00:38:37,976 Essentially what we want to do is chunk up the documents in such a way where 712 00:38:38,076 --> 00:38:43,696 there's the fewest number of chunks to get synced, and the longer the ranges that 713 00:38:43,696 --> 00:38:48,575 we have of, you know, Automerge ops that get compressed before we encrypt them, right? 714 00:38:48,575 --> 00:38:50,165 On the, I'll call it client. 715 00:38:50,165 --> 00:38:52,175 It's not really a client in a local-first setting, right? 716 00:38:52,175 --> 00:38:54,995 But, like, on the not-sync-server side, when you're sending to it. 717 00:38:55,402 --> 00:38:57,832 The more stuff that you have, the better the compression is.
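A rough way to see why longer ranges compress better is to compress the same list of operations at different chunk sizes. The sketch below is an assumption-laden stand-in: it uses Python's general-purpose `zlib` rather than Automerge's own compression, and the operations are made-up JSON records, so only the direction of the effect carries over, not the numbers.

```python
import json
import zlib

# Made-up operations; real Automerge changes are encoded differently.
ops = [
    json.dumps({"op": "set", "key": "title", "value": f"draft {i}"}).encode()
    for i in range(200)
]


def compressed_size(chunk_len):
    """Total bytes when ops are grouped into chunks of chunk_len and each
    chunk is compressed on its own (as it would be before encryption)."""
    total = 0
    for start in range(0, len(ops), chunk_len):
        chunk = b"\n".join(ops[start:start + chunk_len])
        total += len(zlib.compress(chunk))
    return total


whole_history = compressed_size(len(ops))  # one big chunk
medium_chunks = compressed_size(20)        # ten chunks of twenty ops
single_ops = compressed_size(1)            # every op compressed alone
```

Because each chunk is encrypted after compression, the compressor never sees across chunk boundaries, which is why chunk size matters so much here.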
718 00:38:58,282 --> 00:39:02,132 And chunking up the document here means basically, you're really 719 00:39:02,132 --> 00:39:07,457 chunking up the history of operations that then get internally rolled up 720 00:39:07,497 --> 00:39:09,917 into one snapshot of the document. 721 00:39:09,927 --> 00:39:11,367 And that could be very long. 722 00:39:11,737 --> 00:39:14,023 And there's room for optimization. 723 00:39:14,253 --> 00:39:19,113 That is like the compression here, where if you said a ton of times, like, 724 00:39:19,113 --> 00:39:22,313 Hey, the name of the document is Peter. 725 00:39:22,313 --> 00:39:24,843 And later you say like, no, it's Brooke. 726 00:39:24,883 --> 00:39:26,753 And later you say, no, it's Peter. 727 00:39:26,773 --> 00:39:27,713 No, it's Johannes. 728 00:39:28,003 --> 00:39:32,293 Then you can like compress it into, for example, just the latest operation. 729 00:39:33,118 --> 00:39:33,888 Yeah, exactly. 730 00:39:34,538 --> 00:39:37,868 So, you know, if you want to think about how this, you know, to get more 731 00:39:37,868 --> 00:39:40,798 concrete, you know, if you take this slider all the way to one end and you take 732 00:39:40,798 --> 00:39:45,408 the entire history and run-length encode it, you know, do this Automerge compression, 733 00:39:45,888 --> 00:39:47,818 you get very, very good compression. 734 00:39:47,898 --> 00:39:50,758 If we take it to the far other end, we go really granular. 735 00:39:50,958 --> 00:39:55,423 Every op doesn't get compressed, you know, so it's just like each individual 736 00:39:55,423 --> 00:39:56,563 op, so you don't get compression. 737 00:39:56,587 --> 00:39:59,863 So there's something in between here of like, how can we chop up 738 00:39:59,863 --> 00:40:04,213 the history in a way where I get a nice balance between these two? 739 00:40:04,763 --> 00:40:10,183 When Automerge receives new ops, it has to know where in the history to place them.
740 00:40:10,193 --> 00:40:12,333 So you have this partial order, you know, you have this, you 741 00:40:12,333 --> 00:40:14,273 know, typical CRDT lattice. 742 00:40:14,933 --> 00:40:18,840 And then we put that, or it puts it, into a strict order. 743 00:40:18,840 --> 00:40:21,140 It orders all the events and then plays over them like a log. 744 00:40:21,330 --> 00:40:24,390 And this new event that you get, maybe it becomes the first event. 745 00:40:24,420 --> 00:40:26,320 Like you could go way to the beginning of history, right? 746 00:40:26,330 --> 00:40:28,910 Like you don't know, because everything's eventually consistent. 747 00:40:29,453 --> 00:40:34,303 So if you do that linearization first and then chop up the documents, 748 00:40:34,563 --> 00:40:35,913 you have this problem where, 749 00:40:36,348 --> 00:40:39,168 if I do this chunking, or you do this chunking, well, it really depends 750 00:40:39,168 --> 00:40:40,938 on what history we have, right? 751 00:40:41,638 --> 00:40:46,608 And so it makes it very, very difficult to have a small amount of redundancy. 752 00:40:46,950 --> 00:40:49,593 So we found two techniques helped us with this. 753 00:40:49,613 --> 00:40:55,270 One was, we take some particular operation as a head and we 754 00:40:55,270 --> 00:40:56,450 say, ignore everything else. 755 00:40:56,490 --> 00:40:58,860 Only give me the history for this operation. 756 00:40:58,880 --> 00:41:00,190 Only its strict ancestors. 757 00:41:00,430 --> 00:41:03,340 So even if there's something concurrent, forget about all of that stuff. 758 00:41:04,520 --> 00:41:08,700 So that gets us something stable relative to a certain head. 759 00:41:08,721 --> 00:41:13,500 And then to know where the chunk boundaries are, we 760 00:41:13,510 --> 00:41:15,260 run a hash hardness metric.
761 00:41:15,460 --> 00:41:20,137 So, the number of zeros at the end of the hash for each op gives 762 00:41:20,137 --> 00:41:23,117 you, you know, you can basically say, you know, each individual op, 763 00:41:23,897 --> 00:41:27,497 there may or may not be a 0, so I'm happy with anything. 764 00:41:28,007 --> 00:41:32,470 Or if I want it to be a range of, you know, 4, then give me two 0s at the 765 00:41:32,470 --> 00:41:35,680 end, because that will be, you know, 2 to the power of 2 is 4, so I'll chunk 766 00:41:35,680 --> 00:41:38,600 it up into 4s, and you make this as big or as small as you want, right? 767 00:41:38,600 --> 00:41:41,790 So now you have some way of probabilistically chunking up the 768 00:41:41,790 --> 00:41:43,797 documents relative to some head. 769 00:41:44,122 --> 00:41:46,972 And you can say how big you want that to be based on this hash hardness metric. 770 00:41:47,745 --> 00:41:51,005 The advantage of this is, even if we're doing things relative to 771 00:41:51,005 --> 00:41:54,955 different heads, now we're going to hit the same boundaries for these 772 00:41:54,955 --> 00:41:56,182 different hash hardness metrics. 773 00:41:56,702 --> 00:41:58,942 So now we're sharing how we're chunking up the document. 774 00:41:59,692 --> 00:42:04,217 And we assume that on average, not all the time, but like on 775 00:42:04,247 --> 00:42:07,980 average, older operations will have been seen by more people. 776 00:42:08,720 --> 00:42:10,680 So, or, you know, more and more peers. 777 00:42:11,620 --> 00:42:16,663 So, you're going to be appending things really to the end of the document, right? 778 00:42:17,143 --> 00:42:20,773 So you will less frequently have something concurrent with the 779 00:42:20,773 --> 00:42:22,613 first operation using this system. 780 00:42:22,950 --> 00:42:27,710 That means that we can get really good compression on older operations.
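The hash hardness idea above can be sketched roughly as follows. This is an illustration under assumptions, not the Beelay implementation: SHA-256 from the Python standard library stands in for the real hash function, zeros are counted as trailing binary digits, and the names `trailing_zero_bits`, `is_boundary`, and `boundaries` are made up for this example. With `zero_bits` set to 2, about one op in 2 to the power of 2 (that is, 4) is expected to be a boundary.

```python
import hashlib


def trailing_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the end of a digest, read as a big-endian integer."""
    value = int.from_bytes(digest, "big")
    if value == 0:
        return len(digest) * 8
    # value & -value isolates the lowest set bit.
    return (value & -value).bit_length() - 1


def is_boundary(op: bytes, zero_bits: int) -> bool:
    """An op is a chunk boundary if its hash ends in at least `zero_bits` zeros."""
    return trailing_zero_bits(hashlib.sha256(op).digest()) >= zero_bits


def boundaries(ops, zero_bits: int) -> set:
    """All ops in `ops` that qualify as chunk boundaries at this hardness."""
    return {op for op in ops if is_boundary(op, zero_bits)}


# Two peers with overlapping but different histories (made-up op payloads).
peer_a = [f"op-{i}".encode() for i in range(200)]
peer_b = peer_a[50:]
shared = set(peer_a) & set(peer_b)

# A boundary depends only on the op's own hash, so both peers pick the
# same boundaries among the ops they share.
agree = (boundaries(peer_a, 2) & shared) == (boundaries(peer_b, 2) & shared)
```

Because the decision is a pure function of each op's hash, peers never have to coordinate on where the chunks begin and end.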
781 00:42:28,045 --> 00:42:30,645 Let's take, I'm just picking numbers out of the air here, but let's take 782 00:42:30,645 --> 00:42:34,175 the first two thirds of the document, which are relatively stable, compress 783 00:42:34,225 --> 00:42:35,835 those, we get really good compression. 784 00:42:36,295 --> 00:42:37,985 And then encrypt it and send it to the server. 785 00:42:38,318 --> 00:42:42,038 And then for the next, you know, of the remaining third, let's take the 786 00:42:42,228 --> 00:42:46,008 first two thirds of that and compress them and send them to the server. 787 00:42:46,228 --> 00:42:48,398 And then at some point we get each individual op. 788 00:42:48,845 --> 00:42:52,115 This means that as the document grows and changes, 789 00:42:52,520 --> 00:42:56,930 we can take these smaller chunks, and as they get pushed further and further into 790 00:42:56,930 --> 00:43:02,240 history, whoever can actually read them can recompress those ranges. 791 00:43:02,890 --> 00:43:06,877 So Alex has this, I think, really fantastic name for this, which is 792 00:43:06,987 --> 00:43:11,727 Sedimentree, because it's almost acting in sedimentary layers, but it's sedimen-tree 793 00:43:12,057 --> 00:43:14,237 because you get a tree of these layers. 794 00:43:14,377 --> 00:43:15,277 Yeah, it's cute, right? 795 00:43:15,680 --> 00:43:18,370 And so if you want to do a sync, like let's say you're doing a sync 796 00:43:18,370 --> 00:43:21,000 of like completely fresh, you've never seen the document before. 797 00:43:21,295 --> 00:43:25,045 You will get the really big chunk, and then you'll move up a layer, 798 00:43:25,045 --> 00:43:27,695 and you'll get the next biggest chunk of history, and then you move 799 00:43:27,725 --> 00:43:29,845 up a layer, and then eventually get like the last couple of ops.
800 00:43:30,225 --> 00:43:32,835 So we can get you really good compression, but again, it's this 801 00:43:32,855 --> 00:43:34,655 balance of these two forces. 802 00:43:35,305 --> 00:43:38,195 Or, if you've already seen the first half of the document, you 803 00:43:38,195 --> 00:43:39,345 never have to sync that chunk again. 804 00:43:39,812 --> 00:43:44,005 You only need to get these higher layers of the Sedimentree sync. 805 00:43:44,835 --> 00:43:46,925 So that's how we chunk up the document. 806 00:43:46,995 --> 00:43:49,775 Additionally, and I'm not at all going to go into how this thing works, 807 00:43:49,775 --> 00:43:53,325 but if people are into sync systems, this is like a pretty cool paper. 808 00:43:53,325 --> 00:43:56,772 Practical Rateless Set Reconciliation is the name of the paper. 809 00:43:57,262 --> 00:44:02,592 And it does really interesting things with compressing all the information 810 00:44:02,592 --> 00:44:04,202 you need to know what the other side has. 811 00:44:04,962 --> 00:44:09,685 So in half a round trip, so in one direction on average, you can get all 812 00:44:09,685 --> 00:44:13,578 the information you need to know what the delta is between your two sets. 813 00:44:13,948 --> 00:44:18,298 Literally, what's the handful of ops that we've diverged by, without 814 00:44:18,308 --> 00:44:19,858 having to send all of the hashes? 815 00:44:20,388 --> 00:44:22,318 So if people are into that stuff, go check out that paper. 816 00:44:22,348 --> 00:44:23,048 It's pretty cool. 817 00:44:23,098 --> 00:44:25,158 But there's a lot of detail in there that we're not 818 00:44:25,158 --> 00:44:26,568 going to cover on this podcast. 819 00:44:26,962 --> 00:44:28,952 Thanks a lot for explaining.
820 00:44:29,212 --> 00:44:33,123 I suppose it's like just the tip of the iceberg of like how Beelay works, 821 00:44:33,353 --> 00:44:37,450 but I think it's important to get a feeling for like, this is a new world 822 00:44:37,460 --> 00:44:42,370 in a way where it's decentralized, it is encrypted, et cetera. 823 00:44:42,400 --> 00:44:47,560 There's like really hard constraints on what certain things can do, since you could 824 00:44:47,560 --> 00:44:52,650 say like in your traditional development mindset, you would just say like, yeah, 825 00:44:52,740 --> 00:44:56,953 let's treat the client like it's just like a Kindle, with like no 826 00:44:56,953 --> 00:45:01,263 CPU in it; let's have the server do as much of the heavy lifting as possible. 827 00:45:01,273 --> 00:45:04,613 I think that's like the muscle that we're used to so far. 828 00:45:04,923 --> 00:45:11,093 But in this case, the server, even if it has a super beefy machine, et cetera, it 829 00:45:11,213 --> 00:45:15,643 can't really do that because it doesn't have access to do all of this work. 830 00:45:15,643 --> 00:45:17,353 So the clients need to do it. 831 00:45:17,713 --> 00:45:21,483 And when the clients independently do so, they need to 832 00:45:21,553 --> 00:45:23,283 eventually end up in the same spot. 833 00:45:23,363 --> 00:45:27,553 Otherwise the entire system falls over or it gets very inefficient. 834 00:45:27,823 --> 00:45:30,903 So that sounds like a really elegant system that you're 835 00:45:30,903 --> 00:45:32,373 like working on in that regard. 836 00:45:32,903 --> 00:45:37,773 So with Beehive overall, like again, you're starting out here with 837 00:45:38,133 --> 00:45:43,473 Automerge as the driving system that drives the requirements, et cetera.
838 00:45:43,493 --> 00:45:48,163 But I think your bigger ambition here, your bigger goal, is that this 839 00:45:48,313 --> 00:45:54,473 actually becomes a system that at some point goes beyond just 840 00:45:54,513 --> 00:45:59,543 applying to Automerge, and that being a system that applies to many more other 841 00:45:59,543 --> 00:46:01,693 local-first technologies in the space. 842 00:46:01,993 --> 00:46:07,373 If there are application framework authors or like other people building a 843 00:46:07,373 --> 00:46:11,803 sync system, et cetera, and they'd be interested in seeing like, Hmm, instead 844 00:46:11,803 --> 00:46:17,233 of like us trying to come up with our own research here for like what it 845 00:46:17,233 --> 00:46:23,167 means to do authentication and authorization for our sync system, particularly if 846 00:46:23,167 --> 00:46:25,127 you're doing it in a decentralized way, 847 00:46:25,437 --> 00:46:30,987 what would be a good way for those frameworks, those technologies, to 848 00:46:30,987 --> 00:46:32,857 jump on the Beehive wagon? 849 00:46:33,313 --> 00:46:36,943 So if they're already using Automerge, I think that'll be 850 00:46:37,203 --> 00:46:38,323 pretty straightforward, right? 851 00:46:38,403 --> 00:46:40,653 You'll have bindings, it'll just work. 852 00:46:40,920 --> 00:46:45,570 But Beehive doesn't have a hard dependency on Automerge at all, 853 00:46:45,710 --> 00:46:50,140 because it lives at this layer below. And early on, we were like, well, should 854 00:46:50,140 --> 00:46:51,930 we just weld it directly into Automerge? 855 00:46:51,930 --> 00:46:54,640 Or like, you know, how much does it really need to know about it? 856 00:46:55,080 --> 00:46:58,240 And where we landed on this was, you just need to have some kind 857 00:46:58,240 --> 00:47:02,410 of way of saying, here's the partial order between these events, 858 00:47:02,797 --> 00:47:03,807 and then everything works.
859 00:47:04,697 --> 00:47:07,097 So, just as an intuition: 860 00:47:07,467 --> 00:47:11,960 You could put Git inside of Beehive, and it would work. I don't think 861 00:47:11,960 --> 00:47:14,960 GitHub's gonna adopt this anytime soon, but like, if you had your own 862 00:47:14,980 --> 00:47:18,060 Git syncing system, like, you could do this, and it would work. 863 00:47:18,270 --> 00:47:22,490 You just need to have some way of ordering events next to each other. 864 00:47:22,642 --> 00:47:27,352 And yes, then you have to get a little bit more into slightly lower level APIs. 865 00:47:27,362 --> 00:47:32,002 So when I build stuff, I tend to work in layers of like, here's the very 866 00:47:32,002 --> 00:47:35,742 low level primitives, and then here's a slightly higher level, and a slightly 867 00:47:35,742 --> 00:47:37,432 higher level, and a slightly higher level. 868 00:47:37,488 --> 00:47:40,288 So people using it from Automerge will just have add member, remove 869 00:47:40,288 --> 00:47:41,648 member, and like, everything works. 870 00:47:41,955 --> 00:47:46,335 To go down one layer, you have to wire into it, here's how to do ordering. 871 00:47:47,400 --> 00:47:47,910 And that's it. 872 00:47:48,460 --> 00:47:50,860 And then everything else should wire all the way through. 873 00:47:51,310 --> 00:47:53,693 And you have to be able to pass it serialized bytes. 874 00:47:53,763 --> 00:47:56,853 So, like, Beehive doesn't know anything about this compression that we were 875 00:47:56,883 --> 00:47:58,083 just talking about that Automerge does. 876 00:47:58,643 --> 00:48:02,063 But you tell it, hey, this is, you know, this is some batch, this is 877 00:48:02,123 --> 00:48:03,633 some, like, archive that I want to do. 878 00:48:03,783 --> 00:48:06,460 It starts at this timestamp and ends at that timestamp, 879 00:48:06,460 --> 00:48:07,760 or, you know, logical clock. 880 00:48:07,913 --> 00:48:09,043 Please encrypt this for me.
881 00:48:09,063 --> 00:48:10,143 And it goes, sure, here you go. 882 00:48:10,553 --> 00:48:11,023 Encrypted. 883 00:48:11,563 --> 00:48:12,663 And, you know, off it goes. 884 00:48:12,863 --> 00:48:15,447 So it has very, very few assumptions. 885 00:48:15,610 --> 00:48:18,960 That's certainly something that I might also pick up a bit further down the 886 00:48:18,960 --> 00:48:23,873 road myself for LiveStore, where the underlying substrate to sync data 887 00:48:23,893 --> 00:48:26,603 around is like an ordered event log. 888 00:48:26,993 --> 00:48:29,377 And if I'm encrypting those events, 889 00:48:29,762 --> 00:48:34,845 then I think that fulfills perfectly the requirements that you've listed, 890 00:48:34,955 --> 00:48:37,255 which are very few, for Beehive. 891 00:48:37,515 --> 00:48:40,745 So I'm really looking forward to once that gets further along. 892 00:48:40,745 --> 00:48:43,725 So speaking of like, where is Beehive right now? 893 00:48:43,815 --> 00:48:49,242 I've seen the lab notebooks from what you have been working on at Ink & Switch. 894 00:48:49,487 --> 00:48:52,802 Can I get my hands on Beehive already right now? 895 00:48:52,812 --> 00:48:53,822 Where is it at? 896 00:48:54,024 --> 00:48:55,838 What are the plans for the coming years? 897 00:48:56,133 --> 00:48:59,163 So at the time that we're recording this, at least, which is in early 898 00:48:59,163 --> 00:49:02,768 December, there's unfortunately not a publicly available version of it. 899 00:49:02,768 --> 00:49:06,393 I really hoped we'd have it ready by now, but unfortunately we're still wrapping 900 00:49:06,393 --> 00:49:08,885 up the last few items in there. 901 00:49:09,067 --> 00:49:11,623 But in Q1, we plan to have a release. 902 00:49:12,003 --> 00:49:16,312 As I mentioned before, there are some changes required to Automerge to consume it, 903 00:49:16,488 --> 00:49:19,578 specifically to manage revocation history.
904 00:49:19,598 --> 00:49:22,883 So somebody got kicked out, but we're still in this eventually consistent world. 905 00:49:23,043 --> 00:49:24,763 Automerge needs to know how to manage that. 906 00:49:24,954 --> 00:49:25,464 But 907 00:49:25,684 --> 00:49:30,451 managing things, sync, encryption, all of that stuff, we hope to have 908 00:49:30,471 --> 00:49:33,941 in, I'm not going to commit the team to any particular timeframe 909 00:49:33,941 --> 00:49:36,861 here, but like, we'll say in the next coming weeks. 910 00:49:37,154 --> 00:49:39,281 Right now the team is myself, 911 00:49:39,488 --> 00:49:43,321 John Mumm, who joined a couple months into the project, and has been working 912 00:49:43,341 --> 00:49:48,341 on BeeKEM, focused primarily on BeeKEM, which is, again, I'm just going to 913 00:49:48,341 --> 00:49:51,058 throw out words here for people that are interested in this stuff, related to 914 00:49:51,058 --> 00:49:55,584 TreeKEM, but we made it concurrent. That's based on MLS, or rather one of the primitives 915 00:49:55,584 --> 00:49:56,934 for Messaging Layer Security. 916 00:49:57,351 --> 00:49:58,361 He's been doing great work there. 917 00:49:58,371 --> 00:50:02,318 And Alex, amongst the many, many things that Alex Good does between 918 00:50:02,478 --> 00:50:07,038 writing the sync system and maintaining Automerge and all of this, you 919 00:50:07,038 --> 00:50:11,031 know, community stuff that he does, has also been lending a hand. 920 00:50:11,364 --> 00:50:15,131 So I'm sure for Beehive, in a way, you're just 921 00:50:15,141 --> 00:50:19,301 scratching the surface, and there's probably enough work here to 922 00:50:19,301 --> 00:50:24,034 fill like another few years, maybe even decades' worth of ambitious work.
923 00:50:24,284 --> 00:50:28,494 Can you paint a picture of like, what are some of like the, like right now 924 00:50:28,494 --> 00:50:32,784 you're probably working through the kind of POC or just the table stakes things. 925 00:50:33,014 --> 00:50:36,598 What are some of like the way more ambitious long-term things 926 00:50:36,598 --> 00:50:39,248 that you would like to see under the umbrella of Beehive? 927 00:50:39,721 --> 00:50:40,171 Yeah. 928 00:50:40,181 --> 00:50:41,666 So, there's a few. 929 00:50:41,766 --> 00:50:42,146 Yes. 930 00:50:42,229 --> 00:50:45,059 And we have this running list internally of like, what would a V2 look like? 931 00:50:45,335 --> 00:50:48,415 So, one is adding a little policy language. 932 00:50:48,425 --> 00:50:51,655 I think it's just like, the bang for the buck that you get on having 933 00:50:51,655 --> 00:50:53,355 something like UCAN's policy language, 934 00:50:53,355 --> 00:50:54,079 it's just so high. 935 00:50:54,115 --> 00:50:55,605 It just gives you so much flexibility. 936 00:50:56,109 --> 00:51:00,362 Hiding the membership from even the sync server is possible. 937 00:51:00,484 --> 00:51:02,115 It just requires more engineering. 938 00:51:02,312 --> 00:51:06,119 So there are many, many places in here where zero-knowledge proofs, I 939 00:51:06,129 --> 00:51:09,735 think, would be very useful, for people who know what those are. 940 00:51:09,789 --> 00:51:14,579 Essentially it would let the sync server say, yes, I can send you bytes, 941 00:51:14,589 --> 00:51:15,939 without knowing anything about you. 942 00:51:16,734 --> 00:51:17,164 Right, 943 00:51:17,164 --> 00:51:19,724 but it would still deny others. 944 00:51:19,734 --> 00:51:22,644 And right now it basically needs to run more logic to actually 945 00:51:22,644 --> 00:51:24,854 enforce those auth rules. 946 00:51:25,494 --> 00:51:25,804 Yeah.
947 00:51:25,804 --> 00:51:30,910 So today you have to sign a message that says, I signed this with the same 948 00:51:30,910 --> 00:51:36,630 private key that you know the public key for in this membership. We 949 00:51:36,630 --> 00:51:39,100 could hide the entire membership from the sync server and still do this, 950 00:51:39,375 --> 00:51:41,715 without revealing even who's making the request, right? 951 00:51:41,715 --> 00:51:42,865 Like, that would be awesome. 952 00:51:43,052 --> 00:51:45,552 In fact, and this is a bit of a tangent, I think there's a number 953 00:51:45,552 --> 00:51:49,192 of places where that class of technology would be really helpful. 954 00:51:49,232 --> 00:51:53,635 Even for things like, in CRDTs, there's this challenge where you have 955 00:51:53,635 --> 00:51:54,735 to keep all the history for all time. 956 00:51:55,064 --> 00:51:58,144 And I think with zero-knowledge proofs, we can actually, like, this would 957 00:51:58,144 --> 00:52:02,514 very much be a research project, but I think it's possible to delete history but 958 00:52:02,514 --> 00:52:06,110 still maintain cryptographic proofs that things were done correctly, and compress 959 00:52:06,110 --> 00:52:10,770 that down to, you know, a couple bytes, basically, but that's a bit of a tangent. 960 00:52:10,884 --> 00:52:12,974 I would love to work on that at some point in the future, but for 961 00:52:13,174 --> 00:52:17,340 Beehive, yeah, hiding more metadata, hiding, you know, the membership 962 00:52:17,350 --> 00:52:21,087 from the group, making all the signatures post-quantum. 963 00:52:21,347 --> 00:52:26,530 That is like, even the main recommendations from NIST, the 964 00:52:26,530 --> 00:52:26,630 U.S. 965 00:52:26,630 --> 00:52:30,920 government agency that handles these things, only just came out.
966 00:52:30,930 --> 00:52:34,850 So, you know, we're still kind of waiting for good libraries on it and, you know, 967 00:52:34,850 --> 00:52:36,664 all of this stuff and what have you. 968 00:52:36,899 --> 00:52:40,292 But yeah, making it post-quantum, or fully, big chunks of it are already 969 00:52:40,292 --> 00:52:42,832 post-quantum, but making it fully post-quantum would be great. 970 00:52:43,205 --> 00:52:46,597 And then yeah, adding all kinds of bells and whistles and features, you know, 971 00:52:46,617 --> 00:52:49,691 making it faster, adding, it's not going to have its own compression, because it 972 00:52:50,041 --> 00:52:54,102 relies so heavily on cryptography, so it doesn't compress super well, right? 973 00:52:54,102 --> 00:52:57,832 So we're going to need to figure out our own version of, you know, 974 00:52:58,022 --> 00:52:59,202 Automerge has run-length encoding. 975 00:52:59,222 --> 00:53:02,332 What is our version of that, given that we can't easily run-length encode 976 00:53:02,532 --> 00:53:04,259 encrypted things, right? 977 00:53:04,259 --> 00:53:05,999 Or signatures or, you know, all of this. 978 00:53:06,109 --> 00:53:08,219 So there's a lot of stuff down in the plumbing. 979 00:53:08,249 --> 00:53:10,719 Plus I think this policy language would be really, really helpful. 980 00:53:11,201 --> 00:53:12,061 That sounds awesome. 981 00:53:12,061 --> 00:53:16,861 Both in terms of new features, capabilities, no pun intended, being 982 00:53:16,901 --> 00:53:22,401 added here, but also in terms of just removing overhead from the system and like 983 00:53:22,411 --> 00:53:27,631 simplifying the surface area by doing more of like clever work internally, 984 00:53:27,651 --> 00:53:29,671 which simplifies the system overall. 985 00:53:29,681 --> 00:53:31,061 That sounds very intriguing.
986 00:53:31,497 --> 00:53:35,397 The other thing worth noting with this, I think both to point 987 00:53:35,397 --> 00:53:38,987 a way into the future and then also to draw a boundary around what Beehive 988 00:53:39,007 --> 00:53:41,034 does and doesn't do, is identity. 989 00:53:41,411 --> 00:53:46,821 So Beehive only knows about public keys because those are universal. 990 00:53:46,851 --> 00:53:47,671 They work everywhere. 991 00:53:47,991 --> 00:53:50,261 They don't require a naming system, any of this stuff. 992 00:53:50,724 --> 00:53:55,294 We have lots of ideas and opinions on how to do a naming system. 993 00:53:55,381 --> 00:53:58,507 But you know, if you look at, for example, uh, Bluesky, under 994 00:53:58,507 --> 00:54:02,097 the hood, all of the accounts are managed with public keys, and then 995 00:54:02,097 --> 00:54:04,187 you map a name to them using DNS. 996 00:54:04,826 --> 00:54:07,616 So either you're using, you know, 997 00:54:07,616 --> 00:54:07,646 myname.bluesky.social, 998 00:54:07,646 --> 00:54:11,696 or you have your own domain name, like I'm expede.wtf 999 00:54:12,146 --> 00:54:13,566 on Bluesky, for example, right? 1000 00:54:13,566 --> 00:54:15,426 Because I own that domain name and I can edit the TXT record. 1001 00:54:15,709 --> 00:54:19,876 And that's great, and it definitely gives users a lot of agency over 1002 00:54:20,086 --> 00:54:21,586 how to name themselves, right? 1003 00:54:21,586 --> 00:54:24,406 Or, you know, there are other related systems. 1004 00:54:24,666 --> 00:54:28,156 But it's not local-first because it relies on DNS.
1005 00:54:28,406 --> 00:54:32,957 So, like, how could I invite you to a group without having to know your public 1006 00:54:32,987 --> 00:54:35,894 key? We're probably going to ship, I would say, just because it's like 1007 00:54:35,894 --> 00:54:40,334 relatively easy to do, a system called Edge Names, based on pet names, where 1008 00:54:40,334 --> 00:54:42,024 basically I say, here's my contact book. 1009 00:54:42,284 --> 00:54:43,104 I invited you. 1010 00:54:43,124 --> 00:54:45,124 And at the time I invited you, I named you 1011 00:54:45,437 --> 00:54:46,627 Johannes, right? 1012 00:54:46,627 --> 00:54:52,362 And I named Peter, Peter, and so on and so forth, but there's no way to prove 1013 00:54:52,362 --> 00:54:54,222 that. That's just my name for them. 1014 00:54:54,402 --> 00:54:54,662 Right. 1015 00:54:54,662 --> 00:54:59,276 And for these people, having a more universal system where 1016 00:54:59,276 --> 00:55:02,186 I could invite somebody by like their email address, for example, I 1017 00:55:02,206 --> 00:55:03,226 think would be really interesting. 1018 00:55:03,726 --> 00:55:05,312 Back at Fission, Blaine Cook, 1019 00:55:06,007 --> 00:55:09,051 who's also done a bunch of stuff with Ink & Switch in the past, had proposed 1020 00:55:09,051 --> 00:55:12,947 this system, the NameName system, that would give you local-first names 1021 00:55:12,947 --> 00:55:17,017 that were rooted in things like email, so you could invite somebody with 1022 00:55:17,047 --> 00:55:21,934 their email address and a local-first system could validate that that person 1023 00:55:21,934 --> 00:55:23,194 actually had control over that email. 1024 00:55:23,604 --> 00:55:24,984 It was a very interesting system. 1025 00:55:25,154 --> 00:55:29,507 So there's a lot of work to be done in identity as separate from authorization. 1026 00:55:29,567 --> 00:55:30,237 Right, yeah.
1027 00:55:30,277 --> 00:55:35,569 I feel like there's just always so much interesting stuff happening 1028 00:55:35,579 --> 00:55:39,809 across the entire spectrum from, like, the world that we're currently in, 1029 00:55:40,022 --> 00:55:45,529 which is mostly centralized, just in terms of, like, that things work at 1030 00:55:45,549 --> 00:55:50,839 all, and even there, it's hard to keep things up to date and, like, working, 1031 00:55:50,909 --> 00:55:53,789 et cetera, but we want to aim higher. 1032 00:55:54,064 --> 00:55:59,701 And one way to improve things a lot is like by going more decentralized, but 1033 00:55:59,701 --> 00:56:04,851 there's like so many hard problems to tame, and like, we're starting to just peel 1034 00:56:04,851 --> 00:56:07,111 off like the layers from the onion here. 1035 00:56:07,751 --> 00:56:12,386 And Automerge, I think, is a great canonical case study there, like it has 1036 00:56:12,396 --> 00:56:17,502 started with the data and now things are around authorization, et cetera. 1037 00:56:17,502 --> 00:56:21,252 And like, then authentication, identity, there we probably have 1038 00:56:21,252 --> 00:56:25,342 enough research work ahead of us for the coming decades. 1039 00:56:25,372 --> 00:56:29,702 And super, super cool to see that so many bright minds are working on it. 1040 00:56:29,809 --> 00:56:33,449 Maybe one last question in regards to Beehive. 1041 00:56:34,464 --> 00:56:38,814 When there's a lot of cryptography involved, that also means there's 1042 00:56:38,904 --> 00:56:43,434 even more CPU cycles that need to be spent to make stuff work.
1043 00:56:43,714 --> 00:56:48,687 Have you been looking into some performance benchmarks? When you, let's 1044 00:56:48,687 --> 00:56:54,781 say you want to synchronize a certain history of Automerge for some Automerge 1045 00:56:54,801 --> 00:57:00,444 documents, with Beehive disabled and with Beehive enabled, do you see like 1046 00:57:00,444 --> 00:57:05,714 a certain factor of like how much it gets slower with Beehive and sort of 1047 00:57:05,724 --> 00:57:09,884 the authorization rules applied, both on the client as well as on the server? 1048 00:57:10,324 --> 00:57:10,724 Yeah. 1049 00:57:10,804 --> 00:57:12,301 So, it's a great question. 1050 00:57:12,517 --> 00:57:14,827 So obviously there's different dimensions in Beehive, right? 1051 00:57:14,828 --> 00:57:19,107 So for encryption, which is where I would say most people would expect there 1052 00:57:19,107 --> 00:57:21,107 to be the performance overhead, 1053 00:57:21,334 --> 00:57:22,674 there's absolutely overhead there. 1054 00:57:22,754 --> 00:57:26,844 You're doing decryption, but we're using algorithms that decrypt on the 1055 00:57:26,844 --> 00:57:29,214 order of like multiple gigabytes a second. 1056 00:57:29,804 --> 00:57:31,934 So it's fine, basically. 1057 00:57:32,131 --> 00:57:35,014 And that's also part of why we wanted to chunk things up in this way, 1058 00:57:35,014 --> 00:57:37,534 because then we get good compression, you know, all of this stuff. 1059 00:57:37,834 --> 00:57:42,064 So if you're doing like a total, you know, first time you've seen this document, 1060 00:57:42,064 --> 00:57:44,874 you've got to pull everything and decrypt everything and hand it off to Automerge, 1061 00:57:45,311 --> 00:57:46,541 the encryption's not 1062 00:57:46,861 --> 00:57:47,631 going to be the bottleneck.
1063 00:57:48,414 --> 00:57:53,824 And then on like a rolling basis, like, you know, per keystroke, yes, 1064 00:57:53,824 --> 00:57:58,564 there's absolutely overhead there, but remember this is relative to latency. 1065 00:57:59,424 --> 00:58:03,584 So if you have 200 milliseconds of latency, that's your bottleneck. 1066 00:58:03,664 --> 00:58:08,921 It's not going to be the five milliseconds of encryption that we're doing or 1067 00:58:08,921 --> 00:58:13,691 signatures or whatever it is. There's a space cost, because now we have to keep 1068 00:58:14,106 --> 00:58:17,856 public keys, which are 32 bytes, and signatures, which are 64 bytes. 1069 00:58:19,496 --> 00:58:21,986 So there is some overhead in space 1070 00:58:22,479 --> 00:58:23,129 that happens. 1071 00:58:23,579 --> 00:58:26,359 But for the most part we've chosen algorithms that 1072 00:58:26,379 --> 00:58:28,449 are known to be very, very fast. 1073 00:58:28,479 --> 00:58:30,519 They're sort of like the best in class. 1074 00:58:30,589 --> 00:58:33,614 So I'll just rattle down a list for the 1075 00:58:33,854 --> 00:58:34,814 people that are interested. 1076 00:58:34,997 --> 00:58:40,564 So we're using EdDSA, Edwards keys, for signatures and key exchange, ChaCha 1077 00:58:40,714 --> 00:58:43,624 for encryption, and BLAKE3 for hashing. 1078 00:58:44,101 --> 00:58:45,447 With BLAKE3, what's very interesting is you can do 1079 00:58:45,447 --> 00:58:47,307 things like verifiable streams. 1080 00:58:47,307 --> 00:58:50,487 So like as you're streaming the data in, you can start hashing even 1081 00:58:50,697 --> 00:58:52,287 parts of it as you're going along. 1082 00:58:52,727 --> 00:58:56,711 The really big bottleneck, the, like, the heaviest part of the system,
1083 00:58:57,126 --> 00:59:00,396 or sorry, the part that we were least happy with in our original design, 1084 00:59:00,406 --> 00:59:05,582 that we then ended up doing a bunch of research on, was doing key agreement. 1085 00:59:06,252 --> 00:59:12,002 So if I have, whatever, a thousand people in a company, and they're all, 1086 00:59:12,012 --> 00:59:14,882 you know, working on this document, I don't want to have to send a 1087 00:59:14,882 --> 00:59:18,282 thousand messages every time I change the key, which will be rotated 1088 00:59:18,792 --> 00:59:22,082 every message, let's say, or, you know, once a day, if we're being, 1089 00:59:22,092 --> 00:59:23,552 you know, more conservative with it. 1090 00:59:24,009 --> 00:59:27,009 And that's a lot of data and a lot of latency on 1091 00:59:27,009 --> 00:59:28,079 this and just a lot of network. 1092 00:59:29,649 --> 00:59:32,859 So we switched: instead of it being linear, we found a way 1093 00:59:32,859 --> 00:59:34,679 of doing it in logarithmic time. 1094 00:59:35,299 --> 00:59:39,189 So we can now do key rotations concurrently, like totally eventually 1095 00:59:39,189 --> 00:59:41,386 consistently, in log n time. 1096 00:59:41,646 --> 00:59:47,421 And a lot of research happened in there, but then that let 1097 00:59:47,421 --> 00:59:48,841 us scale up much, much, much more. 1098 00:59:48,841 --> 00:59:52,101 So the prior algorithm that we were using off the shelf from a paper 1099 00:59:52,501 --> 00:59:55,851 scaled up to, in the paper they say, about 128 people, right? 1100 00:59:55,851 --> 00:59:58,611 That's sort of your upper bound, and we're like, uh, you know, we had set 1101 00:59:58,611 --> 01:00:01,954 ourselves these higher levels that we actually want to work with. 1102 01:00:02,516 --> 01:00:04,932 And so now we can scale into the thousands. 1103 01:00:05,112 --> 01:00:07,692 When you get up to 50,000 people, yeah, it starts to slow down.
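Editor's note: to make the linear versus logarithmic difference concrete, here is a minimal sketch comparing the two growth rates for the group sizes mentioned above. It assumes a balanced binary tree of keys, which is a common shape for tree-based group key agreement; the episode only says "log n", so the tree shape and both function names are assumptions, not Beehive's actual design.

```python
import math


def linear_rotation_messages(members: int) -> int:
    """Naive scheme: one message to every other member on each key rotation."""
    return max(members - 1, 0)


def tree_rotation_updates(members: int) -> int:
    """Assumed balanced binary tree: a rotation touches one leaf-to-root path."""
    if members <= 1:
        return 0
    return math.ceil(math.log2(members))


# Group sizes taken from the conversation: 128, a thousand, and 50,000 people.
for members in (128, 1_000, 50_000):
    print(
        f"{members:>6} members: "
        f"linear={linear_rotation_messages(members):>6}, "
        f"logarithmic={tree_rotation_updates(members):>3}"
    )
```

Under that assumption, a single rotation in a 50,000-person group touches on the order of sixteen nodes instead of tens of thousands of recipients, which matches the intuition that one person's update goes unnoticed while heavy concurrency still adds up.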
1104 01:00:07,692 --> 01:00:11,082 You start to get into, you know, closer to a second if you're doing 1105 01:00:11,389 --> 01:00:14,839 very, very concurrent, you know, uh, 40,000 of the 50,000 people 1106 01:00:14,839 --> 01:00:16,069 are doing concurrent key rotations. 1107 01:00:16,904 --> 01:00:19,024 Doesn't happen very often, but it could happen. 1108 01:00:19,401 --> 01:00:21,781 If one person's doing an update, then it'll happen 1109 01:00:21,944 --> 01:00:23,867 in, like, you won't even notice it. 1110 01:00:23,937 --> 01:00:24,277 Right. 1111 01:00:24,437 --> 01:00:26,957 So it depends on how heavily concurrent your document is. 1112 01:00:26,967 --> 01:00:28,557 Do you have 40,000 people writing to your document? 1113 01:00:28,667 --> 01:00:28,867 Yeah. 1114 01:00:28,867 --> 01:00:29,947 You're going to see it slow down a little bit. 1115 01:00:30,384 --> 01:00:31,834 It's so amazing to see that. 1116 01:00:32,064 --> 01:00:36,724 I mean, in academia, there is so much progress in those various fields. 1117 01:00:36,764 --> 01:00:41,994 And I feel like in local-first, we actually get to benefit and directly 1118 01:00:42,004 --> 01:00:45,814 apply a lot of those great achievements from other places, where 1119 01:00:45,814 --> 01:00:49,332 it now makes a big difference for the applications that 1120 01:00:49,332 --> 01:00:53,812 we'll be using whether there is a cryptographic breakthrough in efficiency 1121 01:00:53,812 --> 01:00:57,372 or in being more long-term secure, et cetera. 1122 01:00:57,382 --> 01:01:02,162 And I fully agree that latency is probably by far the most important 1123 01:01:02,192 --> 01:01:06,442 one when it comes to whether it makes a difference or not, but, like, 1124 01:01:06,619 --> 01:01:08,779 battery usage, et cetera, is another one.
1125 01:01:08,829 --> 01:01:13,256 And if I synchronize data a lot, maybe I open a lot 1126 01:01:13,256 --> 01:01:17,236 of documents just once, because maybe I'm reviewing documents a lot and 1127 01:01:17,246 --> 01:01:20,656 someone sends them, or maybe I'm an executive, I get to review a lot of 1128 01:01:20,656 --> 01:01:26,296 documents, and I don't really amortize the documents too much because 1129 01:01:26,296 --> 01:01:28,766 I don't reuse them on a day-to-day basis. 1130 01:01:28,796 --> 01:01:33,086 I think that initial sync also tends to matter quite a bit. 1131 01:01:33,661 --> 01:01:37,398 But it's great to hear that efficiency seems to be already 1132 01:01:37,721 --> 01:01:39,061 very well under control. 1133 01:01:39,681 --> 01:01:45,898 So maybe rounding this out: you've been at Fission, you've been seeing the 1134 01:01:45,898 --> 01:01:51,048 innovation around local-first in, like, three buckets: auth, data, and compute. 1135 01:01:51,438 --> 01:01:54,526 As mentioned before, on this podcast we've mostly been 1136 01:01:54,526 --> 01:01:56,506 exploring the data aspect. 1137 01:01:56,746 --> 01:02:00,976 Now we went quite deep on some of your work in regards to auth. 1138 01:02:01,446 --> 01:02:06,066 We don't have too much time to spend on something else, but I'm curious 1139 01:02:06,076 --> 01:02:12,566 whether you can just seed some ideas in regards to where compute 1140 01:02:12,711 --> 01:02:15,161 fits in this new local-first world. 1141 01:02:15,161 --> 01:02:21,144 Like, if you could fork yourself and do a lot more work, what would you be 1142 01:02:21,154 --> 01:02:23,264 doing in regards to that compute bucket? 1143 01:02:23,888 --> 01:02:24,258 Yeah. 1144 01:02:24,268 --> 01:02:27,904 So we had a project related to compute at Fission, 1145 01:02:27,924 --> 01:02:28,994 right at the end.
1146 01:02:29,054 --> 01:02:32,233 And I'm very fortunate that I actually have some grants to continue that 1147 01:02:32,233 --> 01:02:33,713 work after I finish with Beehive. 1148 01:02:33,713 --> 01:02:36,872 I'll switch to that and then, after that project, see what else is 1149 01:02:36,873 --> 01:02:38,383 interesting kicking around. 1150 01:02:38,459 --> 01:02:42,973 But essentially the motivation is: all the compute for local-first stuff happens 1151 01:02:43,074 --> 01:02:47,014 completely locally today, or you're talking to some cloud service, right? 1152 01:02:47,014 --> 01:02:48,614 Like maybe you're using an LLM. 1153 01:02:48,624 --> 01:02:53,404 So you go to, you know, use the OpenAI APIs, that kind of thing. 1154 01:02:53,858 --> 01:02:58,408 But what if you're on a very low-powered device and you're on a plane? 1155 01:02:58,633 --> 01:02:59,053 Right. 1156 01:02:59,176 --> 01:03:02,303 You know, you still need to be able to do some compute some of the time. 1157 01:03:02,326 --> 01:03:05,176 So the trade-off that we're trying to strike in these 1158 01:03:05,176 --> 01:03:08,816 kinds of projects is: what if I can always run it, even slowly? 1159 01:03:08,816 --> 01:03:11,816 So let's say I'm rendering a 3D scene and it's gonna take a minute 1160 01:03:11,816 --> 01:03:18,349 to paint, versus I have a desktop computer, you know, nearby and I can 1161 01:03:18,349 --> 01:03:22,439 farm that job out to that machine because it's nearby in latency 1162 01:03:22,869 --> 01:03:25,109 and it has more compute resources. 1163 01:03:25,489 --> 01:03:30,619 Or maybe I need to send email to a mail server that only exists in one place. 1164 01:03:30,639 --> 01:03:35,779 Like, how can I do this, you know, compute dynamically, where I can 1165 01:03:35,789 --> 01:03:40,129 always run my jobs or my resource management whenever possible?
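Editor's note: the "always runnable locally, faster when a nearby machine is reachable" trade-off described above can be sketched as a tiny scheduling decision. Everything here is hypothetical and for illustration only: the `Executor` type, its fields, and the timing numbers are assumptions, and this is not the design of the Fission compute project.

```python
from dataclasses import dataclass


@dataclass
class Executor:
    name: str
    reachable: bool    # False when offline, e.g. on a plane
    latency_s: float   # assumed cost of handing the job off and getting it back
    compute_s: float   # assumed time to run the job on this machine

    def total_s(self) -> float:
        return self.latency_s + self.compute_s


def pick_executor(local: Executor, remotes: list[Executor]) -> Executor:
    """Local is always a candidate, so the job can always run, even slowly."""
    candidates = [local] + [r for r in remotes if r.reachable]
    return min(candidates, key=Executor.total_s)


# The one-minute local render comes from the example in the conversation;
# the desktop numbers are made up.
laptop = Executor("laptop", reachable=True, latency_s=0.0, compute_s=60.0)
desktop_nearby = Executor("desktop", reachable=True, latency_s=0.02, compute_s=5.0)
desktop_offline = Executor("desktop", reachable=False, latency_s=0.02, compute_s=5.0)

print(pick_executor(laptop, [desktop_nearby]).name)   # desktop
print(pick_executor(laptop, [desktop_offline]).name)  # laptop
```

The design choice worth noting is that the local executor is never filtered out: connectivity only adds faster options, it never removes the baseline.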
1166 01:03:40,129 --> 01:03:42,489 Email server is a case where you can't always do this, right? 1167 01:03:42,889 --> 01:03:44,429 But when somebody else could run it, 1168 01:03:45,214 --> 01:03:46,604 maybe I can farm that out to them instead. 1169 01:03:46,904 --> 01:03:53,171 So there's a lot of interest, I think, in how we bridge between what is 1170 01:03:53,171 --> 01:03:56,691 sometimes called in the Bluesky world "big world" versus "small world", right? 1171 01:03:56,691 --> 01:03:57,671 So I have my local stuff. 1172 01:03:57,701 --> 01:03:59,271 I'm doing things entirely on my own. 1173 01:03:59,301 --> 01:04:00,571 I'm completely offline. 1174 01:04:00,991 --> 01:04:02,231 And that is the baseline. 1175 01:04:02,761 --> 01:04:05,851 But when I am online, how much more powerful can it get? 1176 01:04:06,351 --> 01:04:10,091 You know, I'm not going to ingest the entire Bluesky firehose myself. 1177 01:04:10,351 --> 01:04:11,861 I'm going to leave that to an indexer 1178 01:04:12,846 --> 01:04:13,646 to do for me. 1179 01:04:13,926 --> 01:04:17,606 So when I'm online, maybe I can get better search, right? 1180 01:04:17,966 --> 01:04:20,866 Things like this. Or maybe if I'm rendering PDFs, maybe I want to farm 1181 01:04:20,866 --> 01:04:25,439 that out to some server somewhere rather than doing that with Wasm in my browser. 1182 01:04:25,909 --> 01:04:28,139 So kind of progressively enhancing the app. 1183 01:04:28,139 --> 01:04:31,889 And I think there's a lot of recent, oh, even more relevant 1184 01:04:31,899 --> 01:04:35,879 with AI, but with AI this is particularly more relevant because 1185 01:04:35,909 --> 01:04:38,749 now suddenly we get a lot of work 1186 01:04:38,949 --> 01:04:43,489 to be done that massively benefits from a lot of compute. 1187 01:04:43,929 --> 01:04:47,469 And with AI in particular, I think now we're 1188 01:04:47,679 --> 01:04:49,259 in this tricky spot.
1189 01:04:49,539 --> 01:04:53,923 Either we already get to live in the future, but that means typically all of 1190 01:04:54,043 --> 01:04:58,848 our AI intelligence is coming from some very beefy servers in 1191 01:04:58,848 --> 01:05:03,778 some data centers, and the way I get those enhancements 1192 01:05:03,818 --> 01:05:08,868 is by just sending over all of my context data into those servers. 1193 01:05:09,101 --> 01:05:13,218 Well, I guess you could get those beefy servers also next to your 1194 01:05:13,218 --> 01:05:17,328 desk, but that is very expensive and I think not very practical. 1195 01:05:17,766 --> 01:05:21,726 I guess step by step, like, now the newest MacBooks, et cetera, are already 1196 01:05:21,726 --> 01:05:25,836 very capable of running things locally, but there will always be 1197 01:05:26,006 --> 01:05:30,616 a reason that you want to fan things out a bit more, but doing so in a 1198 01:05:30,616 --> 01:05:34,903 way that preserves your privacy around your data, et cetera, and 1199 01:05:34,913 --> 01:05:37,703 leverages your resources properly. 1200 01:05:37,713 --> 01:05:41,853 Like, if I'm just looking around myself, I have an iPad over here, 1201 01:05:41,853 --> 01:05:44,423 which sits entirely idle, et cetera. 1202 01:05:44,753 --> 01:05:45,213 So. 1203 01:05:45,728 --> 01:05:50,541 It's as with most things in regards to application developers: if it's 1204 01:05:50,851 --> 01:05:55,408 the right thing, it should be easy, and doing compute in sort of a 1205 01:05:55,438 --> 01:05:57,868 distributed way is by far not easy. 1206 01:05:58,158 --> 01:06:01,778 So very excited to hear that you want to explore this more. 1207 01:06:02,091 --> 01:06:02,431 Yeah.
1208 01:06:02,451 --> 01:06:06,511 Well, and you know, especially things like AI, you know, the principle 1209 01:06:06,511 --> 01:06:10,911 always is: I should never be cut off from performing actions, if 1210 01:06:10,911 --> 01:06:13,991 possible. Like, when possible. Sometimes something lives at a particular 1211 01:06:13,991 --> 01:06:15,051 place and I'm not connected to it. 1212 01:06:15,221 --> 01:06:15,861 Fine, right? 1213 01:06:16,168 --> 01:06:18,478 Email being, you know, the canonical example here. 1214 01:06:18,518 --> 01:06:19,788 Mail server lives in one place. 1215 01:06:19,838 --> 01:06:20,358 Okay, fine. 1216 01:06:21,614 --> 01:06:23,664 But why not with an LLM? 1217 01:06:23,674 --> 01:06:26,914 Like, maybe I run a smaller, simpler LLM locally. 1218 01:06:27,204 --> 01:06:30,819 And then again, when I'm connected and I'm online, I just get better results. 1219 01:06:30,819 --> 01:06:31,949 I get better answers. 1220 01:06:32,279 --> 01:06:34,712 So I'm never totally cut off. 1221 01:06:34,793 --> 01:06:38,303 I mean, there's plenty of research on distributed machine learning 1222 01:06:38,343 --> 01:06:40,903 and all of this stuff, but that's, I would say, in the future. 1223 01:06:41,101 --> 01:06:43,291 Just kind of to put an arc on all of this stuff, 1224 01:06:43,651 --> 01:06:46,321 and everybody who's seen my talks before has probably heard me give 1225 01:06:46,321 --> 01:06:48,414 this short spiel once or twice. 1226 01:06:48,471 --> 01:06:52,731 But you know, in the nineties, when we were developing the web, right, 1227 01:06:52,801 --> 01:06:53,851 as opposed to the internet, 1228 01:06:54,128 --> 01:06:57,428 the assumption was that you had a computer under your desk. 1229 01:06:57,428 --> 01:07:00,658 It was a beige box that you would turn on and you would turn it off sometimes. 1230 01:07:00,688 --> 01:07:00,858 Right.
1231 01:07:00,958 --> 01:07:02,618 When was the last time you actually turned off your laptop, 1232 01:07:02,931 --> 01:07:03,981 or your phone for that matter? 1233 01:07:04,396 --> 01:07:07,326 And when you wanted to connect to the internet, you'd tie up your phone line. 1234 01:07:08,206 --> 01:07:08,896 That's no good. 1235 01:07:09,326 --> 01:07:12,046 So you would rent from somebody else something that was always 1236 01:07:12,046 --> 01:07:13,356 online with a lot of power. 1237 01:07:14,496 --> 01:07:18,946 And we now live in a different world, but the centralized, 1238 01:07:18,976 --> 01:07:23,206 you know, or the cloud systems rather, all still have this assumption of, 1239 01:07:23,406 --> 01:07:27,086 well, we have more power and we're more online and are better connected than you. 1240 01:07:28,606 --> 01:07:29,056 Okay. 1241 01:07:29,056 --> 01:07:32,546 That's true, but how many things does that actually matter for? 1242 01:07:32,816 --> 01:07:36,226 And with systems like Automerge and, you know, local-first things developing, it's 1243 01:07:36,226 --> 01:07:41,886 like, actually, you know what, my machines are fast enough now that I can 1244 01:07:41,886 --> 01:07:43,776 keep the entire log of the entire history. 1245 01:07:43,806 --> 01:07:47,896 And it's fine because we can compress it down to a couple hundred K and it's okay. 1246 01:07:48,246 --> 01:07:49,866 And I'm fast enough to play over the whole log. 1247 01:07:50,336 --> 01:07:53,076 And we can do all of this eventually consistent stuff and it doesn't 1248 01:07:53,196 --> 01:07:55,906 completely, you know, hurt the performance of my application. 1249 01:07:56,796 --> 01:07:59,286 It's massively simplifying the architecture. 1250 01:07:59,316 --> 01:08:00,496 Things have gotten out of hand.
1251 01:08:00,976 --> 01:08:06,596 So there is this dividing line between things that are still, you know, the 1252 01:08:06,596 --> 01:08:08,776 cloud isn't completely the enemy. 1253 01:08:09,146 --> 01:08:11,886 They do have some advantages, right? 1254 01:08:12,376 --> 01:08:14,746 But they don't, not everything needs to live there. 1255 01:08:14,796 --> 01:08:16,906 And so we're moving into this world of like, how much can we 1256 01:08:16,906 --> 01:08:20,266 pull back down into our individual devices and get control over them? 1257 01:08:20,916 --> 01:08:21,936 Yeah, I love that. 1258 01:08:21,986 --> 01:08:25,986 I think that very neatly summarizes a huge aspect why 1259 01:08:26,276 --> 01:08:28,916 local-first talks to so many of us. 1260 01:08:29,506 --> 01:08:34,066 So I've learned a lot in this conversation and I'm really 1261 01:08:34,066 --> 01:08:37,386 excited to get my hands on Beehive. 1262 01:08:37,641 --> 01:08:43,211 As it becomes more publicly available, hopefully already a lot closer to the 1263 01:08:43,221 --> 01:08:45,281 time when the, this episode comes out. 1264 01:08:45,611 --> 01:08:50,951 In the meanwhile, if someone got really excited to get their hands dirty and 1265 01:08:51,141 --> 01:08:55,771 like digging into some of the knowledge that you've shared here, I certainly 1266 01:08:55,771 --> 01:08:58,981 recommend checking out your amazing talks. 1267 01:08:59,191 --> 01:09:02,801 I have still a lot of them on my watch lists and like our, I think 1268 01:09:02,801 --> 01:09:06,181 there's many shared interests that we didn't go into this episode here. 1269 01:09:06,181 --> 01:09:09,844 Like you're also, a lot into functional programming, et cetera. 1270 01:09:09,844 --> 01:09:13,894 And I think you're, you're like going really deep on Rust as well, et cetera. 1271 01:09:13,924 --> 01:09:15,554 So lots for me to, to learn. 
1272 01:09:15,994 --> 01:09:20,233 But if you can't wait to get your hands on Beehive, I think it's also very 1273 01:09:20,233 --> 01:09:22,906 practical to play around with UCAN. 1274 01:09:23,239 --> 01:09:26,656 I think there are a bunch of implementations for various language 1275 01:09:26,656 --> 01:09:31,761 stacks, and that is something that you can already build things with today. 1276 01:09:32,001 --> 01:09:34,887 And I think it's not like Beehive will fully replace 1277 01:09:34,917 --> 01:09:36,337 UCAN or the other way around. 1278 01:09:36,377 --> 01:09:39,697 I think there will be use cases where you can use both, but this way you 1279 01:09:39,697 --> 01:09:44,547 can already get in the right mental model and be Beehive 1280 01:09:44,547 --> 01:09:46,827 ready when it becomes available. 1281 01:09:47,137 --> 01:09:50,717 So that's certainly what I would recommend folks check out. 1282 01:09:50,897 --> 01:09:55,467 Is there anything else you would like the audience to do, look up, or watch? 1283 01:09:56,161 --> 01:10:00,507 Yeah, so definitely keep an eye on the Ink & Switch webpage. 1284 01:10:00,707 --> 01:10:03,504 We have lab notes. At the time of this recording, 1285 01:10:03,554 --> 01:10:06,184 there's just the one note up there, but I have a whole bunch of 1286 01:10:06,184 --> 01:10:10,027 them, like, many, in draft that I just need to clean up and publish. 1287 01:10:10,305 --> 01:10:14,262 We'll also be releasing an essay, an Ink & Switch style essay, on 1288 01:10:14,262 --> 01:10:16,112 this whole project in the new year. 1289 01:10:16,222 --> 01:10:19,945 And yeah, keep an eye out for when this all gets released.
1290 01:10:20,071 --> 01:10:23,950 There's a bunch of stuff coming in Automerge in the new year. I 1291 01:10:23,950 --> 01:10:27,760 can't remember if it's Automerge V2 or V3, but there's, you know, 1292 01:10:27,770 --> 01:10:31,400 some branding with it of, like, much faster, lower memory footprint, 1293 01:10:31,440 --> 01:10:33,356 better sync, and security. 1294 01:10:33,356 --> 01:10:35,546 And all of these sort of, you know, big headline features. 1295 01:10:35,556 --> 01:10:38,400 So definitely keep an eye on all the stuff happening in Automerge. 1296 01:10:38,610 --> 01:10:39,380 That's awesome. 1297 01:10:39,380 --> 01:10:42,753 Brooke, thank you so much for taking the time and sharing 1298 01:10:42,753 --> 01:10:44,103 all of this knowledge with us. 1299 01:10:44,193 --> 01:10:45,213 Super appreciated. 1300 01:10:45,233 --> 01:10:45,703 Thank you. 1301 01:10:45,996 --> 01:10:46,916 Thank you so much for having me. 1302 01:10:47,232 --> 01:10:49,652 Thank you for listening to the localfirst.fm podcast. 1303 01:10:49,882 --> 01:10:52,332 If you've enjoyed this episode and haven't done so already, 1304 01:10:52,602 --> 01:10:54,312 please subscribe and leave a review. 1305 01:10:54,752 --> 01:10:57,312 Please also share this episode with your friends and colleagues. 1306 01:10:57,682 --> 01:11:00,672 Spreading the word about the podcast is a great way to support 1307 01:11:00,672 --> 01:11:02,552 it and to help me keep it going. 1308 01:11:03,062 --> 01:11:07,562 A special thanks again to Convex and ElectricSQL for supporting this podcast. 1309 01:11:07,962 --> 01:11:08,722 See you next time.