localfirst.fm

A podcast about local-first software development


#19 – Brooklyn Zelenka: UCAN, Beehive, Beelay


The guest of this episode is Brooklyn Zelenka, a local-first researcher and creator of various projects including UCAN and Beehive. This conversation goes deep on authorization and access control in a decentralized, local-first environment and explores the topic by learning about UCAN and Beehive. Later, the conversation also dives into Beelay, a new generic sync server implementation developed by Ink & Switch.

Mentioned in podcast:


Links:

Thank you to Convex and ElectricSQL for supporting the podcast.

Transcript

#19 – Brooklyn Zelenka: UCAN, Beehive, Beelay
00:00we've restricted ourselves down to making things look like access
00:03control lists on the outside,
00:06and so it should feel very, very similar to doing things with role
00:10based access control using, say, OAuth.
00:14That should all feel totally normal.
00:17You shouldn't really have to think about it in any special way.
00:20In the same way that, you know, if you have a sync server, other than having
00:22to set up the sync server, or maybe you pointed at an existing one, knowing that
00:26it's there doesn't mean that you have to, like, design it from first principles.
00:30Welcome to the localfirst.fm podcast.
00:33I'm your host, Johannes Schickling, and I'm a web developer, a
00:35startup founder, and I love the craft of software engineering.
00:39For the past few years, I've been on a journey to build a modern, high quality
00:42music app using web technologies.
00:45And in doing so, I've fallen down the rabbit hole of local-first software.
00:49This podcast is your invitation to join me on that journey.
00:53In this episode, I'm speaking to Brooklyn Zelenka, a local-first
00:57researcher and creator of various projects, including UCAN and Beehive.
01:01In this conversation, we go deep on authorization and access control
01:05in a local-first decentralized environment and explore this topic
01:10by learning about UCAN and Beehive.
01:12Later, we are also diving into Beelay, a new generic sync server implementation
01:17developed by Ink and Switch.
01:19Before getting started, also a big thank you to Convex and
01:23ElectricSQL for supporting this podcast.
01:26And now my interview with Brooklyn.
01:28Hey Brooke, so nice to have you on the show.
01:31How are you doing?
01:32I'm doing great.
01:32Super excited to be here.
I'm glad that we made this happen.
01:35Thanks so much for having me.
01:37I was really looking forward to this episode and honestly, I was quite nervous
01:42because this is certainly bringing me to an aspect of local-first where I have
01:47much less first hand experience myself.
01:49I think overall local-first is a big frontier of pushing the boundaries of what's
01:54possible technologically, et cetera.
01:57And you're pushing forward even a further frontier here all around local-first auth.
02:03So the people in the audience who are already familiar with your work,
02:08I'm sure they're very thrilled for you to be here, but for the folks
02:11who don't know who you are, would you mind giving a brief background?
02:15Yeah, absolutely.
I'll maybe do it in slightly reverse chronological order.
02:20So, these days I'm working on an auth system for local-first, mostly
02:25focused on Automerge, called Beehive, which does both read controls with
02:29encryption and mutation controls with something called capabilities.
02:33I'm sure we'll get into that.
02:35Prior to this, for a little over five years, I was the CTO
02:40at a company called Fission.
02:41So in 2019, we started doing local-first there.
02:44And we worked on the stack we always called auth, data, and compute, and so
02:48we ranged out way ahead on a variety of things, trying local-first, you know,
02:53encrypted-at-rest databases, a file system, an auth system that has
02:58gotten some adoption called UCAN, and a compute layer, IPVM. And prior to that,
03:04I did a lot of web and, temporarily, did work with the Ethereum core
03:08development community, mostly working on the Ethereum virtual machine.
03:11That is super impressive.
03:13I am very curious to dig into all of the parts really around
03:18auth, data, and compute.
03:20however, in this episode, I think we should keep it a bit more
03:23focused, particularly on auth.
03:26Maybe towards the end, we can also talk a bit more about compute.
03:29Most of the episodes we've done so far have been very centric around data.
03:34Only a few have been more, also exploring what auth in a local-first setting
03:39could look like, but I think there is no better person in the local-first space
03:44to really go deep on, on all things auth.
03:47So through your work on Fission, and previous backgrounds, et cetera, you've,
03:53both participated in, contributed to, and started a whole myriad of different
03:58projects, which are now really like on the forefront on those various fields.
04:03One of them is UCAN.
04:04You've also mentioned Beehive at Ink & Switch.
04:08Maybe starting with UCAN, for those of us who have no idea what UCAN, that four
04:13letter acronym, stands for and what it means, could you give us an introduction?
04:18Yeah, absolutely.
04:19So UCAN, U-C-A-N, User Controlled Authorization Networks, is a way of
04:25doing authorization, so granting the ability to somebody else to perform
04:30some action on a resource, in a totally peer-to-peer, local-first way.
04:36It uses a model called Capabilities.
04:39So instead of having a database that lists all of the users and what
04:43they can do, you get certificates that are cryptographically provable.
04:48And so if I wanted to give you access to some resource I controlled, I
04:52would sign a certificate to you.
04:54And then if you wanted to give access to someone else, you
04:56would sign a certificate to them.
04:58And then when it came back to me, I could check that that whole chain was correct.
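The chain of signed certificates Brooklyn describes can be sketched in code. This is a toy model: real UCANs are signed with actual public-key cryptography (e.g. Ed25519) and use a standardized token format, whereas here `sign` is a deterministic stand-in so the chain-walking logic stays visible, and all the names are illustrative.

```typescript
// Toy model of a capability delegation chain. "sign" stands in for a real
// cryptographic signature; every identifier here is made up.

type Capability = { resource: string; action: string };

type Delegation = {
  issuer: string;    // who grants the capability
  audience: string;  // who receives it
  capability: Capability;
  signature: string; // stand-in for a real cryptographic signature
};

// Fake signing: deterministic over the delegation's contents.
const sign = (d: Omit<Delegation, "signature">): string =>
  `sig:${d.issuer}->${d.audience}:${d.capability.resource}:${d.capability.action}`;

function delegate(issuer: string, audience: string, capability: Capability): Delegation {
  const body = { issuer, audience, capability };
  return { ...body, signature: sign(body) };
}

// A chain is valid when every link is untampered, each link's audience is
// the next link's issuer, and no link wanders off to another resource.
function verifyChain(chain: Delegation[], owner: string): boolean {
  let expectedIssuer = owner;
  let prev: Capability | null = null;
  for (const link of chain) {
    if (link.signature !== sign(link)) return false;   // tampered or forged
    if (link.issuer !== expectedIssuer) return false;  // broken chain of custody
    if (prev && link.capability.resource !== prev.resource) return false;
    expectedIssuer = link.audience;
    prev = link.capability;
  }
  return chain.length > 0;
}

const root = delegate("did:owner", "did:alice", { resource: "doc:123", action: "write" });
const onward = delegate("did:alice", "did:bob", { resource: "doc:123", action: "write" });
verifyChain([root, onward], "did:owner"); // true: intact chain back to the owner
verifyChain([onward], "did:owner");       // false: Alice is not the owner
```

The key property is that anyone holding the chain can check it locally; no database lookup is involved.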
05:02And so people have used this to, do all kinds of things.
05:05So at Fission, we were using it for CRDTs.
05:08For example, there's a CRDT based file system that we had developed,
05:12to guard whether or not you were allowed to write into it.
05:15There's a bunch of teams now using it for, managing resources.
05:19So, storage quotas.
05:20How much are you allowed to store inside of some data volume?
05:23and for them, it's really helpful because then they can say, Okay.
05:26Here's a certificate from us to, you know, say a developer, and then they can
05:31portion that out to all of their users without having to always register all of
05:35their users back to, the storage company.
05:37and so it can, both lower the amount of interaction that they have to do
05:41with, you know, registering all of these different people, but it also
05:44means that they can scale up their service really nicely. So as long as
05:48they know about the root signature.
05:50They can scale horizontally, very, very easily or interact with other teams very
05:55easily by just issuing them certificates.
05:56So, like, people are doing that kind of thing,
05:58So, you've mentioned the term capabilities before, and I think
06:01that's also a central part in UCAN.
06:04I'm most familiar with, from my more traditional background of like building
06:08more centralized server applications, et cetera, and how you implement auth is
06:13always very, very dependent on the kind of application that you want to build.
06:17if you want to start out a bit more easily, then you could maybe lean
06:21on some of the primitives that a certain technology or platform is
06:25giving you, maybe using Postgres and use sort of like the, role based
06:29access control patterns that you have in Postgres or maybe something
06:33even as off the shelf as Firebase.
06:36Is this sort of like a useful mental model to think about it, that UCAN
06:40gives me similar building blocks? Or how much more fine-granular can
06:45I get with what UCAN offers to me?
06:48Yes, it's a great question.
06:50So, in role-based access control, or any of these access-control-
06:55list-based systems, right,
06:59you put a database that has, you know, a list of users and what they're able to do.
07:05So often their role, are they an admin?
07:07Are they a writer?
07:08Are they a reader?
07:08You know, all of these things.
07:10and, to update that list, you have to go to that database, update that
07:15database, and on every request that you make, you have to check the list.
07:20So sometimes we call this like, it's like having a bouncer at a club.
07:23You know, you show up, you show them your ID.
07:25They check, are you on the VIP list?
07:27And then you're allowed into the club or not. And what those rules are is set by,
07:31you know, that bouncer, right?
07:34These are the only rules, no others.
07:36in a capabilities world, the analogy is often to having, like, a ticket to go
07:40see a movie. So, this last weekend, I went to go see Wicked, it was awesome.
07:45but I bought my ticket online, it showed up in my email, they didn't ID me on
07:49the way in, I just showed them my ticket and they're like, Oh, great, yeah.
07:52Theater 4, you can go in.
07:54so as long as I had that proof with me.
07:57I'm allowed in.
07:58They didn't have to check a list.
07:59There was no central place to look.
08:02Capabilities, are not a new model.
08:05They've existed for some time.
08:07In fact, a big part of the internet infrastructure runs on top of
08:11capabilities as well, or a subset of them.
08:15But it hasn't found its way as much into applications because we're
08:18so used to access control lists.
08:20The granularity that you mentioned before is really interesting because,
08:24in the capability system, anytime I make that delegation to somebody else,
08:28I say, you're allowed to use this thing, or then you go to somebody else
08:31and say, you can also use this thing.
08:33You can grant them the ability to see or to use that,
08:37or fewer capabilities.
08:38So if it was like, here's a terabyte of storage, you could turn around and say,
08:42well, here's only 50 MBs to somebody.
08:44And so you can get as granular as you want, with it.
08:47And, there's never any confusion about who's acting in what way, right?
08:54So in a traditional system, you know, with access control lists,
08:59say you ran a service
09:02between the user and me, and they made a request to you.
09:05Well, they only have a link to you and you only have a link to me.
09:08So when you'd make the request to me, you'd be using your terabyte of storage.
09:12And so there are some cases where that can confuse the resource.
09:16So it's like, oh yeah, you can totally store it, you know, use a terabyte
09:19of storage, even though the actual user shouldn't be able to do that.
09:22With capabilities, we get rid of that completely.
09:25We have this entire chain of custody, basically, of this.
09:28As granular as you want to get, it's very clear on every request,
09:32what that request is allowed to do.
09:34so I think this is going to become really important for things like, LLMs and other
09:38sort of automated agents where you can tell it, Hey, go do things for me, but
09:43not with all of my rights, not as sudo.
09:46Only with, in this scenario for the next five minutes, these things
09:50are what you're allowed to do.
09:51And even if it hallucinates some other intention, those are the
09:54only things it's able to do.
09:55Yeah, I think this is, such an important aspect.
09:59since I think you don't even need to reach as far as giving agency to an
10:06agent, to an AI, but even if you want to go a bit more dumb and a bit more
10:11traditional, if you want to use some off the shelf SaaS service, and, maybe that
10:17thing integrates with your Google account.
10:20Then you also like, you need to give the thing somehow access.
10:23So you do like the, OAuth flow with Google and then it asks you
10:27like, Hey, is it okay that we have access to all of those things,
10:30that we can do all of those things?
10:33And even though Google's already offers some pretty fine granular things there,
10:37often I feel like, Oh, actually I want to make it even more fine granular.
10:42Wait, you're going to have like access to all of my emails.
10:44Can I maybe just give you access to my invoice emails
10:48if this is an invoicing thing?
10:50So I feel like it's both a bit overwhelming to make all of those
10:54decisions upfront, like what should be allowed, both from an application end
10:59user perspective, me using the thing, but then particularly also from like
11:03an application developer perspective.
11:05And, yeah, it feels like a really, really important aspect of using the
11:10app and building, designing the app.
11:12And if that is not, intuitive and ergonomic, then I feel it's going
11:18to, everyone's going to suffer.
The application developer, they're probably just going to wing it, and
11:23that will mean probably too coarse of a granularity for application users, etc.
11:30So I'm really excited that you're pushing forward on this.
11:33maybe also to draw the analogy, between more traditional OAuth
11:38flows and what UCAN is providing.
11:40It's, should I think about UCAN as a replacement for OAuth from like both,
11:46end user perspective, as well as from an application developer perspective?
11:50Yeah exactly.
11:52So the underlying mechanism is different, but we really wanted
11:56it to feel as familiar as possible.
11:59So even the early versions of UCAN used the same token
12:02format and things like this.
12:04We've since switched over, to some more modern formats.
12:08There are problems with JWTs.
12:10but yeah, exactly.
12:11You can think of it as, local-first OAuth is one way of thinking about it, exactly.
12:16Right.
12:17So as an application developer, I need to make up my mind once to
12:22say like, this is what's possible.
12:23This is what is allowed and like define, and then the system then
12:28enforces those rules, but often I, as an application developer get it
12:32wrong and I need to like, either make the rules like more permissive,
12:38or less permissive over time.
12:40And similar to how I might get wrong a database schema and then later need
12:46to do those dreaded database schema migrations, what is the equivalent
12:49of a schema migration, but for UCAN capability definitions, etc.
12:56so all of the information that you need to fulfill a request in UCAN
13:00is contained in the token itself.
13:02So, these days we have a little policy language, think of it a little bit
13:06like SAML, inside the token.
13:08And it says, okay, when you go to actually do something with this token, the action
13:14has to match the following criteria.
13:16you're sending an email.
13:17So the two fields has to only be two people inside of the company.
13:22Or, you can only send, newsletters on Mondays or whatever it is.
13:28Right.
13:28And you can scope that down arbitrarily, syntactically.
13:32So updating those policies is just issuing a new certificate, to say
13:35this is what you're allowed to do now.
13:36and, you know, you can revoke the old ones if that's needed.
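The idea of a token that carries its own conditions can be sketched like this. This is a hypothetical mini policy language, not UCAN's actual syntax; the `Policy` shape, `allows` function, and field names are all invented for illustration, and a real token would encode predicates as data rather than as functions.

```typescript
// Hypothetical mini policy check: the capability itself carries conditions
// that every invocation must satisfy. Not UCAN's real policy syntax.

type Action = { command: string; args: Record<string, unknown> };

type Policy =
  | { op: "eq"; field: string; value: unknown }                   // arg equals value
  | { op: "every"; field: string; pred: (v: unknown) => boolean } // all list items pass
  | { op: "and"; clauses: Policy[] };                             // all clauses pass

function allows(policy: Policy, action: Action): boolean {
  switch (policy.op) {
    case "eq":
      return action.args[policy.field] === policy.value;
    case "every": {
      const v = action.args[policy.field];
      return Array.isArray(v) && v.every(policy.pred);
    }
    case "and":
      return policy.clauses.every((clause) => allows(clause, action));
  }
}

// "You may send email, but only to addresses inside the company."
const emailPolicy: Policy = {
  op: "every",
  field: "to",
  pred: (addr) => typeof addr === "string" && addr.endsWith("@example.com"),
};

allows(emailPolicy, { command: "msg/send", args: { to: ["a@example.com"] } }); // true
```

Updating the policy is then just issuing a new certificate carrying a new `Policy` value, as described above.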
13:40But I think the more interesting part of this actually is on the far other end.
13:44So we were talking about, you know, the developer sets these policies.
13:47And that's true, I would say, the majority of the time.
13:50But it's not very, It doesn't respect user agency, right?
13:55You're giving the developer all of the agency, but the user's the one
13:58who owns whatever, let's say that it's a text editing app, right?
14:02You know, so they own the document.
14:04Why can't they decide, you know, when they share with somebody else what they
14:07should be able to do with that document?
14:09So in, say, you know, Google Docs, you've got that little share button in the top
14:12corner, and then it says, you know, invite people, and then you can say, well, they're
14:15an editor, and this person is, you know, an admin, and this is another viewer.
14:19This person can only comment.
14:21I think the UI,
14:22you know, will usually stay like that, but you could add whatever
14:26options you wanted in there, right?
14:28Why not?
14:29So when we were doing, back at Fission, the file system work, you could scope
14:33down to say, like, well, you're allowed to write into only this directory, for
14:37example, and that was very, very flexible.
14:39Or, you're allowed to write files under a certain size limit, right?
14:43And so the user now can make these decisions of like, I'm giving
14:46you access to my file system.
14:48I only want you, you know, maybe I'm, you know, I'm thinking back to my school days,
14:52you know, a teacher and they're having students submit, assignments to them.
14:55Well, you can only submit them to this one directory and I don't
14:59want you filling up my entire disk.
15:00So they have to be under a gigabyte or whatever, right?
15:04And so you can imagine scenarios like this, where we're now inviting
15:07the end user to participate in what should the policy be.
15:11It's not all set completely.
15:13The developer can absolutely set it in advance, but you can
15:16also then refine it further and further, for the user's intention.
15:19Right.
15:19I love that.
15:20Since particularly now, with like LLMs and AIs in general, a non-technical
15:26user can now just in the way how they would say to another person, like,
15:30Hey, I want to give Alice access to this file, but Alice is only allowed
15:36to like read the first page here.
15:38The second two pages, those are like my private notes.
15:42Please don't give anyone access to this.
15:44You know what?
15:44Like actually Alice is allowed to also like comment on it.
15:48Like just from like a, a very like colloquial sentence like that,
15:52a computer can now derive those capabilities very accurately,
15:56represent them to the user, like, Hey, does this look right to you?
16:00And, leveling up the entire application user experience.
16:04so it's very reassuring to me that all of this is built on top of very sound
16:10cryptography, however, even though I've studied computer science and like
16:14I have done my cryptography classes.
16:17That being said, I have, that's not my day to day thing.
16:20And as an application developer, I'm trying to steer away from like low
16:25level cryptography things as much as possible, just because I don't
16:28consider myself an expert in this.
16:31So it's very good to know that everything on that is built on top of very solid
16:36cryptography, but how much as an application developer, how much do I
16:40need to deal with like signing things, et cetera, or how much of that is
16:45abstracted from what I'm dealing with?
16:47Yeah.
16:48So I would say that there's two layers here that people
16:52correctly find scary, myself included, right?
16:56cryptography and auth in general, both super scary topics.
16:59I remember, you know, as a web dev, whatever, 10 years ago, adding, in a
17:04web app, you know, the auth plugin and kind of going, if I don't
17:09touch it, hopefully it'll work, right?
17:11really the goal with all these projects was to hide as much of the scary
17:15complexities in there as possible.
17:18So we handle all of the encryption and signing and all of this
17:21stuff in a way that should make it, if we do our job well.
17:24Completely invisible, to the developer.
17:27So even, you know, we haven't talked about Beehive very much.
17:29Beehive, which is this project I'm doing at Ink & Switch
17:33to add access control to Automerge,
17:36has both an encryption side, so that's read controls, and then capabilities for
17:41these mutations, or write controls.
17:44and for encryption, there's a bunch of things that have to happen.
17:48We have to serialize things in an efficient way.
17:51We have to chunk them up.
17:52We have to make sure that we share the encryption key with everyone,
17:57but nobody else, right?
17:59And that could be thousands of people, potentially, and we've set
18:02ourselves these goals of, you know, you should be able to run
18:05this inside of a large organization or a medium-sized organization.
18:08how do you do all that stuff efficiently?
18:09And our goal is you should be able to say, add these people, and it just works.
18:16You do all your normal Automerge stuff, and on, you know, when you
18:19persist to disk, or when you send it out to the network, then it gets
18:22encrypted, then it gets secured, then it gets signed, all of this stuff.
18:25And you don't have to worry about any of it.
18:27when you set up Beehive, it generates keys, it does all the key
18:31management for you, it does all of the key rotation, all of this stuff.
18:35so, again, it's one of these things where it's like, I'm really excited about this.
18:40and it's like super cool to get to work on.
18:42And there's a lot of interesting detail on the inside, but in an ideal
18:47world, nobody has to think about this other than I want to grant these
18:50rights to these people and everything else is taken care of automatically.
18:54I love that.
18:55So you've mentioned initially that UCAN happened as a project while you were
19:01working on various projects at Fission.
19:05and right now you're mostly focused on Beehive.
19:08So can you share a bit more, what was the impetus for Beehive coming
19:14into existence and then going into what Beehive is exactly?
19:19absolutely.
19:20So, you know, we started UCAN very, very early in 2020. It came out of
19:26normal, regular product requirements of, like, oh, well, we probably want
19:30everyone to read this document.
19:32How do we do that?
19:33Or I don't want somebody to fill up my entire disk.
19:36How do we prevent that?
19:37And, that went through a bunch of iterations and we, we had a lot
19:40of learnings come out of that.
19:42I'd say that really the big one was in a traditional app stack, you have data
19:47at the bottom, you know, you have to say Postgres and that's your source of truth.
19:49And then above that, you have some compute, maybe you're running,
19:52whatever, Express.js, or Rails, or Phoenix, or, you know, one of these.
19:57And then on top of that, you put in an Auth plugin, right, that uses all
20:02the facilities of everything below it.
20:04but that requires that you have a database that has all this information
20:10in it that lives at a location.
20:11We call this, internally at Ink & Switch, auth-as-place.
20:15Right?
20:15Because your auth goes to somewhere, right?
20:18And on every request, you present your ID, they go, okay, sure, you know, here's
20:22a temporary token, then you hand that to the application, the application
20:25checks with the auth, you know, server again, and you do this whole loop.
20:28And that has, you know, problems with latency, if you go offline,
20:32this doesn't work, and it doesn't scale very well, right?
20:34Like, even Google ran into problems with this and started,
20:37adjusting their auth system.
20:38We found at Fission, and I think this very much holds true, like
20:42we just kept learning this over and over again, that you can't rely on that system.
20:47In fact, auth has to go at the bottom of the stack.
20:50your auth logic and the auth, the thing that actually does the guarding of your
20:55data has to move with the data itself.
20:57So we call this "auth as data".
20:59So for read control, it's no longer, oh, I'm making a request to a web server and
21:04they may or may not send something to me.
21:05It's, I've encrypted it.
21:06Do you have the key?
21:08Yes or no.
21:09If you have the key,
21:10you can read it.
21:11If you don't, you can't.
21:11And it doesn't matter where you are.
21:14You could be on a plane disconnected from the internet.
21:16You can decrypt the data, right?
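The "auth as data" read control described here can be illustrated with a few lines of Node's built-in crypto. This is only a sketch of the core idea: Beehive's actual scheme involves key agreement, rotation, and chunking. The point that survives is that whoever holds the key can read, wherever they are, and no server is consulted.

```typescript
// Sketch of "auth as data" read control: being able to read means holding
// the key, not appearing on a server-side list. Uses Node's AES-256-GCM.
import { randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

function encrypt(key: Buffer, plaintext: string) {
  const iv = randomBytes(12); // fresh nonce per message
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function decrypt(
  key: Buffer,
  box: { iv: Buffer; ciphertext: Buffer; tag: Buffer }
): string | null {
  try {
    const decipher = createDecipheriv("aes-256-gcm", key, box.iv);
    decipher.setAuthTag(box.tag);
    return Buffer.concat([decipher.update(box.ciphertext), decipher.final()]).toString("utf8");
  } catch {
    return null; // wrong key: the data simply stays unreadable, no server says no
  }
}

const docKey = randomBytes(32);      // shared with collaborators
const strangerKey = randomBytes(32); // everyone else

const box = encrypt(docKey, "meeting notes");
decrypt(docKey, box);      // "meeting notes", even fully offline
decrypt(strangerKey, box); // null
```

GCM's authentication tag is what makes the wrong-key case fail cleanly instead of producing garbage.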
21:19So we developed these ideas with UCAN and the web native file system
21:23in particular. Fission unfortunately didn't make it earlier this year,
21:27or, I'm not sure when this will be released, early in 2024.
21:31and, Ink & Switch reached out.
21:32So we've known those folks for a while, 'cause we've been, you
21:34know, obviously working in the same space for a while, and PVH, the lab
21:39director, was actually an advisor at Fission, and said, Hey, we have a bunch
21:42of people that are interested in getting, auth for Automerge in particular.
21:48could you apply UCAN and WNFS to Automerge?
21:53And I said, I don't see why not.
21:55Right.
21:56and so we, we looked at it, a little bit deeper and went, well,
21:59yes, like we, we could use these things directly, but they're tuned
22:02for slightly different use cases.
22:04UCAN is extremely powerful.
22:06It's very flexible.
22:07and it has a bunch of stuff in it for this, you know, network
22:10layer, in addition to CRDTs.
22:13You pay for that in space, right?
22:15The certificates get a little bit bigger.
22:17And so we said, well, okay, maybe, you know, we want these
22:20documents to be as small as possible.
22:23You know, there's been a lot of work in Automerge to do compression, right?
22:26Really, really, really good compression on them.
22:28So the documents are tiny and, you know, you're not going to get that with UCAN.
22:32So could we take the principles and the learnings from UCAN and
22:35WNFS and apply them, to Automerge?
22:37And so ultimately that's what we've done.
22:41And there are a couple of different requirements that
22:43have come out of it as well.
22:44So it's tuned for a slightly different thing.
22:46But essentially, Beehive says, what if we had end to end encrypted?
22:50So in the same way that, you know, say, Signal, end to end encrypts your chats.
22:55What if I had end to end encrypted documents?
22:58That only certain people could write into, and I can control who can write into them.
23:03Has there been any prior art in regards to CRDTs to fulfill those sort of
23:09like end user driven authentication authorization requirements?
23:14there's some, some nearer term stuff that was also exploring things with CRDTs.
23:19But, you know, if you go really, really, you know, further back,
23:22there's, uh, the Tahoe least authority file system, for example,
23:27which was, you know, this encrypted at rest, file system capabilities
23:30model, you know, whole, whole thing.
23:32Mark Miller was doing capabilities based off going back into, you know, uh, The
23:38late 90s, there's capability stuff that goes even further back, but he's, he's,
23:41you know, really did the, the work that everybody points at, in, in the stuff.
23:44But for CRDTs and for a local-first context where we don't assume at
23:48all, like there's no server in the middle whatsoever, we may have been
23:54the first to do this at Fission.
23:55It's, it's possible.
23:56I mean, when we got started, the local-first essay hadn't
23:59even been published, right?
24:00We were doing local-first without, without the term.
24:02but there was a bunch of others in the space.
24:04So, Serenity Notes has done related work, Matrix, Signal, obviously has done
24:09a bunch of the end to end encryption stuff, and, local-first to auth, is a,
24:13a project that has also worked with, Automerge, to do similar things.
24:17so most of these projects, showed up, after the fact.
24:20but yeah, so we're drawing from, in fact, we've talked to, all these
24:23people and all of the fantastic work that they've done over the past few
24:26years, and, collected the learnings, from them into, into Beehive.
24:31That's awesome.
24:32I would love to get a better feeling for what it would mean
24:35to build an app with Beehive.
24:38My understanding is that Beehive right now is very centric around Automerge.
24:42However, it is designed in a way that over time, other CRDT systems,
24:48other sync engines, et cetera could actually embrace it and integrate
24:52it into their specific system.
24:54I would like to get into that in a moment as well, but zooming into
24:58the Automerge use case right now, let's say I have already built a
25:02little side project with Automerge.
25:04I have like some Automerge documents that are happily syncing the
25:09data between my different apps.
25:11so far I've maybe.
25:13Put the entire thing, maybe I don't even, have any auth fences around it at all.
25:19hopefully no one knows the end point where all of my data lives.
25:22And if so, okay.
25:24It's like not very sensitive data.
25:26or maybe I'm running all of that behind like a tail scale network or something
25:30like that, which I think in a lot of use cases, simpler use cases, this can also
25:34be a very pragmatic approach, by the way.
25:37when you can run the entire thing already, like, in a fully secured frame of, like,
25:44a guarded network, and you're just going to run this for yourself
25:47or like in your home network or for your family and you're all on like the
25:51same, tail scale wire guard network.
25:54I think that's also a very pragmatic approach.
25:56but, let's say I want to build an app that I can share more publicly on the
26:01internet, where maybe I want to build a TLDraw like thing where I can send over
26:06a link where people can read it, but they need to have special permissions to
26:11actually also write something into it.
26:14I want to build the thing with Automerge.
26:16What does my experience look like?
26:18Yeah.
26:19there are, I would say two parts to that question, right?
26:22One is, I have an existing documents.
26:25how do I migrate it in?
26:27And, you know, could I use it with something, you know, you alluded to
26:30other, other systems, in, in the future.
26:33And what does the actual experience building something
26:36with Beehive look like?
26:38So Beehive is still in progress.
26:40we're planning to have a first release of it, uh, in Q1.
26:44And, you know, we're currently going at this with the viewpoint
26:47that, like, adding any auth is better than not having auth right now.
26:50So, like, there's definitely further work where we want to really
26:54polish off the edges of this thing, but getting anything into people's hands is
26:57better than not having it, right?
27:00And there are some changes that we need to make to Automerge because,
27:04as I mentioned before, you know, auth lives at the bottom of the stack, so
27:08anything above in the stack needs to know something about the things below.
27:12Auth being at the bottom means that if you wanna do, in particular, mutation
27:15control, Automerge needs to know about how to ingest that mutation.
27:18So we do need to make some small changes to Automerge to, to make this work.
27:22But the actual experience is, we're bundling it directly into Automerge,
27:26or the current plan at least is we're bundling it directly into the Automerge
27:30Wasm, and then exposing a handful of functions on that, which is add
27:36member at a certain authority level,
27:40remove member,
27:41and that's it.
27:42so your experience will be, we're going to do all the key management for you,
27:46behind the scenes, under the hood.
27:48if you have an existing document, it'll get serialized and encrypted
27:53and put, you know, into storage.
27:56And you can add other people to the document.
27:58By inviting them using add member or remove member from that document.
28:03Maybe also worth noting, this gives you a couple extra concepts to work with.
28:08So today we have documents, and you can have a whole bunch of them, and
28:11they're really independent pieces, right?
28:14And maybe they can refer to each other by, you know, an Automerge URL.
28:17instead, or in addition, I should say, not instead, you want to be able
28:22to say, I'm building a file system.
28:24If I give you access to the root of the file system, you should have access to
28:27the entire file system.
28:28I don't want to have to share with you every individual thing.
28:32So we have this concept of a group.
28:34so you have your individual device, you have groups, and you have documents.
28:39Each individual device, under the hood, you don't have to worry about
28:43this specific detail, but each has its own key.
28:45So it's uniquely identifiable.
28:48Somebody steals your phone, you can kick your phone out of the group, right?
28:52Or out of the document and that, that's fine.
28:54then we have groups.
28:55So let's say that I have a group for everyone at Ink & Switch.
28:59and then I can add everybody to that, but it doesn't have
29:01a document associated with it.
29:03It's purely just a way of managing people and saying, I want to add
29:07everybody in this group to this document.
29:10Right?
29:11And so you can have groups contain users and other groups.
29:15Then you have documents, which are groups that have some
29:18content associated with them.
29:19So I say on this document, here's who's allowed to see it.
29:21So it could be individuals or other groups or other documents.
29:25Other documents is interesting because I can say then you have
29:28access to this document, this document represents a directory.
29:31And so you also have access to all of its children, right?
29:33In a, in a file system, you can do things like this.
29:36So Add member, remove member becomes very, very powerful because now you can
29:40have groups and, you know, set up these, hierarchies of, here's all of my devices.
29:46All of my devices sit in a group of Brooke's devices.
29:49All of Brooke's devices should be added to Ink & Switch, and Ink
29:53& Switch has the following documents.
29:54And then, you know, whenever one of my contracts finishes and I get
29:57kicked out of Ink & Switch, then they can kick all of my devices out
30:00by, by revoking that group, right?
30:04So using, Beehive is going to feel like that.
30:07It's going to say, yeah, I know about the ID for Brooke's devices.
30:11Please add her or, you know, contract finishes, please remove her.
30:15all of the rest of the stuff should be completely invisible to you.
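The device, group, and document hierarchy described here can be modeled as a small membership graph where access resolves transitively. This is an illustrative sketch only, not Beehive's real API; all names are made up.

```python
# Hypothetical sketch of the device/group/document hierarchy described above.
# Not Beehive's real API; all names are made up for illustration.

class Group:
    def __init__(self, name):
        self.name = name
        self.members = set()  # device ids (str) or nested Group objects

    def add_member(self, member):
        self.members.add(member)

    def remove_member(self, member):
        self.members.discard(member)

    def can_access(self, device_id):
        # resolve membership transitively through nested groups
        # (assumes no membership cycles, for brevity)
        for m in self.members:
            if m == device_id:
                return True
            if isinstance(m, Group) and m.can_access(device_id):
                return True
        return False

# All of Brooke's devices sit in one group; that group is itself a
# member of the Ink & Switch group. Revoking the device group from
# Ink & Switch kicks every device out in a single call.
brookes_devices = Group("brookes-devices")
brookes_devices.add_member("brooke-phone")
brookes_devices.add_member("brooke-laptop")

ink_and_switch = Group("ink-and-switch")
ink_and_switch.add_member(brookes_devices)
```

A single `ink_and_switch.remove_member(brookes_devices)` then revokes every device at once, which is the "contract finishes, kick all my devices out" scenario.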
30:19So when you persist things to disk or you send them to a sync server,
30:23that all gets encrypted first.
30:24And even the sync servers have permission.
30:28There's a permission level in here of, you're allowed to ask for the,
30:33the bytes off, from another node.
30:35And they can prove that because you have these certificates under the hood, right?
30:40because, and this is an uncomfortable truth, all cryptography is breakable.
30:44So in 10 years, maybe they break all of our current ciphers.
30:48Right?
30:48It could happen.
30:49In fact, older ciphers are already, you know, broken.
30:52Or maybe quantum computing gets very, very advanced, and it becomes
30:56practical to break keys, right?
30:58Whatever it is.
30:58Or there's an advancement on the discrete log problem, or whatever the thing is, right?
31:03You know, we have some mathematical advance, and it gets broken.
31:05the best thing to do, then, is to just not make those bytes available.
31:10Make the encrypted content only pullable by people that you trust.
31:13And yes, somebody could break into the sync server, let's
31:17say, and download everything.
31:18But that's a much higher bar than "anybody can download", where
31:21anybody on the internet can download whatever chunk they want, right?
31:23But all of that is handled really for the developer to say, this is
31:26the sync server, sync server has the ability to pull down these documents.
31:30Or even the user could say, I want to sync to this sync server, I'm going
31:34to grant that sync server access to my documents to replicate them.
31:37But really, we're trying to keep the top level API for this
31:41as boring as possible, right?
31:43That is a top line goal.
31:45Add member, remove member, and the sync server is just
31:48another member in the system.
31:51Got it.
31:52So in terms of the auth as data, that, that mental model, that's very intuitive.
31:58And, as you're like rewiring your brain as an application developer, like how
32:02data flows through the system, now to understand that, like everything that's
32:07necessary to make those auth decisions, should someone have access to, to read
32:12this, to like write this, et cetera, that this is just data that's also being
32:17synchronized, across the different nodes.
32:20That is very intuitive.
32:22is this something that in this particular case, at least with Beehive and Automerge,
32:27is this purely an implementation detail?
32:29And this is like your internal mental model of this data, or is this actually
32:34data that is available somehow to the application developer that the application
32:38developer would work with that as they work with the normal Automerge documents?
32:43Yeah.
32:44So, Again, we're trying to hide these details as much as possible.
32:48So, you'll hear me talking about things like add member or groups, right?
32:52And that sounds very access control list like.
32:56capabilities are, like there's a formal proof of this, are more powerful.
33:00Like they can express more things than access control lists.
33:03So at least for this first revision, we've restricted ourselves down
33:06to making things look like access control lists, on the outside.
33:11and so it should feel very, very similar to doing things with role
33:15based access control using, say, OAuth.
33:20That should all feel totally normal.
33:23You shouldn't really have to think about it in any special way.
33:25In the same way that, you know, if you have a sync server, other than having
33:28to set up the sync server, or maybe you pointed at an existing one, knowing that
33:32it's there doesn't mean that you have to, like, design it from first principles.
33:35Or, you know, same thing with Automerge.
33:38Technically, you have access to all of the events.
33:41But really you're going to materialize a view and treat it like it's JSON.
33:45And so we're saying the same thing here with Beehive: you will automatically
33:50get only the data that you can decrypt and that you're allowed to receive from
33:54others. So, essentially, Beehive takes things off the wire, decrypts
34:00it, and hands it to Automerge, and then Automerge does its normal Automerge stuff.
34:03The one wrinkle is if an old write has been revoked, so it turns out
34:07that somebody was, like, defacing the document and doing all this horrible
34:10stuff, and we had to kick them out, we have to send it to Automerge,
34:13Hey, ignore this run of changes.
34:15And then it has to recalculate.
34:17So that's the one change that we have to make inside of Automerge.
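The "ignore this run of changes" recompute can be pictured with a toy model: rebuild the materialized view from the op log with the revoked author filtered out. This is my own illustration; Automerge's real change graph and recompute are much richer.

```python
# Toy model of what "ignore this run of changes" means: the materialized
# view is rebuilt from the op log with the revoked author filtered out.
# Illustration only; Automerge's actual revocation handling is richer.

def materialize(ops, revoked=frozenset()):
    # ops: (author, field, value) triples in their linearized order
    doc = {}
    for author, field, value in ops:
        if author in revoked:
            continue  # skip changes from members who were kicked out
        doc[field] = value
    return doc

log = [("alice", "title", "Notes"),
       ("mallory", "title", "DEFACED"),
       ("alice", "body", "hello")]
```

With nobody revoked, Mallory's defacement wins; after revoking Mallory, replaying the same log restores Alice's title.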
34:19but really you will use Automerge as normal.
34:22you will have an extra API that is add this person to this document or
34:25to this group, and remove them, right?
34:28As needed.
34:29And you shouldn't have to think about any of these other
34:31parts, even the sync server.
34:33Like, Alex Good, who's the, the main maintainer of, of Automerge,
34:37has been working on, on sync and improving sync.
34:41and that project started around the same time as Beehive and we realized,
34:44Oh, there's actually this challenge because we're, you know, on the
34:47security side, trying to hide as much information from the network as possible,
34:50including from the sync server, right?
34:52Sync server shouldn't be able to read your documents.
34:54To do efficient sync, you want to have like a lot of information about the
34:56structure of the thing that you're syncing so that you have no redundancy.
34:59Right?
35:00And you can do it in a few round trips, all of this stuff.
35:02So we ended up having to co design and essentially, like, negotiate
35:06between the two systems, like, how, how much information can we
35:09reveal, and still have it be secure?
35:11And given that you can't read inside the documents, like, how do we
35:15package things up in an efficient way?
35:17But again, none of that information should be a concern for a developer
35:22in the same way that the sync system right now, you don't really interact
35:24with the sync system, other than you say, that's my sync server over
35:26there and the bytes go over there.
35:28There's an extra layer now of, it gets encrypted first
35:31before it goes over the wire.
35:32That makes sense.
35:33I think as an application developer, there's typically sort
35:36of this two pronged approach.
35:39There is like, You, on the one hand, you ideally, you want to embrace
35:43that things are hidden from you.
35:45That you don't need to understand them to use it correctly, et cetera.
35:49But particularly if something's new, some, maybe you're like an
35:52early adopter of the technology.
35:54you would like to figure out like, what are the worst case scenarios?
35:57Maybe the thing is no longer being developed.
35:59Could I take it over and like, can I become a contributor or maintainer
36:03of, of that, or you'd still like to understand it for the sake of
36:08really understanding: is this
36:11the thing that I want?
36:13and just by like understanding how it works, you can come to the right
36:16conclusion, like, is this for me or not, particularly if it's not
36:19yet as well documented, et cetera.
36:21So, channeling our inner application developer,
36:27I'd like to understand a bit better how Beehive, and in that regard
36:32also the sync server, works under the hood.
36:35Like, it's hard enough to build a syncing system.
36:38and now, you build an authorization layer on top of it.
36:42What sort of implications does this have for the sync server?
36:46And my understanding is that Alex Good is working on this and I think
36:50this has been semi public so far.
36:52And that there's like a, you know, like a sibling product or a sibling
36:56project, next to Beehive called Beelay, which I guess like relays
37:01messages in the Beehive system.
37:03And I think that's a step towards what eventually, we're all dreaming about as
37:09like a generic sync server that ideally is compatible with like as many things
37:14as possible, I guess, at the beginning for Automerge, but also beyond that.
37:19So what is Beelay?
37:21What are its design goals and how does it work?
37:25So Beelay, has a requirement that it has to work with, encrypted chunks.
37:30So, you know, we do this compression and then encryption, on top of it,
37:34and then send that to the Sync Server.
37:36The Sync Server can see the membership, because it has to know who it
37:39can send these chunks around to.
37:41So the Sync Server does have access to the membership
37:44of each doc, but not the content of the document.
37:47so if you make a request, it checks, you know, okay, are you somebody
37:50that, has the, the rights to, to have this sent to you, yes or no,
37:53and then it'll send it to you or not.
37:55And this isn't only for sync servers, you know, if you connect to somebody,
37:58you know, directly over Bluetooth, you know, you'd do the same thing, right?
38:01Even if, you know, you can both see the document.
38:04There's nothing special here about sync servers.
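The behavior just described, a relay that stores opaque encrypted chunks and serves them only to principals in the document's membership, can be sketched like this. The names and shapes are hypothetical, not Beelay's actual interface.

```python
# Hypothetical sketch of a membership-gated relay, not Beelay's actual
# interface. The server stores opaque encrypted chunks plus a membership
# list per document, so it can gate requests without decrypting anything.

class Relay:
    def __init__(self):
        self.chunks = {}      # doc_id -> list of encrypted byte chunks
        self.membership = {}  # doc_id -> set of allowed public keys

    def store(self, doc_id, chunk, members):
        self.chunks.setdefault(doc_id, []).append(chunk)
        self.membership[doc_id] = set(members)

    def fetch(self, doc_id, requester_key):
        # serve the bytes only if the requester is in the doc's membership
        if requester_key not in self.membership.get(doc_id, set()):
            raise PermissionError("not a member of this document")
        return self.chunks[doc_id]

relay = Relay()
relay.store("doc-1", b"ciphertext-chunk", members={"alice-pk", "sync-server-pk"})
```

Note the sync server itself appears as just another public key in the membership, which matches the "the sync server is just another member" framing earlier.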
38:06To do this sync, well, we're no longer syncing individual ops, right?
38:10Like, we could do that, but then we lose the compression.
38:13It's not great, right?
38:15And ideally, we don't want people to know, you know, if somebody were to
38:19break into your server, hey, here's how everything's related to each other, right?
38:22Like, that compression and encryption, you know, also hides
38:25a little bit more of this data.
38:27We do show the links between these, you know, compressed chunks, but
38:30we'll, we'll get to that in a second.
38:32Essentially what we want to do is chunk up the documents in such a way where,
38:38there's the fewest number of chunks to get synced, and the longer ranges that
38:43we have of, you know, Automerge ops that get compressed before we encrypt, right?
38:48On the, I'll call it client.
38:50It's not really a client in a local-first setting, right?
38:52But, like, on the not-sync-server side, when you're sending it to the sync server.
38:55the more stuff that you have, the better the compression is.
38:58And chunking up the document here means basically, you're really
39:02chunking up the history of operations that then get internally rolled up
39:07into one snapshot of the document.
39:09And that could be very long.
39:11And, there's room for optimization.
39:14That is like the, the compression here, where if you set a ton of times, like,
39:19Hey, the name of the document is Peter.
39:22And later you say like, no, it's Brooke.
39:24And later you say, no, it's Peter.
39:26No, it's Johannes.
39:28Then you, you can like compress it into, for example, just the latest operation.
39:33Yeah, exactly.
39:34So, you know, if you want to think about how this, you know, to get, to get more
39:37concrete, you know, if you take this slider all the way to one end and you take
39:40the entire history and run length encoded, you know, do this Automerge compression,
39:45you get very, very good compression.
39:47If we take it to the far other end, we go really granular.
39:50Every op, doesn't get compressed, but you know, so it's just like each individual
39:55op, so you don't get compression.
39:56So there's something in between here of like, how can we chop up
39:59the history in a way where I get a nice balance between these two?
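The "Peter, Brooke, Johannes" example above boils down to collapsing a run of overwrites to the single surviving assignment. Here's a toy illustration of that intuition; Automerge's real column-oriented run-length encoding is far more involved.

```python
# Toy illustration of the compression idea: a run of overwrites to the
# same field collapses to the last write. Automerge's real run-length
# encoding is far more involved; this just shows the intuition.

def compress_overwrites(ops):
    # ops: (field, value) assignments in their linearized order
    latest = {}
    for field, value in ops:
        latest[field] = value  # a later write shadows earlier ones
    return list(latest.items())

history = [("name", "Peter"), ("name", "Brooke"),
           ("name", "Peter"), ("name", "Johannes")]
```

Four ops compress down to one, which is why longer stable ranges of history compress so well before encryption.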
40:04When Automerge receives new ops, It has to know where in the history to place it.
40:10So you have this partial order, you know, you have this, you
40:12know, typical CRDT lattice.
40:14And then, we put that, or it puts it into a strict order.
40:18It orders all the events and then plays over them like a log.
40:21And this new event that you get, maybe it becomes the first event.
40:24Like you could go way to the beginning of history, right?
40:26Like you, you don't know because everything's eventually consistent.
40:29So if you do that linearization first and then chop up the documents,
40:34you have this problem where,
40:36if I do this chunking, or you do this chunking, well, it really depends
40:39on what history we have, right?
40:41And so it makes it very, very difficult to have a small amount of redundancy.
40:46So we found, two techniques helped us with this.
40:49One was, we take some particular, operation as a head and we
40:55say, ignore everything else.
40:56Only give me the history for this operation.
40:58Only its strict ancestors.
41:00So even if there's something concurrent, forget about all of that stuff.
41:04So that gets us something stable relative to a certain head.
41:08And then to know where the chunk boundaries are, we
41:13run a hash hardness metric.
41:15So, the number of zero bits at the end of the hash for each op gives
41:20you a knob: you can basically say,
41:23any op can be a boundary, so I'm happy with anything.
41:28Or, if I want chunks with an average size of, you know, 4, then give me two 0s at the
41:32end, because 2 to the power of 2 is 4, so I'll chunk
41:35it up into runs of about 4, and you make this as big or as small as you want, right?
41:38So now you have some way of probabilistically chunking up the
41:41documents, relative to some head.
41:44And you can say how big you want that to be based on this hash hardness metric.
41:47the advantage of this is even if we're doing things relative to
41:51different heads, now we're going to hit the same boundaries for these
41:54different, hash hardness metrics.
41:56So now we're sharing how we're chunking up the document.
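This hash-hardness trick is a form of content-defined chunking. A minimal sketch, using SHA-256 and a trailing-zero-bits rule as my own stand-ins rather than Beelay's actual parameters: an op ends a chunk when its hash has at least k trailing zero bits, so larger k gives bigger chunks (expected size 2**k) with no coordination needed.

```python
import hashlib

# Sketch of probabilistic, hash-based chunk boundaries ("hash hardness").
# SHA-256 and this exact rule are illustrative stand-ins, not Beelay's
# real parameters.

def hardness(op: bytes) -> int:
    # number of trailing zero bits in the op's hash
    h = int.from_bytes(hashlib.sha256(op).digest(), "big")
    count = 0
    while h % 2 == 0 and count < 256:
        h //= 2
        count += 1
    return count

def chunk_ops(ops, k):
    # split the linearized op history after every op whose hardness >= k
    chunks, current = [], []
    for op in ops:
        current.append(op)
        if hardness(op) >= k:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks
```

Because a boundary depends only on each op's own hash, two peers chunking relative to different heads still agree on boundaries wherever their histories overlap, which is the property the conversation is after.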
41:59And we assume that on average, not all the time, but on
42:04average, older operations will have been seen by more people.
42:08Or, you know, by more and more peers.
42:11So, you're going to be appending things really to the end of the document, right?
42:17So you, you will less frequently have something concurrent with the
42:20first operation using this system.
42:22That means that we can get really good compression on older operations.
42:28Let's take, I'm just picking numbers out of the air here, but let's take
42:30the first two thirds of the document, which are relatively stable, compress
42:34those, we get really good compression.
42:36And then encrypt it and send it to the server.
42:38And then for the next, you know, of the remaining third, let's take the
42:42first two thirds of that and compress them and send them to the server.
42:46And then at some point we get each individual op.
42:48This means that, as the, the document grows and changes,
42:52we can take these smaller chunks and, as that gets pushed further and further into
42:56history, whoever can actually read them can recompress those ranges.
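The "first two-thirds, then two-thirds of the rest" scheme can be sketched as a geometric layering of the op log. The 2/3 ratio here is just the number picked out of the air in the conversation, not a real parameter of the system.

```python
# Rough sketch of the layering idea: take the oldest two-thirds of the
# history as one big (well-compressed) chunk, then two-thirds of what's
# left, and so on down to individual ops. Illustration only.

def sediment_layers(n_ops, ratio=2 / 3):
    layers, start = [], 0
    while n_ops - start > 1:
        size = max(1, int((n_ops - start) * ratio))
        layers.append((start, start + size))  # half-open range of ops
        start += size
    if start < n_ops:
        layers.append((start, n_ops))
    return layers
```

A fresh peer pulls the big old layer first and then progressively smaller ones; a peer that already has the old layers only needs the thin recent ones.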
43:02So, Alex has this, I think, really fantastic name for this, which is
43:06sedimentree, because it's almost acting in sedimentary layers, but it's sedimen-tree
43:12because you get a tree of these layers.
43:14Yeah, it's cute, right?
43:15and so if you want to do a sync, like let's say you're doing a sync
43:18of like completely fresh, you've never seen the document before.
43:21You will get the really big chunk, and then you'll move up a layer,
43:25and you'll get the next biggest chunk of history, and then you move
43:27up a layer, and then eventually get like the last couple of ops.
43:30So we can get you really good compression, but again, it's this
43:32balance of the these two forces.
43:35Or, if you've already seen the first half of the document, you
43:38never have to sync that chunk again.
43:39You only need to get these higher layers of the sedimentary sync.
43:44So that's how we chunk up the document.
43:46Additionally, and I'm not at all going to go into how this thing works,
43:49but if people are into sync systems, this is like a pretty cool paper.
43:53It's called Practical Rateless Set Reconciliation.
43:57And it does really interesting things with compressing all the information
44:02you need to figure out what the other side has.
44:04So in half a round trip, so in one direction on average, you can get all
44:09the information you need to know what the delta is between your two sets.
44:13Literally, what are, what's the handful of ops that we've diverged by without
44:18having to send all of the hashes?
44:20so if people are into that stuff, go check out that paper.
44:22It's pretty cool.
44:23but there's a lot of detail in there that we're not, we're not
44:25going to cover on this podcast.
44:26Thanks a lot for explaining.
44:29I suppose it's like, Just a tip of the iceberg of like how Beelay works,
44:33but I think it's important to get a feeling for like, this is a new world
44:37in a way where it's decentralized, it is encrypted, et cetera.
44:42There's like really hard constraints what certain things can do since you could
44:47say like in your traditional development mindset, you would just say like, yeah,
44:52let's treat the client like it's just like a, like a Kindle, with like no
44:56CPU in it, and let's have the server do as much of the heavy lifting as possible.
45:01I think that's like a, the muscle that we're used to so far.
45:04But in this case, the server, even if it has a super beefy machine, et cetera, it
45:11can't really do that because it doesn't have access to do all of this work.
45:15So the clients need to do it.
45:17And, and when the clients independently do so, they need to
45:21eventually end up in the same spot.
45:23Otherwise the entire system, falls over or it gets very inefficient.
45:27So that sounds like a really elegant system that, that you're
45:30like working on in that regard.
45:32So with Beehive overall, like again, you're starting out here with
45:38Automerge as the driving system that drives the requirements, et cetera.
45:43But I think your, bigger ambition here, your bigger goals, is that this
45:48actually becomes a system that is, that at some point goes beyond just
45:54applying to Automerge, and that being a system that applies to many more other
45:59local-first technologies in the space.
46:01If there are application framework authors or like, like other people building a
46:07sync system, et cetera, and they'd be interested in seeing like, Hmm, instead
46:11of like us trying to come up with our own, research here for like what it
46:17means to do, authentication authorization for our sync system, particularly if
46:23you're doing it in a decentralized way.
46:25What would be a good way for those frameworks, those technologies to
46:30jump on the, the Beehive wagon.
46:33so if they're already using Automerge, I think that'll be
46:37pretty straightforward, right?
46:38You'll have bindings, it'll just work.
46:40but Beehive doesn't have a hard dependency on Automerge at all.
46:45because it lives at this layer below, and early on we were like, well, should
46:50we just weld it directly into Automerge?
46:51Or like, you know, how much does it really need to know about it?
46:55and where we landed on this was you just need to have some kind
46:58of way of saying, here's the partial order between these events.
47:02and then everything works.
47:04So, just as an intuition.
47:07You could put Git inside of, Beehive, and it would work, I don't think
47:11GitHub's gonna adopt this anytime soon, but like, if you had your own
47:14Git syncing system, like, you, you could do this, and, and it would work.
47:18you just need to have some way of ordering, events next to each other.
47:22and yes, then you have to get a little bit more into slightly lower level APIs.
47:27So I, when I build stuff, I tend to work in layers of like, here's the very
47:32low level primitives, and then here's a slightly higher level, and a slightly
47:35higher level, and a slightly lower level.
47:37so people using it from Automerge will just have add member, remove
47:40member, and like, everything works.
47:41to go down one layer, you have to wire into it, here's how to do ordering.
47:47And that's it.
47:48And then everything else should, should wire all the way through.
47:51And you have to be able to pass it, serialized bytes.
47:53So, like, Beehive doesn't know anything about this compression that we were
47:56just talking about that Automerge does.
47:58But you tell it, hey, this is, you know, this is some batch, this is
48:02some, like, archive that I want to do.
48:03It starts at this timestamp and ends at that timestamp,
48:06or, you know, logical clock.
48:07please encrypt this for me.
48:09And it goes, sure, here you go.
48:10Encrypted.
48:11And, you know, off it goes.
48:12So it has very, very few, assumptions
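The one thing the auth layer is said to need from the layer above, a partial order over opaque events, can be modeled with a tiny causal DAG. The shapes and names below are hypothetical, not Beehive's real interface.

```python
from dataclasses import dataclass

# Sketch of the minimal contract described above: the auth layer only
# needs to ask how opaque events are partially ordered. Names and
# shapes here are hypothetical, not Beehive's real interface.

@dataclass(frozen=True)
class Event:
    id: str
    parents: frozenset  # ids of this event's causal predecessors

def is_ancestor(events, a, b):
    # True if event `a` is a strict causal ancestor of event `b`
    seen, stack = set(), [b]
    while stack:
        event = events[stack.pop()]
        for parent in event.parents:
            if parent == a:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False

# A tiny causal DAG: b and c both follow a; d merges b and c.
events = {
    "a": Event("a", frozenset()),
    "b": Event("b", frozenset({"a"})),
    "c": Event("c", frozenset({"a"})),
    "d": Event("d", frozenset({"b", "c"})),
}
```

Anything that can answer this ancestry question, Automerge changes, Git commits, or an ordered event log, would satisfy the requirement as described.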
48:15That's certainly something that I might also pick up a bit further down the
48:18road myself for, for LiveStore, where the underlying substrate to sync data
48:23around is an ordered event log.
48:26And, if I'm encrypting those events.
48:29then I think that fulfills, perfectly the requirements that you've listed,
48:34which are very few for, for Beehive.
48:37So I'm really looking forward to once that gets further along.
48:40So speaking of like, where is Beehive right now?
48:43I've seen the, lab notebooks from what you have been working on at Ink & Switch.
48:49can I get my hands on Beehive already right now?
48:52Where is it at?
48:54what are the plans for the coming years?
48:56So at the time that we're recording this, at least, which is in early
48:59December, there's unfortunately not, not a publicly available version of it.
49:02I really hoped we'd have it ready by now, but, unfortunately we're still, wrapping
49:06up the last few, items in, in there.
49:09but, Q1, we plan to have, a release.
49:12as I mentioned before, there are some changes required to Automerge to consume it,
49:16specifically to, to manage revocation history.
49:19So somebody got kicked out, but we're still in this eventually consistent world.
49:23Automerge needs to know how to manage that.
49:24But.
49:25Managing things, sync, encryption, all of that stuff, we, we hope to have
49:30in, I'm not going to commit, commit the team to any particular, timeframe
49:33here, but like, we'll, we'll say in the coming weeks.
49:37right now the team is, myself.
49:39John Mumm, who joined a couple months into the project, and has been working
49:43on, BeeKEM, focused primarily on BeeKEM, which is, again, I'm just going to
49:48throw out words here for people that are interested in this stuff, related to
49:51TreeKEM, one of the primitives for MLS, messaging
49:55layer security, but we made it concurrent.
49:57he's been doing great work there.
49:58And, Alex, amongst the many, many things that Alex Good does between
50:02writing the sync system and maintaining Automerge and all of these, you
50:07know, community stuff that he does, has also been, lending a hand.
50:11So I'm sure for, for Beehive, in a way, you're just
50:15scratching the surface, and there's probably enough work here to
50:19fill another few years, maybe even decades, worth of ambitious work.
50:24Can you paint a picture of like, what are some of like the, like right now
50:28you're probably working through the kind of POC or just the table stakes things.
50:33What are some of like the, way more ambitious longterm things
50:36that you would like to see in under the umbrella of Beehive?
50:39Yeah.
50:40So, There's a few.
50:41Yes.
50:42and we have this running list internally of like, what would a V2 look like?
50:45So, one is, adding a little policy language.
50:48I think it's just like the, bang for the buck that you get on having
50:51something like UCAN's policy language.
50:53It's just so high.
50:54It just gives you so much flexibility.
50:56hiding the membership, from even the sync server, is possible.
51:00it's just requires more engineering.
51:02so there are many, many places in here where, zero knowledge proofs, I
51:06think, would be very, Useful, for, for people who knows, know what those are.
51:09essentially it would let the sync server say, yes, I can send you bytes
51:14without knowing anything about you.
51:16Right,
51:17but it would still deny others.
51:19And right now it basically needs to run more logic to actually
51:22enforce those auth rules.
51:25Yeah.
51:25So today you have to sign a message that says, I signed this with the same
51:30private key that you know the public key for in this membership. We
51:36can hide the entire membership from the sync server and still do this.
51:39Without revealing even who's making the request, right?
51:41Like, that would be awesome.
51:43in fact, and this is a bit of a tangent, I think there's a number
51:45of places where, that class of technology would be really helpful.
51:49Even for things like, in CRDTs, there's this challenge where you have
51:53to keep all the history for all time.
51:55and I think with zero knowledge proofs, we can actually, like, this, this would
51:58very much be a research project, but I, I think it's possible to delete history, but
52:02still maintain cryptographic proofs, that things were done correctly and compress
52:06that down to, you know, a couple bytes, basically, but that's a bit of a tangent.
52:10I would love to work on that at some point in the future, but for, for
52:13Beehive, yeah, hiding more metadata, hiding, you know, the membership
52:17from the group, making all the signatures post quantum.
52:21Even the main recommendations from NIST, the U.S.
52:26government agency that handles these things, only just came out.
52:30So, you know, we're still kind of waiting for good libraries on it and, you know,
52:34all, all of this stuff and what have you.
52:36But yeah, making it post quantum, or fully, big chunks of it are already
52:40post quantum, but making it fully post quantum, would, would be great.
52:43and then yeah, adding all kinds of, bells and whistles and features, you know,
52:46making it faster, adding, it's not going to have its own compression, because it
52:50relies so heavily on cryptography, so it doesn't compress super well, right?
52:54So we're going to need to figure out our own version of, you know,
52:58Automerge has run length encoding.
52:59What is our version of that, given that we can't run length encode
53:02easily, encrypted things, right?
53:04Or, or signatures or, you know, all, all of this.
53:06so there's a lot of stuff, down, down in the plumbing.
53:08Plus I think this policy language would be really, really helpful.
53:11That sounds awesome.
53:12Both in terms of new features, capabilities, no pun intended, being
53:16added here, but also in terms of just, removing overhead from the system and like
53:22simplifying the surface area by doing, more of like clever work internally,
53:27which simplifies the system overall.
53:29That sounds very intriguing.
53:31The, the other thing worth noting with this, just, I think, both to point
53:35a way into the future and also to draw a boundary around what Beehive
53:39does and doesn't do, is identity.
53:41so Beehive only knows about public keys because those are universal.
53:46They work everywhere.
53:47They don't require a naming system, any of this stuff.
53:50we have lots of ideas and opinions on how to do a naming system.
53:55but you know, if, if you look at, for example, uh, BlueSky, under
53:58the hood, all of the accounts are managed with public keys, and then
54:02you map a name to them using DNS.
54:04So either you're using, you know,
54:07myname.bsky.social, or you have your own domain name, like I'm expede.wtf
54:12on BlueSky, for example, right?
54:13Because I own that domain name and I can edit the text record.
54:15and that's great and it, definitely gives users a lot of agency over
54:20how to name themselves, right?
54:21Or, you know, there are other related systems.
54:24But it's not local-first because it relies on DNS.
54:28So, like, how could I invite you to a group without having to know your public
54:32key, We're probably going to ship, I would say, just because it's like
54:35relatively easy to do, a system called Edge Names, based on pet names, where
54:40basically I say, here's my contact book.
54:42I invited you.
54:43And at the time I invited you, I named you.
54:45Johannes right?
54:46And I named Peter, Peter, and so on and so forth, but there's no way to prove
54:52that; that's just my name for them,
54:54right, for these people.
54:54And having a more universal system where
54:59I could invite somebody by like their email address, for example, I
55:02think would be really interesting.
55:03Back at Fission, Blaine Cook,
55:06who's also done a bunch of stuff with Ink & Switch in the past, had proposed
55:09this system, the NameName system, that would give you local-first names
55:12that were rooted in things like email, so you could invite somebody with
55:17their email address, and a local-first system could validate that that person
55:21actually had control over that email.
55:23It was a very interesting system.
55:25So there's a lot of work to be done in identity as separate from, authorization.
55:29Right, yeah.
55:30I feel like there's just always so much interesting stuff happening
55:35across the entire spectrum from, like, the world that we're currently in,
55:40which is mostly centralized, just in terms of, like, making things work at
55:45all, and even there, it's hard to keep things up to date and, like, working,
55:50et cetera, but we want to aim higher.
55:54And one way to improve things a lot is like by going more decentralized but
55:59there's, like, so many hard problems to tame and, like, we're just starting to peel
56:04the layers off the onion here.
56:07And Automerge, I think, is a great, canonical case study there: it
56:12started with the data, and now things are moving toward authorization, et cetera.
56:17And then authentication and identity, there we probably have
56:21enough research work ahead of us for decades to come.
56:25And super, super cool to see that so many bright minds are working on it.
56:29maybe one last question in regards to Beehive.
56:34When there's a lot of cryptography involved, that also means there's
56:38even more CPU cycles that need to be spent to make stuff work.
56:43Have you been looking into some performance benchmarks? When you, let's
56:48say, want to synchronize a certain history for some Automerge
56:54documents, with Beehive disabled and with Beehive enabled, do you see like
57:00a certain factor of how much it gets slower with Beehive and sort of
57:05the authorization rules applied, both on the client as well as on the server?
57:10Yeah.
57:10So, it's a great question.
57:12So obviously there are different dimensions in Beehive, right?
57:14So for encryption, which is where I would say most people would expect there
57:19to be the performance overhead.
57:21There's absolutely overhead there.
57:22You're doing decryption, but we're using algorithms that decrypt on the
57:26order of multiple gigabytes a second.
57:29So it's fine, basically.
57:32And that's also part of why we wanted to chunk things up in this way,
57:35because then we get good compression and all of this stuff.
57:37So if you're doing a total sync, you know, the first time you've seen this document,
57:42you've got to pull everything and decrypt everything and hand it off to Automerge,
57:45the encryption's not
57:46going to be the bottleneck.
57:48And then on a rolling basis, like, you know, per keystroke, yes,
57:53there's absolutely overhead there, but remember this is relative to latency.
57:59So if you have 200 milliseconds of latency, that's your bottleneck.
58:03It's not going to be the five milliseconds of encryption that we're doing, or
58:08signatures, or whatever it is. There's a space cost, because now we have to keep
58:14public keys, which are 32 bytes, and signatures, which are 64 bytes.
58:19So there is some overhead in space
58:22that happens.
58:23But for the most part, we've chosen algorithms that
58:26are known to be very, very fast.
58:28They're sort of like the best in class.
58:30So I'll just rattle down a list for the
58:33people that are interested.
58:34So we're using EdDSA (Edwards keys) for signatures and key exchange, ChaCha
58:40for encryption, and BLAKE3 for hashing.
58:44BLAKE3 is very interesting: you can do
58:45things like verifiable streams.
58:47So as you're streaming the data in, you can start hashing even
58:50parts of it as you're going along.
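The incremental hashing described here can be sketched with Python's standard library. One caveat: the stdlib ships BLAKE2, not BLAKE3 (BLAKE3 needs the third-party `blake3` package, and its verified streaming can additionally authenticate individual chunks before the whole stream arrives), but the incremental API shape is the same:

```python
import hashlib

# Incremental ("streaming") hashing: feed chunks as they arrive instead of
# buffering the whole document first.
hasher = hashlib.blake2b()
for chunk in (b"first sync chunk", b"second sync chunk"):
    hasher.update(chunk)  # hash each piece as it streams in

streamed = hasher.hexdigest()

# Hashing the concatenated bytes in one shot gives the same digest, which is
# what makes hashing "as you're going along" safe.
one_shot = hashlib.blake2b(
    b"first sync chunk" + b"second sync chunk"
).hexdigest()
assert streamed == one_shot
```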
58:52The really big bottleneck, like the heaviest part of the system,
58:57or, sorry, the part that we were least happy with in our original design,
59:00that we then ended up doing a bunch of research on, was doing key agreement.
59:06So if I have, whatever, a thousand people in a company, and they're all,
59:12you know, working on this document, I don't want to have to send a
59:14thousand messages every time I change the key, which will be rotated
59:18every message, let's say, or, you know, once a day if we're being
59:22more conservative with it.
59:24And that's a lot of data and a lot of latency on
59:27this and just a lot of network.
59:29So we switched: instead of it being linear, we found a way
59:32of doing it in logarithmic time.
59:35So we can now do key rotations concurrently, like totally eventually
59:39consistently, in log n time.
59:41And a lot of research happened in there, but that let
59:47us scale up much, much, much more.
59:48So the prior algorithm that we were using off the shelf from a paper
59:52scaled up to, in the paper they say, about 128 people, right?
59:55That's sort of your upper bound, and we're like, you know, we had set
59:58ourselves these higher levels that we actually want to work with.
1:00:02And so now we can scale into the thousands.
1:00:05When you get up to 50,000 people, yeah, it starts to slow down.
1:00:07You start to get, you know, closer to a second if you're doing
1:00:11very, very concurrent updates, you know, 40,000 of the 50,000 people
1:00:14are doing concurrent key rotations.
1:00:16Doesn't happen very often, but like it could happen.
1:00:19If one person's doing an update, then it'll happen
1:00:21in, like, you won't even notice it.
1:00:23Right.
1:00:24So it depends on how heavily concurrent your document is.
1:00:26Do you have 40,000 people writing to your document?
1:00:28Yeah.
1:00:28You're going to see it slow down a little bit.
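The linear-versus-logarithmic difference can be illustrated with a toy cost model (illustrative only; Beehive's actual scheme is a concurrent group key agreement protocol, and these function names are made up). The idea behind tree-based schemes is that members sit at the leaves of a binary tree, so a rekey only has to replace the keys on one leaf-to-root path:

```python
import math

def rekey_cost_linear(n_members: int) -> int:
    # Naive approach: encrypt the new key separately to every member.
    return n_members

def rekey_cost_tree(n_members: int) -> int:
    # Tree approach: replace one key per level on the path to the root,
    # root included, so the cost grows with log2(n) rather than n.
    return math.ceil(math.log2(n_members)) + 1

for n in (128, 1_000, 50_000):
    print(n, rekey_cost_linear(n), rekey_cost_tree(n))
```

At the paper's 128-member scale the gap is already large (128 messages versus 8 tree nodes), and at 50,000 members the tree cost is still under 20.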
1:00:30It's so amazing to see that.
1:00:32I mean, in academia, there is so much progress in those various fields.
1:00:36And I feel like in local-first, we actually get to benefit and directly
1:00:42apply a lot of those great achievements from other places, where
1:00:45it now makes a big difference for the applications that
1:00:49we'll be using, whether there is a cryptographic breakthrough in efficiency
1:00:53or in being more long-term secure, et cetera.
1:00:57And I fully agree that latency is probably by far the most important
1:01:02one when it comes to whether it makes a difference or not, but
1:01:06battery usage, et cetera, is another one.
1:01:08And if I synchronize data a lot, maybe I open a lot
1:01:13of documents just once, because maybe I'm reviewing documents a lot when
1:01:17someone sends them, or maybe I'm an executive and I get to review a lot of
1:01:20documents, and I don't really amortize the documents too much because
1:01:26I don't reuse them on a day-to-day basis.
1:01:28I think that initial sync also tends to matter quite a bit.
1:01:33But, it's great to hear that, efficiency seems to be already,
1:01:37very well under control.
1:01:39So maybe rounding this out: you've been at Fission, you've been seeing the
1:01:45innovation around local-first in, like, three buckets: auth, data, and compute.
1:01:51As mentioned before, on this podcast, we've mostly been
1:01:54exploring the data aspect.
1:01:56Now we went quite deep on some of your work in regards to auth.
1:02:01We don't have too much time to spend on something else, but I'm curious
1:02:06whether you can just seed some ideas in regards to: where does compute
1:02:12fit in this new local-first world?
1:02:15Like, if you could fork yourself and do a lot more work, what would you be
1:02:21doing in regards to that compute bucket?
1:02:23Yeah.
1:02:24So, we had a project related to compute at Fission,
1:02:27right at the end.
1:02:29And I'm very fortunate that I actually have some grants to continue that
1:02:32work after I finish with Beehive.
1:02:33I'll switch to that and then, after that project, see what else
1:02:36is interesting kicking around.
1:02:38But essentially the motivation is: all the compute for local-first stuff happens
1:02:43completely locally today, or you're talking to some cloud service, right?
1:02:47Like maybe you're using an LLM.
1:02:48So you go to, you know, use the OpenAI APIs, that kind of thing.
1:02:53but what if you're on a very low powered device and you're on a plane?
1:02:58Right.
1:02:59you know, you still need to be able to do some compute some of the time.
1:03:02So the trade-off that we're trying to strike in these
1:03:05kinds of projects is: what if I can always run it, even slowly?
1:03:08So let's say I'm rendering a 3D scene and it's gonna take a minute
1:03:11to paint, versus I have a desktop computer, you know, nearby and I can
1:03:18farm that job out to that machine because it's nearby in latency,
1:03:22and it has more compute resources.
1:03:25Or maybe I need to send email to a mail server that only exists in one place.
1:03:30Like, how can I do this compute dynamically, where I can
1:03:35always run my jobs, or my resource management, whenever possible?
1:03:40An email server is a case where you can't always do this, right?
1:03:42But when somebody else could run it,
1:03:45maybe I can farm that out to them instead.
1:03:46So there's a lot of interest, I think, in how do we bridge between what is
1:03:53sometimes called, in the Bluesky world, big world versus small world, right?
1:03:56So I have my local stuff.
1:03:57I'm doing things entirely on my own.
1:03:59I'm completely offline.
1:04:00And that is the baseline.
1:04:02But when I am online, how much more powerful can it get?
1:04:06You know, I'm not going to ingest the entire Bluesky firehose myself.
1:04:10I'm going to leave that to an indexer
1:04:12to do for me.
1:04:13So when I'm online, maybe I can get better search, right?
1:04:17Things like this, or maybe if I'm rendering PDFs, maybe I want to farm
1:04:20that out to some server somewhere rather than doing that with Wasm in my browser.
1:04:25So kind of progressively enhancing the app.
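The "always runnable, progressively enhanced" idea can be sketched as a simple fallback pattern (an illustrative sketch; `run_job` and the remote hook are hypothetical names, not a real Fission or Ink & Switch API):

```python
from typing import Callable, Optional

def run_job(job: Callable[[], str],
            remote: Optional[Callable[[Callable[[], str]], str]] = None) -> str:
    """Prefer a nearby, beefier peer when reachable; always fall back locally."""
    if remote is not None:
        try:
            return remote(job)  # farm the job out to the more powerful machine
        except ConnectionError:
            pass                # offline / peer gone: degrade gracefully
    return job()                # baseline: run it ourselves, even if slowly

def render_scene() -> str:
    return "scene rendered"

# Offline on a plane: the baseline still works.
assert run_job(render_scene) == "scene rendered"

# A peer that drops out mid-flight doesn't break anything either.
def flaky_remote(job: Callable[[], str]) -> str:
    raise ConnectionError("peer unreachable")

assert run_job(render_scene, flaky_remote) == "scene rendered"
```

The design point is that the remote path is purely an enhancement: the app never depends on it being available.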
1:04:28And I think there's a lot of recent... oh, with AI this
1:04:31is particularly relevant, because
1:04:35now suddenly we get a lot of work
1:04:38to be done that massively benefits from a lot of compute.
1:04:43And with AI in particular, I think, now we're
1:04:47in this tricky spot.
1:04:49Either we already get to live in the future, but that means typically all of
1:04:54our AI intelligence is coming from some very beefy servers in
1:04:58some data centers, and the way I get those enhancements
1:05:03is by just sending over all of my context data to those servers.
1:05:09Well, I guess you could get those beefy servers also next to your
1:05:13desk, but that is very expensive and I think not very practical.
1:05:17I guess step by step: now the newest MacBooks, et cetera, are already
1:05:21very capable at running things locally, but there will always be
1:05:26a reason that you want to fan things out a bit more, but doing so in a
1:05:30way that preserves your privacy around your data, et cetera, and
1:05:34leverages your resources properly.
1:05:37Like, if I'm just looking around myself, I have an iPad over here
1:05:41which sits entirely idle, et cetera.
1:05:44So,
1:05:45it's as with most things in regards to application developers: if it's
1:05:50the right thing, it should be easy, and doing compute in sort of a
1:05:55distributed way is by far not easy.
1:05:58So, very excited to hear that you want to explore this more.
1:06:02Yeah.
1:06:02Well, and you know, especially with things like AI, the question
1:06:06always is: I should never be cut off from performing actions, if
1:06:10possible, like, when possible. Sometimes something lives at a particular
1:06:13place and I'm not connected to it.
1:06:15Fine, right?
1:06:16Email being, you know, the canonical example here.
1:06:18Mail server lives in one place.
1:06:19Okay, fine.
1:06:21but why not with an LLM?
1:06:23Like, maybe I run a smaller, simpler LLM locally.
1:06:27And then again, when I'm connected and I'm online, I just get better results.
1:06:30I get better answers.
1:06:32So I'm never totally cut off.
1:06:34I mean, there's plenty of research on distributed machine learning
1:06:38and all of this stuff, but that's, I would say, in the future.
1:06:41Just kind of to put an arc on all of this stuff,
1:06:43and everybody who's seen my talks before has probably heard me give
1:06:46this short spiel once or twice:
1:06:48but, you know, in the nineties, when we were developing the web, right,
1:06:52as opposed to the internet,
1:06:54the assumption was that you had a computer under your desk.
1:06:57It was a beige box that you would turn on and you would turn it off sometimes.
1:07:00Right?
1:07:00When was the last time you actually turned off your laptop,
1:07:02or your phone for that matter?
1:07:04And when you wanted to connect to the internet, you'd tie up your phone line.
1:07:08That's no good.
1:07:09So you would rent from somebody else, something that was always
1:07:12online with a lot of power.
1:07:14And we now live in a different world, but the centralized,
1:07:18or the cloud systems rather, all still have this assumption of,
1:07:23well, we have more power and we're more online and better connected than you.
1:07:28Okay,
1:07:29that's true, but how many things does that actually matter for?
1:07:32And with systems like Automerge and, you know, local-first things developing, it's
1:07:36like, actually, you know what, my machines are fast enough now that I can
1:07:41keep the entire log of the entire history.
1:07:43And it's fine because we can compress it down to a couple hundred K and it's okay.
1:07:48And I'm fast enough to play over the whole log.
1:07:50And we can do all of this eventually consistent stuff and it doesn't
1:07:53completely, you know, hurt the performance of my application.
1:07:56It's massively simplifying the architecture.
1:07:59Things have gotten out of hand.
1:08:00So there is this dividing line: you know, the
1:08:06cloud isn't completely the enemy.
1:08:09It does have some advantages, right?
1:08:12But not everything needs to live there.
1:08:14And so we're moving into this world of like, how much can we
1:08:16pull back down into our individual devices and get control over them?
1:08:20Yeah, I love that.
1:08:21I think that very neatly summarizes a huge aspect why
1:08:26local-first talks to so many of us.
1:08:29So I've learned a lot in this conversation and I'm really
1:08:34excited to get my hands on Beehive.
1:08:37As it becomes more publicly available, hopefully already a lot closer to the
1:08:43time when this episode comes out.
1:08:45In the meanwhile, if someone got really excited to get their hands dirty and
1:08:51dig into some of the knowledge that you've shared here, I certainly
1:08:55recommend checking out your amazing talks.
1:08:59I still have a lot of them on my watch list, and I think
1:09:02there are many shared interests that we didn't go into in this episode.
1:09:06Like, you're also a lot into functional programming, et cetera.
1:09:09And I think you're going really deep on Rust as well, et cetera.
1:09:13So lots for me to learn.
1:09:15But if you can't wait to get your hands on Beehive, I think it's also very
1:09:20practical to play around with UCAN.
1:09:23I think there are a bunch of implementations for various language
1:09:26stacks, and that is something that you can already build things with today.
1:09:32And I think it's not like Beehive will fully replace
1:09:34UCAN or the other way around.
1:09:36I think there will be use cases where you can use both, but this way you
1:09:39can already get into the right mental model, and be
1:09:44Beehive-ready when it gets available.
1:09:47So that's certainly, what I would recommend folks to check out.
1:09:50Is there anything else you would like the audience to do, look up or watch?
1:09:56Yeah, so definitely keep an eye on the Ink & Switch webpage.
1:10:00We have lab notes; at the time of this recording
1:10:03there's just the one note up there, but I have a whole bunch of
1:10:06them, like many, in draft that I just need to clean up and publish.
1:10:10We'll also be releasing an essay, an Ink & Switch-style essay, on
1:10:14this whole project in the new year.
1:10:16And, yeah, keep an eye out for when this all gets released.
1:10:20There's a bunch of stuff coming in Automerge in the new year, I
1:10:23can't remember if it's Automerge V2 or V3, but there's, you know, some
1:10:27branding with it of like much faster, lower memory footprint,
1:10:31better sync, and security.
1:10:33And all of these sort of, you know, big headline features.
1:10:35So definitely keep an eye on all the stuff happening in Automerge.
1:10:38That's awesome.
1:10:39Brooke, thank you so much for taking the time and sharing
1:10:42all of this knowledge with us.
1:10:44super appreciated.
1:10:45Thank you.
1:10:45Thank you so much for having me.
1:10:47Thank you for listening to the Local First FM podcast.
1:10:49If you've enjoyed this episode and haven't done so already, please
1:10:52subscribe and leave a review.
1:10:54Please also share this episode with your friends and colleagues.
1:10:57Spreading the word about the podcast is a great way to support
1:11:00it and to help me keep it going.
1:11:03A special thanks again to Convex and ElectricSQL for supporting this podcast.
1:11:07See you next time.