00:00 We've restricted ourselves to making things look like access control lists on the outside. So it should feel very, very similar to doing things with role-based access control using, say, OAuth. That should all feel totally normal. You shouldn't really have to think about it in any special way. In the same way that, if you have a sync server, other than having to set it up or point it at an existing one, knowing that it's there doesn't mean that you have to design it from first principles.
00:30 Welcome to the localfirst.fm podcast. I'm your host, Johannes Schickling. I'm a web developer and a startup founder, and I love the craft of software engineering. For the past few years, I've been on a journey to build a modern, high-quality music app using web technologies, and in doing so I've fallen down the rabbit hole of local-first software. This podcast is your invitation to join me on that journey.

00:53 In this episode, I'm speaking to Brooklyn Zelenka, a local-first researcher and creator of various projects, including UCAN and Beehive. In this conversation, we go deep on authorization and access control in a local-first, decentralized environment and explore the topic by learning about UCAN and Beehive. Later, we also dive into Beelay, a new generic sync server implementation developed by Ink & Switch. Before getting started, a big thank you to Convex and Electric SQL for supporting this podcast. And now, my interview with Brooklyn.
01:28 Hey Brooke, so nice to have you on the show. How are you doing?
01:32 I'm doing great. Super excited to be here. I'm glad that we made this happen. Thanks so much for having me.
01:37 I was really looking forward to this episode, and honestly I was quite nervous, because it brings me to an aspect of local-first where I have much less first-hand experience myself. Local-first is already a big frontier, pushing the boundaries of what's possible technologically, and you're pushing an even further frontier here around local-first auth. The people in the audience who are already familiar with your work are surely thrilled for you to be here, but for the folks who don't know who you are, would you mind giving a brief background?
02:15 Yeah, absolutely. I'll maybe do it in slightly reverse chronological order. These days I'm working on an auth system for local-first, mostly focused on Automerge, called Beehive, which does both read controls with encryption and mutation controls with something called capabilities; I'm sure we'll get into that. Prior to this, for a little over five years, I was the CTO at a company called Fission. We started doing local-first there in 2019, and we worked on the stack we always called auth, data, and compute. So we ranged out way ahead on a variety of things, trying local-first: encrypted-at-rest databases, a file system, an auth system that has gotten some adoption called UCAN, and a compute layer, IPVM. Prior to that, I did a lot of web development and worked for a time with the Ethereum core development community, mostly on the Ethereum virtual machine.
03:11 That is super impressive. I am very curious to dig into all of those parts, really around auth, data, and compute. However, in this episode I think we should keep it a bit more focused, particularly on auth; maybe towards the end we can also talk a bit more about compute. Most of the episodes we've done so far have been very centric around data. Only a few have also explored what auth in a local-first setting could look like, but I think there is no better person in the local-first space to really go deep on all things auth.

03:47 Through your work on Fission and your previous background, you've participated in, contributed to, and started a whole myriad of different projects, which are now at the forefront of those various fields. One of them is UCAN. You've also mentioned Beehive at Ink & Switch. Maybe starting with UCAN: for those of us who have no idea what UCAN, that four-letter acronym, stands for and what it means, could you give us an introduction?
04:18 Yeah, absolutely. So UCAN, U-C-A-N, User Controlled Authorization Networks, is a way of doing authorization, so granting somebody else the ability to perform some action on a resource, in a totally peer-to-peer, local-first way. It uses a model called capabilities. So instead of having a database that lists all of the users and what they can do, you get certificates that are cryptographically provable. If I wanted to give you access to some resource I controlled, I would sign a certificate to you. Then if you wanted to give access to someone else, you would sign a certificate to them. And when it came back to me, I could check that that whole chain was correct.
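To make that certificate-chain idea concrete, here is a minimal TypeScript sketch. The types and helper names are hypothetical, not the actual UCAN API; the point is just that each delegation names an issuer and an audience, and verification walks the chain back to the resource owner.

```typescript
// Minimal sketch of a capability chain, loosely in the spirit of UCAN.
// Types and helpers here are hypothetical, not the real UCAN API.
interface Delegation {
  issuer: string;        // who grants the capability (e.g. a DID)
  audience: string;      // who receives it
  ability: string;       // e.g. "storage/write"
  resource: string;      // e.g. "storage://alice/photos"
  signature: Uint8Array; // issuer's signature over the other fields
}

// Placeholder: a real implementation verifies the signature cryptographically.
function signatureIsValid(_delegation: Delegation): boolean {
  return true;
}

// A chain is valid if every link is signed by its issuer, each link's audience
// is the next link's issuer, and the first issuer is the resource owner.
function chainIsValid(chain: Delegation[], resourceOwner: string): boolean {
  if (chain.length === 0 || chain[0].issuer !== resourceOwner) return false;
  for (let i = 0; i < chain.length; i++) {
    if (!signatureIsValid(chain[i])) return false;
    if (i > 0 && chain[i - 1].audience !== chain[i].issuer) return false;
  }
  return true;
}
```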
05:02 And people have used this to do all kinds of things. At Fission, we were using it for CRDTs. For example, there's a CRDT-based file system that we had developed, and UCAN guarded whether or not you were allowed to write into it. There's a bunch of teams now using it for managing resources, like storage quotas: how much are you allowed to store inside of some data volume? For them it's really helpful, because they can say: here's a certificate from us to, say, a developer, and the developer can then portion that out to all of their users without having to register all of those users back with the storage company. So it lowers the amount of interaction they have to do registering all of these different people, but it also means they can scale their service up really nicely. As long as they know about the root signature, they can scale horizontally very easily, or interact with other teams very easily, by just issuing them certificates. So people are doing that kind of thing.
05:58 So you've mentioned the term capabilities before, and I think that's also a central part of UCAN. I'm most familiar, from my more traditional background of building more centralized server applications, with how you implement auth being very dependent on the kind of application you want to build. If you want to start out a bit more easily, you could lean on some of the primitives that a certain technology or platform gives you, maybe using Postgres and its role-based access control patterns, or maybe something even as off-the-shelf as Firebase. Is that a useful mental model, that UCAN gives me similar building blocks, or how much more fine-granular can I get with what UCAN offers me?
06:48 Yes, it's a great question. So in role-based access control, or any of these access-control-list-based systems, you have a database with a list of users and what they're able to do: often their role, are they an admin, are they a writer, are they a reader, all of these things. To update that list, you have to go to that database and update it, and on every request that you make, you have to check the list. Sometimes we say it's like having a bouncer at a club: you show up, you show them your ID, they check whether you're on the VIP list, and then you're allowed into the club or not. And what those rules are is set by that bouncer; these are the only rules, no others.

07:36 In a capabilities world, the analogy is often to having a ticket to go see a movie. This last weekend, I went to go see Wicked; it was awesome. I bought my ticket online, it showed up in my email, and they didn't ID me on the way in. I just showed them my ticket and they said: oh, great, Theater 4, you can go in. As long as I had that proof with me, I was allowed in. They didn't have to check a list. There was no central place to look.

08:02 Capabilities are not a new model. They've existed for some time. In fact, a big part of the internet infrastructure runs on top of capabilities as well, or a subset of them. But it hasn't found its way as much into applications, because we're so used to access control lists.

08:20 The granularity that you mentioned before is really interesting because, in a capability system, any time I make that delegation to somebody else, I say: you're allowed to use this thing. Then you can go to somebody else and say: you can also use this thing. You can grant them the same ability to see or to use it, or fewer capabilities. So if it was, here's a terabyte of storage, you could turn around and give somebody only 50 MB. And so you can get as granular as you want with it.
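As a minimal sketch of that attenuation rule, with hypothetical names rather than anything UCAN-specific: each re-delegation can only narrow what its parent granted.

```typescript
// Hypothetical sketch: a delegated storage quota can only shrink, never grow.
interface StorageGrant {
  audience: string;
  maxBytes: number;
}

// A re-delegation is acceptable only if it grants no more than its parent did.
function isValidAttenuation(parent: StorageGrant, child: StorageGrant): boolean {
  return child.maxBytes <= parent.maxBytes;
}

// e.g. a provider grants a developer 1 TB, who passes 50 MB on to an end user:
const developerGrant = { audience: "did:key:developer", maxBytes: 1e12 };
const endUserGrant = { audience: "did:key:end-user", maxBytes: 50e6 };
console.log(isValidAttenuation(developerGrant, endUserGrant)); // true
```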
08:47 And there's never any confusion about who's acting in what way. In a traditional system with access control lists, say you ran a service that sat between the user and me, and they made a request to you. They only have a link to you, and you only have a link to me. So when you'd make the request to me, you'd be using your terabyte of storage. And there are some cases where that can confuse the resource: it goes, oh yeah, you can totally use a terabyte of storage, even though the actual user shouldn't be able to do that. With capabilities, we get rid of that completely. We have this entire chain of custody. As granular as you want to get, it's very clear on every request what that request is allowed to do. I think this is going to become really important for things like LLMs and other sorts of automated agents, where you can tell it: hey, go do things for me, but not with all of my rights, not as sudo. Only in this scenario, for the next five minutes, these are the things you're allowed to do. And even if it hallucinates some other intention, those are the only things it's able to do.
09:55 Yeah, I think this is such an important aspect. I think you don't even need to reach as far as giving agency to an AI agent. Even if you go a bit more dumb and a bit more traditional: if you want to use some off-the-shelf SaaS service, maybe that thing integrates with your Google account. Then you need to give the thing access somehow. So you do the OAuth flow with Google, and it asks you: hey, is it okay that we have access to all of those things, that we can do all of those things? And even though Google already offers some pretty fine-granular options there, I often feel like I want to make it even more fine-granular. Wait, you're going to have access to all of my emails? Can I maybe just give you access to my invoice emails, if this is an invoicing thing?

10:50 So I feel it's a bit overwhelming to make all of those decisions upfront, what should be allowed, both from an application end-user perspective, me using the thing, but particularly also from an application developer perspective. It feels like a really, really important aspect of using the app and of designing the app. And if it's not intuitive and ergonomic, then everyone's going to suffer: the application developers are probably just going to wing it, and that will probably mean too coarse a granularity for application users, et cetera. So I'm really excited that you're pushing forward on this.

11:33 Maybe also to draw the analogy between more traditional OAuth flows and what UCAN is providing: should I think about UCAN as a replacement for OAuth, from both an end-user perspective and an application developer perspective?
11:50 Yeah, exactly. The underlying mechanism is different, but we really wanted it to feel as familiar as possible. Even the early versions of UCAN used the same token format and things like this. We've since switched over to some more modern formats; there are problems with JWTs. But yeah, exactly: local-first OAuth is one way of thinking about it.
12:16 Right. So as an application developer, I need to make up my mind once and say: this is what's possible, this is what is allowed. I define that, and the system then enforces those rules. But often I, as an application developer, get it wrong and need to make the rules more permissive or less permissive over time. Similar to how I might get a database schema wrong and later need to do those dreaded database schema migrations: what is the equivalent of a schema migration for UCAN capability definitions?
12:56 So all of the information that you need to fulfill a request in UCAN is contained in the token itself. These days we have a little policy language, think of it a little bit like SAML, inside the token. And it says: when you go to actually do something with this token, the action has to match the following criteria. You're sending an email, so the To field can only contain people inside of the company. Or you can only send newsletters on Mondays, or whatever it is. And you can scope that down arbitrarily, syntactically. So updating those policies is just issuing a new certificate to say: this is what you're allowed to do now. And you can revoke the old ones if that's needed.
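As a rough illustration of what such an embedded policy check amounts to, here is a hypothetical TypeScript sketch; the shapes and names are made up, not the actual UCAN policy language.

```typescript
// Hypothetical sketch of a policy carried inside a token: the action being
// performed must satisfy every predicate before the token is honored.
interface SendEmailAction {
  to: string[];
  dayOfWeek: string; // e.g. "Monday"
}

type Policy = (action: SendEmailAction) => boolean;

const companyRecipientsOnly: Policy = (a) =>
  a.to.every((addr) => addr.endsWith("@example.com"));

const newslettersOnMondaysOnly: Policy = (a) => a.dayOfWeek === "Monday";

function actionIsAllowed(action: SendEmailAction, policies: Policy[]): boolean {
  return policies.every((p) => p(action));
}
```

Updating the rules then just means issuing a new token that carries different policies, and revoking the old one if needed; there is no central list to edit.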
13:40 But I think the more interesting part of this is actually on the far other end. We were talking about the developer setting these policies, and that's true, I would say, the majority of the time. But it doesn't respect user agency. You're giving the developer all of the agency, but the user is the one who owns, let's say it's a text editing app, the document. Why can't they decide, when they share it with somebody else, what that person should be able to do with the document?

14:09 In, say, Google Docs, you've got that little share button in the top corner, and it says invite people, and then you can say: this person is an editor, this one is another admin, this is another viewer, this person can only comment. I think the UI will usually stay like that, but you could add whatever options you wanted in there. Why not?

14:29 When we were doing the file system work back at Fission, you could scope down to say: you're allowed to write into only this directory, for example, and that was very, very flexible. Or: you're allowed to write files under a certain size limit. So the user can now make these decisions: I'm giving you access to my file system. Thinking back to my school days, say a teacher having students submit assignments to them: you can only submit into this one directory, and I don't want you filling up my entire disk, so files have to be under a gigabyte, or whatever. You can imagine scenarios like this, where we're now inviting the end user to participate in what the policy should be. It's not all set completely in advance. The developer can absolutely set it in advance, but you can then also refine it further and further, for the user's intention.
15:19 Right, I love that. Particularly now with LLMs and AI in general, a non-technical user can say, just the way they would to another person: hey, I want to give Alice access to this file, but Alice is only allowed to read the first page; the other two pages are my private notes, please don't give anyone access to those. Actually, you know what, Alice is also allowed to comment on it. From a very colloquial sentence like that, a computer can now derive those capabilities very accurately, represent them back to the user, hey, does this look right to you?, and level up the entire application user experience.

16:04 So it's very reassuring to me that all of this is built on top of very sound cryptography. However, even though I've studied computer science and have done my cryptography classes, that's not my day-to-day thing. And as an application developer, I'm trying to steer away from low-level cryptography things as much as possible, just because I don't consider myself an expert in this. So it's very good to know that everything is built on top of very solid cryptography, but how much do I, as an application developer, need to deal with signing things, et cetera, and how much of that is abstracted away from what I'm dealing with?
16:47 Yeah. I would say there are two layers here that people, correctly, find scary, myself included: cryptography and auth in general, both super scary topics. I remember, as a web dev ten years or so ago, adding the auth plugin to a web app and kind of going: if I don't touch it, hopefully it'll work. Really, the goal with all of these projects was to hide as much of the scary complexity as possible. So we handle all of the encryption and signing and all of this stuff in a way that should make it, if we do our job well, completely invisible to the developer.

17:27 We haven't talked about Beehive very much. Beehive, which is this project I'm doing at Ink & Switch to add access control to Automerge, has both an encryption side, so that's read controls, and then capabilities for mutations, or write controls.

17:44 For encryption, there's a bunch of things that have to happen. We have to serialize things in an efficient way. We have to chunk them up. We have to make sure that we share the encryption key with everyone who should have it, but nobody else. And that could be thousands of people, potentially; we've set ourselves the goal that you should be able to run this inside a large or medium-sized organization. How do you do all of that efficiently? Our goal is that you should be able to say, add these people, and it just works. You do all your normal Automerge stuff, and when you persist to disk or send it out to the network, then it gets encrypted, it gets secured, it gets signed, all of this stuff, and you don't have to worry about any of it. When you set up Beehive, it generates keys, it does all the key management for you, it does all the key rotation, all of this stuff. So again, it's one of these things where I'm really excited about it, and it's super cool to get to work on, and there's a lot of interesting detail on the inside, but in an ideal world nobody has to think about it other than: I want to grant these rights to these people, and everything else is taken care of automatically.
18:54 I love that. You mentioned that UCAN happened as a project while you were working on various projects at Fission, and right now you're mostly focused on Beehive. Can you share a bit more about what the impetus was for Beehive coming into existence, and then go into what Beehive is exactly?
19:19 Absolutely. So we started UCAN very, very early, in 2020. It came out of normal, regular product requirements: oh, we probably want everyone to be able to read this document, how do we do that? Or, I don't want somebody to fill up my entire disk, how do we prevent that? That went through a bunch of iterations, and we had a lot of learnings come out of it.

19:42 I'd say the really big one was this: in a traditional app stack, you have data at the bottom, say Postgres, and that's your source of truth. Above that you have some compute, maybe you're running Express.js, or Rails, or Phoenix, one of these. And then on top of that you put an auth plugin that uses all the facilities of everything below it. But that requires a database that has all of this information in it and that lives at a location. We call this, internally at Ink & Switch, auth-as-place, because your auth goes to somewhere. On every request, you present your ID, they go, okay, sure, here's a temporary token; then you hand that to the application, the application checks with the auth server again, and you do this whole loop. That has problems with latency, it doesn't work if you go offline, and it doesn't scale very well. Even Google ran into problems with this and started adjusting their auth system.

20:38 What we found at Fission, and I think this very much holds true, we just kept learning it over and over again, is that you can't rely on that system. In fact, auth has to go at the bottom of the stack. Your auth logic, the thing that actually does the guarding of your data, has to move with the data itself. So we call this "auth as data". For read control, it's no longer: I'm making a request to a web server and they may or may not send something to me. It's: I've encrypted it. Do you have the key, yes or no? If you have the key, you can read it. If you don't, you can't. And it doesn't matter where you are. You could be on a plane, disconnected from the internet, and you can still decrypt the data.
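As a tiny illustration of read control as "auth as data", here is a sketch using plain WebCrypto, nothing Beehive-specific: possession of the key is the access check, and it works exactly the same offline.

```typescript
// "Auth as data" for reads, in miniature: the ciphertext can live anywhere
// (a sync server, disk, a USB stick), and only holders of the key can read it.
// A system like Beehive manages keys like this for you under the hood.
async function tryRead(
  ciphertext: ArrayBuffer,
  iv: Uint8Array,
  key: CryptoKey | null, // you either have the key or you don't
): Promise<string | null> {
  if (key === null) return null; // no key, no read; there is no server to ask
  const plaintext = await crypto.subtle.decrypt(
    { name: "AES-GCM", iv },
    key,
    ciphertext,
  );
  return new TextDecoder().decode(plaintext);
}
```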
21:19 We developed these ideas with UCAN and the Web Native File System in particular. Fission unfortunately didn't make it; that was earlier this year, or, I'm not sure when this will be released, early in 2024.

21:31 And Ink & Switch reached out. We've known those folks for a while, because we've obviously been working in the same space for a while, and PVH, the lab director, was actually an advisor at Fission. He said: hey, we have a bunch of people that are interested in getting auth for Automerge in particular. Could you apply UCAN and WNFS to Automerge? And I said, I don't see why not. So we looked at it a little bit deeper and went: well, yes, we could use these things directly, but they're tuned for slightly different use cases. UCAN is extremely powerful and very flexible, and it has a bunch of stuff in it for the network layer in addition to CRDTs. You pay for that in space: the certificates get a little bit bigger. And we said: well, we want these documents to be as small as possible. There's been a lot of work in Automerge to do compression, really, really good compression, so the documents are tiny, and you're not going to get that with UCAN. So could we take the principles and the learnings from UCAN and WNFS and apply them to Automerge? Ultimately, that's what we've done, and a couple of different requirements have come out of it as well, so it's tuned for a slightly different thing. But essentially, Beehive asks: what if we had end-to-end encryption? In the same way that, say, Signal end-to-end encrypts your chats, what if I had end-to-end encrypted documents that only certain people could write into, where I can control who can write into them?
23:03 Has there been any prior art on CRDTs fulfilling these sorts of end-user-driven authentication and authorization requirements?
23:14 There's some nearer-term work that was also exploring things with CRDTs. But if you go really far back, there's the Tahoe Least-Authority File System, for example, which was this encrypted-at-rest file system with a capabilities model, the whole thing. Mark Miller was doing capabilities work going back into the late 90s; there's capability stuff that goes even further back, but he really did the work that everybody points at.

23:44 But for CRDTs, and for a local-first context where we don't assume there's any server in the middle whatsoever, we may have been the first to do this at Fission. It's possible. I mean, when we got started, the local-first essay hadn't even been published; we were doing local-first without the term. But there was a bunch of others in the space. Serenity Notes has done related work; Matrix and Signal have obviously done a bunch of the end-to-end encryption stuff; and localfirst/auth is a project that has also worked with Automerge to do similar things. Most of these projects showed up after the fact. But yeah, we're drawing from, and in fact we've talked to, all of these people, and we've collected the learnings from all the fantastic work they've done over the past few years into Beehive.
24:31 That's awesome. I would love to get a better feeling for what it would mean to build an app with Beehive. My understanding is that Beehive right now is very centric around Automerge. However, it is designed in a way that over time other CRDT systems, other sync engines, et cetera, could embrace it and integrate it into their specific system. I would like to get into that in a moment as well, but zooming into the Automerge use case right now: let's say I have already built a little side project with Automerge. I have some Automerge documents that are happily syncing data between my different apps. So far, maybe I don't even have any auth fences around it at all; hopefully no one knows the endpoint where all of my data lives, and if they do, okay, it's not very sensitive data. Or maybe I'm running all of that behind a Tailscale network or something like that, which I think in a lot of simpler use cases can also be a very pragmatic approach, by the way: when you can run the entire thing in a fully secured, guarded network, and you're just running it for yourself, or in your home network, or for your family, and you're all on the same Tailscale WireGuard network, I think that's also a very pragmatic approach.

25:56 But let's say I want to build an app that I can share more publicly on the internet, where maybe I want to build a TLDraw-like thing where I can send over a link so people can read it, but they need special permissions to actually also write something into it. I want to build that thing with Automerge. What does my experience look like?
26:19there are, I would say two
parts to that question, right?
26:22One is, I have an existing documents.
26:25how do I migrate it in?
26:27And, you know, could I use it with
something, you know, you alluded to
26:30other, other systems, in, in the future.
26:33and, what does the actual,
experience building something
26:36with, with Behive look like?
26:38So Behive is still in progress.
26:40we're planning to have a first
release of it, uh, in Q1.
26:44and, you know, we're currently going
at this with with the viewpoint
26:47that like adding any auth is better
than not having auth right now.
26:50So like there's definitely like further
work where we want to like really
26:54polish off the edges of this thing but
getting anything into people's hands is
26:57better than than not having it right.
27:00and there are some changes that we
need to make to Automerge because
27:04as I mentioned before you know auth
lives at the bottom of the stack so
27:08anything above in a stack needs to
know something about the things below.
27:12Off being at the bottom means that if
you wanna do in particular mutation
27:15control, Automerge needs to know
about how to ingest that mutation.
27:18So we do need to make some small changes
to Automerge to, to make this work.
27:22but the actual experience is, we're
bundling it directly into Automerge
27:26or the current plan at least, is we're
bundling it directly into the Automerge
27:30wasm, and then exposing a handful
of functions on that, which is add
27:36member at a certain authority level.
27:40Remove member.
27:41And that's it.
27:42so your experience will be, we're going
to do all the key management for you,
27:46behind the scenes, under the hood.
27:48if you have an existing document,
it'll get serialized and encrypted
27:53and put, you know, into storage.
27:56And you can add other
people to the document.
27:58By inviting them using add member
or remove member from that document.
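To give a feeling for how small that surface is meant to be, here is an illustrative TypeScript sketch; the names and authority levels are placeholders, not the final Beehive API.

```typescript
// Illustrative only: the final Beehive API surface may look different.
type PeerId = string; // a device, group, or document identifier
type Authority = "pull" | "read" | "write" | "admin"; // hypothetical levels

interface BeehiveDoc {
  addMember(member: PeerId, level: Authority): void;
  removeMember(member: PeerId): void;
}

// Usage: key generation, encryption, and rotation all happen under the hood.
function updateSharing(doc: BeehiveDoc): void {
  doc.addMember("did:key:alice-laptop", "write");
  doc.removeMember("did:key:stolen-phone");
}
```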
28:03 Maybe also worth noting: this gives you a couple of extra concepts to work with. Today we have documents, and you can have a whole bunch of them, and they're really independent pieces; maybe they can refer to each other by an Automerge URL. In addition, not instead, you want to be able to say: I'm building a file system, and if I give you access to the root of the file system, you should have access to the entire file system. I don't want to have to share every individual thing with you. So we have this concept of a group. You have your individual devices, you have groups, and you have documents. Each individual device has its own key under the hood; you don't have to worry about that specific detail, but it means each device is uniquely identifiable. Somebody steals your phone, you can kick your phone out of the group, or out of the document, and that's fine.

28:54 Then we have groups. Let's say I have a group for everyone at Ink & Switch. Everybody can be added to it, but it doesn't have a document associated with it. It's purely a way of managing people and saying: I want to add everybody in this group to this document. And groups can contain users and other groups. Then you have documents, which are groups that have some content associated with them. So I say: on this document, here's who's allowed to see it. That could be individuals, or other groups, or other documents. Other documents is interesting, because I can say: you have access to this document, this document represents a directory, and so you also have access to all of its children. In a file system, you can do things like this.

29:36 So add member and remove member become very, very powerful, because now you can have groups and set up these hierarchies: here are all of my devices; all of my devices sit in a group of Brooke's devices; all of Brooke's devices should be added to Ink & Switch; and Ink & Switch has the following documents. Then, whenever my contract finishes and I get kicked out of Ink & Switch, they can kick all of my devices out by revoking that group.
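Continuing the illustrative sketch from above, the hierarchy described here might be wired up roughly like this; again, every name and method is hypothetical.

```typescript
// Hypothetical sketch of the hierarchy described above: devices sit in a
// "Brooke's devices" group, which sits in an "Ink & Switch" group, which is
// in turn a member of the documents.
type PeerId = string;
type Authority = "read" | "write" | "admin";

class Group {
  readonly id: PeerId;
  private members = new Map<PeerId, Authority>();
  constructor(id: PeerId) {
    this.id = id;
  }
  addMember(member: PeerId, level: Authority): void {
    this.members.set(member, level);
  }
  removeMember(member: PeerId): void {
    this.members.delete(member);
  }
}

const brookesDevices = new Group("group:brookes-devices");
brookesDevices.addMember("did:key:brooke-laptop", "admin");
brookesDevices.addMember("did:key:brooke-phone", "admin");

const inkAndSwitch = new Group("group:ink-and-switch");
inkAndSwitch.addMember(brookesDevices.id, "write"); // groups contain groups

const labNotebook = new Group("doc:lab-notebook"); // a document is a group
labNotebook.addMember(inkAndSwitch.id, "write");   // ...with content attached

// When the contract ends, one removal revokes access for every device above:
inkAndSwitch.removeMember(brookesDevices.id);
```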
30:04 So using Beehive is going to feel like that. It's going to be: yeah, I know the ID for Brooke's devices, please add her; or, the contract finishes, please remove her. All of the rest of the stuff should be completely invisible to you. When you persist things to disk or send them to a sync server, it all gets encrypted first. And even the sync servers have a permission: there's a permission level in here that says you're allowed to ask another node for the bytes, and they can prove it, because you have these certificates under the hood.

30:40 Because, and this is an uncomfortable truth, all cryptography is breakable. In ten years, maybe they break all of our current ciphers. It could happen; in fact, older ciphers are already broken. Or maybe quantum computing gets very, very advanced and it becomes practical to break keys. Or there's an advancement on the discrete log problem, or whatever it is; we get some mathematical advance and it gets broken. The best thing to do, then, is to just not make those bytes available. Make the encrypted content only pullable by people that you trust. And yes, somebody could break into the sync server, let's say, and download everything, but that's a much higher bar than anybody on the internet being able to download whatever chunk they want.

31:23 But all of that is handled for the developer: you say, this is the sync server, and the sync server has the ability to pull down these documents. Or even the user could say: I want to sync to this sync server, so I'm going to grant that sync server access to my documents to replicate them. But really, we're trying to keep the top-level API for this as boring as possible. That is a top-line goal. Add member, remove member, and the sync server is just another member in the system.
31:51 Got it. So in terms of "auth as data", that mental model is very intuitive. As you rewire your brain as an application developer about how data flows through the system, you now understand that everything necessary to make those auth decisions, should someone have access to read this, to write this, et cetera, is just data that is also being synchronized across the different nodes. That is very intuitive. Is this, at least in this particular case with Beehive and Automerge, purely an implementation detail, your internal mental model of the data? Or is it actually data that is somehow available to the application developer, something they would work with the way they work with normal Automerge documents?
32:43 Yeah. So again, we're trying to hide these details as much as possible. You'll hear me talking about things like add member, or groups, and that sounds very access-control-list-like. Capabilities are, and there's a formal proof of this, more powerful: they can express more things than access control lists. So at least for this first revision, we've restricted ourselves to making things look like access control lists on the outside. And so it should feel very, very similar to doing things with role-based access control using, say, OAuth. That should all feel totally normal. You shouldn't really have to think about it in any special way. In the same way that, if you have a sync server, other than having to set it up or point it at an existing one, knowing that it's there doesn't mean that you have to design it from first principles.

33:35 Or, same thing with Automerge: technically, you have access to all of the events, but really you're going to materialize a view and treat it like it's JSON. And we're saying the same thing here with Beehive: you will automatically get only the data that you can decrypt and that you're allowed to receive from others. So, essentially, Beehive takes things off the wire, decrypts them, and hands them to Automerge, and then Automerge does its normal Automerge stuff. The one wrinkle is if an old write has been revoked: it turns out that somebody was defacing the document and doing all this horrible stuff, and we had to kick them out, so we have to tell Automerge, hey, ignore this run of changes, and then it has to recalculate. That's the one change that we have to make inside of Automerge. But really, you will use Automerge as normal. You will have an extra API to add a person to a document or to a group, and to remove them, as needed.
34:29 And you shouldn't have to think about any of these other parts, even the sync server. Alex Good, who's the main maintainer of Automerge, has been working on sync and improving sync. That project started around the same time as Beehive, and we realized, oh, there's actually a challenge here, because on the security side we're trying to hide as much information from the network as possible, including from the sync server; the sync server shouldn't be able to read your documents. But to do efficient sync, you want to have a lot of information about the structure of the thing that you're syncing, so that you have no redundancy and can do it in a few round trips, all of this stuff. So we ended up having to co-design and essentially negotiate between the two systems: how much information can we reveal and still have it be secure? And given that you can't read inside the documents, how do we package things up in an efficient way? But again, none of that should be a concern for a developer, in the same way that with the sync system right now you don't really interact with it, other than saying: that's my sync server over there, and the bytes go over there. There's just an extra layer now: things get encrypted before they go over the wire.
35:32 That makes sense. I think as an application developer there's typically this two-pronged approach. On the one hand, ideally you want to embrace that things are hidden from you, that you don't need to understand them to use the system correctly. But particularly if something is new, maybe you're an early adopter of the technology, you'd like to figure out the worst-case scenarios. Maybe the thing is no longer being developed: could I take it over, could I become a contributor or maintainer of it? Or you'd still like to understand it for the sake of really figuring out whether it's the thing that you want. Just by understanding how it works, you can come to the right conclusion about whether it's for you or not, particularly if it's not yet as well documented.

36:21 So, channeling our inner understanding-hungry application developer, I'd like to understand a bit better how Beehive, and in that regard also the sync server, works under the hood. It's hard enough to build a syncing system, and now you're building an authorization layer on top of it. What sort of implications does this have for the sync server? My understanding is that Alex Good is working on this, and that it has been semi-public so far: there's a sibling project next to Beehive called Beelay, which I guess relays messages in the Beehive system. And I think that's a step towards what we're all eventually dreaming about: a generic sync server that is ideally compatible with as many things as possible, at the beginning for Automerge, but also beyond that. So what is Beelay, what are its design goals, and how does it work?
37:25 So Beelay has a requirement that it has to work with encrypted chunks. We do this compression, and then encryption on top of it, and then send that to the sync server. The sync server can see the membership of each doc, because it has to know who it can send these chunks around to, but it can't see the content of the document. So if you make a request, it checks: okay, are you somebody that has the rights to have this sent to you, yes or no, and then it'll send it to you or not.
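A hypothetical sketch of what that relay-side check amounts to; the shapes and names here are illustrative, not Beelay's actual interface.

```typescript
// Hypothetical sketch of a Beelay-style check: the relay stores opaque
// encrypted chunks plus each document's membership, and only serves chunks
// to requesters who are members and can prove it.
interface ChunkRequest {
  docId: string;
  requester: string; // public key / DID of the requesting peer
  proof: Uint8Array; // signature or certificate chain
}

const membership = new Map<string, Set<string>>();  // docId -> member ids
const chunkStore = new Map<string, Uint8Array[]>(); // docId -> encrypted chunks

// Placeholder: a real relay verifies the signature / certificate chain here.
function proofIsValid(_req: ChunkRequest): boolean {
  return true;
}

function handleRequest(req: ChunkRequest): Uint8Array[] | null {
  const members = membership.get(req.docId);
  const allowed = members?.has(req.requester) === true && proofIsValid(req);
  // The relay never decrypts the chunks it returns; it only gates access.
  return allowed ? chunkStore.get(req.docId) ?? [] : null;
}
```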
37:55 And this isn't only for sync servers. If you connect to somebody directly over Bluetooth, you'd do the same thing, even if you can both see the document; there's nothing special here about sync servers. To do this sync well, we're no longer syncing individual ops. We could do that, but then we lose the compression, which isn't great. And ideally, if somebody were to break into your server, we don't want them to learn how everything is related to each other; that compression and encryption also hides a little bit more of this data. We do show the links between these compressed chunks, but we'll get to that in a second. Essentially, what we want to do is chunk up the documents in such a way that there's the fewest number of chunks to sync, and the longest ranges of Automerge ops that get compressed before we encrypt them on, I'll call it the client. It's not really a client in a local-first setting, but the non-sync-server side, when you're sending to the sync server. The more stuff you have, the better the compression is.
38:58 And chunking up the document here basically means you're really chunking up the history of operations, which then internally gets rolled up into one snapshot of the document. That history could be very long, and there's room for optimization, which is where the compression comes in: if you set the name of the document a ton of times, hey, the name is Peter, and later you say, no, it's Brooke, and later, no, it's Peter, no, it's Johannes, then you can compress that down to, for example, just the latest operation.
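A toy sketch of the kind of redundancy that compression takes advantage of; this is only an illustration of the idea, not how Automerge actually encodes its history.

```typescript
// Toy illustration: many overwrites of the same field collapse to almost
// nothing in the materialized view.
type SetOp = { field: string; value: string };

const history: SetOp[] = [
  { field: "name", value: "Peter" },
  { field: "name", value: "Brooke" },
  { field: "name", value: "Peter" },
  { field: "name", value: "Johannes" },
];

// Only the latest write per field matters for the snapshot. Automerge's real
// columnar compression is far more sophisticated, but it benefits from
// exactly this kind of repetition in the op history.
const latest = new Map<string, string>();
for (const op of history) latest.set(op.field, op.value);
console.log(latest.get("name")); // "Johannes"
```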
39:33 Yeah, exactly. To get more concrete: if you take this slider all the way to one end, and you take the entire history and run-length encode it, do this Automerge compression, you get very, very good compression. If we take it to the far other end, we go really granular: every individual op on its own, so you don't get compression. So there's something in between: how can we chop up the history in a way that gives a nice balance between these two?

40:04 When Automerge receives new ops, it has to know where in the history to place them. You have this partial order, the typical CRDT lattice, and then it puts that into a strict order: it orders all the events and then plays over them like a log. And this new event that you get, maybe it becomes the first event; it could go way back to the beginning of history. You don't know, because everything's eventually consistent. So if you do that linearization first and then chop up the documents, you have this problem where, if I do this chunking, or you do this chunking, the result really depends on what history each of us has. And so it makes it very, very difficult to keep the amount of redundancy small.

40:46 We found two techniques that helped us with this. One was: we take some particular operation as a head and say, ignore everything else; only give me the history for this operation, only its strict ancestors. So even if there's something concurrent, forget about all of that. That gets us something stable relative to a certain head. And then, to know where the chunk boundaries are, we run a hash hardness metric: the number of zeros at the end of the hash of each op. You can basically say: any individual op may or may not end in zeros, so if I'm happy with anything, every op is a boundary; or if I want chunks of around four, then give me two zeros at the end, because 2 to the power of 2 is 4, so it'll chunk up into fours. You can make that as big or as small as you want. So now you have some way of probabilistically chunking up the document relative to some head, and you can say how big you want the chunks to be based on this hash hardness metric.
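Here is a rough TypeScript sketch of that hash-hardness idea; the hash choice and byte handling are illustrative, not Beelay's exact scheme.

```typescript
import { createHash } from "node:crypto";

// Probabilistic chunk boundaries: an op ends a chunk when its hash has at
// least `level` trailing zero bits, giving chunks of expected size 2^level.
// Peers that pick boundaries this way agree on them without coordinating,
// no matter which head they each started from.
function trailingZeroBits(hash: Uint8Array): number {
  let zeros = 0;
  for (let i = hash.length - 1; i >= 0; i--) {
    const byte = hash[i];
    if (byte === 0) {
      zeros += 8;
      continue;
    }
    zeros += 31 - Math.clz32(byte & -byte); // trailing zero bits of this byte
    break;
  }
  return zeros;
}

function isChunkBoundary(opBytes: Uint8Array, level: number): boolean {
  const digest = createHash("sha256").update(opBytes).digest();
  return trailingZeroBits(digest) >= level;
}
```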
41:47 The advantage of this is that even if we're doing things relative to different heads, we're going to hit the same boundaries for these hash hardness levels. So now we're sharing how we chunk up the document. And we assume that on average, not all the time, but on average, older operations will have been seen by more peers. You're mostly appending things to the end of the document, so with this system you will less frequently have something concurrent with the first operation. That means we can get really good compression on older operations. I'm just picking numbers out of the air here, but take the first two thirds of the document, which are relatively stable, compress those, and you get really good compression; then encrypt it and send it to the server. Then, of the remaining third, take the first two thirds of that, compress them, and send them to the server. And at some point you get down to each individual op. This means that as the document grows and changes, these smaller chunks get pushed further and further into history, and whoever can actually read them can recompress those ranges.

43:02 Alex has this, I think, really fantastic name for it: sedimentree, because it's almost acting in sedimentary layers, but you get a tree of these layers. It's cute, right? So if you want to do a sync, let's say a completely fresh sync where you've never seen the document before, you will get the really big chunk, then you move up a layer and get the next biggest chunk of history, then you move up a layer, and eventually you get the last couple of ops. So we can get you really good compression, but again, it's this balance of the two forces. Or, if you've already seen the first half of the document, you never have to sync that chunk again; you only need the higher layers of the sedimentree sync. So that's how we chunk up the document.

43:46 Additionally, and I'm not at all going to go into how this thing works, but if people are into sync systems, there's a pretty cool paper called Practical Rateless Set Reconciliation. It does really interesting things with compressing all the information you need in order to know what the other side has. In half a round trip, so in one direction on average, you can get all the information you need to know what the delta is between your two sets: literally, what's the handful of ops that we've diverged by, without having to send all of the hashes. So if people are into that stuff, go check out that paper; it's pretty cool. But there's a lot of detail in there that we're not going to cover on this podcast.
44:26 Thanks a lot for explaining. I suppose that's just the tip of the iceberg of how Beelay works, but I think it's important to get a feeling that this is a new world, in a way: it's decentralized, it's encrypted, et cetera. There are really hard constraints on what certain things can do, since with your traditional development mindset you would just say: let's treat the client like it's a Kindle with no CPU in it, and let the server do as much of the heavy lifting as possible. That's the muscle we're used to so far. But in this case, even if the server has a super beefy machine, it can't really do that, because it doesn't have the access to do all of this work. So the clients need to do it, and when the clients independently do so, they need to eventually end up in the same spot. Otherwise the entire system falls over, or it gets very inefficient. So that sounds like a really elegant system that you're working on there.

45:32 With Beehive overall, you're starting out with Automerge as the system that drives the requirements, et cetera. But I think your bigger ambition, your bigger goal, is that this actually becomes a system that at some point goes beyond just applying to Automerge, and applies to many more local-first technologies in the space. If there are application framework authors, or other people building a sync system, who are interested and thinking, hmm, instead of trying to come up with our own research for what it means to do authentication and authorization for our sync system, particularly in a decentralized way: what would be a good way for those frameworks and technologies to jump on the Beehive wagon?
46:33 So if they're already using Automerge, I think that'll be pretty straightforward: you'll have bindings, it'll just work. But Beehive doesn't have a hard dependency on Automerge at all, because it lives at this layer below. Early on we asked: should we just weld it directly into Automerge, or how much does it really need to know about it? Where we landed is that you just need to have some kind of way of saying, here's the partial order between these events, and then everything works. Just as an intuition: you could put Git inside of Beehive and it would work. I don't think GitHub is going to adopt this anytime soon, but if you had your own Git syncing system, you could do this, and it would work. You just need to have some way of ordering events relative to each other.

47:22 And yes, then you have to get a little bit more into slightly lower-level APIs. When I build stuff, I tend to work in layers: here are the very low-level primitives, then a slightly higher level, and a slightly higher level again. People using it from Automerge will just have add member and remove member, and everything works. To go down one layer, you have to wire into it how to do ordering, and that's it; everything else should wire all the way through. And you have to be able to pass it serialized bytes. Beehive doesn't know anything about the compression we were just talking about that Automerge does. But you tell it: hey, this is some batch, some archive that I want to store; it starts at this timestamp and ends at that timestamp, or logical clock; please encrypt this for me. And it goes: sure, here you go, encrypted. And off it goes. So it has very, very few assumptions.
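A hypothetical sketch of how small that required surface is; these interfaces are made up for illustration, not Beehive's real integration API.

```typescript
// Hypothetical sketch of the minimal surface another system would provide to
// Beehive: a partial (causal) order over events, plus opaque serialized bytes.
interface EventId {
  readonly hash: Uint8Array; // some stable identifier, e.g. a content hash
}

interface CausalEvent {
  id: EventId;
  parents: EventId[];  // events this one causally depends on
  payload: Uint8Array; // already-serialized content; Beehive never parses it
}

// The host system answers causal-order questions; membership, key management,
// and encryption of payload batches stay generic on the auth side.
interface CausalOrder {
  happensBefore(a: EventId, b: EventId): boolean;
}

// A batch to encrypt is then just "these opaque payloads, in this order":
function batchPayloads(events: CausalEvent[]): Uint8Array[] {
  return events.map((e) => e.payload);
}
```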
48:15 That's certainly something that I might also pick up a bit further down the road myself for LiveStore, where the underlying substrate for syncing data around is an ordered event log. If I'm encrypting those events, then I think that fulfills perfectly the requirements you've listed, which are very few for Beehive. So I'm really looking forward to it once it gets further along. Speaking of which, where is Beehive right now? I've seen the lab notebooks from what you have been working on at Ink & Switch. Can I get my hands on Beehive already right now? Where is it at, and what are the plans for the coming years?
48:56 At the time that we're recording this, at least, which is in early December, there's unfortunately not a publicly available version of it. I really hoped we'd have it ready by now, but unfortunately we're still wrapping up the last few items. But we plan to have a release in Q1. As I mentioned before, there are some changes required to Automerge to consume it, specifically to manage revocation history: somebody got kicked out, but we're still in this eventually consistent world, and Automerge needs to know how to manage that. But managing things, sync, encryption, all of that stuff, we hope to have in, I'm not going to commit the team to any particular timeframe here, but let's say in the next coming weeks.

49:37 Right now the team is myself; John Mumm, who joined a couple of months into the project and has been focused primarily on BeeKEM, which is, and I'm just going to throw out words here for people that are interested in this stuff, related to TreeKEM, one of the primitives for MLS, Messaging Layer Security, but we made it concurrent. He's been doing great work there. And Alex, amongst the many, many things that Alex Good does between writing the sync system, maintaining Automerge, and all of the community stuff he does, has also been lending a hand.
50:11 So I'm sure that with Beehive you're, in a way, just scratching the surface, and there's probably enough work here to fill another few years, maybe even decades, of ambitious work. Right now you're probably working through the proof of concept, the table-stakes things. Can you paint a picture of some of the more ambitious, longer-term things you would like to see under the umbrella of Beehive?
50:39Yeah.
50:40So, there's a few.
50:41Yes.
50:42We have this running list internally
of, like, what would a V2 look like?
50:45So one is adding a
little policy language.
50:48The bang
for the buck that you get from having
50:51something like UCAN's policy language
50:53is just so high.
50:54It just gives you so much flexibility.
50:56Hiding the membership, even from
the sync server, is possible;
51:00it just requires more engineering.
51:02So there are many, many places in
here where zero-knowledge proofs, I
51:06think, would be very useful, for
people who know what those are.
51:09essentially it would let the sync
server say, yes, I can send you bytes
51:14without knowing anything about you.
51:16Right,
51:17but it would still deny others.
51:19And right now it basically needs
to run more logic to actually
51:22enforce those auth rules.
51:25Yeah.
51:25So today you have to sign a message
that says: I signed this with the
51:30private key corresponding to a
public key in this membership. With
51:36zero-knowledge proofs, we could hide the
entire membership from the sync server
51:39and still do this, without revealing
even who's making the request, right?
51:41Like, that would be awesome.
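As a rough illustration of the "today" version of that check, the sketch below uses off-the-shelf Ed25519 signatures: the requester signs a challenge, and the server verifies both the signature and that the signer's public key is in the membership it knows about. The function and wire details are invented here for illustration; a zero-knowledge variant would replace this with a proof that reveals neither the key nor the membership.

```rust
use std::collections::HashSet;

use ed25519_dalek::{Signature, Signer, SigningKey, Verifier, VerifyingKey};

// Hypothetical, simplified request check; a real protocol would use a fresh
// server-chosen nonce and carefully defined message framing.
fn authorize(
    members: &HashSet<[u8; 32]>, // public keys the sync server knows about
    requester: &VerifyingKey,    // who claims to be asking
    challenge: &[u8],            // e.g. a nonce chosen by the server
    signature: &Signature,
) -> bool {
    members.contains(&requester.to_bytes()) && requester.verify(challenge, signature).is_ok()
}

fn main() {
    // Fixed key bytes for illustration only; real keys come from a CSPRNG.
    let alice = SigningKey::from_bytes(&[7u8; 32]);

    let mut members = HashSet::new();
    members.insert(alice.verifying_key().to_bytes());

    let challenge = b"sync-request-nonce";
    let signature = alice.sign(challenge);

    assert!(authorize(&members, &alice.verifying_key(), challenge, &signature));
}
```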
51:43In fact, and this is a bit of a
tangent, I think there's a number
51:45of places where that class of
technology would be really helpful.
51:49Even for things like, in CRDTs,
there's this challenge where you have
51:53to keep all the history for all time.
51:55And I think with zero-knowledge proofs,
we can actually, like, this would
51:58very much be a research project, but I
think it's possible to delete history but
52:02still maintain cryptographic proofs that
things were done correctly, and compress
52:06that down to, you know, a couple bytes,
basically. But that's a bit of a tangent.
52:10I would love to work on that at some
point in the future. But for
52:13Beehive, yeah: hiding more metadata,
hiding, you know, the membership
52:17from the group, making
all the signatures post-quantum.
52:21The main post-quantum
recommendations from NIST, the U.S.
52:26government agency that handles
these things, only just came out.
52:30So, you know, we're still kind of waiting
for good libraries on it and, you know,
52:34all of this stuff and what have you.
52:36But yeah, big chunks of it are already
52:40post-quantum, but making it fully
52:43post-quantum would be great.
52:43And then, yeah, adding all kinds of bells
and whistles and features, you know,
52:46making it faster. It's not going
to have its own compression, because it
52:50relies so heavily on cryptography, so
it doesn't compress super well, right?
52:54So we're going to need to figure
out our own version of, you know,
52:58Automerge has run-length encoding.
52:59What is our version of that, given
that we can't easily run-length encode
53:02encrypted things, right?
53:04Or signatures or, you
know, all of this.
53:06So there's a lot of stuff
down in the plumbing.
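As a toy illustration of the compression point above (and nothing more; this is not Automerge's actual column encoding), run-length encoding only wins when neighbouring bytes repeat, and encrypted or signed data is deliberately indistinguishable from random bytes:

```rust
// Toy run-length encoder: collapses runs of identical bytes into (value, count).
fn run_length_encode(bytes: &[u8]) -> Vec<(u8, u32)> {
    let mut runs: Vec<(u8, u32)> = Vec::new();
    for &b in bytes {
        match runs.last_mut() {
            Some((value, count)) if *value == b => *count += 1,
            _ => runs.push((b, 1)),
        }
    }
    runs
}

fn main() {
    // Repetitive plaintext collapses to a single run...
    let plaintext = vec![0u8; 1_000];
    // ...while high-entropy bytes (standing in for ciphertext or signatures)
    // barely shrink at all.
    let ciphertext_like: Vec<u8> = (0..1_000u32)
        .map(|i| (i.wrapping_mul(2_654_435_761) >> 24) as u8)
        .collect();

    println!("plaintext runs:  {}", run_length_encode(&plaintext).len()); // 1
    println!("ciphertext runs: {}", run_length_encode(&ciphertext_like).len()); // close to 1,000
}
```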
53:08Plus I think this policy language
would be really, really helpful.
53:11That sounds awesome.
53:12Both in terms of new features,
capabilities, no pun intended, being
53:16added here, but also in terms of
removing overhead from the system and
53:22simplifying the surface area by doing
more of the clever work internally,
53:27which simplifies the system overall.
53:29That sounds very intriguing.
53:31The other thing worth noting with
this, I think both to point a
53:35way into the future and also to
draw a boundary around what Beehive
53:39does and doesn't do, is identity.
53:41so Beehive only knows about public
keys because those are universal.
53:46They work everywhere.
53:47They don't require a naming
system, any of this stuff.
53:50we have lots of ideas and opinions
on how to do a naming system.
53:55But you know, if you look at,
for example, uh, BlueSky, under
53:58the hood, all of the accounts are
managed with public keys, and then
54:02you map a name to them using DNS.
54:04So either you're using, you know,
54:07myname.bluesky.social, or you have your own
domain name, like I'm expede.wtf
54:12on BlueSky, for example, right?
54:13Because I own that domain name
54:15and I can edit the TXT record.
54:15and that's great and it, definitely
gives users a lot of agency over
54:20how to name themselves, right?
54:21Or, you know, there are
other related systems.
54:24But it's not local-first
because it relies on DNS.
54:28So, like, how could I invite you to a
group without having to know your public
54:32key? We're probably going to ship,
I would say, just because it's
54:35relatively easy to do, a system called
Edge Names, based on pet names, where
54:40basically I say, here's my contact book.
54:42I invited you.
54:43And at the time I
invited you, I named you
54:45Johannes, right?
54:46And I named Peter "Peter," and so on and
54:52so forth, but there's no way to prove
anything there; that's just my name for them.
54:54Right.
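A sketch of the pet-name idea, with hypothetical types (this is not the planned Edge Names implementation): the names live only in my contact book and prove nothing; the stable identity is always the public key.

```rust
use std::collections::HashMap;

type PublicKey = [u8; 32];

/// My purely local mapping from names I chose to the keys they point at.
#[derive(Default)]
struct ContactBook {
    by_pet_name: HashMap<String, PublicKey>,
}

impl ContactBook {
    /// Record *my* name for a peer, e.g. at invite time.
    fn name(&mut self, pet_name: &str, key: PublicKey) {
        self.by_pet_name.insert(pet_name.to_string(), key);
    }

    /// Inviting "Johannes" really means inviting whatever key *I* filed under
    /// that label; someone else's "Johannes" may be a completely different key.
    fn resolve(&self, pet_name: &str) -> Option<PublicKey> {
        self.by_pet_name.get(pet_name).copied()
    }
}

fn main() {
    let mut contacts = ContactBook::default();
    contacts.name("Johannes", [1u8; 32]);
    contacts.name("Peter", [2u8; 32]);
    assert_eq!(contacts.resolve("Johannes"), Some([1u8; 32]));
}
```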
54:54And having
a more universal system where
54:59I could invite somebody by, like,
their email address, for example, I
55:02think would be really interesting.
55:03Back at Fission, Blaine Cook,
55:06who's also done a bunch of stuff with
Ink & Switch in the past, had proposed
55:09this system, the NameName system,
that would give you local-first names
55:12that were rooted in things like email,
so you could invite somebody with
55:17their email address and a local-first
55:21system could validate that that person
actually had control over that email.
55:23It was a very interesting system.
55:25So there's a lot of work to be done in
55:29identity as separate from authorization.
55:29Right, yeah.
55:30I feel like there's just always
so much interesting stuff happening
55:35across the entire spectrum, from, like,
the world that we're currently in,
55:40which is mostly centralized just
in terms of making things work at
55:45all, and even there it's hard to keep
things up to date and working,
55:50et cetera, but we want to aim higher.
55:54And one way to improve things a lot is
by going more decentralized, but
55:59there are so many hard problems to
56:04tame, and we're starting to just peel
off the layers of the onion here.
56:07And Automerge, I think, is a great,
canonical case study there: it
56:12started with the data, and now things
are moving on to authorization, et cetera.
56:17And then for authentication and
identity, we probably have
56:21enough research work ahead of us
56:25for the coming decades.
56:25And super, super cool to see that so
many bright minds are working on it.
56:29Maybe one last question
in regards to Beehive.
56:34When there's a lot of cryptography
involved, that also means there are
56:38even more CPU cycles that need
to be spent to make stuff work.
56:43Have you been looking into
performance benchmarks? Let's
56:48say you want to synchronize a certain
history for some Automerge
56:54documents, with Beehive disabled and
with Beehive enabled: do you see
57:00a certain factor of how much it
57:05gets slower with Beehive and the
authorization rules applied, both
57:10on the client as well as on the server?
57:10Yeah.
57:10So, it's a great question.
57:12So obviously there are different
dimensions in Beehive, right?
57:14So for encryption, which is where I
would say most people would expect there
57:19to be the performance overhead:
57:21there's absolutely overhead there.
57:22You're doing decryption, but
we're using algorithms that decrypt on the
57:26order of like multiple gigabytes a second.
57:29So it's fine, basically.
57:32And that's also part of why we wanted
to chunk things up in this way,
57:35because then we get good compression,
you know, all of this stuff.
57:37So if you're doing, like, a total sync, you know,
the first time you've seen this document,
57:42you've got to pull everything and decrypt
everything and hand it off to Automerge;
57:45the encryption's not
57:46going to be the bottleneck.
57:48And then on, like, a rolling basis,
you know, per keystroke, yes,
57:53there's absolutely overhead there, but
remember this is relative to latency.
57:59So if you have 200 milliseconds of
latency, that's your bottleneck.
58:03It's not going to be the five milliseconds
58:08of encryption that we're doing, or
signatures, or whatever it is. There's
a space cost, because now we have to keep
58:14public keys, which are 32 bytes,
and signatures, which are 64 bytes.
58:19So there is some overhead in space;
58:22that happens.
58:23But for the most part
we've chosen algorithms that
58:26are known to be very, very fast.
58:28They're sort of
the best in class.
58:30So I'll just rattle down
a list for the
58:33people that are interested:
58:34we're using EdDSA Edwards keys for
signatures and key exchange, ChaCha
58:40for encryption, and BLAKE3 for hashing.
58:44BLAKE3 is very interesting: what you can do is
58:45things like verifiable streams.
58:47So, like, as you're streaming the
data in, you can start hashing
58:50parts of it as you're going along.
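Wiring that same algorithm family together with commonly used off-the-shelf Rust crates (ed25519-dalek, chacha20poly1305, blake3) looks roughly like the sketch below. This is only meant to show the primitives, not Beehive's key management; the hard-coded key and nonce are illustration-only and must never be reused like this in practice.

```rust
use blake3::Hasher;
use chacha20poly1305::{
    aead::{Aead, KeyInit},
    XChaCha20Poly1305, XNonce,
};
use ed25519_dalek::{Signer, SigningKey};

fn main() {
    // Ed25519 (EdDSA over an Edwards curve): 32-byte public keys and
    // 64-byte signatures -- the space overhead mentioned above.
    let signing_key = SigningKey::from_bytes(&[9u8; 32]);
    let batch = b"serialized, compressed Automerge batch";
    let signature = signing_key.sign(batch);
    assert_eq!(signing_key.verifying_key().to_bytes().len(), 32);
    assert_eq!(signature.to_bytes().len(), 64);

    // A ChaCha AEAD (here XChaCha20-Poly1305) encrypts the opaque batch.
    // Hard-coded key and nonce are for illustration only.
    let cipher = XChaCha20Poly1305::new_from_slice(&[0u8; 32]).unwrap();
    let nonce = XNonce::from_slice(&[0u8; 24]);
    let ciphertext = cipher.encrypt(nonce, batch.as_ref()).unwrap();

    // BLAKE3 hashes incrementally, so a stream can start being verified
    // chunk by chunk as it arrives.
    let mut hasher = Hasher::new();
    for chunk in ciphertext.chunks(16) {
        hasher.update(chunk);
    }
    println!("content hash: {}", hasher.finalize());
}
```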
58:52The really big bottleneck, the,
like, the heaviest part of the system,
58:57or, sorry, the part that we were
least happy with our original design on,
59:00that we then ended up doing a bunch of
research on, was doing key agreement.
59:06So if I have, whatever, a thousand
people in a company, and they're all,
59:12you know, working on this document,
I don't want to have to send a
59:14thousand messages every time I change
the key, which will be rotated
59:18every message, let's say, or, you
know, once a day if we're being,
59:22you know, more conservative with it.
59:24and that's a lot of data and
a lot of just like latency on
59:27this and just a lot of network.
59:29So we switched to, instead of
it being linear, we found a way
59:32of doing it in logarithmic time.
59:35So we can now do key rotations
concurrently, like totally eventually
59:39consistently, in log n time.
59:41A lot of research
happened in there, but that let
59:47us scale up much, much, much more.
59:48So the prior algorithm that we were
using off the shelf from a paper
59:52scaled up to, in the paper they
say, about 128 people, right?
59:55That's sort of your upper bound, and
we're like, uh, you know, we had set
59:58ourselves these higher levels
that we actually want to work with.
1:00:02And so now we can scale
into the thousands.
1:00:05When you get up to 50,000 people,
yeah, it starts to slow down.
1:00:07You start to get into, you know,
closer to a second if you're doing,
1:00:11very, very concurrent, you know,
uh, 40,000 of the 50,000 people
1:00:14are doing concurrent key rotations.
1:00:16Doesn't happen very often,
but like it could happen.
1:00:19If one person's doing an
update, then it'll happen and,
1:00:21like, you won't even notice it.
1:00:23Right.
1:00:24So it depends on how heavily
concurrent your document is.
1:00:26Do you have 40,000 people
writing to your document?
1:00:28Yeah.
1:00:28You're going to see it
slow down a little bit.
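A back-of-the-envelope way to see the scaling difference being described: rotating a shared key by messaging every other member directly grows linearly with group size, while updating a path in a balanced key tree (the TreeKEM family of ideas) grows with its logarithm. This shows only the asymptotics, not BeeKEM itself.

```rust
// One message per other member when the rotation is broadcast directly.
fn linear_updates(members: u64) -> u64 {
    members.saturating_sub(1)
}

// Roughly the number of nodes on a leaf-to-root path in a balanced binary
// tree over `members` leaves.
fn tree_updates(members: u64) -> u64 {
    (members.max(1) as f64).log2().ceil() as u64
}

fn main() {
    for n in [128u64, 1_000, 50_000] {
        println!(
            "{:>6} members: {:>5} direct messages vs ~{:>2} tree-path updates",
            n,
            linear_updates(n),
            tree_updates(n)
        );
    }
}
```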
1:00:30It's so amazing to see that.
1:00:32I mean, in academia, there is so much
progress in those various fields.
1:00:36And I feel like in local-first, we
actually get to benefit and directly
1:00:42apply a lot of those great
achievements from other places.
1:00:45It makes a big
difference for the applications that
1:00:49we'll be using whether there is a
cryptographic breakthrough in efficiency
1:00:53or in being more long-term secure, et cetera.
1:00:57And I fully agree that latency
is probably by far the most important
1:01:02factor when it comes to whether it makes a
difference or not, but battery
1:01:06usage, et cetera, is another one.
1:01:08And if I synchronize data a lot,
maybe I open a lot of data, like a lot
1:01:13of documents just once, because maybe
I'm reviewing documents a lot when
1:01:17someone sends them, or maybe I'm
an executive who gets to review a lot of
1:01:20documents, and I don't really
amortize the documents too much because
1:01:26I don't reuse them on a day-to-day basis.
1:01:28I think that initial sync also
tends to matter quite a bit.
1:01:33But it's great to hear that
efficiency seems to be already
1:01:37very well under control.
1:01:39So maybe rounding this out: you've been
at Fission, you've been seeing, like, the
1:01:45innovation around local-first in
three buckets: auth, data, and compute.
1:01:51As mentioned before, on this
podcast we've mostly been
1:01:54exploring the data aspect.
1:01:56Now we went quite deep on some
of your work in regards to auth.
1:02:01We don't have too much time to spend
on something else, but I'm curious
1:02:06whether you can just seed some ideas in
regards to: where does compute
1:02:12fit in this new local-first world?
1:02:15Like, if you could fork yourself and
do a lot more work, what would you be
1:02:21doing in regards to that compute bucket?
1:02:23Yeah.
1:02:24So, we had a project
related to compute at Fission,
1:02:27right at the end.
1:02:29And I'm very fortunate that I actually
have some grants to continue that
1:02:32work after I finish with Beehive.
1:02:33I'll switch to that and then, after
that project, see what else
1:02:36is interesting kicking around.
1:02:38But essentially the motivation is: all
the compute for local-first stuff happens
1:02:43completely locally today, or you're
talking to some cloud service, right?
1:02:47Like maybe you're using an LLM,
1:02:48so you go to, you know, use the
OpenAI APIs, that kind of thing.
1:02:53But what if you're on a very
low-powered device and you're on a plane?
1:02:58Right.
1:02:59you know, you still need to be able
to do some compute some of the time.
1:03:02So the trade-off that we're
trying to strike in these
1:03:05kinds of projects is: what if I
can always run it, even slowly?
1:03:08So let's say I'm rendering a 3D
scene and it's gonna take a minute
1:03:11to paint, versus I have a desktop
computer, you know, nearby and I can
1:03:18farm that job out to that machine
because it's nearby in latency
1:03:22and it has more compute resources.
1:03:25Or maybe, I need to send email to a mail
server that only exists in one place.
1:03:30Like, how can I do this, you know,
compute dynamically, where I can
1:03:35always run my jobs and do my resource
management whenever possible?
1:03:40An email server is a case where
you can't always do this, right?
1:03:42But when somebody else could run it,
1:03:45maybe I can farm that out to them instead.
1:03:46So there's a lot of interest, I think,
in how we bridge between what is
1:03:53sometimes called, in the BlueSky world,
big world versus small world, right?
1:03:56So I have my local stuff.
1:03:57I'm doing things entirely on my own.
1:03:59I'm completely offline.
1:04:00And that is the baseline.
1:04:02But when I am online, how
much more powerful can it get?
1:04:06You know, I'm not going to ingest
the entire BlueSky firehose myself.
1:04:10I'm going to leave that to an indexer
1:04:12to do for me.
1:04:13So when I'm online, maybe I
can get better search, right?
1:04:17Things like this, or maybe if I'm
rendering PDFs, maybe I want to farm
1:04:20that out to some server somewhere rather
than doing that with Wasm in my browser.
1:04:25So kind of progressively
enhancing the app.
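One way to picture that "always runnable, sometimes faster" trade-off is a tiny dispatcher like the sketch below. Everything here is hypothetical (it is not IPVM or any existing scheduler API): prefer a nearby, better-resourced executor when one is reachable, and otherwise fall back to running locally, even slowly.

```rust
// Hypothetical executor choice: all names and thresholds invented for illustration.
enum Executor {
    Local,
    Remote { endpoint: String },
}

struct Job<'a> {
    description: &'a str,
    estimated_local_secs: u64,
}

fn pick_executor(job: &Job, nearby: Option<String>, online: bool) -> Executor {
    match nearby {
        // Farm the job out only when a peer is reachable and the job is heavy
        // enough that shipping it is worth the round trip.
        Some(endpoint) if online && job.estimated_local_secs > 10 => {
            Executor::Remote { endpoint }
        }
        // Offline on a plane, or the job is cheap: run it here, even slowly.
        _ => Executor::Local,
    }
}

fn main() {
    let render = Job { description: "render 3D scene", estimated_local_secs: 60 };
    match pick_executor(&render, Some("desktop.local".into()), true) {
        Executor::Remote { endpoint } => {
            println!("farming '{}' out to {endpoint}", render.description)
        }
        Executor::Local => println!("running '{}' locally", render.description),
    }
}
```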
1:04:28And I think this is even more relevant
1:04:31with AI, because
1:04:35now suddenly we get a lot of work
1:04:38to be done that benefits massively
from a lot of compute.
1:04:43And with AI in particular, I
think we're now
1:04:47in this tricky spot.
1:04:49Either we already get to live in the
future, but that means typically all of
1:04:54our AI intelligence is coming
from some very beefy servers in
1:04:58some data centers, and the way I get
those instant enhancements
1:05:03is by just sending over all of
my context data to those servers.
1:05:09Well, I guess you could get those
beefy servers also next to your
1:05:13desk, but that is very expensive
1:05:17and I think not very practical.
1:05:17I guess step by step: now the
newest MacBooks, et cetera, are already
1:05:21very capable of running things
locally, but there will always be
1:05:26a reason that you want to fan things
out a bit more, but doing so in a
1:05:30way that preserves your privacy
around your data, et cetera, and
1:05:34leverages your resources properly.
1:05:37Like, if I'm just looking around
myself, I have an iPad over here
1:05:41which sits entirely idle, et cetera.
1:05:44So,
1:05:45as with most things in regards
to application developers, if it's
1:05:50the right thing, it should be easy,
and doing compute in sort of a
1:05:55distributed way is by far not easy.
1:05:58So very excited to hear that
1:06:02you want to explore this more.
1:06:02Yeah.
1:06:02Well, and you know, especially with things
like AI, the question
1:06:06always is: I should never be cut off
from performing actions when
1:06:10possible. Sometimes
something lives at a particular
1:06:13place and I'm not connected to it.
1:06:15Fine, right?
1:06:16Email being, you know, the
canonical example here.
1:06:18Mail server lives in one place.
1:06:19Okay, fine.
1:06:21But why not with an LLM?
1:06:23Like, maybe I run a smaller,
simpler LLM locally.
1:06:27And then again, when I'm connected and
1:06:30I'm online, I just get better results.
1:06:32I get better answers.
1:06:34So I'm never totally cut off.
1:06:34I mean, there's plenty of research
on distributed machine learning
1:06:38and all of this stuff, but that's,
I would say, further in the future.
1:06:41Just kind of to put an
arc on all of this stuff,
1:06:43and anybody who's seen my talks before
has probably heard me give
1:06:46this short spiel once or twice:
1:06:48But, you know, in the nineties,
when we were developing the web, right,
1:06:52as opposed to the internet,
1:06:54the assumption was that you
had a computer under your desk.
1:06:57It was a beige box that you would turn
on and you would turn it off sometimes.
1:07:00Right?
1:07:00When was the last time you actually
turned off your laptop,
1:07:02or your phone for that matter?
1:07:04And when you wanted to connect to the
internet, you'd tie up your phone line.
1:07:08That's no good.
1:07:09So you would rent from somebody
else, something that was always
1:07:12online with a lot of power.
1:07:14And we now live in a different world, but
the centralized,
1:07:18or the cloud systems
rather, still all have this assumption of,
1:07:23well, we have more power and we're more
online and better connected than you.
1:07:28Okay.
1:07:29That's true, but how many things
1:07:32does that actually matter for?
1:07:32And with systems like Automerge and, you
know, local-first things developing, it's
1:07:36like, actually, you know what, my
machines are fast enough now that I can
1:07:41keep the entire log of the entire history.
1:07:43And it's fine because we can compress it
down to a couple hundred K and it's okay.
1:07:48And I'm fast enough to
play over the whole log.
1:07:50And we can do all of this eventually
consistent stuff and it doesn't
1:07:53completely, you know, hurt the
performance of my application.
1:07:56It's massively simplifying
the architecture.
1:07:59Things have gotten out of hand.
1:08:00So there is this dividing line; you know, the
1:08:06cloud isn't completely the enemy.
1:08:09They do have some advantages, right?
1:08:12But not everything
1:08:14needs to live there.
1:08:14And so we're moving into this
world of like, how much can we
1:08:16pull back down into our individual
devices and get control over them?
1:08:20Yeah, I love that.
1:08:21I think that very neatly
summarizes a huge aspect of why
1:08:26local-first speaks to so many of us.
1:08:29So I've learned a lot in this
conversation and I'm really
1:08:34excited to get my hands on Beehive
1:08:37as it becomes more publicly available,
hopefully already a lot closer to the
1:08:43time when this episode comes out.
1:08:45In the meanwhile, if someone got really
excited to get their hands dirty and
1:08:51dig into some of the knowledge
that you've shared here, I certainly
1:08:55recommend checking out your amazing talks.
1:08:59I still have a lot of them on my
watch list, and I think
1:09:02there are many shared interests that
we didn't go into in this episode.
1:09:06Like, you're also a lot into
functional programming, et cetera.
1:09:09And I think you're going
really deep on Rust as well, et cetera.
1:09:13So lots for me to learn.
1:09:15But if you can't wait to get your
hands on Beehive, I think it's also very
1:09:20practical to play around with UCAN.
1:09:23I think there are a bunch of
implementations for various language
1:09:26stacks, and that is something that you
can already build things with today.
1:09:32And I think it's not like
Beehive will fully replace
1:09:34UCAN or the other way around.
1:09:36I think there will be use cases where
you can use both, but this way you
1:09:39can already get into the right mental
model and be Beehive-ready
1:09:44when it gets available.
1:09:47So that's certainly, what I would
recommend folks to check out.
1:09:50Is there anything else you would like
the audience to do, look up or watch?
1:09:56Yeah, so definitely keep an eye
on the Ink & Switch webpage.
1:10:00We have lab notes; at the
time of this recording
1:10:03there's just the one note up there,
but I have a whole bunch of
1:10:06them, like many, in draft that I
just need to clean up and publish.
1:10:10We'll also be releasing an essay,
an Ink & Switch-style essay, on
1:10:14this whole project in the new year.
1:10:16And, yeah, keep an eye out
1:10:20for when this all gets released.
1:10:20There's a bunch of stuff coming in
Automerge in the new year; I
1:10:23can't remember if it's Automerge V2
or V3, but there's, you know, some
1:10:27branding with it of, like,
much faster, lower memory footprint,
1:10:31better sync, and security.
1:10:33And, like, all of these sort of, you
know, big headline features.
1:10:35So definitely keep an eye on all
1:10:38the stuff happening in Automerge.
1:10:38That's awesome.
1:10:39Brooke, thank you so much for
taking the time and sharing
1:10:42all of this knowledge with us.
1:10:44super appreciated.
1:10:45Thank you.
1:10:45Thank you so much for having me.
1:10:47Thank you for listening to
the Local First FM podcast.
1:10:49If you've enjoyed this episode and
haven't done so already, please
1:10:52subscribe and leave a review.
1:10:54Please also share this episode
with your friends and colleagues.
1:10:57Spreading the word about the
podcast is a great way to support
1:11:00it and to help me keep it going.
1:11:03A special thanks again to Convex and
ElectricSQL for supporting this podcast.
1:11:07See you next time.