localfirst.fm
March 25, 2025

#22 – Paul Butler: Jamsocket

Sponsored by ElectricSQL & Jazz

Transcript

0:00:00 Intro
0:00:00 There's database
0:00:01 sync engines, and then there's document sync engines.
0:00:04 So for database sync engines, I think of things like Linear, like things
0:00:07 where you have some relational-model data; you probably don't
0:00:11 want the client to have all of it.
0:00:12 You kind of have the client storing some subset of a database
0:00:16 for each account.
0:00:18 Maybe you're sharing this data across multiple people in that account.
0:00:22 On the document sync side, you're sending all of the data down to the client.
0:00:27 The unit of data that gets synced is in-memory-sized on the browser.
0:00:31 You're not dealing with like a terabyte of data here.
0:00:33 You're not taking a subset of it.
0:00:34 You're synchronizing the entire document.
0:00:36 This would be kind of things like Figma or Google Docs, where there's a full local
0:00:41 copy of some self-standing piece of data.
0:00:45 Welcome to the localfirst.fm podcast.
0:00:47 I'm your host, Johannes Schickling and I'm a web developer, a startup founder, and
0:00:51 love the craft of software engineering.
0:00:53 For the past few years, I've been on a journey to build a modern, high quality
0:00:57 music app using web technologies, and in doing so, I've been falling down the
0:01:01 rabbit hole of local-first software.
0:01:04 This podcast is your invitation to join me on that journey.
0:01:07 In this episode, I'm speaking to Paul Butler, founder of Jamsocket,
0:01:11 and creator of the Y-Sweet Project.
0:01:14 In this conversation, we talk about building versus buying a sync engine
0:01:17 and explore the various projects behind Jamsocket, including Plane, Y-Sweet,
0:01:22 and ForeverVM. Before getting started, also a big thank you to ElectricSQL
0:01:28 and Jazz for supporting this podcast.
0:01:30 And now my interview with Paul.
0:01:33 Hey, Paul, so nice to have you on the podcast.
0:01:35 How are you doing?
0:01:37 I'm good.
0:01:37 Thank you, Johannes.
0:01:38 I'm excited to be here and been listening since the beginning.
0:01:41 Thank you so much.
0:01:42 The two of us had the pleasure to meet in person at last year's
0:01:45 local-first conf, and I'm hoping to see you there again this year.
0:01:49 So for those in the audience
0:01:51 who don't know who you are, would you mind introducing yourself?
0:01:54 Sure.
0:01:55 I'm Paul Butler.
0:01:56 I'm a co-founder of a company called Jamsocket.
0:01:58 The kind of one-line pitch is it's like Lambda, but for WebSockets.
0:02:03 Yeah.
0:02:04 I've been looking a little bit into Jamsocket, and it
0:02:06 looks really fascinating.
0:02:08 And I also want to hear more about the origin story, where it's coming from,
0:02:13 since by now we've got more and more sort of like infrastructural options.
0:02:17 Obviously there's like Cloudflare with their primitives,
0:02:21 Cloudflare Workers, et cetera.
0:02:22 And I think with Jamsocket, you provide a really powerful alternative
0:02:27 for high-scale applications.
0:02:30 So yeah, before we go into more into depth, what Jamsocket is and what
0:02:35 it offers, would you mind sharing a bit more of the origin story,
0:02:38 how you ended up working on it?
0:02:41 Jamsocket Origin Story
0:02:41 Sure.
0:02:41 Yeah.
0:02:41 So my co-founder and I started the company about three years ago.
0:02:45 Prior to that, I was working in finance and I was building a lot of
0:02:49 internal tools for myself and my team that
0:02:52 were dealing with midsize amounts of data, so talking about like single-digit,
0:02:56 double-digit gigabytes of data, not anything that was out of the realm of putting
0:03:02 in RAM for a desktop application. But I realized that as soon as people wanted
0:03:06 these things to be delivered through the browser, there was really nowhere
0:03:09 to put that data, that I couldn't really load that over the internet into the
0:03:13 browser, Chrome would just give up. It didn't really make sense to load that
0:03:17 into kind of a Flask server or something like that, because the web stack is
0:03:21 kind of built for these servers to not consume a lot of memory for each user
0:03:26 of the application, things like that.
0:03:28 So I really wanted this sort of neutral location. I almost
0:03:33 think of it as a way for a browser-based
0:03:35 application to spin up a server-side sub-process that
0:03:39 just belongs to that browser tab.
0:03:40 And when you close that browser tab, that server-side process also goes away.
0:03:45 So that was essentially the origin story of Jamsocket.
0:03:48 That makes a lot of sense, but could you motivate a little bit more
0:03:52 what kind of application I should imagine there?
0:03:54 I've never worked in finance and I think the average web
0:03:57 developer has sort of like
0:04:00 a list of like 50 Airbnb items that they want to render.
0:04:05 And then there's like pagination, and all of that easily fits in a
0:04:09 JSON array that you can fetch in a single CRUD REST API call.
0:04:16 But when you say like single-digit, double-digit gigabytes of data, what
0:04:20 sort of data are we dealing with here?
0:04:23 And if you wanted to transfer it over the wire into the browser,
0:04:27 what would that even look like?
0:04:28 Would that be sort of like one big JSON blob, or not? Yeah, maybe
0:04:33 you can motivate that a bit more.
0:04:35 Yeah, so one of the motivating examples at the time was, we would run like a
0:04:40 simulation, a backtest simulation.
0:04:41 So we have some model that we hypothesize is predictive of stock returns,
0:04:47 run it back in time and generate a bunch of data, could be say on every
0:04:52 five-minute increment or even more fine-grained than that, over petabytes
0:04:55 and petabytes of past market data.
0:04:58 And then we get back some gigantic time series data, a number of time
0:05:03 series, maybe you have profit and loss over time and a record of all
0:05:07 the trades and everything like that.
0:05:08 So we have like massive, not massive, large, like gigabyte,
0:05:14 multi-gigabyte time series data, likely in something like Parquet.
0:05:18 That tends to be the best format for that type of thing.
0:05:21 And so over the wire, ideally it would be Parquet or Arrow.
0:05:25 Got it.
0:05:25 And over that sort of data you want to,
0:05:28 based on the user's input through the UI, drive some sort of queries, some sort
0:05:34 of accumulations, to make sense of what the data is trying to tell us.
0:05:40 Yeah.
0:05:41 It was things like, maybe I want to be able to drill down in the data, in the
0:05:44 client, be able to kind of go from this high-level overview of the data to
0:05:51 looking at specific trades, specific stocks, things like that.
0:05:55 Got it.
0:05:55 And sort of your insight and way to deal with that fundamental problem, where
0:06:01 ideally you could just move that big data blob over in front of
0:06:07 your browser that you're looking at and then happily query and compute
0:06:12 away, but that wasn't feasible because Chrome or another browser has
0:06:17 certain limits. And the way you wanted to cope with that
0:06:22 is to say, okay, we're going to have a little companion:
0:06:25 each browser session has a little companion on some beefy server, which
0:06:30 holds all of that data in memory.
0:06:33 And then there's some sort of
0:06:34 real-time wire protocol that helps you do that, still so fast that
0:06:41 it's sort of proxying to being local.
0:06:44 Yeah, exactly that.
0:06:45 I kind of think of it as somewhere along a spectrum where
0:06:48 you have thin-client
0:06:50 sort of setups where the server does everything and the client's really
0:06:53 just a dumb display, all the way to a full-fledged browser-based app
0:06:59 where everything's happening in the browser that can happen in the browser.
0:07:02 I think there's some middle ground where you get
0:07:05 next-frame latency on almost everything, but maybe in the background
0:07:11 it needs to query some server data and load that in, and maybe it
0:07:14 can even approximate client-side
0:07:16 what that next frame will look like, but it's able to sort of do
0:07:18 that in the background as well.
0:07:20 Got it.
0:07:20 So as you faced those problems, that led you to get so interested in the
0:07:26 problem that you started to dedicate your next chapter in life to it.
0:07:31 And that led to you building a technology called Plane.
0:07:35 And that was then also the foundation for Jamsocket.
0:07:38 So can you explain a bit more what Plane does and then how it became the foundation for Jamsocket?
0:07:45 Plane
0:07:45 Yeah, so, and just to kind of continue on the story.
0:07:48 So my co-founder Taylor was at Datadog and had kind of faced
0:07:51 some similar problems.
0:07:52 So we got together in 2022.
0:07:55 and the first thing we started working on was, yeah, what became
0:07:58 Plane, which is open source.
0:08:01 And I think of it as kind of the way that we spin up those processes.
0:08:06 It's kind of the orchestration plane essentially for that type of application.
0:08:11 So what it's responsible for is, you kind of give it a pool of computers,
0:08:16 you tell it you want to start a specific process, and it will find where
0:08:20 on that pool of computers to start it.
0:08:28 So it can give it a hostname and kind of a password, essentially;
0:08:32 then anything on the web, anything on the public internet
0:08:35 that can access the web, can use that URL to send and receive
0:08:39 messages from that process.
0:08:42 And as long as there's at least one open connection to that
0:08:45 process, Plane will keep it alive.
0:08:47 And then as soon as there's no more connections, Plane
0:08:50 will start a countdown timer.
0:08:51 And if nothing reconnects, it'll shut that process off.
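Plane's lifecycle rule as Paul describes it (stay alive while at least one connection is open, start a countdown when the last one closes, cancel it on reconnect) can be sketched as a small piece of bookkeeping. This is an illustrative sketch, not Plane's actual implementation; the class name, grace period, and shutdown hook are all invented for the example.

```typescript
// Illustrative sketch of Plane-style lifecycle bookkeeping (not Plane's real code).
// A backend stays alive while connections are open; once the last connection
// closes, a countdown starts, and the backend shuts down unless something
// reconnects before the grace period elapses.
class BackendLifecycle {
  private connections = 0;
  private shutdownTimer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private gracePeriodMs: number,
    private onShutdown: () => void, // hypothetical hook: terminate the process
  ) {}

  connect(): void {
    this.connections++;
    // A reconnect cancels any pending shutdown countdown.
    if (this.shutdownTimer !== null) {
      clearTimeout(this.shutdownTimer);
      this.shutdownTimer = null;
    }
  }

  disconnect(): void {
    this.connections--;
    if (this.connections === 0) {
      // Last connection gone: start the countdown.
      this.shutdownTimer = setTimeout(this.onShutdown, this.gracePeriodMs);
    }
  }

  get isPendingShutdown(): boolean {
    return this.shutdownTimer !== null;
  }
}
```

The interesting property is that shutdown is never immediate: a brief disconnect (a page reload, a flaky network) lands inside the grace period and the process is reused rather than cold-started.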
0:08:54 Got it.
0:08:55 So in terms of use cases, you've clearly motivated that original use case that you
0:09:01 had while working at a financial company.
0:09:04 Are those also the kind of use cases that, you know, you mostly face when talking
0:09:09 to people that are interested in Jamsocket, or is there a wider set of applications
0:09:14 that Jamsocket is trying to serve?
0:09:17 Yeah, largely not, actually. We haven't seen that many
0:09:20 use cases of kind of wanting to just modify massive data sets or deal with
0:09:25 massive data sets in the browser.
0:09:27 But one of the things we quickly realized was that the infrastructure we were
0:09:31 building had a lot of parallels to how Figma did things, how Google Docs
0:09:35 did things, how a lot of these kind of collaborative applications did things.
0:09:39 And so we decided to kind of lean into the sync engine hosting side of things.
0:09:45 Got it.
0:09:45 And when I'm looking at your website, among a few other
0:09:49 companies, it looks like Rayon is built also on top of Jamsocket.
0:09:54 So I happened to have seen their launch, I think, a while back.
0:09:57 If I recall correctly, it was sort of like a really interesting Figma-esque
0:10:03 application, I think for architects.
0:10:06 And, yeah, maybe you can share a little bit more about their specific scenario.
0:10:15 Rayon
0:10:15 Yeah, Rayon's one of my favorite use cases.
0:10:17 cause we've really grown with them.
0:10:19 They've been using us since we started the company, essentially.
0:10:22 They were one of the first users on the platform and we've seen them kind of grow
0:10:26 as they launched and everything.
0:10:27 Essentially the way that they're using us is that we are the data backend for
0:10:32 these documents while they're open.
0:10:34 So I open a document, you open a document, they'll start a server
0:10:39 on Jamsocket for that document.
0:10:41 And as I make edits,
0:10:43 they get pushed up to that server,
0:10:45 they get sent back down to you.
0:10:46 And that backend is also what's storing the data on S3.
0:10:51 So even if it's single-player mode, that Jamsocket server still sits between their
0:10:56 end user and the source of truth on the data source, the durable data source.
0:11:02 Okay.
0:11:02 So you're mentioning S3 and a durable data source.
0:11:06 Maybe we can take a step back and motivate: if someone wants to
0:11:11 build their own little version of something like Jamsocket or Plane,
0:11:16 how would that look?
0:11:17 So there seems to be something pretty beefy in the middle that
0:11:22 holds the necessary data in memory.
0:11:25 And as the name "in memory" suggests, that's pretty volatile.
0:11:30 So if someone stumbles over a power cord, that data might be gone.
0:11:34 And that's also why it needs to go somewhere more durable, something like S3.
0:11:39 So can you walk us through the rough architecture?
0:11:43 And what were sort of the insights and deliberate trade-offs that went into it?
0:11:49 Yeah, I mean, a common pattern that I see people use with Jamsocket is that the
0:11:53 source of truth for application data will kind of shift as the application is used.
0:11:58 So at rest, the source of truth of the application data is in durable storage
0:12:04 somewhere, usually S3, something like that, where you might
0:12:09 not want to write to that like 60 times a second, but you want it to
0:12:13 persist. When that document is open,
0:12:16 then the source of truth of that document is effectively in memory on Jamsocket.
0:12:21 And the nice thing about that is, you know, it's memory.
0:12:23 You can write to it very frequently.
0:12:25 You can write to it 100 times a second if you want to, more than that.
0:12:29 And that can then be synced down to all of the connected clients. And then,
0:12:33 in some sort of loop, or you could have some sort of write-ahead log,
0:12:37 as changes are made to that document, you are then durably persisting them.
0:12:41 Some people really care about that being really low-latency.
0:12:44 I think in general, unless it's really a bad thing for users to lose
0:12:47 like five seconds of data, just batching everything up into writing
0:12:53 just the edits every five seconds,
0:12:54 something like that, is pretty reasonable.
0:12:56 Or, you know, what a lot of people do is they just say 60 seconds is fine:
0:13:00 I'm just going to write the entire document over what existed there before
0:13:04 every 60 seconds, because an outage, you know, a server just failing out of the
0:13:10 blue, is actually pretty rare these days.
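The batching strategy Paul outlines, with the in-memory copy as the live source of truth and periodic flushes to durable storage, can be sketched like this. The snapshot getter and persist callback (standing in for an S3 PUT) are assumptions for illustration, not a Jamsocket API.

```typescript
// Sketch of interval-batched persistence: edits only mark the in-memory state
// dirty; a periodic flush writes the latest snapshot to durable storage
// (e.g. S3), so a burst of 100 edits per second still costs one write per tick.
class BatchedPersistence<T> {
  private dirty = false;

  constructor(
    private getSnapshot: () => T,                    // current in-memory state
    private persist: (snapshot: T) => Promise<void>, // e.g. a PUT to object storage
  ) {}

  // Call on every edit; cheap, just marks state dirty.
  markDirty(): void {
    this.dirty = true;
  }

  // Call from a timer (e.g. every 5 or 60 seconds); returns whether a write happened.
  async flush(): Promise<boolean> {
    if (!this.dirty) return false;
    this.dirty = false;
    await this.persist(this.getSnapshot());
    return true;
  }
}
```

Wiring it to a timer, e.g. `setInterval(() => store.flush(), 5_000)`, gives exactly the trade-off discussed above: at most a few seconds of unflushed edits are at risk, in exchange for far fewer writes.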
0:13:12 Got it.
0:13:13 So if we compare it to a technology like Cloudflare Durable Objects, with
0:13:19 Cloudflare Workers, that's a particularly distinct programming model where it
0:13:25 kind of gives you the best of both worlds in that regard, that you
0:13:28 only pay for the CPU cycles where you actually want the CPU to do things.
0:13:34 And otherwise it can hibernate while still keeping a WebSocket connection alive, for
0:13:39 example, or keep some memory alive or rehydrate it from some persistent storage.
0:13:46 Is that sort of a useful parallel way to think about
0:13:50 the programming model? And also,
0:13:53 can I implement any sort of free-form WebSocket messages or request handlers, or is
0:13:59 there a more pre-specified API, something like Redis, for how I interact with data
0:14:07 from a client to the server and vice versa?
0:14:09 Yeah, good question.
0:14:10 I agree that, like, I think Durable Objects is probably the closest kind of
0:14:14 parallel product out there right now.
0:14:17 When we started this up, Durable Objects wasn't really a big thing; it may have
0:14:22 existed, but had a lot of limitations.
0:14:24 Like, I think we came at things from a very different angle, but kind of
0:14:27 landed in a similar architectural space.
0:14:30 In terms of the servers, though, we just really host anything that's HTTP.
0:14:35 So when I talk about it as being for WebSocket servers, I think that
0:14:38 we kind of came at it at an angle of, we want this to be the right
0:14:42 model for hosting WebSocket servers.
0:14:44 And we do, you know, we sit on the connection.
0:14:48 So we
0:14:49 work well with WebSockets, where there's a long-lived connection,
0:14:52 because then we know not to terminate the server. With HTTP requests,
0:14:56 we have to rely a little bit more on heuristics;
0:14:58 with WebSockets, we've got that connection open.
0:15:00 Really, just anything: could be Socket.IO, could be your own WebSocket protocol.
0:15:05 We essentially just take a container from our customers that
0:15:07 will serve HTTP on port 8080,
0:15:10 and we expose that to the outside web through a proxy that we wrote.
0:15:14 Got it.
0:15:14 So in the specific case of Rayon, did they build their own sync engine from scratch?
0:15:20 Did they leverage any specific off-the-shelf technology, something like Yjs?
0:15:26 Given that Jamsocket advertises as the platform where you build your
0:15:31 own sync engine on top of, maybe you can walk us through, by this example,
0:15:36 how I should think about that.
0:15:37 Yeah.
0:15:38 They're one of a number of customers who have kind of built their own
0:15:41 sync engine on top of Jamsocket.
0:15:44 There's not like an SDK that you need to adopt or anything like
0:15:46 that on the server side.
0:15:48 You're just writing a web server.
0:15:51 But one of the things that's specific about this model is that
0:15:54 you are guaranteed by the infrastructure that at most one server is
0:16:00 running per document, or however you want to fragment your kind of space of things;
0:16:06 in their case, it's per document.
0:16:08 And so, yeah, you get that guarantee from the system, and then it becomes much
0:16:13 easier to implement your own sync engine.
0:16:14 But we, at least at the Jamsocket level, are not opinionated about how
0:16:19 you actually go about implementing that.
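The at-most-one-server-per-document guarantee is what lets a hand-rolled sync engine stay simple: the single process can apply edits in arrival order and fan them out, with no distributed merge logic. A minimal sketch of that idea, with invented types (this is not a Jamsocket or Rayon API):

```typescript
// With at most one server per document, there is a single total order of
// edits, so "last write to arrive wins" is well-defined. The server applies
// each edit to its in-memory state and fans it out to subscribed clients.
type Edit = { key: string; value: string };

class DocumentServer {
  private state = new Map<string, string>();
  private clients = new Set<(edit: Edit) => void>();

  // Register a connected client; returns an unsubscribe function.
  subscribe(onEdit: (edit: Edit) => void): () => void {
    this.clients.add(onEdit);
    return () => {
      this.clients.delete(onEdit);
    };
  }

  // Apply an incoming edit and broadcast it to every connected client.
  apply(edit: Edit): void {
    this.state.set(edit.key, edit.value);
    for (const client of this.clients) client(edit);
  }

  get(key: string): string | undefined {
    return this.state.get(key);
  }
}
```

Without that single-authority guarantee, two servers could apply the same edits in different orders and diverge, which is exactly the problem CRDTs exist to solve.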
0:16:21 But then you mentioned Y-Sweet.
0:16:22 Yeah, we...
0:16:23 So Rayon does not use Y-Sweet, but some of our customers use Y-Sweet, which is a
0:16:27 Yjs backend that we wrote, that we provide.
0:16:29 That's a much more opinionated path, if they want to take that.
0:16:32 Got it.
0:16:33 Yeah.
0:16:33 I want to learn a lot more about Y-Sweet in a moment as well.
0:16:37 But given that you've already mentioned those two paths: Y-Sweet, which is an
0:16:42 off-the-shelf technology that you're building, based on top of Yjs,
0:16:47 which is a very well-known CRDT implementation, probably the most common
0:16:53 and longest-standing technology that's out there,
0:16:55 so that being an example of an off-the-shelf technology;
0:16:59 and Rayon, which has built their own sync engine.
0:17:03 You've probably seen many, many decisions being made where people
0:17:07 choose to use an off-the-shelf technology or choose to build their own.
0:17:11 What sort of advice would you give to people who are thinking
0:17:14 whether they should buy, or, as an alternative to buying,
0:17:19 adopt an off-the-shelf technology?
0:17:22 Building vs Adopting a technology
0:17:22 Yeah, I think that where
0:17:25 it kind of comes down to for the kind of build versus off-the-shelf is whether
0:17:29 you want to have business logic live in the sync engine on the server side.
0:17:34 So where I think you generally don't need that is, just
0:17:38 think text documents, things like that, where CRDTs are probably the
0:17:43 best way to do it right now, at least the most off-the-shelf way to do it.
0:17:48 You can do your own way, but it's sort of a research problem.
0:17:51 Where, on the other hand, I think if you have a very simple data
0:17:54 model, but you want to do atomic transactions, you want to have kind
0:17:57 of an event-sourcing type approach,
0:17:59 you want to be able to do things like trees with reparenting,
0:18:04 and some of that ends up being that you're working against the CRDT.
0:18:08 And in those cases, I think it makes more sense to implement
0:18:12 your own business logic.
0:18:13 The other thing that we see is, maybe you want some change to trigger
0:18:16 some action server-side; you want actions to have some side effect.
0:18:19 Maybe some piece of data changes and you want
0:18:21 to insert that into a queue.
0:18:23 So it becomes really nice to have some server-side code that
0:18:26 reacts to changes to the document.
0:18:28 That's another place where we find
0:18:30 building your own tends to be really nice, because you can just
0:18:34 have that be one server that's responsible both for the sync and
0:18:38 for triggering some side effect.
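The pattern Paul describes here, where the sync server itself reacts to specific changes with a side effect such as enqueueing a job, might look roughly like this. The event shape, the hook mechanism, and the queue are all hypothetical stand-ins for illustration.

```typescript
// Sketch of side-effect hooks inside a sync server: because this server sees
// every change, it can watch for particular ones and trigger an action, in the
// same place where sync messages are handled.
type ChangeEvent = { type: string; payload: Record<string, string> };

class SyncServerWithHooks {
  private hooks: Array<(e: ChangeEvent) => void> = [];

  onChange(hook: (e: ChangeEvent) => void): void {
    this.hooks.push(hook);
  }

  // Called for every incoming change.
  handleChange(event: ChangeEvent): void {
    // ...apply the change to the in-memory document here...
    for (const hook of this.hooks) hook(event);
  }
}

// Example: enqueue a welcome email when a user-signup change comes through.
const emailQueue: string[] = [];
const hookedServer = new SyncServerWithHooks();
hookedServer.onChange((e) => {
  if (e.type === "user-signup") emailQueue.push(e.payload.email);
});
```

Because the hook runs in the same process that owns the document, there is no second system to keep in sync, which is the architectural-simplicity argument made above.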
0:18:40 Right, so maybe to linger a little bit on that specific point. I think
0:18:45 with local-first software you have, in this scenario where you build your
0:18:50 own sync engine, kind of two approaches for how to deal with that,
0:18:54 and also for the off-the-shelf approach, if you use something like Yjs.
0:18:58 So if you build your own, you can basically just, wherever you handle the
0:19:02 messages, inspect the messages and see, okay, this
0:19:06 seems to be like a user-signup event,
0:19:09 and so here, let's send out that confirmation email or something like that.
0:19:15 But another approach could also be that you basically have a server-side
0:19:19 client instance that listens to the same sync messages, and then,
0:19:25 based on the state that you have on that server-side client, you
0:19:30 could basically react to that.
0:19:33 Do you have thoughts on one approach versus the other?
0:19:37 Maybe one is a lot more expensive to run, or more complex to model.
0:19:44 What thoughts do you have on the different approaches here?
0:19:47 I think where I've tended to see this break down, because we've seen
0:19:50 customers do it both ways, is that if it's
0:19:56 purely just sort of reacting with a side effect,
0:19:58 and it's something where your model of it is that it's
0:20:02 like a server-triggered type of thing,
0:20:05 like that send-email example, send
0:20:07 some sort of notification,
0:20:09 I think that makes more sense to just do in the server, just in
0:20:13 terms of architectural complexity.
0:20:15 You could certainly listen for the events,
0:20:16 and if there's architectural reasons that that makes sense for you, I don't see
0:20:20 any problems with it. But where I think the server being a client can make a
0:20:26 lot of sense is like AI-integration type things, where, you know, it's code
0:20:32 running on the server, but your application
0:20:36 should just treat it like another client.
0:20:38 This is something like maybe an agent's going out and modifying
0:20:40 a document based on some prompt.
0:20:42 Then I think it does make sense, if you want to run it through the same
0:20:46 kind of code paths that a user edit would go through, to
0:20:51 kind of treat that as a distinct client of the data.
0:20:55 Got it.
0:20:55 So to dig a little bit more towards that server-as-a-client: when I'm
0:21:03 thinking about like a browser client, or like using my phone,
0:21:08 there's like a concrete point in time where I'm starting a session.
0:21:14 I'm opening a tab.
0:21:15 I'm opening an app.
0:21:16 I'm doing things afterwards.
0:21:18 Like, I'm closing it.
0:21:19 So there's like a concrete start and stop.
0:21:22 Maybe there's like some background stuff, but let's pretend there's just
0:21:26 like a clear start, stop, 30 seconds,
0:21:29 and that's it.
0:21:30 How should I think about that in a server context?
0:21:34 Let's say I'm trying to offer that to a thousand customers.
0:21:38 Would I have a thousand separate... like, but let's go crazy.
0:21:42 Let's say we have a thousand VMs, one per customer.
0:21:47 That strikes me as very expensive.
0:21:49 So what is a useful programming model, a useful deployment model,
0:21:54 to deploy those sorts of server-side clients?
0:21:57 So the way that Jamsocket does this is that we run a process
0:22:01 for every session, essentially.
0:22:04 So when you and I are connected to a document, we're running a
0:22:07 process, not a full-fledged VM, but it's using some syscall interception
0:22:11 through something called gVisor.
0:22:12 So it's a little bit more secure than sort of just
0:22:14 Yeah.
0:22:15 containerized workloads.
0:22:16 So the nice thing about that is that processes are pretty good at giving
0:22:20 resources back to the system when they're not actively in use.
0:22:24 So what we've seen is that when you want the server to handle a first-class
0:22:29 kind of interaction, where you definitely want it to
0:22:32 be processed by the service,
0:22:34 in those cases, it makes sense to run directly in the
0:22:36 sync engine. When it comes to
0:22:38 multiple clients, we tend to see those run off of Jamsocket.
0:22:42 So these are running on an end-user server talking to Jamsocket, and the
0:22:47 pattern that I've seen there work is that the client will maybe trigger something
0:22:54 directly through like a web endpoint on that remote server that's not running
0:22:58 on Jamsocket; that server will then talk to Jamsocket to, say, fetch some data or
0:23:03 connect and sort of trigger something.
0:23:05 So it might synchronize data, but it's not a long-lived client.
0:23:08 It's kind of a client that's spun up based on a specific action
0:23:12 that's usually triggered by the client.
0:23:14 Got it.
0:23:14 That makes a lot of sense.
0:23:15 So instead of it being super long-running, and that times
0:23:19 N for each possible instance, you make it more event-based.
0:23:25 So let's say there is a new sync message that you want to react
0:23:29 to, or there's some other event,
0:23:31 maybe like a webhook that's coming in from Stripe, and then you do your
0:23:37 thing as a response to the event, and then you yield again, back to the runtime.
0:23:44 And I think a model that also comes to mind that could fit
0:23:47 really well together here
0:23:49 is durable, long-running workflows, something like Temporal.
0:23:54 And there's also other options as well, I think, that could work really well
0:23:57 together here, where you have a workflow
0:24:00 that's essentially a participant in a sync system, where it's
0:24:05 just a long-running workflow.
0:24:07 It's just like another client that happens to live on a server and not in a browser.
0:24:12 Yeah, I'm really excited to see more folks explore this, since I think it will
0:24:17 open the door for a whole bunch of different application topologies, really.
0:24:22 One of the things that we found with Y-Sweet is that we had people ask
0:24:26 for, like, I want a Python client to this.
0:24:28 And it was for exactly that reason.
0:24:30 Like, they want to run some server-side code that interacts with a document.
0:24:34 Same on the Node side:
0:24:37 we support kind of the built-in WebSocket client in the browser, but
0:24:40 we also support a shimmed-in WebSocket client so that you can run it in Node.
0:24:45 Very cool.
0:24:45 Yeah, I'm really looking forward to that, whether it's Python or, well,
0:24:50 I'm a native person in JavaScript, and JavaScript has this amazing aspect to
0:24:55 it that it supposedly runs everywhere, and we're getting more and more there with
0:25:00 ESM now being really the default.
0:25:03 And I'm really excited about bringing the same business logic, the same code,
0:25:09 to all sorts of different platforms.
0:25:11 And I think sync engines are
0:25:14 like a huge lever that gets us closer towards that, since otherwise
0:25:19 we can have the code there, but if we don't have the data there, that
0:25:24 is only good for so many use cases.
0:25:27 So maybe,
0:25:28 transitioning towards Y-Sweet, which you've already mentioned:
0:25:32 before we get into what Y-Sweet is, can you share more about how it came to be?
0:25:42 Y-Sweet
0:25:42 Yeah.
0:25:42 So we'd already been working on Jamsocket for a while by the time we started Y-Sweet,
0:25:46 and we sort of started to see... for one thing, you know, we thought from the
0:25:50 get-go that, well, people are going to want to write their own sync engines.
0:25:53 One of the things we saw was that a lot of people were sort of using Yjs
0:25:57 and other CRDTs and running those on Jamsocket, and even
0:26:03 though they don't need the authoritative kind of model of Jamsocket, they were
0:26:08 still finding advantages to having that.
0:26:10 So we started thinking, like, what would a Yjs server kind of built
0:26:14 to run on Jamsocket look like?
0:26:16 And one of the things that's
0:26:17 nice, if we're, you know, running a lot of processes, is if
0:26:20 it's really memory-lightweight.
0:26:22 So we wrote Y-Sweet in Rust, and it's pretty memory-efficient.
0:26:26 Another thing that we became really opinionated about is that you shouldn't
0:26:30 really store document data in a database.
0:26:32 I think it's just a bad fit.
0:26:33 I think with something like, you know, if you're building something
0:26:37 like Figma. Like, Figma uses S3
0:26:40 as where they store the document; they store the document metadata in Postgres.
0:26:44 And we started to see a lot of use cases of patterns like that, because if
0:26:49 you're writing each document that's open many times a minute, if
0:26:54 you're using a Postgres database, that Postgres database is the bottleneck.
0:26:57 Every edit is coming through that.
0:26:59 Whereas S3 is a more distributed kind of file system, where if you
0:27:03 have a server that is the authority on what's in a document at a
0:27:07 given point in time, it can just write to S3, and you can horizontally
0:27:11 scale that out as much as you want.
0:27:13 So we kind of became opinionated about: okay, it should be Rust,
0:27:16 it should be lightweight,
0:27:17 it should write to S3,
0:27:18 and it should be
0:27:20 as simple as possible to just use. Like, I really like software like Caddy,
0:27:25 which is a web server written in Go, if people aren't familiar with it,
0:27:28 that has really sane defaults.
0:27:31 It's somewhat opinionated about just doing things right.
0:27:34 You don't have to...
0:27:35 Fantastic.
0:27:36 It even gives you, like, SSL certificates that work locally, works with Tailscale.
0:27:41 Fantastic.
0:27:42 Definitely check it out
0:27:43 if you're not using it yet.
0:27:45 Yeah, so Caddy just simplifies so much and just does things right.
0:27:50 And so we wanted to build a piece of software that felt like
0:27:52 that to use; we wanted something that you could use in a
0:27:56 CI/CD process, and it would be the same API as if you were using it at scale,
0:28:01 horizontally scaled out on a cluster.
0:28:03 Because the other thing, I mean, the things that we were thinking
0:28:06 about at the time were like, what would an open-source document sync engine
0:28:11 look like if we were to write it from scratch? And we kind of kept landing on,
0:28:17 it would look something like, you know, pretty close to Yjs, even if we didn't
0:28:22 have the distributed constraints of Yjs.
0:28:25 So we're like, well, Yjs exists.
0:28:28 It has a great community,
0:28:30 great people involved with it.
0:28:32 This looks like what we would want to build.
0:28:34 So let's just build a sync engine around this.
0:28:37 Got it.
0:28:37 In terms of the behavior, or what makes it a little bit more like Caddy,
0:28:43 opinionated, but with very well motivated opinions baked
0:28:48 into it: if you compare it to the default Yjs
0:28:51 server, is there anything that stands out where you lean a little
0:28:56 bit more heavily into some opinions?
0:28:59 Yeah, I mean, I think the default Yjs server is built to be very modular
0:29:03 and suit a bunch of use cases. The Yjs community in general embraces this
0:29:08 idea of providers.
0:29:10 So Yjs itself is just a data structure, and then providers are what synchronize
0:29:15 it to another client, or synchronize it to a database, things like that.
0:29:19 The kind of official way to do things in the
0:29:22 Yjs world is to compose a bunch of providers together.
0:29:26 So you might have an IndexedDB provider on the client, synchronizing to IndexedDB.
0:29:30 You might have a WebSocket provider, synchronizing to other clients.
0:29:34 And then you might have a database provider on the server.
0:29:37 We wanted to just have a single stack that was kind of our opinionated stack.
0:29:41 So we have an IndexedDB implementation on the client.
0:29:45 We have our S3 storage; we've decided the only storage
0:29:49 we'll support is S3-compatible storage, because
0:29:52 ultimately our opinion was object storage is the right way to do storage for this.
0:29:58 And then we have our wire protocol as well,
0:30:00 over WebSocket.
0:30:01 Got it.
0:30:02 That makes sense.
0:30:02 Yeah.
0:30:03 And I haven't managed yet to
0:30:05 have Kevin Jahns here on the podcast, but he happens to also live in Berlin,
0:30:10 and I've just seen him at the last local-first meetup that we've done here.
0:30:14 So I think it's, well, about time that we hear from Kevin about Yjs. There's
0:30:20 been such a rich ecosystem of different things around it, so I
0:30:26 think we gotta make that happen as well.
0:30:28 Yeah, you should.
0:30:29 So I'm actually, I've been procrastinating editing a podcast that I did with Kevin.
0:30:33 so we'll have that soon.
0:30:35 There you go.
0:30:36 we should put it in the show notes.
0:30:38 So, around Yjs you've built a flavor of sync
0:30:45 server that can be hosted on Jamsocket.
0:30:48 So, is my understanding correct that if I want to use Yjs with Y-Sweet, I can just
0:30:55 deploy that off the shelf on Jamsocket?
0:30:59 yeah, so you could deploy that.
0:31:01 We have an off-the-shelf offering that deploys it on Jamsocket.
0:31:04 you can run it on your own servers as well.
0:31:07 And one of the things we decided was that, regardless of how it's
0:31:10 hosted, it should be the same API.
0:31:12 So we have what I call the document management API, where
0:31:15 you create a document, give somebody an access token to that document.
0:31:19 That is just universal, no matter how it's deployed.
0:31:22 Got it, so I think Yjs is one of the most mature options right now for
0:31:28 people who want to build local-first apps, for people who are just, who've
0:31:32 heard it a bunch of times, but maybe haven't yet come around to fully
0:31:37 implementing their app using it,
0:31:39 what are questions that people should ask themselves?
0:31:42 Whether Yjs is a useful foundation for the app and in which scenarios
0:31:47 would you say, actually, you probably want to build your own sync engine.
0:31:51 When to choose Yjs
0:31:51 Yeah, so I think one of the first dimensions to think about here is,
0:31:54 I see it as, there's two worlds.
0:31:56 There's database
0:31:58 sync engines and then there's document sync engines.
0:32:01 So for database sync engine, I think of things like Linear, like things
0:32:04 where you have some relational model data, you probably don't
0:32:08 want the client to have all of it.
0:32:10 You kind of have the client storing some subset of a database
0:32:14 for each account.
0:32:16 Maybe you're sharing this data across multiple people in that account.
0:32:20 That's the database sync world, where there's ElectricSQL and Zero and PowerSync
0:32:25 and kind of a number of players there,
0:32:28 InstantDB and Triplit and a number of others.
0:32:31 On the document sync side, you're sending
0:32:36 all of the data down to the client.
0:32:38 So the unit of data that gets synced
0:32:41 is in-memory size on the browser.
0:32:43 You're not dealing with like a terabyte of data here.
0:32:45 you're not taking a subset of it.
0:32:46 You're synchronizing the entire document.
0:32:48 This would be kind of things like Figma or Google Docs, where
0:32:52 there's a full local copy of
0:32:54 some self-standing piece of data.
0:32:57 And generically in Yjs, that's essentially
0:33:00 JSON-shaped data.
0:33:03 So things like nested maps, nested lists,
0:33:07 text, and then JSON primitives.
0:33:10 Is it fair to say, so you're mentioning Figma and Google Docs, and if I think
0:33:15 about Figma and Google Docs, there is like a distinct boundary of a document.
0:33:20 So I have a Google Docs document open.
0:33:23 I have a Figma document open.
0:33:25 Wherever a product experience, for a given part of
0:33:32 the experience, is all centered around a document, tldraw comes to mind,
0:33:38 is that a great fit for embracing the document model? And anything that
0:33:44 is richer, like a relational database where you can
0:33:49 just freely join between things,
0:33:51 that's where you would choose the other approach? Is that a useful rule of thumb?
0:33:56 Yeah.
0:33:57 I think the words that you used, distinct boundary, really nail it.
0:34:00 If there's kind of like a document that is self contained,
0:34:04 it's distinct.
0:34:05 You mentioned tldraw, and actually, I think this gets
0:34:08 to another point, which is that you can use both in the same application.
0:34:13 So tldraw uses Zero and their own document sync engine.
0:34:17 Figma has built their own sync engine for both.
0:34:21 and they're distinct sync engines.
0:34:23 They can be used in tandem as well,
0:34:25 right I mean, that gets us to a really interesting, topic more generally, which
0:34:29 is combining multiple sync engines.
0:34:32 And I think for people who've been dabbling in local-first, that might be
0:34:37 more intuitive, but I think for, people who are just very new to, the local-first
0:34:42 space, it's hard enough to wrap your head around, choosing the right sync engine.
0:34:47 Now you're telling us, wait, you should choose multiple.
0:34:51 Can you motivate a little bit more of like, how to think about that?
0:34:55 Choosing multiple Sync Engines
0:34:55 So I think of it as the app layer and the document layer.
0:34:57 If you have a document-based application, and you have a
0:35:00 file viewer, for example, I think of that as app layer; you're not in
0:35:03 a specific document at that moment,
0:35:05 like in Figma where I'm on the home screen and I see my various projects.
0:35:09 Yeah, exactly.
0:35:11 and I think there's nothing that forces that part to be real time synced.
0:35:16 In a lot of cases, I think a traditional Postgres database
0:35:20 goes a long way for that.
0:35:21 But then once you're in the document, that's where I think you
0:35:25 do need a sync engine, because it's the type of thing that if you have
0:35:30 two Google Docs open in two different tabs, you expect them to be in sync,
0:35:33 even if you're just a single user.
0:35:35 I think that actually motivates like 98 percent of the value of
0:35:38 local-first is just somebody who has the same document open in two
0:35:41 tabs and they've got 100 tabs open.
0:35:43 I think that that's less of a given expectation these days for like a
0:35:48 project view or something like that.
0:35:50 I think it's a nice surprise when that is in sync.
0:35:52 And I think it is becoming the status quo, but overall it's
0:35:57 less of an expectation; oh, you might have to refresh your Figma
0:36:00 project to see the new assets that come up, that kind of thing.
0:36:06 So yeah, there's been a bit of Twitter debate
0:36:10 about this lately, about whether the same sync engine can handle both.
0:36:14 I think that there are things that you are going to need transactions for, and if you
0:36:18 need transactions, you're going to need a database that is effectively
0:36:21 a single bottleneck on updates.
0:36:23 At the same time, if you have lots of documents, you don't want those documents
0:36:27 to be bottlenecked in a single point.
0:36:29 So I think unless there's a solution that offers both distributed and centralized
0:36:33 with transactions, you kind of need both.
0:36:36 Got it.
0:36:37 So, thinking more about leaning into the document aspect of
0:36:42 it, or even, when you say that something is a bottleneck, let's say we
0:36:47 also embrace the database aspect of it.
0:36:50 Maybe you have different
0:36:52 workspaces, and I think there's still this aspect of drawing
0:36:58 boundaries around some body of data, where you say, hey, within
0:37:03 that boundary, I care about certain constraints, maybe that there shouldn't
0:37:08 be more than 10 documents ever.
0:37:11 Or maybe you want to enforce some constraints around
0:37:16 users, access control, et cetera.
0:37:19 Can you share any sort of learnings or advice about how
0:37:23 to approach this entire topic?
0:37:25 Like, how do you decide what a useful boundary is for how data should be
0:37:31 modeled and fragmented or partitioned?
0:37:39 Boundaries
0:37:40 So I think in general, if it's not obvious what a document
0:37:43 should be in an application, then the document model
0:37:47 is probably not the right fit.
0:37:49 I think things like Figma, where you're in one document at a time.
0:37:53 You might have a different document in another tab, but
0:37:55 you don't have two documents concurrently open in the same tab.
0:37:59 it's taking up the whole screen.
0:38:01 Like, I think that there's certain heuristics like that, that just
0:38:05 tell you, like, this is definitely a document model application.
0:38:09 Same with Google Drive or Google Docs.
0:38:11 you kind of have one thing
0:38:12 open at once.
0:38:13 where would you put Linear?
0:38:15 Since you could, for example, put each Linear issue into its own document.
0:38:21 Why might that be a reasonable approach?
0:38:23 Where might it fit,
0:38:24 and where might it not?
0:38:25 I think I could see
0:38:28 that being reasonable if you really care about the tickets themselves,
0:38:33 you know, multiple people editing a ticket at one time and seeing the text,
0:38:37 if you really wanted to make that kind of a first-class experience.
0:38:41 But in general, I think that, Linear just screams kind of database approach to me.
0:38:47 Although I do know, I believe they're using Yjs for some of the issue
0:38:51 text now. I could be wrong, but I think they do use it, or a CRDT.
0:38:55 It might be a different CRDT, but I think they're using some sort
0:38:58 of collaborative text editing.
0:39:00 so given that you've seen quite a couple of different customers and products
0:39:05 build their own sync engines, any sort of interesting, almost second order effects
0:39:11 that you've seen there, unexpected things, new challenges that you didn't see
0:39:18 in previous applications, things like
0:39:21 database migrations, or other sorts of challenges?
0:39:27 Challenges in building Sync Engines
0:39:27 Yeah, I think whenever you're dealing with data on S3, data migrations do
0:39:32 become really interesting because you're not just sort of writing a
0:39:36 database query and issuing an update.
0:39:39 It's usually some form of gradual, lazy migration.
0:39:43 So it's kind of like the application that's reading the data has to know
0:39:46 how to transition from version one to two and two to three and then
0:39:50 kind of apply those consecutively.
0:39:52 And so that logic tends to linger around in the application for as long
0:39:57 as you have old documents to support.
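That read-time pattern can be sketched in a few lines of JavaScript. This is a toy illustration, not Y-Sweet's actual document format; the version numbers and field names are invented. Each migration step upgrades a document by exactly one version, and the reader applies the steps consecutively until the document is current:

```javascript
// Lazy, read-time migration: each step upgrades a document by one version.
// Field names here are hypothetical, for illustration only.
const migrations = {
  1: (doc) => ({ ...doc, version: 2, title: doc.name, name: undefined }), // v1 -> v2: rename `name` to `title`
  2: (doc) => ({ ...doc, version: 3, tags: doc.tags ?? [] }),             // v2 -> v3: add `tags` with a default
};

// Applied on every read, so old documents upgrade the first time they're opened.
function migrate(doc) {
  let current = doc;
  while (migrations[current.version]) {
    current = migrations[current.version](current);
  }
  return current;
}

const old = { version: 1, name: 'My doc' };
const fresh = migrate(old);
// fresh.version === 3, fresh.title === 'My doc', fresh.tags is []
```

As Paul notes, the cost of this approach is that every step has to stay in the application for as long as documents at that version might still exist in storage.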
0:39:59 and I think there's ways to do schema migrations or schema changes that
0:40:04 don't require a migration as well.
0:40:06 Like, when I was at Google, there
0:40:09 were certain rules about what you could do with protocol buffers
0:40:12 that would ensure that they were always backward and forward compatible.
0:40:16 And so, things like: a required field always has to be required.
0:40:20 And so
0:40:21 being deliberate about when you call a field required.
0:40:25 There are certain things you can do at the schema design level and schema
0:40:28 change level that let you avoid having
0:40:34 to implement any sort of migration.
0:40:36 It can be more access-time oriented.
0:40:39 So I think doing that has been where I've seen
0:40:43 people be successful with that. In terms of second order effects, I
0:40:46 think it kind of goes back to, once you have the sync server, people are
0:40:50 like, oh, this is now a place where I can trigger this notification or I
0:40:54 can do this check. So I think we've seen these
0:41:00 backends kind of grow in scope.
0:41:02 We want that to be a first-class part of the application that
0:41:04 can do whatever you want it to do.
0:41:07 That makes a lot of sense.
0:41:08 And yeah, I think this is, an area that, has already caused a lot of, headaches,
0:41:14 schema migrations, data migrations in general, but now that we are rethinking
0:41:18 the data architectures at large here, we also need to rethink that part and
0:41:24 like you've mentioned, when you have all the data in a single Postgres database,
0:41:29 then you can at least like apply like your old playbooks there, but now if all
0:41:33 of your data is in an S3 bucket, laid out in whatever way, now you do need a
0:41:39 different new approach to deal with that.
0:41:41 And that is one way to deal with it, to bake the migration
0:41:46 logic into your app logic.
0:41:48 But that also comes with its own downsides.
0:41:52 This way you litter some of that code that was once very clear,
0:41:58 and now you make it less clear because you need to account for
0:42:02 that historical evolution. A project that I want to shout out here is
0:42:07 Cambria, by the folks at Ink & Switch. I've actually studied this project myself quite
0:42:12 intensively and I've rebuilt it myself a few times, once even on a type level
0:42:17 just to provide a nice type-safe API.
0:42:21 Given that the original implementation rather lets you specify those
0:42:25 sort of projection rules in YAML.
0:42:28 But, I've heard some rumors that they're thinking of like rebooting
0:42:31 that project at some point.
0:42:32 So fingers crossed for that.
0:42:34 And yeah, another approach that I'm investigating heavily myself,
0:42:39 given I have my fair share of
0:42:42 database migration traumas that I tried to remedy with starting Prisma,
0:42:48 but now I'm trying a different approach with event sourcing.
0:42:52 Where you basically split up your database
0:42:58 into a dedicated write model and derive the read model from it.
0:43:02 The core insight here is basically that if you split this up into two
0:43:07 parts, the schema for your read model is typically the thing
0:43:11 that changes orders of magnitude more often, where you have different kinds of
0:43:16 queries that you want to do, different sorts of aggregations, and where you want
0:43:21 to maybe change the database layout to make certain queries faster and more
0:43:25 efficient. And then the write operations,
0:43:29 those are much more bound to the domain of when stuff actually happens.
0:43:34 And that changes way less over time.
0:43:37 Like, maybe you want to capture someone's preference on
0:43:42 marketing emails on signup, but historically you can much more easily say,
0:43:47 actually, we default to no.
0:43:49 But a user signup event
0:43:52 is always valid and way easier to upgrade.
0:43:56 And then you can basically reapply all prior events into the new read
0:44:00 model that you can change very easily.
0:44:03 And you can even have like multiple read models all at once.
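A toy sketch of that split (invented event names, not Livestore's actual API): the write model is an append-only event log, and each read model is just a fold over it, so adding a new read schema means re-running the fold over the same history rather than migrating stored rows:

```javascript
// Toy event-sourcing sketch: an append-only write model with derived read models.
// Event and field names here are invented for illustration.
const events = [
  { type: 'UserSignedUp', id: 'u1', email: 'a@example.com' },
  { type: 'UserSignedUp', id: 'u2', email: 'b@example.com' },
  { type: 'MarketingOptIn', id: 'u1' },
];

// Read model v1: just the list of user ids.
function userIds(log) {
  return log.filter((e) => e.type === 'UserSignedUp').map((e) => e.id);
}

// Read model v2, added later: marketing preference, defaulting to "no" on signup.
// Nothing stored is migrated; the same historical events are simply re-folded.
function marketingPrefs(log) {
  const prefs = {};
  for (const e of log) {
    if (e.type === 'UserSignedUp') prefs[e.id] = false; // historical default: no
    if (e.type === 'MarketingOptIn') prefs[e.id] = true;
    if (e.type === 'MarketingOptOut') prefs[e.id] = false;
  }
  return prefs;
}
// userIds(events) is ['u1', 'u2']; marketingPrefs(events) is { u1: true, u2: false }.
```

Both read models can coexist, and either can be rebuilt at any time from the log, which is the property Johannes describes.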
0:44:06 So that is what I'm exploring right now under the umbrella of Livestore.
0:44:12 But that also requires the rigor to split it
0:44:16 up into a read and write model.
0:44:18 But yeah, curious whether you have thoughts on that.
0:44:20 Yeah, that's really interesting.
0:44:21 I think that event sourcing in general does sort of simplify migrations.
0:44:27 If you're willing to kind of go back over the event source log and regenerate,
0:44:30 because then as long as you represented all of the data that matters, then you
0:44:35 can essentially just add fields
0:44:38 as needed.
0:44:39 Another problem that emerges in that world is if your domain produces
0:44:44 a lot of events. So let's say you build a tldraw, and whenever you move
0:44:50 a rectangle, you could model it in a way that when you let go
0:44:55 of the rectangle, that creates an event. But you could model it in a way
0:44:59 where, whenever the browser registers a new move event, dragging
0:45:03 can cause 5,000 events, and that can lead to a very long history of events.
0:45:10 So now you gotta keep that in mind as well.
0:45:13 Whereas in the traditional mixed read-and-write-model approach, you
0:45:18 would basically just overwrite the position, and it would not necessarily
0:45:24 cause the database to explode
0:45:26 because you have too much data.
0:45:27 But yeah, it's all about trade-offs; that is what
0:45:30 data management is all about.
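One common mitigation for the flood of drag events, sketched here as an assumption rather than anything prescribed in the conversation, is to coalesce high-frequency intermediate events before they reach the persistent log, keeping only the last move per object:

```javascript
// Coalesce high-frequency intermediate events (e.g. pointer moves while dragging)
// so only the latest position per object reaches the persistent event log.
// Event shapes here are hypothetical.
function coalesceMoves(events) {
  const lastMove = new Map(); // objectId -> index of that object's move in `result`
  const result = [];
  for (const e of events) {
    if (e.type === 'Moved') {
      if (lastMove.has(e.objectId)) {
        // Overwrite the earlier move for this object in place.
        result[lastMove.get(e.objectId)] = e;
      } else {
        lastMove.set(e.objectId, result.length);
        result.push(e);
      }
    } else {
      result.push(e);
    }
  }
  return result;
}

const log = [
  { type: 'Moved', objectId: 'rect1', x: 1, y: 1 },
  { type: 'Moved', objectId: 'rect1', x: 2, y: 2 },
  { type: 'Created', objectId: 'rect2' },
  { type: 'Moved', objectId: 'rect1', x: 3, y: 3 },
];
// coalesceMoves(log) keeps the Created event and a single final move for rect1.
```

Note that overwriting in place keeps the first move's slot in the log, which is usually fine for absolute positions but matters if the relative order of events is semantically significant.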
0:45:32 maybe a slightly different aspect about data that, you've also written about,
0:45:37 which is in regards to encrypting data.
0:45:41 So, you've written a great blog post about that.
0:45:48 Data encryption
0:45:49 Yeah, so this came out of our work with Y-Sweet.
0:45:52 We wanted to store the data locally in the
0:45:56 client, at least as an option.
0:45:58 So we looked at the options that were available: localStorage,
0:46:01 IndexedDB, OPFS (origin private file system), and realized that
0:46:06 IndexedDB was really the right way to go for this right now.
0:46:10 I have high hopes for OPFS, but they
0:46:14 all kind of have flaws.
0:46:16 IndexedDB is the one where people know the flaws the best, I guess,
0:46:20 and how to work around them.
0:46:21 So we looked at IndexedDB.
0:46:23 But the problem that we found with all of them is that all of them store
0:46:27 the data in plain text, and that's not just a theoretical problem.
0:46:32 At least a couple of months ago
0:46:34 now, there were some npm and PyPI modules out there that
0:46:39 would read application data from these plain text sources.
0:46:43 It's kind of a real problem that people have identified,
0:46:46 and it has been exploited.
0:46:48 So we wanted to make sure that we provided an option that,
0:46:51 as best as possible, would prevent that.
0:46:54 so we said, okay, well, browsers have web crypto.
0:46:57 We can encrypt all this.
0:46:58 but then there's this problem of where do you store the key?
0:47:01 because you could store it on the server, but that kind of defeats
0:47:04 the purpose, if you're offline, of then accessing that data.
0:47:07 So we realized that browsers
0:47:10 don't really have a good way to store a key-type credential.
0:47:16 We've got WebAuthn, which is where
0:47:21 you have passkeys and things like that.
0:47:23 It's a bit more opinionated.
0:47:24 It uses the operating system's keychain, but it doesn't really expose that to
0:47:29 you as any sort of low-level API that you can store your own secrets in.
0:47:34 What has started happening is that some browsers, particularly Chromium
0:47:37 based browsers, Google Chrome, Edge, Brave, have built in something called
0:47:44 App-Bound encryption. They're just using this for cookies, but the idea
0:47:48 is that the browser will store cookies on disk as they always
0:47:54 have, but they'll be encrypted on disk, and then the symmetric key to that will
0:47:59 be stored in the operating system's keychain. And the operating system is set
0:48:05 up to, at least in theory, and there's been some vulnerabilities here, too,
0:48:09 but at least in theory, only give that key back to
0:48:14 the browser process itself, not to another process that attempts to
0:48:18 impersonate the browser process.
0:48:20 So what we landed on, which was pretty surprising to me, was kind
0:48:24 of the best available path right now:
0:48:27 if you enable local storage, we encrypt the data stored in IndexedDB and
0:48:33 then store the key in a cookie, and
0:48:35 kind of piggyback on that being App-Bound encrypted, at least
0:48:39 in browsers that support it.
0:48:41 That is very interesting.
0:48:42 Yeah, I've been studying cryptography, particularly in a browser context,
0:48:46 a bit more for various reasons.
0:48:49 I am trying to see what it would take to do the entire sync,
0:48:55 the messages for Livestore, for them to be end-to-end encrypted.
0:49:01 But the hard part is not the encryption; the hard part is the end-to-end part,
0:49:07 where the various ends own their keys.
0:49:11 We should do an entire episode just about
0:49:14 what's difficult about it, but it can all be distilled down to this:
0:49:18 the hard part about anything cryptography related is key management.
0:49:23 You can err on the side of being a little bit more loose with
0:49:27 how you manage keys, but that defies a lot of the purposes and the benefits here.
0:49:32 But then the browser also makes that really tricky, because it has very
0:49:38 constrained APIs, and historically it's always been rather a web document viewer
0:49:43 than a fully fledged application platform. We're getting the building blocks.
0:49:49 I mean, you can use the Web Crypto API.
0:49:52 I'm also using the libsodium project, compiled to WASM, which is very
0:49:57 powerful and gives you a couple
0:50:00 of advanced algorithms, et cetera, that you can use for symmetric or
0:50:05 asymmetric encryption, signing, et cetera.
0:50:08 And passkeys, I think, are also a super important foundation.
0:50:13 But they only get you so far.
0:50:16 And I think they don't really help you with the encryption as such,
0:50:20 but rather with signing messages.
0:50:23 So I think we're still lacking a few building blocks.
0:50:25 So very excited to hear about this; what was it called again? App-Bound.
0:50:31 App-Bound encryption. So ideally at some point this goes even beyond cookies, so
0:50:37 it can be applied to other storage mechanisms. But I like the approach
0:50:41 of basically encrypting it; then you reduce it to the key management problem,
0:50:46 and that key you put into a cookie. Which raises another question:
0:50:52 what happens if that cookie goes away?
0:50:55 Did you figure out an answer for that?
0:50:58 We don't.
0:50:58 We just set it to a long expiration. But the thinking there was,
0:51:02 if the user is clearing their cookies on that site, they
0:51:08 probably want to destroy the data.
0:51:10 And, you know, they want to be logged out.
0:51:13 So we actually saw it as the right thing to do, to bind it.
0:51:17 The other nice thing about that is, unlike IndexedDB, cookies can
0:51:21 actually have an expiration date.
0:51:22 So we could set an expiration of a week.
0:51:25 We're still relying on the browser to enforce that. But if the browser enforces
0:51:28 that, and then two weeks later that person is fully hacked, including
0:51:33 their operating system keychain, the browser, at least in theory, will have
0:51:36 deleted that key, and then the data that's in IndexedDB will be gone.
0:51:40 So that's actually, funny enough, additional functionality
0:51:43 that was just incidental to using cookies for that.
0:51:46 Right.
0:51:46 I like this trick a lot and I got to look into it.
0:51:49 One thing to point out still is, you've mentioned that this mechanism is only
0:51:54 available in Chromium browsers anyway, but, cookies and IndexedDB, OPFS, et
0:52:00 cetera, all of that is available in other browsers and namely Safari as well.
0:52:05 One thing that, people find out the hard way about Safari is that it automatically
0:52:11 deletes a user's data after seven days if they haven't visited that website.
0:52:16 So if you're building a fully local-first web experience where
0:52:21 someone creates some precious data in Safari and maybe doesn't sync it
0:52:26 yet to somewhere else, goes on vacation, comes back, and poof, the data is gone.
0:52:32 So I think as app builders, we need to be aware of that and
0:52:36 detect, Hey, is this Safari?
0:52:38 And in Safari, make this part of the product experience: show
0:52:42 a message like, hey, be careful.
0:52:44 Your data might go away.
0:52:46 There are ways to remedy that.
0:52:48 For example, if you make the app a progressive web
0:52:53 app by adding it to the home screen,
0:52:56 that limitation goes away.
0:52:58 But app builders need to be aware so that they can make the app's users aware.
0:53:04 It's just something that I think is important to note.
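A rough way to surface that warning in the product. The user-agent check below is a heuristic of my own, not a standard API, and where available an app would also want to request `navigator.storage.persist()`:

```javascript
// Heuristic Safari detection from the user-agent string, so the app can warn
// that unsynced local data may be evicted after seven days without a visit.
// UA sniffing is inherently fragile; treat this as a best-effort hint.
function isLikelySafari(userAgent) {
  return /Safari\//.test(userAgent) && !/Chrom(e|ium)\//.test(userAgent) && !/Android/.test(userAgent);
}

function storageWarning(userAgent) {
  return isLikelySafari(userAgent)
    ? 'Heads up: Safari may delete local data after 7 days without a visit. Sync your data or install this app to the home screen.'
    : null;
}

const safariUA =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15';
const chromeUA =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36';
// storageWarning(safariUA) returns the warning; storageWarning(chromeUA) returns null.
```

As mentioned above, installing as a PWA via add-to-home-screen lifts the seven-day cap, so the warning can point users there.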
0:53:08 Yeah, I think that's an example of a number of cases where the
0:53:11 browsers are just not optimized for local-first apps, unfortunately.
0:53:15 You know, I think low-level access to the operating
0:53:21 system's keychain is another.
0:53:23 Browsers have improved a ton in terms of what APIs they expose, but I
0:53:28 think they're still lagging when it comes to storage and encrypted storage.
0:53:32 Yeah, totally.
0:53:34 So, maybe slightly moving to another browser-related topic:
0:53:39 you've been, both through your prior role
0:53:43 and also as part of Jamsocket, dealing with quite a bit of WebAssembly.
0:53:51 WebAssembly
0:53:51 I guess I really wanted,
0:53:53 I wanted to build the company around WebAssembly.
0:53:55 I wanted WebAssembly to take off, particularly server side and
0:53:59 client side, the idea that having isomorphic client-side and server-side
0:54:03 code would be a big thing.
0:54:05 And I've, I guess, just generally soured on WebAssembly a little bit.
0:54:10 I think that where I've seen it work really well is when it's
0:54:14 in the application layer.
0:54:18 There's a couple of examples I like to go to that are effectively the same model,
0:54:21 the same kind of architecture: Figma,
0:54:24 and a company called Modyfi, a few others that I'm blanking on. But the
0:54:28 architecture is essentially a JavaScript UI with a WebAssembly, WebGL or WebGPU
0:54:36 kind of rendered canvas behind it. So with Figma, you know, the core engine
0:54:41 is, I believe, in C++, talking to WebGL; with Modyfi, it's in Rust and WebGPU.
0:54:47 But it's literally, the application is layered that way: on screen,
0:54:52 there is the canvas behind the UI.
0:54:55 They're written in two different languages and they just talk to each other.
0:54:58 so I think that is the most promising architecture that I see for WebAssembly,
0:55:02 Where I think it's been harder
0:55:05 to get right is building something like a library that is ultimately consumed
0:55:09 by JavaScript developers but written in WebAssembly. I think there's just so much
0:55:15 friction still in the bundling that I've kind of soured on that as an approach.
0:55:19 Right.
0:55:20 I mean, I agree in that regard; I wish we'd be further
0:55:25 along with WebAssembly, but I think it's a bit of a chicken-and-egg problem, that
0:55:30 we need more inspiring applications.
0:55:33 That make people feel like, wow, that is possible,
0:55:36 I didn't recognize that this was the web,
0:55:39 it feels so fast.
0:55:40 And I think it is more true than ever
0:55:45 that WebAssembly
0:55:46 can unlock whole new experiences.
0:55:49 And there are a few lighthouse examples like Figma that stand out here.
0:55:53 Also a big shout out to the folks building Makepad, which is
0:55:58 a super ambitious project.
0:56:03 I probably won't do it justice by pitching it, but
0:56:07 I just want to speak to the ambition, where it's basically like Unreal Engine
0:56:12 in that it's a full engine.
0:56:14 They're building their own platform, including a rendering layer.
0:56:20 A few people think that Makepad is an editor.
0:56:26 No, Makepad has just, as an example app,
0:56:30 built an editor in which they build Makepad, which is just so phenomenal.
0:56:34 And Makepad is just such an incredibly fast app.
0:56:38 So you should definitely check it out: go to makepad.dev
0:56:41 and then press the option key to see how the entire code editor expands.
0:56:47 So apps like that get me very excited about what's possible with
0:56:51 WASM, but they're building everything in Rust.
0:56:54 They're fully leaning into everything there.
0:56:57 And I think the either-or, where you want to combine things one step at a time,
0:57:02 I think that's a tooling problem. Partially it's also a trade-off
0:57:07 problem, where if you move a lot of data back and forth between WASM and
0:57:12 JavaScript, that doesn't come for free.
0:57:14 So I think, you got to keep that in mind.
0:57:17 I think the Replicache folks actually in the past have
0:57:22 written a lot of their stuff in Rust and then moved to JavaScript because of that
0:57:27 boundary crossing being too expensive.
0:57:30 Not every use case suffers from that problem, but I want to turn
0:57:35 it around and invite anyone who is excited about WebAssembly to see
0:57:42 that as an opportunity to make things significantly better, like working
0:57:46 on projects like wasm-bindgen or other things. I think the Deno folks are
0:57:52 pushing heavily on that. So I'm seeing this glass half full and I think the
0:57:56 glass is going to get full pretty soon.
0:57:58 Yeah, to your point about the JavaScript/WebAssembly boundary
0:58:03 crossing, I think it comes down to just placing that boundary in the right
0:58:07 place when it comes to applications, like the Figma model of a JavaScript
0:58:12 front end with the renderer in WebAssembly. Makepad is a great example,
0:58:17 I think, of going all the way in on WebAssembly.
0:58:21 Another one's called Remix.
0:58:22 And I think what's notable about both cases is that to do that
0:58:26 well, they've had to basically live in the GUI toolkit layer.
0:58:30 Like, they've been writing their own code or adapting a
0:58:33 lot of their own code for it.
0:58:34 So I think that's not for the faint of heart.
0:58:36 I think that people who have done it have built amazing software, but what comes
0:58:41 up more often when I talk to people is that there's a scarcity of Rust
0:58:45 developers, and they want to focus the Rust developers on the
0:58:50 engine component and then be able to hire React developers, Svelte developers,
0:58:56 and front-end web developers to work on the GUI, where it may not matter. Like, you
0:59:03 know, think about Figma's UI components.
0:59:04 They're not super performance-sensitive in the way that the canvas is.
0:59:08 Yeah, totally.
0:59:09 I think it just takes some bold thinkers, and this is not something where you're
0:59:14 gonna rebuild the world in two weeks. This is really something you've got to
0:59:19 put in the five, maybe ten years
0:59:22 to really build something phenomenal. But I think the rewards are massive, and
0:59:28 I'm really looking forward to getting alternatives to something like
0:59:33 React that provide different trade-offs and that allow you to build really,
0:59:37 really high-performance applications. Fundamentally, React biases towards
0:59:41 simplicity; it biases towards preventing less experienced
0:59:47 engineers from hurting themselves or others and dragging the application down.
0:59:53 But I think there's a different trade-off space as well, where you bias more
0:59:57 towards performance and you need to know a little bit more about what you're doing.
1:00:01 And particularly now with AI on the horizon, I think we can rethink
1:00:06 a lot of trade-offs significantly, where engineering team sizes maybe
1:00:11 get reduced as well, but that's a topic for another conversation.
1:00:16 But, relatedly, in regards to AI, you've recently also launched a new
1:00:21 project that is certainly adjacent to AI.
1:00:25 It's called ForeverVM.
1:00:29 ForeverVM
1:00:29 Yeah.
1:00:30 So pretty much from the beginning with Jamsocket, one of the ways we've seen
1:00:32 people use it, because we run these sandboxed processes on demand, is people
1:00:37 running LLM-generated code in them.
1:00:39 Actually,
1:00:40 going back to the beginning, it wasn't even LLM-generated.
1:00:42 This was sort of pre-ChatGPT.
1:00:44 It was things like Jupyter notebooks, but over time we see
1:00:47 more and more LLM-generated code.
1:00:50 And it's good.
1:00:51 I think we're competitive with other products for that, but we
1:00:54 kind of realized, first of all, we're not really positioning the product that way,
1:00:58 but also
1:00:59 we're not necessarily building the product to be the best
1:01:02 for that from first principles.
1:01:04 Like, if we just said: I want an LLM to be able to execute code,
1:01:08 what would that look like from first principles?
1:01:10 And we kind of thought, well, we don't really care about the session.
1:01:13 We want it, from the LLM's point of
1:01:17 view, to feel like it can always run code.
1:01:19 It doesn't have to start a sandbox and stop the sandbox when it's done to cut
1:01:22 down on costs, and things like that.
1:01:24 We kind of cut out the rest of it, made that the abstraction, and
1:01:29 built it, frankly, into something that we can position for those products, so
1:01:32 that we're not confusing people who are like, "I thought you did sync engines;
1:01:36 now you're telling me you run AI code." Architecturally they
1:01:40 can actually be fairly similar, but we wanted to build a product around that.
1:01:43 So we have ForeverVM. The
1:01:46 way to think about it:
1:01:47 it's an API that runs Python code in an unbounded session.
1:01:52 By that, I mean, if you make an API call and get a machine
1:01:56 ID, maybe abc123, you can run instructions on that machine: set a
1:02:02 equals three, or something like that.
1:02:03 Two years from now, if you kept that machine around, you can query
1:02:07 the value of a, you know, a plus five, and then you get back a value.
1:02:12 And the way we're doing that behind the scenes is using memory snapshotting
1:02:16 of the underlying Python process.
1:02:19 So we architected the whole system from the ground up
1:02:23 around this, and it's kind of neat.
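Paul's description of the ForeverVM session model can be sketched with the Python standard library's `code.InteractiveInterpreter`: each "machine" is a session whose state persists across separate calls. Note this is only an illustration of the semantics he describes; the `Machine` class and its methods are hypothetical, and the real service keeps sessions alive via memory snapshots of the Python process, not an in-process namespace.

```python
import code
import contextlib
import io


class Machine:
    """Toy stand-in for a ForeverVM 'machine' (hypothetical API):
    a Python session whose state persists across run() calls.
    The real service persists state with memory snapshots of the
    underlying Python process; here it is just a live interpreter."""

    def __init__(self, machine_id: str):
        self.machine_id = machine_id
        self._interp = code.InteractiveInterpreter(locals={})

    def run(self, source: str) -> str:
        """Execute one instruction in the session and return whatever
        it printed (expression results go through sys.displayhook,
        which also writes to stdout)."""
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            self._interp.runsource(source)
        return buf.getvalue().strip()


# State set in one call is visible in later calls, mirroring
# "set a equals three ... query a plus five" from the transcript.
m = Machine("abc123")
m.run("a = 3")
print(m.run("a + 5"))  # prints "8"
```

The property being illustrated is the unbounded session: nothing about the API asks the caller to start or stop a sandbox, the machine ID alone is enough to pick up where the session left off.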
1:02:24 fascinating.
1:02:25 Yeah.
1:02:26 My mind is also going to other technologies. I mentioned Temporal
1:02:30 before, but there's also a really fascinating project called Golem,
1:02:35 which I think is also employing some really interesting tricks, using WASM
1:02:41 and knowledge about the WASM memory to make checkpoints where you can
1:02:47 restore and resume computation, or retry.
1:02:51 And yeah, I love that we get some bolder ideas out there,
1:02:56 particularly now when the cost of writing code has come
1:03:02 down so much, and now code is also written by people who know even
1:03:07 less about whether it's good or not.
1:03:09 So we need to put it into boxes that are somewhat blast-safe,
1:03:15 but also durable in a way that doesn't break the bank.
1:03:19 And I love how that is an entirely different product, yet leverages all
1:03:24 the benefits and all the foundations that you've built with Jamsocket, or,
1:03:29 I guess, with Plane for that matter.
1:03:31 That is very, very cool.
1:03:33 Yeah, thanks.
1:03:34 One of the things that's been really cool to see is that if we give an LLM
1:03:39 the ability to write this code and get responses back very quickly, kind
1:03:42 of just treating it as a local REPL, the AIs can do more. They get
1:03:48 that fast feedback loop, and they can make mistakes and correct them,
1:03:52 in some cases we've observed, faster than a reasoning
1:03:56 model could just generate the right code in the first place.
1:04:00 Outro
1:04:00 Nice.
1:04:02 Any other things that you would like to share with the audience?
1:04:06 If you want to find me online, I'm paulgb on Twitter or X, and paulbutler.org
1:04:11 on Bluesky.
1:04:13 jamsocket.com
1:04:13 is the site, jamsockethq on Twitter;
1:04:16 on Bluesky it's jamsocket.com.
1:04:19 And, yeah, forevervm.com
1:04:20 is the product we were just talking about.
1:04:22 Perfect.
1:04:23 We're going to put links to all of those things in the show notes.
1:04:27 Paul, thank you so much for coming on the show today.
1:04:29 I've learned a lot about so many different topics and yeah, really enjoyed it.
1:04:34 Thank you.
1:04:35 Thank you, Johannes.
1:04:36 And really looking forward to seeing you at local-first in Berlin this year.
1:04:40 Perfect.
1:04:40 See you then.
1:04:41 See you then.
1:04:42 Thank you for listening to the localfirst.fm podcast.
1:04:45 If you've enjoyed this episode and haven't done so already, please
1:04:48 subscribe and leave a review.
1:04:50 Please also share this episode with your friends and colleagues.
1:04:53 Spreading the word about the podcast is a great way to support
1:04:56 it and to help me keep it going.
1:04:58 A special thanks again to Jazz for supporting this podcast.