localfirst.fm
March 25, 2025

#22 – Paul Butler: Jamsocket

Sponsored by ElectricSQL & Jazz

Transcript

0:00:00 Intro
0:00:00 There's database
0:00:01 sync engines, and then there's document sync engines.
0:00:04 So for database sync engines, I think of things like Linear, like things
0:00:07 where you have some relational-model data; you probably don't
0:00:11 want the client to have all of it.
0:00:12 You kind of have the client storing some subset of a database
0:00:16 for each account.
0:00:18 Maybe you're sharing this data across multiple people in that account.
0:00:22 On the document sync side, you're sending all of the data down to the client.
0:00:27 The unit of data that gets synced is in-memory-sized on the browser.
0:00:31 You're not dealing with like a terabyte of data here.
0:00:33 You're not taking a subset of it.
0:00:34 You're synchronizing the entire document.
0:00:36 This would be kind of things like Figma or Google Docs, where there's a full local
0:00:41 copy of some self-standing piece of data.
0:00:45 Welcome to the localfirst.fm podcast.
0:00:47 I'm your host, Johannes Schickling and I'm a web developer, a startup founder, and
0:00:51 love the craft of software engineering.
0:00:53 For the past few years, I've been on a journey to build a modern, high quality
0:00:57 music app using web technologies, and in doing so, I've been falling down the
0:01:01 rabbit hole of local-first software.
0:01:04 This podcast is your invitation to join me on that journey.
0:01:07 In this episode, I'm speaking to Paul Butler, founder of Jamsocket,
0:01:11 and creator of the Y-Sweet Project.
0:01:14 In this conversation, we talk about building versus buying a sync engine
0:01:17 and explore the various projects behind Jamsocket, including Plane, Y-Sweet,
0:01:22 and ForeverVM. Before getting started, also a big thank you to ElectricSQL
0:01:28 and Jazz for supporting this podcast.
0:01:30 And now my interview with Paul.
0:01:33 Hey, Paul, so nice to have you on the podcast.
0:01:35 How are you doing?
0:01:37 I'm good.
0:01:37 Thank you, Johannes.
0:01:38 I'm excited to be here and been listening since the beginning.
0:01:41 Thank you so much.
0:01:42 The two of us had the pleasure to meet in person at last year's
0:01:45 local-first conf, and I'm hoping to see you there again this year.
0:01:49 So for those in the audience
0:01:51 who don't know who you are, would you mind introducing yourself?
0:01:54 Sure.
0:01:55 I'm Paul Butler.
0:01:56 I'm a co-founder of a company called Jamsocket.
0:01:58 The kind of one-line pitch is it's like Lambda, but for WebSockets.
0:02:03 Yeah.
0:02:04 I've been looking a little bit into Jamsocket, and it
0:02:06 looks really fascinating.
0:02:08 And I also want to hear more about the origin story, where it's coming from,
0:02:13 since by now we've got more and more sort of like infrastructural options.
0:02:17 Obviously there's like Cloudflare with their primitives,
0:02:21 Cloudflare Workers, et cetera.
0:02:22 And I think with Jamsocket, you provide a really powerful alternative
0:02:27 for high-scale applications.
0:02:30 So yeah, before we go into more into depth, what Jamsocket is and what
0:02:35 it offers, would you mind sharing a bit more of the origin story,
0:02:38 how you ended up working on it?
0:02:41 Jamsocket Origin Story
0:02:41 Sure.
0:02:41 Yeah.
0:02:41 So my co-founder and I started the company about three years ago.
0:02:45 Prior to that, I was working in finance and I was building a lot of
0:02:49 internal tools for myself and my team that
0:02:52 were dealing with midsize amounts of data, so talking about like single-digit,
0:02:56 double-digit gigabytes of data, not anything that was out of the realm of putting
0:03:02 in RAM for a desktop application. But I realized that as soon as people wanted
0:03:06 these things to be delivered through the browser, there was really nowhere
0:03:09 to put that data, that I couldn't really load that over the internet into the
0:03:13 browser, Chrome would just give up. It didn't really make sense to load that
0:03:17 into kind of a Flask server or something like that, because the web stack is
0:03:21 kind of built for these servers to not consume a lot of memory for each user
0:03:26 of the application, things like that.
0:03:28 So I really wanted this sort of neutral location. I almost
0:03:33 think of it as a way for a browser-based
0:03:35 application to spin up a server-side sub-process that
0:03:39 just belongs to that browser tab.
0:03:40 And when you close that browser tab, that server-side process also goes away.
0:03:45 So that was essentially the origin story of Jamsocket.
0:03:48 That makes a lot of sense, but could you motivate a little bit more
0:03:52 what kind of application I should imagine there?
0:03:54 I've never worked in finance and I think the average web
0:03:57 developer has sort of like
0:04:00 a list of like 50 Airbnb items that they want to render.
0:04:05 And then there's like pagination, and all of that easily fits in a
0:04:09 JSON array that you can fetch in a single CRUD REST API call.
0:04:16 But when you say like single-digit, double-digit gigabytes of data, what
0:04:20 sort of data are we dealing with here?
0:04:23 And if you wanted to transfer it over the wire into the browser,
0:04:27 what would that even look like?
0:04:28 Would that be sort of like one big JSON blob, or not? Yeah, maybe
0:04:33 you can motivate that a bit more.
0:04:35 Yeah, so one of the motivating examples at the time was, we would run like a
0:04:40 simulation, a backtest simulation.
0:04:41 So we have some model that we hypothesize is predictive of stock returns,
0:04:47 run it back in time and generate a bunch of data, could be say on every
0:04:52 five-minute increment or even more fine-grained than that, over petabytes
0:04:55 and petabytes of past market data.
0:04:58 And then we get back some gigantic time series data, a number of time
0:05:03 series, maybe you have profit and loss over time and a record of all
0:05:07 the trades and everything like that.
0:05:08 So we have like massive, not massive, large, like gigabyte,
0:05:14 multi-gigabyte time series data, likely in something like Parquet.
0:05:18 That tends to be the best format for that type of thing.
0:05:21 And so over the wire, ideally it would be Parquet or Arrow.
0:05:25 Got it.
0:05:25 And over that sort of data you want to,
0:05:28 based on the user's input through the UI, drive some sort of queries, some sort
0:05:34 of accumulations, to make sense of what the data is trying to tell us.
0:05:40 Yeah.
0:05:41 It was things like, maybe I want to be able to drill down in the data, in the
0:05:44 client, be able to kind of go from this high-level overview of the data to
0:05:51 looking at specific trades, specific stocks, things like that.
0:05:55 Got it.
0:05:55 And sort of your insight and way to deal with that fundamental problem, where
0:06:01 ideally you could just move that big data blob over in front of
0:06:07 your browser that you're looking at and then happily query and compute
0:06:12 away, but that wasn't feasible because Chrome or another browser has
0:06:17 certain limits. And the way you wanted to cope with that
0:06:22 is to say, okay, we're going to have a little companion:
0:06:25 each browser session has a little companion on some beefy server, which
0:06:30 holds all of that data in memory.
0:06:33 And then there's some sort of
0:06:34 real-time wire protocol that helps you do that, still so fast that
0:06:41 it's sort of proxying to being local.
0:06:44 Yeah, exactly that.
0:06:45 I kind of think of it as somewhere along a spectrum where
0:06:48 you have thin-client
0:06:50 sort of setups where the server does everything and the client's really
0:06:53 just a dumb display, all the way to a full-fledged browser-based app
0:06:59 where everything's happening in the browser that can happen in the browser.
0:07:02 I think there's some middle ground where you get
0:07:05 next-frame latency on almost everything, but maybe in the background
0:07:11 it needs to query some server data and load that in, and maybe it
0:07:14 can even approximate client-side
0:07:16 what that next frame will look like, but it's able to sort of do
0:07:18 that in the background as well.
0:07:20 Got it.
0:07:20 So as you faced those problems, that led you to get so interested in the
0:07:26 problem that you started to dedicate your next chapter in life to it.
0:07:31 And that led to you building a technology called Plane.
0:07:35 And that was then also the foundation for Jamsocket.
0:07:38 So can you explain a bit more what Plane does and then how it became the foundation for Jamsocket?
0:07:45 Plane
0:07:45 Yeah, so, and just to kind of continue on the story.
0:07:48 So my co-founder Taylor was at Datadog and had kind of faced
0:07:51 some similar problems.
0:07:52 So we got together in 2022.
0:07:55 and the first thing we started working on was, yeah, what became
0:07:58 Plane, which is open source.
0:08:01 And I think of it as kind of the way that we spin up those processes.
0:08:06 It's kind of the orchestration plane essentially for that type of application.
0:08:11 So what it's responsible for is, you kind of give it a pool of computers,
0:08:16 you tell it you want to start a specific process, and it will find where
0:08:20 on that pool of computers to start it.
0:08:28 So it can give it a hostname and kind of a password, essentially;
0:08:32 then anything on the web, anything on the public internet
0:08:35 that can access the web, can use that URL to send and receive
0:08:39 messages from that process.
0:08:42 And as long as there's at least one open connection to that
0:08:45 process, Plane will keep it alive.
0:08:47 And then as soon as there's no more connections, Plane
0:08:50 will start a countdown timer.
0:08:51 And if nothing reconnects, it'll shut that process off.
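Plane's lifecycle rule as Paul describes it (stay alive while at least one connection is open, start a countdown when the last one closes, cancel it on reconnect) can be sketched as a small piece of bookkeeping. This is an illustrative sketch, not Plane's actual implementation; the class name, grace period, and shutdown hook are all invented for the example.

```typescript
// Illustrative sketch of Plane-style lifecycle bookkeeping (not Plane's real code).
// A backend stays alive while connections are open; once the last connection
// closes, a countdown starts, and the backend shuts down unless something
// reconnects before the grace period elapses.
class BackendLifecycle {
  private connections = 0;
  private shutdownTimer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private gracePeriodMs: number,
    private onShutdown: () => void, // hypothetical hook: terminate the process
  ) {}

  connect(): void {
    this.connections++;
    // A reconnect cancels any pending shutdown countdown.
    if (this.shutdownTimer !== null) {
      clearTimeout(this.shutdownTimer);
      this.shutdownTimer = null;
    }
  }

  disconnect(): void {
    this.connections--;
    if (this.connections === 0) {
      // Last connection gone: start the countdown.
      this.shutdownTimer = setTimeout(this.onShutdown, this.gracePeriodMs);
    }
  }

  get isPendingShutdown(): boolean {
    return this.shutdownTimer !== null;
  }
}
```

The interesting property is that shutdown is never immediate: a brief disconnect (a page reload, a flaky network) lands inside the grace period and the process is reused rather than cold-started.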
0:08:54 Got it.
0:08:55 So in terms of use cases, you've clearly motivated that original use case that you
0:09:01 had while working at a financial company.
0:09:04 Are those also the kind of use cases that, you know, you mostly face when talking
0:09:09 to people that are interested in Jamsocket, or is there a wider set of applications
0:09:14 that Jamsocket is trying to serve?
0:09:17 Yeah, largely not, actually. We haven't seen that many
0:09:20 use cases of kind of wanting to just modify massive data sets or deal with
0:09:25 massive data sets in the browser.
0:09:27 But one of the things we quickly realized was that the infrastructure we were
0:09:31 building had a lot of parallels to how Figma did things, how Google Docs
0:09:35 did things, how a lot of these kind of collaborative applications did things.
0:09:39 And so we decided to kind of lean into the sync engine hosting side of things.
0:09:45 Got it.
0:09:45 And when I'm looking at your website, among a few other
0:09:49 companies, it looks like Rayon is built also on top of Jamsocket.
0:09:54 So I happened to have seen their launch, I think, a while back.
0:09:57 If I recall correctly, it was sort of like a really interesting Figma-esque
0:10:03 application, I think for architects.
0:10:06 And, yeah, maybe you can share a little bit more about their specific scenario.
0:10:15 Rayon
0:10:15 Yeah, Rayon's one of my favorite use cases.
0:10:17 cause we've really grown with them.
0:10:19 They've been using us since we started the company, essentially.
0:10:22 They were one of the first users on the platform and we've seen them kind of grow
0:10:26 as they launched and everything.
0:10:27 Essentially the way that they're using us is that we are the data backend for
0:10:32 these documents while they're open.
0:10:34 So I open a document, you open a document, they'll start a server
0:10:39 on Jamsocket for that document.
0:10:41 And as I make edits,
0:10:43 they get pushed up to that server,
0:10:45 they get sent back down to you.
0:10:46 And that backend is also what's storing the data on S3.
0:10:51 So even if it's single-player mode, that Jamsocket server still sits between their
0:10:56 end user and the source of truth on the data source, the durable data source.
0:11:02 Okay.
0:11:02 So you're mentioning S3 and a durable data source.
0:11:06 Maybe we can take a step back and motivate: if someone wants to
0:11:11 build their own little version of something like Jamsocket or Plane,
0:11:16 how would that look?
0:11:17 So there seems to be something pretty beefy in the middle that
0:11:22 holds the necessary data in memory.
0:11:25 And as the name "in memory" suggests, that's pretty volatile.
0:11:30 So if someone stumbles over a power cord, that data might be gone.
0:11:34 And that's also why it needs to go somewhere more durable, something like S3.
0:11:39 So can you walk us through the rough architecture?
0:11:43 And what were sort of the insights and deliberate trade-offs that went into it?
0:11:49 Yeah, I mean, a common pattern that I see people use with Jamsocket is that the
0:11:53 source of truth for application data will kind of shift as the application is used.
0:11:58 So at rest, the source of truth of the application data is in durable storage
0:12:04 somewhere, usually S3, something like that, where you might
0:12:09 not want to write to that like 60 times a second, but you want it to
0:12:13 persist. When that document is open,
0:12:16 then the source of truth of that document is effectively in memory on Jamsocket.
0:12:21 And the nice thing about that is, you know, it's memory.
0:12:23 You can write to it very frequently.
0:12:25 You can write to it 100 times a second if you want to, more than that.
0:12:29 And that can then be synced down to all of the connected clients. And then,
0:12:33 in some sort of loop, or you could have some sort of write-ahead log,
0:12:37 as changes are made to that document, you are then durably persisting them.
0:12:41 Some people really care about that being really low-latency.
0:12:44 I think in general, unless it's really a bad thing for users to lose
0:12:47 like five seconds of data, just batching everything up into writing
0:12:53 just the edits every five seconds,
0:12:54 something like that, is pretty reasonable.
0:12:56 Or, you know, what a lot of people do is they just say 60 seconds is fine:
0:13:00 I'm just going to write the entire document over what existed there before
0:13:04 every 60 seconds, because an outage, you know, a server just failing out of the
0:13:10 blue, is actually pretty rare these days.
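The batching strategy Paul outlines, with the in-memory copy as the live source of truth and periodic flushes to durable storage, can be sketched like this. The snapshot getter and persist callback (standing in for an S3 PUT) are assumptions for illustration, not a Jamsocket API.

```typescript
// Sketch of interval-batched persistence: edits only mark the in-memory state
// dirty; a periodic flush writes the latest snapshot to durable storage
// (e.g. S3), so a burst of 100 edits per second still costs one write per tick.
class BatchedPersistence<T> {
  private dirty = false;

  constructor(
    private getSnapshot: () => T,                    // current in-memory state
    private persist: (snapshot: T) => Promise<void>, // e.g. a PUT to object storage
  ) {}

  // Call on every edit; cheap, just marks state dirty.
  markDirty(): void {
    this.dirty = true;
  }

  // Call from a timer (e.g. every 5 or 60 seconds); returns whether a write happened.
  async flush(): Promise<boolean> {
    if (!this.dirty) return false;
    this.dirty = false;
    await this.persist(this.getSnapshot());
    return true;
  }
}
```

Wiring it to a timer, e.g. `setInterval(() => store.flush(), 5_000)`, gives exactly the trade-off discussed above: at most a few seconds of unflushed edits are at risk, in exchange for far fewer writes.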
0:13:12 Got it.
0:13:13 So if we compare it to a technology like Cloudflare Durable Objects, with
0:13:19 Cloudflare Workers, that's a particularly distinct programming model where it
0:13:25 kind of gives you the best of both worlds in that regard, that you
0:13:28 only pay for the CPU cycles where you actually want the CPU to do things.
0:13:34 And otherwise it can hibernate while still keeping a WebSocket connection alive, for
0:13:39 example, or keep some memory alive or rehydrate it from some persistent storage.
0:13:46 Is that sort of a useful parallel way to think about
0:13:50 the programming model? And also,
0:13:53 can I implement any sort of free-form WebSocket messages or request handlers, or is
0:13:59 there a more pre-specified API, something like Redis, for how I interact with data
0:14:07 from a client to the server and vice versa?
0:14:09 Yeah, good question.
0:14:10 I agree that, like, I think Durable Objects is probably the closest kind of
0:14:14 parallel product out there right now.
0:14:17 When we started this up, Durable Objects wasn't really a big thing; it may have
0:14:22 existed, but had a lot of limitations.
0:14:24 Like, I think we came at things from a very different angle, but kind of
0:14:27 landed in a similar architectural space.
0:14:30 In terms of the servers, though, we just really host anything that's HTTP.
0:14:35 So when I talk about it as being for WebSocket servers, I think that
0:14:38 we kind of came at it at an angle of, we want this to be the right
0:14:42 model for hosting WebSocket servers.
0:14:44 And we do, you know, we sit on the connection.
0:14:48 So we
0:14:49 work well with WebSockets, where there's a long-lived connection,
0:14:52 because then we know not to terminate the server. With HTTP requests,
0:14:56 we have to rely a little bit more on heuristics;
0:14:58 with WebSockets, we've got that connection open.
0:15:00 Really, just anything: could be Socket.IO, could be your own WebSocket protocol.
0:15:05 We essentially just take a container from our customers that
0:15:07 will serve HTTP on port 8080,
0:15:10 and we expose that to the outside web through a proxy that we wrote.
0:15:14 Got it.
0:15:14 So in the specific case of Rayon, did they build their own sync engine from scratch?
0:15:20 Did they leverage any specific off-the-shelf technology, something like Yjs?
0:15:26 Given that Jamsocket advertises as the platform where you build your
0:15:31 own sync engine on top of, maybe you can walk us through, by this example,
0:15:36 how I should think about that.
0:15:37 Yeah.
0:15:38 They're one of a number of customers who have kind of built their own
0:15:41 sync engine on top of Jamsocket.
0:15:44 There's not like an SDK that you need to adopt or anything like
0:15:46 that on the server side.
0:15:48 You're just writing a web server.
0:15:51 But one of the things that's specific about this model is that
0:15:54 you are guaranteed by the infrastructure that at most one server is
0:16:00 running per document, or however you want to fragment your kind of space of things;
0:16:06 in their case, it's per document.
0:16:08 And so, yeah, you get that guarantee from the system, and then it becomes much
0:16:13 easier to implement your own sync engine.
0:16:14 But we, at least at the Jamsocket level, are not opinionated about how
0:16:19 you actually go about implementing that.
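The at-most-one-server-per-document guarantee is what lets a hand-rolled sync engine stay simple: the single process can apply edits in arrival order and fan them out, with no distributed merge logic. A minimal sketch of that idea, with invented types (this is not a Jamsocket or Rayon API):

```typescript
// With at most one server per document, there is a single total order of
// edits, so "last write to arrive wins" is well-defined. The server applies
// each edit to its in-memory state and fans it out to subscribed clients.
type Edit = { key: string; value: string };

class DocumentServer {
  private state = new Map<string, string>();
  private clients = new Set<(edit: Edit) => void>();

  // Register a connected client; returns an unsubscribe function.
  subscribe(onEdit: (edit: Edit) => void): () => void {
    this.clients.add(onEdit);
    return () => {
      this.clients.delete(onEdit);
    };
  }

  // Apply an incoming edit and broadcast it to every connected client.
  apply(edit: Edit): void {
    this.state.set(edit.key, edit.value);
    for (const client of this.clients) client(edit);
  }

  get(key: string): string | undefined {
    return this.state.get(key);
  }
}
```

Without that single-authority guarantee, two servers could apply the same edits in different orders and diverge, which is exactly the problem CRDTs exist to solve.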
0:16:21 But then you mentioned Y-Sweet.
0:16:22 Yeah, we...
0:16:23 So Rayon does not use Y-Sweet, but some of our customers use Y-Sweet, which is a
0:16:27 Yjs backend that we wrote, that we provide.
0:16:29 That's a much more opinionated path, if they want to take that.
0:16:32 Got it.
0:16:33 Yeah.
0:16:33 I want to learn a lot more about Y-Sweet in a moment as well.
0:16:37 But given that you've already mentioned those two paths: Y-Sweet, which is an
0:16:42 off-the-shelf technology that you're building, based on top of Yjs,
0:16:47 which is a very well-known CRDT implementation, probably the most common
0:16:53 and longest-standing technology that's out there,
0:16:55 so that being an example of an off-the-shelf technology;
0:16:59 and Rayon, which has built their own sync engine.
0:17:03 You've probably seen many, many decisions being made where people
0:17:07 choose to use an off-the-shelf technology or choose to build their own.
0:17:11 What sort of advice would you give to people who are thinking
0:17:14 whether they should buy, or, as an alternative to buying,
0:17:19 adopt an off-the-shelf technology?
0:17:22 Building vs Adopting a technology
0:17:22 Yeah, I think that where
0:17:25 it kind of comes down to for the kind of build versus off-the-shelf is whether
0:17:29 you want to have business logic live in the sync engine on the server side.
0:17:34 So where I think you generally don't need that is, just
0:17:38 think text documents, things like that, where CRDTs are probably the
0:17:43 best way to do it right now, at least the most off-the-shelf way to do it.
0:17:48 You can do your own way, but it's sort of a research problem.
0:17:51 Where, on the other hand, I think if you have a very simple data
0:17:54 model, but you want to do atomic transactions, you want to have kind
0:17:57 of an event-sourcing type approach,
0:17:59 you want to be able to do things like trees with reparenting,
0:18:04 and some of that ends up being that you're working against the CRDT.
0:18:08 And in those cases, I think it makes more sense to implement
0:18:12 your own business logic.
0:18:13 The other thing that we see is, maybe you want some change to trigger
0:18:16 some action server-side; you want actions to have some side effect.
0:18:19 Maybe some piece of data changes and you want
0:18:21 to insert that into a queue.
0:18:23 So it becomes really nice to have some server-side code that
0:18:26 reacts to changes to the document.
0:18:28 That's another place where we find
0:18:30 building your own tends to be really nice, because you can just
0:18:34 have that be one server that's responsible both for the sync and
0:18:38 for triggering some side effect.
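The pattern Paul describes here, where the sync server itself reacts to specific changes with a side effect such as enqueueing a job, might look roughly like this. The event shape, the hook mechanism, and the queue are all hypothetical stand-ins for illustration.

```typescript
// Sketch of side-effect hooks inside a sync server: because this server sees
// every change, it can watch for particular ones and trigger an action, in the
// same place where sync messages are handled.
type ChangeEvent = { type: string; payload: Record<string, string> };

class SyncServerWithHooks {
  private hooks: Array<(e: ChangeEvent) => void> = [];

  onChange(hook: (e: ChangeEvent) => void): void {
    this.hooks.push(hook);
  }

  // Called for every incoming change.
  handleChange(event: ChangeEvent): void {
    // ...apply the change to the in-memory document here...
    for (const hook of this.hooks) hook(event);
  }
}

// Example: enqueue a welcome email when a user-signup change comes through.
const emailQueue: string[] = [];
const hookedServer = new SyncServerWithHooks();
hookedServer.onChange((e) => {
  if (e.type === "user-signup") emailQueue.push(e.payload.email);
});
```

Because the hook runs in the same process that owns the document, there is no second system to keep in sync, which is the architectural-simplicity argument made above.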
0:18:40 Right, so maybe to linger a little bit on that specific point. I think
0:18:45 with local-first software you have, in this scenario where you build your
0:18:50 own sync engine, kind of two approaches for how to deal with that,
0:18:54 and also for the off-the-shelf approach, if you use something like Yjs.
0:18:58 So if you build your own, you can basically just, wherever you handle the
0:19:02 messages, inspect the messages and see, okay, this
0:19:06 seems to be like a user-signup event,
0:19:09 and so here, let's send out that confirmation email or something like that.
0:19:15 But another approach could also be that you basically have a server-side
0:19:19 client instance that listens to the same sync messages, and then,
0:19:25 based on the state that you have on that server-side client, you
0:19:30 could basically react to that.
0:19:33 Do you have thoughts on one approach versus the other?
0:19:37 Maybe one is a lot more expensive to run, or more complex to model.
0:19:44 What thoughts do you have on the different approaches here?
0:19:47 I think where I've tended to see this break down, because we've seen
0:19:50 customers do it both ways, is that if it's
0:19:56 purely just sort of reacting with a side effect,
0:19:58 and it's something where your model of it is that it's
0:20:02 like a server-triggered type of thing,
0:20:05 like that send-email example, send
0:20:07 some sort of notification,
0:20:09 I think that makes more sense to just do in the server, just in
0:20:13 terms of architectural complexity.
0:20:15 You could certainly listen for the events,
0:20:16 and if there's architectural reasons that that makes sense for you, I don't see
0:20:20 any problems with it. But where I think the server being a client can make a
0:20:26 lot of sense is like AI-integration type things, where, you know, it's code
0:20:32 running on the server, but your application
0:20:36 should just treat it like another client.
0:20:38 This is something like maybe an agent's going out and modifying
0:20:40 a document based on some prompt.
0:20:42 Then I think it does make sense, if you want to run it through the same
0:20:46 kind of code paths that a user edit would go through, to
0:20:51 kind of treat that as a distinct client of the data.
0:20:55 Got it.
0:20:55 So to dig a little bit more towards that server-as-a-client: when I'm
0:21:03 thinking about like a browser client, or like using my phone,
0:21:08 there's like a concrete point in time where I'm starting a session.
0:21:14 I'm opening a tab.
0:21:15 I'm opening an app.
0:21:16 I'm doing things afterwards.
0:21:18 Like, I'm closing it.
0:21:19 So there's like a concrete start and stop.
0:21:22 Maybe there's like some background stuff, but let's pretend there's just
0:21:26 like a clear start, stop, 30 seconds,
0:21:29 and that's it.
0:21:30 How should I think about that in a server context?
0:21:34 Let's say I'm trying to offer that to a thousand customers.
0:21:38 Would I have a thousand separate... like, but let's go crazy.
0:21:42 Let's say we have a thousand VMs, one per customer.
0:21:47 That strikes me as very expensive.
0:21:49 So what is a useful programming model, a useful deployment model,
0:21:54 to deploy those sorts of server-side clients?
0:21:57 So the way that Jamsocket does this is that we run a process
0:22:01 for every session, essentially.
0:22:04 So when you and I are connected to a document, we're running a
0:22:07 process, not a full-fledged VM, but it's using some syscall interception
0:22:11 through something called gVisor.
0:22:12 So it's a little bit more secure than sort of just
0:22:14 Yeah.
0:22:15 containerized workloads.
0:22:16 So the nice thing about that is that processes are pretty good at giving
0:22:20 resources back to the system when they're not actively in use.
0:22:24 So what we've seen is that when you want the server to handle a first-class
0:22:29 kind of interaction, where you definitely want it to
0:22:32 be processed by the service,
0:22:34 in those cases, it makes sense to run directly in the
0:22:36 sync engine. When it comes to
0:22:38 multiple clients, we tend to see those run off of Jamsocket.
0:22:42 So these are running on an end-user server talking to Jamsocket, and the
0:22:47 pattern that I've seen there work is that the client will maybe trigger something
0:22:54 directly through like a web endpoint on that remote server that's not running
0:22:58 on Jamsocket; that server will then talk to Jamsocket to, say, fetch some data or
0:23:03 connect and sort of trigger something.
0:23:05 So it might synchronize data, but it's not a long-lived client.
0:23:08 It's kind of a client that's spun up based on a specific action
0:23:12 that's usually triggered by the client.
0:23:14 Got it.
0:23:14 That makes a lot of sense.
0:23:15 So instead of it being super long-running, and that times
0:23:19 N for each possible instance, you make it more event-based.
0:23:25 So let's say there is a new sync message that you want to react
0:23:29 to, or there's some other event,
0:23:31 maybe like a webhook that's coming in from Stripe, and then you do your
0:23:37 thing as a response to the event, and then you yield again, back to the runtime.
0:23:44 And I think a model that also comes to mind that could fit
0:23:47 really well together here
0:23:49 is durable, long-running workflows, something like Temporal.
0:23:54 And there's also other options as well, I think, that could work really well
0:23:57 together here, where you have a workflow
0:24:00 that's essentially a participant in a sync system, where it's
0:24:05 just a long-running workflow.
0:24:07 It's just like another client that happens to live on a server and not in a browser.
0:24:12 Yeah, I'm really excited to see more folks explore this, since I think it will
0:24:17 open the door for a whole bunch of different application topologies, really.
0:24:22 One of the things that we found with Y-Sweet is that we had people ask
0:24:26 for, like, I want a Python client to this.
0:24:28 And it was for exactly that reason.
0:24:30 Like, they want to run some server-side code that interacts with a document.
0:24:34 Same on the Node side:
0:24:37 we support kind of the built-in WebSocket client in the browser, but
0:24:40 we also support a shimmed-in WebSocket client so that you can run it in Node.
0:24:45 Very cool.
0:24:45 Yeah, I'm really looking forward to that, whether it's Python or, well,
0:24:50 I'm a native person in JavaScript, and JavaScript has this amazing aspect to
0:24:55 it that it supposedly runs everywhere, and we're getting more and more there with
0:25:00 ESM now being really the default.
0:25:03 And I'm really excited about bringing the same business logic, the same code,
0:25:09 to all sorts of different platforms.
0:25:11 And I think sync engines are
0:25:14 like a huge lever that gets us closer towards that, since otherwise
0:25:19 we can have the code there, but if we don't have the data there, that
0:25:24 is only good for so many use cases.
0:25:27 So maybe,
0:25:28 transitioning towards Y-Sweet, which you've already mentioned:
0:25:32 before we get into what Y-Sweet is, can you share more about how it came to be?
0:25:42 Y-Sweet
0:25:42 Yeah.
0:25:42 So we'd already been working on Jamsocket for a while by the time we started Y-Sweet,
0:25:46 and we sort of started to see... for one thing, you know, we thought from the
0:25:50 get-go that, well, people are going to want to write their own sync engines.
0:25:53 One of the things we saw was that a lot of people were sort of using Yjs
0:25:57 and other CRDTs and running those on Jamsocket, and even
0:26:03 though they don't need the authoritative kind of model of Jamsocket, they were
0:26:08 still finding advantages to having that.
0:26:10 So we started thinking, like, what would a Yjs server kind of built
0:26:14 to run on Jamsocket look like?
0:26:16 And one of the things that's
0:26:17 nice, if we're, you know, running a lot of processes, is if
0:26:20 it's really memory-lightweight.
0:26:22 So we wrote Y-Sweet in Rust, and it's pretty memory-efficient.
0:26:26 Another thing that we became really opinionated about is that you shouldn't
0:26:30 really store document data in a database.
0:26:32 I think it's just a bad fit.
0:26:33 I think with something like, you know, if you're building something
0:26:37 like Figma. Like, Figma uses S3
0:26:40 as where they store the document; they store the document metadata in Postgres.
0:26:44 And we started to see a lot of use cases of patterns like that, because if
0:26:49 you're writing each document that's open many times a minute, if
0:26:54 you're using a Postgres database, that Postgres database is the bottleneck.
0:26:57 Every edit is coming through that.
0:26:59 Whereas S3 is a more distributed kind of file system, where if you
0:27:03 have a server that is the authority on what's in a document at a
0:27:07 given point in time, it can just write to S3, and you can horizontally
0:27:11 scale that out as much as you want.
0:27:13 So we kind of became opinionated about: okay, it should be Rust,
0:27:16 it should be lightweight,
0:27:17 it should write to S3,
0:27:18 and it should be
0:27:20 as simple as possible to just use. Like, I really like software like Caddy,
0:27:25 which is a web server written in Go, if people aren't familiar with it,
0:27:28 that has really sane defaults.
0:27:31 It's somewhat opinionated about just doing things right.
0:27:34 You don't have to...
0:27:35 Fantastic.
0:27:36 It even gives you, like, SSL certificates that work locally, works with Tailscale.
0:27:41 Fantastic.
0:27:42 Definitely check it out
0:27:43 if you're not using it yet.
0:27:45 Yeah, so Caddy just simplifies so much and just does things right.
0:27:50 And so we wanted to build a piece of software that felt like
0:27:52 that to use; we wanted something that you could use in a
0:27:56 CI/CD process, and it would be the same API as if you were using it at scale,
0:28:01 horizontally scaled out on a cluster.
0:28:03 Because the other thing, I mean, the things that we were thinking
0:28:06 about at the time were like, what would an open-source document sync engine
0:28:11 look like if we were to write it from scratch? And we kind of kept landing on,
0:28:17 it would look something like, you know, pretty close to Yjs, even if we didn't
0:28:22 have the distributed constraints of Yjs.
0:28:25 So we're like, well, Yjs exists.
0:28:28 It has a great community,
0:28:30 great people involved with it.
0:28:32 This looks like what we would want to build.
0:28:34 So let's just build a sync engine around this.
0:28:37 Got it.
0:28:37 In terms of the behavior, or what makes it a little bit more like Caddy,
0:28:43 opinionated, but with very well motivated opinions baked
0:28:48 into it: if you compare it to the default Yjs
0:28:51 server, is there anything that stands out where you lean a little
0:28:56 bit more heavily into some opinions?
0:28:59 Yeah, I mean, I think the default Yjs server is built to be very modular
0:29:03 and suit a bunch of use cases. The Yjs community in general embraces this
0:29:08 idea of providers.
0:29:10 So Yjs itself is just a data structure, and then providers are what synchronize
0:29:15 it to another client, or synchronize it to a database, things like that.
0:29:19 The kind of official way to do things in the
0:29:22 Yjs world is to compose a bunch of providers together.
0:29:26 So you might have an IndexedDB provider on the client, synchronizing to IndexedDB.
0:29:30 You might have a WebSocket provider, synchronizing to other clients.
0:29:34 And then you might have a database provider on the server.
0:29:37 We wanted to just have a single stack that was kind of our opinionated stack.
0:29:41 So we have an IndexedDB implementation on the client.
0:29:45 We have our S3 storage; we've decided the only storage
0:29:49 we'll support is S3-compatible storage, because
0:29:52 ultimately our opinion was object storage is the right way to do storage for this.
0:29:58 And then we have our wire protocol as well,
0:30:00 over WebSocket.
0:30:01 Got it.
0:30:02 That makes sense.
0:30:02 Yeah.
0:30:03 And I haven't managed yet to
0:30:05 have Kevin Jahns here on the podcast, but he happens to also live in Berlin,
0:30:10 and I've just seen him at the last local-first meetup that we've done here.
0:30:14 So I think it's, well, about time that we hear from Kevin about Yjs. There's
0:30:20 been such a rich ecosystem of different things around it, so I
0:30:26 think we gotta make that happen as well.
0:30:28 Yeah, you should.
0:30:29 So I'm actually, I've been procrastinating editing a podcast that I did with Kevin.
0:30:33 so we'll have that soon.
0:30:35 There you go.
0:30:36 we should put it in the show notes.
0:30:38 So, around Yjs you've built a flavor of sync
0:30:45 server that can be hosted on Jamsocket.
0:30:48 So, is my understanding correct that if I want to use Yjs with Y-Sweet, I can just
0:30:55 deploy that off the shelf on Jamsocket?
0:30:59 yeah, so you could deploy that.
0:31:01 We have an off-the-shelf offering that deploys it on Jamsocket.
0:31:04 you can run it on your own servers as well.
0:31:07 And one of the things we decided was that, regardless of how it's
0:31:10 hosted, it should be the same API.
0:31:12 So we have what I call the document management API, where
0:31:15 you create a document, give somebody an access token to that document.
0:31:19 That is just universal, no matter how it's deployed.
0:31:22 Got it, so I think Yjs is one of the most mature options right now for
0:31:28 people who want to build local-first apps, for people who are just, who've
0:31:32 heard it a bunch of times, but maybe haven't yet come around to fully
0:31:37 implementing their app using it,
0:31:39 what are questions that people should ask themselves?
0:31:42 Whether Yjs is a useful foundation for the app and in which scenarios
0:31:47 would you say, actually, you probably want to build your own sync engine.
0:31:51 When to choose Yjs
0:31:51 Yeah, so I think one of the first dimensions to think about here is,
0:31:54 I see it as, there's two worlds.
0:31:56 There's database
0:31:58 sync engines and then there's document sync engines.
0:32:01 So for database sync engine, I think of things like Linear, like things
0:32:04 where you have some relational model data, you probably don't
0:32:08 want the client to have all of it.
0:32:10 You kind of have the client storing some subset of a database
0:32:14 for each account.
0:32:16 Maybe you're sharing this data across multiple people in that account.
0:32:20 That's the database sync world, where there's ElectricSQL and Zero and PowerSync
0:32:25 and kind of a number of players there,
0:32:28 InstantDB and Triplit and a number of others.
0:32:31 On the document sync side, you're sending
0:32:36 all of the data down to the client.
0:32:38 So the unit of data that gets synced
0:32:41 is in-memory size on the browser.
0:32:43 You're not dealing with like a terabyte of data here.
0:32:45 you're not taking a subset of it.
0:32:46 You're synchronizing the entire document.
0:32:48 This would be kind of things like Figma or Google Docs, where
0:32:52 there's a full local copy of
0:32:54 some self-standing piece of data.
0:32:57 And generically in Yjs, that's essentially
0:33:00 JSON-shaped data.
0:33:03 So things like nested maps, nested lists,
0:33:07 text, and then JSON primitives.
0:33:10 Is it fair to say, so you're mentioning Figma and Google Docs, and if I think
0:33:15 about Figma and Google Docs, there is like a distinct boundary of a document.
0:33:20 So I have a Google Docs document open.
0:33:23 I have a Figma document open.
0:33:25 Wherever a product experience, for a given part of
0:33:32 the experience, is all centered around a document, tldraw comes to mind,
0:33:38 is that a great fit for embracing the document model? And anything that
0:33:44 is richer, like a relational database where you can
0:33:49 just freely join between things,
0:33:51 that's where you would choose the other approach? Is that a useful rule of thumb?
0:33:56 Yeah.
0:33:57 I think the words that you used, distinct boundary, really nail it.
0:34:00 If there's kind of like a document that is self contained,
0:34:04 it's distinct.
0:34:05 You mentioned tldraw, and actually, I think this gets
0:34:08 to another point, which is that you can use both in the same application.
0:34:13 So tldraw uses Zero and their own document sync engine.
0:34:17 Figma has built their own sync engine for both.
0:34:21 and they're distinct sync engines.
0:34:23 They can be used in tandem as well,
0:34:25 right I mean, that gets us to a really interesting, topic more generally, which
0:34:29 is combining multiple sync engines.
0:34:32 And I think for people who've been dabbling in local-first, that might be
0:34:37 more intuitive, but I think for, people who are just very new to, the local-first
0:34:42 space, it's hard enough to wrap your head around, choosing the right sync engine.
0:34:47 Now you're telling us, wait, you should choose multiple.
0:34:51 Can you motivate a little bit more of like, how to think about that?
0:34:55 Choosing multiple Sync Engines
0:34:55 So I think of it as the app layer and the document layer.
0:34:57 If you have a document-based application, and you have a
0:35:00 file viewer, for example, I think of that as app layer; you're not in
0:35:03 a specific document at that moment,
0:35:05 like in Figma where I'm on the home screen and I see my various projects.
0:35:09 Yeah, exactly.
0:35:11 and I think there's nothing that forces that part to be real time synced.
0:35:16 In a lot of cases, I think a traditional Postgres database
0:35:20 goes a long way for that.
0:35:21 But then once you're in the document, that's where I think you
0:35:25 do need a sync engine, because it's the type of thing that if you have
0:35:30 two Google Docs open in two different tabs, you expect them to be in sync,
0:35:33 even if you're just a single user.
0:35:35 I think that actually motivates like 98 percent of the value of
0:35:38 local-first is just somebody who has the same document open in two
0:35:41 tabs and they've got 100 tabs open.
0:35:43 I think that that's less of a given expectation these days for like a
0:35:48 project view or something like that.
0:35:50 I think it's a nice surprise when that is in sync.
0:35:52 And I think it is becoming the status quo, but overall it's
0:35:57 less of an expectation; oh, you might have to refresh your Figma
0:36:00 project to see the new assets that come up, that kind of thing.
0:36:06 So yeah, there's been a bit of Twitter debate
0:36:10 about this lately, about whether the same sync engine can handle both.
0:36:14 I think that there are things that you are going to need transactions for, and if you
0:36:18 need transactions, you're going to need a database that is effectively
0:36:21 a single bottleneck on updates.
0:36:23 At the same time, if you have lots of documents, you don't want those documents
0:36:27 to be bottlenecked in a single point.
0:36:29 So I think unless there's a solution that offers both distributed and centralized
0:36:33 with transactions, you kind of need both.
0:36:36 Got it.
0:36:37 So, thinking more about leaning into the document aspect of
0:36:42 it, or even, when you say that something is a bottleneck, let's say we
0:36:47 also embrace the database aspect of it.
0:36:50 Maybe you have different
0:36:52 workspaces, and I think there's still this aspect of drawing
0:36:58 boundaries around some body of data, where you say, hey, within
0:37:03 that boundary, I care about certain constraints, maybe that there shouldn't
0:37:08 be more than 10 documents ever.
0:37:11 Or maybe you want to enforce some constraints around
0:37:16 users, access control, et cetera.
0:37:19 Can you share any sort of learnings or advice about how
0:37:23 to approach this entire topic?
0:37:25 Like, how do you decide what a useful boundary is for how data should be
0:37:31 modeled and fragmented or partitioned?
0:37:39 Boundaries
0:37:40 So I think in general, if it's not obvious what a document
0:37:43 should be in an application, then the document model
0:37:47 is probably not the right fit.
0:37:49 I think things like Figma, where you're in one document at a time.
0:37:53 You might have a different document in another tab, but
0:37:55 you don't have two documents concurrently open in the same tab.
0:37:59 it's taking up the whole screen.
0:38:01 Like, I think that there's certain heuristics like that, that just
0:38:05 tell you, like, this is definitely a document model application.
0:38:09 Same with Google Drive or Google Docs.
0:38:11 you kind of have one thing
0:38:12 open at once.
0:38:13 where would you put Linear?
0:38:15 Since you could, for example, put each Linear issue into its own document.
0:38:21 Why might that be a reasonable approach?
0:38:23 Where might it fit,
0:38:24 and where might it not?
0:38:25 I think I could see
0:38:28 that being reasonable if you really care about the tickets themselves,
0:38:33 you know, multiple people editing a ticket at one time and seeing the text,
0:38:37 if you really wanted to make that kind of a first-class experience.
0:38:41 But in general, I think that, Linear just screams kind of database approach to me.
0:38:47 Although I do know, I believe they're using Yjs for some of the issue
0:38:51 text now. I could be wrong, but I think they do use it, or a CRDT.
0:38:55 It might be a different CRDT, but I think they're using some sort
0:38:58 of collaborative text editing.
0:39:00 so given that you've seen quite a couple of different customers and products
0:39:05 build their own sync engines, any sort of interesting, almost second order effects
0:39:11 that you've seen there, unexpected things, new challenges that you didn't see
0:39:18 in previous applications, things like
0:39:21 database migrations, or other sorts of challenges?
0:39:27 Challenges in building Sync Engines
0:39:27 Yeah, I think whenever you're dealing with data on S3, data migrations do
0:39:32 become really interesting because you're not just sort of writing a
0:39:36 database query and issuing an update.
0:39:39 It's usually some form of gradual, lazy migration.
0:39:43 So it's kind of like the application that's reading the data has to know
0:39:46 how to transition from version one to two and two to three and then
0:39:50 kind of apply those consecutively.
0:39:52 And so that logic tends to linger around in the application for as long
0:39:57 as you have old documents to support.
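That read-time pattern can be sketched in a few lines of JavaScript. This is a toy illustration, not Y-Sweet's actual document format; the version numbers and field names are invented. Each migration step upgrades a document by exactly one version, and the reader applies the steps consecutively until the document is current:

```javascript
// Lazy, read-time migration: each step upgrades a document by one version.
// Field names here are hypothetical, for illustration only.
const migrations = {
  1: (doc) => ({ ...doc, version: 2, title: doc.name, name: undefined }), // v1 -> v2: rename `name` to `title`
  2: (doc) => ({ ...doc, version: 3, tags: doc.tags ?? [] }),             // v2 -> v3: add `tags` with a default
};

// Applied on every read, so old documents upgrade the first time they're opened.
function migrate(doc) {
  let current = doc;
  while (migrations[current.version]) {
    current = migrations[current.version](current);
  }
  return current;
}

const old = { version: 1, name: 'My doc' };
const fresh = migrate(old);
// fresh.version === 3, fresh.title === 'My doc', fresh.tags is []
```

As Paul notes, the cost of this approach is that every step has to stay in the application for as long as documents at that version might still exist in storage.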
0:39:59 and I think there's ways to do schema migrations or schema changes that
0:40:04 don't require a migration as well.
0:40:06 Like, when I was at Google, there
0:40:09 were certain rules about what you could do with protocol buffers
0:40:12 that would ensure that they were always backward and forward compatible.
0:40:16 And so, things like: a required field always has to be required.
0:40:20 And so
0:40:21 being deliberate about when you call a field required.
0:40:25 There are certain things you can do at the schema design level and schema
0:40:28 change level that let you avoid having
0:40:34 to implement any sort of migration.
0:40:36 It can be more access-time oriented.
0:40:39 So I think doing that has been where I've seen
0:40:43 people be successful with that. In terms of second order effects, I
0:40:46 think it kind of goes back to, once you have the sync server, people are
0:40:50 like, oh, this is now a place where I can trigger this notification or I
0:40:54 can do this check. So I think we've seen these
0:41:00 backends kind of grow in scope.
0:41:02 We want that to be a first-class part of the application that
0:41:04 can do whatever you want it to do.
0:41:07 That makes a lot of sense.
0:41:08 And yeah, I think this is, an area that, has already caused a lot of, headaches,
0:41:14 schema migrations, data migrations in general, but now that we are rethinking
0:41:18 the data architectures at large here, we also need to rethink that part and
0:41:24 like you've mentioned, when you have all the data in a single Postgres database,
0:41:29 then you can at least like apply like your old playbooks there, but now if all
0:41:33 of your data is in an S3 bucket, laid out in whatever way, now you do need a
0:41:39 different new approach to deal with that.
0:41:41 And that is one way to deal with it, to bake the migration
0:41:46 logic into your app logic.
0:41:48 But that also comes with its own downsides.
0:41:52 This way you litter some of that code that was once very clear,
0:41:58 and now you make it less clear because you need to account for
0:42:02 that historical evolution. A project that I want to shout out here is
0:42:07 Cambria, by the folks at Ink & Switch. I've actually studied this project myself quite
0:42:12 intensively and I've rebuilt it myself a few times, once even on a type level
0:42:17 just to provide a nice type-safe API.
0:42:21 Given that the original implementation rather lets you specify those
0:42:25 sort of projection rules in YAML.
0:42:28 But, I've heard some rumors that they're thinking of like rebooting
0:42:31 that project at some point.
0:42:32 So fingers crossed for that.
0:42:34 And yeah, another approach that I'm investigating heavily myself,
0:42:39 given I have my fair share of
0:42:42 database migration traumas that I tried to remedy with starting Prisma,
0:42:48 but now I'm trying a different approach with event sourcing.
0:42:52 Where you basically split up your database
0:42:58 into a dedicated write model and derive the read model from it.
0:43:02 The core insight here is basically that if you split this up into two
0:43:07 parts, the schema for your read model is typically the thing
0:43:11 that changes orders of magnitude more often, where you have different kinds of
0:43:16 queries that you want to do, different sorts of aggregations, and where you want
0:43:21 to maybe change the database layout to make certain queries faster and more
0:43:25 efficient. And then the write operations,
0:43:29 those are much more bound to the domain of when stuff actually happens.
0:43:34 And that changes way less over time.
0:43:37 Like, maybe you want to capture someone's preference on
0:43:42 marketing emails on signup, but historically you can much more easily say,
0:43:47 actually, we default to no.
0:43:49 But a user signup event
0:43:52 is always valid and way easier to upgrade.
0:43:56 And then you can basically reapply all prior events into the new read
0:44:00 model that you can change very easily.
0:44:03 And you can even have like multiple read models all at once.
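A toy sketch of that split (invented event names, not Livestore's actual API): the write model is an append-only event log, and each read model is just a fold over it, so adding a new read schema means re-running the fold over the same history rather than migrating stored rows:

```javascript
// Toy event-sourcing sketch: an append-only write model with derived read models.
// Event and field names here are invented for illustration.
const events = [
  { type: 'UserSignedUp', id: 'u1', email: 'a@example.com' },
  { type: 'UserSignedUp', id: 'u2', email: 'b@example.com' },
  { type: 'MarketingOptIn', id: 'u1' },
];

// Read model v1: just the list of user ids.
function userIds(log) {
  return log.filter((e) => e.type === 'UserSignedUp').map((e) => e.id);
}

// Read model v2, added later: marketing preference, defaulting to "no" on signup.
// Nothing stored is migrated; the same historical events are simply re-folded.
function marketingPrefs(log) {
  const prefs = {};
  for (const e of log) {
    if (e.type === 'UserSignedUp') prefs[e.id] = false; // historical default: no
    if (e.type === 'MarketingOptIn') prefs[e.id] = true;
    if (e.type === 'MarketingOptOut') prefs[e.id] = false;
  }
  return prefs;
}
// userIds(events) is ['u1', 'u2']; marketingPrefs(events) is { u1: true, u2: false }.
```

Both read models can coexist, and either can be rebuilt at any time from the log, which is the property Johannes describes.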
0:44:06 So that is what I'm exploring right now under the umbrella of Livestore.
0:44:12 But that also requires the rigor to split it
0:44:16 up into a read and write model.
0:44:18 But yeah, curious whether you have thoughts on that.
0:44:20 Yeah, that's really interesting.
0:44:21 I think that event sourcing in general does sort of simplify migrations.
0:44:27 If you're willing to kind of go back over the event source log and regenerate,
0:44:30 because then as long as you represented all of the data that matters, then you
0:44:35 can essentially just add fields
0:44:38 as needed.
0:44:39 Another problem that emerges in that world is if your domain produces
0:44:44 a lot of events. So let's say you build a tldraw, and whenever you move
0:44:50 a rectangle, you could model it in a way that when you let go
0:44:55 of the rectangle, that creates an event. But you could model it in a way
0:44:59 where, whenever the browser registers a new move event, dragging
0:45:03 can cause 5,000 events, and that can lead to a very long history of events.
0:45:10 So now you gotta keep that in mind as well.
0:45:13 Whereas in the traditional mixed read-and-write-model approach, you
0:45:18 would basically just overwrite the position, and it would not necessarily
0:45:24 cause the database to explode
0:45:26 because you have too much data.
0:45:27 But yeah, it's all about trade-offs; that is what
0:45:30 data management is all about.
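One common mitigation for the flood of drag events, sketched here as an assumption rather than anything prescribed in the conversation, is to coalesce high-frequency intermediate events before they reach the persistent log, keeping only the last move per object:

```javascript
// Coalesce high-frequency intermediate events (e.g. pointer moves while dragging)
// so only the latest position per object reaches the persistent event log.
// Event shapes here are hypothetical.
function coalesceMoves(events) {
  const lastMove = new Map(); // objectId -> index of that object's move in `result`
  const result = [];
  for (const e of events) {
    if (e.type === 'Moved') {
      if (lastMove.has(e.objectId)) {
        // Overwrite the earlier move for this object in place.
        result[lastMove.get(e.objectId)] = e;
      } else {
        lastMove.set(e.objectId, result.length);
        result.push(e);
      }
    } else {
      result.push(e);
    }
  }
  return result;
}

const log = [
  { type: 'Moved', objectId: 'rect1', x: 1, y: 1 },
  { type: 'Moved', objectId: 'rect1', x: 2, y: 2 },
  { type: 'Created', objectId: 'rect2' },
  { type: 'Moved', objectId: 'rect1', x: 3, y: 3 },
];
// coalesceMoves(log) keeps the Created event and a single final move for rect1.
```

Note that overwriting in place keeps the first move's slot in the log, which is usually fine for absolute positions but matters if the relative order of events is semantically significant.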
0:45:32 maybe a slightly different aspect about data that, you've also written about,
0:45:37 which is in regards to encrypting data.
0:45:41 So, you've written a great blog post about that.
0:45:48 Data encryption
0:45:49 Yeah, so this came out of our work with Y-Sweet.
0:45:52 We wanted to store the data locally in the
0:45:56 client, at least as an option.
0:45:58 So we looked at the options that were available: localStorage,
0:46:01 IndexedDB, OPFS (origin private file system), and realized that
0:46:06 IndexedDB was really the right way to go for this right now.
0:46:10 I have high hopes for OPFS, but they
0:46:14 all kind of have flaws.
0:46:16 IndexedDB is the one where people know the flaws the best, I guess,
0:46:20 and how to work around them.
0:46:21 So we looked at IndexedDB.
0:46:23 But the problem that we found with all of them is that all of them store
0:46:27 the data in plain text, and that's not just a theoretical problem.
0:46:32 At least a couple of months ago
0:46:34 now, there were some npm and PyPI modules out there that
0:46:39 would read application data from these plain text sources.
0:46:43 It's kind of a real problem that people have identified,
0:46:46 and it has been exploited.
0:46:48 So we wanted to make sure that we provided an option that,
0:46:51 as best as possible, would prevent that.
0:46:54 so we said, okay, well, browsers have web crypto.
0:46:57 We can encrypt all this.
0:46:58 but then there's this problem of where do you store the key?
0:47:01 because you could store it on the server, but that kind of defeats
0:47:04 the purpose, if you're offline, of then accessing that data.
0:47:07 So we realized that browsers
0:47:10 don't really have a good way to store a key-type credential.
0:47:16 We've got WebAuthn, which is where
0:47:21 you have passkeys and things like that.
0:47:23 It's a bit more opinionated.
0:47:24 It uses the operating system's keychain, but it doesn't really expose that to
0:47:29 you as any sort of low-level API that you can store your own secrets in.
0:47:34 What has started happening is that some browsers, particularly Chromium
0:47:37 based browsers, Google Chrome, Edge, Brave, have built in something called
0:47:44 App-Bound encryption. They're just using this for cookies, but the idea
0:47:48 is that the browser will store cookies on disk as they always
0:47:54 have, but they'll be encrypted on disk, and then the symmetric key to that will
0:47:59 be stored in the operating system's keychain. And the operating system is set
0:48:05 up to, at least in theory, and there's been some vulnerabilities here, too,
0:48:09 but at least in theory, only give that key back to
0:48:14 the browser process itself, not to another process that attempts to
0:48:18 impersonate the browser process.
0:48:20 So what we landed on, which was pretty surprising to me, was kind
0:48:24 of the best available path right now:
0:48:27 if you enable local storage, we encrypt the data stored in IndexedDB and
0:48:33 then store the key in a cookie, and
0:48:35 kind of piggyback on that being App-Bound encrypted, at least
0:48:39 in browsers that support it.
0:48:41 That is very interesting.
0:48:42 Yeah, I've been studying cryptography, particularly in a browser context,
0:48:46 a bit more for various reasons.
0:48:49 I am trying to see what it would take to do the entire sync,
0:48:55 the messages for Livestore, for them to be end-to-end encrypted.
0:49:01 But the hard part is not the encryption; the hard part is the end-to-end part,
0:49:07 where the various ends own their keys.
0:49:11 We should do an entire episode just about
0:49:14 what's difficult about it, but it can all be distilled down to this:
0:49:18 the hard part about anything cryptography related is key management.
0:49:23 You can err on the side of being a little bit more loose with
0:49:27 how you manage keys, but that defies a lot of the purposes and the benefits here.
0:49:32 But then the browser also makes that really tricky, because it has very
0:49:38 constrained APIs, and historically it's always been rather a web document viewer
0:49:43 than a fully fledged application platform. We're getting the building blocks.
0:49:49 I mean, you can use the Web Crypto API.
0:49:52 I'm also using the libsodium project, compiled to WASM, which is very
0:49:57 powerful and gives you a couple
0:50:00 of advanced algorithms, et cetera, that you can use for symmetric or
0:50:05 asymmetric encryption, signing, et cetera.
0:50:08 And passkeys, I think, are also a super important foundation.
0:50:13 But they only get you so far.
0:50:16 And I think they don't really help you with the encryption as such,
0:50:20 but rather with signing messages.
0:50:23 So I think we're still lacking a few building blocks.
0:50:25 So very excited to hear about this; what was it called again? App-Bound.
0:50:31 App-Bound encryption. So ideally at some point this goes even beyond cookies, so
0:50:37 it can be applied to other storage mechanisms. But I like the approach
0:50:41 of basically encrypting it; then you reduce it to the key management problem,
0:50:46 and that key you put into a cookie. Which raises another question:
0:50:52 what happens if that cookie goes away?
0:50:55 Did you figure out an answer for that?
0:50:58 We don't.
0:50:58 We just set it to a long expiration. But the thinking there was,
0:51:02 if the user is clearing their cookies on that site, they
0:51:08 probably want to destroy the data.
0:51:10 And, you know, they want to be logged out.
0:51:13 So we actually saw it as the right thing to do, to bind it.
0:51:17 The other nice thing about that is, unlike IndexedDB, cookies can
0:51:21 actually have an expiration date.
0:51:22 So we could set an expiration of a week.
0:51:25 We're still relying on the browser to enforce that. But if the browser enforces
0:51:28 that, and then two weeks later that person is fully hacked, including
0:51:33 their operating system keychain, the browser, at least in theory, will have
0:51:36 deleted that key, and then the data that's in IndexedDB will be gone.
0:51:40 So that's actually, funny enough, additional functionality
0:51:43 that was just incidental to using cookies for that.
0:51:46 Right.
0:51:46 I like this trick a lot and I got to look into it.
0:51:49 One thing to point out still is, you've mentioned that this mechanism is only
0:51:54 available in Chromium browsers anyway, but, cookies and IndexedDB, OPFS, et
0:52:00 cetera, all of that is available in other browsers and namely Safari as well.
0:52:05 One thing that, people find out the hard way about Safari is that it automatically
0:52:11 deletes a user's data after seven days if they haven't visited that website.
0:52:16 So if you're building a fully local-first web experience where
0:52:21 someone creates some precious data in Safari and maybe doesn't sync it
0:52:26 yet to somewhere else, goes on vacation, comes back, and poof, the data is gone.
0:52:32 So I think as app builders, we need to be aware of that and
0:52:36 detect, Hey, is this Safari?
0:52:38 And in Safari, make this part of the product experience: show
0:52:42 a message like, hey, be careful.
0:52:44 Your data might go away.
0:52:46 There are ways to remedy that.
0:52:48 For example, if you make the app a progressive web
0:52:53 app by adding it to the home screen,
0:52:56 that limitation goes away.
0:52:58 But app builders need to be aware so that they can make the app's users aware.
0:53:04 It's just something that I think is important to note.
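A rough way to surface that warning in the product. The user-agent check below is a heuristic of my own, not a standard API, and where available an app would also want to request `navigator.storage.persist()`:

```javascript
// Heuristic Safari detection from the user-agent string, so the app can warn
// that unsynced local data may be evicted after seven days without a visit.
// UA sniffing is inherently fragile; treat this as a best-effort hint.
function isLikelySafari(userAgent) {
  return /Safari\//.test(userAgent) && !/Chrom(e|ium)\//.test(userAgent) && !/Android/.test(userAgent);
}

function storageWarning(userAgent) {
  return isLikelySafari(userAgent)
    ? 'Heads up: Safari may delete local data after 7 days without a visit. Sync your data or install this app to the home screen.'
    : null;
}

const safariUA =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15';
const chromeUA =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36';
// storageWarning(safariUA) returns the warning; storageWarning(chromeUA) returns null.
```

As mentioned above, installing as a PWA via add-to-home-screen lifts the seven-day cap, so the warning can point users there.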
0:53:08 Yeah, I think that's an example of a number of cases where the
0:53:11 browsers are just not optimized for local-first apps, unfortunately.
0:53:15 You know, I think low-level access to the operating
0:53:21 system's keychain is another.
0:53:23 Browsers have improved a ton in terms of what APIs they expose, but I
0:53:28 think they're still lagging when it comes to storage and encrypted storage.
0:53:32 Yeah, totally.
0:53:34 So, maybe slightly moving to another browser-related topic:
0:53:39 you've been, both through your prior role
0:53:43 and also as part of Jamsocket, dealing with quite a bit of WebAssembly.
0:53:51 WebAssembly
0:53:51 I guess I really wanted,
0:53:53 I wanted to build the company around WebAssembly.
0:53:55 I wanted WebAssembly to take off, particularly server side and
0:53:59 client side, the idea that having isomorphic client-side and server-side
0:54:03 code would be a big thing.
0:54:05 And I've, I guess, just generally soured on WebAssembly a little bit.
0:54:10 I think that where I've seen it work really well is when it's
0:54:14 in the application layer.
0:54:18 There's a couple of examples I like to go to that are effectively the same model,
0:54:21 the same kind of architecture: Figma,
0:54:24 and a company called Modyfi, a few others that I'm blanking on. But the
0:54:28 architecture is essentially a JavaScript UI with a WebAssembly, WebGL or WebGPU
0:54:36 kind of rendered canvas behind it. So with Figma, you know, the core engine
0:54:41 is, I believe, in C++, talking to WebGL; with Modyfi, it's in Rust and WebGPU.
0:54:47 But it's literally, the application is layered that way: on screen,
0:54:52 there is the canvas behind the UI.
0:54:55 They're written in two different languages and they just talk to each other.
0:54:58 so I think that is the most promising architecture that I see for WebAssembly,
0:55:02 Where I think it's been harder
0:55:05 to get right is building something like a library that is ultimately consumed
0:55:09 by JavaScript developers but written in WebAssembly. I think there's just so much
0:55:15 friction still in the bundling that I've kind of soured on that as an approach.
0:55:19 Right.
0:55:20 I mean, I agree in that regard; I wish we'd be further
0:55:25 along with WebAssembly, but I think it's a bit of a chicken-and-egg problem, that
0:55:30 we need more inspiring applications.
0:55:33 That make people feel like, wow, that is possible,
0:55:36 I didn't recognize that this was the web,
0:55:39 it feels so fast.
0:55:40 And I think it is more true than ever
0:55:45 that WebAssembly
0:55:46 can unlock whole new experiences.
0:55:49 And there are a few lighthouse examples like Figma that stand out here.
0:55:53 Also a big shout out to the folks building Makepad, which is
0:55:58 a super ambitious project.
0:56:03 I probably won't do it justice by pitching it, but
0:56:07 I just want to speak to the ambition, where it's basically like Unreal Engine
0:56:12 in that it's a full engine.
0:56:14 They're building their own platform, including a rendering layer.
0:56:20 A few people think that Makepad is an editor.
0:56:26 No, Makepad has just, as an example app,
0:56:30 built an editor in which they build Makepad, which is just so phenomenal.
0:56:34 And Makepad is just such an incredibly fast app.
0:56:38 So you should definitely check it out: go to makepad.dev
0:56:41 and then press the option key to see how the entire code editor expands.
0:56:47 So apps like that get me very excited about what's possible with
0:56:51 WASM, but they're building everything in Rust.
0:56:54 They're fully leaning into everything there.
0:56:57 And I think the either-or, where you want to combine things one step at a time,
0:57:02 I think that's a tooling problem. Partially it's also a trade-off
0:57:07 problem, where if you move a lot of data back and forth between WASM and
0:57:12 JavaScript, that doesn't come for free.
0:57:14 So I think, you got to keep that in mind.
0:57:17 I think the Replicache folks actually in the past have
0:57:22 written a lot of their stuff in Rust and then moved to JavaScript because of that
0:57:27 boundary crossing being too expensive.
0:57:30 Not every use case suffers from that problem, but I want to turn
0:57:35 it around and invite anyone who is excited about WebAssembly to see
0:57:42 that as an opportunity to make things significantly better, like working
0:57:46 on projects like wasm-bindgen or other things. I think the Deno folks are
0:57:52 pushing heavily on that. So I'm seeing this glass half full and I think the
0:57:56 glass is going to get full pretty soon.
0:57:58 Yeah, to your point about the JavaScript/WebAssembly boundary
0:58:03 crossing, I think it comes down to just placing that boundary in the right
0:58:07 place when it comes to applications, like the Figma model of a JavaScript
0:58:12 front end with the renderer in WebAssembly. Makepad is a great example,
0:58:17 I think, of going all the way in on WebAssembly.
0:58:21 Another one's called Remix.
0:58:22 And I think what's notable about both cases is that to do that
0:58:26 well, they've had to basically live in the GUI toolkit layer.
0:58:30 Like, they've been writing their own code or adapting a
0:58:33 lot of their own code for it.
0:58:34 So I think that's not for the faint of heart.
0:58:36 I think that people who have done it have built amazing software, but what comes
0:58:41 up more often when I talk to people is that there's a scarcity of Rust
0:58:45 developers, and they want to focus the Rust developers on the
0:58:50 engine component and then be able to hire React developers, Svelte developers,
0:58:56 and front-end web developers to work on the GUI, where it may not matter. Like, you
0:59:03 know, think about Figma's UI components.
0:59:04 They're not super performance-sensitive in the way that the canvas is.
0:59:08 Yeah, totally.
0:59:09 I think it just takes some bold thinkers, and this is not something where you're
0:59:14 gonna rebuild the world in two weeks. This is really something you've got to
0:59:19 put in the five, maybe ten years
0:59:22 to really build something phenomenal. But I think the rewards are massive, and
0:59:28 I'm really looking forward to getting alternatives to something like
0:59:33 React that provide different trade-offs and that allow you to build really,
0:59:37 really high-performance applications. Fundamentally, React biases towards
0:59:41 simplicity; it biases towards preventing less experienced
0:59:47 engineers from hurting themselves or others and dragging the application down.
0:59:53 But I think there's a different trade-off space as well, where you bias more
0:59:57 towards performance and you need to know a little bit more about what you're doing.
1:00:01 And particularly now with AI on the horizon, I think we can rethink
1:00:06 a lot of trade-offs significantly, where engineering team sizes maybe
1:00:11 get reduced as well, but that's a topic for another conversation.
1:00:16 But, relatedly, in regards to AI, you've recently also launched a new
1:00:21 project that is certainly adjacent to AI.
1:00:25 It's called ForeverVM.
1:00:29 ForeverVM
1:00:29 Yeah.
1:00:30 So pretty much from the beginning with Jamsocket, one of the ways we've seen
1:00:32 people use it, because we run these sandboxed processes on demand, is people
1:00:37 running LLM-generated code in them.
1:00:39 Actually,
1:00:40 going back to the beginning, it wasn't even LLM-generated.
1:00:42 This was sort of pre-ChatGPT.
1:00:44 It was things like Jupyter notebooks, but over time we see
1:00:47 more and more LLM-generated code.
1:00:50 And it's good.
1:00:51 I think we're competitive with other products for that, but we
1:00:54 kind of realized, first of all, we're not really positioning the product that way,
1:00:58 but also
1:00:59 we're not necessarily building the product to be the best
1:01:02 for that from first principles.
1:01:04 Like, if we just said: I want an LLM to be able to execute code,
1:01:08 what would that look like from first principles?
1:01:10 And we kind of thought, well, we don't really care about the session.
1:01:13 We want it, from the LLM's point of
1:01:17 view, to feel like it can always run code.
1:01:19 It doesn't have to start a sandbox and stop the sandbox when it's done to cut
1:01:22 down on costs, and things like that.
1:01:24 We kind of cut out the rest of it, made that the abstraction, and
1:01:29 built it, frankly, into something that we can position for those products, so
1:01:32 that we're not confusing people who are like, "I thought you did sync engines;
1:01:36 now you're telling me you run AI code." Architecturally they
1:01:40 can actually be fairly similar, but we wanted to build a product around that.
1:01:43 So we have ForeverVM. The
1:01:46 way to think about it:
1:01:47 it's an API that runs Python code in an unbounded session.
1:01:52 By that, I mean, if you make an API call and get a machine
1:01:56 ID, maybe abc123, you can run instructions on that machine: set a
1:02:02 equals three, or something like that.
1:02:03 Two years from now, if you kept that machine around, you can query
1:02:07 the value of a, you know, a plus five, and then you get back a value.
1:02:12 And the way we're doing that behind the scenes is using memory snapshotting
1:02:16 of the underlying Python process.
1:02:19 So we architected the whole system from the ground up
1:02:23 around this, and it's kind of neat.
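Paul's description of the ForeverVM session model can be sketched with the Python standard library's `code.InteractiveInterpreter`: each "machine" is a session whose state persists across separate calls. Note this is only an illustration of the semantics he describes; the `Machine` class and its methods are hypothetical, and the real service keeps sessions alive via memory snapshots of the Python process, not an in-process namespace.

```python
import code
import contextlib
import io


class Machine:
    """Toy stand-in for a ForeverVM 'machine' (hypothetical API):
    a Python session whose state persists across run() calls.
    The real service persists state with memory snapshots of the
    underlying Python process; here it is just a live interpreter."""

    def __init__(self, machine_id: str):
        self.machine_id = machine_id
        self._interp = code.InteractiveInterpreter(locals={})

    def run(self, source: str) -> str:
        """Execute one instruction in the session and return whatever
        it printed (expression results go through sys.displayhook,
        which also writes to stdout)."""
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            self._interp.runsource(source)
        return buf.getvalue().strip()


# State set in one call is visible in later calls, mirroring
# "set a equals three ... query a plus five" from the transcript.
m = Machine("abc123")
m.run("a = 3")
print(m.run("a + 5"))  # prints "8"
```

The property being illustrated is the unbounded session: nothing about the API asks the caller to start or stop a sandbox, the machine ID alone is enough to pick up where the session left off.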
1:02:24 fascinating.
1:02:25 Yeah.
1:02:26 My mind is also going to other technologies. I mentioned Temporal
1:02:30 before, but there's also a really fascinating project called Golem,
1:02:35 which I think is also employing some really interesting tricks, using WASM
1:02:41 and knowledge about the WASM memory to make checkpoints where you can
1:02:47 restore and resume computation, or retry.
1:02:51 And yeah, I love that we get some bolder ideas out there,
1:02:56 particularly now when the cost of writing code has come
1:03:02 down so much, and now code is also written by people who know even
1:03:07 less about whether it's good or not.
1:03:09 So we need to put it into boxes that are somewhat blast-safe,
1:03:15 but also durable in a way that doesn't break the bank.
1:03:19 And I love how that is an entirely different product, yet leverages all
1:03:24 the benefits and all the foundations that you've built with Jamsocket, or,
1:03:29 I guess, with Plane for that matter.
1:03:31 That is very, very cool.
1:03:33 Yeah, thanks.
1:03:34 One of the things that's been really cool to see is that if we give an LLM
1:03:39 the ability to write this code and get responses back very quickly, kind
1:03:42 of just treating it as a local REPL, the AIs can do more. They get
1:03:48 that fast feedback loop, and they can make mistakes and correct them,
1:03:52 in some cases we've observed, faster than a reasoning
1:03:56 model could just generate the right code in the first place.
1:04:00 Outro
1:04:00 Nice.
1:04:02 Any other things that you would like to share with the audience?
1:04:06 If you want to find me online, I'm paulgb on Twitter or X, and paulbutler.org
1:04:11 on Bluesky.
1:04:13 jamsocket.com
1:04:13 is the site, jamsockethq on Twitter;
1:04:16 on Bluesky it's jamsocket.com.
1:04:19 And, yeah, forevervm.com
1:04:20 is the product we were just talking about.
1:04:22 Perfect.
1:04:23 We're going to put links to all of those things in the show notes.
1:04:27 Paul, thank you so much for coming on the show today.
1:04:29 I've learned a lot about so many different topics and yeah, really enjoyed it.
1:04:34 Thank you.
1:04:35 Thank you, Johannes.
1:04:36 And really looking forward to seeing you at local-first in Berlin this year.
1:04:40 Perfect.
1:04:40 See you then.
1:04:41 See you then.
1:04:42 Thank you for listening to the localfirst.fm podcast.
1:04:45 If you've enjoyed this episode and haven't done so already, please
1:04:48 subscribe and leave a review.
1:04:50 Please also share this episode with your friends and colleagues.
1:04:53 Spreading the word about the podcast is a great way to support
1:04:56 it and to help me keep it going.
1:04:58 A special thanks again to Jazz for supporting this podcast.