localfirst.fm
December 3, 2024

#18 – James Arthur: ElectricSQL, read-path syncing, PGLite

Sponsored by PowerSync and Rocicorp

Transcript

0:00:00 Intro
0:00:00 I mean, another thing is like the operational characteristics of the
0:00:02 system, for this type of sync technology.
0:00:05 So comparing HTTP with WebSockets, like WebSockets are stateful, and
0:00:09 you do just keep things in memory.
0:00:11 If you look across most real-time systems, they have scalability limits because
0:00:16 you will come to the point where if you have, say, 10,000 concurrent users,
0:00:19 it's almost like the thing of don't have too many open Postgres connections.
0:00:22 But if you're holding open 10,000 WebSockets, you may be able to do the
0:00:26 IO efficiently, but you will ultimately be sort of growing that kind of memory
0:00:30 and you'll hit some sort of barrier.
0:00:31 Whereas, with this approach, you can basically offload that
0:00:34 concurrency to the CDN layer.
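The CDN offloading described above can be sketched as a plain HTTP polling loop: each request is an ordinary GET keyed by a log offset, so intermediate caches can absorb the concurrency instead of the origin holding thousands of open sockets. The endpoint path, query parameter, and types below are illustrative assumptions, not Electric's actual API.

```typescript
// A sync-log entry delivered by the (hypothetical) HTTP endpoint.
type LogEntry = { offset: number; data: unknown };

// Build the URL for the next poll from the last offset we have seen.
// Because the URL is a pure function of the offset, identical requests
// from many clients can be served from a CDN cache.
function nextRequestUrl(base: string, lastOffset: number): string {
  return `${base}/sync?offset=${lastOffset}`;
}

// Catch up with the log: each iteration is an independent, cacheable GET.
async function catchUp(
  base: string,
  onEntry: (e: LogEntry) => void
): Promise<number> {
  let offset = -1; // -1 = "from the beginning"
  while (true) {
    const res = await fetch(nextRequestUrl(base, offset));
    const entries: LogEntry[] = await res.json();
    if (entries.length === 0) return offset; // caught up with the log
    for (const e of entries) {
      onEntry(e);
      offset = Math.max(offset, e.offset);
    }
  }
}
```

The key design point is that no per-client state lives on the server between requests; the client carries its position in the log as the offset.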
0:00:37 Welcome to the localfirst.fm podcast.
0:00:39 I'm your host, Johannes Schickling, and I'm a web developer, a
0:00:42 startup founder, and love the craft of software engineering.
0:00:46 For the past few years, I've been on a journey to build a modern, high quality
0:00:50 music app using web technologies.
0:00:52 And in doing so, I've been falling down the rabbit hole of local-first software.
0:00:56 This podcast is your invitation to join me on that journey.
0:01:00 In this episode, I'm speaking to James Arthur.
0:01:03 Founder and CEO of Electric SQL, a Postgres centric sync
0:01:07 engine for local-first apps.
0:01:09 In this conversation, we dive deep into how Electric works and explore
0:01:14 its design decisions, such as read path syncing and using HTTP as a
0:01:18 network layer to improve scalability.
0:01:21 Towards the end, we're also covering PGLite, a new project by Electric
0:01:26 that brings Postgres to Wasm.
0:01:28 Before getting started, a big thank you to Rocicorp and PowerSync
0:01:32 for supporting this podcast.
0:01:34 And now, my interview with James.
0:01:37 Welcome James.
0:01:37 So good to have you on the podcast.
0:01:39 How are you doing?
0:01:40 Great.
0:01:41 Yeah, really good to be here.
0:01:42 Thank you for having me on.
0:01:43 So the two of us have known each other for quite a while already.
0:01:47 And to be transparent, the two of us have actually already done quite
0:01:51 a couple of projects together.
0:01:53 The one big one among them is the first Local-First Conference that we
0:01:57 organized together this year in Berlin.
0:01:59 That was a lot of fun.
0:02:00 But for those in the audience who don't know who you are, would
0:02:05 you mind introducing yourself?
0:02:07 So, my name is James Arthur.
0:02:09 I am the CEO and one of the co-founders of Electric SQL.
0:02:14 So, Electric is a Postgres sync engine.
0:02:18 We sync little subsets of data out of Postgres into wherever you
0:02:24 want, like local apps and services.
0:02:26 And we do also have another project which we developed called PGlite,
0:02:30 which is a lightweight WASM Postgres.
0:02:33 So we can sync out of Postgres in the cloud, into Postgres in the web browser,
0:02:39 or kind of into whatever you want.
0:02:40 Awesome.
0:02:41 So yeah, I want to learn a lot more about Electric as well as PGlite.
0:02:45 Maybe PGlite a little bit towards the end of this conversation.
0:02:49 So Electric, I've seen it a bunch of times.
0:02:53 I've been playing around with it, I think quite a bit last year, but
0:02:59 things seem to also change quite a bit.
0:03:01 Can you walk me through
0:03:03 what the history of the last couple of years has been as you've been
0:03:08 working on Electric, and help me form the right mental model about
0:03:15 Electric SQL?
0:03:15 Yeah, absolutely.
0:03:16 I think like Electric as a project, it started, in a way, building on a bunch
0:03:24 of research advances in distributed systems, CRDTs, transactional causal
0:03:29 consistency, a bunch of these primitives that a lot of people are building
0:03:33 off in the local-first space, which actually a bunch of people on our team
0:03:38 developed in the kind of research stage.
0:03:41 And we wanted to create developer tooling and a platform that allowed people
0:03:48 who weren't experts in distributed systems and didn't have PhDs in CRDTs to be able
0:03:53 to harness the same advances and build systems on the same types of guarantees.
0:03:58 So in a way, that's where we started from.
0:04:00 And we started building out on this research base into stronger consistency
0:04:05 models for distributed databases and doing sync, from like a central
0:04:11 cloud database out into whether it's to the edge or to the client.
0:04:16 And then we're a startup.
0:04:17 So like we built a small team and you go through this journey, building a
0:04:21 company of, you have ideas for what's going to be useful and valuable for
0:04:26 people, and you have a sense of sort of where the state of the art is and, what
0:04:29 doesn't exist yet, but as you then go and experiment, you just learn more and more.
0:04:33 And so you work out actually what people need and what
0:04:36 problems you can solve with it.
0:04:38 and so through that journey, we went from starting off thinking we were building
0:04:42 a next generation distributed database to using the replication technology
0:04:48 for that system behind existing open source databases like Postgres, SQLite,
0:04:53 into finding, local-first software as a pattern is really the killer app for
0:04:58 that type of replication technology.
0:05:00 So people looking to build local-first applications because of all of the
0:05:04 benefits around UX, DX, resilience, et cetera, but to do that, you
0:05:09 need this type of sync layer.
0:05:11 and then when we first focused on that, then we tried to build a
0:05:15 very optimal end to end integrated local-first software platform.
0:05:19 So for instance, if people saw Electric as a project, like this time last
0:05:23 year, that's what we were building.
0:05:25 And in a way we just found that we were having to solve too many problems and
0:05:30 there was too much complexity making a kind of optimal one-size-fits-all sort of
0:05:34 magic active active replication system.
0:05:37 We were doing things like, managing the way you did the database migrations
0:05:40 and schema evolution and generating a type safe client and doing the
0:05:44 client side reactivity as well as all this sort of core sync stuff.
0:05:47 So, as you know, there's a lot to that kind of end to end stack.
0:05:51 Because we had wanted to build a system that integrated with people's
0:05:55 existing software, like if you already had software built on Postgres or if
0:05:59 you already had a working stack, like building that sort of full system was
0:06:05 in a way sort of too complex and was difficult to adopt from existing software.
0:06:10 So more recently we have consolidated down on building a much simpler sync engine,
0:06:17 which is more like a composable tool that
0:06:20 you can run in front of Postgres, any Postgres.
0:06:23 It works with any standard Postgres, any managed Postgres, any data model, any
0:06:28 data types, any extensions that you have.
0:06:30 And it just does this work of basically consuming the logical
0:06:35 replication stream from Postgres.
0:06:37 and then managing the way that the data is fanned out to clients,
0:06:41 doing partial replication.
0:06:42 So, because when you're syncing out, say, if you have
0:06:44 a larger database in the cloud,
0:06:47 and you're syncing out to like an app or a kind of edge service,
0:06:49 you don't want to sync all the data.
0:06:51 We have this sort of model of partial replication.
0:06:54 And basically what we're aiming to do with the sync engine is just make that as
0:06:58 simple to use and as bulletproof as possible.
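The partial replication James describes can be sketched as a filter over the logical replication stream: each client subscribes to a table plus a row predicate, and only matching changes are fanned out to it. The types and the `matches` helper below are illustrative, not Electric's real interface.

```typescript
// A row from the replication stream; values kept simple for the sketch.
type Row = Record<string, string | number>;

// One change event decoded from Postgres logical replication.
type Change = { table: string; op: "insert" | "update" | "delete"; row: Row };

// A partial-replication subscription: a table plus a row-level predicate.
type Subscription = {
  table: string;
  where: (row: Row) => boolean;
};

// Decide whether a change from the replication stream should be
// forwarded to a given subscription.
function matches(sub: Subscription, change: Change): boolean {
  return change.table === sub.table && sub.where(change.row);
}
```

A client syncing "issues for project 1" would then only ever receive changes where both the table and the predicate match, rather than the whole database.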
0:07:01 And we're making it with standard web technologies that make it easy
0:07:06 to use with your existing systems and with your existing stack.
0:07:09 And so we went in a way from this sort of quite ambitious, tightly integrated
0:07:13 end to end local-first software platform to now building more like composable
0:07:18 tools that can be part of a local-first stack that you would assemble yourself
0:07:22 as a developer, that's designed to be
0:07:25 easier to adopt for production applications that work
0:07:28 with your existing code.
0:07:29 That makes a lot of sense.
0:07:31 And that definitely resonates with me personally as well, since maybe,
0:07:34 as you know, before I founded Prisma, Prisma actually came as a pivot out of
0:07:40 like a focusing effort from a previous product that was called Graphcool,
0:07:44 which was meant as a more ambitious next generation backend as a service.
0:07:48 Back then there was like Firebase and Parse and so we wanted to build the
0:07:52 next generation of that, but what we found back then in 2016, that, while
0:07:57 we've been making a lot of progress towards that very ambitious, holistic
0:08:02 vision, we had to basically boil, like, multiple oceans all at the same time.
0:08:06 And that takes a lot of time to fully get to all the different
0:08:10 ambitious things that we wanted to.
0:08:12 So the only way forward for us where we felt like, okay, we can actually
0:08:16 serve the kind of use cases that we want to serve in a realistic timeline
0:08:21 was to focus on a particular problem, which is what Prisma eventually became.
0:08:26 And by focusing just on the database tooling part and leaving the other
0:08:31 back-endy things to other people.
0:08:32 And it sounds like what you've been going through with Electric is a very comparable
0:08:36 exercise, like focusing exercise to trying to, from a starting point of
0:08:41 like, let's build the most ambitious, the best local-first stack, like end to
0:08:46 end by focusing more on like, okay, what we figured out where our expertise is,
0:08:52 is around Postgres, is about, existing applications wanting to adopt local-first
0:08:58 ideas, syncing approaches, et cetera.
0:09:01 And that is what now led to the new version of Electric.
0:09:04 Did I summarize that correctly?
0:09:06 Yeah, exactly.
0:09:07 Right.
0:09:07 It sounds like a very similar journey.
0:09:09 And I think it's interesting as well that as you focus in and you learn
0:09:13 more about a problem space, you both discover in a way, more of the
0:09:17 complexity in the sort of aspects of it.
0:09:19 So you realize there's actually more challenges to solve in a smaller sort
0:09:23 of part of it or a smaller scope.
0:09:26 And also it's interesting that I think for instance, when we started the
0:09:29 project, I would have thought coming into this as a software developer,
0:09:32 I'd go, is read path sync solved?
0:09:34 I'd be like, well, there's quite a lot of read path kind of sync stuff.
0:09:37 You can kind of do this.
0:09:38 There's various real time solutions, but actually as you dig into it, you find
0:09:42 that there's a whole bunch of weaknesses of those solutions and they're actually
0:09:45 hard to adopt or they have silos or they can't handle the data throughput.
0:09:48 And so you realize that actually you don't necessarily need to bite
0:09:53 off all of the more ambitious scope because actually you can deliver
0:09:57 value by doing something simpler.
0:10:00 And I think also for me personally, learning about stewarding this
0:10:03 type of product, understanding that you can build out still towards
0:10:08 that more ambitious objective.
0:10:09 So in the long run, you know, we want to sort of build back a whole bunch
0:10:12 of capabilities into this platform,
0:10:14 probably as a sort of loosely coupled set of composable tools.
0:10:18 So you mentioned the term read path syncing.
0:10:21 Can you elaborate a little bit what that means?
0:10:24 So let's say I have an existing application.
0:10:26 Let's say I've built an API layer at some point.
0:10:29 I have a React front end and I have all of my data sitting in Postgres.
0:10:34 I've been inspired by products such as Linear, et cetera, who seem to
0:10:38 wield a superpower called syncing.
0:10:40 And now I found ElectricSQL, which seems to connect the ingredients
0:10:45 that I already have, such as Postgres and a front end with my
0:10:50 desirable approach, which is syncing.
0:10:52 So how does Electric fit into that?
0:10:55 And what do you mean by read and write path syncing?
0:10:59 Read and Write Path Syncing
0:10:59 Yeah.
0:10:59 I mean, the sort of read path and write path when it comes to
0:11:02 sync, the read path is syncing data, like onto the local device.
0:11:06 So it's a bit like kind of data fetching from the server.
0:11:09 And then the write path would be when like a user makes a write, and then
0:11:12 you want to sync that data typically back to the cloud so that's sort
0:11:16 of how we talk about them there.
0:11:19 I think there's something unique about local-first software compared to
0:11:25 more sort of traditional web service systems where you explicitly have
0:11:31 a local copy of the data on device.
0:11:34 And one of the challenges with that is because of course you can just like load
0:11:39 some data from the server and keep it in a cache, but if you do that Then you
0:11:44 immediately actually lose, any information about whether that data is stale.
0:11:49 So say a user goes to a route on your application and then clicks
0:11:54 to go to another route and then comes back to the original one.
0:11:57 So to load that original route, say you did a data fetch, but
0:12:01 now you've navigated back to it.
0:12:02 Can you display that data?
0:12:04 Can you render the route or is the data stale?
0:12:08 And so you have this sort of thing where I don't really know, and you tend to sort
0:12:12 of build systems with like REST APIs and data fetching where you might show the
0:12:15 data and go and try and fetch new data.
0:12:17 but in a way it's that problem of you want the data locally so that your application
0:12:23 code can just talk to it locally and you're not having to code across the
0:12:26 network with local-first software.
0:12:28 But that means that you need a solution to keep the data that is local fresh.
0:12:33 Like you don't want stale data.
0:12:35 And if you build a sort of ad-hoc system,
0:12:38 As we've all done across like many generations of software applications,
0:12:41 it's one of these things where you always end up kind of building some sort
0:12:44 of system to keep the data up to date.
0:12:46 But what you really want is a kind of properly engineered system
0:12:49 that does it systemically for you.
0:12:51 It is really a sort of an aspect of your application's architecture that kind of
0:12:56 can be abstracted away by a sync engine.
0:12:58 And so for us, this focusing on the read path sync is about saying,
0:13:02 okay, what data should be on the device, and let's just keep it
0:13:06 fresh for you.
0:13:07 And then with the write path, one of the things that we learned through
0:13:11 the project is that there are a lot of valid patterns for handling how, when
0:13:17 you do local writes on the device, how you would get those back to the cloud.
0:13:22 You can do through the database sync, you can do optimistic writes.
0:13:26 You could be happy with online writes and you have different models of
0:13:30 like, can your writes be rejected?
0:13:32 Are they local writes with finality?
0:13:34 Or do you have a server authoritative system where when the write
0:13:37 somehow syncs, it can be rejected and how do you handle that?
0:13:40 And so there's actually a lot of different patterns for those writes,
0:13:43 which are often relatively simple because different applications can
0:13:48 be happy with certain trade offs and you could pick a model like.
0:13:51 Okay.
0:13:51 I'm going to show some optimistic state and make a request to an API server.
0:13:56 And it's fine.
0:13:57 And you get a kind of, you get a local-first, experience with just a
0:14:00 sort of simple model that says, okay, if the write is rejected when it
0:14:03 syncs, then, I'll just sort of roll it back and the user loses that work.
0:14:07 And for many applications, that's fine.
0:14:09 For other applications, you might have a much more complex conflict resolution or
0:14:13 you're trying not to lose local writes and there's different collaborative workloads.
0:14:16 And so.
0:14:17 Building a generic system that can give you a write path that gives you
0:14:21 the best developer experience and user experience for all of those variety of
0:14:25 scenarios is very, very hard, whereas building it on an application by
0:14:28 application basis on the write path is actually often fairly straightforward.
0:14:32 It can be like, post to your API and use the React useOptimistic hook.
0:14:37 And so, with building local-first applications that have both read and
0:14:40 write path with Electric, the idea is that we do this core read path
0:14:45 with partial replication, but then as you're building your application, you
0:14:49 can choose out of a variety, whichever pattern fits your, what you need the
0:14:53 most for sort of how you would choose to get the writes back into the server.
0:14:57 That makes a lot of sense.
0:14:58 So basically the more general purpose
0:15:01 building block that can be used across a wide range of different applications
0:15:05 is actually how you read data, how you distribute the data that you
0:15:09 want to have locally available in your applications that would kind of
0:15:13 replace the API get requests before.
0:15:17 But now what needs to happen in those Put, post, delete requests,
0:15:21 this is where it depends a lot more.
0:15:24 And this is where you basically, what you're arguing is there are different
0:15:28 sort of write patterns that heavily depends on the kind of application.
0:15:32 So that is where you're kind of leaning out.
0:15:34 And previously with Electric, you tried to provide the silver bullet there.
0:15:39 But actually, it's really hard, maybe impossible to find the silver
0:15:43 bullet that applies to all use cases.
0:15:45 However, for the read path, it is very possible to provide a great building
0:15:50 block that works for many use cases.
0:15:52 So, can you provide a bit of a better spectrum of the different write
0:15:56 patterns that you've seen so far?
0:15:58 Maybe map them to canonical applications
0:16:02 that illustrate those use cases?
0:16:04 And maybe if you know, maybe you can also compare analogies to something
0:16:08 like Automerge, et cetera, and which sort of write patterns those would map to.
0:16:14 Read Path use cases
0:16:14 Yeah.
0:16:15 So I think the simplest pattern for writes with an application would be to
0:16:19 just, for instance, send a write to a server and require you to be online.
0:16:24 So, because there's many applications that are happy, for instance, with read
0:16:27 only, like there's a lot of people who are building, data analytics applications,
0:16:31 data visualization, dashboards, et cetera.
0:16:33 And so if you have a sort of read heavy application, then in some cases
0:16:37 it may just be a perfectly valid trade off, not to really deal with the
0:16:40 complexity of say offline writes at all.
0:16:42 But you still have a lot of benefits by having local data on device for the read
0:16:46 path, because all the way you can kind of explore the application and the data is
0:16:50 all just instant and local and resilient. Then the sort of simplest pattern to
0:16:56 layer on support for offline writes
0:16:59 on top of that, as a sort of starting point, where imagine that you have like a
0:17:03 standard REST API and you're just doing put and post requests to it as normal, is
0:17:08 to add this concept of optimistic state.
0:17:10 So optimistic state is just basically you're saying, okay, I'm going to go and
0:17:14 try and send this write to the API server.
0:17:16 And whilst I do so, I'm going to be optimistic and imagine that
0:17:20 that write is going to succeed.
0:17:22 And then, two seconds later, it's going to sync back into the state that I have here.
0:17:25 But in the meantime, I'm going to add this bit of local optimistic state to
0:17:30 display it immediately to the user, and because in most cases that happy path
0:17:34 is what happens, then you end up with what just feels like a perfect local-first
0:17:39 experience because it's an instantly displayed local write, and that sort
0:17:43 of data is resolved in the background.
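The optimistic-state pattern just described can be sketched as a small shared store: pending local writes overlay server-confirmed state on read, and a rejected write is handled by simply dropping its optimistic copy. All names here are hypothetical.

```typescript
type Todo = { id: string; title: string };

class OptimisticStore {
  private confirmed = new Map<string, Todo>(); // synced in from the server
  private pending = new Map<string, Todo>();   // local optimistic writes

  // Called when the read-path sync delivers server-confirmed data.
  applyConfirmed(todo: Todo) {
    this.confirmed.set(todo.id, todo);
    this.pending.delete(todo.id); // the optimistic copy is now redundant
  }

  // Called when the user writes locally, before the API call resolves.
  applyOptimistic(todo: Todo) {
    this.pending.set(todo.id, todo);
  }

  // Called when the server rejects a write: just drop the optimistic copy.
  reject(id: string) {
    this.pending.delete(id);
  }

  // Reads see pending writes layered over confirmed state.
  get(id: string): Todo | undefined {
    return this.pending.get(id) ?? this.confirmed.get(id);
  }
}
```

Because the store is shared, multiple components read through the same overlay, avoiding the per-component stale-state problem discussed later in the conversation.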
0:17:45 Now, you know, immediately with that, you do then just introduce like a layer
0:17:49 of complexity with like, well, what happens when the write is rejected?
0:17:54 And so you have both the challenge of, for instance, say you stacked up three writes.
0:18:01 Did they depend on each other?
0:18:03 So if one of them is rejected, should you reject all of them?
0:18:06 and different applications and different parts of the application would have
0:18:09 different answers to that question.
0:18:11 In some cases, like it's very simple to just go, if there's any problem with
0:18:14 this optimistic state, just wipe it.
0:18:16 And for instance, like the React useOptimistic hook, like its approach is just
0:18:20 like, it waits for a promise to resolve.
0:18:22 And when the promise resolves, it wipes the optimistic state.
0:18:25 And so it's very much just like, if anything happens at all,
0:18:28 it's like, wipe it.
0:18:30 Interestingly enough, there's also a lot of people coming from React Query and so
0:18:35 on, from those sort of more traditional front end state management things.
0:18:40 and that brings them to local-first in the first place, because they're like
0:18:44 layering optimistic, one optimistic state handler on top of the next one.
0:18:49 And if there's a little flaw inside of there, everything collapses
0:18:53 since you don't really have a principled way to reason about things.
0:18:57 So that makes a lot of sense.
0:18:59 Exactly right.
0:19:00 And so a framework like TanStack, for instance, with TanStack Query, it has like
0:19:05 slightly more sophisticated optimistic state primitives than just say the kind
0:19:10 of the primitive useOptimistic hook.
0:19:12 And one of the challenges that you have is that for,
0:19:15 say, a simple approach to just using optimistic state to display an immediate
0:19:20 write: is that optimistic state global to your application?
0:19:24 Shared between components?
0:19:25 Is it scoped within the component?
0:19:27 And so, as you say, like there's an approach where you could come along
0:19:30 and say, okay, I've got three or four different components and so far I've
0:19:33 just been able to sort of render the optimistic state within the component.
0:19:37 But now I've got two components that are actually displaying the same information.
0:19:40 And suddenly I've got like stale data.
0:19:42 It's like the old days of manual DOM manipulation and you forgot
0:19:45 to update a state variable.
0:19:47 And so.
0:19:48 Yeah, in a way that's where you come to a more proper local-first solution
0:19:53 where your optimistic state would be stored in some sort of shared store.
0:19:58 So it could just be like a JavaScript object store, or it
0:20:01 could be an embedded database.
0:20:03 And so you get slightly more sophisticated models of
0:20:07 managing optimistic state.
0:20:08 And the great thing is there are, like TanStack Query and others, there's
0:20:11 like, there's a bunch of existing client side frameworks that can handle
0:20:14 that kind of management for you.
0:20:17 Once you go, for instance, like to an embedded database for the state.
0:20:21 So one of the kind of really nice points in the design space for this is to have a
0:20:27 model where you sync data onto the device and you treat that data as immutable.
0:20:32 And then, say, for instance, you're syncing a
0:20:37 database table, say it's like a log viewer application, and you're just syncing the
0:20:41 logs in, and it goes into a logs table.
0:20:44 Now, say the user can interact with the logs and delete them,
0:20:47 or change the categorization.
0:20:49 And so you can have a shadow logs table, which is where you would
0:20:52 save the local optimistic state.
0:20:54 And then
0:20:55 you can do a bunch of different techniques to, for example, create a view or a live
0:20:59 query where you combine those two on read.
0:21:02 So the application just sort of feels like it's interacting with the table,
0:21:05 but actually it's split in the storage layer into an immutable table for the synced
0:21:09 state and a kind of local mutable table.
0:21:12 And the great thing about that is you can have persistence for both the
0:21:15 synced state and the local mutable state.
0:21:18 And of course it can be shared.
0:21:19 So you can have multiple components, which all sort of just go
0:21:22 through that unified data store.
0:21:24 And there's some nice stuff that you can do in SQL world, for instance, to use,
0:21:27 like, INSTEAD OF triggers to combine it.
0:21:29 So it just feels like you're working with a single table.
0:21:32 Now it's a little bit additional complexity on something like defining
0:21:35 a client side data model, but what it gives you is it gives you a
0:21:39 very solid model to reason about.
0:21:42 So, like, you can go, okay, basically the synced state is always golden.
0:21:46 It's immutable.
0:21:46 Whenever it syncs in, it's correct.
0:21:48 If I have a problem with this local state, that's just, that's like mutable stuff.
0:21:53 Worst case, I can get rid of it, or I can develop more sophisticated strategies for
0:21:57 dealing with rollbacks and edge cases.
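The combine-on-read pattern with an immutable synced table and a mutable shadow table could look roughly like this in application code; the log-viewer table and field names follow the example in the conversation but are otherwise hypothetical.

```typescript
// A row from the immutable synced table: always matches the server.
type LogRow = { id: string; message: string; category: string };

// A row from the local mutable shadow table: optimistic edits only.
type ShadowRow = { id: string; deleted?: boolean; category?: string };

// Merge the two on read, so the rest of the app feels like it is
// working with a single table.
function readLogs(
  synced: LogRow[],
  shadow: Map<string, ShadowRow>
): LogRow[] {
  const out: LogRow[] = [];
  for (const row of synced) {
    const edit = shadow.get(row.id);
    if (edit?.deleted) continue; // locally deleted: hide it
    // Locally recategorized: overlay the edit without touching synced state.
    out.push(edit?.category ? { ...row, category: edit.category } : row);
  }
  return out;
}
```

In an embedded SQL database the same merge could live in a view or an INSTEAD OF trigger; the point is that the synced rows are never mutated, so worst case the shadow entries can simply be discarded.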
0:22:00 So in a way it can give you a nice developer experience.
0:22:04 With that model, you could choose then whether your writes are, whether you're
0:22:08 writing to the database, detecting changes, and then sending those to
0:22:11 some sort of like replication ingest point, or whether you're still just
0:22:15 basically talking to an API and writing the local optimistic state separately.
0:22:21 So at that point, again, you have this
0:22:24 fundamental model of, like, are you writing directly to the database and
0:22:27 all the syncing happens magically?
0:22:29 Or are you just using that database as a sort of unified, local optimistic store?
0:22:34 So this is the sort of type of like progression of patterns.
0:22:36 And once you start to go through something where you would, for instance, have a
0:22:42 synced state that is mutable, or you are writing directly to the database,
0:22:46 that's really where you start to get a little bit more into the world of like
0:22:49 convergence logic and kind of merge logic and CRDTs and sort of what's commonly
0:22:54 understood as proper local-first systems.
0:22:57 And I think that's the point where almost the complexity of those
0:22:59 systems does become very real.
0:23:01 Like, as you well know, from building LiveStore and as we see from the
0:23:04 kind of, quality of libraries like Automerge, Yjs, et cetera.
0:23:08 so that's probably where as a developer, it makes sense to reach for a framework.
0:23:12 And you certainly could reach for a framework for that sort of, like,
0:23:15 combine-on-read, sync into a kind of persistent local mutable state.
0:23:21 But what we find is that it's actually
0:23:25 relatively straightforward to develop yourself; you can reason about it
0:23:28 fairly simply, and so it's not too much extra work to just basically go:
0:23:32 as long as you've got that read sync primitive, you can build like a kind of
0:23:36 proper, locally persistent, consistent local-first app yourself, basically,
0:23:42 just using fairly standard front end primitives.
0:23:44 Right.
0:23:45 Okay.
0:23:46 Maybe sharing a few reflections on this, since I like the way you
0:23:50 portrayed this sort of spectrum of different kinds of write patterns.
0:23:54 In an interview that I did with Matthew Weidner, I learned a lot there
0:23:58 about the way he thinks about different categorizations of, like, state
0:24:02 management, and particularly when it comes to distributed synchronization.
0:24:07 And I think one pattern that got clear there was that either you're
0:24:12 directly manipulating the state, which is what, like, Automerge, et
0:24:16 cetera, are de facto doing for how you as a developer interact with the state.
0:24:21 So you have like a document and you manipulate it directly.
0:24:25 You could also apply the same logic of, like, you have a database table, for
0:24:30 example; that's how cr-sqlite works, where you have a SQLite table and you
0:24:35 manipulate a row directly and that is being synchronized as the state and
0:24:41 you're ideally modeling this with a way where the state itself converges and
0:24:46 through some mechanisms, typically CRDTs.
0:24:49 But then there's another approach, which might feel a little bit more
0:24:53 work, but it can actually be concealed quite nicely by systems, for example,
0:24:58 like LiveStore, in this case, unbiased, and where you basically separate
0:25:02 out the reads from the writes.
0:25:05 And often enough, you can actually fully recompute your
0:25:10 read model from the write model.
0:25:12 So, if you then basically express everything that has happened, that
0:25:16 has meaningfully happened for your application as a log of events.
0:25:20 Then you can often, kind of like how Redux used to work or still works,
0:25:24 fully recompute your view, your read model from all the writes that have happened.
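The Redux-style recompute described here can be sketched as a fold over an event log: the read model is derived entirely from the ordered writes, so it can always be rebuilt from scratch. The event shapes below are illustrative.

```typescript
// The write model: an ordered log of everything that has happened.
type Event =
  | { type: "added"; id: string; title: string }
  | { type: "completed"; id: string }
  | { type: "removed"; id: string };

// The read model, derived entirely from the log.
type Todo = { id: string; title: string; done: boolean };

// Fold the event log into the current read model. Replaying the same
// log always yields the same state, which is what makes optimistic
// events easy to reason about and to roll back.
function reduce(events: Event[]): Map<string, Todo> {
  const state = new Map<string, Todo>();
  for (const e of events) {
    switch (e.type) {
      case "added":
        state.set(e.id, { id: e.id, title: e.title, done: false });
        break;
      case "completed": {
        const t = state.get(e.id);
        if (t) t.done = true;
        break;
      }
      case "removed":
        state.delete(e.id);
        break;
    }
  }
  return state;
}
```

Local optimistic events can then simply be appended after the server-confirmed prefix of the log and re-derived, rather than patched into the read model by hand.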
0:25:29 And I think that would work actually really, really well together in tandem
0:25:33 with Electric, where if you're replicating what has happened in your Postgres
0:25:39 database as like a log of historic events, then you can actually fully recreate
0:25:45 whatever derived state you're interested in. And what is really interesting about
0:25:49 that approach, that particular write pattern, is that it's a lot easier to
0:25:54 model and reason about locally.
0:25:57 You'd say like, hey, I got those events from the server; those
0:26:00 events, I am applying optimistically.
0:26:03 You can even encode sort of a causal order. If someone
0:26:09 is, like, confused about what causal order means, don't worry about it.
0:26:13 Like you can probably at the beginning, keep it simple, but once you layer
0:26:18 on like more and more dependent, optimistic state transitions, this is
0:26:22 where you want to have the information.
0:26:25 Okay.
0:26:25 If I'm doing that, and then the other thing depends on that, that's basically a
0:26:29 causal order, and modeling that as events,
0:26:32 I think, is a lot simpler and is a way to deal with that monstrosity of, like,
0:26:38 losing control over your optimistic state.
0:26:41 Since I think one thing that makes optimistic state management
0:26:44 even more tricky is that, like, how are things dependent on each other?
0:26:50 And then also, like, when is it assumed to be good?
0:26:54 I think in a world where you use Electric, once you've got, from the
0:26:57 Electric server, sort of confirmation, like, hey, those
0:27:01 things have now happened for real.
0:27:02 You can trust it.
0:27:04 but there's like some latency in between, and the latency might be
0:27:07 increased by many, many factors.
0:27:10 One way could be that you just, you are on a like slow connection or the server
0:27:15 is particularly far away from you and might take a hundred milliseconds, but
0:27:19 another one might be you have a spotty connection and like packets get lost and
0:27:25 it takes a lot longer, or you're offline, and being offline is just like a
0:27:30 very high latency form of that. And so all of that, like if you're offline,
0:27:36 if it takes a long, long time, and maybe you close your laptop, you reopen it.
0:27:41 Is the optimistic state still there?
0:27:43 Is it actually locally persisted?
0:27:45 So there are many, many more layers that make that more tricky.
0:27:49 But I like the way how you, like, split this up into the read
0:27:54 concerns and the write concerns.
0:27:56 And I think this way, it's also very easy to get started with new
0:28:00 apps that might be more read heavy and are based on existing data.
0:28:05 I think this is a very attractive trade off that you say like, Hey, with
0:28:09 that, I can just sync in my existing data and then step by step, depending
0:28:14 on what I need, if I need it at all.
0:28:16 Many apps don't even need to do writes at all, and then you
0:28:19 can just get started easily.
0:28:21 Yeah, I think, I mean, that's explicitly a design goal for us is like, yeah,
0:28:25 if you start off with an existing application and maybe it's using REST
0:28:29 APIs or GraphQL, it's like, well, what do you do to start to move that
0:28:32 towards a local-first architecture?
0:28:34 And exactly, you could just go, okay, well, let's just leave the way
0:28:37 that we do writes the same as it is.
0:28:39 And let's move to this model of like syncing in the data
0:28:41 instead of fetching the data.
0:28:43 And that can just be a first step.
0:28:45 And I think, I mean, across all of these techniques for writes, there
0:28:48 is just something fundamental about keeping the history or the log
0:28:52 around as long as you need it, and then somehow materializing values.
0:28:58 So sort of internally, this is what a CRDT does, right?
0:29:01 It's clever and has a sort of lattice structure for the history, but basically
0:29:05 it keeps the information and allows you to materialize out a value.
0:29:09 Or if you just have like an event log of writes.
0:29:11 So as you were saying with LiveStore, when you have like a
0:29:14 record of all the write operations, you can just process that log.
0:29:17 So I think, you know, you can do it sort of within a data type.
0:29:21 And I think that fits as well for a greenfield application where you're trying
0:29:25 to craft kind of real time or kind of collaboration and concurrency semantics,
0:29:29 but like from our side of coming at it, from the point of saying, right, when
0:29:32 you've got applications that build on Postgres, you already have a data model.
0:29:35 You just sort of layer the same kind of history approach on top by like, keeping
0:29:39 a record of the local writes until you're sure you can compact them. And actually
0:29:44 that same principle is exactly how the read path sync works with Electric.
0:29:49 So Postgres logical replication, it just basically emits a stream of, like,
0:29:56 transactions that contain write operations, and it's basically inserts, updates,
0:30:00 and deletes with a bit of metadata.
0:30:02 And so we end up consuming that and basically writing
0:30:06 out what we call shape logs.
0:30:07 So we have a primitive called a shape, which is how we control the partial
0:30:10 replication, like which data goes to which client and a client can define multiple
0:30:14 shapes, and then you stream them out.
0:30:16 But that shape log comes through our replication protocol as just that
0:30:21 stream of logical update operations.
0:30:23 And so in the client, you can just, you can materialize the data immediately.
0:30:28 So like we provide, for instance, a shape stream primitive in a JavaScript client
0:30:32 that just emits the series of events.
0:30:34 And then we have a shape, which will just take care of materializing that
0:30:37 into a kind of map value for you.
0:30:39 But you could do whatever you wanted with that stream of events.
0:30:42 So if you found that you wanted to keep around a certain history of the
0:30:46 log in order to be able to reconcile some sort of causal dependencies,
0:30:49 that's just totally up to you.
0:30:51 And so, yeah, it's quite interesting: it's almost just the same approach. The
0:30:54 general sort of principle for handling concurrency on the
0:30:58 write path is also just exactly what we've ended up consolidating down on
0:31:02 exposing through the read path stream.
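The idea of materializing a log of logical operations into a current value, as described above, can be sketched in a few lines of TypeScript. This is an illustrative reducer, not Electric's actual client API: the message shape (`op` plus `key`/`value`) is an assumption.

```typescript
// Illustrative sketch: materialize a log of logical insert/update/delete
// operations into a map of current rows. The message format here is an
// assumption, not Electric's exact wire format.
type Row = Record<string, unknown>

type Operation =
  | { op: 'insert'; key: string; value: Row }
  | { op: 'update'; key: string; value: Row }
  | { op: 'delete'; key: string }

function materialize(log: Operation[]): Map<string, Row> {
  const rows = new Map<string, Row>()
  for (const msg of log) {
    switch (msg.op) {
      case 'insert':
        rows.set(msg.key, msg.value)
        break
      case 'update':
        // Treat updates as partial: merge into the existing row.
        rows.set(msg.key, { ...rows.get(msg.key), ...msg.value })
        break
      case 'delete':
        rows.delete(msg.key)
        break
    }
  }
  return rows
}
```

The same reducer works incrementally: feed it each new batch of operations as they stream in, rather than the whole log at once.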
0:31:04 That makes a lot of sense.
0:31:05 So, let's maybe go a little bit more high level.
0:31:08 Again, for the past couple of minutes, we've been talking a lot about like how
0:31:12 Electric happens to work under the hood.
0:31:14 And there's many commonalities with other technologies and
0:31:17 all the way to CRDTs as well.
0:31:19 But going back a little bit towards the perspective of someone who would
0:31:23 be using Electric and building something with Electric, and maybe doesn't
0:31:28 peel off all the layers yet, but gets started with one of the easier off the
0:31:32 shelf options that Electric provides.
0:31:35 So my understanding is that you have your existing Postgres database.
0:31:40 You already have your like tables, your schema, et cetera, or if it's
0:31:44 a greenfield app, you can design that however you still want.
0:31:47 And then you have your Postgres database.
0:31:50 Electric is that infrastructure component that you put in front
0:31:53 of your Postgres database that has access to your Postgres database.
0:31:58 In fact, it has access to the replication stream of Postgres.
0:32:02 So it knows everything that's going on in that database.
0:32:05 And then your client is talking to the Electric sync engine to
0:32:10 sync in whatever data you need.
0:32:12 And the way that's expressed what your client actually needs is through
0:32:17 this concept that you call shapes.
0:32:19 And my understanding is that a shape basically defines a subset
0:32:23 of data, a subset of a table that you want in your client.
0:32:28 Since often like tables are so huge and you just need a particular
0:32:32 subset for your given user, for your given document, whatever.
0:32:38 The role of Shapes
0:32:38 Yeah, that's just exactly how it works.
0:32:40 And
0:32:41 the Electric Sync Engine, it's a web service.
0:32:44 It's a Docker container, like technically it's an Elixir application.
0:32:47 And it just connects to your Postgres as a normal Postgres client would.
0:32:52 So you have to run your Postgres with logical replication enabled.
0:32:57 And then we just connect in over a database URL.
0:32:59 And so it's just as if you were like, imagine you're deploying a Heroku app,
0:33:03 and it's sort of Heroku Postgres, and it just provisions a database URL, and your
0:33:06 back end application can connect to it.
0:33:08 So it's the same way that a sort of Rails app would talk to Postgres.
0:33:12 And then Electric does some stuff internally to sort of route data into
0:33:16 these shape logs, which are the sort of logs of update operations for each
0:33:21 kind of unit of partial replication.
0:33:23 And then we actually just provide an HTTP API, which is quite key to a whole
0:33:28 bunch of the affordances of the system.
0:33:31 So I can dive into that if it's interesting.
0:33:33 But then, yeah, you basically have a client, which pulls data
0:33:37 by just making HTTP requests.
0:33:39 And so HTTP gives you back pressure and the client's in control of
0:33:44 which data it pulls when, and then how you process that stream.
0:33:48 Yeah, we do provide some primitives to make it simple.
0:33:51 Like we give you React hooks to just sort of bind a shape to a state variable,
0:33:55 but basically, you can do what you like with the data as it streams in.
0:33:59 So, yeah, I would love to learn more about that design decision of choosing HTTP
0:34:03 for that network layer, for that API.
0:34:05 Since I think most people think about local-first, think about real time
0:34:10 syncing, et cetera, that reactivity.
0:34:13 And for most people, I think particularly in the web, the mind goes to web sockets.
0:34:17 So why HTTP?
0:34:19 Wouldn't that be very inefficient?
0:34:21 How does reactivity work?
0:34:23 Can you walk me through that?
0:34:25 Why using HTTP for network layer?
0:34:25 Yeah, so.
0:34:26 I mean, exactly.
0:34:27 We, went on that journey with the product where with the earlier, slightly more
0:34:30 ambitious Electric that I was describing, we built out a custom binary WebSocket
0:34:36 protocol to do the replication, and it's just what you sort of immediately
0:34:39 think you're like, let's make it efficient over the wire and obviously
0:34:41 it should be a WebSocket connection because you're just having these sorts
0:34:44 of ongoing data streams, but, So one of the things that happened with the,
0:34:48 focusing of the product strategy was that, Kyle Matthews joined the team.
0:34:52 So Kyle was actually the founder of Gatsby, which is like the React framework.
0:34:57 And through Gatsby, he did a lot of work around basically data
0:35:01 delivery into CDN infrastructure.
0:35:04 And so one of the insights that Kyle brought into the team was if
0:35:08 we re-engineered the replication protocol on plain HTTP, and we just
0:35:13 do like plain HTTP, plain JSON.
0:35:16 And we replicate over an old fashioned long polling protocol.
0:35:20 So you just, basically we have a model where the client makes a request to a
0:35:24 shape endpoint, and then we just return the data that the server knows about.
0:35:28 So we'll sort of chunk it up sometimes over multiple requests, but it's
0:35:31 just a standard, like, load-a-JSON-document request.
0:35:35 And then once you get a message to say that the client is up to date
0:35:38 with the server, then you trigger into a long polling mode where basically
0:35:41 the server holds the connection open until any new data arrives.
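The catch-up half of the long-polling model described above can be sketched as a small loop. The message format, the `offset` parameter, and the `upToDate` flag are illustrative assumptions, not Electric's exact wire protocol; the transport is injected so the loop itself stays easy to test.

```typescript
// Sketch of the initial-sync half of a long-polling replication protocol:
// page through the log until the client is up to date, then hand back the
// offset to long-poll from. Field names are illustrative assumptions.
type Batch = { messages: unknown[]; nextOffset: string; upToDate: boolean }
type FetchBatch = (offset: string) => Promise<Batch>

async function catchUp(
  fetchBatch: FetchBatch,
  onMessages: (msgs: unknown[]) => void
): Promise<string> {
  let offset = '-1' // conventionally: start from the beginning of the log
  for (;;) {
    const batch = await fetchBatch(offset)
    onMessages(batch.messages)
    offset = batch.nextOffset
    // Once caught up, the client would switch to live mode, where the
    // server holds each request open until new data arrives.
    if (batch.upToDate) return offset
  }
}
```

In a real client, `fetchBatch` would be a plain `fetch` against the shape endpoint, which is exactly what lets these responses be served from a CDN.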
0:35:45 And yes, you kind of think instinctively like, okay, it's say JSON instead of
0:35:50 binary, so it'll be less efficient and you're having to make these
0:35:52 sort of extra requests that surely they add latency over some sort of
0:35:56 more optimized, WebSocket protocol.
0:35:58 But the key thing is that by doing that, it allows us to deliver the data
0:36:02 through existing CDN infrastructure.
0:36:05 So those initial data loading requests, like typically when you're building
0:36:10 applications on this shape primitive, you can find ways of defining your shapes
0:36:15 so that they're shared across users.
0:36:16 You might have some unique data that's unique to a user, but like say you have a
0:36:21 project management app and there's various users who are all in the same project,
0:36:24 you could choose to like sync the kind of project data down rather than just
0:36:28 sort of syncing all the user's data down.
0:36:30 And so that way you get shapes being shared across users.
0:36:33 And so the first user to request it hits the Electric service, we
0:36:37 generate these responses, but then they go through Cloudflare or Fastly
0:36:41 or CloudFront or what have you.
0:36:43 And every subsequent request is just served out of like
0:36:46 essentially Nginx or Varnish.
0:36:48 And so it's just super efficient.
0:36:50 All of this infrastructure is just like super battle tested
0:36:52 and as optimized as it can be.
0:36:54 That is very interesting.
0:36:56 It reminds me a little bit of like how modern bundlers, and I think even like
0:37:00 all the way back to Webpack, used to split up larger things into little chunks.
0:37:06 And those chunks would be content hashed.
0:37:08 And that would be then often, be cached by the browser across
0:37:12 different versions of the same app.
0:37:15 In this case, it would be beneficial to the individual user who would reload it.
0:37:20 And also of course, like to other people who visit this, but now you
0:37:24 take the same idea, even further and apply it to data shared across users
0:37:29 by applying the same infrastructure, HTTP servers, CDNs, et cetera, to make,
0:37:35 things cheaper and faster, I guess.
0:37:38 Well, and the local browser or client cache as well.
0:37:41 So you have this sort of shared caching within a CDN layer where you
0:37:45 might have multiple clients, where, like, literally it's a sort of
0:37:48 shared cache in the HTTP cache control sense.
0:37:50 That makes a lot of sense.
0:37:50 Since like, on a website level, I'm not sure whether you
0:37:53 have clear caching semantics.
0:37:55 I don't think so.
0:37:57 Yeah, you'd have to do some very sort of custom stuff to
0:37:59 sort of achieve the same things.
0:38:01 But also because, so with the browser, when you're loading data, like HTTP
0:38:05 requests with the right cache headers can just be stored in the local file cache.
0:38:09 So one of the really nice things with just, like loading shape data
0:38:12 through the Electric API is you can achieve an offline capable app without
0:38:16 even having to implement any kind of local persistence for the data
0:38:20 that's loaded into the file cache.
0:38:23 So that sort of model, if like say you've gone to a page and you've just
0:38:26 loaded the data through Electric, even if you didn't store the data, if you
0:38:30 navigate back to the same page, the data's just there out of the file cache.
0:38:34 So the application can work offline without even having
0:38:37 any kind of persistence.
0:38:38 So you almost get like, I mean, there's some sort of edge cases on this stuff,
0:38:41 but it's the thing, because you're just working with the standard primitives,
0:38:44 you've just got the integration with the existing tooling and you get a
0:38:47 whole bunch of these things for free.
0:38:49 That is very elegant, and I guess that is being unlocked now because, like,
0:38:54 you embrace the semantics of change, of like how the data changes. And by
0:39:00 modeling, and this is where it now gets relevant again why everything here is
0:39:04 modeled as a log under the hood, since like to the log you just append, and so
0:39:08 you can safely cache everything that has happened up until a point in time,
0:39:12 and from there on, you just add things on top, but that doesn't make the stuff
0:39:16 that has happened before less valid.
0:39:18 So you can cache it immutably.
0:39:20 That makes it super fast.
0:39:21 You can cache it everywhere on the edge, on your local device, et cetera.
0:39:25 And that gives you a checkpoint that at least at one point in time was
0:39:31 valid, and now there might be more stuff that should be applied on top of
0:39:34 it, but that's already a better user experience than not getting anything.
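The caching policy this append-only structure makes possible can be sketched as a tiny header-selection function. The header values are illustrative, not Electric's actual ones: the point is that closed chunks of the log never change, so they can be cached immutably, while only the live tail needs revalidating.

```typescript
// Sketch of the caching split an append-only log allows: a chunk of the
// log that is closed will never change, so it can be cached forever at
// every layer; only the live tail needs re-checking. Header values are
// illustrative assumptions, not Electric's actual ones.
function cacheControlFor(chunk: { closed: boolean }): string {
  return chunk.closed
    ? 'public, max-age=31536000, immutable' // historic chunk: cache forever
    : 'public, max-age=0, must-revalidate'  // live tail: always revalidate
}
```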
0:39:38 I mean, another thing is like the operational characteristics of the
0:39:41 system, for this type of sync technology.
0:39:44 So, for instance, again, comparing HTTP with WebSockets, like
0:39:47 WebSockets are stateful, and you do just keep things in memory.
0:39:51 And so, if you look across most real time systems, they have scalability
0:39:55 limits, because you will come to the point where if you have, say, 10,000
0:39:57 concurrent users, it's almost like, you know, it's like the thing of don't have
0:40:01 too many open Postgres connections.
0:40:03 But if you're holding open 10,000 WebSockets, you may be able to do the
0:40:07 IO efficiently, but you will ultimately be growing that kind of memory and
0:40:11 you'll hit some sort of barrier.
0:40:12 Whereas, with this approach, you can basically offload that
0:40:15 concurrency to the CDN layer.
0:40:17 So, it's not just about basically taking away the query workload of the
0:40:23 cached initial sync requests, but these kind of reverse proxies or CDNs have
0:40:27 a really nice feature called request collapsing or request coalescing, which
0:40:31 means that when they have a cache miss on a URL, if they have
0:40:36 two clients making a request to the same URL at the same time, they sort of hold
0:40:41 both of them at the cache layer and only send one request onto the origin server.
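Request collapsing can be sketched as a tiny in-process version of what the CDN does: concurrent requests for the same URL share a single in-flight upstream call. This is an illustration of the concept, not CDN configuration.

```typescript
// Minimal sketch of request collapsing / coalescing: concurrent requests
// for the same URL share one in-flight upstream fetch, the way a CDN
// holds clients at the cache layer and sends one request to the origin.
function makeCoalescer<T>(upstream: (url: string) => Promise<T>) {
  const inFlight = new Map<string, Promise<T>>()
  return (url: string): Promise<T> => {
    let p = inFlight.get(url)
    if (!p) {
      // First requester triggers the upstream call; later requesters
      // arriving before it settles just share the same promise.
      p = upstream(url).finally(() => inFlight.delete(url))
      inFlight.set(url, p)
    }
    return p
  }
}
```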
0:40:45 And so basically we've been able to scale out now to 10 million concurrent clients
0:40:51 receiving real time data out of Electric on top of a single Postgres.
0:40:56 And there is literally no CPU overhead on the Postgres or the Electric layer.
0:41:01 It's just entirely handled out of the CDN serving.
0:41:05 And so it's sort of remarkable, the combination of the initial data
0:41:09 load caching, which means that, like, one of our objectives is we want to be
0:41:13 as fast as just querying the database directly for an initial data load
0:41:17 and then orders of magnitude faster for any subsequent
0:41:21 requests coming out of the cache. But there's also this sort of challenge,
0:41:25 almost like this thing about saying, okay, you're building an application.
0:41:29 You maybe want some of the user experience or developer experience
0:41:32 affordances of local-first, but if to do that, I need a sync engine and a
0:41:36 sync engine is kind of a complex thing.
0:41:39 And so you end up either going, okay, maybe I'll sort of use an external system.
0:41:44 And then you get like a siloed real time database next to your main database
0:41:47 and you get operational complexity, or you get some sort of system where
0:41:51 you have, yeah, you're basically sort of stewarding these WebSockets and
0:41:54 it's very easy for it to fall over.
0:41:56 And I think actually, like, if you just sort of honestly view that
0:42:00 type of architectural decision from the lens of like somebody trying to
0:42:04 build a real project, which is their day job, trying to get stuff done.
0:42:08 You're just going to avoid that as much as you can, because like
0:42:10 you'd far rather just like, I just want to serve this with Nginx.
0:42:13 I know how that's going to work.
0:42:14 I'm not going to stay up at night worrying about it.
0:42:17 Whereas if I have 10,000 concurrent users going through some crazy WebSocket stuff,
0:42:20 I'm going to get pager alerts.
0:42:22 And so like the whole approach here with what we're trying to do is to
0:42:26 change that sense that sync is a complex technology that you sort of
0:42:31 play with on the weekend and only adopt when you have to.
0:42:34 So going, look, you can actually do sync in such a way that it is
0:42:37 just as simple and standard as normal web service technology.
0:42:41 And then suddenly you can actually unlock the ability for kind of real
0:42:44 projects, you know, you can take this stuff into a day job and not get it
0:42:47 shouted down at the design meeting,
0:42:49 because it just feels like too much black box complexity.
0:42:52 You're using the word simple here.
0:42:54 And I think that really speaks to me now, because it's both simple in terms of
0:43:01 architecturally, like, how does data flow?
0:43:04 So I think this is where Electric provides a very simple and, I think,
0:43:09 easy to use and easy to work with trade off, like, how does data flow,
0:43:14 but then it also gives a very simple answer of like, how does it scale?
0:43:19 Since you can throw at it like all the innovations and all the hard
0:43:23 work that has now gone into our web infrastructure over the last
0:43:27 decades. You can run on the latest and greatest, and all the innovations that
0:43:33 Nginx and HAProxy and Cloudflare, like all the work that has gone into that.
0:43:39 You can just piggyback on top of that without having to innovate on the
0:43:44 networking side as well, since like you, you're really doing the hard work
0:43:48 on the more semantic and data side.
0:43:51 And that's a really, really elegant trade off to me.
0:43:54 Yeah.
0:43:54 And it's fun because, like, in our benchmark testing at the
0:43:56 moment, we break Cloudflare before we break Electric.
0:43:59 If something is battle tested, it's Cloudflare.
0:44:02 And again, it carries on, because it's not just about this sort of
0:44:05 scalability or operational stuff.
0:44:06 It's also about then how you can achieve, like we talked about the write patterns.
0:44:10 And so this sort of pattern of how do you do writes?
0:44:12 And it's like, well, actually you can do the sync like this, use
0:44:14 your existing API to do writes.
0:44:16 And it can work with your existing stack.
0:44:19 But you have other obvious concerns with this type of architecture, like
0:44:22 say, authentication, authorization, data security, encryption.
0:44:27 But HTTP
0:44:29 just has proxies and it works with the sort of middleware stack.
0:44:33 And so for us, a shape endpoint as a sync endpoint is just an HTTP resource.
0:44:40 So if you want to just put like an authorization service in front of it,
0:44:43 you just proxy the request through and you like, you have the context from
0:44:47 the user, you can have the context about the shape and you can just
0:44:49 authorize it using your existing stack.
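The proxy pattern described here can be sketched as a small gatekeeper function: given the user's context and the requested shape, decide whether to forward the request upstream. The table names and the `table`/`where` parameters are hypothetical illustrations, not Electric's prescribed scheme.

```typescript
// Illustrative gatekeeper for a shape-proxy endpoint: check the user's
// context against the requested shape before forwarding the request to
// the sync service. Parameter names ('table', 'where') and the table
// names are assumptions for the sake of the sketch.
type User = { id: string; orgId: string }

function authorizeShapeRequest(user: User, params: URLSearchParams): boolean {
  const table = params.get('table')
  // Only allow tables this app actually exposes to clients.
  if (table !== 'projects' && table !== 'issues') return false
  // Require the shape's filter to be scoped to the user's own org.
  return params.get('where') === `org_id = '${user.orgId}'`
}
```

In a real deployment this check would run in your existing API middleware; if it passes, the proxy streams the upstream HTTP response back to the client unchanged.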
0:44:52 If you want to do encryption, then you can do that.
0:44:54 It's just a stream of messages.
0:44:55 And yeah, a bit like you were saying that, like with Electric, you could
0:44:58 just use it as a transport layer to like, say, route a log of messages.
0:45:03 That can be ciphertext or plaintext.
0:45:05 So you could just like encrypt on device, sync it through.
0:45:08 You can just decrypt whenever you're consuming the stream.
0:45:11 And again, you could do that, like in the client, you could
0:45:13 do that in HTTP middleware.
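Treating the synced log as opaque ciphertext can be sketched with Node's built-in crypto: encrypt each message on device, sync the ciphertext through, and decrypt wherever the stream is consumed. This is a generic AES-GCM round trip, not anything Electric ships.

```typescript
// Sketch of encrypting sync messages on-device with AES-256-GCM so the
// sync layer only ever sees ciphertext. Uses Node's built-in crypto;
// key management is out of scope for this sketch.
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto'

function encryptMessage(key: Buffer, plaintext: string) {
  const iv = randomBytes(12) // must be unique per message
  const cipher = createCipheriv('aes-256-gcm', key, iv)
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()])
  return { iv, ciphertext, tag: cipher.getAuthTag() }
}

function decryptMessage(
  key: Buffer,
  msg: { iv: Buffer; ciphertext: Buffer; tag: Buffer }
): string {
  const decipher = createDecipheriv('aes-256-gcm', key, msg.iv)
  decipher.setAuthTag(msg.tag) // authenticate before trusting the payload
  return Buffer.concat([decipher.update(msg.ciphertext), decipher.final()]).toString('utf8')
}
```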
0:45:15 So a lot of the sort of concerns, which, like certainly our experience of trying
0:45:20 to build a more integrated end to end local-first stack, you know, you go,
0:45:24 okay, we need to solve this.
0:45:25 I need a security rule system, because suddenly there is no API and how am
0:45:29 I going to authorize the data access?
0:45:30 And it's like, we don't need a security rule system.
0:45:33 Because you can just use normal API middleware
0:45:37 in front of an HTTP service.
0:45:39 And so you just sort of take that problem out of scope and like the
0:45:42 system doesn't need to do encryption.
0:45:44 It doesn't need to provide like a kind of hooks mechanism or some sort
0:45:47 of framework extensibility, because the protocol is extensible and
0:45:51 you just have all of this ecosystem of existing tooling built around it.
0:45:55 So it is, I mean, it's been fantastic for us because it
0:45:59 simplifies all of these aspects.
0:46:01 And allows us to go, look, this is how you can achieve, say
0:46:03 authorization with Electric, but again, it pushes it out of scope.
0:46:07 So we get to focus our engineering resources on just doing the core stuff
0:46:11 to deliver on this core proposition.
0:46:13 So which sort of things would you say are particularly tricky from an
0:46:19 application perspective with Electric, where it might be not as much of a good fit?
0:46:23 I think one of the things is that we sync through the database, and that has latency.
0:46:31 And so if you're trying to craft a really low latency real time multiplayer
0:46:36 experience, like, or even doing things where in a way it doesn't really
0:46:41 make sense to be synchronizing that information through the database layer,
0:46:46 then it's maybe not the best solution.
0:46:49 So sort of for like presence features, let's say in Figma, where
0:46:54 you see my mouse cursor moving around, those sort of things.
0:46:57 Yes, it would be nice if it was in real time shared across the various
0:47:01 collaborators, but you don't need a persistent trace of that for
0:47:05 eternity in your Postgres database.
0:47:07 So I think a common approach for that as well is just to have like
0:47:11 two kinds of different channels for how your data flows, like your
0:47:15 persisted data that you want to actually keep around as a fixed trail.
0:47:19 Like, did I create this GitHub issue or not?
0:47:22 But like how my mouse cursor has moved around, it's fine that that's
0:47:26 being broadcasted, but if someone opens it an hour later, it's fine
0:47:30 that that person would never know.
0:47:32 So for this sort of use case, it's overkill, basically,
0:47:38 to pipe that through Postgres.
0:47:38 Yeah.
0:47:39 And you know,
0:47:39 for us, Postgres is a big qualifier.
0:47:41 It's like, if you want to use Postgres, if you have an existing Postgres
0:47:46 backed system, like Electric shines where like, yeah, you have, you already use
0:47:51 Postgres or you know that you want to be using Postgres, maybe you already have a
0:47:54 bunch of integrations on the data model already, maybe you do have existing API
0:47:58 code, like this is the scenario where we're really trying to say, well, look,
0:48:02 in that scenario, this is a great pathway to move towards these more advanced
0:48:07 local-first sync based architectures. Whereas if you look at it from a
0:48:11 sort of more greenfield development point of view, and you're trying to craft a
0:48:15 particular concurrency semantics, say, you would reach for Automerge and you
0:48:20 would get custom data types, which you can craft advanced kind of invariant
0:48:24 support with your kind of data types.
0:48:27 But of course, you know, so that's a slightly different sort of world.
0:48:30 And I think probably a lot of people in the local-first
0:48:35 space dive into CRDTs and so forth, you know, it's really fascinating
0:48:39 to try to sort of craft these sort of optimized, kind of presence-style,
0:48:44 immediate real time streaming experiences.
0:48:47 And so whilst we do real time sync, it's almost more about keeping the data fresh
0:48:52 and just sort of making sure that the clients are sort of eventually consistent
0:48:56 rather than making that more sort of game kind of experience where, you
0:49:00 know, where maybe peer to peer matters more, or finding clever hacks to have
0:49:03 very low latency kind of interactions.
0:49:06 PGlite
0:49:06 That makes a lot of sense.
0:49:07 So now we've talked a lot about Electric and Electric is the name of the company.
0:49:12 It's the name of your main product.
0:49:14 But there's also been a project that I'm not sure whether you
0:49:17 originally created, but it's certainly in your hands at this point.
0:49:21 It's called PGlite.
0:49:23 That made the rounds on Hacker News, etc.
0:49:26 Also through a joint launch with the folks at Supabase.
0:49:29 What is PGlite?
0:49:31 What is that about?
0:49:33 Yeah, so I mean, interestingly with Electric, we started off building
0:49:37 a stack which was syncing out of Postgres into SQLite, because it
0:49:42 made sense as the sort of main like embeddable relational database.
0:49:45 And I remember speaking to Nikita, who is the CEO at Neon, the Postgres database
0:49:50 company, and some of his advice from building SingleStore, or MemSQL, was that the
0:49:57 impedance, or the mismatch between the two database systems and the data type systems,
0:50:02 will continue to just be a source of pain for as long as you build that system.
0:50:06 And so we were just having these conversations about going, how do we
0:50:09 make this Postgres to Postgres sync?
0:50:11 And then you can just eliminate any mismatch.
0:50:15 You just, you don't even need to do any kind of like serialization of the data.
0:50:19 You can just literally take it exactly as it comes out of like the binary
0:50:23 format that comes through in a query or the replication stream from Postgres,
0:50:26 put that into the client and like, you can have exactly the same data
0:50:29 types and exactly the same extensions.
0:50:31 So this was a sort of motivation for us.
0:50:32 And co-founder Stas, the CTO at Neon, had done an experiment
0:50:37 to try and make a more efficient Wasm build of Postgres that could
0:50:41 potentially run in the client.
0:50:43 So previously there'd been some really cool work by Supabase, by Snaplet, a
0:50:47 few teams, which had developed these sorts of VM based Wasm Postgreses.
0:50:52 But they were pretty big.
0:50:53 They didn't really have persistence.
0:50:54 They were sort of more of a kind of proof of concept.
0:50:57 And the approach that Stas took was to do a pure Wasm build and
0:51:02 run Postgres in single user mode.
0:51:04 And that allowed you to basically remove a whole bunch of the concurrency
0:51:09 stuff within Postgres, which allowed us to make a much, much smaller build.
0:51:13 So they shared that repo.
0:51:15 And we sort of played with it for a little while.
0:51:18 Didn't quite manage to kind of make it work.
0:51:20 And then one of the guys on our team, Sam Willis, just picked it up one week
0:51:23 and put in some concerted efforts and basically managed to pull it together
0:51:27 with persistence as a three meg build.
0:51:30 And it worked, and so suddenly we had this project which was like a three meg
0:51:34 build. SQLite, for context, is like a one meg Wasm build, and Postgres is a much
0:51:39 kind of larger system, so you'd think it would be much bigger, but suddenly
0:51:41 actually it's not that far off in terms of the download speed, and it could just
0:51:46 run as a fully featured Postgres inside the browser.
0:51:48 And so we sort of tweeted that out and it's gone a bit crazy.
0:51:50 I think it's like, it's the fastest growing database project ever on GitHub.
0:51:54 It's like 250,000 downloads a week nowadays.
0:51:57 There's lots and lots of people using it.
0:51:59 Supabase are using it in production.
0:52:00 Google are using it in production.
0:52:02 Lots of people are building tooling around it, like Drizzle integrations, et cetera.
0:52:06 And it's the sort of thing that just should exist, right?
0:52:08 There should be a Wasm build of Postgres. Just being able to have
0:52:11 the same database system instead of mapping into an alternative one
0:52:15 has these fundamental advantages, and also a lot of people have just been
0:52:21 coming up with like a whole range of interesting use cases for it as a project.
0:52:25 So some people are interested in running it inside edge workers
0:52:28 as a sort of data layer that you can hydrate data into
0:52:31 for kind of background jobs.
0:52:33 Some people are interested in running it as just like a development database.
0:52:37 So you can just npm install Postgres.
0:52:38 And if you're running like an application stack, you don't have to
0:52:41 run Postgres as an external service.
0:52:43 The same thing in your testing environment.
0:52:46 So there's a whole bunch of different use cases.
0:52:48 And in fact, like some of the work, for instance, that Supabase have
0:52:51 done is they built a very cool project called database.build,
0:52:55 which is a sort of AI driven, database backed application builder.
0:53:00 So it's a sort of AI app builder for building Postgres backed
0:53:02 applications, and it just runs purely on PGlite in the client.
0:53:07 And so that's a demonstration of where
0:53:09 this sort of database infrastructure for running software is going: you had
0:53:13 centralized databases, and then you had this sort of move to serverless
0:53:16 with separation of compute and storage.
0:53:18 And now you sort of have this model where actually you can run the compute,
0:53:21 with a whole range of different storage patterns in the client.
0:53:24 And you don't even need to deploy any infrastructure on the server
0:53:28 to run database driven applications.
0:53:30 It really reminds me of that time when JavaScript was
0:53:34 getting more and more serious.
0:53:35 And at some point there was Node.js, and suddenly you could run the same sort of
0:53:40 JavaScript code that you were running in your browser, now also on the server.
0:53:45 And well, the rest is history, right?
0:53:47 Like that changed the web forever.
0:53:50 It has changed dramatically how JavaScript has just become
0:53:54 the default full-stack foundation for almost every app these days.
0:53:59 And there seem to be a lot of similar characteristics.
0:54:02 This time it's the other way around, going from the server into the client;
0:54:07 with Node it was rather the other way around. But that seems like a huge deal.
0:54:11 Yeah, you know, you sort of step forward and we sort of see, I guess, some
0:54:15 of these trends in data architecture, and, you know, it can just
0:54:19 be the same database everywhere.
0:54:20 And in a way, it's just almost logically extended to wherever you want.
0:54:23 And you can just have this idea of
0:54:28 declarative configuration of what data should sit where.
0:54:31 AI systems can optimize transfer and placement, and it is just
0:54:35 all the same kind of data types.
0:54:37 and I think this is sort of where systems are moving to, but also
0:54:40 just some of these things we've been learning with PGlite. For
0:54:44 instance, if you're running a system that relies on having, say, a database
0:54:48 behind your application, and say it's a SaaS system and you're spinning up some
0:54:51 infrastructure for a client: with PGlite, you don't necessarily need to spin up a
0:54:55 database in order to serve that client.
0:54:57 So if you think about something like the free tier of a SaaS platform like that,
0:55:01 it can just change the economics of it.
0:55:04 It can do that on the server by just allowing you to have
0:55:06 Postgres in-process,
0:55:08 so you're not deploying additional infrastructure.
0:55:10 But you can also move it all the way into the client, and there just is
0:55:13 no compute running on the server.
0:55:15 It just moves even more of the compute onto the client.
0:55:18 And I think it obviously aligns with sort of local-first in
0:55:21 general, but I know some of the stuff we've talked about before around the
0:55:24 concept of local-only first.
0:55:27 And as a developer experience for building software: one of the
0:55:30 things that LiveStore is specifically designed to support is this ability
0:55:35 to build an application locally with very fast feedback and iteration.
0:55:40 And then you progressively add on, say, sync or persistence and
0:55:43 sharing and things when you need to.
0:55:45 And I think this sort of model of being able to build the software on
0:55:48 a database like PGlite and then go, okay, I've played with this enough,
0:55:52 I want to save my work.
0:55:53 And it's at that point that you write out to blob storage, or you
0:55:57 maybe provision the database to be able to save the data into.
0:56:00 Yeah, I think you've touched on something really interesting and something really
0:56:04 profound, which I think is kind of two second order effects of local-first.
0:56:09 And so one of them is for the app users directly.
0:56:13 So ideally it should just become so cheap and so easy to offer the full
0:56:19 product experience, as sort of a taste, fully on the client, that it is
0:56:24 no longer sitting behind a paywall.
0:56:26 And if the product experience generally allows for that, if it's sort of
0:56:30 a note-taking tool or something like that, then I should be able to
0:56:35 fully try out the app on my device and do the signup later.
0:56:41 And being able to offer that economically:
0:56:44 with those new technologies, that's basically no longer
0:56:47 an argument, so you can offer it.
0:56:49 So hopefully that will be a second order effect where software is way easier to
0:56:54 offer, where it's way easier to just try it out from an end user perspective.
0:56:59 But then also from the second point, from an application developer
0:57:04 perspective, I think it makes a huge difference in terms of complexity.
0:57:08 How, when you build something, whether it is just a local script
0:57:12 without any infrastructure dependencies, you can just run it;
0:57:16 maybe you run your Vite dev server,
0:57:22 and that's it.
0:57:22 It's self-contained and you can move on.
0:57:25 There's no Docker thing you need to start, et cetera.
0:57:29 That's like your starting point.
0:57:31 And if the barrier to entry there, if like, if that threshold is lower,
0:57:35 that you can build a fully functional thing just for yourself, just in that
0:57:39 local session, and you can get started this way, and if you then see like,
0:57:44 Oh, actually, there's a case here that I want to make this a multiplayer
0:57:48 experience or a multi tenant experience, then you can take that next step.
0:57:53 But right now, you can't really leap ahead there.
0:57:56 You need to start from that multi tenant, that multi player experience,
0:58:00 and that makes the, the entry point already so much more tricky that many
0:58:04 projects are never getting started.
0:58:06 And I think both of those can be second-order effects and
0:58:10 improvements that local-first-inspired architectures and software can provide.
0:58:16 So, I love those observations.
0:58:18 Yeah, yeah, totally.
0:58:20 And I mean, I think, for instance, it's interesting as well that a
0:58:23 lot of people do define their database schema using tools like Prisma, Drizzle;
0:58:29 Effect Schema is a great example, which obviously you're working on.
0:58:33 The more layers of indirection there are between where you're, say, iterating on the
0:58:37 user experience in the interface and where you customize
0:58:40 the data model, the harder it is to iterate there quickly.
0:58:44 If you have to go all the way into some other language, another
0:58:47 system, it just sort of takes you out of context and slows everything down.
0:58:50 So it's somehow the ability to apply that sort of schema to
0:58:54 the local database and not have to work against these sort of different
0:58:59 legacy layers of the stack in order to actually be able to build out the application.
0:59:03 The relation between Electric and PGlite
0:59:03 So going back to PGlite for a moment: how do PGlite and Electric, Electric
0:59:09 as a product and Electric as a company, how do those things fit together?
0:59:14 Yeah.
0:59:14 I mean, we basically have two main products.
0:59:19 They're both open source, Apache licensed.
0:59:22 One is the Electric Sync Engine, and one is PGlite.
0:59:26 And so you can use them together, or you can just use them independently,
0:59:31 so it's not like the Electric system is designed only to sync into PGlite,
0:59:35 you don't have to have an embedded Postgres to use Electric, and
0:59:38 you can use PGlite just standalone.
0:59:41 There's a range of different mechanisms to do things like data
0:59:44 loading, data persistence, et cetera, virtual file system layers,
0:59:48 loading in, unpacking Parquet files.
0:59:51 But if you do like have an application with this local database and you wanted to
0:59:56 then be able to sync that data with other users or into your Postgres database,
0:59:59 then Electric is just a great fit.
1:00:01 And obviously we make a kind of first class integration.
1:00:04 So I think for us, I mean, as a, as a company, as a startup, Electric is the
1:00:09 main product that we aim to build the business around, because in a way that
1:00:14 type of operational data infrastructure is just slightly more natural to build
1:00:18 a commercial offering around, like you have to run servers to move the data
1:00:21 around, we can do that efficiently, it sort of makes sense and adds value.
1:00:25 Whereas with PGlite, as an open-source embedded database, it's not
1:00:29 something that we're aiming to sort of monetize in quite the same way.
1:00:32 And potentially, maybe it could be upstreamed into Postgres; like, you know,
1:00:37 there should be a Wasm build of Postgres.
1:00:39 Or, you know, maybe it moves into a foundation and sort
1:00:42 of develops more governance. Certainly already with PGlite,
1:00:47 Supabase co-sponsored one of the engineering roles with
1:00:50 us, and there have been contributions from a whole bunch of companies.
1:00:53 So it is already quite a wide effort in terms of
1:00:56 the stakeholders who are stewarding the development of the project.
1:01:00 That is very cool to see.
1:01:01 I'm a big fan of those sort of like multi organizational approaches where you
1:01:06 share the effort of building something.
1:01:09 And, yeah, I love that.
1:01:11 I'm very excited to get my own hands on PGlite as well.
1:01:14 I'm mostly dealing with SQLite these days just because I think it is
1:01:18 still a tad faster for those single-threaded embedded use cases.
1:01:23 But if you need the raw power of Postgres, which often you do, then
1:01:27 you can just run it in a worker thread and you get the full power of Postgres
1:01:31 in your local app, which is amazing.
1:01:34 So maybe rounding out this conversation on something you just touched on,
1:01:38 which is a potential commercial offering that Electric provides.
1:01:42 can you share more about that?
1:01:47 Electric commercial offering
1:01:47 Yep, so we're building, a cloud offering, which is basically
1:01:51 hosting the Electric sync service.
1:01:53 So, for instance, we don't host the Postgres database.
1:01:57 We don't host your application.
1:01:59 We just sort of host that kind of core sync layer, and then that can integrate
1:02:03 with other Postgres hosts like Supabase, Neon, et cetera, and kind of other
1:02:07 platforms for deploying applications.
1:02:09 that's our sort of first commercial offering.
1:02:12 And we sort of see that as almost a utility data infrastructure
1:02:17 play, where we've put a lot of effort into being able to run the software
1:02:22 very resource-efficiently, and with sort of flat resource usage, so
1:02:26 it doesn't, you know, scale up in memory with concurrent users, etc.
1:02:30 So we want to be able to run that very efficiently.
1:02:32 And so we sort of see that as kind of low-cost, usage-based pricing,
1:02:36 based basically on the data flows running through the software.
1:02:39 I think, you know, monetizing open source software is
1:02:43 an interesting topic, but there are a lot of
1:02:45 common patterns that are well known.
1:02:47 And, like, ultimately our aim as a company is: we want people building real
1:02:54 applications with this technology, and we want developers to enjoy doing it
1:02:58 and become advocates of the technology.
1:03:01 And then, there is a pathway when, imagine that you're a large company
1:03:05 and say you have like five projects and they're all using Electric sync.
1:03:09 It's very common for those sort of larger companies to need
1:03:12 additional tooling around that.
1:03:13 So governance, compliance, data locality.
1:03:17 There's a whole bunch of sort of considerations there.
1:03:19 So, it's quite common to be able to build out a sort of enterprise offering
1:03:22 on top of the core open source product.
1:03:25 And so, you know, there are various routes like that, that we
1:03:27 could choose to pursue in future.
1:03:29 and maybe that's how it plays out as we build a cloud, we focus on, making
1:03:33 this sync engine and these components bulletproof, make sure people are being
1:03:37 successful building applications on them.
1:03:39 And then we can look at maybe some sort of value-added tooling to help you
1:03:42 operate them successfully at scale, or help you operate them within larger organizations.
1:03:48 Outro
1:03:49 That makes a lot of sense.
1:03:50 Great.
1:03:51 James, is there anything that you would want from the audience?
1:03:55 Anything that you want to leave them with?
1:03:57 anything to give a try over the next weekend?
1:03:59 The holidays are upon us.
1:04:01 what should people take a look at?
1:03:59 Yeah, I know that you may be listening to this at any time in the
1:04:05 future, but we're recording this in the lead-up to December.
1:04:09 So if you have some time to experiment with tech over the holiday period,
1:04:12 just take a look at Electric.
1:04:14 you know, it's ready for production use.
1:04:16 It's well documented.
1:04:17 There's a whole bunch of example applications.
1:04:19 So there's a lot that you can get stuck into there.
1:04:21 So please do come along and check it out; our website is electric-sql.com.
1:04:26 we have a Discord community.
1:04:28 There's about 2000 developers in there.
1:04:30 So that's linked from the site.
1:04:32 we're on GitHub at, Electric SQL.
1:04:34 so you can see the Electric and the PGlite repos there.
1:04:37 and so those are the kind of the main things.
1:04:39 And if you're interested, for instance, in building applications, we already
1:04:43 have a wait list for the new cloud service, and we're starting now to
1:04:46 work with, some companies to help manually onboard them onto the cloud.
1:04:50 So if a cloud offering for hosted Electric is important, let us know,
1:04:54 and there's a pathway there to work with us if you're interested in being
1:04:57 an early adopter of the cloud product.
1:04:59 But also just, we spend a whole bunch of time talking to teams
1:05:02 and people trying to use Electric.
1:05:04 So our whole goal as a company is to help people be successful building on this.
1:05:09 And so if you've got questions about
1:05:11 how best to approach it, or challenges with a certain application architecture.
1:05:14 We're very happy to hop onto a call and chat stuff through.
1:05:16 So if you come into the Discord channel, say hi and just ask any questions, and
1:05:21 we're happy to help as much as we can.
1:05:22 That sounds great.
1:05:24 Well, I can certainly plus one that anyone who I've interacted with from your
1:05:28 company has been A, very helpful and B, very, very pleasant to interact with.
1:05:34 And also at this point, a big thank you to Electric, not just for building what
1:05:38 you're building, but also for supporting me and helping me build LiveStore.
1:05:42 You've been sponsoring the project for a little while as well, which I really
1:05:46 much appreciate, and there's actually a really cool Electric LiveStore syncing
1:05:51 integration on the horizon as well.
1:05:53 That might be, some potential topic for a future episode, but I think with
1:05:58 that, now we've covered a lot of ground.
1:06:00 James, thank you so much for coming on the podcast, sharing a lot of knowledge
1:06:05 about Electric and about PGlite.
1:06:07 thank you so much.
1:06:08 Yeah.
1:06:09 Thanks for having me.
1:06:10 Thank you for listening to the Local First FM podcast.
1:06:13 If you've enjoyed this episode and haven't done so already, please
1:06:16 subscribe and leave a review.
1:06:18 Please also share this episode with your friends and colleagues.
1:06:21 Spreading the word about this podcast is a great way to support
1:06:24 it and help me keep it going.
1:06:26 A special thanks again to Rocicorp and PowerSync for supporting this podcast.