localfirst.fm
December 3, 2024

#18 – James Arthur: ElectricSQL, read-path syncing, PGLite

Sponsored by PowerSync and Rocicorp

Transcript

0:00:00 Intro
0:00:00 I mean, another thing is like the operational characteristics of the
0:00:02 system, for this type of sync technology.
0:00:05 So comparing HTTP with WebSockets, like WebSockets are stateful, and
0:00:09 you do just keep things in memory.
0:00:11 If you look across most real-time systems, they have scalability limits because
0:00:16 you will come to the point where if you have, say, 10,000 concurrent users,
0:00:19 it's almost like the thing of don't have too many open Postgres connections.
0:00:22 But if you're holding open 10,000 WebSockets, you may be able to do the
0:00:26 IO efficiently, but you will ultimately be sort of growing that kind of memory
0:00:30 and you'll hit some sort of barrier.
0:00:31 Whereas, with this approach, you can basically offload that
0:00:34 concurrency to the CDN layer.
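The CDN offloading described above can be sketched as a plain HTTP polling loop: each request is an ordinary GET keyed by a log offset, so intermediate caches can absorb the concurrency instead of the origin holding thousands of open sockets. The endpoint path, query parameter, and types below are illustrative assumptions, not Electric's actual API.

```typescript
// A sync-log entry delivered by the (hypothetical) HTTP endpoint.
type LogEntry = { offset: number; data: unknown };

// Build the URL for the next poll from the last offset we have seen.
// Because the URL is a pure function of the offset, identical requests
// from many clients can be served from a CDN cache.
function nextRequestUrl(base: string, lastOffset: number): string {
  return `${base}/sync?offset=${lastOffset}`;
}

// Catch up with the log: each iteration is an independent, cacheable GET.
async function catchUp(
  base: string,
  onEntry: (e: LogEntry) => void
): Promise<number> {
  let offset = -1; // -1 = "from the beginning"
  while (true) {
    const res = await fetch(nextRequestUrl(base, offset));
    const entries: LogEntry[] = await res.json();
    if (entries.length === 0) return offset; // caught up with the log
    for (const e of entries) {
      onEntry(e);
      offset = Math.max(offset, e.offset);
    }
  }
}
```

The key design point is that no per-client state lives on the server between requests; the client carries its position in the log as the offset.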
0:00:37 Welcome to the localfirst.fm podcast.
0:00:39 I'm your host, Johannes Schickling, and I'm a web developer, a
0:00:42 startup founder, and love the craft of software engineering.
0:00:46 For the past few years, I've been on a journey to build a modern, high quality
0:00:50 music app using web technologies.
0:00:52 And in doing so, I've been falling down the rabbit hole of local-first software.
0:00:56 This podcast is your invitation to join me on that journey.
0:01:00 In this episode, I'm speaking to James Arthur.
0:01:03 Founder and CEO of Electric SQL, a Postgres centric sync
0:01:07 engine for local-first apps.
0:01:09 In this conversation, we dive deep into how Electric works and explore
0:01:14 its design decisions, such as read path syncing and using HTTP as a
0:01:18 network layer to improve scalability.
0:01:21 Towards the end, we're also covering PGLite, a new project by Electric
0:01:26 that brings Postgres to Wasm.
0:01:28 Before getting started, a big thank you to Rocicorp and PowerSync
0:01:32 for supporting this podcast.
0:01:34 And now, my interview with James.
0:01:37 Welcome James.
0:01:37 So good to have you on the podcast.
0:01:39 How are you doing?
0:01:40 Great.
0:01:41 Yeah, really good to be here.
0:01:42 Thank you for having me on.
0:01:43 So the two of us have known each other for quite a while already.
0:01:47 And to be transparent, the two of us have actually already done quite
0:01:51 a couple of projects together.
0:01:53 The one big one among them is the first Local-First Conference that we
0:01:57 organized together this year in Berlin.
0:01:59 That was a lot of fun.
0:02:00 But for those in the audience who don't know who you are, would
0:02:05 you mind introducing yourself?
0:02:07 So, my name is James Arthur.
0:02:09 I am the CEO and one of the co-founders of Electric SQL.
0:02:14 So, Electric is a Postgres sync engine.
0:02:18 We sync little subsets of data out of Postgres into wherever you
0:02:24 want, like local apps and services.
0:02:26 And we do also have another project which we developed called PGlite,
0:02:30 which is a lightweight WASM Postgres.
0:02:33 So we can sync out of Postgres in the cloud, into Postgres in the web browser,
0:02:39 or kind of into whatever you want.
0:02:40 Awesome.
0:02:41 So yeah, I want to learn a lot more about Electric as well as PGlite.
0:02:45 Maybe PGlite a little bit towards the end of this conversation.
0:02:49 So Electric, I've seen it a bunch of times.
0:02:53 I've been playing around with it, I think quite a bit last year, but
0:02:59 things seem to also change quite a bit.
0:03:01 Can you walk me through
0:03:03 what the history of the last couple of years has been as you've been
0:03:08 working on Electric, and help me form the right mental model about
0:03:15 Electric SQL?
0:03:15 Yeah, absolutely.
0:03:16 I think like Electric as a project, it started, in a way, building on a bunch
0:03:24 of research advances in distributed systems, CRDTs, transactional causal
0:03:29 consistency, a bunch of these primitives that a lot of people are building
0:03:33 off in the local-first space, which actually a bunch of people on our team
0:03:38 developed in the kind of research stage.
0:03:41 And we wanted to create developer tooling and a platform that allowed people
0:03:48 who weren't experts in distributed systems and didn't have PhDs in CRDTs to be able
0:03:53 to harness the same advances and build systems on the same types of guarantees.
0:03:58 So in a way, that's where we started from.
0:04:00 And we started building out on this research base into stronger consistency
0:04:05 models for distributed databases and doing sync, from like a central
0:04:11 cloud database out into whether it's to the edge or to the client.
0:04:16 And then we're a startup.
0:04:17 So like we built a small team and you go through this journey, building a
0:04:21 company of, you have ideas for what's going to be useful and valuable for
0:04:26 people, and you have a sense of sort of where the state of the art is and, what
0:04:29 doesn't exist yet, but as you then go and experiment, you just learn more and more.
0:04:33 And so you work out actually what people need and what
0:04:36 problems you can solve with it.
0:04:38 and so through that journey, we went from starting off thinking we were building
0:04:42 a next generation distributed database to using the replication technology
0:04:48 for that system behind existing open source databases like Postgres, SQLite,
0:04:53 into finding, local-first software as a pattern is really the killer app for
0:04:58 that type of replication technology.
0:05:00 So people looking to build local-first applications because of all of the
0:05:04 benefits around UX, DX, resilience, et cetera, but to do that, you
0:05:09 need this type of sync layer.
0:05:11 and then when we first focused on that, then we tried to build a
0:05:15 very optimal end to end integrated local-first software platform.
0:05:19 So for instance, if people saw Electric as a project, like this time last
0:05:23 year, that's what we were building.
0:05:25 And in a way we just found that we were having to solve too many problems and
0:05:30 there was too much complexity making a kind of optimal one-size-fits-all sort of
0:05:34 magic active active replication system.
0:05:37 We were doing things like, managing the way you did the database migrations
0:05:40 and schema evolution and generating a type safe client and doing the
0:05:44 client side reactivity as well as all this sort of core sync stuff.
0:05:47 So, as you know, there's a lot to that kind of end to end stack.
0:05:51 Because we had wanted to build a system that integrated with people's
0:05:55 existing software, like if you already had software built on Postgres or if
0:05:59 you already had a working stack, like building that sort of full system was
0:06:05 in a way sort of too complex and was difficult to adopt from existing software.
0:06:10 So more recently we have consolidated down on building a much simpler sync engine,
0:06:17 which is more like a composable tool that
0:06:20 you can run in front of Postgres, any Postgres.
0:06:23 It works with any standard Postgres, any managed Postgres, any data model, any
0:06:28 data types, any extensions that you have.
0:06:30 And it just does this work of basically consuming the logical
0:06:35 replication stream from Postgres.
0:06:37 and then managing the way that the data is fanned out to clients,
0:06:41 doing partial replication.
0:06:42 So, because when you're syncing out, say, if you have
0:06:44 a larger database in the cloud,
0:06:47 and you're syncing out to like an app or a kind of edge service,
0:06:49 you don't want to sync all the data.
0:06:51 We have this sort of model of partial replication.
0:06:54 And basically what we're aiming to do with the sync engine is just make that as
0:06:58 simple to use and as bulletproof as possible.
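The partial replication James describes can be sketched as a filter over the logical replication stream: each client subscribes to a table plus a row predicate, and only matching changes are fanned out to it. The types and the `matches` helper below are illustrative, not Electric's real interface.

```typescript
// A row from the replication stream; values kept simple for the sketch.
type Row = Record<string, string | number>;

// One change event decoded from Postgres logical replication.
type Change = { table: string; op: "insert" | "update" | "delete"; row: Row };

// A partial-replication subscription: a table plus a row-level predicate.
type Subscription = {
  table: string;
  where: (row: Row) => boolean;
};

// Decide whether a change from the replication stream should be
// forwarded to a given subscription.
function matches(sub: Subscription, change: Change): boolean {
  return change.table === sub.table && sub.where(change.row);
}
```

A client syncing "issues for project 1" would then only ever receive changes where both the table and the predicate match, rather than the whole database.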
0:07:01 And we're making it with standard web technologies that make it easy
0:07:06 to use with your existing systems and with your existing stack.
0:07:09 And so we went in a way from this sort of quite ambitious, tightly integrated
0:07:13 end to end local-first software platform to now building more like composable
0:07:18 tools that can be part of a local-first stack that you would assemble yourself
0:07:22 as a developer, that's designed to be
0:07:25 easier to adopt for production applications that work
0:07:28 with your existing code.
0:07:29 That makes a lot of sense.
0:07:31 And that definitely resonates with me personally as well, since maybe,
0:07:34 as you know, before I founded Prisma, Prisma actually came as a pivot out of
0:07:40 like a focusing effort from a previous product that was called Graphcool,
0:07:44 which was meant as a more ambitious next generation backend as a service.
0:07:48 Back then there was like Firebase and Parse and so we wanted to build the
0:07:52 next generation of that, but what we found back then in 2016, that, while
0:07:57 we've been making a lot of progress towards that very ambitious, holistic
0:08:02 vision, we had to basically boil, like, multiple oceans all at the same time.
0:08:06 And that takes a lot of time to fully get to all the different
0:08:10 ambitious things that we wanted to.
0:08:12 So the only way forward for us where we felt like, okay, we can actually
0:08:16 serve the kind of use cases that we want to serve in a realistic timeline
0:08:21 was to focus on a particular problem, which is what Prisma eventually became.
0:08:26 And by focusing just on the database tooling part and leaving the other
0:08:31 back-endy things to other people.
0:08:32 And it sounds like what you've been going through with Electric is a very comparable
0:08:36 exercise, like focusing exercise to trying to, from a starting point of
0:08:41 like, let's build the most ambitious, the best local-first stack, like end to
0:08:46 end by focusing more on like, okay, what we figured out where our expertise is,
0:08:52 is around Postgres, is about, existing applications wanting to adopt local-first
0:08:58 ideas, syncing approaches, et cetera.
0:09:01 And that is what now led to the new version of Electric.
0:09:04 Did I summarize that correctly?
0:09:06 Yeah, exactly.
0:09:07 Right.
0:09:07 It sounds like a very similar journey.
0:09:09 And I think it's interesting as well that as you focus in and you learn
0:09:13 more about a problem space, you both discover in a way, more of the
0:09:17 complexity in the sort of aspects of it.
0:09:19 So you realize there's actually more challenges to solve in a smaller sort
0:09:23 of part of it or a smaller scope.
0:09:26 And also it's interesting that I think for instance, when we started the
0:09:29 project, I would have thought coming into this as a software developer,
0:09:32 I'd go, is read path sync solved?
0:09:34 I'd be like, well, there's quite a lot of read path kind of sync stuff.
0:09:37 You can kind of do this.
0:09:38 There's various real time solutions, but actually as you dig into it, you find
0:09:42 that there's a whole bunch of weaknesses of those solutions and they're actually
0:09:45 hard to adopt or they have silos or they can't handle the data throughput.
0:09:48 And so you realize that actually you don't necessarily need to bite
0:09:53 off all of the more ambitious scope because actually you can deliver
0:09:57 value by doing something simpler.
0:10:00 And I think also for me personally, learning about stewarding this
0:10:03 type of product, understanding that you can build out still towards
0:10:08 that more ambitious objective.
0:10:09 So in the long run, you know, we want to sort of build back a whole bunch
0:10:12 of capabilities into this platform,
0:10:14 probably as a sort of loosely coupled set of composable tools.
0:10:18 So you mentioned the term read path syncing.
0:10:21 Can you elaborate a little bit what that means?
0:10:24 So let's say I have an existing application.
0:10:26 Let's say I've built an API layer at some point.
0:10:29 I have a React front end and I have all of my data sitting in Postgres.
0:10:34 I've been inspired by products such as Linear, et cetera, who seem to
0:10:38 wield a superpower called syncing.
0:10:40 And now I found ElectricSQL, which seems to connect the ingredients
0:10:45 that I already have, such as Postgres and a front end with my
0:10:50 desirable approach, which is syncing.
0:10:52 So how does Electric fit into that?
0:10:55 And what do you mean by read and write path syncing?
0:10:59 Read and Write Path Syncing
0:10:59 Yeah.
0:10:59 I mean, the sort of read path and write path when it comes to
0:11:02 sync, the read path is syncing data, like onto the local device.
0:11:06 So it's a bit like kind of data fetching from the server.
0:11:09 And then the write path would be when like a user makes a write, and then
0:11:12 you want to sync that data typically back to the cloud so that's sort
0:11:16 of how we talk about them there.
0:11:19 I think there's something unique about local-first software compared to
0:11:25 more sort of traditional web service systems where you explicitly have
0:11:31 a local copy of the data on device.
0:11:34 And one of the challenges with that is because of course you can just like load
0:11:39 some data from the server and keep it in a cache, but if you do that Then you
0:11:44 immediately actually lose, any information about whether that data is stale.
0:11:49 So say a user goes to a route on your application and then clicks
0:11:54 to go to another route and then comes back to the original one.
0:11:57 So to load that original route, say you did a data fetch, but
0:12:01 now you've navigated back to it.
0:12:02 Can you display that data?
0:12:04 Can you render the route or is the data stale?
0:12:08 And so you have this sort of thing where I don't really know, and you tend to sort
0:12:12 of build systems with like REST APIs and data fetching where you might show the
0:12:15 data and go and try and fetch new data.
0:12:17 but in a way it's that problem of you want the data locally so that your application
0:12:23 code can just talk to it locally and you're not having to code across the
0:12:26 network with local-first software.
0:12:28 But that means that you need a solution to keep the data that is local fresh.
0:12:33 Like you don't want stale data.
0:12:35 And if you build a sort of ad-hoc system,
0:12:38 As we've all done across like many generations of software applications,
0:12:41 it's one of these things where you always end up kind of building some sort
0:12:44 of system to keep the data up to date.
0:12:46 But what you really want is a kind of properly engineered system
0:12:49 that does it systemically for you.
0:12:51 It is really a sort of an aspect of your application's architecture that kind of
0:12:56 can be abstracted away by a sync engine.
0:12:58 And so for us, this focusing on the read path sync is about saying,
0:13:02 okay, what data should be on the device, and let's just keep it
0:13:06 fresh for you.
0:13:07 And then with the write path, one of the things that we learned through
0:13:11 the project is that there are a lot of valid patterns for handling how, when
0:13:17 you do local writes on the device, how you would get those back to the cloud.
0:13:22 You can do through the database sync, you can do optimistic writes.
0:13:26 You could be happy with online writes and you have different models of
0:13:30 like, can your writes be rejected?
0:13:32 Are they local writes with finality?
0:13:34 Or do you have a server authoritative system where when the write
0:13:37 somehow syncs, it can be rejected and how do you handle that?
0:13:40 And so there's actually a lot of different patterns for those writes,
0:13:43 which are often relatively simple because different applications can
0:13:48 be happy with certain trade offs and you could pick a model like.
0:13:51 Okay.
0:13:51 I'm going to show some optimistic state and make a request to an API server.
0:13:56 And it's fine.
0:13:57 And you get a kind of, you get a local-first, experience with just a
0:14:00 sort of simple model that says, okay, if the write is rejected when it
0:14:03 syncs, then, I'll just sort of roll it back and the user loses that work.
0:14:07 And for many applications, that's fine.
0:14:09 For other applications, you might have a much more complex conflict resolution or
0:14:13 you're trying not to lose local writes and there's different collaborative workloads.
0:14:16 And so.
0:14:17 Building a generic system that can give you a write path that gives you
0:14:21 the best developer experience and user experience for all of those variety of
0:14:25 scenarios is very, very hard, whereas building it on an application by
0:14:28 application basis on the write path is actually often fairly straightforward.
0:14:32 It can be like, post to your API and use the React useOptimistic hook.
0:14:37 And so, with building local-first applications that have both read and
0:14:40 write path with Electric, the idea is that we do this core read path
0:14:45 with partial replication, but then as you're building your application, you
0:14:49 can choose out of a variety, whichever pattern fits your, what you need the
0:14:53 most for sort of how you would choose to get the writes back into the server.
0:14:57 That makes a lot of sense.
0:14:58 So basically the more general purpose
0:15:01 building block that can be used across a wide range of different applications
0:15:05 is actually how you read data, how you distribute the data that you
0:15:09 want to have locally available in your applications that would kind of
0:15:13 replace the API get requests before.
0:15:17 But now what needs to happen in those Put, post, delete requests,
0:15:21 this is where it depends a lot more.
0:15:24 And this is where you basically, what you're arguing is there are different
0:15:28 sort of write patterns that heavily depends on the kind of application.
0:15:32 So that is where you're kind of leaning out.
0:15:34 And previously with Electric, you tried to provide the silver bullet there.
0:15:39 But actually, it's really hard, maybe impossible to find the silver
0:15:43 bullet that applies to all use cases.
0:15:45 However, for the read path, it is very possible to provide a great building
0:15:50 block that works for many use cases.
0:15:52 So, can you provide a bit of a better spectrum of the different write
0:15:56 patterns that you've seen so far?
0:15:58 Maybe map them to canonical applications
0:16:02 that illustrate those use cases?
0:16:04 And maybe if you know, maybe you can also compare analogies to something
0:16:08 like Automerge, et cetera, and which sort of write patterns those would map to.
0:16:14 Read Path use cases
0:16:14 Yeah.
0:16:15 So I think the simplest pattern for writes with an application would be to
0:16:19 just, for instance, send a write to a server and require you to be online.
0:16:24 So, because there's many applications that are happy, for instance, with read
0:16:27 only, like there's a lot of people who are building, data analytics applications,
0:16:31 data visualization, dashboards, et cetera.
0:16:33 And so if you have a sort of read heavy application, then in some cases
0:16:37 it may just be a perfectly valid trade off, not to really deal with the
0:16:40 complexity of say offline writes at all.
0:16:42 But you still have a lot of benefits by having local data on device for the read
0:16:46 path, because all the way you can kind of explore the application and the data is
0:16:50 all just instant and local and resilient. Then the sort of simplest pattern to
0:16:56 layer on support for offline writes
0:16:59 on top of that, as a sort of starting point, where imagine that you have like a
0:17:03 standard REST API and you're just doing put and post requests to it as normal, is
0:17:08 to add this concept of optimistic state.
0:17:10 So optimistic state is just basically you're saying, okay, I'm going to go and
0:17:14 try and send this write to the API server.
0:17:16 And whilst I do so, I'm going to be optimistic and imagine that
0:17:20 that write is going to succeed.
0:17:22 And then, two seconds later, it's going to sync back into the state that I have here.
0:17:25 But in the meantime, I'm going to add this bit of local optimistic state to
0:17:30 display it immediately to the user, and because in most cases that happy path
0:17:34 is what happens, then you end up with what just feels like a perfect local-first
0:17:39 experience because it's an instantly displayed local write, and that sort
0:17:43 of data is resolved in the background.
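The optimistic-state pattern just described can be sketched as a small shared store: pending local writes overlay server-confirmed state on read, and a rejected write is handled by simply dropping its optimistic copy. All names here are hypothetical.

```typescript
type Todo = { id: string; title: string };

class OptimisticStore {
  private confirmed = new Map<string, Todo>(); // synced in from the server
  private pending = new Map<string, Todo>();   // local optimistic writes

  // Called when the read-path sync delivers server-confirmed data.
  applyConfirmed(todo: Todo) {
    this.confirmed.set(todo.id, todo);
    this.pending.delete(todo.id); // the optimistic copy is now redundant
  }

  // Called when the user writes locally, before the API call resolves.
  applyOptimistic(todo: Todo) {
    this.pending.set(todo.id, todo);
  }

  // Called when the server rejects a write: just drop the optimistic copy.
  reject(id: string) {
    this.pending.delete(id);
  }

  // Reads see pending writes layered over confirmed state.
  get(id: string): Todo | undefined {
    return this.pending.get(id) ?? this.confirmed.get(id);
  }
}
```

Because the store is shared, multiple components read through the same overlay, avoiding the per-component stale-state problem discussed later in the conversation.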
0:17:45 Now, you know, immediately with that, you do then just introduce like a layer
0:17:49 of complexity with like, well, what happens when the write is rejected?
0:17:54 And so you have both the challenge of, for instance, say you stacked up three writes.
0:18:01 Did they depend on each other?
0:18:03 So if one of them is rejected, should you reject all of them?
0:18:06 and different applications and different parts of the application would have
0:18:09 different answers to that question.
0:18:11 In some cases, like it's very simple to just go, if there's any problem with
0:18:14 this optimistic state, just wipe it.
0:18:16 And for instance, like the React useOptimistic hook, like its approach is just
0:18:20 like, it waits for a promise to resolve.
0:18:22 And when the promise resolves, it wipes the optimistic state.
0:18:25 And so it's very much just like, if anything happens at all,
0:18:28 it's like, wipe it.
0:18:30 Interestingly enough, there's also a lot of people coming from React Query and so
0:18:35 on, from those sort of more traditional front end state management things.
0:18:40 and that brings them to local-first in the first place, because they're like
0:18:44 layering optimistic, one optimistic state handler on top of the next one.
0:18:49 And if there's a little flaw inside of there, everything collapses
0:18:53 since you don't really have a principled way to reason about things.
0:18:57 So that makes a lot of sense.
0:18:59 Exactly right.
0:19:00 And so a framework like TanStack, for instance, with TanStack Query, it has like
0:19:05 slightly more sophisticated optimistic state primitives than just say the kind
0:19:10 of the primitive useOptimistic hook.
0:19:12 And one of the challenges that you have is that for,
0:19:15 say, a simple approach to just using optimistic state to display an immediate
0:19:20 write: is that optimistic state global to your application?
0:19:24 Shared between components?
0:19:25 Is it scoped within the component?
0:19:27 And so, as you say, like there's an approach where you could come along
0:19:30 and say, okay, I've got three or four different components and so far I've
0:19:33 just been able to sort of render the optimistic state within the component.
0:19:37 But now I've got two components that are actually displaying the same information.
0:19:40 And suddenly I've got like stale data.
0:19:42 It's like the old days of manual DOM manipulation and you forgot
0:19:45 to update a state variable.
0:19:47 And so.
0:19:48 Yeah, in a way that's where you come to a more proper local-first solution
0:19:53 where your optimistic state would be stored in some sort of shared store.
0:19:58 So it could just be like a JavaScript object store, or it
0:20:01 could be an embedded database.
0:20:03 And so you get slightly more sophisticated models of
0:20:07 managing optimistic state.
0:20:08 And the great thing is there are, like TanStack Query and others, there's
0:20:11 like, there's a bunch of existing client side frameworks that can handle
0:20:14 that kind of management for you.
0:20:17 Once you go, for instance, like to an embedded database for the state.
0:20:21 So one of the kind of really nice points in the design space for this is to have a
0:20:27 model where you sync data onto the device and you treat that data as immutable.
0:20:32 And then, say, for instance, you're syncing a
0:20:37 database table, say it's like a log viewer application, and you're just syncing the
0:20:41 logs in, and it goes into a logs table.
0:20:44 Now, say the user can interact with the logs and delete them,
0:20:47 or change the categorization.
0:20:49 And so you can have a shadow logs table, which is where you would
0:20:52 save the local optimistic state.
0:20:54 And then
0:20:55 you can do a bunch of different techniques to, for example, create a view or a live
0:20:59 query where you combine those two on read.
0:21:02 So the application just sort of feels like it's interacting with the table,
0:21:05 but actually it's split in the storage layer into an immutable table for the synced
0:21:09 state and a kind of local mutable table.
0:21:12 And the great thing about that is you can have persistence for both the
0:21:15 synced state and the local mutable state.
0:21:18 And of course it can be shared.
0:21:19 So you can have multiple components, which all sort of just go
0:21:22 through that unified data store.
0:21:24 And there's some nice stuff that you can do in SQL world, for instance, to use,
0:21:27 like, INSTEAD OF triggers to combine it.
0:21:29 So it just feels like you're working with a single table.
0:21:32 Now it's a little bit additional complexity on something like defining
0:21:35 a client side data model, but what it gives you is it gives you a
0:21:39 very solid model to reason about.
0:21:42 So, like, you can go, okay, basically the synced state is always golden.
0:21:46 It's immutable.
0:21:46 Whenever it syncs in, it's correct.
0:21:48 If I have a problem with this local state, that's just, that's like mutable stuff.
0:21:53 Worst case, I can get rid of it, or I can develop more sophisticated strategies for
0:21:57 dealing with rollbacks and edge cases.
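The combine-on-read pattern with an immutable synced table and a mutable shadow table could look roughly like this in application code; the log-viewer table and field names follow the example in the conversation but are otherwise hypothetical.

```typescript
// A row from the immutable synced table: always matches the server.
type LogRow = { id: string; message: string; category: string };

// A row from the local mutable shadow table: optimistic edits only.
type ShadowRow = { id: string; deleted?: boolean; category?: string };

// Merge the two on read, so the rest of the app feels like it is
// working with a single table.
function readLogs(
  synced: LogRow[],
  shadow: Map<string, ShadowRow>
): LogRow[] {
  const out: LogRow[] = [];
  for (const row of synced) {
    const edit = shadow.get(row.id);
    if (edit?.deleted) continue; // locally deleted: hide it
    // Locally recategorized: overlay the edit without touching synced state.
    out.push(edit?.category ? { ...row, category: edit.category } : row);
  }
  return out;
}
```

In an embedded SQL database the same merge could live in a view or an INSTEAD OF trigger; the point is that the synced rows are never mutated, so worst case the shadow entries can simply be discarded.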
0:22:00 So in a way it can give you a nice developer experience.
0:22:04 With that model, you could choose then whether your writes are, whether you're
0:22:08 writing to the database, detecting changes, and then sending those to
0:22:11 some sort of like replication ingest point, or whether you're still just
0:22:15 basically talking to an API and writing the local optimistic state separately.
0:22:21 So at that point, again, you have this
0:22:24 fundamental model of, like, are you writing directly to the database and
0:22:27 all the syncing happens magically?
0:22:29 Or are you just using that database as a sort of unified, local optimistic store?
0:22:34 So this is the sort of type of like progression of patterns.
0:22:36 And once you start to go through something where you would, for instance, have a
0:22:42 synced state that is mutable, or you are writing directly to the database,
0:22:46 that's really where you start to get a little bit more into the world of like
0:22:49 convergence logic and kind of merge logic and CRDTs and sort of what's commonly
0:22:54 understood as proper local-first systems.
0:22:57 And I think that's the point where almost the complexity of those
0:22:59 systems does become very real.
0:23:01 Like, as you well know, from building LiveStore and as we see from the
0:23:04 kind of, quality of libraries like Automerge, Yjs, et cetera.
0:23:08 so that's probably where as a developer, it makes sense to reach for a framework.
0:23:12 And you certainly could reach for a framework for that sort of, like,
0:23:15 combine-on-read, sync into a kind of persistent local mutable state.
0:23:21 But what we find is that it's actually
0:23:25 relatively straightforward to develop yourself; you can reason about it
0:23:28 fairly simply, and so it's not too much extra work to just basically go:
0:23:32 as long as you've got that read sync primitive, you can build like a kind of
0:23:36 proper, locally persistent, consistent local-first app yourself, basically,
0:23:42 just using fairly standard front end primitives.
0:23:44 Right.
0:23:45 Okay.
0:23:46 Maybe sharing a few reflections on this, since I like the way you
0:23:50 portrayed this sort of spectrum of different kinds of write patterns.
0:23:54 In an interview that I did with Matthew Weidner, I learned a lot there
0:23:58 about the way he thinks about different categorizations of, like, state
0:24:02 management, and particularly when it comes to distributed synchronization.
0:24:07 And I think one pattern that got clear there was that either you're
0:24:12 directly manipulating the state, which is what, like, Automerge, et
0:24:16 cetera, are de facto doing for how you as a developer interact with the state.
0:24:21 So you have like a document and you manipulate it directly.
0:24:25 You could also apply the same logic of, like, you have a database table, for
0:24:30 example; that's how cr-sqlite works, where you have a SQLite table and you
0:24:35 manipulate a row directly and that is being synchronized as the state and
0:24:41 you're ideally modeling this with a way where the state itself converges and
0:24:46 through some mechanisms, typically CRDTs.
0:24:49 But then there's another approach, which might feel a little bit more
0:24:53 work, but it can actually be concealed quite nicely by systems, for example,
0:24:58 like LiveStore, in this case, unbiased, and where you basically separate
0:25:02 out the reads from the writes.
0:25:05 And often enough, you can actually fully recompute your
0:25:10 read model from the write model.
0:25:12 So, if you then basically express everything that has happened, that
0:25:16 has meaningfully happened for your application as a log of events.
0:25:20 Then you can often, kind of like how Redux used to work or still works,
0:25:24 fully recompute your view, your read model from all the writes that have happened.
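The Redux-style recompute described here can be sketched as a fold over an event log: the read model is derived entirely from the ordered writes, so it can always be rebuilt from scratch. The event shapes below are illustrative.

```typescript
// The write model: an ordered log of everything that has happened.
type Event =
  | { type: "added"; id: string; title: string }
  | { type: "completed"; id: string }
  | { type: "removed"; id: string };

// The read model, derived entirely from the log.
type Todo = { id: string; title: string; done: boolean };

// Fold the event log into the current read model. Replaying the same
// log always yields the same state, which is what makes optimistic
// events easy to reason about and to roll back.
function reduce(events: Event[]): Map<string, Todo> {
  const state = new Map<string, Todo>();
  for (const e of events) {
    switch (e.type) {
      case "added":
        state.set(e.id, { id: e.id, title: e.title, done: false });
        break;
      case "completed": {
        const t = state.get(e.id);
        if (t) t.done = true;
        break;
      }
      case "removed":
        state.delete(e.id);
        break;
    }
  }
  return state;
}
```

Local optimistic events can then simply be appended after the server-confirmed prefix of the log and re-derived, rather than patched into the read model by hand.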
0:25:29 And I think that would work actually really, really well together in tandem
0:25:33 with Electric, where if you're replicating what has happened in your Postgres
0:25:39 database as like a log of historic events, then you can actually fully recreate
0:25:45 whatever derived state you're interested in. And what is really interesting about
0:25:49 that approach, that particular write pattern, is that it's a lot easier to
0:25:54 model and reason about locally.
0:25:57 You'd say like, hey, I got those events from the server; those
0:26:00 events, I am applying optimistically.
0:26:03 You can even encode sort of a causal order. If someone
0:26:09 is, like, confused about what causal order means, don't worry about it.
0:26:13 Like you can probably at the beginning, keep it simple, but once you layer
0:26:18 on like more and more dependent, optimistic state transitions, this is
0:26:22 where you want to have the information.
0:26:25 Okay.
0:26:25 If I'm doing that, and then the other thing depends on that, that's basically a
0:26:29 causal order, and modeling that as events,
0:26:32 I think, is a lot simpler and is a way to deal with that monstrosity of, like,
0:26:38 losing control over your optimistic state.
0:26:41 Since I think one thing that makes optimistic state management
0:26:44 even more tricky is that, like, how are things dependent on each other?
0:26:50 And then also, like, when is it assumed to be good?
0:26:54 I think in a world where you use Electric, once you've got, from the
0:26:57 Electric server, sort of confirmation, like, hey, those
0:27:01 things have now happened for real.
0:27:02 You can trust it.
0:27:04 but there's like some latency in between, and the latency might be
0:27:07 increased by many, many factors.
0:27:10 One way could be that you just, you are on a like slow connection or the server
0:27:15 is particularly far away from you and might take a hundred milliseconds, but
0:27:19 another one might be you have a spotty connection and like packets get lost and
0:27:25 it takes a lot longer, or you're offline, and being offline is just like a
0:27:30 very high latency form of that. And so all of that, like if you're offline,
0:27:36 if it takes a long, long time, and maybe you close your laptop, you reopen it.
0:27:41 Is the optimistic state still there?
0:27:43 Is it actually locally persisted?
0:27:45 So there are many, many more layers that make that more tricky.
0:27:49 But I like the way how you, like, split this up into the read
0:27:54 concerns and the write concerns.
0:27:56 And I think this way, it's also very easy to get started with new
0:28:00 apps that might be more read heavy and are based on existing data.
0:28:05 I think this is a very attractive trade off that you say like, Hey, with
0:28:09 that, I can just sync in my existing data and then step by step, depending
0:28:14 on what I need, if I need it at all.
0:28:16 Many apps don't even need to do writes at all, and then you
0:28:19 can just get started easily.
0:28:21 Yeah, I think, I mean, that's explicitly a design goal for us is like, yeah,
0:28:25 if you start off with an existing application and maybe it's using REST
0:28:29 APIs or GraphQL, it's like, well, what do you do to start to move that
0:28:32 towards a local-first architecture?
0:28:34 And exactly, you could just go, okay, well, let's just leave the way
0:28:37 that we do writes the same as it is.
0:28:39 And let's move to this model of like syncing in the data
0:28:41 instead of fetching the data.
0:28:43 And that can just be a first step.
0:28:45 And I think, I mean, across all of these techniques for writes, there
0:28:48 is just something fundamental about keeping the history or the log
0:28:52 around as long as you need it, and then somehow materializing values.
0:28:58 So sort of internally, this is what a CRDT does, right?
0:29:01 It's clever and has a sort of lattice structure for the history, but basically
0:29:05 it keeps the information and allows you to materialize out a value.
0:29:09 Or if you just have like an event log of writes.
0:29:11 So as you were saying with LiveStore, when you have like a
0:29:14 record of all the write operations, you can just process that log.
0:29:17 So I think, you know, you can do it sort of within a data type.
0:29:21 And I think that fits as well for a greenfield application where you're trying
0:29:25 to craft kind of real time or kind of collaboration and concurrency semantics,
0:29:29 but like from our side of coming at it, from the point of saying, right, when
0:29:32 you've got applications that build on Postgres, you already have a data model.
0:29:35 You just sort of layer the same kind of history approach on top by like, keeping
0:29:39 a record of the local writes until you're sure you can compact them. And actually
0:29:44 that same principle is exactly how the read path sync works with Electric.
0:29:49 So Postgres logical replication, it just basically emits a stream of, like,
0:29:56 transactions that contain write operations, and it's basically inserts, updates,
0:30:00 and deletes with a bit of metadata.
0:30:02 And so we end up consuming that and basically writing
0:30:06 out what we call shape logs.
0:30:07 So we have a primitive called a shape, which is how we control the partial
0:30:10 replication, like which data goes to which client and a client can define multiple
0:30:14 shapes, and then you stream them out.
0:30:16 But that shape log comes through our replication protocol as just that
0:30:21 stream of logical update operations.
0:30:23 And so in the client, you can just, you can materialize the data immediately.
0:30:28 So like we provide, for instance, a shape stream primitive in a JavaScript client
0:30:32 that just emits the series of events.
0:30:34 And then we have a shape, which will just take care of materializing that
0:30:37 into a kind of map value for you.
0:30:39 But you could do whatever you wanted with that stream of events.
0:30:42 So if you found that you wanted to keep around a certain history of the
0:30:46 log in order to be able to reconcile some sort of causal dependencies,
0:30:49 that's just totally up to you.
0:30:51 And so, yeah, it's quite interesting: it's almost just the same approach. The
0:30:54 general sort of principle for handling concurrency on the
0:30:58 write path is also just exactly what we've ended up consolidating down on
0:31:02 exposing through the read path stream.
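The idea of materializing a log of logical operations into a current value, as described above, can be sketched in a few lines of TypeScript. This is an illustrative reducer, not Electric's actual client API: the message shape (`op` plus `key`/`value`) is an assumption.

```typescript
// Illustrative sketch: materialize a log of logical insert/update/delete
// operations into a map of current rows. The message format here is an
// assumption, not Electric's exact wire format.
type Row = Record<string, unknown>

type Operation =
  | { op: 'insert'; key: string; value: Row }
  | { op: 'update'; key: string; value: Row }
  | { op: 'delete'; key: string }

function materialize(log: Operation[]): Map<string, Row> {
  const rows = new Map<string, Row>()
  for (const msg of log) {
    switch (msg.op) {
      case 'insert':
        rows.set(msg.key, msg.value)
        break
      case 'update':
        // Treat updates as partial: merge into the existing row.
        rows.set(msg.key, { ...rows.get(msg.key), ...msg.value })
        break
      case 'delete':
        rows.delete(msg.key)
        break
    }
  }
  return rows
}
```

The same reducer works incrementally: feed it each new batch of operations as they stream in, rather than the whole log at once.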
0:31:04 That makes a lot of sense.
0:31:05 So, let's maybe go a little bit more high level.
0:31:08 Again, for the past couple of minutes, we've been talking a lot about like how
0:31:12 Electric happens to work under the hood.
0:31:14 And there's many commonalities with other technologies and
0:31:17 all the way to CRDTs as well.
0:31:19 But going back a little bit towards the perspective of someone who would
0:31:23 be using Electric and building something with Electric, and maybe doesn't
0:31:28 peel off all the layers yet, but gets started with one of the easier off the
0:31:32 shelf options that Electric provides.
0:31:35 So my understanding is that you have your existing Postgres database.
0:31:40 You already have your like tables, your schema, et cetera, or if it's
0:31:44 a greenfield app, you can design that however you still want.
0:31:47 And then you have your Postgres database.
0:31:50 Electric is that infrastructure component that you put in front
0:31:53 of your Postgres database that has access to your Postgres database.
0:31:58 In fact, it has access to the replication stream of Postgres.
0:32:02 So it knows everything that's going on in that database.
0:32:05 And then your client is talking to the Electric sync engine to
0:32:10 sync in whatever data you need.
0:32:12 And the way that's expressed what your client actually needs is through
0:32:17 this concept that you call shapes.
0:32:19 And my understanding is that a shape basically defines a subset
0:32:23 of data, a subset of a table that you want in your client.
0:32:28 Since often like tables are so huge and you just need a particular
0:32:32 subset for your given user, for your given document, whatever.
0:32:38 The role of Shapes
0:32:38 Yeah, that's just exactly how it works.
0:32:40 And
0:32:41 the Electric Sync Engine, it's a web service.
0:32:44 It's a Docker container, like technically it's an Elixir application.
0:32:47 And it just connects to your Postgres as a normal Postgres client would.
0:32:52 So you have to run your Postgres with logical replication enabled.
0:32:57 And then we just connect in over a database URL.
0:32:59 And so it's just as if you were like, imagine you're deploying a Heroku app,
0:33:03 and it's sort of Heroku Postgres, and it just provisions a database URL, and your
0:33:06 back end application can connect to it.
0:33:08 So it's the same way that a sort of Rails app would talk to Postgres.
0:33:12 And then Electric does some stuff internally to sort of route data into
0:33:16 these shape logs, which are the sort of logs of update operations for each
0:33:21 kind of unit of partial replication.
0:33:23 And then we actually just provide an HTTP API, which is quite key to a whole
0:33:28 bunch of the affordances of the system.
0:33:31 So I can dive into that if it's interesting.
0:33:33 But then, yeah, you basically have a client, which pulls data
0:33:37 by just making HTTP requests.
0:33:39 And so HTTP gives you back pressure and the client's in control of
0:33:44 which data it pulls when, and then how you process that stream.
0:33:48 Yeah, we do provide some primitives to make it simple.
0:33:51 Like we give you React hooks to just sort of bind a shape to a state variable,
0:33:55 but basically, you can do what you like with the data as it streams in.
0:33:59 So, yeah, I would love to learn more about that design decision of choosing HTTP
0:34:03 for that network layer, for that API.
0:34:05 Since I think most people think about local-first, think about real time
0:34:10 syncing, et cetera, that reactivity.
0:34:13 And for most people, I think particularly in the web, the mind goes to web sockets.
0:34:17 So why HTTP?
0:34:19 Wouldn't that be very inefficient?
0:34:21 How does reactivity work?
0:34:23 Can you walk me through that?
0:34:25 Why using HTTP for network layer?
0:34:25 Yeah, so.
0:34:26 I mean, exactly.
0:34:27 We, went on that journey with the product where with the earlier, slightly more
0:34:30 ambitious Electric that I was describing, we built out a custom binary WebSocket
0:34:36 protocol to do the replication, and it's just what you sort of immediately
0:34:39 think you're like, let's make it efficient over the wire and obviously
0:34:41 it should be a WebSocket connection because you're just having these sorts
0:34:44 of ongoing data streams, but, So one of the things that happened with the,
0:34:48 focusing of the product strategy was that, Kyle Matthews joined the team.
0:34:52 So Kyle was actually the founder of Gatsby, which is like the React framework.
0:34:57 And through Gatsby, he did a lot of work around basically data
0:35:01 delivery into CDN infrastructure.
0:35:04 And so one of the insights that Kyle brought into the team was if
0:35:08 we re-engineered the replication protocol on plain HTTP, and we just
0:35:13 do like plain HTTP, plain JSON.
0:35:16 And we replicate over an old fashioned long polling protocol.
0:35:20 So you just, basically we have a model where the client makes a request to a
0:35:24 shape endpoint, and then we just return the data that the server knows about.
0:35:28 So we'll sort of chunk it up sometimes over multiple requests, but it's
0:35:31 just a standard, like, load-a-JSON-document request.
0:35:35 And then once you get a message to say that the client is up to date
0:35:38 with the server, then you trigger into a long polling mode where basically
0:35:41 the server holds the connection open until any new data arrives.
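The catch-up half of the long-polling model described above can be sketched as a small loop. The message format, the `offset` parameter, and the `upToDate` flag are illustrative assumptions, not Electric's exact wire protocol; the transport is injected so the loop itself stays easy to test.

```typescript
// Sketch of the initial-sync half of a long-polling replication protocol:
// page through the log until the client is up to date, then hand back the
// offset to long-poll from. Field names are illustrative assumptions.
type Batch = { messages: unknown[]; nextOffset: string; upToDate: boolean }
type FetchBatch = (offset: string) => Promise<Batch>

async function catchUp(
  fetchBatch: FetchBatch,
  onMessages: (msgs: unknown[]) => void
): Promise<string> {
  let offset = '-1' // conventionally: start from the beginning of the log
  for (;;) {
    const batch = await fetchBatch(offset)
    onMessages(batch.messages)
    offset = batch.nextOffset
    // Once caught up, the client would switch to live mode, where the
    // server holds each request open until new data arrives.
    if (batch.upToDate) return offset
  }
}
```

In a real client, `fetchBatch` would be a plain `fetch` against the shape endpoint, which is exactly what lets these responses be served from a CDN.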
0:35:45 And yes, you kind of think instinctively like, okay, it's say JSON instead of
0:35:50 binary, so it'll be less efficient and you're having to make these
0:35:52 sort of extra requests that surely they add latency over some sort of
0:35:56 more optimized, WebSocket protocol.
0:35:58 But the key thing is that by doing that, it allows us to deliver the data
0:36:02 through existing CDN infrastructure.
0:36:05 So those initial data loading requests, like typically when you're building
0:36:10 applications on this shape primitive, you can find ways of defining your shapes
0:36:15 so that they're shared across users.
0:36:16 You might have some unique data that's unique to a user, but like say you have a
0:36:21 project management app and there's various users who are all in the same project,
0:36:24 you could choose to like sync the kind of project data down rather than just
0:36:28 sort of syncing all the user's data down.
0:36:30 And so that way you get shapes being shared across users.
0:36:33 And so the first user to request it hits the Electric service, we
0:36:37 generate these responses, but then they go through Cloudflare or Fastly
0:36:41 or CloudFront or what have you.
0:36:43 And every subsequent request is just served out of like
0:36:46 essentially Nginx or Varnish.
0:36:48 And so it's just super efficient.
0:36:50 All of this infrastructure is just like super battle tested
0:36:52 and as optimized as it can be.
0:36:54 That is very interesting.
0:36:56 It reminds me a little bit of like how modern bundlers, and I think even like
0:37:00 all the way back to Webpack, used to split up larger things into little chunks.
0:37:06 And those chunks would be content hashed.
0:37:08 And that would be then often, be cached by the browser across
0:37:12 different versions of the same app.
0:37:15 In this case, it would be beneficial to the individual user who would reload it.
0:37:20 And also of course, like to other people who visit this, but now you
0:37:24 take the same idea, even further and apply it to data shared across users
0:37:29 by applying the same infrastructure, HTTP servers, CDNs, et cetera, to make,
0:37:35 things cheaper and faster, I guess.
0:37:38 Well, and the local browser or client cache as well.
0:37:41 So you have this sort of shared caching within a CDN layer where you
0:37:45 might have multiple clients, where, like, literally it's a sort of
0:37:48 shared cache in the HTTP cache control sense.
0:37:50 That makes a lot of sense.
0:37:50 Since like, on a website level, I'm not sure whether you
0:37:53 have clear caching semantics.
0:37:55 I don't think so.
0:37:57 Yeah, you'd have to do some very sort of custom stuff to
0:37:59 sort of achieve the same things.
0:38:01 But also because, so with the browser, when you're loading data, like HTTP
0:38:05 requests with the right cache headers can just be stored in the local file cache.
0:38:09 So one of the really nice things with just, like loading shape data
0:38:12 through the Electric API is you can achieve an offline capable app without
0:38:16 even having to implement any kind of local persistence for the data
0:38:20 that's loaded into the file cache.
0:38:23 So that sort of model, if like say you've gone to a page and you've just
0:38:26 loaded the data through Electric, even if you didn't store the data, if you
0:38:30 navigate back to the same page, the data's just there out of the file cache.
0:38:34 So the application can work offline without even having
0:38:37 any kind of persistence.
0:38:38 So you almost get like, I mean, there's some sort of edge cases on this stuff,
0:38:41 but it's the thing, because you're just working with the standard primitives,
0:38:44 you've just got the integration with the existing tooling and you get a
0:38:47 whole bunch of these things for free.
0:38:49 That is very elegant, and I guess that is being unlocked now because, like,
0:38:54 you embrace the semantics of change, of like how the data changes. And by
0:39:00 modeling, and this is where it now gets relevant again why everything here is
0:39:04 modeled as a log under the hood, since like to the log you just append, and so
0:39:08 you can safely cache everything that has happened up until a point in time,
0:39:12 and from there on, you just add things on top, but that doesn't make the stuff
0:39:16 that has happened before less valid.
0:39:18 So you can cache it immutably.
0:39:20 That makes it super fast.
0:39:21 You can cache it everywhere on the edge, on your local device, et cetera.
0:39:25 And that gives you a checkpoint that at least at one point in time was
0:39:31 valid, and now there might be more stuff that should be applied on top of
0:39:34 it, but that's already a better user experience than not getting anything.
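The caching policy this append-only structure makes possible can be sketched as a tiny header-selection function. The header values are illustrative, not Electric's actual ones: the point is that closed chunks of the log never change, so they can be cached immutably, while only the live tail needs revalidating.

```typescript
// Sketch of the caching split an append-only log allows: a chunk of the
// log that is closed will never change, so it can be cached forever at
// every layer; only the live tail needs re-checking. Header values are
// illustrative assumptions, not Electric's actual ones.
function cacheControlFor(chunk: { closed: boolean }): string {
  return chunk.closed
    ? 'public, max-age=31536000, immutable' // historic chunk: cache forever
    : 'public, max-age=0, must-revalidate'  // live tail: always revalidate
}
```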
0:39:38 I mean, another thing is like the operational characteristics of the
0:39:41 system, for this type of sync technology.
0:39:44 So, for instance, again, comparing HTTP with WebSockets, like
0:39:47 WebSockets are stateful, and you do just keep things in memory.
0:39:51 And so, if you look across most real time systems, they have scalability
0:39:55 limits, because you will come to the point where if you have, say, 10,000
0:39:57 concurrent users, it's almost like, you know, it's like the thing of don't have
0:40:01 too many open Postgres connections.
0:40:03 But if you're holding open 10,000 WebSockets, you may be able to do the
0:40:07 IO efficiently, but you will ultimately be growing that kind of memory and
0:40:11 you'll hit some sort of barrier.
0:40:12 Whereas, with this approach, you can basically offload that
0:40:15 concurrency to the CDN layer.
0:40:17 So, it's not just about basically taking away the query workload of the
0:40:23 cached initial sync requests, but these kind of reverse proxies or CDNs have
0:40:27 a really nice feature called request collapsing or request coalescing, which
0:40:31 means that when they have a cache miss on a URL, if they have
0:40:36 two clients making a request to the same URL at the same time, they sort of hold
0:40:41 both of them at the cache layer and only send one request onto the origin server.
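Request collapsing can be sketched as a tiny in-process version of what the CDN does: concurrent requests for the same URL share a single in-flight upstream call. This is an illustration of the concept, not CDN configuration.

```typescript
// Minimal sketch of request collapsing / coalescing: concurrent requests
// for the same URL share one in-flight upstream fetch, the way a CDN
// holds clients at the cache layer and sends one request to the origin.
function makeCoalescer<T>(upstream: (url: string) => Promise<T>) {
  const inFlight = new Map<string, Promise<T>>()
  return (url: string): Promise<T> => {
    let p = inFlight.get(url)
    if (!p) {
      // First requester triggers the upstream call; later requesters
      // arriving before it settles just share the same promise.
      p = upstream(url).finally(() => inFlight.delete(url))
      inFlight.set(url, p)
    }
    return p
  }
}
```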
0:40:45 And so basically we've been able to scale out now to 10 million concurrent clients
0:40:51 receiving real time data out of Electric on top of a single Postgres.
0:40:56 And there is literally no CPU overhead on the Postgres or the Electric layer.
0:41:01 It's just entirely handled out of the CDN serving.
0:41:05 And so it's sort of remarkable, the combination of the initial data
0:41:09 load caching, which means that, like, one of our objectives is we want to be
0:41:13 as fast as just querying the database directly for an initial data load
0:41:17 and then orders of magnitude faster for any subsequent
0:41:21 requests coming out of the cache. But there's also this sort of challenge,
0:41:25 almost like this thing about saying, okay, you're building an application.
0:41:29 You maybe want some of the user experience or developer experience
0:41:32 affordances of local-first, but if to do that, I need a sync engine and a
0:41:36 sync engine is kind of a complex thing.
0:41:39 And so you end up either going, okay, maybe I'll sort of use an external system.
0:41:44 And then you get like a siloed real time database next to your main database
0:41:47 and you get operational complexity, or you get some sort of system where
0:41:51 you have, yeah, you're basically sort of stewarding these WebSockets and
0:41:54 it's very easy for it to fall over.
0:41:56 And I think actually, like, if you just sort of honestly view that
0:42:00 type of architectural decision from the lens of like somebody trying to
0:42:04 build a real project, which is their day job, trying to get stuff done.
0:42:08 You're just going to avoid that as much as you can, because like
0:42:10 you'd far rather just like, I just want to serve this with Nginx.
0:42:13 I know how that's going to work.
0:42:14 I'm not going to stay up at night worrying about it.
0:42:17 Whereas if I have 10,000 concurrent users going through some crazy WebSocket stuff,
0:42:20 I'm going to get pager alerts.
0:42:22 And so like the whole approach here with what we're trying to do is to
0:42:26 change that sense that sync is a complex technology that you sort of
0:42:31 play with on the weekend and only adopt when you have to.
0:42:34 So going, look, you can actually do sync in such a way that it is
0:42:37 just as simple and standard as normal web service technology.
0:42:41 And then suddenly you can actually unlock the ability for kind of real
0:42:44 projects, you know, you can take this stuff into a day job and not get it
0:42:47 shouted down at the design meeting,
0:42:49 because it just feels like too much black box complexity.
0:42:52 You're using the word simple here.
0:42:54 And I think that really speaks to me now, because it's both simple in terms of
0:43:01 architecturally, like, how does data flow?
0:43:04 So I think this is where Electric provides a very simple and, I think,
0:43:09 easy to use and easy to work with trade off, like, how does data flow,
0:43:14 but then it also gives a very simple answer of like, how does it scale?
0:43:19 Since you can throw at it like all the innovations and all the hard
0:43:23 work that has now gone into our web infrastructure over the last
0:43:27 decades. You can run on the latest and greatest, and all the innovations that
0:43:33 Nginx and HAProxy and Cloudflare, like all the work that has gone into that.
0:43:39 You can just piggyback on top of that without having to innovate on the
0:43:44 networking side as well, since like you, you're really doing the hard work
0:43:48 on the more semantic and data side.
0:43:51 And that's a really, really elegant trade off to me.
0:43:54 Yeah.
0:43:54 And it's fun because, like, in our benchmark testing at the
0:43:56 moment, we break Cloudflare before we break Electric.
0:43:59 If something is battle tested, it's Cloudflare.
0:44:02 And again, it carries on, because it's not just about this sort of
0:44:05 scalability or operational stuff.
0:44:06 It's also about then how you can achieve, like we talked about the write patterns.
0:44:10 And so this sort of pattern of how do you do writes?
0:44:12 And it's like, well, actually you can do the sync like this, use
0:44:14 your existing API to do writes.
0:44:16 And it can work with your existing stack.
0:44:19 But you have other obvious concerns with this type of architecture, like
0:44:22 say, authentication, authorization, data security, encryption.
0:44:27 But HTTP
0:44:29 just has proxies and it works with the sort of middleware stack.
0:44:33 And so for us, a shape endpoint as a sync endpoint is just an HTTP resource.
0:44:40 So if you want to just put like an authorization service in front of it,
0:44:43 you just proxy the request through and you like, you have the context from
0:44:47 the user, you can have the context about the shape and you can just
0:44:49 authorize it using your existing stack.
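The proxy pattern described here can be sketched as a small gatekeeper function: given the user's context and the requested shape, decide whether to forward the request upstream. The table names and the `table`/`where` parameters are hypothetical illustrations, not Electric's prescribed scheme.

```typescript
// Illustrative gatekeeper for a shape-proxy endpoint: check the user's
// context against the requested shape before forwarding the request to
// the sync service. Parameter names ('table', 'where') and the table
// names are assumptions for the sake of the sketch.
type User = { id: string; orgId: string }

function authorizeShapeRequest(user: User, params: URLSearchParams): boolean {
  const table = params.get('table')
  // Only allow tables this app actually exposes to clients.
  if (table !== 'projects' && table !== 'issues') return false
  // Require the shape's filter to be scoped to the user's own org.
  return params.get('where') === `org_id = '${user.orgId}'`
}
```

In a real deployment this check would run in your existing API middleware; if it passes, the proxy streams the upstream HTTP response back to the client unchanged.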
0:44:52 If you want to do encryption, then you can do that.
0:44:54 It's just a stream of messages.
0:44:55 And yeah, a bit like you were saying that, like with Electric, you could
0:44:58 just use it as a transport layer to like, say, route a log of messages.
0:45:03 That can be ciphertext or plaintext.
0:45:05 So you could just like encrypt on device, sync it through.
0:45:08 You can just decrypt whenever you're consuming the stream.
0:45:11 And again, you could do that, like in the client, you could
0:45:13 do that in HTTP middleware.
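Treating the synced log as opaque ciphertext can be sketched with Node's built-in crypto: encrypt each message on device, sync the ciphertext through, and decrypt wherever the stream is consumed. This is a generic AES-GCM round trip, not anything Electric ships.

```typescript
// Sketch of encrypting sync messages on-device with AES-256-GCM so the
// sync layer only ever sees ciphertext. Uses Node's built-in crypto;
// key management is out of scope for this sketch.
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto'

function encryptMessage(key: Buffer, plaintext: string) {
  const iv = randomBytes(12) // must be unique per message
  const cipher = createCipheriv('aes-256-gcm', key, iv)
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()])
  return { iv, ciphertext, tag: cipher.getAuthTag() }
}

function decryptMessage(
  key: Buffer,
  msg: { iv: Buffer; ciphertext: Buffer; tag: Buffer }
): string {
  const decipher = createDecipheriv('aes-256-gcm', key, msg.iv)
  decipher.setAuthTag(msg.tag) // authenticate before trusting the payload
  return Buffer.concat([decipher.update(msg.ciphertext), decipher.final()]).toString('utf8')
}
```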
0:45:15 So a lot of the sort of concerns, which, like certainly our experience of trying
0:45:20 to build a more integrated end to end local-first stack, you know, you go,
0:45:24 okay, we need to solve this.
0:45:25 I need a security rule system, because suddenly there is no API and how am
0:45:29 I going to authorize the data access?
0:45:30 And it's like, we don't need a security rule system.
0:45:33 Because you can just use normal API middleware
0:45:37 in front of an HTTP service.
0:45:39 And so you just sort of take that problem out of scope and like the
0:45:42 system doesn't need to do encryption.
0:45:44 It doesn't need to provide like a kind of hooks mechanism or some sort
0:45:47 of framework extensibility, because the protocol is extensible and
0:45:51 you just have all of this ecosystem of existing tooling built around it.
0:45:55 So it is, I mean, it's been fantastic for us because it
0:45:59 simplifies all of these aspects.
0:46:01 And allows us to go, look, this is how you can achieve, say
0:46:03 authorization with Electric, but again, it pushes it out of scope.
0:46:07 So we get to focus our engineering resources on just doing the core stuff
0:46:11 to deliver on this core proposition.
0:46:13 So which sort of things would you say are particularly tricky from an
0:46:19 application perspective with Electric, where it might be not as much of a good fit?
0:46:23 I think one of the things is that we sync through the database, and that has latency.
0:46:31 And so if you're trying to craft a really low latency real time multiplayer
0:46:36 experience, like, or even doing things where in a way it doesn't really
0:46:41 make sense to be synchronizing that information through the database layer,
0:46:46 then it's maybe not the best solution.
0:46:49 So sort of for like presence features, let's say in Figma, where
0:46:54 you see my mouse cursor moving around, those sort of things.
0:46:57 Yes, it would be nice if it was in real time shared across the various
0:47:01 collaborators, but you don't need a persistent trace of that for
0:47:05 eternity in your Postgres database.
0:47:07 So I think a common approach for that as well is just to have like
0:47:11 two kinds of different channels for how your data flows, like your
0:47:15 persisted data that you want to actually keep around as a fixed trail.
0:47:19 Like, did I create this GitHub issue or not?
0:47:22 But like how my mouse cursor has moved around, it's fine that that's
0:47:26 being broadcasted, but if someone opens it an hour later, it's fine
0:47:30 that that person would never know.
0:47:32 So for this sort of use case, it's overkill, basically,
0:47:38 to pipe that through Postgres.
0:47:38 Yeah.
0:47:39 And you know,
0:47:39 for us, Postgres is a big qualifier.
0:47:41 It's like, if you want to use Postgres, if you have an existing Postgres
0:47:46 backed system, like Electric shines where like, yeah, you have, you already use
0:47:51 Postgres or you know that you want to be using Postgres, maybe you already have a
0:47:54 bunch of integrations on the data model already, maybe you do have existing API
0:47:58 code, like this is the scenario where we're really trying to say, well, look,
0:48:02 in that scenario, this is a great pathway to move towards these more advanced
0:48:07 local-first sync based architectures. Whereas if you look at it from a
0:48:11 sort of more greenfield development point of view, and you're trying to craft a
0:48:15 particular concurrency semantics, say, you would reach for Automerge and you
0:48:20 would get custom data types, which you can craft advanced kind of invariant
0:48:24 support with your kind of data types.
0:48:27 But of course, you know, so that's a slightly different sort of world.
0:48:30 And I think probably a lot of people in the local-first
0:48:35 space dive into CRDTs and so forth, you know, it's really fascinating
0:48:39 to try to sort of craft these sort of optimized, kind of presence-style,
0:48:44 immediate real time streaming experiences.
0:48:47 And so whilst we do real time sync, it's almost more about keeping the data fresh
0:48:52 and just sort of making sure that the clients are sort of eventually consistent
0:48:56 rather than making that more sort of game kind of experience where, you
0:49:00 know, where maybe peer to peer matters more, or finding clever hacks to have
0:49:03 very low latency kind of interactions.
0:49:06 PGlite
0:49:06 That makes a lot of sense.
0:49:07 So now we've talked a lot about Electric and Electric is the name of the company.
0:49:12 It's the name of your main product.
0:49:14 But there's also been a project that I'm not sure whether you
0:49:17 originally created, but it's certainly in your hands at this point.
0:49:21 It's called PGlite.
0:49:23 That made the rounds on Hacker News, etc.
0:49:26 Also through a joint launch with the folks at Supabase.
0:49:29 What is PGlite?
0:49:31 What is that about?
0:49:33 Yeah, so I mean, interestingly with Electric, we started off building
0:49:37 a stack which was syncing out of Postgres into SQLite, because it
0:49:42 made sense as the sort of main like embeddable relational database.
0:49:45 And I remember speaking to Nikita, who is the CEO at Neon, the Postgres database
0:49:50 company, and some of his advice from building SingleStore, or MemSQL, was that the
0:49:57 impedance, or the mismatch between the two database systems and the data type systems,
0:50:02 will continue to just be a source of pain for as long as you build that system.
0:50:06 And so we were just having these conversations about going, how do we
0:50:09 make this Postgres to Postgres sync?
0:50:11 And then you can just eliminate any mismatch.
0:50:15 You just, you don't even need to do any kind of like serialization of the data.
0:50:19 You can just literally take it exactly as it comes out of like the binary
0:50:23 format that comes through in a query or the replication stream from Postgres,
0:50:26 put that into the client and like, you can have exactly the same data
0:50:29 types and exactly the same extensions.
0:50:31 So this was a sort of motivation for us.
0:50:32 And co-founder Stas, the CTO at Neon, had done an experiment
0:50:37 to try and make a more efficient Wasm build of Postgres that could
0:50:41 potentially run in the client.
0:50:43 So previously there'd been some really cool work by Supabase, by Snaplet, a
0:50:47 few teams, which had developed these sorts of VM based Wasm Postgreses.
0:50:52 But they were pretty big.
0:50:53 They didn't really have persistence.
0:50:54 They were sort of more of a kind of proof of concept.
0:50:57 And the approach that Stas took was to do a pure Wasm build and
0:51:02 run Postgres in single user mode.
0:51:04 And that allowed you to basically remove a whole bunch of the concurrency
0:51:09 stuff within Postgres, which allowed us to make a much, much smaller build.
0:51:13 So they shared that repo.
0:51:15 And we sort of played with it for a little while.
0:51:18 Didn't quite manage to kind of make it work.
0:51:20 And then one of the guys on our team, Sam Willis, just picked it up one week
0:51:23 and put in some concerted efforts and basically managed to pull it together
0:51:27 with persistence as a three meg build.
0:51:30 And it worked, and so suddenly we had this project which was like a three meg
0:51:34 build. SQLite, for context, is like a one meg Wasm build, and Postgres is a much
0:51:39 kind of larger system, so you'd think it would be much bigger, but suddenly
0:51:41 actually it's not that far off in terms of the download speed, and it could just
0:51:46 run as a fully featured Postgres inside the browser.
0:51:48 And so we sort of tweeted that out and it's gone a bit crazy.
0:51:50 I think it's like, it's the fastest growing database project ever on GitHub.
0:51:54 It's like 250,000 downloads a week nowadays.
0:51:57 There's lots and lots of people using it.
0:51:59 Supabase are using it in production.
0:52:00 Google are using it in production.
0:52:02 Lots of people are building tooling around it, like Drizzle integrations, et cetera.
0:52:06 And it's the sort of thing that just should exist, right?
0:52:08 There should be a Wasm build of Postgres. Just being able to have
0:52:11 the same database system instead of mapping into an alternative one
0:52:15 has these fundamental advantages, and also a lot of people have just been
0:52:21 coming up with like a whole range of interesting use cases for it as a project.
0:52:25 So some people are interested in running it inside edge workers
0:52:28 as a sort of data layer that you can hydrate data into
0:52:31 for kind of background jobs.
0:52:33 Some people are interested in running it as just like a development database.
0:52:37 So you can just npm install Postgres.
0:52:38 And if you're running like an application stack, you don't have to
0:52:41 run Postgres as an external service.
0:52:43 The same thing in your testing environment.
0:52:46 So there's a whole bunch of different use cases.
0:52:48 And in fact, like some of the work, for instance, that Supabase have
0:52:51 done is they built a very cool project called database.build,
0:52:55 which is a sort of AI driven, database backed application builder.
0:53:00 So it's a sort of AI app builder for building Postgres backed
0:53:02 applications, and it just runs purely on PGlite in the client.
0:53:07 And so that's a demonstration of where
0:53:09 this sort of database infrastructure for running software is going: you had
0:53:13 centralized databases, and then you had this sort of move to serverless
0:53:16 with separation of compute and storage.
0:53:18 And now you sort of have this model where actually you can run the compute,
0:53:21 with a whole range of different storage patterns in the client.
0:53:24 And you don't even need to deploy any infrastructure on the server
0:53:28 to run database driven applications.
0:53:30 It really reminds me of that time when JavaScript was
0:53:34 getting more and more serious.
0:53:35 And at some point there was Node.js, and suddenly you could run the same sort of
0:53:40 JavaScript code that you were running in your browser, now also on the server.
0:53:45 And well, the rest is history, right?
0:53:47 Like that changed the web forever.
0:53:50 It has changed dramatically how JavaScript has just become
0:53:54 the default full-stack foundation for almost every app these days.
0:53:59 And there seem to be a lot of similar characteristics.
0:54:02 This time it's the other way around, going from the server into the client;
0:54:07 with Node it was rather the other way around. But that seems like a huge deal.
0:54:11 Yeah, you know, you sort of step forward and we sort of see, I guess, some
0:54:15 of these trends in data architecture, and, you know, it can just
0:54:19 be the same database everywhere.
0:54:20 And in a way, it's just almost logically extended to wherever you want.
0:54:23 And you can just have this idea of
0:54:28 declarative configuration of what data should sit where.
0:54:31 AI systems can optimize transfer and placement, and it is just
0:54:35 all the same kind of data types.
0:54:37 and I think this is sort of where systems are moving to, but also
0:54:40 just some of these things we've been learning with PGlite. For
0:54:44 instance, if you're running a system that relies on having, say, a database
0:54:48 behind your application, and say it's a SaaS system and you're spinning up some
0:54:51 infrastructure for a client: with PGlite, you don't necessarily need to spin up a
0:54:55 database in order to serve that client.
0:54:57 So if you think about something like the free tier of a SaaS platform like that,
0:55:01 it can just change the economics of it.
0:55:04 It can do that on the server by just allowing you to have
0:55:06 Postgres in-process,
0:55:08 so you're not deploying additional infrastructure.
0:55:10 But you can also move it all the way into the client, and there just is
0:55:13 no compute running on the server.
0:55:15 It just moves even more of the compute onto the client.
0:55:18 And I think it obviously aligns with sort of local-first in
0:55:21 general, but I know some of the stuff we've talked about before around the
0:55:24 concept of local-only first.
0:55:27 And as a developer experience for building software: one of the
0:55:30 things that LiveStore is specifically designed to support is this ability
0:55:35 to build an application locally with very fast feedback and iteration.
0:55:40 And then you progressively add on, say, sync or persistence and
0:55:43 sharing and things when you need to.
0:55:45 And I think this sort of model of being able to build the software on
0:55:48 a database like PGlite and then go, okay, I've played with this enough,
0:55:52 I want to save my work.
0:55:53 And it's at that point that you write out to blob storage, or you
0:55:57 maybe provision the database to be able to save the data into.
0:56:00 Yeah, I think you've touched on something really interesting and something really
0:56:04 profound, which I think is kind of two second order effects of local-first.
0:56:09 And so one of them is for the app users directly.
0:56:13 So ideally it should just become so cheap and so easy to offer the full
0:56:19 product experience, as sort of a taste, fully on the client, that it is
0:56:24 no longer sitting behind a paywall.
0:56:26 And if the product experience generally allows for that, if it's sort of
0:56:30 a note-taking tool or something like that, then I should be able to
0:56:35 fully try out the app on my device and do the signup later.
0:56:41 And being able to offer that economically:
0:56:44 with those new technologies, that's basically no longer
0:56:47 an argument, so you can offer it.
0:56:49 So hopefully that will be a second order effect where software is way easier to
0:56:54 offer, where it's way easier to just try it out from an end user perspective.
0:56:59 But then also from the second point, from an application developer
0:57:04 perspective, I think it makes a huge difference in terms of complexity.
0:57:08 How, when you build something, whether it is just a local script
0:57:12 without any infrastructure dependencies, you can just run it;
0:57:16 maybe you run your Vite dev server,
0:57:22 and that's it.
0:57:22 It's self-contained and you can move on.
0:57:25 There's no Docker thing you need to start, et cetera.
0:57:29 That's like your starting point.
0:57:31 And if the barrier to entry there, if like, if that threshold is lower,
0:57:35 that you can build a fully functional thing just for yourself, just in that
0:57:39 local session, and you can get started this way, and if you then see like,
0:57:44 Oh, actually, there's a case here that I want to make this a multiplayer
0:57:48 experience or a multi tenant experience, then you can take that next step.
0:57:53 But right now, you can't really leap ahead there.
0:57:56 You need to start from that multi tenant, that multi player experience,
0:58:00 and that makes the, the entry point already so much more tricky that many
0:58:04 projects are never getting started.
0:58:06 And I think both of those can be second-order effects and
0:58:10 improvements that local-first-inspired architectures and software can provide.
0:58:16 So, I love those observations.
0:58:18 Yeah, yeah, totally.
0:58:20 And I mean, I think, for instance, it's interesting as well that a
0:58:23 lot of people do define their database schema using tools like Prisma, Drizzle;
0:58:29 Effect Schema is a great example, which obviously you're working on.
0:58:33 The more layers of indirection there are between where you're, say, iterating on the
0:58:37 user experience in the interface and where you customize
0:58:40 the data model, the harder it is to iterate there quickly.
0:58:44 If you have to go all the way into some other language, another
0:58:47 system, it just sort of takes you out of context and slows everything down.
0:58:50 So it's somehow the ability to apply that sort of schema to
0:58:54 the local database and not have to work against these sort of different
0:58:59 legacy layers of the stack in order to actually be able to build out the application.
0:59:03 The relation between Electric and PGlite
0:59:03 So going back to PGlite for a moment: how do PGlite and Electric, Electric
0:59:09 as a product and Electric as a company, how do those things fit together?
0:59:14 Yeah.
0:59:14 I mean, we basically have two main products.
0:59:19 They're both open source, Apache licensed.
0:59:22 One is the Electric Sync Engine, and one is PGlite.
0:59:26 And so you can use them together, or you can just use them independently,
0:59:31 so it's not like the Electric system is designed only to sync into PGlite,
0:59:35 you don't have to have an embedded Postgres to use Electric, and
0:59:38 you can use PGlite just standalone.
0:59:41 There's a range of different mechanisms to do things like data
0:59:44 loading, data persistence, et cetera, virtual file system layers,
0:59:48 loading in, unpacking Parquet files.
0:59:51 But if you do like have an application with this local database and you wanted to
0:59:56 then be able to sync that data with other users or into your Postgres database,
0:59:59 then Electric is just a great fit.
1:00:01 And obviously we make a kind of first class integration.
1:00:04 So I think for us, I mean, as a, as a company, as a startup, Electric is the
1:00:09 main product that we aim to build the business around, because in a way that
1:00:14 type of operational data infrastructure is just slightly more natural to build
1:00:18 a commercial offering around, like you have to run servers to move the data
1:00:21 around, we can do that efficiently, it sort of makes sense and adds value.
1:00:25 Whereas with PGlite, as an open-source embedded database, it's not
1:00:29 something that we're aiming to sort of monetize in quite the same way.
1:00:32 And potentially, maybe it could be upstreamed into Postgres; like, you know,
1:00:37 there should be a Wasm build of Postgres.
1:00:39 Or, you know, maybe it moves into a foundation and sort
1:00:42 of develops more governance. Certainly already with PGlite,
1:00:47 Supabase co-sponsored one of the engineering roles with
1:00:50 us, and there have been contributions from a whole bunch of companies.
1:00:53 So it is already quite a wide effort in terms of
1:00:56 the stakeholders who are stewarding the development of the project.
1:01:00 That is very cool to see.
1:01:01 I'm a big fan of those sort of like multi organizational approaches where you
1:01:06 share the effort of building something.
1:01:09 And, yeah, I love that.
1:01:11 I'm very excited to get my own hands on PGlite as well.
1:01:14 I'm mostly dealing with SQLite these days just because I think it is
1:01:18 still a tad faster for those single-threaded embedded use cases.
1:01:23 But if you need the raw power of Postgres, which often you do, then
1:01:27 you can just run it in a worker thread and you get the full power of Postgres
1:01:31 in your local app, which is amazing.
1:01:34 So maybe rounding out this conversation on something you just touched on,
1:01:38 which is a potential commercial offering that Electric provides.
1:01:42 can you share more about that?
1:01:47 Electric commercial offering
1:01:47 Yep, so we're building, a cloud offering, which is basically
1:01:51 hosting the Electric sync service.
1:01:53 So, for instance, we don't host the Postgres database.
1:01:57 We don't host your application.
1:01:59 We just sort of host that kind of core sync layer, and then that can integrate
1:02:03 with other Postgres hosts like Supabase, Neon, et cetera, and kind of other
1:02:07 platforms for deploying applications.
1:02:09 that's our sort of first commercial offering.
1:02:12 And we sort of see that as almost a utility data infrastructure
1:02:17 play, where we've put a lot of effort into being able to run the software
1:02:22 very resource-efficiently, and with sort of flat resource usage, so
1:02:26 it doesn't, you know, scale up in memory with concurrent users, etc.
1:02:30 So we want to be able to run that very efficiently.
1:02:32 And so we sort of see that as kind of low-cost, usage-based pricing,
1:02:36 based basically on the data flows running through the software.
1:02:39 I think, you know, monetizing open source software is
1:02:43 an interesting topic, but there are a lot of
1:02:45 common patterns that are well known.
1:02:47 And, like, ultimately our aim as a company is: we want people building real
1:02:54 applications with this technology, and we want developers to enjoy doing it
1:02:58 and become advocates of the technology.
1:03:01 And then, there is a pathway when, imagine that you're a large company
1:03:05 and say you have like five projects and they're all using Electric sync.
1:03:09 It's very common for those sort of larger companies to need
1:03:12 additional tooling around that.
1:03:13 So governance, compliance, data locality.
1:03:17 There's a whole bunch of sort of considerations there.
1:03:19 So, it's quite common to be able to build out a sort of enterprise offering
1:03:22 on top of the core open source product.
1:03:25 And so, you know, there are various routes like that, that we
1:03:27 could choose to pursue in future.
1:03:29 and maybe that's how it plays out as we build a cloud, we focus on, making
1:03:33 this sync engine and these components bulletproof, make sure people are being
1:03:37 successful building applications on them.
1:03:39 And then we can look at maybe some sort of value-added tooling to help you
1:03:42 operate them successfully at scale, or help you operate them within larger organizations.
1:03:48 Outro
1:03:49 That makes a lot of sense.
1:03:50 Great.
1:03:51 James, is there anything that you would want from the audience?
1:03:55 Anything that you want to leave them with?
1:03:57 anything to give a try over the next weekend?
1:03:59 The holidays are upon us.
1:04:01 what should people take a look at?
1:03:59 Yeah, I know that you may be listening to this at any time in the
1:04:05 future, but we're recording this in the lead-up to December.
1:04:09 So if you have some time to experiment with tech over the holiday period,
1:04:12 just take a look at Electric.
1:04:14 you know, it's ready for production use.
1:04:16 It's well documented.
1:04:17 There's a whole bunch of example applications.
1:04:19 So there's a lot that you can get stuck into there.
1:04:21 So please do come along and check it out; our website is electric-sql.com.
1:04:26 we have a Discord community.
1:04:28 There's about 2000 developers in there.
1:04:30 So that's linked from the site.
1:04:32 we're on GitHub at, Electric SQL.
1:04:34 so you can see the Electric and the PGlite repos there.
1:04:37 and so those are the kind of the main things.
1:04:39 And if you're interested, for instance, in building applications, we already
1:04:43 have a wait list for the new cloud service, and we're starting now to
1:04:46 work with, some companies to help manually onboard them onto the cloud.
1:04:50 So if a cloud offering for hosted Electric is important, let us know,
1:04:54 and there's a pathway there to work with us if you're interested in being
1:04:57 an early adopter of the cloud product.
1:04:59 But also just, we spend a whole bunch of time talking to teams
1:05:02 and people trying to use Electric.
1:05:04 So our whole goal as a company is to help people be successful building on this.
1:05:09 And so if you've got questions about
1:05:11 how best to approach it, or challenges with a certain application architecture.
1:05:14 We're very happy to hop onto a call and chat stuff through.
1:05:16 So if you come into the Discord channel, say hi and just ask any questions, and
1:05:21 we're happy to help as much as we can.
1:05:22 That sounds great.
1:05:24 Well, I can certainly plus one that anyone who I've interacted with from your
1:05:28 company has been A, very helpful and B, very, very pleasant to interact with.
1:05:34 And also at this point, a big thank you to Electric, not just for building what
1:05:38 you're building, but also for supporting me and helping me build LiveStore.
1:05:42 You've been sponsoring the project for a little while as well, which I really
1:05:46 much appreciate, and there's actually a really cool Electric LiveStore syncing
1:05:51 integration on the horizon as well.
1:05:53 That might be, some potential topic for a future episode, but I think with
1:05:58 that, now we've covered a lot of ground.
1:06:00 James, thank you so much for coming on the podcast, sharing a lot of knowledge
1:06:05 about Electric and about PGlite.
1:06:07 thank you so much.
1:06:08 Yeah.
1:06:09 Thanks for having me.
1:06:10 Thank you for listening to the Local First FM podcast.
1:06:13 If you've enjoyed this episode and haven't done so already, please
1:06:16 subscribe and leave a review.
1:06:18 Please also share this episode with your friends and colleagues.
1:06:21 Spreading the word about this podcast is a great way to support
1:06:24 it and help me keep it going.
1:06:26 A special thanks again to Rocicorp and PowerSync for supporting this podcast.