June 12, 2024

#10 – Matt Wonlaw: cr-sqlite, syncing strategies and incremental view maintenance

All episodes

June 12, 2024

#10 – Matt Wonlaw: cr-sqlite, syncing strategies and incremental view maintenance

Transcript

Dowload transcript

0:00:00 Intro

0:00:00 So I think it's a lot easier to develop when your queries are

0:00:03 co located with your components.

0:00:05 So every component is responsible for getting the data it needs.

0:00:08 And this makes your app like super composable, right?

0:00:11 Like you can add and remove components to your app or your component tree.

0:00:16 And because they're responsible for getting their data, like, you

0:00:19 don't have to do any additional wiring or prop drilling.

0:00:21 Welcome to the localfirst.fm podcast.

0:00:24 I'm your host, Johannes Schickling, and I'm a web developer, a

0:00:27 startup founder, and love the craft of software engineering.

0:00:30 For the past few years, I've been on a journey to build a modern, high quality

0:00:34 music app using web technologies.

0:00:37 And in doing so, I've been falling down the rabbit hole of local-first software.

0:00:41 This podcast is your invitation to join me in that journey.

0:00:44 In this episode, I'm speaking to Matt Wanlaw, a prolific local-first tool

0:00:48 builder who's behind projects such as Vlcn, CR-SQLite, and Materialite.

0:00:53 Most recently, Matt also joined Rocicorp to work on their new product.

0:00:57 In this conversation, we go deep on his projects covering CRDTs, SQLite,

0:01:02 and incremental view maintenance.

0:01:04 Before getting started, also a big thank you to Rocicorp and

0:01:08 Expo for supporting this podcast.

0:01:10 And now my interview with Matt.

0:01:13 Hey Matt, welcome to the show.

0:01:14 How are you doing?

0:01:16 Good.

0:01:16 Yeah.

0:01:17 Thanks for having me.

0:01:17 I'm super excited to have you on the show.

0:01:20 You and I have been collaborating on a few projects now over the

0:01:23 course of the last few years.

0:01:25 Most notably, you've been working on CR-SQLite and also

0:01:29 Materialite most recently.

0:01:31 But for those of you who don't know Matt would you briefly

0:01:35 mind introducing yourself?

0:01:36 Yeah um, Matt Wonlaw, and I guess I've been involved in the local-first

0:01:41 community for about two years now.

0:01:43 It originally started with me trying to resurrect an ancient project

0:01:46 of mine and trying to add, like, multiplayer and sync between devices.

0:01:51 And I saw, like, that's actually a pretty hard problem, and there weren't, like,

0:01:54 many off the shelf solutions, so then I just, like, dove into this , and that's

0:01:59 where all these projects like CR-SQLite, Materialite and other things came from.

0:02:02 Yeah, and I think this is also how you and I have met quite a while ago.

0:02:07 And before that, you've been working at Facebook for many years.

0:02:10 And even before that, I think you had your first engineering role

0:02:13 working on some software that by definition had to be very local.

0:02:18 There was no local-first yet.

0:02:20 Would you mind telling us a little bit more about what those first apps were that

0:02:26 Lockheed Martin

0:02:26 Yeah, so like my first job out of college was at a defense contractor

0:02:31 called Lockheed Martin, but essentially what we were doing was we were building

0:02:35 software for submarines, so like U.

0:02:37 S.

0:02:37 Navy submarines, and yeah, obviously a submarine has no internet connection,

0:02:41 everything has to be self contained, but even beyond that, so the software I built

0:02:45 was for, Like these thick clients for the workstations that sonar operators used.

0:02:50 And, you know, there, there are servers on the sub, you know, servers that

0:02:54 provide storage and some that provide the signal processing and whatnot.

0:02:58 But like an extra requirement was, you know, a sub is in a hostile environment.

0:03:03 Things can break.

0:03:04 So the work, workstations need to be able to work even if all the servers were down.

0:03:08 So, you know, they need to hold the last 24 hours of data.

0:03:10 So the sonar operators can, you know.

0:03:12 If every other system's down, they can at least work with the last

0:03:16 24 hours of data that they had.

0:03:18 So that, you know, I guess was my first brief introduction to local

0:03:23 apps that eventually do need to do some sort of syncing once connections

0:03:26 are re established and, you know, reconcile the changes that were made.

0:03:29 That's fascinating.

0:03:30 Have you learned some patterns back then that you still see out in the wild today?

0:03:36 Or do you think the approach that you've taken back then is entirely

0:03:39 different compared to today?

0:03:41 Uh, Back then the data was very well segmented.

0:03:44 So there was like never really conflicts.

0:03:46 But event sourcing was the main pattern.

0:03:49 So just have an event log of everything that has occurred.

0:03:52 And when connectivity is reestablished, do you know.

0:03:55 Play that event log forward from, you know, the time you lost

0:03:57 connection to the current time.

0:03:59 Yeah.

0:04:00 I mean, event sourcing, I feel like is still a super active topic.

0:04:04 I'm actually looking into it much more closely myself for the use case of

0:04:09 building Overtone, and I'm also exploring to embrace it a bit more for like syncing

0:04:14 and migration strategy for LiveStore.

0:04:16 But I also think that there might be a renaissance for event sourcing.

0:04:20 We had it for a while with Redux in a much more self contained space, but I

0:04:26 think there's a, that's an interesting topic for research and production as well.

0:04:31 Curious to hear your thoughts on event sourcing and how it

0:04:34 could fit into local-first.

0:04:36 I mean, I really like it.

0:04:37 , you have a source of truth, like of all the facts, all

0:04:41 the things that ever happened.

0:04:43 Whereas, you know, a regular app or state or database that

0:04:46 doesn't do event sourcing.

0:04:48 You're losing all the things that happen, right?

0:04:51 And you just have this snapshot of what things are right at this instant.

0:04:55 So yeah there's no way to rewind.

0:04:57 There's no way to, merge other people's changes if they did come in the past.

0:05:01 , so I guess event sourcing, you know, probably fell out of favor in some

0:05:05 ways because, you know, storage requirements, like are you really

0:05:08 going to store all the event logs?

0:05:10 And then also, you know, processing those event logs.

0:05:12 You do eventually need to process them into some you know, snapshot of

0:05:16 state so your app can work with it.

0:05:18 So I think, you know, people seeing that, oh, I have to process the events

0:05:20 and turn it into like these tables anyway, like why even do the event logs?

0:05:23 Why not just mutate in place?

0:05:25 Meta

0:05:25 Yeah, I think in case anyone of the listeners is interested in local-first

0:05:30 and event sourcing, please get in touch.

0:05:32 I think this is a very interesting topic to research a little bit After

0:05:37 building those apps for like submarines which sounds super fascinating, you've

0:05:43 at some point also decided to move on and join Facebook nowadays, Meta.

0:05:49 I'm curious to hear more what you've been working on there and

0:05:52 whether there was any intersection with your local-first topics now.

0:05:58 Yeah.

0:05:58 So Meta,

0:05:59 , I guess, like, it shaped the way I approached the local-first problems after

0:06:04 I left, but the problems there were very , , not related to local-first at all.

0:06:08 So I guess, yeah, what I worked on, it was like three different teams, but

0:06:11 they all kind of shared the same theme.

0:06:12 And a lot of what I was doing, we had a system called Download Your Information.

0:06:16 So this is like any Facebook user can go click a button and get

0:06:19 an archive of all their data.

0:06:21 So we'd have to like crawl your entire Facebook graph

0:06:25 and find Everything you own.

0:06:27 So we call it the ownership graph.

0:06:29 And then another thing one of my team's built was the deletion framework.

0:06:33 So this is like you delete your account.

0:06:35 So go delete every post you've ever uploaded, every like you've ever

0:06:38 made, every comment, every like ad you've ever run every message you've

0:06:43 ever sent across all the products, Instagram, Messenger, Facebook.

0:06:46 What's I guess not WhatsApp.

0:06:47 And then, yeah, so that, that was like traversing this deletion graph.

0:06:50 And then a third product, like, variation on all these themes was

0:06:54 like investigations and human review.

0:06:56 So, like, obviously there's lots of abuse on Facebook.

0:06:59 And we had to have a way for, like, every product that's ever created, like,

0:07:03 Can we make their content reviewable by like a human review team at Meta?

0:07:08 So this is like, yeah, trying to fan out from the content and traverse the graph

0:07:12 of things associated with it for review.

0:07:14 So I guess like, all this like graph walking was very, goes

0:07:18 very hand in hand with query languages and also schematization.

0:07:23 So, I guess I mentioned like three different areas of like

0:07:27 quote unquote compliance, right?

0:07:28 Deletion, download your information, and content review.

0:07:32 Well, if you're a product team at Meta, you don't want to

0:07:35 be slowed down with having to integrate into all these systems.

0:07:37 You don't have to learn about them.

0:07:38 You just want to like ship your product.

0:07:40 So like our key focus was how can we make it so developers can

0:07:43 just make their product and not have to understand these things.

0:07:46 So we kind of, we built it into the schema language.

0:07:49 So as you're defining your schema for your product, you can say like for a

0:07:53 given edge, like this is an ownership edge, or this is like a deletion edge.

0:07:57 And then just like by declaratively specifying all this stuff all these

0:08:01 other systems could just work.

0:08:03 And.

0:08:03 Yeah, eventually when I pivoted to local-first problem, I was like, wow, this

0:08:06 like developer experience of everything being declarative and the schema layer

0:08:12 down, taking care of it for you maybe we can do this for local-first too.

0:08:16 I couldn't agree more.

0:08:17 Did you eventually land on something where you got those declarative

0:08:24 CR-SQLite

0:08:24 Yeah, so, the project I ended up working on was called CR-SQLite,

0:08:29 so I was, like, taking these ideas and adding them to SQLite.

0:08:32 So, yeah, SQLite's already, you know, a relational database, and it's already

0:08:36 pretty declarative in terms of you can define your schema for your table,

0:08:40 and you can specify an index, and you can specify foreign key relations, and

0:08:43 the database manages all that for you.

0:08:45 But it didn't have any primitives for collaboration, right?

0:08:49 So, like What if I give you a copy of my database and I have a copy of my

0:08:54 own database and we both go offline and we both make a bunch of changes?

0:08:58 What if we want to like sync our changes together or merge our DBs?

0:09:01 So what I started adding to SQLite was this concept of CRDTs.

0:09:05 So you could say for a given table, what type of CRDT should we modeled as?

0:09:10 Like if it's a grow only set, or if it's like an add or move set, like

0:09:15 technical terms in CRDT literature.

0:09:17 And then for specific columns in the table, you could also say

0:09:21 like, what sort of CRDTs should be used to merge those columns.

0:09:24 So yeah, the idea was any developer who has an app backed by SQLite, they can

0:09:30 just go in and make some schema changes.

0:09:33 And then that app can become collaborative by allowing merging

0:09:36 databases with other collaborators.

0:09:39 Got it.

0:09:39 And I think the parallel there is that many things about SQL can be also seen

0:09:44 as rather declarative and the schema modeling, et cetera, those is where

0:09:49 I think what we gain with local-first software is the the collaborative

0:09:53 nature of the apps working together.

0:09:56 And I think this is where you.

0:09:58 Brought some of the nice experiences that you've seen at Meta, where there's

0:10:02 such a strong engineering culture for having a great developer experience,

0:10:07 bringing that to the new topic that you were interested in, local-first.

0:10:12 Meta had a really interesting philosophy when I got there.

0:10:15 I think like one of my first questions I asked was like, oh, how do we like

0:10:18 make sure people use our stuff that our teams, our security infrastructure?

0:10:22 And they're building things to make sure Meta's code was more secure.

0:10:25 That was my first team.

0:10:26 And like, security at Lockheed was always Everybody hated it.

0:10:29 It was like, nobody wants to do it.

0:10:31 And there's always this like mandated thing.

0:10:33 So I got to Facebook.

0:10:35 I'm like, how are we, I'm on security for, how are we going to

0:10:37 make sure people use our stuff?

0:10:39 And it was interesting.

0:10:40 Like the manager at the time was just like, yeah, like we can't do anything.

0:10:45 Facebook's bottom's up.

0:10:46 We can't do anything.

0:10:47 Make sure anybody uses our stuff.

0:10:48 The only way we can do it is make the safe way, the default, and easiest way.

0:10:53 Like, people are going to choose the secure way because

0:10:55 we've made it the easiest.

0:10:57 And like, there's no other way we can make them choose it.

0:10:59 So yeah, there's a huge, always a huge focus on everything we did from DevX.

0:11:03 So, that like, developers would just pick that solution.

0:11:07 Because it was easiest, not necessarily because it was secure, just because

0:11:09 it was the easiest thing around.

0:11:10 And then it was also secure.

0:11:12 Which is like the nice added benefit.

0:11:14 Strut.io

0:11:14 So after having worked for many years at Facebook, you've later then started

0:11:19 working on CR-SQLite, but you didn't arrive at CR-SQLite immediately.

0:11:24 I think you were for a little bit also still working on

0:11:28 bringing back an app you've been working on before called Strut.

0:11:32 So I'd be curious to hear a little bit more about that and particularly

0:11:36 given that you worked on it quite a bit before you joined Facebook and then

0:11:41 you started working on it again after.

0:11:43 So tell me more about the app and how that led you to local-first.

0:11:47 Yeah.

0:11:47 So it was my first, I guess, web app I was building.

0:11:51 So yeah, at Lockheed, I was doing all these like Java rich clients,

0:11:54 and I wanted to get into web dev.

0:11:56 So I started building this thing called Strut.io.

0:11:58 It's a presentation editor.

0:12:00 And I knew people's devices, you know, they have storage and

0:12:03 compute, and the browser had, you know, some storage back then.

0:12:07 And I didn't want to, like, run servers for people.

0:12:09 I just wanted them to be able to work and edit presentations if

0:12:13 they're online or offline and not have to sign up for an account.

0:12:17 Like you go to the site and you can immediately start doing stuff.

0:12:20 So yeah, like I guess without knowing it, I was building a presentation editor that

0:12:25 was, you know, local-first in some ways.

0:12:28 And at that time, it was like 2011, 2012, there was, you know, I guess the prequel

0:12:33 to the local-first movement, there's like remoteStorage.io and ownCloud.

0:12:37 And I started getting involved with those, but then, yeah, eventually

0:12:40 I got the Facebook offer and I just put Strut on the shelf to be

0:12:43 resurrected sometime in the future.

0:12:45 And then, yeah, when I was leaving Meta or Facebook, yeah, I didn't

0:12:49 know exactly what I was going to do.

0:12:50 So I figured, oh, I can just resurrect Strut.io real quick and you know,

0:12:54 it'll bring in some side income.

0:12:56 In the early days before I abandoned it, it had like 10, 000 monthly actives.

0:12:59 So I was like, Oh maybe a decade later we can get back to that again.

0:13:03 But yeah, as I resurrected it, kind of the landscape had changed a bit.

0:13:06 Like people had a lot more devices and also multiplayer was an expectation where

0:13:12 like, You know, Google Slides, right?

0:13:13 You can have multiple people on the same deck.

0:13:16 So I was, you know, looking into how I was going to solve that problem for Strut.

0:13:19 And, you know, rather than I guess I've been building frameworks for so long at

0:13:23 Meta, like, rather than making this a one off thing in Strut to solve the problem.

0:13:29 I was like, Oh, like, can I solve it at a lower level?

0:13:32 And that's when I went down the rabbit hole of CRDTs investigating SQLite,

0:13:36 adding stuff to SQLite and so on.

0:13:40 So the data architecture you had for the first version of Strut around, like,

0:13:45 yeah, before Facebook, you mentioned like around 2012 to when you've left

0:13:50 Facebook and what you've arrived at.

0:13:53 Very different.

0:13:54 So I'm curious, like how you went from like whether you took rather

0:13:59 some smaller steps or some much bigger steps altogether to rethink the data

0:14:05 architecture and how you went about that.

0:14:07 Yeah, I mean, I guess beyond just collaboration, I wanted to support

0:14:12 Yeah, arbitrarily large presentations.

0:14:15 So right, the original version of Strut, it would load everything off disk and

0:14:19 into memory, and then when you're closing the tab, or periodically, it would save.

0:14:23 A big dump of memory to disk and yeah, I think that works fine, probably

0:14:27 for the vast majority of cases.

0:14:29 But I wanted to be able to handle presentations with hundreds of slides

0:14:33 and tons of binary content or like images and videos and all sorts of things.

0:14:39 So like at first I was like, okay, how do I like lazily load this stuff off of disk?

0:14:43 And, you know, as you're starting to implement that in your own data

0:14:46 model, you're like, Oh, well, this is kind of like a database, which,

0:14:50 you know, knows how to page in and out things in and out for me.

0:14:54 So I think, you know, that was also an indication that, okay, like, something

0:14:58 more like a database would be good.

0:15:00 And then I don't know how I found that SQLite was better.

0:15:04 I guess I saw James Long's Absurd-SQL and Actual Budget.

0:15:08 That's when I first realized that, yeah, like, I mean, I'd always known Wasm was

0:15:12 a thing but that like SQLite was being compiled to Wasm and could run in the

0:15:16 browser and it was actually faster than IndexedDB when you were, I guess there's

0:15:21 some caveats there, but SQLite would store to IndexedDB, but it would store pages.

0:15:26 So it would store four kilobyte chunks.

0:15:28 IndexedDB, when you're like.

0:15:29 Reading things in and out individually, like single objects

0:15:32 by key value is like super slow.

0:15:34 And the fact that like, okay, you're using SQLite and reading in chunks

0:15:38 rather than individual objects, like it would be faster than IndexedDB.

0:15:42 Like if you change your IndexedDB access to be paged, okay, it

0:15:45 could be faster in SQLite.

0:15:48 SQLite, you can still do point queries and be faster than a point query in IndexedDB.

0:15:52 So that was something else that turned me on, like, oh, maybe I should use SQLite

0:15:55 in the browser rather than IndexedDB, so I don't have to implement all this

0:15:58 crazy stuff to make IndexedDB go fast.

0:16:00 And then, you know, there's all the problem of, like, collaboration still

0:16:04 and my justification for picking SQLite then was, you know, not only is SQLite

0:16:09 running the browser now, if I ever want to take this app and make it native

0:16:14 well, SQLite, Everything I build for collaboration in the SQLite, like it'll

0:16:18 work on native, it'll work on desktop it'll work in the cloud if you want

0:16:24 some like cloud peer or something rather than like a JavaScript only solution.

0:16:29 Choosing the right CRDT

0:16:29 So you made the decision, SQLite is the way forward, SQLite on top of IndexDB or

0:16:34 on top of like some persistence mechanism, but you now also need collaboration.

0:16:39 You looked a little bit into CRDTs, but you still decided

0:16:42 instead of going CRDTs first in terms of the developer experience.

0:16:46 You still want to give a developer yourself SQLite as the primary way to

0:16:52 think about the developer experience, but you started to use CRDTs or you

0:16:57 wanted to use CRDTs as a implementation detail to bring into SQLite.

0:17:02 So how did you go about that?

0:17:03 How did you solve that puzzle of making SQLite collaborative out of the box?

0:17:09 Yeah, so SQLite has a bunch of mechanisms for extending it.

0:17:13 You can make new virtual tables.

0:17:16 yeah, a virtual table is essentially a table.

0:17:18 It presents itself as a table in SQLite, except that The implementation of that

0:17:24 table is any arbitrary code you want to write, so C code, Rust code, whatever.

0:17:28 You can make functions, new functions, like a mat, you know, like min or max

0:17:32 or whatever function you want to define.

0:17:34 You can make new virtual file systems, so it's pretty extensible

0:17:38 from the plugin side of things.

0:17:39 And yeah, one of the teams I had worked with at Meta, they built this thing called

0:17:44 osquery way back, I think, You know, some other company may have seen osquery

0:17:48 now, but it basically lets you query all details about your OS through SQLite,

0:17:53 and they did that through virtual tables.

0:17:55 So yeah, for adding CRDTs to SQLite,

0:17:57 it was kind of a combination of a bunch of special functions.

0:18:01 That like when you create a table so, plus special functions

0:18:06 and special virtual tables.

0:18:07 So you can like, create a table, and then you can call a function that

0:18:11 converts that table to a CRR, or Conflict-Free Replicated Relation.

0:18:16 So just a fancy name for a table that's a CRDT.

0:18:20 Or you could use this like virtual table syntax where you're like, I want

0:18:22 to create this table as a CRR and then you list all the columns and in the

0:18:31 columns, so when you're creating a virtual table, like you can customize that

0:18:34 syntax all you want in your extension.

0:18:37 So I could like have a slot for custom data types.

0:18:40 So you can say this is a string.

0:18:42 And then like another addition to the data type is like the CRDT type.

0:18:46 So it's not only text, it's like a perryText, or it's like, not just an

0:18:50 int, it's an int that's a counter.

0:18:53 Or it's not just a date, it's a date that slash right wins.

0:18:56 Yeah, and then and then I guess the last thing, right, like, it's great

0:19:00 to have a database that's, you know, a CRDT but how do you merge efficiently?

0:19:06 . There had been some earlier work, actually by a guy named Iver and we actually worked

0:19:11 together pretty early on, on CR-SQLite.

0:19:15 He did his master's thesis on essentially adding CRDTs to SQLite but some of the,

0:19:20 like, Downsides of like that original approach was there's, there was no

0:19:24 way to incrementally merge databases.

0:19:25 You had to literally send the entire copy of the SQLiteDB to

0:19:29 somebody else, and it would like scan every row and merge them all.

0:19:32 So yeah, one of the key problems we had to solve was like how do we

0:19:34 incrementally sync And that was done essentially like implementing a global

0:19:41 version that gets tagged on every row and a virtual table that knows how

0:19:45 to find all the rows past the given version and just send you those to sync.

0:19:51 Was there any prior art to combining a relational database and CRDTs or

0:19:58 another mechanism to synchronize?

0:20:00 Yeah, so there's Iver's work, and then, I guess, James Long's work on

0:20:05 well, Absurd-SQL, but I guess Yeah, I don't know if he ever gave a name to

0:20:08 the layer above it that did the CRDTs.

0:20:10 But yeah, James work, if I recall correctly, was all in JavaScript.

0:20:15 So it wasn't built into SQLite itself.

0:20:17 You couldn't just run it anywhere SQLite could run.

0:20:20 It had to be in a JavaScript environment.

0:20:22 And he took a different approach.

0:20:24 I think he used hybrid logical clocks and Merkle DAG.

0:20:29 So yeah, I guess we started getting some trade offs of like,

0:20:32 Strut was the first use case.

0:20:34 Which actually informed a lot of design decisions for CR-SQLite and why I

0:20:37 didn't want to use something like James Long's approach or, I don't know other

0:20:41 solutions that were available at the time.

0:20:43 Yeah.

0:20:44 I'm curious to hear more about your your judgment of the trade

0:20:48 offs and what made it a good versus not so good fit for your use case.

0:20:52 Yeah.

0:20:52 So I think James's approach yeah, I hope I'm remembering everything correctly.

0:20:55 This was like a couple of years ago, but when I looked into it, every

0:20:58 single message for all time was stored.

0:21:01 So this is, you know, kind of like event sourcing, right?

0:21:03 But like, one of my key requirements, like, I was using Strut to drive

0:21:08 requirements for CR-SQLite.

0:21:10 And I wanted the ability for people to be offline as long as they wanted.

0:21:14 For it to be able to run on constrained, low memory devices.

0:21:18 And so those two in combination with the fact that it's like a presentation

0:21:22 editor where you can like, you can drag and drop components, you can spin them,

0:21:26 rotate them type, like I didn't want every single event for every keystroke

0:21:31 and every drag and all these things to be logged for all time, right?

0:21:35 Especially if you're offline indefinitely.

0:21:37 So if you're offline for like, I don't know, a couple days editing

0:21:41 a really big presentation, you come back offline, I want it to

0:21:43 be able to sync almost instantly.

0:21:45 So the approach I chose was the set of CRDTs that allow you to collapse history.

0:21:51 So, yeah, if you've been off The database size never grows more than

0:21:56 a constant factor of the base data.

0:21:59 So no matter how much you're editing, you're always going to be within a

0:22:01 constant factor of the regular DB size.

0:22:04 I don't know, I think that constant factor was like three or four times,

0:22:07 I don't recall the exact number.

0:22:09 Whereas the other systems, if you're the longer offline, the more and

0:22:12 more the data grew which was, you know, not a trade off I liked.

0:22:15 Yeah, I guess the downside to my approach is there's a few less

0:22:18 transactional guarantees and you can't, you know, rewind history.

0:22:22 But if somebody really wanted rewinding history, I was like, Oh,

0:22:24 they could do it in user space.

0:22:26 Like, that's not the problem I'm trying to solve.

0:22:28 That makes perfect sense.

0:22:29 And I think I really like, you focusing on the use case that you're most familiar

0:22:34 with Strut and the requirements around that, Since we're dealing with like

0:22:38 distributed systems and programming is hard enough, that you can't really

0:22:43 focus on all use cases at the same time.

0:22:47 And this way you design for the use case that you know of.

0:22:51 And I think this is where you have like a canvas app or something like that

0:22:55 where you move things around constantly.

0:22:58 If you work for multiple days, you don't really care exactly about

0:23:02 the change you've made over like five seconds, three days ago.

0:23:06 You just care about, okay, this is the state it is in right now and that

0:23:10 other users converge to the same thing.

0:23:13 Whereas I think a different set of requirements might be for a chat app

0:23:19 where you might not send around, like, which sort of keystrokes someone has

0:23:25 taken to arrive at the final message.

0:23:27 You send the final message, but then those final messages, you

0:23:30 might actually care about, okay, this came at this point in time.

0:23:32 One message does rarely override another one.

0:23:35 So this is, I think, where that might be a better fit.

0:23:39 And that makes also sense that in James use case with Actual Budget that's all

0:23:44 about, like, historic transactions.

0:23:46 You want exactly those like records and time.

0:23:49 There's probably not so many, and even if there are many,

0:23:52 they're all kind of worth it.

0:23:54 They need to be accurate.

0:23:56 So I think that makes perfect sense, and I think there will be like many different

0:23:59 solutions for many different use cases.

0:24:02 And I think there's even, you know, an approach how you could apply a bit of like

0:24:06 that compaction, even to event sourcing.

0:24:08 I think there's like a concept called subsumption, I think.

0:24:12 And that allows you to also like flatten some of those historic events.

0:24:16 So that's super interesting topic of research, I think.

0:24:20 . Yeah, I know there for certain CRDT algorithms and certain

0:24:23 event sourcing things, like.

0:24:26 Yeah, there are ways to compact history, but it did seem like, still

0:24:29 a pretty evolving area of research.

0:24:32 I think now it's a lot more understood especially in the,

0:24:35 like, list based CRDT space.

0:24:37 Sure, like, I don't know, I guess, Yjs kind of pioneered a lot of

0:24:41 that work of, like, run length encoding, everything, and then stuff.

0:24:45 Yeah, I guess what I want to quickly mention, so there's, when

0:24:47 I was first getting started, there was a community called Braid.org

0:24:51 which I got involved with early on.

0:24:54 And yeah, I don't know, that was an amazing community in terms

0:24:56 of like learning and being open.

0:24:59 So I just want to call them out, like they helped me a ton to understand

0:25:03 all these CRDT algorithms and when history can be compacted, when it can't.

0:25:07 What kind of weird circumstances you get into if you know you're emerging

0:25:11 changes peer-to-peer versus, you know, client server and whatnot.

0:25:14 Yeah.

0:25:15 There this entire community, I'm not yet a part of, but given that

0:25:19 this year I'm also looking much more into implementing, syncing this.

0:25:24 I feel like there's a lot I have to learn about and I would love

0:25:28 to also have some folks from that community here on the podcast.

0:25:32 Yeah they're trying to add.

0:25:34 Syncing to the HTTP protocol, so essentially adding a new verb rather

0:25:40 than get, put, and post to allow you to merge documents or resources together.

0:25:46 That sounds fascinating.

0:25:48 How far along is that effort?

0:25:49 They are part of IETF now.

0:25:52 I don't remember if they're a research, there's like two separate types of groups.

0:25:56 There's like research groups and maybe there's standards groups.

0:25:59 I think those are the two.

0:26:01 Yeah, I don't know if they're still in the research phase or if

0:26:03 they're on the standards track yet.

0:26:05 Challenges

0:26:05 Very interesting.

0:26:06 I have to look more into that.

0:26:08 So, besides choosing the right CRDTs to fit into SQLite through virtual

0:26:14 tables I think this, like, just picking SQLite to run in the browser quite a

0:26:21 few years ago, that's probably you've must have been running into a bunch of

0:26:25 challenges and issues along the way.

0:26:27 So I'm curious to hear first getting sort of like a broad overview of

0:26:31 the different challenges you were running into and then going into them.

0:26:35 Yeah, I think I wasted months of my life, like fighting SQLite issues.

0:26:39 Cause yeah, it was really early where the official SQLite WASM project

0:26:44 had kind of like just started.

0:26:46 And I guess some of the first problems I ran into was.

0:26:51 I guess compiling an extension into the WASM build.

0:26:55 So yeah, it's one thing to write a SQLite extension.

0:26:57 It's another thing to get it to run correctly in WASM.

0:27:03 It wasn't so hard when the extension was originally written in C, but like,

0:27:06 I don't know, like trying to code review contributions in C and make sure there's

0:27:10 no like memory issues was a nightmare.

0:27:13 So I eventually like bit the bullet, started learning Rust and re implemented

0:27:16 the extension in Rust, which I think was, yeah, it became like a third

0:27:21 or two thirds less code and just way faster to write after doing that.

0:27:24 I think it was worth the time.

0:27:26 But that introduced this whole new complication of like getting the

0:27:29 Rust compiler toolchain to compile an extension that can be linked to the

0:27:33 SQLite compiler toolchain that then goes through Emscripten to build a Wasm bundle.

0:27:39 And let me see what else was there.

0:27:42 Transactions was an interesting thing.

0:27:45 So the inner, the originally the interfaces to SQLite

0:27:49 in the browser were async.

0:27:51 And if you ever had Well, I guess there's two builds of SQLite.

0:27:55 The official SQLite build was always synchronous.

0:27:57 There's this unofficial build, which I think is better, called wa-sqlite, which

0:28:02 all the interfaces were asynchronous.

0:28:03 But it had these interesting caveats, like if you await two calls to

0:28:07 SQLite at once, it would deadlock.

0:28:09 So you had to make sure like there's only one Call to SQLite ever at once.

0:28:14 So, you know, working around that challenge.

0:28:16 Transaction interleaving was another, so like if it's asynchronous and you

0:28:20 start a transaction and you have all the statements in the transaction, okay.

0:28:24 You, you began the transaction and then you like do a read,

0:28:26 you're waiting the read.

0:28:28 Well, while you're waiting in the read, you delegate control of the event loop,

0:28:31 so somebody else could come in and, you know, kick off a new task, right?

0:28:36 A new statement to do a write or something.

0:28:39 Well, now that's somebody else's transaction that's trying to run while

0:28:41 your other transaction is running, so you get this, like, weird interleaving.

0:28:44 So, yeah trying to lock out transactions in the JavaScript layer that was

0:28:48 another annoying thing to deal with.

0:28:49 And then, yeah, this is not related to WASM, but just getting SQLite

0:28:56 with the extension built for all the platforms I wanted to target.

0:28:59 So, like, eventually Expo wants to include it in their Expo SDK, so figuring

0:29:04 out how to do Android and iOS builds in addition to, you know, the WASM build.

0:29:09 And Mac and Linux builds.

0:29:12 Those sound like quite the buffet of different challenges that some sounds

0:29:18 familiar to me as well as I think some you've been pioneering a tad earlier.

0:29:23 And by now, some of those are maybe a bit more.

0:29:27 A well trodden path, but I think there's probably even more

0:29:31 that we don't understand yet.

0:29:33 One more, I know one more interesting one I think more interesting than

0:29:36 the other ones is since SQLite in the browser, at least the one

0:29:41 I'm using, is IndexedDB backed.

0:29:43 IndexedDB has a very high cost for opening a transaction which

0:29:48 I was unaware of at the time.

0:29:49 So yeah, there's a lot of work.

0:29:51 And anytime you open a SQLite transaction, it would open an IndexedDB transaction.

0:29:56 So there's a lot of work making sure that, like, for reads Yeah, normally

0:30:00 you just issue reads, you wouldn't necessarily batch them into a transaction.

0:30:03 Like normally, you know, you'd batch your writes into a transaction.

0:30:06 But yeah, to make SQLite in the browser go fast if you're doing a whole bunch

0:30:09 of reads, like making sure a transaction was open automatically all the reads

0:30:14 would happen and then the transaction would close once all the reads are done.

0:30:16 So it's like way of batching that was like a 10x speed up in the browser for SQLite.

0:30:21 So once you've solved all of those performance challenges and also like

0:30:26 correctness, syncing challenges, et cetera, then it's also needs

0:30:30 to be competitive somehow with the typical developer experience

0:30:34 you have in a React app or in a different framework you're using.

0:30:40 How did you go about designing that?

0:30:42 Yeah, so, yeah, I've developed a set of hooks, so, I don't know, it's, I guess

0:30:47 like the React integration was probably the least amount of, I don't know,

0:30:51 I'd say it's the least amount of work.

0:30:52 Maybe not once I like think about all the strict code problems and

0:30:55 stuff but yeah, essentially the DevEx was like a use query hook.

0:30:59 So I think it's a lot easier to develop when your queries are

0:31:03 co located with your components.

0:31:04 So every component is responsible for getting the data it needs.

0:31:08 And this makes your app like super composable, right?

0:31:10 Like you can add and remove components to your app or your component tree.

0:31:15 And because they're responsible for getting their data, like, you

0:31:18 don't have to do any additional wiring or prop drilling.

0:31:20 And yeah, they're not going to break other components because suddenly,

0:31:26 Some data dependencies are missing.

0:31:28 So I really like that, yeah.

0:31:29 Co located queries, so every component you could use this hook called UseQuery.

0:31:34 And in that UseQuery, you just literally write your SQL that fetches

0:31:39 the data required by that component.

0:31:41 And that SQL could be as complex or as simple as you wanted.

0:31:44 So, you know, like, select from issue where id equals issue id or something

0:31:49 simple or, you know, something complex like select star from issue,

0:31:52 order by modify, join labels join owner, and yeah, all these things.

0:31:58 So you've managed to tame most challenges with SQLite in the browser, making sure

0:32:03 it works in Wasm, making sure that your own extension is working, et cetera.

0:32:08 And SQLite can be super, super fast for most workloads, but once your

0:32:14 cardinality grows, et cetera, your tables get really big then queries can

0:32:19 also start Taking a little bit longer.

0:32:22 And that is because SQLite always kind of, besides a little bit of caching,

0:32:26 always starts with your results for your queries from scratch, kind of.

0:32:31 And you know, I've been chatting about this for quite a while back then.

0:32:35 This is how I know about all of this context.

0:32:37 And that has led you to look into.

0:32:40 incrementally maintaining the query results and the views.

0:32:43 So I'm very curious to hear more what you've learned back then

0:32:46 and which path this led you down.

0:32:49 Yeah.

0:32:50 Yeah.

0:32:50 So I guess, you know, I was talking earlier about each component

0:32:53 fetches its own data, has this use query hook, or just, you know,

0:32:56 specifies a SQL query it's running.

0:32:58 So when you're developing like a rich client side application like this,

0:33:02 ideally your database is reactive, right?

0:33:04 So any, anytime some data changes.

0:33:07 It's like you, you work on Overtone, right?

0:33:09 The music app.

0:33:10 If somebody presses play, then, you know, all the components that care about

0:33:16 play state need to immediately update.

0:33:18 And yeah, as you're saying in SQLite, if you're doing this somebody changes some

0:33:22 state, well, you know, SQLite doesn't have any, you know, facilities for reactivity.

0:33:27 It has a few but they're not very good.

0:33:29 Like they don't give you many fine grained notions about what changed

0:33:32 or like what queries it impacts.

0:33:34 So what you're left doing is rerunning.

0:33:38 Essentially from scratch the queries that could have been affected.

0:33:42 So, like, the SQLite change notifications, they'll tell you the row ID that

0:33:46 changed, but not the contents, and they'll tell you, like, the table.

0:33:49 So it's usually not enough to figure out exactly which queries to invalidate.

0:33:52 And yeah, so you rerun tons of queries throughout the app.

0:33:56 And those are all rerunning from scratch, so like, if you have, like,

0:34:00 for your track list, you're showing a few hundred tracks, and that track list

0:34:03 has to do a number of joins, right?

0:34:05 Has to, for a track, has to join the album to get the album title, has to

0:34:08 join the artist to get the artist names.

0:34:12 Yeah, I don't know if there's any other joins.

0:34:13 Maybe there could be joins about, like, like status, or favorite status,

0:34:18 or something about the track, right?

0:34:20 And, yeah, rerunning that join to select, you know, 200 some items.

0:34:25 And yeah, maybe they just want to sort, and sorting it doing that from

0:34:28 scratch every time somebody mutates something severely limits I guess how

0:34:32 fast you can interact with the app.

0:34:34 And yeah, I'd worked, tried a bunch with like, okay, you know,

0:34:38 maybe a purely in memory SQLite, and that gets you pretty fast.

0:34:42 But yeah, somebody was saying, oh, like, The goal for a reactive

0:34:45 database is it's memory fast.

0:34:47 And then when I started, like, I don't know, that idea, a lot just

0:34:50 helped my brain and I started like actually benchmarking, you know,

0:34:54 if I'm just observing a value in JavaScript, how fast is that?

0:34:57 And if I'm like, Observing a query in SQLite and updating the

0:35:02 row and re running the query.

0:35:03 Like, what is that?

0:35:04 And I have a observable notebook somewhere that like compares all this

0:35:08 and like the difference was massive.

0:35:09 So like just seeing how far I was from memory fast, like started

0:35:13 making me a bit sad on SQLite.

0:35:15 And also like benchmarking the original strut and the new one, which use

0:35:20 SQLite and these use query hooks.

0:35:23 Yeah, like the original one I could throttle my Chrome, you know, but in

0:35:27 the dev tools you can like downgrade your CPU right to like 6x slower or

0:35:31 something and I could go all the way down to the slowest possible and everything

0:35:35 was buttery smooth, but in the new one, like if I downgraded it too much,

0:35:40 like you could see some visible lag.

0:35:42 SQLite

0:35:45 is fundamentally built in this request response style, right?

0:35:50 This is the era of the LAMP stack where, you know, databases request

0:35:55 response made sense, right?

0:35:56 A user would go to the website it would do a query against the

0:35:59 DB and it would render, right?

0:36:01 There was no, like, rich interactivity.

0:36:03 It was always a full page refresh every time.

0:36:04 So request response made sense.

0:36:06 Yeah, but for these rich apps, request response no longer makes sense.

0:36:09 And trying to fit a request response sort of DB designed around that,

0:36:13 it into this reactive scenario.

0:36:15 And I think maybe one day somebody can get there.

0:36:19 But I think right now it didn't seem to make much sense to me and the juice

0:36:23 didn't seem to be worth the squeeze.

0:36:25 So I started, I guess my love affair with SQLite has slowly been coming to a close.

0:36:30 And I started, you know, investigating other projects.

0:36:33 Yeah maybe one, one day port it if there's time or.

0:36:36 porting some of these projects back to SQLite and contributing

0:36:39 incremental data flow, differential data flow to them or something.

0:36:43 But yeah, I guess I should describe what these projects are, right?

0:36:45 , So I like the idea of queries, being able to like, Issue a complex declarative query

0:36:52 against some set of relational data to get back the data you need for your view.

0:36:56 So I wanted to like, how can I , incrementally maintain an

0:37:00 arbitrary query against some data?

0:37:02 So like that track list example you query the, Tracks and their

0:37:06 artists, and their albums, and the play state, and all these things.

0:37:10 How can we, when somebody does it right, rather than re running

0:37:13 the query to get the track list it knows exactly which queries.

0:37:17 Should be invalidated by that write, and rather than re running them,

0:37:22 knows exactly how to patch up the rows that are impacted by the write.

0:37:27 So, you know, at first this problem seems absurdly complex and hard.

0:37:32 And you're like, Oh, yeah, I want to like invalidate and

0:37:34 patch up any arbitrary SQL query.

0:37:37 But yeah, I started reading some of the papers.

0:37:39 There's like a paper called DBSP.

0:37:41 It's a differential data flow paper.

0:37:43 I think Materialize is based on that.

0:37:47 And then, yeah, started implementing it.

0:37:49 And then I realized like, I don't know, this, once you've read it and started

0:37:52 implementing it and you realize like, this is very similar to other stream

0:37:56 processing stuff , like, and other query, you know, builders I've done,, right,

0:38:02 where rather than, you know, you create a series of operations and rather than

0:38:06 them asking a database for the data, they're like taking streaming data in.

0:38:12 So yeah, like some of the work I'd done at Meta was on real time abuse systems

0:38:17 and that was all streaming systems.

0:38:20 So I was like, oh, like this problem isn't that mysterious.

0:38:22 Like I've done stuff like this before.

0:38:25 It seems tractable.

0:38:26 So I, you know.

0:38:27 went full on, decided like, I think we, I think I can implement this yeah, based

0:38:32 on prior experience, based on having done query languages before so yeah,

0:38:36 MaterialLite was born which is bringing differential data flow to JavaScript, so

0:38:43 you can compose, filter, map, reduce, and a join operator to create these pretty

0:38:48 rich queries and have them reactively updated anytime there's a write.

0:38:53 Right.

0:38:53 And so for those of the listeners who have not yet built their own database

0:38:58 and maybe are familiar with like SQL where in SQL, you have like select from

0:39:04 where, and all of those and JavaScript or other programming languages who you

0:39:08 might've like, Map, filter, et cetera.

0:39:11 They might have different names, but they're sort of

0:39:13 like conceptually very similar.

0:39:15 And this is where you're basically just now like trying to recreate the same

0:39:20 semantics that we have from SQL, where you can say select star from this where so,

0:39:25 and you can basically have like an array where it now say .Map .Filter, et cetera,

0:39:32 and where you flip the trade offs from before you owned in SQLite, you might get

0:39:38 this query just once and then it's done.

0:39:40 You no longer are interested in like subsequent changes and then like, smaller

0:39:45 updates, but in JavaScript where our app stays warm, we click the button.

0:39:49 Something's changed slightly.

0:39:52 You want to change the trade offs quite significantly.

0:39:57 Materialite

0:39:57 Yeah, so like just to build off on like how SQL concepts map to like filter,

0:40:02 map, and reduce, or like map represents select or function application, like

0:40:08 you're selecting some columns, well that's mapping the original object to some

0:40:12 other object with a subset of the fields.

0:40:14 Filter is the same as where Reduce is what you use for any aggregate

0:40:19 function, like group by, sum, count.

0:40:21 Yeah, and I guess one other operator would be concat.

0:40:24 So if you want to do or, you just, you fork your stream that you filtered

0:40:28 or mapped over, and then after you've filtered it, you concat the streams back

0:40:32 together, and then you run a distinct.

0:40:34 So you can do an or and then join, joins a special thing where essentially.

0:40:39 You kind of maintain what you've seen from one stream and then as you

0:40:43 see results from another stream, you link them together based on some key.

0:40:46 At first, it like, Seemed like a large barrier, but then I don't know, once you

0:40:51 get into it and you realize, Oh, like I've done all this before and filter,

0:40:53 map, reduce, like these have direct analogs in SQL, like it's not so bad.

0:40:58 So we've so far like touched the most common usage of SQL.

0:41:04 But SQL has, or SQLite, Postgres, et cetera, but like sticking with

0:41:08 SQLite for a moment there's quite a bit more than like the common

0:41:12 like select from where, et cetera.

0:41:15 There's like various kinds of aggregations, various kinds of ways

0:41:18 to combine queries, sub queries, joins, various kinds of joins.

0:41:23 Is the system that you've came up with MaterialLite, is that on feature

0:41:27 parity on what SQLite is able to do?

0:41:31 Or are there some parts where you say, okay, that's out of scope because

0:41:35 there's years worth, maybe decades worth of optimization have gone into

0:41:40 SQLite into making that fast and that's out of scope for Materialite.

0:41:44 Yeah.

0:41:44 So there's no window functions and there's no recursive queries.

0:41:48 And base Materialite is strictly for streaming.

0:41:52 So, like, if you're processing a stream of writes or events, like

0:41:57 Materialite is a great fit because it's gonna, you know, patch up your

0:42:01 query results as those events come in.

0:42:04 But say you have a table of a million items and you want to

0:42:09 query it from scratch, right?

0:42:11 Like you didn't see the events, you couldn't maintain any queries.

0:42:15 You're just like, I want to know what's in this table.

0:42:17 So, so you're going to have to run that query from scratch, even in Materialite.

0:42:21 But for Materialite, what that means running a query from scratch is like

0:42:24 setting up the Dataflow pipeline and then feeding literally every row,

0:42:27 all million rows into that pipeline.

0:42:30 Which, yeah, since everything's implemented with, like, filter and map,

0:42:34 where, like, it's opaque, like, when you do a filter, you provide a lambda.

0:42:38 It's opaque.

0:42:38 That's what fields you're filtering on which makes this from scratch case hard

0:42:42 because, like, Oh, if I don't know what fields you're filtering on, then I don't

0:42:45 know if I can apply some sort of index that you might have against this data.

0:42:49 So, yeah, I guess after working on Material 8 and getting it to

0:42:54 work pretty well, like, okay, that was the next problem.

0:42:57 And it's like, we need an actual query language because in a

0:43:01 query language is declarative.

0:43:02 You express, like, This is the field name I'm filtering on.

0:43:05 This is the operator being used and this is the value.

0:43:07 So the engine knows exactly where the fields are ordering by the tables you're

0:43:12 using the fields you're filtering on.

0:43:15 And once you have that knowledge, you can also make that like first query

0:43:19 case fast, where I guess you're doing the traditional database thing of like

0:43:24 figuring out what indices to pick.

0:43:26 I don't think we'll ever, Be as fast as SQLite for that case.

0:43:29 But I think, you know, For an app like, for a rich client

0:43:32 like, Overtone or Strut, right?

0:43:35 , Most of your queries are subscriptions, they're not this

0:43:37 like from scratch Sort of thing.

0:43:38 Like, You're gonna subscribe, you're gonna like, You're gonna like, Set up

0:43:42 a query that subscribes to the slide list, set up a query that subscribes

0:43:45 to the component list on a slide.

0:43:48 And then as writes happen, you just want to update those rather than,

0:43:52 yeah, always, like, rather than having to have first class support

0:43:55 for running a query from scratch.

0:43:58 So, yeah, so that latter thing needs to be just fast enough, but

0:44:00 it doesn't need to be, like, Yeah, the fastest thing in the world.

0:44:04 I totally agree that there's a very interesting, different kinds of trade

0:44:08 off that should be explored and should be built that makes so much more sense for a

0:44:14 live active application where everything stays up to date and then changes

0:44:20 quite minimally as opposed to request response, which is how SQLite works today.

0:44:26 That being said, you can still get quite a bit of mileage out of SQLite today.

0:44:31 If you are a little bit more mindful about like how you work with it.

0:44:36 So some patterns that I found to Makes SQLite work for me with Overtone.

0:44:42 Is that what makes SQLite the slowest in a browser context is shoveling data

0:44:48 from within the SQLite database into JavaScript land and back and forth as

0:44:54 this is where you need to cross the memory boundaries from memory staying

0:44:59 within Wasm to how it's being like marshaled into JavaScript objects.

0:45:05 And what I found as a pattern that helps quite a lot is just keeping that small.

0:45:11 So if you need like a, and this is where I think you can apply quite a few of the

0:45:15 tricks that you've now mentioned and you embrace with Materialite, you can actually

0:45:20 bring back and layer on top of SQLite.

0:45:23 Probably still not quite as fast as Materialite.

0:45:27 But probably fast enough to make a use case like Overtone work within that

0:45:31 paradigm as well, that when the list changes, instead of like, getting a

0:45:36 new list of 5, 000 tracks for that playlist, every time those 5, 000

0:45:40 tracks, you just get a change in terms of like, Oh, this new track was added.

0:45:46 And getting that SQLite, I think can also be done with a few tricks namely

0:45:52 also like a temporary table where you can save the previous result of your

0:45:58 query before and after the change.

0:46:01 And then you query that in a diffing way.

0:46:05 And then you still need to sort of like work with that

0:46:08 diff information in JavaScript.

0:46:11 But this is how you can work around that performance challenge.

0:46:14 But your system with Materialite.

0:46:17 All of that is absorbed from you, which is very attractive.

0:46:20 Yeah, and I think right, so Materialite, right now, it can be backed by

0:46:24 an in memory collection, or it's just some little interface that's

0:46:28 called a source that you implement.

0:46:30 So I think, Yeah, SQLite could be that source, and then that would

0:46:33 solve the like, first query problem.

0:46:36 So I just compile the Materialite queries to SQL queries.

0:46:40 and then the incremental updates are maintained by Materialite going forward.

0:46:44 Yeah, obviously you have to like, listen to the transaction failures

0:46:46 and somehow roll those back.

0:46:48 But yeah, I think it's doable to like, Marry those two together and

0:46:51 have kind of the best of both worlds.

0:46:54 And I know there's like a project called GRDB.

0:46:56 I think it's like a iOS SQLite wrapper which adds a bunch of reactivity.

0:47:02 it's not like completely general.

0:47:04 It doesn't support like completely arbitrary SQL queries, at

0:47:06 least in an efficient way.

0:47:08 But I think it gets you close enough with the queries it does support.

0:47:11 One more thing about like, moving from the request response style to

0:47:15 like, queries being subscriptions.

0:47:17 It gives you these really cool opportunities to optimize

0:47:19 queries at a global level.

0:47:22 So like, if all your queries are subscriptions, you can see

0:47:24 what subscriptions are open.

0:47:26 And then you can start seeing which queries are either

0:47:29 identical and deduplicate or which queries share operations.

0:47:34 So like, maybe you have a whole bunch of queries that all do the same join.

0:47:37 While in the streaming system.

0:47:39 You can optimize that so that join happens once and then the results of the join are

0:47:44 fanned out to all the queries that use it.

0:47:46 So yeah, you can do some really neat tricks at a global level once you

0:47:49 model your queries or subscriptions.

0:47:50 Yeah, that's super interesting.

0:47:52 And a few of those things I'm exploring for LiveStore myself where they're

0:47:58 basically the problem of reactivity.

0:48:01 And efficiency, trying to keep the performance as good as possible,

0:48:05 but also squeezing that into a frame budget since LiveStore and Overton

0:48:10 all run within the main thread, which I'm not sure that's probably

0:48:14 also the way how MaterialLite works.

0:48:16 There is Interesting to also see it through a lens of like a different

0:48:21 paradigm shift evolution in the JavaScript front end world which is about signals.

0:48:27 And signals is also about like, I think a more lighter weight reactivity system

0:48:32 that is all about being composable.

0:48:35 And and I think those worlds are now finding, like, a way together

0:48:39 within live store with the prior work with Jeffrey and Nicholas on Riffle.

0:48:45 This was like some early implementations based on a paper called Adapt On.

0:48:50 And from my understanding all of like the research that has gone into adapt on.

0:48:56 And sort of like in parallel, what's been evolving as signals.

0:49:00 Those are actually the same thing.

0:49:02 And I think now we can do some really interesting work on combining the

0:49:07 power that like more substantial state management primitives, Like a database

0:49:13 gives us or something like material light gives us and bring that together

0:49:17 with something like a signal system that also kind of out of the box address the

0:49:23 thing you've just mentioned, which is reusing parts of the computational graph.

0:49:28 The only thing signals is missing.

0:49:29 I'm surprised nobody's done this yet.

0:49:31 Is Incremental computation against collections, right?

0:49:34 So you think of like, I have an array, and I map the array, then I filter

0:49:37 the array, then I reduce the array.

0:49:39 That's like creating a new copy of the array each time.

0:49:42 And if you like, add one element to the array, you like, rerun all

0:49:45 those against the full array again.

0:49:47 So it's kind of like the query from scratch case, right?

0:49:49 Whereas Materialite, right, You add an element to the array, well,

0:49:52 it's only going to run the filter on that one element and the map

0:49:55 on that one element rather than running against the full array.

0:49:58 So there's some interesting benchmarks maybe I can share with you after of

0:50:01 like, yeah, incrementally maintaining a map filter reduce pipeline

0:50:05 against an array of a million items.

0:50:07 Yeah, so it'd be really cool if like the signals implementers would even

0:50:12 Rocicorp

0:50:12 So even though your love affair with SQLite has for now come to an end.

0:50:18 I think the problems and goals you're chasing after are still very

0:50:22 similar to what I think is feasible with SQLite, just maybe with like

0:50:27 a different performance footprint.

0:50:29 I think with SQLite you also get the benefit.

0:50:32 of just that being like a very familiar and trusted thing that's

0:50:35 like been proven and battle tested for literally decades now.

0:50:40 And even though you said like you were running into some performance

0:50:44 issues SQLite is still so ridiculously fast that it's very feasible to do

0:50:49 write those applications in the, in a browser context, et cetera.

0:50:52 And I think even if you now on a modern CPU straddle your your

0:50:57 CPU, it's still manageable.

0:51:00 But I think there's multiple paths to arrive at the same goal.

0:51:05 And so as my understanding is that the project Materialite opened a couple of

0:51:11 really interesting follow up conversations for you which is now opening a new

0:51:16 chapter for your professional life.

0:51:19 So do you want to tell us a bit more about that?

0:51:21 Yeah, I guess, yeah, Materialite is my least known project,

0:51:25 least complete project.

0:51:27 But yeah, it did pick up some interest by Rocicorp specifically Aaron, who I

0:51:32 met at the local-first conference in St.

0:51:35 Louis.

0:51:36 And yeah, we're talking about, you know, the model of subscript, those queries as

0:51:39 subscriptions and like having a reactive app and I think he's the one that said

0:51:42 like, oh, things need to be memory fast.

0:51:45 And yeah, so, so I started working on MaterialLite and we did some explorations

0:51:49 of like, if you powered one of their demo apps with MaterialLite, like they

0:51:54 have a linear clone that's a demo app and they want to see like, Oh,

0:51:56 how many issues can we scale this to?

0:51:58 If we like use this differential data flow approach.

0:52:02 Yeah.

0:52:02 And they really liked kind of the initial results.

0:52:05 And yeah, we worked together, I guess.

0:52:08 quite a bit over the last four slash six months.

0:52:10 There was like a two month break for me on paternity leave.

0:52:13 And yeah, so I'll be starting a job with Rocicorp here pretty soon as a partner.

0:52:18 Yeah, I'm super excited about that.

0:52:20 That's amazing.

0:52:21 I mean, we had Aaron on the show.

0:52:23 I think that the second show for localfirst.fm, and there were so many

0:52:28 deep insights there by Aaron, and I'm a big fan of the products that's already

0:52:33 been built there, Replicache, Reflect, and sounds like your work with Materialite

0:52:40 is also going to be incorporated into making the products even better.

0:52:44 So I'm curious to hear more of your thoughts on you know, what do you feel

0:52:49 like where's local-first going with the products you're working on with Rosicorp?

0:52:55 Local-first provides a superior DevX because you're no longer

0:52:59 worried about APIs of, like, how do I get data from the server?

0:53:03 Like, a sync engine, or your database is solving, you have a sync engine solving

0:53:07 that for you, right, where you're just coding against local data rather than

0:53:11 setting up all these weird REST APIs that you have to call to get the data.

0:53:14 And yeah, I think that's the key value of local-first and the main thing

0:53:19 that Rocicorp is pushing forward.

0:53:21 It's like making that sync engine experience as best, as

0:53:25 Picking the right tool

0:53:25 So you're now joining Rocicorp where you're building Replicache and Reflect,

0:53:30 but the local-first space has grown quite significantly to a point that

0:53:34 can be almost a bit disorienting.

0:53:36 For newcomers who are trying to first figure out what does it mean to build

0:53:41 the app local-first and then also choose the right tool for the job.

0:53:45 Can you provide a little bit of guidance which how someone should think about

0:53:49 use cases and picking the right tool?

0:53:52 Yeah.

0:53:52 So I guess maybe when you're somebody, when the first question somebody

0:53:56 asks, should ask themselves is, "Do they want servers involved at all?"

0:54:01 Is the first question.

0:54:02 So, like, if you want all sync to go through a central authoritative

0:54:06 server, and you want that server to be able to override decisions made

0:54:10 by the client apply data integrity rules revert changes made by a client

0:54:16 Replicash is a very good choice for that.

0:54:18 If you want something, Where, you know, you don't want to preclude the

0:54:22 ability for peer, for nodes to sync peer to peer, or maybe the data they're

0:54:27 syncing is end to end encrypted.

0:54:29 I think something like CR-SQLite actually shines there.

0:54:33 CR-SQLite lets you merge in any topology you want, whether it be

0:54:37 peer to peer or hub and spoke.

0:54:39 So we've seen a couple of companies that are, have like end to end encryption,

0:54:44 where they use CR-SQLite to sync, or they have literally a pub sub

0:54:48 channel, where all peers just publish their messages, and any peer who

0:54:51 receives the messages integrates it.

0:54:54 In those use cases and then I guess if you're really looking for, like,

0:54:57 if your data model is like document based, I guess going back to syncing,

0:55:01 so not just CR-SQLite for peer to peer, but also Yjs and AutoMerge

0:55:05 is capable of peer to peer sync.

0:55:07 They're also, of course, if you can sync peer to peer, you can

0:55:09 sync through central server too.

0:55:11 If you're looking for, you know, you just want to support a collaborative text

0:55:16 field collaborative text doc and you don't need, you know, ways to query over your

0:55:22 documents, like you're fine just storing this doc in a blob in your Whatever your

0:55:26 storage is you know, Yjs is a very good choice because it's primarily targeting

0:55:30 it at text editing, collaborative text but they don't have, you know,

0:55:33 facilities like built in for persistence or querying and these sorts of things.

0:55:38 So I haven't kept super up to date with AutoMerge, but from what I've seen if

0:55:42 you want like, the ability to fork and look at different portions of history.

0:55:47 So like, say people are editing a document somebody wants to fork their

0:55:50 document off, make a bunch of changes and merge the fork back in I mean, I see

0:55:55 AutoMerge talks about this concept a lot.

0:55:57 So I think if it's not there already, AutoMerge would be your best bet for some.

0:56:03 for that sort of behavior.

0:56:04 Yeah, then there's ElectricSQL.

0:56:06 So that is like a, you know, Postgres centric solution.

0:56:11 So if your database is Postgres and you're okay with, like, merging based on CRDT

0:56:17 rules You know, that's a good choice.

0:56:18 So yeah, so I say like your back end is Postgres and you're

0:56:21 okay like using CRDT rules.

0:56:23 Like yeah, what does that mean?

0:56:24 So a really interesting thing about Replicache is it can work with any

0:56:27 back end, pretty much any back end.

0:56:30 So Postgres, MySQL, all sorts of things.

0:56:34 And the way this works is that it uses sort of a rebase model where , the

0:56:39 developer, defines their mutations.

0:56:41 And those mutations get run in a specific order on the server and clients when

0:56:47 they receive updates from the server, any mutations they have outstanding.

0:56:50 They apply the server updates and any mutations that are

0:56:52 outstanding, they rebase on top.

0:56:54 And this is like super flexible in that you can write any, like, if you want to

0:56:59 change merge and conflict behavior, well, you just write your mutations however

0:57:03 you like to get the behavior you want.

0:57:05 Whereas something like CR-SQLite or electricsql to get the merge

0:57:09 behavior you want, you have to pick specific CRDT types.

0:57:12 And it, it seems a little bit less flexible since you can't

0:57:15 write like a custom mutation.

0:57:16 But ElectricSQL does seem to have a pretty good end to end story of like,

0:57:19 you set up your Postgres DB, that schema gets replicated down into a

0:57:22 local SQLite DB, and as you make schema changes upstream, they flow downstream.

0:57:27 So in terms of like, Having a good story between two specific

0:57:32 databases it seems pretty good.

0:57:34 And then I guess the last thing is PowerSync.

0:57:37 So PowerSync, uh, works on a different model that's not rebase or CRDTs.

0:57:43 So with PowerSync, a client will accumulate changes, and if a

0:57:47 client has local changes, it won't take any changes from the server.

0:57:50 So in the rebase model, if a client has changes, it'll take

0:57:53 changes from the server and then replay its local changes on top.

0:57:56 Uh, but in the PowerSync model, the client has local changes, it won't take any

0:57:59 server changes until those local changes have been incorporated by the server,

0:58:02 and then it can take server changes.

0:58:04 Um, So yeah, to me, I've never built an app with PowerSync, but this seems like

0:58:07 it'd be a lot higher latency, uh, for the syncing, because if you have any

0:58:12 outstanding local changes, you simply cannot take a change from the server.

0:58:15 Whereas rebase model, if you have outstanding changes, it's fine.

0:58:18 You get the server changes, you just replay your local changes

0:58:21 on top of the server changes.

0:58:22 That's a really great overview, and there's many other tools that

0:58:26 more more tools than we have time to exhaustively can cover here.

0:58:31 But I think that's a really great orientation, particularly with Asking

0:58:34 yourself the first question, like where should the authority originate from?

0:58:38 Do you want to have a server?

0:58:40 That's, I guess, more traditional how web apps work today.

0:58:43 And that can simplify things a lot because in that server you can enforce if there's

0:58:48 like a merge conflict or if there's like maybe a user has set some data in

0:58:54 a way That might be not compliant with the app that you want to build or for

0:58:59 other simplification reasons, et cetera.

0:59:02 So I think picking that, where does the authority live?

0:59:04 Is there an authority?

0:59:06 And , which role should a server play?

0:59:09 I think that's a really important one.

0:59:11 And then also the two kinds of like ways to go about syncing or

0:59:15 at least two ways going more about certain spacing, everything on CRDTs.

0:59:21 Or going with a event rebase model, our Replicash is working.

0:59:27 I'm curious how someone who has used neither and might have not really

0:59:32 a lot of intuition for what are the implications of choosing one or the other.

0:59:37 Can you provide a little bit of an intuition which kind of app use cases

0:59:42 are a good fit for one or the other?

0:59:44 Yeah.

0:59:44 So with CRDTs, you don't.

0:59:47 Have code on the right path, like the CRDT algorithm is going to

0:59:53 determine, I guess I should say you don't have code on the merge path.

0:59:55 The CRDT algorithm is going to determine how things merge.

0:59:58 And you're just picking a set of CRDT algorithms where something like Replicash,

1:00:02 where you write custom mutators.

1:00:04 You can write a set of mutators that run on the client, and you can write mutators

1:00:08 with the same name, same args, but do something totally different on the server.

1:00:11 Which, one it's, since you're writing code in the mutation, it's giving you control

1:00:16 over, kind of, how things merge and sync.

1:00:20 And two, since The code doesn't have to be identical on every node because

1:00:24 the server is the authoritative answer.

1:00:26 You could do something in the server mutation like check permissions or,

1:00:30 you know, set last modified times or something from the server's

1:00:33 time rather than the client's time.

1:00:35 Yeah, but I guess the one downside to that is a CRDT, like, you can sync messages in

1:00:40 any order, like, yeah, if you have message A, B, and C, if you sync them B, C, A,

1:00:46 or C, B, A, like, you always get the same state, which means that you don't have to

1:00:50 have a single server being the authority.

1:00:52 You can have every peer be on equal footing and essentially have no server

1:00:56 whereas like this, you know, Replicache model, you do have to have, Someone

1:01:00 that is that authority that's going to provide the total ordering of messages.

1:01:04 And to me, it sounds like at least the latter, the rebasing model with

1:01:09 that authority living in a server, for example, that is a bit easier to get

1:01:15 into less of a radical shift of like thinking, changing the way, how you

1:01:20 think about data modeling, et cetera, and it's probably the easier one to

1:01:25 get into and might be a better catch all solution for many app use cases.

1:01:30 Like if you look at yeah, I think it's the easier, it's like, it's more

1:01:33 familiar and if you have an existing app that you're trying to add multiplayer

1:01:38 to Like, yeah this can work on your existing data model without much

1:01:41 changes or with little to no changes.

1:01:44 Whereas, oh, you want to switch to CRDTs, like you might have to make a new data

1:01:47 model specifically for this collaboration thing and migrate your stuff over.

1:01:51 Outro

1:01:51 Right.

1:01:51 And I mean, the advice I would give on top of that is that.

1:01:55 There are now like so many interesting and easy to use options out there,

1:02:00 whether it's Replicache, whether it's AutoMerge or others just maybe try

1:02:04 building the same mini version of your app in different technology stacks,

1:02:09 see what works best for you, and you'll develop that intuition just by yourself.

1:02:14 And I think that's a really fun thing to do.

1:02:16 Cool.

1:02:17 Matt, thank you so much for sharing about like all those different

1:02:21 projects you've been involved with.

1:02:22 I'm really excited to see what you'll be building at Rosicorp.

1:02:27 Is there anything else you want to share with the audience now?

1:02:30 Yeah, just stay tuned.

1:02:31 We're building the next.

1:02:32 Kind of iteration of Replicache and Reflect.

1:02:34 And , it was exciting enough to make me like drop everything I was doing

1:02:39 and join the team and yeah, go back to, I yeah, work, working for somebody,

1:02:44 I guess, rather than just, you know, following whatever research interests,

1:02:49 wherever they took me from day to day.

1:02:50 So yeah, it's, I think what they're building is super compelling and

1:02:53 You'll hear more, you know, over the summer and coming months.

1:02:57 That's incredible.

1:02:59 I'm really excited to hear more about what you all have been working on.

1:03:03 And yeah, thank you so much for coming on the show.

1:03:06 Yeah.

1:03:06 Thanks for having me.

1:03:08 Thank you for listening to the localfirst.fm podcast.

1:03:11 If you've enjoyed this episode and haven't done so already, please subscribe and

1:03:15 leave a review wherever you're listening.

1:03:17 Please also tell your friends about it.

1:03:18 If you think they could be interested in local-first, if you have feedback,

1:03:22 questions or ideas for the podcast, please get in touch via hello at

1:03:26 localfirst.fm or use the feedback form on our website, special thanks to Expo and

1:03:31 Crab Nebula for supporting this podcast.