localfirst.fm
All episodes
June 12, 2024

#10 – Matt Wonlaw: cr-sqlite, syncing strategies and incremental view maintenance

#10 – Matt Wonlaw: cr-sqlite, syncing strategies and incremental view maintenance
Sponsored byExpoRocicorp
Show notes

Transcript

0:00:00 Intro
0:00:00 So I think it's a lot easier to develop when your queries are
0:00:03 co located with your components.
0:00:05 So every component is responsible for getting the data it needs.
0:00:08 And this makes your app like super composable, right?
0:00:11 Like you can add and remove components to your app or your component tree.
0:00:16 And because they're responsible for getting their data, like, you
0:00:19 don't have to do any additional wiring or prop drilling.
0:00:21 Welcome to the localfirst.fm podcast.
0:00:24 I'm your host, Johannes Schickling, and I'm a web developer, a
0:00:27 startup founder, and love the craft of software engineering.
0:00:30 For the past few years, I've been on a journey to build a modern, high quality
0:00:34 music app using web technologies.
0:00:37 And in doing so, I've been falling down the rabbit hole of local-first software.
0:00:41 This podcast is your invitation to join me in that journey.
0:00:44 In this episode, I'm speaking to Matt Wanlaw, a prolific local-first tool
0:00:48 builder who's behind projects such as Vlcn, CR-SQLite, and Materialite.
0:00:53 Most recently, Matt also joined Rocicorp to work on their new product.
0:00:57 In this conversation, we go deep on his projects covering CRDTs, SQLite,
0:01:02 and incremental view maintenance.
0:01:04 Before getting started, also a big thank you to Rocicorp and
0:01:08 Expo for supporting this podcast.
0:01:10 And now my interview with Matt.
0:01:13 Hey Matt, welcome to the show.
0:01:14 How are you doing?
0:01:16 Good.
0:01:16 Yeah.
0:01:17 Thanks for having me.
0:01:17 I'm super excited to have you on the show.
0:01:20 You and I have been collaborating on a few projects now over the
0:01:23 course of the last few years.
0:01:25 Most notably, you've been working on CR-SQLite and also
0:01:29 Materialite most recently.
0:01:31 But for those of you who don't know Matt would you briefly
0:01:35 mind introducing yourself?
0:01:36 Yeah um, Matt Wonlaw, and I guess I've been involved in the local-first
0:01:41 community for about two years now.
0:01:43 It originally started with me trying to resurrect an ancient project
0:01:46 of mine and trying to add, like, multiplayer and sync between devices.
0:01:51 And I saw, like, that's actually a pretty hard problem, and there weren't, like,
0:01:54 many off the shelf solutions, so then I just, like, dove into this , and that's
0:01:59 where all these projects like CR-SQLite, Materialite and other things came from.
0:02:02 Yeah, and I think this is also how you and I have met quite a while ago.
0:02:07 And before that, you've been working at Facebook for many years.
0:02:10 And even before that, I think you had your first engineering role
0:02:13 working on some software that by definition had to be very local.
0:02:18 There was no local-first yet.
0:02:20 Would you mind telling us a little bit more about what those first apps were that
0:02:26 Lockheed Martin
0:02:26 Yeah, so like my first job out of college was at a defense contractor
0:02:31 called Lockheed Martin, but essentially what we were doing was we were building
0:02:35 software for submarines, so like U.
0:02:37 S.
0:02:37 Navy submarines, and yeah, obviously a submarine has no internet connection,
0:02:41 everything has to be self contained, but even beyond that, so the software I built
0:02:45 was for, Like these thick clients for the workstations that sonar operators used.
0:02:50 And, you know, there, there are servers on the sub, you know, servers that
0:02:54 provide storage and some that provide the signal processing and whatnot.
0:02:58 But like an extra requirement was, you know, a sub is in a hostile environment.
0:03:03 Things can break.
0:03:04 So the work, workstations need to be able to work even if all the servers were down.
0:03:08 So, you know, they need to hold the last 24 hours of data.
0:03:10 So the sonar operators can, you know.
0:03:12 If every other system's down, they can at least work with the last
0:03:16 24 hours of data that they had.
0:03:18 So that, you know, I guess was my first brief introduction to local
0:03:23 apps that eventually do need to do some sort of syncing once connections
0:03:26 are re established and, you know, reconcile the changes that were made.
0:03:29 That's fascinating.
0:03:30 Have you learned some patterns back then that you still see out in the wild today?
0:03:36 Or do you think the approach that you've taken back then is entirely
0:03:39 different compared to today?
0:03:41 Uh, Back then the data was very well segmented.
0:03:44 So there was like never really conflicts.
0:03:46 But event sourcing was the main pattern.
0:03:49 So just have an event log of everything that has occurred.
0:03:52 And when connectivity is reestablished, do you know.
0:03:55 Play that event log forward from, you know, the time you lost
0:03:57 connection to the current time.
0:03:59 Yeah.
0:04:00 I mean, event sourcing, I feel like is still a super active topic.
0:04:04 I'm actually looking into it much more closely myself for the use case of
0:04:09 building Overtone, and I'm also exploring to embrace it a bit more for like syncing
0:04:14 and migration strategy for LiveStore.
0:04:16 But I also think that there might be a renaissance for event sourcing.
0:04:20 We had it for a while with Redux in a much more self contained space, but I
0:04:26 think there's a, that's an interesting topic for research and production as well.
0:04:31 Curious to hear your thoughts on event sourcing and how it
0:04:34 could fit into local-first.
0:04:36 I mean, I really like it.
0:04:37 , you have a source of truth, like of all the facts, all
0:04:41 the things that ever happened.
0:04:43 Whereas, you know, a regular app or state or database that
0:04:46 doesn't do event sourcing.
0:04:48 You're losing all the things that happen, right?
0:04:51 And you just have this snapshot of what things are right at this instant.
0:04:55 So yeah there's no way to rewind.
0:04:57 There's no way to, merge other people's changes if they did come in the past.
0:05:01 , so I guess event sourcing, you know, probably fell out of favor in some
0:05:05 ways because, you know, storage requirements, like are you really
0:05:08 going to store all the event logs?
0:05:10 And then also, you know, processing those event logs.
0:05:12 You do eventually need to process them into some you know, snapshot of
0:05:16 state so your app can work with it.
0:05:18 So I think, you know, people seeing that, oh, I have to process the events
0:05:20 and turn it into like these tables anyway, like why even do the event logs?
0:05:23 Why not just mutate in place?
0:05:25 Meta
0:05:25 Yeah, I think in case anyone of the listeners is interested in local-first
0:05:30 and event sourcing, please get in touch.
0:05:32 I think this is a very interesting topic to research a little bit After
0:05:37 building those apps for like submarines which sounds super fascinating, you've
0:05:43 at some point also decided to move on and join Facebook nowadays, Meta.
0:05:49 I'm curious to hear more what you've been working on there and
0:05:52 whether there was any intersection with your local-first topics now.
0:05:58 Yeah.
0:05:58 So Meta,
0:05:59 , I guess, like, it shaped the way I approached the local-first problems after
0:06:04 I left, but the problems there were very , , not related to local-first at all.
0:06:08 So I guess, yeah, what I worked on, it was like three different teams, but
0:06:11 they all kind of shared the same theme.
0:06:12 And a lot of what I was doing, we had a system called Download Your Information.
0:06:16 So this is like any Facebook user can go click a button and get
0:06:19 an archive of all their data.
0:06:21 So we'd have to like crawl your entire Facebook graph
0:06:25 and find Everything you own.
0:06:27 So we call it the ownership graph.
0:06:29 And then another thing one of my team's built was the deletion framework.
0:06:33 So this is like you delete your account.
0:06:35 So go delete every post you've ever uploaded, every like you've ever
0:06:38 made, every comment, every like ad you've ever run every message you've
0:06:43 ever sent across all the products, Instagram, Messenger, Facebook.
0:06:46 What's I guess not WhatsApp.
0:06:47 And then, yeah, so that, that was like traversing this deletion graph.
0:06:50 And then a third product, like, variation on all these themes was
0:06:54 like investigations and human review.
0:06:56 So, like, obviously there's lots of abuse on Facebook.
0:06:59 And we had to have a way for, like, every product that's ever created, like,
0:07:03 Can we make their content reviewable by like a human review team at Meta?
0:07:08 So this is like, yeah, trying to fan out from the content and traverse the graph
0:07:12 of things associated with it for review.
0:07:14 So I guess like, all this like graph walking was very, goes
0:07:18 very hand in hand with query languages and also schematization.
0:07:23 So, I guess I mentioned like three different areas of like
0:07:27 quote unquote compliance, right?
0:07:28 Deletion, download your information, and content review.
0:07:32 Well, if you're a product team at Meta, you don't want to
0:07:35 be slowed down with having to integrate into all these systems.
0:07:37 You don't have to learn about them.
0:07:38 You just want to like ship your product.
0:07:40 So like our key focus was how can we make it so developers can
0:07:43 just make their product and not have to understand these things.
0:07:46 So we kind of, we built it into the schema language.
0:07:49 So as you're defining your schema for your product, you can say like for a
0:07:53 given edge, like this is an ownership edge, or this is like a deletion edge.
0:07:57 And then just like by declaratively specifying all this stuff all these
0:08:01 other systems could just work.
0:08:03 And.
0:08:03 Yeah, eventually when I pivoted to local-first problem, I was like, wow, this
0:08:06 like developer experience of everything being declarative and the schema layer
0:08:12 down, taking care of it for you maybe we can do this for local-first too.
0:08:16 I couldn't agree more.
0:08:17 Did you eventually land on something where you got those declarative
0:08:24 CR-SQLite
0:08:24 Yeah, so, the project I ended up working on was called CR-SQLite,
0:08:29 so I was, like, taking these ideas and adding them to SQLite.
0:08:32 So, yeah, SQLite's already, you know, a relational database, and it's already
0:08:36 pretty declarative in terms of you can define your schema for your table,
0:08:40 and you can specify an index, and you can specify foreign key relations, and
0:08:43 the database manages all that for you.
0:08:45 But it didn't have any primitives for collaboration, right?
0:08:49 So, like What if I give you a copy of my database and I have a copy of my
0:08:54 own database and we both go offline and we both make a bunch of changes?
0:08:58 What if we want to like sync our changes together or merge our DBs?
0:09:01 So what I started adding to SQLite was this concept of CRDTs.
0:09:05 So you could say for a given table, what type of CRDT should we modeled as?
0:09:10 Like if it's a grow only set, or if it's like an add or move set, like
0:09:15 technical terms in CRDT literature.
0:09:17 And then for specific columns in the table, you could also say
0:09:21 like, what sort of CRDTs should be used to merge those columns.
0:09:24 So yeah, the idea was any developer who has an app backed by SQLite, they can
0:09:30 just go in and make some schema changes.
0:09:33 And then that app can become collaborative by allowing merging
0:09:36 databases with other collaborators.
0:09:39 Got it.
0:09:39 And I think the parallel there is that many things about SQL can be also seen
0:09:44 as rather declarative and the schema modeling, et cetera, those is where
0:09:49 I think what we gain with local-first software is the the collaborative
0:09:53 nature of the apps working together.
0:09:56 And I think this is where you.
0:09:58 Brought some of the nice experiences that you've seen at Meta, where there's
0:10:02 such a strong engineering culture for having a great developer experience,
0:10:07 bringing that to the new topic that you were interested in, local-first.
0:10:12 Meta had a really interesting philosophy when I got there.
0:10:15 I think like one of my first questions I asked was like, oh, how do we like
0:10:18 make sure people use our stuff that our teams, our security infrastructure?
0:10:22 And they're building things to make sure Meta's code was more secure.
0:10:25 That was my first team.
0:10:26 And like, security at Lockheed was always Everybody hated it.
0:10:29 It was like, nobody wants to do it.
0:10:31 And there's always this like mandated thing.
0:10:33 So I got to Facebook.
0:10:35 I'm like, how are we, I'm on security for, how are we going to
0:10:37 make sure people use our stuff?
0:10:39 And it was interesting.
0:10:40 Like the manager at the time was just like, yeah, like we can't do anything.
0:10:45 Facebook's bottom's up.
0:10:46 We can't do anything.
0:10:47 Make sure anybody uses our stuff.
0:10:48 The only way we can do it is make the safe way, the default, and easiest way.
0:10:53 Like, people are going to choose the secure way because
0:10:55 we've made it the easiest.
0:10:57 And like, there's no other way we can make them choose it.
0:10:59 So yeah, there's a huge, always a huge focus on everything we did from DevX.
0:11:03 So, that like, developers would just pick that solution.
0:11:07 Because it was easiest, not necessarily because it was secure, just because
0:11:09 it was the easiest thing around.
0:11:10 And then it was also secure.
0:11:12 Which is like the nice added benefit.
0:11:14 Strut.io
0:11:14 So after having worked for many years at Facebook, you've later then started
0:11:19 working on CR-SQLite, but you didn't arrive at CR-SQLite immediately.
0:11:24 I think you were for a little bit also still working on
0:11:28 bringing back an app you've been working on before called Strut.
0:11:32 So I'd be curious to hear a little bit more about that and particularly
0:11:36 given that you worked on it quite a bit before you joined Facebook and then
0:11:41 you started working on it again after.
0:11:43 So tell me more about the app and how that led you to local-first.
0:11:47 Yeah.
0:11:47 So it was my first, I guess, web app I was building.
0:11:51 So yeah, at Lockheed, I was doing all these like Java rich clients,
0:11:54 and I wanted to get into web dev.
0:11:56 So I started building this thing called Strut.io.
0:11:58 It's a presentation editor.
0:12:00 And I knew people's devices, you know, they have storage and
0:12:03 compute, and the browser had, you know, some storage back then.
0:12:07 And I didn't want to, like, run servers for people.
0:12:09 I just wanted them to be able to work and edit presentations if
0:12:13 they're online or offline and not have to sign up for an account.
0:12:17 Like you go to the site and you can immediately start doing stuff.
0:12:20 So yeah, like I guess without knowing it, I was building a presentation editor that
0:12:25 was, you know, local-first in some ways.
0:12:28 And at that time, it was like 2011, 2012, there was, you know, I guess the prequel
0:12:33 to the local-first movement, there's like remoteStorage.io and ownCloud.
0:12:37 And I started getting involved with those, but then, yeah, eventually
0:12:40 I got the Facebook offer and I just put Strut on the shelf to be
0:12:43 resurrected sometime in the future.
0:12:45 And then, yeah, when I was leaving Meta or Facebook, yeah, I didn't
0:12:49 know exactly what I was going to do.
0:12:50 So I figured, oh, I can just resurrect Strut.io real quick and you know,
0:12:54 it'll bring in some side income.
0:12:56 In the early days before I abandoned it, it had like 10, 000 monthly actives.
0:12:59 So I was like, Oh maybe a decade later we can get back to that again.
0:13:03 But yeah, as I resurrected it, kind of the landscape had changed a bit.
0:13:06 Like people had a lot more devices and also multiplayer was an expectation where
0:13:12 like, You know, Google Slides, right?
0:13:13 You can have multiple people on the same deck.
0:13:16 So I was, you know, looking into how I was going to solve that problem for Strut.
0:13:19 And, you know, rather than I guess I've been building frameworks for so long at
0:13:23 Meta, like, rather than making this a one off thing in Strut to solve the problem.
0:13:29 I was like, Oh, like, can I solve it at a lower level?
0:13:32 And that's when I went down the rabbit hole of CRDTs investigating SQLite,
0:13:36 adding stuff to SQLite and so on.
0:13:40 So the data architecture you had for the first version of Strut around, like,
0:13:45 yeah, before Facebook, you mentioned like around 2012 to when you've left
0:13:50 Facebook and what you've arrived at.
0:13:53 Very different.
0:13:54 So I'm curious, like how you went from like whether you took rather
0:13:59 some smaller steps or some much bigger steps altogether to rethink the data
0:14:05 architecture and how you went about that.
0:14:07 Yeah, I mean, I guess beyond just collaboration, I wanted to support
0:14:12 Yeah, arbitrarily large presentations.
0:14:15 So right, the original version of Strut, it would load everything off disk and
0:14:19 into memory, and then when you're closing the tab, or periodically, it would save.
0:14:23 A big dump of memory to disk and yeah, I think that works fine, probably
0:14:27 for the vast majority of cases.
0:14:29 But I wanted to be able to handle presentations with hundreds of slides
0:14:33 and tons of binary content or like images and videos and all sorts of things.
0:14:39 So like at first I was like, okay, how do I like lazily load this stuff off of disk?
0:14:43 And, you know, as you're starting to implement that in your own data
0:14:46 model, you're like, Oh, well, this is kind of like a database, which,
0:14:50 you know, knows how to page in and out things in and out for me.
0:14:54 So I think, you know, that was also an indication that, okay, like, something
0:14:58 more like a database would be good.
0:15:00 And then I don't know how I found that SQLite was better.
0:15:04 I guess I saw James Long's Absurd-SQL and Actual Budget.
0:15:08 That's when I first realized that, yeah, like, I mean, I'd always known Wasm was
0:15:12 a thing but that like SQLite was being compiled to Wasm and could run in the
0:15:16 browser and it was actually faster than IndexedDB when you were, I guess there's
0:15:21 some caveats there, but SQLite would store to IndexedDB, but it would store pages.
0:15:26 So it would store four kilobyte chunks.
0:15:28 IndexedDB, when you're like.
0:15:29 Reading things in and out individually, like single objects
0:15:32 by key value is like super slow.
0:15:34 And the fact that like, okay, you're using SQLite and reading in chunks
0:15:38 rather than individual objects, like it would be faster than IndexedDB.
0:15:42 Like if you change your IndexedDB access to be paged, okay, it
0:15:45 could be faster in SQLite.
0:15:48 SQLite, you can still do point queries and be faster than a point query in IndexedDB.
0:15:52 So that was something else that turned me on, like, oh, maybe I should use SQLite
0:15:55 in the browser rather than IndexedDB, so I don't have to implement all this
0:15:58 crazy stuff to make IndexedDB go fast.
0:16:00 And then, you know, there's all the problem of, like, collaboration still
0:16:04 and my justification for picking SQLite then was, you know, not only is SQLite
0:16:09 running the browser now, if I ever want to take this app and make it native
0:16:14 well, SQLite, Everything I build for collaboration in the SQLite, like it'll
0:16:18 work on native, it'll work on desktop it'll work in the cloud if you want
0:16:24 some like cloud peer or something rather than like a JavaScript only solution.
0:16:29 Choosing the right CRDT
0:16:29 So you made the decision, SQLite is the way forward, SQLite on top of IndexDB or
0:16:34 on top of like some persistence mechanism, but you now also need collaboration.
0:16:39 You looked a little bit into CRDTs, but you still decided
0:16:42 instead of going CRDTs first in terms of the developer experience.
0:16:46 You still want to give a developer yourself SQLite as the primary way to
0:16:52 think about the developer experience, but you started to use CRDTs or you
0:16:57 wanted to use CRDTs as a implementation detail to bring into SQLite.
0:17:02 So how did you go about that?
0:17:03 How did you solve that puzzle of making SQLite collaborative out of the box?
0:17:09 Yeah, so SQLite has a bunch of mechanisms for extending it.
0:17:13 You can make new virtual tables.
0:17:16 yeah, a virtual table is essentially a table.
0:17:18 It presents itself as a table in SQLite, except that The implementation of that
0:17:24 table is any arbitrary code you want to write, so C code, Rust code, whatever.
0:17:28 You can make functions, new functions, like a mat, you know, like min or max
0:17:32 or whatever function you want to define.
0:17:34 You can make new virtual file systems, so it's pretty extensible
0:17:38 from the plugin side of things.
0:17:39 And yeah, one of the teams I had worked with at Meta, they built this thing called
0:17:44 osquery way back, I think, You know, some other company may have seen osquery
0:17:48 now, but it basically lets you query all details about your OS through SQLite,
0:17:53 and they did that through virtual tables.
0:17:55 So yeah, for adding CRDTs to SQLite,
0:17:57 it was kind of a combination of a bunch of special functions.
0:18:01 That like when you create a table so, plus special functions
0:18:06 and special virtual tables.
0:18:07 So you can like, create a table, and then you can call a function that
0:18:11 converts that table to a CRR, or Conflict-Free Replicated Relation.
0:18:16 So just a fancy name for a table that's a CRDT.
0:18:20 Or you could use this like virtual table syntax where you're like, I want
0:18:22 to create this table as a CRR and then you list all the columns and in the
0:18:31 columns, so when you're creating a virtual table, like you can customize that
0:18:34 syntax all you want in your extension.
0:18:37 So I could like have a slot for custom data types.
0:18:40 So you can say this is a string.
0:18:42 And then like another addition to the data type is like the CRDT type.
0:18:46 So it's not only text, it's like a perryText, or it's like, not just an
0:18:50 int, it's an int that's a counter.
0:18:53 Or it's not just a date, it's a date that slash right wins.
0:18:56 Yeah, and then and then I guess the last thing, right, like, it's great
0:19:00 to have a database that's, you know, a CRDT but how do you merge efficiently?
0:19:06 . There had been some earlier work, actually by a guy named Iver and we actually worked
0:19:11 together pretty early on, on CR-SQLite.
0:19:15 He did his master's thesis on essentially adding CRDTs to SQLite but some of the,
0:19:20 like, Downsides of like that original approach was there's, there was no
0:19:24 way to incrementally merge databases.
0:19:25 You had to literally send the entire copy of the SQLiteDB to
0:19:29 somebody else, and it would like scan every row and merge them all.
0:19:32 So yeah, one of the key problems we had to solve was like how do we
0:19:34 incrementally sync And that was done essentially like implementing a global
0:19:41 version that gets tagged on every row and a virtual table that knows how
0:19:45 to find all the rows past the given version and just send you those to sync.
0:19:51 Was there any prior art to combining a relational database and CRDTs or
0:19:58 another mechanism to synchronize?
0:20:00 Yeah, so there's Iver's work, and then, I guess, James Long's work on
0:20:05 well, Absurd-SQL, but I guess Yeah, I don't know if he ever gave a name to
0:20:08 the layer above it that did the CRDTs.
0:20:10 But yeah, James work, if I recall correctly, was all in JavaScript.
0:20:15 So it wasn't built into SQLite itself.
0:20:17 You couldn't just run it anywhere SQLite could run.
0:20:20 It had to be in a JavaScript environment.
0:20:22 And he took a different approach.
0:20:24 I think he used hybrid logical clocks and Merkle DAG.
0:20:29 So yeah, I guess we started getting some trade offs of like,
0:20:32 Strut was the first use case.
0:20:34 Which actually informed a lot of design decisions for CR-SQLite and why I
0:20:37 didn't want to use something like James Long's approach or, I don't know other
0:20:41 solutions that were available at the time.
0:20:43 Yeah.
0:20:44 I'm curious to hear more about your your judgment of the trade
0:20:48 offs and what made it a good versus not so good fit for your use case.
0:20:52 Yeah.
0:20:52 So I think James's approach yeah, I hope I'm remembering everything correctly.
0:20:55 This was like a couple of years ago, but when I looked into it, every
0:20:58 single message for all time was stored.
0:21:01 So this is, you know, kind of like event sourcing, right?
0:21:03 But like, one of my key requirements, like, I was using Strut to drive
0:21:08 requirements for CR-SQLite.
0:21:10 And I wanted the ability for people to be offline as long as they wanted.
0:21:14 For it to be able to run on constrained, low memory devices.
0:21:18 And so those two in combination with the fact that it's like a presentation
0:21:22 editor where you can like, you can drag and drop components, you can spin them,
0:21:26 rotate them type, like I didn't want every single event for every keystroke
0:21:31 and every drag and all these things to be logged for all time, right?
0:21:35 Especially if you're offline indefinitely.
0:21:37 So if you're offline for like, I don't know, a couple days editing
0:21:41 a really big presentation, you come back offline, I want it to
0:21:43 be able to sync almost instantly.
0:21:45 So the approach I chose was the set of CRDTs that allow you to collapse history.
0:21:51 So, yeah, if you've been off The database size never grows more than
0:21:56 a constant factor of the base data.
0:21:59 So no matter how much you're editing, you're always going to be within a
0:22:01 constant factor of the regular DB size.
0:22:04 I don't know, I think that constant factor was like three or four times,
0:22:07 I don't recall the exact number.
0:22:09 Whereas the other systems, if you're the longer offline, the more and
0:22:12 more the data grew which was, you know, not a trade off I liked.
0:22:15 Yeah, I guess the downside to my approach is there's a few less
0:22:18 transactional guarantees and you can't, you know, rewind history.
0:22:22 But if somebody really wanted rewinding history, I was like, Oh,
0:22:24 they could do it in user space.
0:22:26 Like, that's not the problem I'm trying to solve.
0:22:28 That makes perfect sense.
0:22:29 And I think I really like, you focusing on the use case that you're most familiar
0:22:34 with Strut and the requirements around that, Since we're dealing with like
0:22:38 distributed systems and programming is hard enough, that you can't really
0:22:43 focus on all use cases at the same time.
0:22:47 And this way you design for the use case that you know of.
0:22:51 And I think this is where you have like a canvas app or something like that
0:22:55 where you move things around constantly.
0:22:58 If you work for multiple days, you don't really care exactly about
0:23:02 the change you've made over like five seconds, three days ago.
0:23:06 You just care about, okay, this is the state it is in right now and that
0:23:10 other users converge to the same thing.
0:23:13 Whereas I think a different set of requirements might be for a chat app
0:23:19 where you might not send around, like, which sort of keystrokes someone has
0:23:25 taken to arrive at the final message.
0:23:27 You send the final message, but then those final messages, you
0:23:30 might actually care about, okay, this came at this point in time.
0:23:32 One message does rarely override another one.
0:23:35 So this is, I think, where that might be a better fit.
0:23:39 And that makes also sense that in James use case with Actual Budget that's all
0:23:44 about, like, historic transactions.
0:23:46 You want exactly those like records and time.
0:23:49 There's probably not so many, and even if there are many,
0:23:52 they're all kind of worth it.
0:23:54 They need to be accurate.
0:23:56 So I think that makes perfect sense, and I think there will be like many different
0:23:59 solutions for many different use cases.
0:24:02 And I think there's even, you know, an approach how you could apply a bit of like
0:24:06 that compaction, even to event sourcing.
0:24:08 I think there's like a concept called subsumption, I think.
0:24:12 And that allows you to also like flatten some of those historic events.
0:24:16 So that's super interesting topic of research, I think.
0:24:20 . Yeah, I know there for certain CRDT algorithms and certain
0:24:23 event sourcing things, like.
0:24:26 Yeah, there are ways to compact history, but it did seem like, still
0:24:29 a pretty evolving area of research.
0:24:32 I think now it's a lot more understood especially in the,
0:24:35 like, list based CRDT space.
0:24:37 Sure, like, I don't know, I guess, Yjs kind of pioneered a lot of
0:24:41 that work of, like, run length encoding, everything, and then stuff.
0:24:45 Yeah, I guess what I want to quickly mention, so there's, when
0:24:47 I was first getting started, there was a community called Braid.org
0:24:51 which I got involved with early on.
0:24:54 And yeah, I don't know, that was an amazing community in terms
0:24:56 of like learning and being open.
0:24:59 So I just want to call them out, like they helped me a ton to understand
0:25:03 all these CRDT algorithms and when history can be compacted, when it can't.
0:25:07 What kind of weird circumstances you get into if you know you're emerging
0:25:11 changes peer-to-peer versus, you know, client server and whatnot.
0:25:14 Yeah.
0:25:15 There this entire community, I'm not yet a part of, but given that
0:25:19 this year I'm also looking much more into implementing, syncing this.
0:25:24 I feel like there's a lot I have to learn about and I would love
0:25:28 to also have some folks from that community here on the podcast.
0:25:32 Yeah they're trying to add.
0:25:34 Syncing to the HTTP protocol, so essentially adding a new verb rather
0:25:40 than get, put, and post to allow you to merge documents or resources together.
0:25:46 That sounds fascinating.
0:25:48 How far along is that effort?
0:25:49 They are part of IETF now.
0:25:52 I don't remember if they're a research, there's like two separate types of groups.
0:25:56 There's like research groups and maybe there's standards groups.
0:25:59 I think those are the two.
0:26:01 Yeah, I don't know if they're still in the research phase or if
0:26:03 they're on the standards track yet.
0:26:05 Challenges
0:26:05 Very interesting.
0:26:06 I have to look more into that.
0:26:08 So, besides choosing the right CRDTs to fit into SQLite through virtual
0:26:14 tables I think this, like, just picking SQLite to run in the browser quite a
0:26:21 few years ago, that's probably you've must have been running into a bunch of
0:26:25 challenges and issues along the way.
0:26:27 So I'm curious to hear first getting sort of like a broad overview of
0:26:31 the different challenges you were running into and then going into them.
0:26:35 Yeah, I think I wasted months of my life, like fighting SQLite issues.
0:26:39 Cause yeah, it was really early where the official SQLite WASM project
0:26:44 had kind of like just started.
0:26:46 And I guess some of the first problems I ran into was.
0:26:51 I guess compiling an extension into the WASM build.
0:26:55 So yeah, it's one thing to write a SQLite extension.
0:26:57 It's another thing to get it to run correctly in WASM.
0:27:03 It wasn't so hard when the extension was originally written in C, but like,
0:27:06 I don't know, like trying to code review contributions in C and make sure there's
0:27:10 no like memory issues was a nightmare.
0:27:13 So I eventually like bit the bullet, started learning Rust and re implemented
0:27:16 the extension in Rust, which I think was, yeah, it became like a third
0:27:21 or two thirds less code and just way faster to write after doing that.
0:27:24 I think it was worth the time.
0:27:26 But that introduced this whole new complication of like getting the
0:27:29 Rust compiler toolchain to compile an extension that can be linked to the
0:27:33 SQLite compiler toolchain that then goes through Emscripten to build a Wasm bundle.
0:27:39 And let me see what else was there.
0:27:42 Transactions was an interesting thing.
0:27:45 So the inner, the originally the interfaces to SQLite
0:27:49 in the browser were async.
0:27:51 And if you ever had Well, I guess there's two builds of SQLite.
0:27:55 The official SQLite build was always synchronous.
0:27:57 There's this unofficial build, which I think is better, called wa-sqlite, which
0:28:02 all the interfaces were asynchronous.
0:28:03 But it had these interesting caveats, like if you await two calls to
0:28:07 SQLite at once, it would deadlock.
0:28:09 So you had to make sure like there's only one Call to SQLite ever at once.
0:28:14 So, you know, working around that challenge.
0:28:16 Transaction interleaving was another, so like if it's asynchronous and you
0:28:20 start a transaction and you have all the statements in the transaction, okay.
0:28:24 You, you began the transaction and then you like do a read,
0:28:26 you're waiting the read.
0:28:28 Well, while you're waiting in the read, you delegate control of the event loop,
0:28:31 so somebody else could come in and, you know, kick off a new task, right?
0:28:36 A new statement to do a write or something.
0:28:39 Well, now that's somebody else's transaction that's trying to run while
0:28:41 your other transaction is running, so you get this, like, weird interleaving.
0:28:44 So, yeah trying to lock out transactions in the JavaScript layer that was
0:28:48 another annoying thing to deal with.
0:28:49 And then, yeah, this is not related to WASM, but just getting SQLite
0:28:56 with the extension built for all the platforms I wanted to target.
0:28:59 So, like, eventually Expo wants to include it in their Expo SDK, so figuring
0:29:04 out how to do Android and iOS builds in addition to, you know, the WASM build.
0:29:09 And Mac and Linux builds.
0:29:12 Those sound like quite the buffet of different challenges that some sounds
0:29:18 familiar to me as well as I think some you've been pioneering a tad earlier.
0:29:23 And by now, some of those are maybe a bit more.
0:29:27 A well trodden path, but I think there's probably even more
0:29:31 that we don't understand yet.
0:29:33 One more, I know one more interesting one I think more interesting than
0:29:36 the other ones is since SQLite in the browser, at least the one
0:29:41 I'm using, is IndexedDB backed.
0:29:43 IndexedDB has a very high cost for opening a transaction which
0:29:48 I was unaware of at the time.
0:29:49 So yeah, there's a lot of work.
0:29:51 And anytime you open a SQLite transaction, it would open an IndexedDB transaction.
0:29:56 So there's a lot of work making sure that, like, for reads Yeah, normally
0:30:00 you just issue reads, you wouldn't necessarily batch them into a transaction.
0:30:03 Like normally, you know, you'd batch your writes into a transaction.
0:30:06 But yeah, to make SQLite in the browser go fast if you're doing a whole bunch
0:30:09 of reads, like making sure a transaction was open automatically all the reads
0:30:14 would happen and then the transaction would close once all the reads are done.
0:30:16 So it's like way of batching that was like a 10x speed up in the browser for SQLite.
0:30:21 So once you've solved all of those performance challenges and also like
0:30:26 correctness, syncing challenges, et cetera, then it's also needs
0:30:30 to be competitive somehow with the typical developer experience
0:30:34 you have in a React app or in a different framework you're using.
0:30:40 How did you go about designing that?
0:30:42 Yeah, so, yeah, I've developed a set of hooks, so, I don't know, it's, I guess
0:30:47 like the React integration was probably the least amount of, I don't know,
0:30:51 I'd say it's the least amount of work.
0:30:52 Maybe not once I like think about all the strict code problems and
0:30:55 stuff but yeah, essentially the DevEx was like a use query hook.
0:30:59 So I think it's a lot easier to develop when your queries are
0:31:03 co located with your components.
0:31:04 So every component is responsible for getting the data it needs.
0:31:08 And this makes your app like super composable, right?
0:31:10 Like you can add and remove components to your app or your component tree.
0:31:15 And because they're responsible for getting their data, like, you
0:31:18 don't have to do any additional wiring or prop drilling.
0:31:20 And yeah, they're not going to break other components because suddenly,
0:31:26 Some data dependencies are missing.
0:31:28 So I really like that, yeah.
0:31:29 Co located queries, so every component you could use this hook called UseQuery.
0:31:34 And in that UseQuery, you just literally write your SQL that fetches
0:31:39 the data required by that component.
0:31:41 And that SQL could be as complex or as simple as you wanted.
0:31:44 So, you know, like, select from issue where id equals issue id or something
0:31:49 simple or, you know, something complex like select star from issue,
0:31:52 order by modify, join labels join owner, and yeah, all these things.
0:31:58 So you've managed to tame most challenges with SQLite in the browser, making sure
0:32:03 it works in Wasm, making sure that your own extension is working, et cetera.
0:32:08 And SQLite can be super, super fast for most workloads, but once your
0:32:14 cardinality grows, et cetera, your tables get really big then queries can
0:32:19 also start Taking a little bit longer.
0:32:22 And that is because SQLite always kind of, besides a little bit of caching,
0:32:26 always starts with your results for your queries from scratch, kind of.
0:32:31 And you know, I've been chatting about this for quite a while back then.
0:32:35 This is how I know about all of this context.
0:32:37 And that has led you to look into.
0:32:40 incrementally maintaining the query results and the views.
0:32:43 So I'm very curious to hear more what you've learned back then
0:32:46 and which path this led you down.
0:32:49 Yeah.
0:32:50 Yeah.
0:32:50 So I guess, you know, I was talking earlier about each component
0:32:53 fetches its own data, has this use query hook, or just, you know,
0:32:56 specifies a SQL query it's running.
0:32:58 So when you're developing like a rich client side application like this,
0:33:02 ideally your database is reactive, right?
0:33:04 So any, anytime some data changes.
0:33:07 It's like you, you work on Overtone, right?
0:33:09 The music app.
0:33:10 If somebody presses play, then, you know, all the components that care about
0:33:16 play state need to immediately update.
0:33:18 And yeah, as you're saying in SQLite, if you're doing this somebody changes some
0:33:22 state, well, you know, SQLite doesn't have any, you know, facilities for reactivity.
0:33:27 It has a few but they're not very good.
0:33:29 Like they don't give you many fine grained notions about what changed
0:33:32 or like what queries it impacts.
0:33:34 So what you're left doing is rerunning.
0:33:38 Essentially from scratch the queries that could have been affected.
0:33:42 So, like, the SQLite change notifications, they'll tell you the row ID that
0:33:46 changed, but not the contents, and they'll tell you, like, the table.
0:33:49 So it's usually not enough to figure out exactly which queries to invalidate.
0:33:52 And yeah, so you rerun tons of queries throughout the app.
0:33:56 And those are all rerunning from scratch, so like, if you have, like,
0:34:00 for your track list, you're showing a few hundred tracks, and that track list
0:34:03 has to do a number of joins, right?
0:34:05 Has to, for a track, has to join the album to get the album title, has to
0:34:08 join the artist to get the artist names.
0:34:12 Yeah, I don't know if there's any other joins.
0:34:13 Maybe there could be joins about, like, like status, or favorite status,
0:34:18 or something about the track, right?
0:34:20 And, yeah, rerunning that join to select, you know, 200 some items.
0:34:25 And yeah, maybe they just want to sort, and sorting it doing that from
0:34:28 scratch every time somebody mutates something severely limits I guess how
0:34:32 fast you can interact with the app.
0:34:34 And yeah, I'd worked, tried a bunch with like, okay, you know,
0:34:38 maybe a purely in memory SQLite, and that gets you pretty fast.
0:34:42 But yeah, somebody was saying, oh, like, The goal for a reactive
0:34:45 database is it's memory fast.
0:34:47 And then when I started, like, I don't know, that idea, a lot just
0:34:50 helped my brain and I started like actually benchmarking, you know,
0:34:54 if I'm just observing a value in JavaScript, how fast is that?
0:34:57 And if I'm like, Observing a query in SQLite and updating the
0:35:02 row and re running the query.
0:35:03 Like, what is that?
0:35:04 And I have a observable notebook somewhere that like compares all this
0:35:08 and like the difference was massive.
0:35:09 So like just seeing how far I was from memory fast, like started
0:35:13 making me a bit sad on SQLite.
0:35:15 And also like benchmarking the original strut and the new one, which use
0:35:20 SQLite and these use query hooks.
0:35:23 Yeah, like the original one I could throttle my Chrome, you know, but in
0:35:27 the dev tools you can like downgrade your CPU right to like 6x slower or
0:35:31 something and I could go all the way down to the slowest possible and everything
0:35:35 was buttery smooth, but in the new one, like if I downgraded it too much,
0:35:40 like you could see some visible lag.
0:35:42 SQLite
0:35:45 is fundamentally built in this request response style, right?
0:35:50 This is the era of the LAMP stack where, you know, databases request
0:35:55 response made sense, right?
0:35:56 A user would go to the website it would do a query against the
0:35:59 DB and it would render, right?
0:36:01 There was no, like, rich interactivity.
0:36:03 It was always a full page refresh every time.
0:36:04 So request response made sense.
0:36:06 Yeah, but for these rich apps, request response no longer makes sense.
0:36:09 And trying to fit a request response sort of DB designed around that,
0:36:13 it into this reactive scenario.
0:36:15 And I think maybe one day somebody can get there.
0:36:19 But I think right now it didn't seem to make much sense to me and the juice
0:36:23 didn't seem to be worth the squeeze.
0:36:25 So I started, I guess my love affair with SQLite has slowly been coming to a close.
0:36:30 And I started, you know, investigating other projects.
0:36:33 Yeah maybe one, one day port it if there's time or.
0:36:36 porting some of these projects back to SQLite and contributing
0:36:39 incremental data flow, differential data flow to them or something.
0:36:43 But yeah, I guess I should describe what these projects are, right?
0:36:45 , So I like the idea of queries, being able to like, Issue a complex declarative query
0:36:52 against some set of relational data to get back the data you need for your view.
0:36:56 So I wanted to like, how can I , incrementally maintain an
0:37:00 arbitrary query against some data?
0:37:02 So like that track list example you query the, Tracks and their
0:37:06 artists, and their albums, and the play state, and all these things.
0:37:10 How can we, when somebody does it right, rather than re running
0:37:13 the query to get the track list it knows exactly which queries.
0:37:17 Should be invalidated by that write, and rather than re running them,
0:37:22 knows exactly how to patch up the rows that are impacted by the write.
0:37:27 So, you know, at first this problem seems absurdly complex and hard.
0:37:32 And you're like, Oh, yeah, I want to like invalidate and
0:37:34 patch up any arbitrary SQL query.
0:37:37 But yeah, I started reading some of the papers.
0:37:39 There's like a paper called DBSP.
0:37:41 It's a differential data flow paper.
0:37:43 I think Materialize is based on that.
0:37:47 And then, yeah, started implementing it.
0:37:49 And then I realized like, I don't know, this, once you've read it and started
0:37:52 implementing it and you realize like, this is very similar to other stream
0:37:56 processing stuff , like, and other query, you know, builders I've done,, right,
0:38:02 where rather than, you know, you create a series of operations and rather than
0:38:06 them asking a database for the data, they're like taking streaming data in.
0:38:12 So yeah, like some of the work I'd done at Meta was on real time abuse systems
0:38:17 and that was all streaming systems.
0:38:20 So I was like, oh, like this problem isn't that mysterious.
0:38:22 Like I've done stuff like this before.
0:38:25 It seems tractable.
0:38:26 So I, you know.
0:38:27 went full on, decided like, I think we, I think I can implement this yeah, based
0:38:32 on prior experience, based on having done query languages before so yeah,
0:38:36 MaterialLite was born which is bringing differential data flow to JavaScript, so
0:38:43 you can compose, filter, map, reduce, and a join operator to create these pretty
0:38:48 rich queries and have them reactively updated anytime there's a write.
0:38:53 Right.
0:38:53 And so for those of the listeners who have not yet built their own database
0:38:58 and maybe are familiar with like SQL where in SQL, you have like select from
0:39:04 where, and all of those and JavaScript or other programming languages who you
0:39:08 might've like, Map, filter, et cetera.
0:39:11 They might have different names, but they're sort of
0:39:13 like conceptually very similar.
0:39:15 And this is where you're basically just now like trying to recreate the same
0:39:20 semantics that we have from SQL, where you can say select star from this where so,
0:39:25 and you can basically have like an array where it now say .Map .Filter, et cetera,
0:39:32 and where you flip the trade offs from before you owned in SQLite, you might get
0:39:38 this query just once and then it's done.
0:39:40 You no longer are interested in like subsequent changes and then like, smaller
0:39:45 updates, but in JavaScript where our app stays warm, we click the button.
0:39:49 Something's changed slightly.
0:39:52 You want to change the trade offs quite significantly.
0:39:57 Materialite
0:39:57 Yeah, so like just to build off on like how SQL concepts map to like filter,
0:40:02 map, and reduce, or like map represents select or function application, like
0:40:08 you're selecting some columns, well that's mapping the original object to some
0:40:12 other object with a subset of the fields.
0:40:14 Filter is the same as where Reduce is what you use for any aggregate
0:40:19 function, like group by, sum, count.
0:40:21 Yeah, and I guess one other operator would be concat.
0:40:24 So if you want to do or, you just, you fork your stream that you filtered
0:40:28 or mapped over, and then after you've filtered it, you concat the streams back
0:40:32 together, and then you run a distinct.
0:40:34 So you can do an or and then join, joins a special thing where essentially.
0:40:39 You kind of maintain what you've seen from one stream and then as you
0:40:43 see results from another stream, you link them together based on some key.
0:40:46 At first, it like, Seemed like a large barrier, but then I don't know, once you
0:40:51 get into it and you realize, Oh, like I've done all this before and filter,
0:40:53 map, reduce, like these have direct analogs in SQL, like it's not so bad.
0:40:58 So we've so far like touched the most common usage of SQL.
0:41:04 But SQL has, or SQLite, Postgres, et cetera, but like sticking with
0:41:08 SQLite for a moment there's quite a bit more than like the common
0:41:12 like select from where, et cetera.
0:41:15 There's like various kinds of aggregations, various kinds of ways
0:41:18 to combine queries, sub queries, joins, various kinds of joins.
0:41:23 Is the system that you've came up with MaterialLite, is that on feature
0:41:27 parity on what SQLite is able to do?
0:41:31 Or are there some parts where you say, okay, that's out of scope because
0:41:35 there's years worth, maybe decades worth of optimization have gone into
0:41:40 SQLite into making that fast and that's out of scope for Materialite.
0:41:44 Yeah.
0:41:44 So there's no window functions and there's no recursive queries.
0:41:48 And base Materialite is strictly for streaming.
0:41:52 So, like, if you're processing a stream of writes or events, like
0:41:57 Materialite is a great fit because it's gonna, you know, patch up your
0:42:01 query results as those events come in.
0:42:04 But say you have a table of a million items and you want to
0:42:09 query it from scratch, right?
0:42:11 Like you didn't see the events, you couldn't maintain any queries.
0:42:15 You're just like, I want to know what's in this table.
0:42:17 So, so you're going to have to run that query from scratch, even in Materialite.
0:42:21 But for Materialite, what that means running a query from scratch is like
0:42:24 setting up the Dataflow pipeline and then feeding literally every row,
0:42:27 all million rows into that pipeline.
0:42:30 Which, yeah, since everything's implemented with, like, filter and map,
0:42:34 where, like, it's opaque, like, when you do a filter, you provide a lambda.
0:42:38 It's opaque.
0:42:38 That's what fields you're filtering on which makes this from scratch case hard
0:42:42 because, like, Oh, if I don't know what fields you're filtering on, then I don't
0:42:45 know if I can apply some sort of index that you might have against this data.
0:42:49 So, yeah, I guess after working on Material 8 and getting it to
0:42:54 work pretty well, like, okay, that was the next problem.
0:42:57 And it's like, we need an actual query language because in a
0:43:01 query language is declarative.
0:43:02 You express, like, This is the field name I'm filtering on.
0:43:05 This is the operator being used and this is the value.
0:43:07 So the engine knows exactly where the fields are ordering by the tables you're
0:43:12 using the fields you're filtering on.
0:43:15 And once you have that knowledge, you can also make that like first query
0:43:19 case fast, where I guess you're doing the traditional database thing of like
0:43:24 figuring out what indices to pick.
0:43:26 I don't think we'll ever, Be as fast as SQLite for that case.
0:43:29 But I think, you know, For an app like, for a rich client
0:43:32 like, Overtone or Strut, right?
0:43:35 , Most of your queries are subscriptions, they're not this
0:43:37 like from scratch Sort of thing.
0:43:38 Like, You're gonna subscribe, you're gonna like, You're gonna like, Set up
0:43:42 a query that subscribes to the slide list, set up a query that subscribes
0:43:45 to the component list on a slide.
0:43:48 And then as writes happen, you just want to update those rather than,
0:43:52 yeah, always, like, rather than having to have first class support
0:43:55 for running a query from scratch.
0:43:58 So, yeah, so that latter thing needs to be just fast enough, but
0:44:00 it doesn't need to be, like, Yeah, the fastest thing in the world.
0:44:04 I totally agree that there's a very interesting, different kinds of trade
0:44:08 off that should be explored and should be built that makes so much more sense for a
0:44:14 live active application where everything stays up to date and then changes
0:44:20 quite minimally as opposed to request response, which is how SQLite works today.
0:44:26 That being said, you can still get quite a bit of mileage out of SQLite today.
0:44:31 If you are a little bit more mindful about like how you work with it.
0:44:36 So some patterns that I found to Makes SQLite work for me with Overtone.
0:44:42 Is that what makes SQLite the slowest in a browser context is shoveling data
0:44:48 from within the SQLite database into JavaScript land and back and forth as
0:44:54 this is where you need to cross the memory boundaries from memory staying
0:44:59 within Wasm to how it's being like marshaled into JavaScript objects.
0:45:05 And what I found as a pattern that helps quite a lot is just keeping that small.
0:45:11 So if you need like a, and this is where I think you can apply quite a few of the
0:45:15 tricks that you've now mentioned and you embrace with Materialite, you can actually
0:45:20 bring back and layer on top of SQLite.
0:45:23 Probably still not quite as fast as Materialite.
0:45:27 But probably fast enough to make a use case like Overtone work within that
0:45:31 paradigm as well, that when the list changes, instead of like, getting a
0:45:36 new list of 5, 000 tracks for that playlist, every time those 5, 000
0:45:40 tracks, you just get a change in terms of like, Oh, this new track was added.
0:45:46 And getting that SQLite, I think can also be done with a few tricks namely
0:45:52 also like a temporary table where you can save the previous result of your
0:45:58 query before and after the change.
0:46:01 And then you query that in a diffing way.
0:46:05 And then you still need to sort of like work with that
0:46:08 diff information in JavaScript.
0:46:11 But this is how you can work around that performance challenge.
0:46:14 But your system with Materialite.
0:46:17 All of that is absorbed from you, which is very attractive.
0:46:20 Yeah, and I think right, so Materialite, right now, it can be backed by
0:46:24 an in memory collection, or it's just some little interface that's
0:46:28 called a source that you implement.
0:46:30 So I think, Yeah, SQLite could be that source, and then that would
0:46:33 solve the like, first query problem.
0:46:36 So I just compile the Materialite queries to SQL queries.
0:46:40 and then the incremental updates are maintained by Materialite going forward.
0:46:44 Yeah, obviously you have to like, listen to the transaction failures
0:46:46 and somehow roll those back.
0:46:48 But yeah, I think it's doable to like, Marry those two together and
0:46:51 have kind of the best of both worlds.
0:46:54 And I know there's like a project called GRDB.
0:46:56 I think it's like a iOS SQLite wrapper which adds a bunch of reactivity.
0:47:02 it's not like completely general.
0:47:04 It doesn't support like completely arbitrary SQL queries, at
0:47:06 least in an efficient way.
0:47:08 But I think it gets you close enough with the queries it does support.
0:47:11 One more thing about like, moving from the request response style to
0:47:15 like, queries being subscriptions.
0:47:17 It gives you these really cool opportunities to optimize
0:47:19 queries at a global level.
0:47:22 So like, if all your queries are subscriptions, you can see
0:47:24 what subscriptions are open.
0:47:26 And then you can start seeing which queries are either
0:47:29 identical and deduplicate or which queries share operations.
0:47:34 So like, maybe you have a whole bunch of queries that all do the same join.
0:47:37 While in the streaming system.
0:47:39 You can optimize that so that join happens once and then the results of the join are
0:47:44 fanned out to all the queries that use it.
0:47:46 So yeah, you can do some really neat tricks at a global level once you
0:47:49 model your queries or subscriptions.
0:47:50 Yeah, that's super interesting.
0:47:52 And a few of those things I'm exploring for LiveStore myself where they're
0:47:58 basically the problem of reactivity.
0:48:01 And efficiency, trying to keep the performance as good as possible,
0:48:05 but also squeezing that into a frame budget since LiveStore and Overton
0:48:10 all run within the main thread, which I'm not sure that's probably
0:48:14 also the way how MaterialLite works.
0:48:16 There is Interesting to also see it through a lens of like a different
0:48:21 paradigm shift evolution in the JavaScript front end world which is about signals.
0:48:27 And signals is also about like, I think a more lighter weight reactivity system
0:48:32 that is all about being composable.
0:48:35 And and I think those worlds are now finding, like, a way together
0:48:39 within live store with the prior work with Jeffrey and Nicholas on Riffle.
0:48:45 This was like some early implementations based on a paper called Adapt On.
0:48:50 And from my understanding all of like the research that has gone into adapt on.
0:48:56 And sort of like in parallel, what's been evolving as signals.
0:49:00 Those are actually the same thing.
0:49:02 And I think now we can do some really interesting work on combining the
0:49:07 power that like more substantial state management primitives, Like a database
0:49:13 gives us or something like material light gives us and bring that together
0:49:17 with something like a signal system that also kind of out of the box address the
0:49:23 thing you've just mentioned, which is reusing parts of the computational graph.
0:49:28 The only thing signals is missing.
0:49:29 I'm surprised nobody's done this yet.
0:49:31 Is Incremental computation against collections, right?
0:49:34 So you think of like, I have an array, and I map the array, then I filter
0:49:37 the array, then I reduce the array.
0:49:39 That's like creating a new copy of the array each time.
0:49:42 And if you like, add one element to the array, you like, rerun all
0:49:45 those against the full array again.
0:49:47 So it's kind of like the query from scratch case, right?
0:49:49 Whereas Materialite, right, You add an element to the array, well,
0:49:52 it's only going to run the filter on that one element and the map
0:49:55 on that one element rather than running against the full array.
0:49:58 So there's some interesting benchmarks maybe I can share with you after of
0:50:01 like, yeah, incrementally maintaining a map filter reduce pipeline
0:50:05 against an array of a million items.
0:50:07 Yeah, so it'd be really cool if like the signals implementers would even
0:50:12 Rocicorp
0:50:12 So even though your love affair with SQLite has for now come to an end.
0:50:18 I think the problems and goals you're chasing after are still very
0:50:22 similar to what I think is feasible with SQLite, just maybe with like
0:50:27 a different performance footprint.
0:50:29 I think with SQLite you also get the benefit.
0:50:32 of just that being like a very familiar and trusted thing that's
0:50:35 like been proven and battle tested for literally decades now.
0:50:40 And even though you said like you were running into some performance
0:50:44 issues SQLite is still so ridiculously fast that it's very feasible to do
0:50:49 write those applications in the, in a browser context, et cetera.
0:50:52 And I think even if you now on a modern CPU straddle your your
0:50:57 CPU, it's still manageable.
0:51:00 But I think there's multiple paths to arrive at the same goal.
0:51:05 And so as my understanding is that the project Materialite opened a couple of
0:51:11 really interesting follow up conversations for you which is now opening a new
0:51:16 chapter for your professional life.
0:51:19 So do you want to tell us a bit more about that?
0:51:21 Yeah, I guess, yeah, Materialite is my least known project,
0:51:25 least complete project.
0:51:27 But yeah, it did pick up some interest by Rocicorp specifically Aaron, who I
0:51:32 met at the local-first conference in St.
0:51:35 Louis.
0:51:36 And yeah, we're talking about, you know, the model of subscript, those queries as
0:51:39 subscriptions and like having a reactive app and I think he's the one that said
0:51:42 like, oh, things need to be memory fast.
0:51:45 And yeah, so, so I started working on MaterialLite and we did some explorations
0:51:49 of like, if you powered one of their demo apps with MaterialLite, like they
0:51:54 have a linear clone that's a demo app and they want to see like, Oh,
0:51:56 how many issues can we scale this to?
0:51:58 If we like use this differential data flow approach.
0:52:02 Yeah.
0:52:02 And they really liked kind of the initial results.
0:52:05 And yeah, we worked together, I guess.
0:52:08 quite a bit over the last four slash six months.
0:52:10 There was like a two month break for me on paternity leave.
0:52:13 And yeah, so I'll be starting a job with Rocicorp here pretty soon as a partner.
0:52:18 Yeah, I'm super excited about that.
0:52:20 That's amazing.
0:52:21 I mean, we had Aaron on the show.
0:52:23 I think that the second show for localfirst.fm, and there were so many
0:52:28 deep insights there by Aaron, and I'm a big fan of the products that's already
0:52:33 been built there, Replicache, Reflect, and sounds like your work with Materialite
0:52:40 is also going to be incorporated into making the products even better.
0:52:44 So I'm curious to hear more of your thoughts on you know, what do you feel
0:52:49 like where's local-first going with the products you're working on with Rosicorp?
0:52:55 Local-first provides a superior DevX because you're no longer
0:52:59 worried about APIs of, like, how do I get data from the server?
0:53:03 Like, a sync engine, or your database is solving, you have a sync engine solving
0:53:07 that for you, right, where you're just coding against local data rather than
0:53:11 setting up all these weird REST APIs that you have to call to get the data.
0:53:14 And yeah, I think that's the key value of local-first and the main thing
0:53:19 that Rocicorp is pushing forward.
0:53:21 It's like making that sync engine experience as best, as
0:53:25 Picking the right tool
0:53:25 So you're now joining Rocicorp where you're building Replicache and Reflect,
0:53:30 but the local-first space has grown quite significantly to a point that
0:53:34 can be almost a bit disorienting.
0:53:36 For newcomers who are trying to first figure out what does it mean to build
0:53:41 the app local-first and then also choose the right tool for the job.
0:53:45 Can you provide a little bit of guidance which how someone should think about
0:53:49 use cases and picking the right tool?
0:53:52 Yeah.
0:53:52 So I guess maybe when you're somebody, when the first question somebody
0:53:56 asks, should ask themselves is, "Do they want servers involved at all?"
0:54:01 Is the first question.
0:54:02 So, like, if you want all sync to go through a central authoritative
0:54:06 server, and you want that server to be able to override decisions made
0:54:10 by the client apply data integrity rules revert changes made by a client
0:54:16 Replicash is a very good choice for that.
0:54:18 If you want something, Where, you know, you don't want to preclude the
0:54:22 ability for peer, for nodes to sync peer to peer, or maybe the data they're
0:54:27 syncing is end to end encrypted.
0:54:29 I think something like CR-SQLite actually shines there.
0:54:33 CR-SQLite lets you merge in any topology you want, whether it be
0:54:37 peer to peer or hub and spoke.
0:54:39 So we've seen a couple of companies that are, have like end to end encryption,
0:54:44 where they use CR-SQLite to sync, or they have literally a pub sub
0:54:48 channel, where all peers just publish their messages, and any peer who
0:54:51 receives the messages integrates it.
0:54:54 In those use cases and then I guess if you're really looking for, like,
0:54:57 if your data model is like document based, I guess going back to syncing,
0:55:01 so not just CR-SQLite for peer to peer, but also Yjs and AutoMerge
0:55:05 is capable of peer to peer sync.
0:55:07 They're also, of course, if you can sync peer to peer, you can
0:55:09 sync through central server too.
0:55:11 If you're looking for, you know, you just want to support a collaborative text
0:55:16 field collaborative text doc and you don't need, you know, ways to query over your
0:55:22 documents, like you're fine just storing this doc in a blob in your Whatever your
0:55:26 storage is you know, Yjs is a very good choice because it's primarily targeting
0:55:30 it at text editing, collaborative text but they don't have, you know,
0:55:33 facilities like built in for persistence or querying and these sorts of things.
0:55:38 So I haven't kept super up to date with AutoMerge, but from what I've seen if
0:55:42 you want like, the ability to fork and look at different portions of history.
0:55:47 So like, say people are editing a document somebody wants to fork their
0:55:50 document off, make a bunch of changes and merge the fork back in I mean, I see
0:55:55 AutoMerge talks about this concept a lot.
0:55:57 So I think if it's not there already, AutoMerge would be your best bet for some.
0:56:03 for that sort of behavior.
0:56:04 Yeah, then there's ElectricSQL.
0:56:06 So that is like a, you know, Postgres centric solution.
0:56:11 So if your database is Postgres and you're okay with, like, merging based on CRDT
0:56:17 rules You know, that's a good choice.
0:56:18 So yeah, so I say like your back end is Postgres and you're
0:56:21 okay like using CRDT rules.
0:56:23 Like yeah, what does that mean?
0:56:24 So a really interesting thing about Replicache is it can work with any
0:56:27 back end, pretty much any back end.
0:56:30 So Postgres, MySQL, all sorts of things.
0:56:34 And the way this works is that it uses sort of a rebase model where , the
0:56:39 developer, defines their mutations.
0:56:41 And those mutations get run in a specific order on the server and clients when
0:56:47 they receive updates from the server, any mutations they have outstanding.
0:56:50 They apply the server updates and any mutations that are
0:56:52 outstanding, they rebase on top.
0:56:54 And this is like super flexible in that you can write any, like, if you want to
0:56:59 change merge and conflict behavior, well, you just write your mutations however
0:57:03 you like to get the behavior you want.
0:57:05 Whereas something like CR-SQLite or electricsql to get the merge
0:57:09 behavior you want, you have to pick specific CRDT types.
0:57:12 And it, it seems a little bit less flexible since you can't
0:57:15 write like a custom mutation.
0:57:16 But ElectricSQL does seem to have a pretty good end to end story of like,
0:57:19 you set up your Postgres DB, that schema gets replicated down into a
0:57:22 local SQLite DB, and as you make schema changes upstream, they flow downstream.
0:57:27 So in terms of like, Having a good story between two specific
0:57:32 databases it seems pretty good.
0:57:34 And then I guess the last thing is PowerSync.
0:57:37 So PowerSync, uh, works on a different model that's not rebase or CRDTs.
0:57:43 So with PowerSync, a client will accumulate changes, and if a
0:57:47 client has local changes, it won't take any changes from the server.
0:57:50 So in the rebase model, if a client has changes, it'll take
0:57:53 changes from the server and then replay its local changes on top.
0:57:56 Uh, but in the PowerSync model, the client has local changes, it won't take any
0:57:59 server changes until those local changes have been incorporated by the server,
0:58:02 and then it can take server changes.
0:58:04 Um, So yeah, to me, I've never built an app with PowerSync, but this seems like
0:58:07 it'd be a lot higher latency, uh, for the syncing, because if you have any
0:58:12 outstanding local changes, you simply cannot take a change from the server.
0:58:15 Whereas rebase model, if you have outstanding changes, it's fine.
0:58:18 You get the server changes, you just replay your local changes
0:58:21 on top of the server changes.
0:58:22 That's a really great overview, and there's many other tools that
0:58:26 more more tools than we have time to exhaustively can cover here.
0:58:31 But I think that's a really great orientation, particularly with Asking
0:58:34 yourself the first question, like where should the authority originate from?
0:58:38 Do you want to have a server?
0:58:40 That's, I guess, more traditional how web apps work today.
0:58:43 And that can simplify things a lot because in that server you can enforce if there's
0:58:48 like a merge conflict or if there's like maybe a user has set some data in
0:58:54 a way That might be not compliant with the app that you want to build or for
0:58:59 other simplification reasons, et cetera.
0:59:02 So I think picking that, where does the authority live?
0:59:04 Is there an authority?
0:59:06 And , which role should a server play?
0:59:09 I think that's a really important one.
0:59:11 And then also the two kinds of like ways to go about syncing or
0:59:15 at least two ways going more about certain spacing, everything on CRDTs.
0:59:21 Or going with a event rebase model, our Replicash is working.
0:59:27 I'm curious how someone who has used neither and might have not really
0:59:32 a lot of intuition for what are the implications of choosing one or the other.
0:59:37 Can you provide a little bit of an intuition which kind of app use cases
0:59:42 are a good fit for one or the other?
0:59:44 Yeah.
0:59:44 So with CRDTs, you don't.
0:59:47 Have code on the right path, like the CRDT algorithm is going to
0:59:53 determine, I guess I should say you don't have code on the merge path.
0:59:55 The CRDT algorithm is going to determine how things merge.
0:59:58 And you're just picking a set of CRDT algorithms where something like Replicash,
1:00:02 where you write custom mutators.
1:00:04 You can write a set of mutators that run on the client, and you can write mutators
1:00:08 with the same name, same args, but do something totally different on the server.
1:00:11 Which, one it's, since you're writing code in the mutation, it's giving you control
1:00:16 over, kind of, how things merge and sync.
1:00:20 And two, since The code doesn't have to be identical on every node because
1:00:24 the server is the authoritative answer.
1:00:26 You could do something in the server mutation like check permissions or,
1:00:30 you know, set last modified times or something from the server's
1:00:33 time rather than the client's time.
1:00:35 Yeah, but I guess the one downside to that is a CRDT, like, you can sync messages in
1:00:40 any order, like, yeah, if you have message A, B, and C, if you sync them B, C, A,
1:00:46 or C, B, A, like, you always get the same state, which means that you don't have to
1:00:50 have a single server being the authority.
1:00:52 You can have every peer be on equal footing and essentially have no server
1:00:56 whereas like this, you know, Replicache model, you do have to have, Someone
1:01:00 that is that authority that's going to provide the total ordering of messages.
1:01:04 And to me, it sounds like at least the latter, the rebasing model with
1:01:09 that authority living in a server, for example, that is a bit easier to get
1:01:15 into less of a radical shift of like thinking, changing the way, how you
1:01:20 think about data modeling, et cetera, and it's probably the easier one to
1:01:25 get into and might be a better catch all solution for many app use cases.
1:01:30 Like if you look at yeah, I think it's the easier, it's like, it's more
1:01:33 familiar and if you have an existing app that you're trying to add multiplayer
1:01:38 to Like, yeah this can work on your existing data model without much
1:01:41 changes or with little to no changes.
1:01:44 Whereas, oh, you want to switch to CRDTs, like you might have to make a new data
1:01:47 model specifically for this collaboration thing and migrate your stuff over.
1:01:51 Outro
1:01:51 Right.
1:01:51 And I mean, the advice I would give on top of that is that.
1:01:55 There are now like so many interesting and easy to use options out there,
1:02:00 whether it's Replicache, whether it's AutoMerge or others just maybe try
1:02:04 building the same mini version of your app in different technology stacks,
1:02:09 see what works best for you, and you'll develop that intuition just by yourself.
1:02:14 And I think that's a really fun thing to do.
1:02:16 Cool.
1:02:17 Matt, thank you so much for sharing about like all those different
1:02:21 projects you've been involved with.
1:02:22 I'm really excited to see what you'll be building at Rosicorp.
1:02:27 Is there anything else you want to share with the audience now?
1:02:30 Yeah, just stay tuned.
1:02:31 We're building the next.
1:02:32 Kind of iteration of Replicache and Reflect.
1:02:34 And , it was exciting enough to make me like drop everything I was doing
1:02:39 and join the team and yeah, go back to, I yeah, work, working for somebody,
1:02:44 I guess, rather than just, you know, following whatever research interests,
1:02:49 wherever they took me from day to day.
1:02:50 So yeah, it's, I think what they're building is super compelling and
1:02:53 You'll hear more, you know, over the summer and coming months.
1:02:57 That's incredible.
1:02:59 I'm really excited to hear more about what you all have been working on.
1:03:03 And yeah, thank you so much for coming on the show.
1:03:06 Yeah.
1:03:06 Thanks for having me.
1:03:08 Thank you for listening to the localfirst.fm podcast.
1:03:11 If you've enjoyed this episode and haven't done so already, please subscribe and
1:03:15 leave a review wherever you're listening.
1:03:17 Please also tell your friends about it.
1:03:18 If you think they could be interested in local-first, if you have feedback,
1:03:22 questions or ideas for the podcast, please get in touch via hello at
1:03:26 localfirst.fm or use the feedback form on our website, special thanks to Expo and
1:03:31 Crab Nebula for supporting this podcast.