0:26:05 Very interesting.
0:26:06 I have to look more into that.
0:26:08 So, besides choosing the right CRDTs
to fit into SQLite through virtual
0:26:14 tables I think this, like, just picking
SQLite to run in the browser quite a
0:26:21 few years ago, that's probably you've
must have been running into a bunch of
0:26:25 challenges and issues along the way.
0:26:27 So I'm curious to hear first getting
sort of like a broad overview of
0:26:31 the different challenges you were
running into and then going into them.
0:26:35 Yeah, I think I wasted months of my
life, like fighting SQLite issues.
0:26:39 Cause yeah, it was really early where
the official SQLite WASM project
0:26:44 had kind of like just started.
0:26:46 And I guess some of the first
problems I ran into was.
0:26:51 I guess compiling an
extension into the WASM build.
0:26:55 So yeah, it's one thing to
write a SQLite extension.
0:26:57 It's another thing to get
it to run correctly in WASM.
0:27:03 It wasn't so hard when the extension
was originally written in C, but like,
0:27:06 I don't know, like trying to code review
contributions in C and make sure there's
0:27:10 no like memory issues was a nightmare.
0:27:13 So I eventually like bit the bullet,
started learning Rust and re implemented
0:27:16 the extension in Rust, which I think
was, yeah, it became like a third
0:27:21 or two thirds less code and just way
faster to write after doing that.
0:27:24 I think it was worth the time.
0:27:26 But that introduced this whole new
complication of like getting the
0:27:29 Rust compiler toolchain to compile an
extension that can be linked to the
0:27:33 SQLite compiler toolchain that then goes
through Emscripten to build a Wasm bundle.
0:27:39 And let me see what else was there.
0:27:42 Transactions was an interesting thing.
0:27:45 So the inner, the originally
the interfaces to SQLite
0:27:49 in the browser were async.
0:27:51 And if you ever had Well, I guess
there's two builds of SQLite.
0:27:55 The official SQLite build
was always synchronous.
0:27:57 There's this unofficial build, which I
think is better, called wa-sqlite, which
0:28:02 all the interfaces were asynchronous.
0:28:03 But it had these interesting caveats,
like if you await two calls to
0:28:07 SQLite at once, it would deadlock.
0:28:09 So you had to make sure like there's
only one Call to SQLite ever at once.
0:28:14 So, you know, working
around that challenge.
0:28:16 Transaction interleaving was another,
so like if it's asynchronous and you
0:28:20 start a transaction and you have all
the statements in the transaction, okay.
0:28:24 You, you began the transaction
and then you like do a read,
0:28:26 you're waiting the read.
0:28:28 Well, while you're waiting in the read,
you delegate control of the event loop,
0:28:31 so somebody else could come in and,
you know, kick off a new task, right?
0:28:36 A new statement to do
a write or something.
0:28:39 Well, now that's somebody else's
transaction that's trying to run while
0:28:41 your other transaction is running, so
you get this, like, weird interleaving.
0:28:44 So, yeah trying to lock out transactions
in the JavaScript layer that was
0:28:48 another annoying thing to deal with.
0:28:49 And then, yeah, this is not related
to WASM, but just getting SQLite
0:28:56 with the extension built for all
the platforms I wanted to target.
0:28:59 So, like, eventually Expo wants to
include it in their Expo SDK, so figuring
0:29:04 out how to do Android and iOS builds in
addition to, you know, the WASM build.
0:29:09 And Mac and Linux builds.
0:29:12 Those sound like quite the buffet of
different challenges that some sounds
0:29:18 familiar to me as well as I think some
you've been pioneering a tad earlier.
0:29:23 And by now, some of those
are maybe a bit more.
0:29:27 A well trodden path, but I
think there's probably even more
0:29:31 that we don't understand yet.
0:29:33 One more, I know one more interesting
one I think more interesting than
0:29:36 the other ones is since SQLite
in the browser, at least the one
0:29:41 I'm using, is IndexedDB backed.
0:29:43 IndexedDB has a very high cost
for opening a transaction which
0:29:48 I was unaware of at the time.
0:29:49 So yeah, there's a lot of work.
0:29:51 And anytime you open a SQLite transaction,
it would open an IndexedDB transaction.
0:29:56 So there's a lot of work making sure
that, like, for reads Yeah, normally
0:30:00 you just issue reads, you wouldn't
necessarily batch them into a transaction.
0:30:03 Like normally, you know, you'd batch
your writes into a transaction.
0:30:06 But yeah, to make SQLite in the browser
go fast if you're doing a whole bunch
0:30:09 of reads, like making sure a transaction
was open automatically all the reads
0:30:14 would happen and then the transaction
would close once all the reads are done.
0:30:16 So it's like way of batching that was like
a 10x speed up in the browser for SQLite.
0:30:21 So once you've solved all of those
performance challenges and also like
0:30:26 correctness, syncing challenges,
et cetera, then it's also needs
0:30:30 to be competitive somehow with
the typical developer experience
0:30:34 you have in a React app or in a
different framework you're using.
0:30:40 How did you go about designing that?
0:30:42 Yeah, so, yeah, I've developed a set of
hooks, so, I don't know, it's, I guess
0:30:47 like the React integration was probably
the least amount of, I don't know,
0:30:51 I'd say it's the least amount of work.
0:30:52 Maybe not once I like think about
all the strict code problems and
0:30:55 stuff but yeah, essentially the
DevEx was like a use query hook.
0:30:59 So I think it's a lot easier to
develop when your queries are
0:31:03 co located with your components.
0:31:04 So every component is responsible
for getting the data it needs.
0:31:08 And this makes your app like
super composable, right?
0:31:10 Like you can add and remove components
to your app or your component tree.
0:31:15 And because they're responsible
for getting their data, like, you
0:31:18 don't have to do any additional
wiring or prop drilling.
0:31:20 And yeah, they're not going to break
other components because suddenly,
0:31:26 Some data dependencies are missing.
0:31:28 So I really like that, yeah.
0:31:29 Co located queries, so every component
you could use this hook called UseQuery.
0:31:34 And in that UseQuery, you just
literally write your SQL that fetches
0:31:39 the data required by that component.
0:31:41 And that SQL could be as complex
or as simple as you wanted.
0:31:44 So, you know, like, select from issue
where id equals issue id or something
0:31:49 simple or, you know, something
complex like select star from issue,
0:31:52 order by modify, join labels join
owner, and yeah, all these things.
0:31:58 So you've managed to tame most challenges
with SQLite in the browser, making sure
0:32:03 it works in Wasm, making sure that your
own extension is working, et cetera.
0:32:08 And SQLite can be super, super fast
for most workloads, but once your
0:32:14 cardinality grows, et cetera, your
tables get really big then queries can
0:32:19 also start Taking a little bit longer.
0:32:22 And that is because SQLite always kind
of, besides a little bit of caching,
0:32:26 always starts with your results for
your queries from scratch, kind of.
0:32:31 And you know, I've been chatting about
this for quite a while back then.
0:32:35 This is how I know about
all of this context.
0:32:37 And that has led you to look into.
0:32:40 incrementally maintaining the
query results and the views.
0:32:43 So I'm very curious to hear more
what you've learned back then
0:32:46 and which path this led you down.
0:32:49 Yeah.
0:32:50 Yeah.
0:32:50 So I guess, you know, I was talking
earlier about each component
0:32:53 fetches its own data, has this
use query hook, or just, you know,
0:32:56 specifies a SQL query it's running.
0:32:58 So when you're developing like a rich
client side application like this,
0:33:02 ideally your database is reactive, right?
0:33:04 So any, anytime some data changes.
0:33:07 It's like you, you work
on Overtone, right?
0:33:09 The music app.
0:33:10 If somebody presses play, then, you
know, all the components that care about
0:33:16 play state need to immediately update.
0:33:18 And yeah, as you're saying in SQLite, if
you're doing this somebody changes some
0:33:22 state, well, you know, SQLite doesn't have
any, you know, facilities for reactivity.
0:33:27 It has a few but they're not very good.
0:33:29 Like they don't give you many fine
grained notions about what changed
0:33:32 or like what queries it impacts.
0:33:34 So what you're left doing is rerunning.
0:33:38 Essentially from scratch the queries
that could have been affected.
0:33:42 So, like, the SQLite change notifications,
they'll tell you the row ID that
0:33:46 changed, but not the contents, and
they'll tell you, like, the table.
0:33:49 So it's usually not enough to figure
out exactly which queries to invalidate.
0:33:52 And yeah, so you rerun tons
of queries throughout the app.
0:33:56 And those are all rerunning from
scratch, so like, if you have, like,
0:34:00 for your track list, you're showing a
few hundred tracks, and that track list
0:34:03 has to do a number of joins, right?
0:34:05 Has to, for a track, has to join the
album to get the album title, has to
0:34:08 join the artist to get the artist names.
0:34:12 Yeah, I don't know if
there's any other joins.
0:34:13 Maybe there could be joins about,
like, like status, or favorite status,
0:34:18 or something about the track, right?
0:34:20 And, yeah, rerunning that join to
select, you know, 200 some items.
0:34:25 And yeah, maybe they just want to
sort, and sorting it doing that from
0:34:28 scratch every time somebody mutates
something severely limits I guess how
0:34:32 fast you can interact with the app.
0:34:34 And yeah, I'd worked, tried a
bunch with like, okay, you know,
0:34:38 maybe a purely in memory SQLite,
and that gets you pretty fast.
0:34:42 But yeah, somebody was saying,
oh, like, The goal for a reactive
0:34:45 database is it's memory fast.
0:34:47 And then when I started, like, I
don't know, that idea, a lot just
0:34:50 helped my brain and I started like
actually benchmarking, you know,
0:34:54 if I'm just observing a value
in JavaScript, how fast is that?
0:34:57 And if I'm like, Observing a
query in SQLite and updating the
0:35:02 row and re running the query.
0:35:03 Like, what is that?
0:35:04 And I have a observable notebook
somewhere that like compares all this
0:35:08 and like the difference was massive.
0:35:09 So like just seeing how far I was
from memory fast, like started
0:35:13 making me a bit sad on SQLite.
0:35:15 And also like benchmarking the original
strut and the new one, which use
0:35:20 SQLite and these use query hooks.
0:35:23 Yeah, like the original one I could
throttle my Chrome, you know, but in
0:35:27 the dev tools you can like downgrade
your CPU right to like 6x slower or
0:35:31 something and I could go all the way down
to the slowest possible and everything
0:35:35 was buttery smooth, but in the new
one, like if I downgraded it too much,
0:35:40 like you could see some visible lag.
0:35:42 SQLite
0:35:45 is fundamentally built in this
request response style, right?
0:35:50 This is the era of the LAMP stack
where, you know, databases request
0:35:55 response made sense, right?
0:35:56 A user would go to the website
it would do a query against the
0:35:59 DB and it would render, right?
0:36:01 There was no, like, rich interactivity.
0:36:03 It was always a full
page refresh every time.
0:36:04 So request response made sense.
0:36:06 Yeah, but for these rich apps, request
response no longer makes sense.
0:36:09 And trying to fit a request response
sort of DB designed around that,
0:36:13 it into this reactive scenario.
0:36:15 And I think maybe one day
somebody can get there.
0:36:19 But I think right now it didn't seem
to make much sense to me and the juice
0:36:23 didn't seem to be worth the squeeze.
0:36:25 So I started, I guess my love affair with
SQLite has slowly been coming to a close.
0:36:30 And I started, you know,
investigating other projects.
0:36:33 Yeah maybe one, one day
port it if there's time or.
0:36:36 porting some of these projects
back to SQLite and contributing
0:36:39 incremental data flow, differential
data flow to them or something.
0:36:43 But yeah, I guess I should describe
what these projects are, right?
0:36:45 , So I like the idea of queries, being able
to like, Issue a complex declarative query
0:36:52 against some set of relational data to
get back the data you need for your view.
0:36:56 So I wanted to like, how can
I , incrementally maintain an
0:37:00 arbitrary query against some data?
0:37:02 So like that track list example
you query the, Tracks and their
0:37:06 artists, and their albums, and the
play state, and all these things.
0:37:10 How can we, when somebody does
it right, rather than re running
0:37:13 the query to get the track list
it knows exactly which queries.
0:37:17 Should be invalidated by that write,
and rather than re running them,
0:37:22 knows exactly how to patch up the
rows that are impacted by the write.
0:37:27 So, you know, at first this problem
seems absurdly complex and hard.
0:37:32 And you're like, Oh, yeah, I
want to like invalidate and
0:37:34 patch up any arbitrary SQL query.
0:37:37 But yeah, I started
reading some of the papers.
0:37:39 There's like a paper called DBSP.
0:37:41 It's a differential data flow paper.
0:37:43 I think Materialize is based on that.
0:37:47 And then, yeah, started implementing it.
0:37:49 And then I realized like, I don't know,
this, once you've read it and started
0:37:52 implementing it and you realize like,
this is very similar to other stream
0:37:56 processing stuff , like, and other query,
you know, builders I've done,, right,
0:38:02 where rather than, you know, you create
a series of operations and rather than
0:38:06 them asking a database for the data,
they're like taking streaming data in.
0:38:12 So yeah, like some of the work I'd done
at Meta was on real time abuse systems
0:38:17 and that was all streaming systems.
0:38:20 So I was like, oh, like this
problem isn't that mysterious.
0:38:22 Like I've done stuff like this before.
0:38:25 It seems tractable.
0:38:26 So I, you know.
0:38:27 went full on, decided like, I think we,
I think I can implement this yeah, based
0:38:32 on prior experience, based on having
done query languages before so yeah,
0:38:36 MaterialLite was born which is bringing
differential data flow to JavaScript, so
0:38:43 you can compose, filter, map, reduce, and
a join operator to create these pretty
0:38:48 rich queries and have them reactively
updated anytime there's a write.
0:38:53 Right.
0:38:53 And so for those of the listeners who
have not yet built their own database
0:38:58 and maybe are familiar with like SQL
where in SQL, you have like select from
0:39:04 where, and all of those and JavaScript
or other programming languages who you
0:39:08 might've like, Map, filter, et cetera.
0:39:11 They might have different
names, but they're sort of
0:39:13 like conceptually very similar.
0:39:15 And this is where you're basically just
now like trying to recreate the same
0:39:20 semantics that we have from SQL, where you
can say select star from this where so,
0:39:25 and you can basically have like an array
where it now say .Map .Filter, et cetera,
0:39:32 and where you flip the trade offs from
before you owned in SQLite, you might get
0:39:38 this query just once and then it's done.
0:39:40 You no longer are interested in like
subsequent changes and then like, smaller
0:39:45 updates, but in JavaScript where our
app stays warm, we click the button.
0:39:49 Something's changed slightly.
0:39:52 You want to change the trade
offs quite significantly.