Someone on the internet…

sqlx is really nice, but you definitely take a hit to compile times

- Random people on the internet

It’s something that really resonated with me after heavily using sqlx at my old job. Even with a Ryzen 3700, cargo check times climbed from 5 to 10 to 20 seconds, and cargo sqlx prepareing off a remote database was a good excuse to take a coffee break. There’s got to be something we can do…

Before we dive into all that I’d like to have a quick aside to cover some sqlx basics. Feel free to skip if you’re already familiar with sqlx’s macros and offline builds.

Disclaimer: This whole post was done on sqlx v0.5-v0.6. Some parts are already inaccurate with v0.7, and I’m sure that trend will continue

Primer: sqlx macros 101

One of sqlx’s main selling points is that it can perform compile-time query checking against an actual database. That means that if you have some code like so

use sqlx::{Result, SqliteConnection};

struct User {
    id: i64,
    name: String,
}

async fn get_user_by_id(db: &mut SqliteConnection, id: i64) -> Result<User> {
    sqlx::query_as!(
        User,
        "SELECT id, name FROM User WHERE id = ?",
        id,
    )
    .fetch_one(db)
    .await
}

and compile with the DATABASE_URL env var set to your database URL then it will connect to the database at compile time to verify that both the query is valid, and that the Rust types match the database’s returned types. If your query is invalid, or your database types don’t match their Rust counterparts then you’ll end up with a compile-time error.

11c11
<         "SELECT id, name FROM User WHERE id = ?",
---
>         "SELECT id, name FROM User WHERE id = ? AND NOT deleted",
$ DATABASE_URL='sqlite:blog.db' cargo check
    Checking sqlx_blog v0.1.0 (/.../sqlx_blog)
error: error returned from database: (code: 1) no such column: deleted
  --> src/lib.rs:9:5
   |
9  | /     sqlx::query_as!(
10 | |         User,
11 | |         "SELECT id, name FROM User WHERE id = ? AND NOT deleted",
12 | |         id,
13 | |     )
   | |_____^
   |
   = note: this error originates in the macro ...

error: could not compile `sqlx_blog` due to previous error

Whoops. Forgot to add that column :)

Of course, requiring a database connection every time you build can be a hassle (e.g. in CI), so sqlx also provides the option to do offline builds. You just add the "offline" feature to sqlx, and now you can prepare an sqlx-data.json file that describes all of your queries using cargo-sqlx from sqlx-cli.

$ DATABASE_URL='sqlite:blog.db' cargo sqlx prepare
   Compiling sqlx_blog v0.1.0 (/.../sqlx_blog)
    Finished dev [unoptimized + debuginfo] target(s) in 0.10s
query data written to `sqlx-data.json` in the current directory; please ...

$ bat --plain sqlx-data.json
{
  "db": "SQLite",
  "{query_hash}": {
    "describe": {
      ... // A lot of information on the query
    },
    "query": "SELECT id, name FROM User WHERE id = ?"
  }
  ... // More entries for all other queries
}

Now your project can build using the information in sqlx-data.json instead of needing to connect to a live database.

Connecting to databases at compile time is the kind of proc-macro (ab)use that people usually bring up when talking about sqlx. It’s equally beautiful and horrifying 😄.

Peering through the looking glass

You can probably guess from this blog post… ya know… existing and all that sqlx ended up being the main culprit, but I really had no clue when I started this journey. All I knew was that my old job’s 60k+ sloc mono-crate code base was slowly spiraling into a very unhealthy dev-feedback loop, and I was wanting out.

I was rooting through my usual toolbox of cargo check --timings, cargo-llvm-lines, and summarize to get a better idea of what was blowing up the cargo check times. cargo check --timings showed that, unsurprisingly, it was the massive mono-crate taking up all the time. cargo-llvm-lines pointed to a tossup between large sqlx macros, serde’s #[derive(Deserialize, Serialize)]s, and some large functions. Last, but certainly not least, summarize had something very interesting to note (edited for narrow screens).

$ cargo +nightly rustc -- -Z self-profile

$ summarize summarize {redacted}-{pid}.mm_profdata
+--------------------------+-----------+-----------------+----------+
| Item                     | Self time | % of total time | Time     |
+--------------------------+-----------+-----------------+----------+
| expand_crate             | 14.75s    | 67.516          | 14.83s   |
+--------------------------+-----------+-----------------+----------+
| monomorphization_coll... | 1.11s     | 5.074           | 2.55s    |
+--------------------------+-----------+-----------------+----------+
| hir_lowering             | 419.93ms  | 1.923           | 419.93ms |
+--------------------------+-----------+-----------------+----------+
... many many elided rows

67.5% of the time was taken up by expand_crate! What even is expand_crate? Well lucky for all of us living in the future this appears to now get reported in the much more appropriate item: expand_proc_macro, which makes things pretty obvious (probably thanks to this PR). That’s okay though, a quick rg on the compiler source at the time suggested the same thing.

Well that seems to point a reeeally big spotlight on sqlx, but what was it doing that was taking up so much time? And then my eyes fell on that ~500 KiB sqlx-data.json file…

500 KiB isn’t that big

It could be getting handled really poorly?

The JSON is pretty printed. Maybe if we compact it…

$ hyperfine --warmup 1 --prepare 'touch src/lib.rs' 'cargo check'
Benchmark 1: cargo check
  Time (mean ± σ):      19.273 s ±  0.202 s    [User: 18.977 s, System: 0.542 s]
  Range (minmax):    18.839 s19.524 s    10 runs

$ mv sqlx-data.json sqlx-data.json.pretty

$ cat sqlx-data.json.pretty | jq -c > sqlx-data.json

$ hyperfine --warmup 1 --prepare 'touch src/lib.rs' 'cargo check'
Benchmark 1: cargo check
  Time (mean ± σ):      14.965 s ±  0.294 s    [User: 14.685 s, System: 0.531 s]
  Range (minmax):    14.449 s15.368 s    10 runs

(Note: All timings are run on a laptop with an i5-1135G7 because I wiped my desktop after it was having GPU issues :c)

Several. Seconds. Faster.

A -22% change for something so simple is huge, but now that leaves the question of why the JSON parsing is so slow. Even if sqlx was doing something really bad like re-reading the whole file for each macro then there are still only a couple hundred queries. 497 KiB * 203 queries comes out to 98.5 MiB which should be nothing for an optimized JSON par- 🤦. Wait… this is a debug build… it won’t be an optimized JSON parser.

Down the rabbit hole

Now that we found the culprit we can try to fix things. cargo makes it easy enough to change how we build dependencies. Why not set it to do an optimized build for all proc-macro related bits? That should speed things up a lot.

Consulting the docs we see that we can do an optimized build for all proc-macros and their dependencies by adding

[profile.dev.build-override]
opt-level = 3

to our Cargo.toml file.

We’re back to our pretty-printed sqlx-data.json now. How are we looking?

$ hyperfine --warmup 1 --prepare 'touch src/lib.rs' 'cargo check'
Benchmark 1: cargo check
  Time (mean ± σ):      7.021 s ±  0.077 s    [User: 6.696 s, System: 0.539 s]
  Range (minmax):    6.887 s 7.176 s    10 runs

12 seconds faster? One word: dopamine.

On top of that one of my coworkers pointed out that we get most of the benefit still when doing an optimized build of just sqlx-macros a la

[profile.dev.package.sqlx-macros]
opt-level = 3

Pulchritudinous parsing

Bless you

Oh, I didn’t sneeze. It actually means beautiful

Huh. Pretty ugly looking word for beautiful

So we’re done, right? Ship it, and all that? No. We may have put a nice band-aid on things, but I’m not going to call it good here. Let’s fix-up sqlx, so that everyone can benefit from faster builds. Poking around the source for a bit we find DynQueryData::from_data_file() (edited for brevity).

// from sqlx-macros/src/query/data.rs

#[derive(serde::Deserialize)]
pub struct DynQueryData {
    #[serde(skip)]
    pub db_name: String,
    pub query: String,
    pub describe: serde_json::Value,
    #[serde(skip)]
    pub hash: String,
}

impl DynQueryData {
    /// Find and deserialize the data table for this query from a shared
    /// `sqlx-data.json` file. The expected structure is a JSON map keyed by
    /// the SHA-256 hash of queries in hex.
    pub fn from_data_file(path: &Path, query: &str) -> crate::Result<Self> {
        let this = serde_json::Deserializer::from_reader(BufReader::new(
            File::open(path).map_err(...)?
        ))
        .deserialize_map(DataFileVisitor {
            query,
            hash: hash_string(query),
        })?;

        // ...

        Ok(this)
    }
}

// lazily deserializes only the `QueryData` for the query we're looking for
struct DataFileVisitor<'a> {
    query: &'a str,
    hash: String,
}

impl<'de> Visitor<'de> for DataFileVisitor<'_> {
    type Value = DynQueryData;

    fn expecting(...) -> fmt::Result { ... }

    fn visit_map<A>(
        self,
        mut map: A
    ) -> Result<Self::Value, <A as MapAccess<'de>>::Error>
    where
        A: MapAccess<'de>,
    {
        let mut db_name: Option<String> = None;

        let query_data = loop {
            // -- 8< -- Get db name and query info then break -- 8< --
        };

        // Serde expects us to consume the whole map; fortunately they've got a
        // convenient type to let us do just that
        while let Some(_) = map.next_entry::<IgnoredAny, IgnoredAny>()? {}

        Ok(query_data)
    }
}

It’s a decent chunk of code, but we can see that they do, in fact, deserialize the whole file for each query. They’re smart about it and tell serde to ignore most of the values, but serde_json still has to parse the full thing to ensure it’s valid JSON.

That leaves an easy fix though. The sqlx-data.json file shouldn’t change while we’re compiling, so we can just deserialize the whole thing once upfront and then pass out the data for each query as needed. Something like

static OFFLINE_DATA_CACHE: Lazy<Mutex<BTreeMap<PathBuf, OfflineData>>> =
    Lazy::default();

#[derive(serde::Deserialize)]
struct BaseQuery {
    query: String,
    describe: serde_json::Value,
}

#[derive(serde::Deserialize)]
struct OfflineData {
    db: String,
    #[serde(flatten)]
    hash_to_query: BTreeMap<String, BaseQuery>,
}

impl OfflineData {
    fn get_query_from_hash(&self, hash: &str) -> Option<DynQueryData> {
        self.hash_to_query.get(hash).map(|base_query| DynQueryData {
            db_name: self.db.clone(),
            query: base_query.query.to_owned(),
            describe: base_query.describe.to_owned(),
            hash: hash.to_owned(),
        })
    }
}

#[derive(serde::Deserialize)]
pub struct DynQueryData { ... }

impl DynQueryData {
    pub fn from_data_file(path: &Path, query: &str) -> crate::Result<Self> {
        let query_data = {
            let mut cache = OFFLINE_DATA_CACHE
                .lock()
                .unwrap_or_else(/* reset the cache */);

            if !cache.contains_key(path) {
                let offline_data_contents = fs::read_to_string(path)
                    .map_err(...)?;
                let offline_data: OfflineData =
                    serde_json::from_str(&offline_data_contents)?;
                let _ = cache.insert(path.to_owned(), offline_data);
            }

            let offline_data = cache
                .get(path)
                .expect("Missing data should have just been added");

            let query_data = offline_data
                .get_query_from_hash(&hash_string(query);
                .ok_or_else(...)?;

            if query != query_data.query {
                return Err(/* hash collision error */);
            }

            query_data
        };

        // ...

        Ok(query_data)
    }
}

As you can see we deserialize all the sqlx-data.json data into an OFFLINE_DATA_CACHE which stores it in a BTreeMap<PathBuf, OfflineData> (A BTreeMap is needed because there can actually be multiple sqlx-data.json files in use, so the path maps to its deserialized data). From there we can just build and return DynQueryDatas from OfflineData on the fly. Not too bad, and we get to scrap all the custom deserializer logic as a bonus.

PR: launchbadge/sqlx#1684

How’s the time looking now?

(Note: Still keeping the build-override from before)

$ hyperfine --warmup 1 --prepare 'touch src/lib.rs' 'cargo check'
Benchmark 1: cargo check
  Time (mean ± σ):      5.614 s ±  0.064 s    [User: 5.349 s, System: 0.501 s]
  Range (minmax):    5.489 s 5.739 s    10 runs

Over a full second shaved off from the ~7 seconds before!

And you get a cache, and you get a cache

That covers the check times mentioned in the opening quote. What else was there? Something about coffee?

cargo sqlx prepareing off a remote database was a good excuse to take a coffee break.

Oh yeah, preparing off remote databases! (Also I don’t really drink coffee, but I do have a dog to walk :) )

Diving back into the code yields us this snippet for preparing off a remote database.

// from sqlx-macros/src/query/mod.rs

fn expand_from_db(
    input: QueryMacroInput,
    db_url: &str
) -> crate::Result<TokenStream> {
    let db_url = Url::parse(db_url)?;
    match db_url.scheme() {
        #[cfg(feature = "postgres")]
        "postgres" | "postgresql" => {
            let data = block_on(async {
                let mut conn = sqlx_core::postgres::PgConnection::connect(
                    db_url.as_str()
                ).await?;
                QueryData::from_db(&mut conn, &input.sql).await
            })?;

            expand_with_data(input, data, false)
        },

        #[cfg(not(feature = "postgres"))]
        "postgres" | "postgresql" => Err(
            "database URL has the scheme of a PostgreSQL database but the \
            `postgres` feature is not enabled".into()
        ),

        // -- 8< -- Same thing for other dbs. Soooo many `cfg`s -- 8< --

        item => Err(format!("Missing expansion needed for: {:?}", item).into())
    }
}

Very similar to the last section, but this time we’re creating a fresh database connection for each query macro instead of parsing a file. This is not cheap for remote databases! Quoting sqlx’s docs

1. Overhead of Opening a Connection

Opening a database connection is not exactly a cheap operation.

For SQLite, it means numerous requests to the filesystem and memory allocations, while for server-based databases it involves performing DNS resolution, opening a new TCP connection and allocating buffers.

Each connection involves a nontrivial allocation of resources for the database server, usually including spawning a new thread or process specifically to handle the connection, both for concurrency and isolation of faults.

Additionally, database connections typically involve a complex handshake including authentication, negotiation regarding connection parameters (default character sets, timezones, locales, supported features) and upgrades to encrypted tunnels.

Couldn’t have put it better myself! The fix is largely the same as last time. Instead of opening a new connection for each query we cache a single connection that gets reused. I’ll spare you the code since it’s the same idea, so let’s jump straight to the numbers instead (Taking them from the PR description since it’d be a pain to reproduce now).

PR: launchbadge/sqlx#1782 which caused an issue with SQLite, so it was excluded from the caching logic

# of queries: 332

sqlx-data.json size: 705 KiB

Setup main (f858138) This PR % Change
MariaDB remote 61.421 s ± 1.985 s 12.694 s ± 0.153 s -79%
MariaDB local 5.291 s ± 0.086 s 5.050 s ± 0.064 s -5%

That’s almost a -80% change when preparing off a remote MariaDB instance! On top of that the original ~1 minute is actually after two non-sqlx changes to my old work’s mono-crate that drastically reduced the time. It used to be ~8 minutes before we:

  1. Removed an unused ormx-based helper crate that seemed to add a lot more queries
  2. Split the main mono-crate into several crates in a workspace

Well… the second one helped a lot if you ignore the full rebuild…

That whole can of worms deserves a blog post of it’s own though (spoilers: All the work in sqlx has already been done)

Conclusion

Everything is sprawling, so in short the changes were

  1. Caching the parsed sqlx-data.json file when building the macros (cargo check/build/run) in offline mode
  2. Reusing the same database connections when using a remote database. This impacts cargo check/build/run when not using offline mode and cargo sqlx prepare when you are

Now I’m sure something that is on at least some of your all’s minds is, “when will all of this be released?!” The answer is that it already has been… for like 6 months! It turns out that I’m really slow at writing blog posts. Hopefully later ones will go by much faster now that I have a lot of non-recurring engineering work done.

Big thanks to:

  • LaunchBadge for open-sourcing sqlx, so that I can submit my hot trash for them to maintain 😌
  • Rust for giving me enough compile-time guarantees that I don’t want to cry when hacking on a new codebase
  • Rust tooling for being insanely good at times
  • My previous coworker for always finding things that I manage to miss