Discussion Anyone know why a SQLite select query can run 50-100x faster using the SQLite CLI client than using the C library?

Not directly an Elite question - I'm planning to use it in an Elite utility though, and kinda hoping someone will know 😖

I have a query which is kinda nested/involved but still only needs ~150 ms total to return ~100 rows using the sqlite3 command line client.
If I put the exact same SQL into a C program and use the low-level sqlite3_prepare_v2 and sqlite3_step functions, it needs ~100 ms PER ROW, ~10000 ms total.
That's ~100 ms per call to sqlite3_step - that call is what blocks for each row iterated when running this query.

Query itself looks like this...
SQL:
select dat.id, dat.k, dat.name, sum(c.ww) weight, sum(c.ww*n.c_norm) norm

from (select c_id, w_ids, 1.0/json_array_length(w_ids) ww from clist where w_ids!='[]') c
join (select c_id, 1.0/count() c_norm from clist group by c_id) n on n.c_id=c.c_id
left join json_each(c.w_ids) w
join dat on w.value=dat.id

group by dat.id
There's nothing wrong with the query logic - it's doing exactly what I want: grouped results with normalisation factors available.
Between the aggregate functions, subqueries, joins, JSON functions and groupings, it's probably the groupings crippling it somehow.
I'm open to optimisations, but that's still totally independent of understanding why the API can't beat the CLI using THAT query :D
For some numbers it's ~100000 stored rows getting grouped down to ~100 result rows.

I've tried different SQLite versions, different C compiler flags (I'm using -O3), different SQLite compile options.
I've also tried different libraries/ports for other runtimes; they all just wrap or link the original SQLite library, so they inherit the same problem.
Speed differences from anything like that are a few % at most, nowhere near the order of magnitude speedup with the sqlite3 client.

The dumb part is that I can write my program to run the sqlite3 client as a subprocess, with the file and query as command-line args, then parse the human-formatted output to get the results.
It's literally 50-100x faster to go via that subprocess than to use the actual SQLite C API from a C program.
I really don't want that to be the solution though, and I can't figure out what the active difference is here 😖
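Just to illustrate the workaround I mean, it's roughly this (a sketch only - the db name is a placeholder, the query is elided, and I'd want proper quoting before doing it for real):
Code:
/* sketch of the subprocess workaround: run the sqlite3 CLI and parse its list output */
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* -list forces one row per line with '|' between the fields, easy to split */
    const char *cmd =
        "sqlite3 -list my.db \"select dat.id, dat.k, dat.name, ... group by dat.id\"";

    FILE *p = popen(cmd, "r");
    if (!p) { perror("popen"); return 1; }

    char line[4096];
    while (fgets(line, sizeof line, p)) {
        line[strcspn(line, "\n")] = '\0';
        /* split line on '|' here and convert the fields */
        printf("row: %s\n", line);
    }
    return pclose(p) == 0 ? 0 : 1;
}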
 
I am not an expert on these particular components, but that's the kind of thing you get if the two implementations use very different calling patterns. The C API likely has some tunables for how it fetches large results, and their defaults are all different from what the sqlite3 binary uses. (The C API is probably engineered to assume a typical three-tier use case where the caller is not on the same machine as the database.)
 
That's ~100 ms per call to sqlite3_step - that call is what blocks for each row iterated when running this query.
It does sound a little as if you're accidentally having it kinda re-run the whole query each time you call it, even though that sounds a bit weird.
You can probably tell that I've not used that API, but I'm guessing/hoping that there's a way to get it to return lots of rows in one hit rather than making 100 calls to sqlite3_step for 100 rows.
Or at least: some way to make it cache the result of the query a little more effectively, and let you fetch the next result cheaply each time? (I could go and read the docs I suppose, might do that now ;))
 
Oh, one other thing comes to mind: you could cheat, and check out how the CLI is doing it :) (I mean: which API calls it's using, and the flags it chose.)
If on Linux, the ltrace command oughta work. On Windows, Process Monitor should work.
 
The C API likely has some tunables for how it fetches large results, and their defaults are all different from what the sqlite3 binary uses. (The C API is probably engineered to assume a typical three-tier use case where the caller is not on the same machine as the database.)
SQLite isn't client-server at least, so it knows it's just a file (at worst it could be on a NAS, but it isn't here).
I'm definitely hoping it's something tunable - although I've already tried the main recommended compile options listed on the SQLite site 😖

It does sound a little as if you're accidentally having it kinda re-run the whole query each time you call it, even though that sounds a bit weird.
You can probably tell that I've not used that API, but I'm guessing/hoping that there's a way to get it to return lots of rows in one hit rather than making 100 calls to sqlite3_step for 100 rows.
Or at least: some way to make it cache the result of the query a little more effectively, and let you fetch the next result cheaply each time? (I could go and read the docs I suppose, might do that now ;))
I looked for a bulk result function as well; all it has is the legacy "sqlite3_get_table", which they implement on top of sqlite3_step anyway.
This has the basic usage/flow - https://sqlite.org/cintro.html
I'm calling open once, prepare once with that query, skipping bind because the query has no parameters, then step in a loop ("advance to next result row").
Even if I remove the column calls and just print a row counter instead, it still takes ~100 ms per step 😖
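For completeness, the whole flow boils down to roughly this (a trimmed sketch - the db name is a placeholder, error handling is minimal and the SQL string is elided):
Code:
/* minimal open -> prepare -> step loop, as described above */
#include <stdio.h>
#include "sqlite3.h"

int main(void)
{
    sqlite3 *db = NULL;
    sqlite3_stmt *stmt = NULL;
    const char *sql = "select ... group by dat.id";   /* the query shown earlier, elided here */

    if (sqlite3_open("my.db", &db) != SQLITE_OK) {
        fprintf(stderr, "open: %s\n", sqlite3_errmsg(db));
        return 1;
    }
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK) {
        fprintf(stderr, "prepare: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db);
        return 1;
    }

    int rows = 0;
    while (sqlite3_step(stmt) == SQLITE_ROW) {   /* this is the call taking ~100 ms per row */
        ++rows;                                  /* sqlite3_column_*() calls would go here */
    }
    printf("%d rows\n", rows);

    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}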

Oh, one other thing comes to mind: you could cheat, and check out how the CLI is doing it :) (I mean: which API calls it's using, and the flags it chose.)
If on Linux, the ltrace command oughta work. On Windows, Process Monitor should work.
Now there's an idea, the CLI source is also open and could yield something :D
Actually, a real quick test could be just to compile my own version of it and test that for speed - at least that'll tell me if the problem is compile options or runtime calls.
Fast means it's using the API in some way I should be able to find by reading the source; slow means I'm back to looking for more clues about how it's compiled.
Coffee first, then I'll throw their shell.c at GCC along with the same compiled sqlite3.o object I linked with my program - should be a fair test.
 
Actually, a real quick test could be just to compile my own version of it and test that for speed - at least that'll tell me if the problem is compile options or runtime calls.
I would bet a LOT of money that it's not the compiler options ;) Looking forward to hearing how you solve it, any which way.
(But ltrace is a dream for this kind of thing, since it sounds like you're on Linux.)
 
Oh, btw: the (slightly icky) documentation for the API does mention that in certain situations it will recompile the query every time you call the _step function. It didn't seem likely to trigger though, given the situations they mention, but you never know.
 
I would bet a LOT of money that it's not the compiler options ;) Looking forward to hearing how you solve it, any which way.
(But ltrace is a dream for this kind of thing, since it sounds like you're on Linux.)
Now that you've sent me down this fine rabbit hole I hope you'll report back if/when you make some progress!
Result from compiling the CLI myself..... slowdown!
Looking more like it's compiler options or SQLite compile flags - either that or the OS package maintainers have written a super-speed patch :D
Also unless I'm missing something, ltrace won't show anything unless I'm linking a shared object, whereas SQLite is built statically into the program here (could be another way to try it though).

Oh, btw: the (slightly icky) documentation for the API does mention that in certain situations it will recompile the query every time you call the _step function. It didn't seem likely to trigger though, given the situations they mention, but you never know.
Wondered what you meant for a moment, found it on the page for sqlite3_prepare.
It says a recompile only happens if there's a schema change or a WHERE parameter change - I've got nothing altering tables and the query has no parameters at all.

I don't want to put anyone to any heavy investigation, was sort of a long shot just in case it'd be something common/obvious.
While clicking around I managed to find the actual SQLite forum, and I've posted it there since it looks like a pretty big performance issue for anyone following the suggested build example 😖
 
Have you tried EXPLAIN QUERY PLAN to check whether the two CLI versions behave differently? They shouldn't, but you never know.
This was it exactly - the plans are VERY different :D
The SQLite community asked me to check the same thing, along with PRAGMA compile_options, which also shows a lot of differences.
OP / query plans / version tests from that thread.
99% sure it's in the plan choice differences between versions 😖
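In case anyone wants to run the same check from inside their own program - i.e. against the exact library build they link, rather than the CLI - here's a rough sketch (db name is a placeholder, query elided; the tree layout in the plans below is drawn by the CLI, the API just hands back one detail line per row):
Code:
/* dump the build options and the query plan through the linked library */
#include <stdio.h>
#include "sqlite3.h"

static void dump(sqlite3 *db, const char *sql)
{
    sqlite3_stmt *stmt = NULL;
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK) {
        fprintf(stderr, "%s\n", sqlite3_errmsg(db));
        return;
    }
    while (sqlite3_step(stmt) == SQLITE_ROW) {
        int last = sqlite3_column_count(stmt) - 1;   /* detail text is the last column */
        printf("%s\n", (const char *)sqlite3_column_text(stmt, last));
    }
    sqlite3_finalize(stmt);
}

int main(void)
{
    sqlite3 *db = NULL;
    if (sqlite3_open("my.db", &db) != SQLITE_OK) return 1;

    printf("-- SQLite %s --\n", sqlite3_libversion());
    dump(db, "PRAGMA compile_options");
    dump(db, "EXPLAIN QUERY PLAN select ... group by dat.id");   /* prepend EQP to the real query */

    sqlite3_close(db);
    return 0;
}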

It's looking like an actual legit regression in SQLite they'll need to fix, which I've narrowed somewhere between 3.31.1 and 3.42.0.
Given I'm using JSON functions and SQLITE_ENABLE_JSON1 is deprecated in 3.38.0, I think I'll take a punt on 3.37 being the last good release.
If they fiddled with whether JSON is enabled by default, I reckon that increases the chance they changed how JSON functions work with GROUP BY.

Those query plan differences:
Code:
SQLite version 3.47.0
QUERY PLAN
|--CO-ROUTINE n
|  |--SCAN clist
|  `--USE TEMP B-TREE FOR GROUP BY
|--SCAN dat
|--SCAN n
|--BLOOM FILTER ON clist (c_id=?)
|--SEARCH clist USING AUTOMATIC PARTIAL COVERING INDEX (c_id=?)
`--SCAN w VIRTUAL TABLE INDEX 1: LEFT-JOIN
Code:
SQLite version 3.31.1
QUERY PLAN
|--MATERIALIZE 2
|  |--SCAN TABLE clist
|  `--USE TEMP B-TREE FOR GROUP BY
|--SCAN SUBQUERY 2 AS n
|--SEARCH TABLE clist USING AUTOMATIC COVERING INDEX (c_id=?)
|--SCAN TABLE json_each AS w VIRTUAL TABLE INDEX 1:
|--SEARCH TABLE dat USING INTEGER PRIMARY KEY (rowid=?)
`--USE TEMP B-TREE FOR GROUP BY
 
Result from compiling the CLI myself..... slowdown!
Looking more like it's compiler options or SQLite compile flags - either that or the OS package maintainers have written a super-speed patch :D
:eek: Uhoh...
Also unless I'm missing something, ltrace won't show anything unless I'm linking a shared object, whereas SQLite is built statically into the program here (could be another way to try it though).
Darn, yes I'm so used to nearly everything being dynamically linked that I forget that static linking breaks the magic of ltrace :)
99% sure it's in the plan choice differences between versions 😖
Ah phew, sounds like I don't need to eat my hat after all :ROFLMAO: (in that the compiler isn't the problem, I mean)
But wow, seems like you had a pretty weird and freaky ride there. Glad it's now effectively fixed!
 
Even better, this particular rabbit hole actually has an end to it :D
The pointers from the SQLite peeps led me to this query-planner checklist - https://sqlite.org/queryplanner-ng.html#howtofix
Indexing and running ANALYZE not only don't help, they're actually HARMING the query plan since version 3.33.0 😖

What's happening is that all the indexed tables are now getting promoted above the json_each call, which is a table-valued function.
The planner is allowed to do that - reorder the joins in the SQL.
The trouble is that putting json_each last makes it the innermost loop of the nested iteration, so it's getting called thousands of times with the same input.
Old query planners stayed closer to the join order written in the query, where json_each isn't affected by anything to its right.

Reading below the line where the documentation literally says to stop reading if possible, CROSS JOIN is what solves it.
In SQLite there's no semantic difference from a plain join there, but it forces the nested iteration order and cures the problem.
Technically it's a confirmed regression if you have to resort to any of the hacks below that Stop Reading line - low priority though, I reckon (I offered to help with a test case).
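The gist, as a minimal sketch against the query from my first post (the exact placement here is illustrative - the point is just that CROSS JOIN pins dat after json_each so it can't get hoisted above it):
Code:
/* illustrative only: swap the plain "join dat" for "cross join dat" so the
   planner can't move dat (and its index lookups) outside the json_each scan */
const char *sql =
    "select dat.id, dat.k, dat.name, sum(c.ww) weight, sum(c.ww*n.c_norm) norm "
    "from (select c_id, w_ids, 1.0/json_array_length(w_ids) ww "
    "      from clist where w_ids!='[]') c "
    "join (select c_id, 1.0/count() c_norm from clist group by c_id) n "
    "  on n.c_id=c.c_id "
    "left join json_each(c.w_ids) w "
    "cross join dat on w.value=dat.id "      /* was: join dat on w.value=dat.id */
    "group by dat.id";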

Query all good on newer SQLite versions now :D
 
I can't resist telling my own experience with query planners ...

Never trust a planner with complex queries. Make your own plan with a number of simpler queries into temporary tables.
Delete the temporary tables after checking they contain what you expect.

There, I said it.

Mind you, I expect query planners have improved since the 20th century.
 
Reading below the line where the documentation literally says to stop reading if possible, CROSS JOIN is what solves it.

Pretty funny, and a heartwarming ending to the tale :) I honestly expected the suggestion in the forum comment about explicitly using MATERIALIZE to be the solution. Various things about the 3.31.1 query plan looked better than the very different 3.47.0 one.

As someone wholly ignorant of SQLite internals, this has been a very interesting read, thanks!
 